
Thursday, September 25, 2003

The weight of spam

As of this writing, I've received 71,120 unsolicited commercial e-mail messages in 2003, and that count is growing by approximately 500 messages per day. Keep in mind that I'm only tracking mail on my primary, personal e-mail account; the total would be a bit higher were I tracking receipts on all the domains and addresses that I manage. (Some poor sots get 3,000 spam messages every day.)

Since January 1, I've not only been counting my spam receipts, I've been saving them. Right now, I've got an archive totalling some 297 megabytes of crap. My original thought was that the aggregate file would be handy as fodder for a spam filter, as a means of training it to recognize the dreck. But, truly, SpamAssassin has been doing a bang-up job for me since I began using it, and I can't fathom that feeding it 71,000 junk messages is going to make it appreciably better.

So, what should I do with all this spam? Come the end of the year, I'm likely to have 100,000 messages or more. Are there any fun analyses you'd like me to do? Any word frequency charts? Weird pattern searches? I'm open to suggestions (and have provided a comments link for this post), but keep in mind that my programming chops are pretty meager, so if you can't explain how to do what you're proposing in BBEdit or a few lines of script, it's probably beyond my ken.
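For the word-frequency idea, here is a minimal sketch in Python that really is just a few lines of script. It assumes the archive is a single mbox file, which may not match how the messages are actually stored, and it only looks at Subject lines. The function names and the `spam.mbox` filename are made up for illustration, and encoded (non-ASCII) subjects are counted as-is rather than decoded.

```python
import mailbox
import re
from collections import Counter

# Words are runs of 3 or more letters/apostrophes; shorter tokens like "re" are skipped.
WORD_RE = re.compile(r"[a-z']{3,}")


def subject_word_counts(subjects):
    """Count lowercase words across an iterable of subject lines."""
    counts = Counter()
    for subject in subjects:
        counts.update(WORD_RE.findall(subject.lower()))
    return counts


def mbox_subjects(path):
    """Yield the Subject header of every message in an mbox file."""
    for message in mailbox.mbox(path):
        yield str(message.get("Subject", ""))


def print_top_words(path, n=25):
    """Print the n most common subject-line words, most frequent first."""
    for word, count in subject_word_counts(mbox_subjects(path)).most_common(n):
        print(f"{count:6d}  {word}")


# Hypothetical usage, assuming the archive lives in a file named spam.mbox:
# print_top_words("spam.mbox")
```

Pasting the output into a spreadsheet would be one way to turn the counts into a chart.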
Posted by Brad on September 25, 2003 at 1:39 PM
Categories: Spam

Comments:

What's your spam-to-virus ratio these days? Does that say anything about the economics of spam?

What's the proportion where you'd say "no WAY would anyone ever BUY this"? Has that proportion gone up recently? Does that say anything about the economics of spam?


Comment by Hugh on September 25, 2003 at 6:57 PM

Of course, there is the trivially easy "How long and/or thick would Little Brad be if all of the manhood-enhancement spam were answered and actually worked"... but you might have done that analysis already. :-D


Comment by Chris Talbott on September 26, 2003 at 9:32 AM

I think Chris has pegged it. That's the data inquiring minds are clamouring for.


Comment by Dinah on September 26, 2003 at 12:16 PM

I'm the original developer of SpamAssassin (now just one of several). And I must say, wow, that's a lot of spam ;-)

Have you got an always-on UNIX machine handy, with plenty of CPU time free? And do you use a sensible mail app, like pretty much anything apart from Outlook? Because a great way to give back to SA, especially with a collection of mail like that, is to run "mass-check" over it and help out with our rescoring runs. It's pretty easy -- but very UNIX-based.

The bonus, of course, is that SA optimises itself to handle your mail ;-)

It's more-or-less impossible to figure out what your mail might be about -- pretty much the only thing made visible in the uploaded results is the names of your mail folders.

More details are at:
http://spamassassin.org/dist/masses/README
http://spamassassin.org/dist/masses/CORPUS_SUBMIT

We also do nightly runs -- i.e., you set up a crontab to run SA over your mail collection once per night, and it uploads the results for ongoing rule QA...

http://spamassassin.org/dist/masses/CORPUS_SUBMIT_NIGHTLY

cheers!


Comment by Justin on September 26, 2003 at 6:43 PM

The only spam I seem to get is from someone trying to sell me either a new mortgage or a penis enlargement. I can't help but wonder whether one could finance a penis enlargement with a second mortgage?

Would your programming skills allow you to extract from your large spam collection additional synergies that today's e-marketers haven't yet thought of?


Comment by Gregg on October 26, 2003 at 11:35 AM
