This CNet story, published just a couple of weeks after we launched our new spam filtering services, talks about the difficulty of "canning spam without eating up real mail." Though our system doesn't use anything as rudimentary as blocking entire Internet addresses or portions of the Internet, it is nevertheless difficult to block only spam.
This may seem mind-boggling, as most people have no problem telling an e-mail from their aunt or a customer apart from a Viagra advertisement. To a certain extent, this is true. We generally don't have a problem. However, as introducing our new spam filtering solution has reinforced, not everybody has the same definition of spam.
To illustrate, I get movie show times from Amazon.com e-mailed to my account every Thursday. I didn't ask for them, so they are -- technically -- unsolicited. Nevertheless, it saves me a trip to Yahoo Movies every week and I'm glad to receive them. I know how Amazon got my e-mail address, because I have purchased music and books from their store in the past. To me, their mail, though unsolicited, is not spam.
Then, take someone else, someone who bought only one thing from Amazon several years ago, was dissatisfied with the experience, and returned the product. This person starts receiving unsolicited e-mail from Amazon.com. To make matters worse, she is simply not interested in dropping 10 dollars to see a movie in the theater. She is a big believer in the VCR. To her, the weekly movie show times from Amazon.com are spam.
That's just a simple example: it gets more complicated. A lot of "legitimate" companies have, in the wake of an economic downturn, resorted to sending out spam. It's cheap. It's legal. Why not? So, we have to be careful to determine whether a message from a bank is an overdraft notification sent to a current customer or an unsolicited advertisement for a free checking account. And we have to do it without ever seeing the message!
That's the really tricky part. As an Internet Service Provider (ISP), we want to protect our customers from the burden of spam without inconveniencing them or invading their privacy. The great thing about computers is that they make this possible. A computer can examine the mail and "forget" about it the moment it is done. In instances such as this, computers can actually provide a higher degree of privacy than would be available otherwise.
However, despite how stupid our computers can make us feel sometimes, they simply can't do anything they haven't been told to do. Their ability to learn is less than that of the family pet. We have to tell the computer what spam looks like. Actually, we have to tell it what spam looks like to us and to the bulk of our customers. Furthermore, we can't tell the computer what spam looks like if we haven't seen it before ourselves.
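To give a rough idea of what "telling the computer what spam looks like" can mean, here is a minimal Python sketch. It is not our actual filter: the patterns, the scores, and the threshold are all invented for illustration. Each rule is something a person had to notice in real spam first, and the functions return only a score or a verdict, keeping no copy of the message.

```python
import re

# Hypothetical rules, invented for this example: (pattern, score).
# A real filter would have many more, and they would change constantly.
RULES = [
    (re.compile(r"\bviagra\b", re.IGNORECASE), 3.0),
    (re.compile(r"\bfree checking\b", re.IGNORECASE), 2.0),
    (re.compile(r"\bact now\b", re.IGNORECASE), 1.5),
]

# Assumed cutoff: a message scoring at or above this is treated as spam.
THRESHOLD = 3.0


def spam_score(text):
    """Add up the scores of every rule whose pattern appears in the text."""
    return sum(score for pattern, score in RULES if pattern.search(text))


def is_spam(text):
    """Return only a yes/no verdict; the message itself is not stored."""
    return spam_score(text) >= THRESHOLD
```

With these made-up numbers, a bank message that only mentions a free checking account scores 2.0 and gets through, while the same phrase combined with "act now" scores 3.5 and is blocked. Where to put that threshold is exactly the trade-off between catching more spam and eating real mail.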
In the end, even the most advanced spam filtering solutions are simply racing to keep up with the spammers. In our case, we've blocked about 45,000 messages in the past month with fewer than 10 reports of "false positives," messages we believed were spam but were, in fact, legitimate mail. For those without a calculator, that's roughly 0.02% of blocked messages, and it's a number we can live with. The trick is going to be increasing the amount of spam caught without increasing the reports of false positives, canning more spam "without eating up real mail."
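For those who would rather check the arithmetic than take our word for it, here is a small Python sketch. The function name is ours, made up for the example. Because we had fewer than 10 reports, using 10 gives an upper bound on the reported rate.

```python
def false_positive_rate(false_positives, blocked):
    """Fraction of blocked messages that were reported as legitimate mail."""
    if blocked <= 0:
        raise ValueError("blocked must be a positive count")
    return false_positives / blocked


# Figures from the past month: about 45,000 blocked, fewer than 10 reports.
rate = false_positive_rate(10, 45_000)
print(f"{rate:.3%}")  # prints 0.022%
```

This counts only the false positives that customers reported, so the true rate could be higher if some legitimate mail was blocked and nobody noticed.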