E-mail spam is a huge problem, so it is desirable to detect spam messages and either reject or discard them.

Techniques

Spam filtering can take place at different points in mail delivery:

  • While the message is being received via SMTP.
  • After the message has been received via SMTP.

Spam filtering can be done on a global basis or on a per-user basis.

A spam filter can also conduct extra checks. This could mean, for example, performing a virus scan on the email.

When spam is identified, there are several actions that can be taken:

  • Reject the spam in the SMTP session.
  • Send (bounce) an error message to the sender. Note the sender's address could be forged.
  • Send an error message to the recipient. Note this will mean the recipient still receives the spam.
  • Change the message in some way, e.g. by changing the subject line or by adding a header, so the recipient will know it might be spam. Note that this means the recipient still receives the spam.
  • Save the message in quarantine for later checking.
  • Discard the message.
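
As an illustration of the "change the message" action above, here is a minimal Python sketch that tags a suspected spam message using the standard library's email package. The header names X-Spam-Flag and X-Spam-Score, the "[SPAM] " subject prefix, and the function name tag_as_spam are assumptions chosen for this example; real filters use their own conventions.

```python
from email.message import EmailMessage

# Hypothetical header names and subject tag, chosen for illustration only.
SPAM_FLAG_HEADER = "X-Spam-Flag"
SPAM_SCORE_HEADER = "X-Spam-Score"
SUBJECT_TAG = "[SPAM] "


def tag_as_spam(msg: EmailMessage, score: float) -> EmailMessage:
    """Mark a message as suspected spam without discarding it."""
    # Delete any existing headers of the same name first, so that values
    # pre-set by the sender cannot survive alongside ours.
    del msg[SPAM_FLAG_HEADER]
    del msg[SPAM_SCORE_HEADER]
    msg[SPAM_FLAG_HEADER] = "YES"
    msg[SPAM_SCORE_HEADER] = f"{score:.1f}"

    # Prefix the subject once, so repeated filtering does not stack tags.
    subject = str(msg.get("Subject", ""))
    if not subject.startswith(SUBJECT_TAG):
        del msg["Subject"]
        msg["Subject"] = SUBJECT_TAG + subject
    return msg
```

The recipient, or a mail client rule, can then sort on the added header or the subject prefix. As noted in the list, the spam is still delivered.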

There are many solutions to this problem. For example:

Examples

Blocking SPAM at SMTP level.

Controversies

Unfortunately, most solutions involve tradeoffs and are often very controversial as a result. Some side effects of implementing these solutions may include:

  • If spam is detected and the e-mail is bounced back to the sender, the sender's address is probably forged. This means random users will get spammed by these annoying bounce messages.
  • If spam is detected, but it is too late to reject the mail in the SMTP session, then an alternative approach would be to store the spam in a special spam folder, for later manual checking. Unfortunately, this job is often forgotten. As a result, ham that is incorrectly detected as spam can fall into a black hole, never to be seen again, without any errors.
  • Some people argue as a result that all spam checking has to be done in the SMTP session, where the message can be rejected directly to the sending server instead of relying on the sender's address. Unfortunately, this means the checks have to be quick and use little memory, or there is a risk that the software will fail under load.
  • Some methods will slow down mail delivery, e.g. greylisting.
  • Some methods make fundamental changes to the SMTP protocol, resulting in mail getting rejected that shouldn't be (e.g. greylisting or SPF checks). The counter argument is that any rejections will result in the sender being notified, and mail won't go down a black hole.
  • Spam filters still have room for improvement. Also see Slashdot article.
  • Some rule-based systems, such as SpamAssassin, need to be constantly updated. It has been argued that spammers will test their messages against many common filtering programs first to ensure their spam will be classified as ham. As such, it is argued that such checks are useless, and only increase the chance of detecting ham as spam.
  • Email callbacks: problems and workarounds.
  • Some methods may initially seem simple, but require extra infrastructure. For example:
    • SPF may require implementation of SMTP authentication where it wasn't previously required.
    • SPF may require Sender Rewriting Scheme (SRS) for mail forwarding.
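
To make the delivery delay caused by greylisting concrete, here is a minimal Python sketch of the usual idea: the first attempt from an unknown (client IP, sender, recipient) triplet gets a temporary SMTP failure, and the message is only accepted once the sending server retries after a waiting period. The class name Greylist, the 300-second delay, and the reply texts are assumptions for this example, not the behaviour of any particular product. A real implementation would also persist and expire its records.

```python
import time

GREYLIST_DELAY = 300  # seconds a new triplet must wait; an assumed value


class Greylist:
    """In-memory greylist keyed on (client IP, sender, recipient)."""

    def __init__(self, delay=GREYLIST_DELAY, clock=time.time):
        self.delay = delay
        self.clock = clock    # injectable so the logic can be tested
        self.first_seen = {}  # triplet -> time of first delivery attempt

    def check(self, client_ip, mail_from, rcpt_to):
        """Return an SMTP reply (code, text) for this delivery attempt."""
        triplet = (client_ip, mail_from.lower(), rcpt_to.lower())
        now = self.clock()
        # Record the first attempt; later attempts reuse that timestamp.
        first = self.first_seen.setdefault(triplet, now)
        if now - first < self.delay:
            # Temporary failure: a well-behaved server queues and retries.
            return 451, "4.7.1 Greylisted, please try again later"
        return 250, "2.1.5 OK"
```

Because the reply is a temporary failure inside the SMTP session, legitimate mail is delayed rather than lost, provided the sending server retries. Senders that never retry have their mail refused, which is the trade-off described above.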