Experimental and Theoretical ways to get rid of Spam

This page is for developing ideas and projects that move SpamAssassin and related spam fighting projects forward. The aim is to write up cutting edge spam fighting ideas that are still in development.

New Black/White/Yellow List Technologies

This article describes experimental technologies used at [http://www.junkemailfilter.com Junk Email Filter]. I (Marc Perkel) am writing this in the hope that other people will pick up on these ideas and improve them. Although the system is working very well for us so far, it can be improved and expanded, and I hope this will inspire others to make it far better than what we have developed. Information about our HostKarma lists can be found [http://wiki.junkemailfilter.com/index.php/Spam_DNS_Lists here].

We are all familiar with black lists. Black lists are DNS lists where the IP addresses of spammers are looked up. You send a request to the list with the IP, and if it is in the blacklist you bounce the email. That sounds simple enough, except for the false positives.

White lists help reduce false positives. If an IP address is white listed then you can just pass it and not look at black lists.

Yellow lists are lists of hosts that send a mix of spam and non-spam. Email services like Yahoo, Gmail, and Hotmail are examples of mixed source servers. These servers should never be either black or white listed. These are servers where the IP address yields no useful information as to whether the message is spam or ham.

Multi-Color List Processing Logic

When looking at lists from a black/white/yellow perspective there is an order to the list processing. First the yellow lists are tested. If the sending host is yellow listed then checking black and white lists isn't necessary. Then the white lists are checked. If the host is white listed, the message can be passed without running it through SpamAssassin (SA), or SA can short circuit its tests and declare the message ham. Finally the black lists are checked, and if the host is in one trusted blacklist or in several less trusted blacklists the message can be rejected.
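The processing order above can be sketched as a small decision function. This is only an illustration: the function name, the action strings, and the threshold of two hits for "several less trusted blacklists" are assumptions, not part of the HostKarma implementation.

```python
def decide(yellow: bool, white: bool, trusted_black: bool,
           weak_black_hits: int = 0, weak_threshold: int = 2) -> str:
    """Return an action for a message, testing yellow, then white, then black.

    weak_threshold is an assumed value for "several less trusted blacklists".
    """
    if yellow:
        # Mixed source: the IP says nothing, so skip white and black lists.
        return "content-filter"
    if white:
        # Known good sender: pass it (or let SA short circuit to ham).
        return "accept"
    if trusted_black or weak_black_hits >= weak_threshold:
        return "reject"
    # Not listed anywhere: fall back to normal content testing.
    return "content-filter"
```

Note that a host listed as both yellow and black is still sent to content filtering, because the yellow test runs first.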

Reducing Lookups by using multiple return codes to indicate the result

Many DNS lists return a code to indicate yes and nothing to indicate no. This was fine when only black lists were looked up, but if you are looking up multiple states you would have to make a separate DNS call for each one.

 yellowlist.junkemailfilter.com
 whitelist.junkemailfilter.com
 blacklist.junkemailfilter.com

But why do three lookups when you can do one? That's the way the HostKarma list works. It returns a different value to indicate black/white/yellow.

 127.0.0.1 white
 127.0.0.2 black
 127.0.0.3 yellow
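A minimal sketch of how a client might use the single lookup. The return codes come from the table above and the zone name from the `dig` example later on this page; the reversed-octet query name follows the usual DNS list convention and is assumed here rather than confirmed for HostKarma. The helper names are hypothetical.

```python
import ipaddress
from typing import Iterable, Optional

ZONE = "hostkarma.junkemailfilter.com"
CODES = {"127.0.0.1": "white", "127.0.0.2": "black", "127.0.0.3": "yellow"}

def query_name(ip: str, zone: str = ZONE) -> str:
    """Build the DNS name to query for an IPv4 address (octets reversed)."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone

def color_from_answers(answers: Iterable[str]) -> Optional[str]:
    """Map the A records returned by one lookup to a color, or None."""
    for address in answers:
        if address in CODES:
            return CODES[address]
    return None  # no answer, or an unknown code: the host is not listed
```

One query to `query_name(ip)` replaces the three separate lookups, and `color_from_answers` decodes whatever A records come back.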

Forward Confirmed rDNS (FCrDNS)

Forward Confirmed reverse DNS is an important concept for the ideas I'm about to introduce below. One of the ways to separate spam from ham is to find things that spammers can't spoof. One of those things is forward confirmed rDNS.

Reverse DNS is straightforward. An IP has a PTR record, so that when you look up the IP it returns a name associated with that IP. Unfortunately a spammer can put any name they want in a PTR record. But when a faked name is looked up, it points somewhere else rather than back to the spammer's IP.

Forward Confirmed rDNS means that when the name returned by a PTR lookup is checked it will point back to the original IP address that was looked up.

 1.2.3.4 -> mail.mydomain.com
 mail.mydomain.com -> 1.2.3.4

Spammers can't spoof this because, even though the PTR record can be faked, the spammer can't create an A record that points back to the original IP: they don't have control over the faked domain.
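The check can be sketched with Python's standard `socket` module. The function names are made up for this example, and the resolvers are passed in as parameters so the logic can be exercised without touching the network.

```python
import socket
from typing import Callable, List, Optional

def ptr_lookup(ip: str) -> Optional[str]:
    """Reverse lookup: IP -> name from the PTR record, or None."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:  # covers socket.herror and socket.gaierror
        return None

def a_lookup(name: str) -> List[str]:
    """Forward lookup: name -> list of IPv4 addresses (may be empty)."""
    try:
        return socket.gethostbyname_ex(name)[2]
    except OSError:
        return []

def fcrdns_name(ip: str,
                ptr: Callable[[str], Optional[str]] = ptr_lookup,
                forward: Callable[[str], List[str]] = a_lookup) -> Optional[str]:
    """Return the host name only if it resolves back to the same IP."""
    name = ptr(ip)
    if name is None:
        return None
    name = name.rstrip(".").lower()
    return name if ip in forward(name) else None
```

A `None` result means the host either has no PTR record or has one that is not confirmed by a matching A record.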

FCrDNS is very reliable for detecting white and yellow domains. It can also be used for black domains, but spammers' host names generally aren't that stable. So it is more useful for actively detecting ham and avoiding false positives than for detecting spam.

Is your host name DNS set up correctly? Here's a [http://ipadmin.junkemailfilter.com/rdns.php FCrDNS checking tool] to test it.

Host Name based Lists based on FCrDNS host names

In addition to IP based black/white/yellow lists, the [http://wiki.junkemailfilter.com/index.php/Spam_DNS_Lists HostKarma lists] also contain host names. A host name lookup would look like this:

 dig mydomain.com.hostkarma.junkemailfilter.com

The FCrDNS host name of the sender (which can't be spoofed) is looked up in the same database that is used for IP based lookups. The Junk Email Filter HostKarma DNS list supports name based lookups as well as IP based lookups. If the name lookup succeeds, the IP based tests need not be done. The name based tests are actually more powerful than IP based tests for white and yellow listed servers, whereas IP based lookups are better for black listed hosts.

For example, if the FCrDNS of the sending host resolves to yahoo.com then no other DNS tests need to be done. Yahoo is neither a certified spam domain nor a certified ham domain, and once the name returns yellow no other lookups are needed. The message can then go to content testing to figure out whether it is spam or ham. If you get a message from your bank, say Wells Fargo Bank, and the sending host is mail.wellsfargo.com, then it will look up as white listed and the message can be declared ham without any other testing.
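The name-first, IP-second order can be sketched as follows. How a host name such as mail.wellsfargo.com is reduced to the listed name is not spelled out on this page, so trying the full name and then each parent domain is an assumption, as are the function names and the `query` callback (which stands in for a DNS lookup returning a color or `None`).

```python
from typing import Callable, List, Optional

ZONE = "hostkarma.junkemailfilter.com"  # zone name from the dig example

def name_candidates(host: str) -> List[str]:
    """Full host name first, then each parent domain, stopping before the TLD."""
    labels = host.rstrip(".").lower().split(".")
    return [".".join(labels[i:]) for i in range(len(labels) - 1)]

def lookup_color(host: Optional[str], ip: str,
                 query: Callable[[str], Optional[str]]) -> Optional[str]:
    """Try name based lookups first; fall back to the IP only if none hit."""
    if host:  # host is the FCrDNS-verified name, or None if unverified
        for candidate in name_candidates(host):
            color = query(f"{candidate}.{ZONE}")
            if color is not None:
                return color  # name hit: the IP based test is skipped
    reversed_ip = ".".join(reversed(ip.split(".")))
    return query(f"{reversed_ip}.{ZONE}")
```

With this order, a verified Yahoo host returns yellow from the name lookup alone and never triggers an IP query.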

Using name based lookups to build IP based lookup lists

The white and yellow IP entries in the HostKarma lists are driven by the name based lookups. When a message is received, the sending host's name is looked up and verified with FCrDNS. The verified host name is then looked up in the name list, and if it is found the IP address is added to the IP list. Thus if the name is white listed, the IP address is sent to the DNS server and added as a white listed IP. That makes it available to the world for those admins who can only do IP based lookups. The advantage is that one does not need to know the IP addresses of all of Yahoo's servers: the lists are created dynamically as Yahoo sends out email.
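A rough sketch of that feedback loop, using plain dictionaries as stand-ins for the name database and the published IP list. The function name, the data structures, and the parent-domain matching are assumptions made for illustration.

```python
from typing import Dict, Optional

def record_sender(ip: str, fcrdns_host: Optional[str],
                  name_list: Dict[str, str],
                  ip_list: Dict[str, str]) -> Optional[str]:
    """If the verified host name is white or yellow listed, list its IP too."""
    if not fcrdns_host:
        return None  # unverified names must never drive the IP list
    labels = fcrdns_host.rstrip(".").lower().split(".")
    for i in range(len(labels) - 1):
        color = name_list.get(".".join(labels[i:]))
        if color in ("white", "yellow"):
            ip_list[ip] = color  # the IP inherits the color of the name
            return color
    return None
```

Each message from a new Yahoo server adds that server's IP to the yellow list without anyone having to enumerate Yahoo's address space by hand.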

Expanding the Project

If this system were implemented on a more massive scale it would be both more accurate and more comprehensive. Although black lists are more dynamic because spammers are constantly on the move, white and yellow lists are very stable. It would be fairly easy to accurately list 95% of the world's servers that should be either white or yellow listed. Doing so would greatly reduce false positives, drastically reduce IP lookups against black lists, and cut processing time for good email. I am therefore inviting the smart people in the spam filtering community to pick up on an idea that works great for us and make it massively bigger than it is.

Here's what needs to be done.

# Need to massively expand the list of names for white, black, and yellow hosts. With enough people participating this could be automated. Hosts that send only spam have their host names black listed. Hosts that send only ham have their host names white listed. Hosts that send a mixture have their host names yellow listed.

# Once we have a good name based list, a few big players could automate the IP lists: whenever a listed name is found, the IP associated with that name is automatically added to the IP lists. So whenever the name Yahoo.com is confirmed as yellow listed, the sending IP address is submitted to be yellow listed as well. With enough participants, all IPs for Yahoo servers would be listed, and the list would follow any IP changes Yahoo makes.
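The classification rule in the first item could be automated along these lines. This is only a sketch: the function name and the minimum message count before a host is classified are assumptions, and a real system would need to decide how much evidence is enough.

```python
from typing import Optional

def classify_host(spam_count: int, ham_count: int,
                  min_messages: int = 10) -> Optional[str]:
    """Classify a host name from the mail observed from it.

    min_messages is an assumed threshold to avoid judging on too little data.
    """
    if spam_count + ham_count < min_messages:
        return None      # not enough evidence yet
    if ham_count == 0:
        return "black"   # only spam seen
    if spam_count == 0:
        return "white"   # only ham seen
    return "yellow"      # a mixture of spam and ham
```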
