Experimental and Theoretical ways to get rid of Spam

This page is for developing ideas and projects that move SpamAssassin and related spam-fighting projects forward. The aim is to write up cutting-edge spam-fighting ideas and projects that are still in development.

New Black/White/Yellow List Technologies

This article describes experimental technologies used at [http://www.junkemailfilter.com Junk Email Filter]. I (Marc Perkel) am writing this in the hope that other people will pick up on these ideas and improve them. Although the system is working very well for us so far, it can be improved and expanded, and I hope this will inspire others to make it far better than what we have developed. Information about our Host Karma lists can be found [http://wiki.junkemailfilter.com/index.php/Spam_DNS_Lists here].

We are all familiar with black lists. A black list is a DNS list in which the IP addresses of spammers are looked up. You send a request to the list with the IP, and if the IP is in the blacklist you bounce the email. It sounds simple enough, except for the false positives.

White lists help reduce false positives. If an IP address is white listed then you can just pass it and not look at black lists.

Yellow lists are lists of hosts that send a mix of spam and non-spam. Email services like Yahoo, Gmail, and Hotmail are examples of mixed source servers. These servers should never be either black or white listed. These are servers where the IP address yields no useful information as to whether the message is spam or ham.

Multi-Color List Processing Logic

When looking at lists from a black/white/yellow perspective, there is an order to the list processing. First the yellow lists are tested. If the message is yellow listed, checking black and white lists isn't necessary. Then the white lists are checked. If the message is white listed, it can be passed without running it through SA (SpamAssassin), or SA can short-circuit its tests and declare the message ham. Finally the black lists are checked, and if the sender is in a trusted blacklist or in several less trusted blacklists, the message can be rejected.
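The processing order above can be sketched as a small decision function. This is only an illustration: the `Verdict` names, the `classify` function, and the threshold of two less trusted blacklists are assumptions made for the example, not part of HostKarma or SpamAssassin.

```python
from enum import Enum


class Verdict(Enum):
    CONTENT_FILTER = "content_filter"  # hand the message to content tests (e.g. SA)
    ACCEPT = "accept"                  # treat as ham, skip or short-circuit tests
    REJECT = "reject"                  # refuse the message


def classify(is_yellow, is_white, trusted_black_hits, untrusted_black_hits,
             untrusted_threshold=2):
    """Apply the yellow -> white -> black order described above.

    untrusted_threshold is an assumed value for "several less trusted
    blacklists"; tune it for your own set of lists.
    """
    if is_yellow:
        # Mixed source: the IP says nothing useful, so skip black and white lists.
        return Verdict.CONTENT_FILTER
    if is_white:
        return Verdict.ACCEPT
    if trusted_black_hits >= 1 or untrusted_black_hits >= untrusted_threshold:
        return Verdict.REJECT
    # Not listed anywhere: fall back to ordinary content filtering.
    return Verdict.CONTENT_FILTER
```

Checking yellow first means a mixed-source server such as a large webmail provider can never be rejected or passed on its IP alone, even if some other list also happens to contain it.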

Reducing Lookups by using multiple return codes to indicate the result

Many DNS lists return a code to indicate yes and nothing to indicate no. This was fine when only black lists were looked up, but if you are looking up multiple states you would have to make a separate DNS call for each one:

 yellowlist.junkemailfilter.com
 whitelist.junkemailfilter.com
 blacklist.junkemailfilter.com

But why do three lookups when you can do one? That's the way the HostKarma list works. It returns a different value to indicate black/white/yellow.

 127.0.0.1 white
 127.0.0.2 black
 127.0.0.3 yellow
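A minimal sketch of a single lookup that yields all three states might look like the following. It assumes the usual DNS list convention of querying the reversed IPv4 octets under the list zone, and it uses the `hostkarma.junkemailfilter.com` zone that appears in the host name example on this page. The function names are hypothetical, and `lookup_ip` needs a working resolver.

```python
import ipaddress
import socket

HOSTKARMA_ZONE = "hostkarma.junkemailfilter.com"

# Return codes as listed above.
RETURN_CODES = {
    "127.0.0.1": "white",
    "127.0.0.2": "black",
    "127.0.0.3": "yellow",
}


def dnsbl_query_name(ip, zone=HOSTKARMA_ZONE):
    """Build the query name: reversed IPv4 octets followed by the list zone."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone


def decode_answer(answer):
    """Map the A record returned by the list to a color."""
    if answer is None:
        return "unlisted"
    return RETURN_CODES.get(answer, "unknown")


def lookup_ip(ip, zone=HOSTKARMA_ZONE):
    """One DNS query instead of three; a failed lookup means 'unlisted'."""
    try:
        answer = socket.gethostbyname(dnsbl_query_name(ip, zone))
    except socket.gaierror:
        answer = None
    return decode_answer(answer)
```

Note that `socket.gethostbyname` returns only one address, so this sketch would miss additional codes if a list returned several A records for the same query.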

Forward Confirmed rDNS (FCrDNS)

Forward Confirmed reverse DNS is an important concept for the ideas I'm about to introduce below. One of the ways to separate spam from ham is to find things that spammers can't spoof. One of those things is forward confirmed rDNS.

Reverse DNS is straightforward. An IP has a PTR record, so when you look up the IP it returns a name associated with that IP. Unfortunately a spammer can put any name they want in a PTR record. But when a faked name is looked up, it points somewhere else rather than back to the spammer's IP.

Forward Confirmed rDNS means that when the name returned by a PTR lookup is checked it will point back to the original IP address that was looked up.

 1.2.3.4 -> mail.mydomain.com
 mail.mydomain.com -> 1.2.3.4

Spammers can't spoof this because, even though the PTR record can be faked, the spammer can't create an A record that points back to the original IP: they don't have control over the faked domain.
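The two-step check can be sketched with Python's standard `socket` module. The function name `fcrdns_name` and the injectable `reverse` and `forward` parameters are assumptions for illustration; they let the logic be exercised without live DNS.

```python
import socket


def _reverse(ip):
    """PTR lookup: return the name for an IP, or None if there is none."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except (socket.herror, socket.gaierror):
        return None


def _forward(name):
    """A lookup: return the list of IPv4 addresses for a name."""
    try:
        return socket.gethostbyname_ex(name)[2]
    except socket.gaierror:
        return []


def fcrdns_name(ip, reverse=None, forward=None):
    """Return the forward-confirmed host name for ip, or None.

    The name is confirmed only if looking it up leads back to the
    original IP address.
    """
    reverse = reverse or _reverse
    forward = forward or _forward
    name = reverse(ip)
    if name is None:
        return None
    if ip in forward(name):
        return name
    return None
```

For example, with the mapping shown above, `fcrdns_name("1.2.3.4")` would return `mail.mydomain.com`, while an IP whose PTR record names a domain the sender does not control would return `None`.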

FCrDNS is very reliable for detecting white and yellow domains. It can also be used for black domains, but spammers generally aren't that stable. So it is more useful for actively detecting ham and avoiding false positives than for detecting spam.

Is your host name DNS set up correctly? Here's a [http://ipadmin.junkemailfilter.com/rdns.php FCrDNS checking tool] to test it.

Host Name based Lists based on FCrDNS host names

In addition to IP based black/white/yellow lists, the [http://wiki.junkemailfilter.com/index.php/Spam_DNS_Lists HostKarma lists] also contain host names. A host name lookup would look like this:

 dig mydomain.com.hostkarma.junkemailfilter.com

The FCrDNS of the sending host name (which can't be spoofed) is looked up in the same database that is used for IP based lookups. The Junk Email Filter HostKarma DNS list supports name based lookups as well as IP based lookups. If the name lookup succeeds, the IP based tests need not be done. Name based tests are actually more powerful than IP based tests for white and yellow listed servers, whereas IP based lookups are better for black listed hosts.

For example, if the FCrDNS of the sending host resolves to yahoo.com, then no other DNS tests need to be done. Yahoo is neither a certified spam domain nor a certified ham domain, and once the name returns yellow no other lookups are needed. The message can then go to content testing to figure out whether it is spam or ham. If you get a message from your bank, like Wells Fargo Bank, and you see that the sending host is mail.wellsfargo.com, then it will look up as white listed and the message can be declared ham without any other testing.
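The name-first, IP-second order can be sketched as follows. The `query` callable is a hypothetical stand-in for a DNS lookup that returns a color or `None`. Trying the parent domains of the host name (so that `mail.wellsfargo.com` also matches an entry for `wellsfargo.com`) is an assumption about how a client might query the list, not documented HostKarma behavior.

```python
ZONE = "hostkarma.junkemailfilter.com"


def lookup_color(fcrdns_host, ip, query, zone=ZONE):
    """Try the forward-confirmed host name first, then fall back to the IP.

    query(name) is assumed to return "white", "black", "yellow" or None.
    """
    if fcrdns_host:
        labels = fcrdns_host.lower().rstrip(".").split(".")
        # Try the full name, then each parent domain, but never the bare TLD.
        for i in range(len(labels) - 1):
            color = query(".".join(labels[i:]) + "." + zone)
            if color:
                return color  # name matched: the IP based test is skipped
    reversed_ip = ".".join(reversed(ip.split(".")))
    return query(reversed_ip + "." + zone)
```

Passing `None` as `fcrdns_host` (no confirmed name) goes straight to the IP based lookup, which matches the point above that IP lookups remain the better tool for black listed hosts.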

Using name based lookups to build IP based lookup lists

The IP entries in the HostKarma white and yellow lists are driven by the name based lookups. A message is received and the sending host is looked up and verified. The host name is then looked up, and if the name is found the IP address is added to the list. Thus if the name is white listed, the IP address is sent to the DNS server and added as a white listed IP. That makes it available to the world for those admins who can only do IP based lookups. The advantage is that one does not need to know the IP addresses of all of Yahoo's servers. These lists can be created dynamically as Yahoo sends out email.
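A rough sketch of that feedback loop is shown below. The dictionaries stand in for the real name and IP databases, and the function names are hypothetical. Matching on parent domains is an assumption made for the example.

```python
def listed_color(host, name_colors):
    """Return the color of host or of its closest listed parent domain."""
    labels = host.lower().rstrip(".").split(".")
    for i in range(len(labels) - 1):  # never test the bare TLD
        color = name_colors.get(".".join(labels[i:]))
        if color is not None:
            return color
    return None


def record_sender(ip, fcrdns_host, name_colors, ip_colors):
    """Add ip to the IP list when its confirmed name is white or yellow.

    fcrdns_host must already be forward confirmed; pass None if it is not.
    Returns the color that was recorded, or None if nothing was added.
    """
    if fcrdns_host is None:
        return None
    color = listed_color(fcrdns_host, name_colors)
    if color in ("white", "yellow"):
        ip_colors[ip] = color
        return color
    return None
```

Only white and yellow names feed the IP list here, following the description above. Black listed IPs would come from other evidence, since spammers rarely have stable confirmed names.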

Expanding the Project

If this system were implemented on a more massive scale, it would be both more accurate and more comprehensive. Although black lists are more dynamic because spammers are constantly on the move, white and yellow lists are very stable. It would be fairly easy to accurately list 95% of the world's servers that should be either white or yellow listed. If this were done, it would greatly reduce false positives, drastically reduce IP lookups to black lists, and cut processing time for good email. I am therefore inviting the smart people in the spam filtering community to pick up on an idea that works great for us and make it massively bigger than it is.

Here's what needs to be done.

  1. We need to massively expand the list of names for white, black, and yellow hosts. With enough people participating this could be automated. Hosts that send only spam have their host names black listed. Hosts that send only ham have their host names white listed. Hosts that send a mixture have their host names yellow listed.
  2. Once we have a good name based list, a few big players can automate the next step: whenever a listed name is found, the IP associated with that name is automatically added to the IP lists. So whenever the name Yahoo.com is confirmed as yellow listed, the IP address is sent in to be yellow listed as well. With enough participants, all IPs for Yahoo servers would be listed, and the list would follow any IP changes Yahoo makes.

Other Kinds of DNS Lists

There are a number of other DNS lists that would be useful for fighting spam besides black/white/yellow lists. These DNS lists can be black lists for content strings that work the same way as URI blacklists. Or they can be lists that classify hosts by some characteristic that is significant in detecting spam.

Registrar Barrier List

We at Junk Email Filter host a [http://wiki.junkemailfilter.com/index.php/Registrar_Barrier_DNS_List Registrar Barrier List]. This list returns a code to indicate where the registrar barrier is, so you can separate the domain part out of the host name.

 dig example.com.rb.junkemailfilter.com   - returns 127.0.0.1
 dig example.co.uk.rb.junkemailfilter.com - returns 127.0.0.2
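Given such a code, splitting a host name at the barrier could be sketched as below. The interpretation that the last octet of the answer is the number of labels on the registrar side (1 for `com`, 2 for `co.uk`) is an assumption inferred from the two examples above, and the function names are hypothetical.

```python
def barrier_from_answer(answer):
    """Assumed decoding: the last octet is the number of registrar labels."""
    return int(answer.rsplit(".", 1)[1])


def split_at_barrier(host, barrier_labels):
    """Split host into (host part, registered domain).

    barrier_labels is the number of labels that belong to the registrar
    suffix, e.g. 1 for "com" or 2 for "co.uk".
    """
    labels = host.lower().rstrip(".").split(".")
    if len(labels) <= barrier_labels:
        raise ValueError("host name has no label in front of the barrier")
    keep = barrier_labels + 1  # the suffix plus the one registered label
    return ".".join(labels[:-keep]), ".".join(labels[-keep:])
```

Under that assumption, `split_at_barrier("mail.example.co.uk", barrier_from_answer("127.0.0.2"))` separates the host part `mail` from the domain `example.co.uk`.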

Freemail Domain List

This list would be a lookup to determine whether the sending host is a free email provider like Yahoo, Gmail, or Hotmail. This information is often useful in deciding how to process messages. For example, phishers will often use a From address that differs from the Reply-To address, because the From address is often shut down while the Reply-To address still works. When the From address and the Reply-To address don't match and both addresses are from freemail providers, it's almost always a phishing scam. I intend to post a freemail DNS list soon.
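The From/Reply-To heuristic can be sketched as follows. The hard-coded `FREEMAIL_DOMAINS` set is a placeholder assumption standing in for the proposed freemail DNS list, and the function name is hypothetical.

```python
from email.utils import parseaddr

# Placeholder for the proposed freemail DNS list (assumed sample entries).
FREEMAIL_DOMAINS = {"yahoo.com", "gmail.com", "hotmail.com"}


def _domain(address):
    """Return the lowercased domain part of an address, or '' if none."""
    if "@" not in address:
        return ""
    return address.rpartition("@")[2].lower()


def looks_like_freemail_phish(from_header, reply_to_header,
                              freemail=FREEMAIL_DOMAINS):
    """True when From and Reply-To differ and both are freemail addresses."""
    from_addr = parseaddr(from_header)[1].lower()
    reply_addr = parseaddr(reply_to_header)[1].lower()
    if not from_addr or not reply_addr:
        return False  # nothing to compare
    if from_addr == reply_addr:
        return False
    return _domain(from_addr) in freemail and _domain(reply_addr) in freemail
```

A result of `True` is best treated as one strong signal to feed into scoring rather than as a verdict on its own.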

Anti Spam Educational Video for End Users

This article by Marc Perkel.

Spam is about money. It is driven by people taking advantage of those who, for some reason, are fooled by spam. People who understand that all spam is fraud never fall for it. When I worked for the [http://www.eff.org Electronic Frontier Foundation], I remember a staff meeting where we talked about a spammer who went to jail for fraud related to selling penis enlargement products through spam. This guy had made millions by the time he went to jail. I joked that maybe I was on the wrong side of the spam wars. I also sometimes watch the TV show Judge Judy, where someone gets a spam, gets a check in the mail, and sends back the money before the check bounces.

People really do fall for this stuff and if these spammers weren't making money they wouldn't do it. The problem is driven by consumer ignorance. So how do we fix that? We educate the consumer.

I suggest the production of many YouTube grade videos in many languages that educate end users about the dangers of spam. One might start with a picture of Africa. The narrator says, "This is Africa. Nobody here is going to transfer 56 million dollars into your account." Then they put up a map of the UK and say, "This is the United Kingdom. There is no UK Lottery. And you have not won."

The videos would be fairly short, under 10 minutes. Once they are produced, we can encourage ISPs to require users to watch a video as part of signing up for a new account. The more people are educated about spam scams, the less profitable spam becomes. And when spam is less profitable there will be less spam.

I'm looking for as many people as possible to just do it and post the videos. Maybe we can make a contest out of it, and some of us in the spam filtering business can kick in some bucks (or euros) for a prize for the winners? I think a good set of educational videos about how to not get ripped off could lead to a measurable reduction in spam.
