CeasNotesJustin

Writing from CEAS with quick notes on each talk. Proceedings are at http://www.ceas.cc/papers-2004/papersbytopic.htm .

All ports except 80 and 443 are blocked! Very annoying

Chung-Kwei:

Teiresias is IBM's pattern-discovery tool from bioinformatics
looks *directly* transferable to SpamAssassin's "regexp rules" approach
probably heavily patented and hard to license though
but a Google search for "pattern discovery algorithm" looks like a promising source

Social network talk:

pretty useless, spam-filtering-wise at least; no spam orientation at all

Joshua Goodman Received talk:

talking about parsing Received lines
basically reimplementing spamcop algorithm
looking for "last external IP address"
thinks this will be useful for SenderID
SenderID example uses HELO data, looks like, instead of PRA or SMTP MAIL FROM; due to multiple intervening hops
try to use heuristics to find last external IP address:
- using MX data fails due to load-balancing edge router
- also the msn.com/hotmail.com problem
proposed algo:
- IP addr is in 192.168.x.x (private range, so internal)
- HELO matches user's domain and forward DNS lookup of HELO matches IP address
- find an IP that matches MX record, next is external
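
One possible reading of the three heuristics above, as a rough sketch rather than what was actually presented. `Hop`, `is_internal`, `last_external_ip` and the `resolve` callback are all made-up names; hops are assumed to be already parsed out of the Received lines and ordered newest first; and the 10/8 and 172.16/12 ranges are an addition here (the talk notes only mention 192.168).

```python
import ipaddress
from typing import Callable, List, NamedTuple, Optional, Set

# RFC 1918 private ranges. Only 192.168 is in the notes; the other two
# are an assumption made for this sketch.
PRIVATE_NETS = [ipaddress.ip_network(n)
                for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")]


class Hop(NamedTuple):
    ip: str    # client IP recorded in one Received line
    helo: str  # HELO/EHLO name recorded in the same line


def is_internal(hop: Hop, user_domain: str,
                resolve: Callable[[str], Set[str]]) -> bool:
    """Heuristics 1 and 2: private IP, or a HELO in the user's domain
    whose forward DNS lookup includes the connecting IP."""
    addr = ipaddress.ip_address(hop.ip)
    if any(addr in net for net in PRIVATE_NETS):
        return True
    helo = hop.helo.lower().rstrip(".")
    in_domain = helo == user_domain or helo.endswith("." + user_domain)
    return in_domain and hop.ip in resolve(helo)


def last_external_ip(hops: List[Hop], user_domain: str, mx_ips: Set[str],
                     resolve: Callable[[str], Set[str]]) -> Optional[str]:
    """Walk the hops newest-first and return the first IP judged external."""
    for i, hop in enumerate(hops):
        if hop.ip in mx_ips:
            # Heuristic 3: this hop is the MX, so the next one is external.
            return hops[i + 1].ip if i + 1 < len(hops) else None
        if not is_internal(hop, user_domain, resolve):
            return hop.ip
    return None
```

The DNS lookup is passed in as `resolve` so the sketch can be tried against canned data; a real filter would plug in an actual resolver and would still hit the load-balancing and msn.com/hotmail.com problems noted above.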
Bob Atk suggested putting external IP addrs in a DNS record?!
interesting that they'd never checked SpamAssassin or Spamcop's algorithms, but that's MS for you

Brett Watson: beyond identity: problems even with sender id

economics of whitelisting/blacklisting based on a reliable sender identification (ie. forging is no longer possible)
mostly a philosophical talk

Multiple email addresses:

about 50% of surveyed users had multiple email addresses
"identities"; separation of work, personal, social groups; pseudo-anonymity; affiliation, status, prestige (alumni accts)
mobility (available on the road)
people now frequently have different "role" accounts
typically once people go over 3 accts, they set them up to forward to a smaller number
20-30% of all email addrs change annually
this talk is really oriented towards MUA UI developers
another talk with not a whole lot of antispam relevance

Panel discussion of monetary spam filtering:

Cynthia Dwork's talk:
- 16 seconds per message computation time doubles spam cost
- 56 seconds per message computation time means $36 per message for spammers
- cycle theft arguments (zombies are illegal; spyware can be combatted with user education) *already don't work* in the real world
MailFrontier:
- some kind of marketroid noise about how they're "third generation" because they have grey areas, or something; combination of multiple tests means "definitely spam, no false positives". riiight
- "Reverse Turing Test": C-R as usual, with pictures of puppies
- except the C-R page has some kind of plugin which will burn CPU cycles instead, woo
The naysayer:
- http://www.cl.cam.ac.uk/~rnc1/
- going rate to solve puzzles is about $.11/hr in South India
- Real Money systems: people will regulate it; EU Directive on E-Money (2000/46/EC)
- people will walk away with 2.5% of it (cost of running + greed)
- people will steal it (e.g. sysadmin skimming x% of incoming mails and stealing their tokens)
- Payment systems: settlement: see taugh.com
- also compares with the telco system (~1200mill ham mails/day, ~2000mill phone calls per day) – much fewer calls on telco system, most local, diff trust model
- how much payment:
  - 30 responses per mill: .1c/mail means $33 per sale to be viable
  - if .05c/mail, $16
  - at a 0.7% response rate, $33 profit means 23c/mail
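
The "how much payment" figures above can be checked with a few lines of arithmetic. This is just a sketch to show where the numbers come from; the function names are invented here, and the only inputs are the per-mail charge, the response rate and the per-sale figure from the notes.

```python
def breakeven_per_sale_dollars(cost_per_mail_cents: float,
                               responses_per_million: float) -> float:
    """Dollars each sale must bring in to cover the cost of a
    one-million-message run at the given per-mail charge."""
    total_cost_dollars = cost_per_mail_cents * 1_000_000 / 100
    return total_cost_dollars / responses_per_million


def affordable_cents_per_mail(profit_per_sale_dollars: float,
                              response_rate: float) -> float:
    """Per-mail charge (in cents) a sender can bear at a given
    response rate and profit per sale."""
    return profit_per_sale_dollars * response_rate * 100


print(round(breakeven_per_sale_dollars(0.1, 30), 2))   # about 33.33
print(round(breakeven_per_sale_dollars(0.05, 30), 2))  # about 16.67
print(round(affordable_cents_per_mail(33, 0.007), 1))  # about 23.1
```

So a charge of a tenth of a cent only hurts senders with very low response rates; anyone getting 0.7% could absorb a charge a couple of hundred times higher.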
questions:
- to Ironport: "why can't I nominate a charity?" to avoid interested parties
- Dan Kohn to Ironport: "how much bonds debited?" not very much
- question from an Indian querier: "any documented cases of South Indian kids clicking on CAPTCHAs?" MailFrontier guy, naturally, says "nope". In reality, the answer is "yes", but that was in Thailand
- Yahoo! guy on CAPTCHAs: "seen everything: porn sites, people paid to type them; sites in Russia with full pages of CAPTCHAs, 10 hour turnaround after a new fix is deployed"
- Vanquish guy says they use CMU's CAPTCHA code
- question on CPU time stamp inflation: Cynthia Dwork says "memory cycles much more stable over time"
- Daniel: annoyed about senders having to "prove they are real" when they're doing the recipient a favour: MailFrontier guy: "we just want the problem to go away"
- Dave Crocker: "why didn't anyone on the panel take any notice of the naysayer's presentation and its points?"
- panel: "but we have only 5 minutes!"
- Vanquish guy: "he doesn't understand how PKI works" (!!!) then some advertising for Vanquish (again)
- Ironport: "Bonded Sender is working right now"
- MailFrontier guy: "mostly agreed with his presentation, but we'll do whatever works (titters from audience); C-R is an atomic bomb against spam, but with some collateral damage against ham, but it can be turned off"
- naysayer on pay-to-send: "not only is my machine insecure, my email is insecure, but I don't want my *money* to be insecure" (applause)
- panel mod: there will be coevolution between attacker and defender, a lesson from the Cold War

MailFrontier presentation: anatomy of a phishing email

Bank of America sends email from bankofamerica1.com, Sony from sonystar.com; this screws up the notion of a trusted domain name
the MSIE %00 vulnerability
high-numbered ports mean that websites can be run unnoticed, even if an HTTP server is already running
the fake address-bar window trick
fraudulent pop-ups over real site: user goes to fraud site, which creates a popup and then goes to the real site: pop-ups are a phishing risk (yay!)
"your submitted information will be verified by eBay staff within 24 hours"; buys more time
A survey, based on results from over 83,450 respondents (subset of total responses), in diagnosing which sites were frauds and which were real:
- 26.7% got everything wrong
- only 13.8% of respondents got all correct
da.ru is a frequent hosting site for phishing scams
hasn't looked at the ActiveX malware on the phishing sites, for some reason!
Consumer Reports sends from some domain called "d1sub.com"; Fortune 500's should really improve their practices
q: "are we getting to a stage where we won't be able to tell phish from ham?" a from audience: "use pine"
q: "why haven't the arrests of phishers been publicised better?" suggests including some support in web browser for a "trusted logos" area on-screen, for certifications
Dave Crocker: don't map to domain names, "domain names are not good enough, they do *not* map to trademarks".

Geoff Hulten, MS: Trends in Spam Products and Exploits

corpus analysis, from Hotmail's feedback loop
- volunteers classify random samples of their mail as spam or good; tens of thousands of hand-classified messages per day; large "unbiased" (???) sample of spam
additional analysis on two sets of spam:
- about a year between the two
- products sold, exploits used, trends
viagra types: 17% 2003, to 34% 2004
graphic porn down: 13% to 7%
exploits: increasing rapidly, 1.33 exploits per spam in 2003 to 1.73 in 2004
word obscuring: up to 20% in 2004
URL chaffing, adding good URLs to spam: not there in 2003, 10% in 2004 – anti-SURBL attack
Spammers are putting more work into each spam

Introducing the Enron Corpus:

1.3 million messages originally; removed msgs with "integrity problems", replaced usernames etc
http://www-2.cs.cmu.edu/~enron
200,399 useful, non-dupe messages
158 users, 1,268 msgs/user
missing message headers, so not much use for spam filtering; Exchange-mangled; no HTML. still, maybe good for "body" rules and FP avoidance
no mention how much of the corpus was spam

Larry Lessig:

extraordinary amount going to tech fixes; very little going to how the law could address it
compares govt attention to "pirate radio" creating static for large commercial stations, vs the spam problem
multiple types of regulators: the law, social norms, the market, and architecture (example: windows in lecture theatre are closed to enforce paying attention to speakers)
the law also regulates the other three
(that was the wrong talk! starts again!)
1. "regulation is always multiple modalities"
2. "interests will react"
3. "special interests defeat general interests"
in the old days, we had norms to defeat spam; that failed
using code to fix; so far that's failed
"the market will fix the problem"; ISPs trying to be the spam-free email provider; that's also failed
CAN-SPAM: totally failed – even displaced effective state legislation
no single modality alone can fix it
regulation is a restriction, plus somebody to enforce it
CAN-SPAM: wanted truthful headers
opt-out doesn't provide any way for you to know if you've really been opted-out
enforcement: state AGs, ISPs, federal - centralised; too big though. they have better things to do with their time than bust spammers
solution: marries legal/architectural/market
legal: has two parts: (1) labels ("ADV" in the subject line)
(2) a bounty
(q: SEXUALLY-EXPLICIT tag is a label, already massively flouted by spammers. other labels would be flouted just as much.)
architecture: filter code then blocks mails with "ADV"
market: spammers would then have to incentivise people to receive their mail by sending offers they want (yeah right)
enforcement: spam will only be sent if you can be paid, so "follow the money" – part of CAN-SPAM states "the business that benefits is responsible"
market in enforcement: bounty hunters who identify label-less spam (ah). amateurs, not law enforcement, large population
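
The "architecture" step of the proposal (filter code blocking mail labelled "ADV") might look something like the sketch below. It assumes the label is a case-insensitive "ADV" prefix at the start of the Subject header, which the notes don't spell out; `is_labelled_advert` is a made-up name.

```python
import re
from email import message_from_string


def is_labelled_advert(raw_message: str, label: str = "ADV") -> bool:
    """True if the Subject header starts with the advertising label.

    The trailing word boundary stops ordinary words that merely begin
    with the label (e.g. "Adventure") from matching.
    """
    subject = message_from_string(raw_message)["Subject"] or ""
    pattern = r"\s*" + re.escape(label) + r"\b"
    return re.match(pattern, subject, re.IGNORECASE) is not None


print(is_labelled_advert("Subject: ADV: cheap pills\n\nbuy now"))
print(is_labelled_advert("Subject: Adventure holiday photos\n\nhi"))
```

The filter itself is trivial, which is the point of the proposal: all the hard work is pushed onto the legal and enforcement side, i.e. getting senders to apply the label in the first place.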
during CAN-SPAM development: labels were undesirable. Reason: "labels are too effective", because e.g. Amazon would have to have labelled their ads (because there was no distinction between opt-in and opt-out) and would be filtered
fundamental problem: corruption due to vested interests lobbying (cf CAN-SPAM)
sees difficulties in differentiating
q: tracing spam to the business that benefits often involves getting forwarding addresses from e.g. a CGI script running on a server in the Ukraine. *needs* law-enforcement power to get that IMO. a: "yes, and law-enforcement power is available, and jurisdiction problems are easy" (not sure about that! at least for the non-LE bounty-hunter case)
q: opt-in would have fixed it, like it has in Australia; but DMA keeps emasculating the laws into YOU-CAN-SPAM. a: agrees that there are multiple answers, but prefers not requiring opt-in across the board and uses the UCE definition as it allows political speech without adding to their costs. (I disagree, personally; the "UBE" definition works for me --jm)
Jon Praed: enforcement requires tremendous resources, and in some cases you've got to get to that IP address within 7 days to get those logs, with LE power. This is not easy. Notes that spammer margins are incredibly low, and those bounties as a result would be small and/or hard to get.
JP again: also suggests labels to label "good" commercial mail, personal mail, and then leave over "unknown" mail – which is then suspect. also suggests that the *headers* are the labelling, in reality.
q: "special interests always seem to wipe out general interest on this issue in laws. what can we do, law-wise?" "my brand is pessimism", "there was this moment, when they passed CAN-SPAM, when legislators were keen to fix it – then the special interests came in".
observation from audience: notes the parallel with the UK and pirate radio in the late 60's; the UK also passed a McCain-style anti-advertiser provision to deal with it.
Dave Crocker: believes that the suggestion would result in little real effect on spammers, and quite a heavy hit on legit businesses
