Design Considerations for a High Volume Mail Server

These notes, originated by Craig Mattson, are based on his experience supporting large mailing lists with up to a few million subscribers.

The purpose of this page is to discuss the issues involved in supporting large mailing lists and high-volume delivery in an industrial-strength mail server, in the hope that the functionality can be designed the right way from the get-go. As a side note, one of the requirements for Apache to begin using James as its mail server is efficient outbound email. Large mailing list support is a much harder problem, so if these design considerations can be resolved, efficient outbound email in general should be well supported.

Efficiency at all levels becomes important when you deal with very large mailing lists. Typically, the first problem you'll see is that sending a single message out to, let's say, a million people hits your machine pretty hard. Therefore the first three key pieces of functionality include:

Once you solve the raw delivery problem, the next thing that you notice is that deliveries to some domains are bouncing. Let's say you have 100,000 AOL subscribers on your list, and AOL's mail servers are down (this has been known to happen!). All of a sudden, you're dealing with bounces like crazy, and if your delivery machines aren't set up to handle it, you're dead. Therefore you must be able to:
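To illustrate the kind of handling the AOL scenario calls for, here is a minimal Java sketch of one possible approach: tracking consecutive failures per domain and pausing delivery to a domain with exponential backoff once it looks unreachable. The class name `DomainBackoff`, its methods, and the threshold and delay parameters are all hypothetical. None of this is taken from James or from the original notes.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical per-domain failure tracker (an illustration, not a James API). */
final class DomainBackoff {
    private static final class State {
        int consecutiveFailures;
        long retryAtMillis;
    }

    private final Map<String, State> states = new HashMap<>();
    private final int failureThreshold;   // failures tolerated before pausing
    private final long baseDelayMillis;   // first pause length (assumed value)
    private final long maxDelayMillis;    // upper bound on any pause

    DomainBackoff(int failureThreshold, long baseDelayMillis, long maxDelayMillis) {
        this.failureThreshold = failureThreshold;
        this.baseDelayMillis = baseDelayMillis;
        this.maxDelayMillis = maxDelayMillis;
    }

    private static String key(String domain) {
        return domain.toLowerCase(Locale.ROOT);
    }

    /** True if a delivery attempt to this domain is allowed at the given time. */
    synchronized boolean mayDeliver(String domain, long nowMillis) {
        State s = states.get(key(domain));
        return s == null || nowMillis >= s.retryAtMillis;
    }

    /** Records a failed attempt; at or above the threshold the pause doubles each time. */
    synchronized void recordFailure(String domain, long nowMillis) {
        State s = states.computeIfAbsent(key(domain), d -> new State());
        s.consecutiveFailures++;
        if (s.consecutiveFailures >= failureThreshold) {
            // Cap the shift so the multiplication cannot overflow for sane base delays.
            int doublings = Math.min(s.consecutiveFailures - failureThreshold, 20);
            long delay = Math.min(baseDelayMillis << doublings, maxDelayMillis);
            s.retryAtMillis = nowMillis + delay;
        }
    }

    /** A successful delivery clears the domain's failure history. */
    synchronized void recordSuccess(String domain) {
        states.remove(key(domain));
    }
}
```

A delivery thread would call `mayDeliver` before opening a connection and leave the message queued if it returns false, so that 100,000 messages for one dead domain do not each trigger their own failed connection attempt.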

It's sad but true that many SMTP servers out there don't conform properly to all of the standards. So you have to be able to account for all kinds of weird things (such as a server being unable to accept multiple recipients for a single message). Therefore, it is best right from the start to be able to:
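As an example of accommodating one such quirk, the following Java sketch splits the recipients for a destination domain into batches that respect a per-domain recipient limit, so a server that accepts only one recipient per message gets one message per recipient. The class `RecipientBatcher`, its methods, and the example limits are hypothetical and not taken from James.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical per-domain recipient batching (an illustration, not a James API). */
final class RecipientBatcher {
    private final Map<String, Integer> maxRecipientsByDomain = new HashMap<>();
    private final int defaultMaxRecipients;

    RecipientBatcher(int defaultMaxRecipients) {
        if (defaultMaxRecipients < 1) {
            throw new IllegalArgumentException("limit must be at least 1");
        }
        this.defaultMaxRecipients = defaultMaxRecipients;
    }

    /** Overrides the recipient limit for a domain known to misbehave. */
    void setLimit(String domain, int maxRecipients) {
        if (maxRecipients < 1) {
            throw new IllegalArgumentException("limit must be at least 1");
        }
        maxRecipientsByDomain.put(domain.toLowerCase(Locale.ROOT), maxRecipients);
    }

    /** Splits recipients into lists no longer than the domain's limit, preserving order. */
    List<List<String>> batches(String domain, List<String> recipients) {
        int limit = maxRecipientsByDomain.getOrDefault(
                domain.toLowerCase(Locale.ROOT), defaultMaxRecipients);
        List<List<String>> result = new ArrayList<>();
        for (int i = 0; i < recipients.size(); i += limit) {
            int end = Math.min(i + limit, recipients.size());
            result.add(new ArrayList<>(recipients.subList(i, end)));
        }
        return result;
    }
}
```

Each returned batch would correspond to one SMTP transaction. Keeping the limits in a per-domain table means a newly discovered quirk can be handled through configuration instead of a code change.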

To monitor what's going on so that you can configure these elements properly, you must:

Next is a set of features that corporate users will start requesting when they begin using your mailing list manager as a CRM (Customer Relationship Management) tool. Invariably, they will want to store their subscribers in a relational database. They will have much more information on each subscriber than just an email address, and they will want to be able to use that information to (1) select "target groups" of subscribers to receive mailings, and (2) personalize mail messages. On the personalization point in particular, many personalizations will be sophisticated enough that you will want to actually embed some kind of code in the email message, to be processed "just-in-time" (that is, right before delivery), where "variables" in this code are replaced with information that comes out of the subscriber database. This would be similar to JSP on the web, except for email. Logging delivery histories and mailing list manager (MLM) requests in this environment becomes important.
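The simplest form of just-in-time personalization is plain variable substitution, sketched below in Java. The `${name}` placeholder syntax, the `MailMerge` class, and the use of a `Map` to stand in for a subscriber's database row are all assumptions made for illustration. The embedded code described above would need a richer template language than this.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical mail-merge helper (an illustration, not a James API). */
final class MailMerge {
    // Matches placeholders such as ${first_name}; the syntax is an assumption.
    private static final Pattern PLACEHOLDER =
            Pattern.compile("\\$\\{([A-Za-z0-9_]+)\\}");

    private MailMerge() {
    }

    /**
     * Replaces each ${key} in the template with the subscriber's value for key.
     * Placeholders with no matching value are left in the output unchanged.
     */
    static String personalize(String template, Map<String, String> subscriber) {
        Matcher m = PLACEHOLDER.matcher(template);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = subscriber.get(m.group(1));
            String replacement = (value != null) ? value : m.group(0);
            // quoteReplacement keeps '$' and '\' in the data from being
            // interpreted as group references by appendReplacement.
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

With a subscriber map containing `first_name` mapped to `Ada`, calling `MailMerge.personalize("Dear ${first_name},", subscriber)` returns `Dear Ada,`. Running the substitution per recipient right before delivery means the server stores one template per mailing and does not need to keep a million rendered copies.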

There is more to say, but this is getting pretty extensive already. A product with this level of functionality is sorely missing in the open source community. James is an interesting possibility, because its inherently multithreaded architecture could be used to deal well with many of these problems.

JamesV3/HighVolume (last edited 2009-09-20 22:58:33 by localhost)