whateverblog.
IMAP server progress
Friday, February 07, 2003 12:27 PM
I found out today that there are users out there who have hundreds of thousands of e-mails--perhaps even millions?--stored in a single IMAP account.

What the heck!

If I want to support those users it looks like I'll have to, at the very least, hold metadata in memory only for selected mailboxes. If a single mailbox has 1,000,000 records, a (very) rough estimate would be 40MB of RAM (about 40 bytes per message) simply to hold the e-mail metadata in memory. Luckily, if ten users all have the same mailbox open, it only counts once, not ten times.
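To make the "counts once, not ten times" idea concrete, here is a minimal sketch of a reference-counted cache: metadata is loaded the first time a mailbox is selected, shared by every later session, and dropped when the last session closes it. The class name, the loader callback, and the long[] stand-in for real per-message records are all hypothetical, and the 40-bytes-per-message constant is only the rough estimate above, not a measurement.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical sketch: one in-memory metadata copy per open mailbox, shared by all sessions. */
final class SharedMetadataCache {
    /** Assumption: taken from the rough 40MB per 1,000,000 messages estimate. */
    static final long ASSUMED_BYTES_PER_MESSAGE = 40;

    private static final class Entry {
        final long[] metadata; // stand-in for real per-message records
        int openCount;         // how many sessions currently have this mailbox selected

        Entry(long[] metadata) {
            this.metadata = metadata;
        }
    }

    private final Map<String, Entry> open = new HashMap<>();
    private final Function<String, long[]> loader;

    SharedMetadataCache(Function<String, long[]> loader) {
        this.loader = loader;
    }

    /** Called when a session selects a mailbox: load on first use, share afterwards. */
    synchronized long[] select(String mailbox) {
        Entry entry = open.get(mailbox);
        if (entry == null) {
            entry = new Entry(loader.apply(mailbox));
            open.put(mailbox, entry);
        }
        entry.openCount++;
        return entry.metadata;
    }

    /** Called when a session leaves a mailbox: free the metadata after the last close. */
    synchronized void close(String mailbox) {
        Entry entry = open.get(mailbox);
        if (entry != null && --entry.openCount == 0) {
            open.remove(mailbox);
        }
    }

    synchronized int loadedMailboxes() {
        return open.size();
    }

    static long estimateBytes(long messageCount) {
        return messageCount * ASSUMED_BYTES_PER_MESSAGE;
    }
}
```

With this shape, ten sessions selecting the same mailbox trigger a single load, and memory use is driven by the number of distinct open mailboxes rather than the number of connected users.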

One encouraging point is that given a reasonably fast disk subsystem, the serialization/deserialization should not take terribly long. On an unloaded server, reading and deserializing 100,000 messages' worth of metadata could take as little as 300ms using a midrange Pentium 4 with any modern ATA hard drive. (I've only tested so far on my laptop, whose hard drive is much slower than even the slowest of today's desktop drives.)
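For illustration, a compact fixed-width binary layout is one way metadata (de)serialization can stay cheap. The sketch below is an assumption-laden example, not the real mailstore format: the Meta fields (UID, size, flag bitmask, internal date) and the class name are hypothetical, and it round-trips through a byte array rather than a file to stay self-contained.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Hypothetical sketch of a fixed-width binary encoding for per-message metadata. */
final class MetadataCodec {
    /** Assumed record shape; the real metadata fields may differ. */
    static final class Meta {
        final long uid;
        final int sizeBytes;
        final int flags;
        final long internalDateMillis;

        Meta(long uid, int sizeBytes, int flags, long internalDateMillis) {
            this.uid = uid;
            this.sizeBytes = sizeBytes;
            this.flags = flags;
            this.internalDateMillis = internalDateMillis;
        }
    }

    /** Writes a 4-byte count followed by 24 bytes per record. */
    static byte[] toBytes(Meta[] messages) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeInt(messages.length);
            for (Meta m : messages) {
                out.writeLong(m.uid);
                out.writeInt(m.sizeBytes);
                out.writeInt(m.flags);
                out.writeLong(m.internalDateMillis);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    /** Reads records back in the same field order they were written. */
    static Meta[] fromBytes(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            int count = in.readInt();
            Meta[] messages = new Meta[count];
            for (int i = 0; i < count; i++) {
                messages[i] = new Meta(in.readLong(), in.readInt(), in.readInt(), in.readLong());
            }
            return messages;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Under these assumed fields, 100,000 records come to roughly 2.4MB on disk, which is a small sequential read for any desktop drive.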

Furthermore, as expected, the metadata searching is blazingly fast. On a P4-M 2.0GHz, checking for the presence of 5 flags across 1,000,000 messages takes much less than 100ms. Unfortunately, doing a fulltext search against those same 1,000,000 messages will be impractically slow unless I add Lucene to the mix. Anyone know if people out there actually do server-side IMAP searches?
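A flag check like the one described above can be as simple as one bitwise AND per message. This is a minimal sketch assuming each message's flags are packed into an int bitmask; the bit assignments and the class name are hypothetical, though the five flag names are the standard IMAP system flags.

```java
/** Hypothetical sketch: flag search as a linear scan over packed bitmasks. */
final class FlagSearch {
    // Assumed bit assignments for the IMAP system flags.
    static final int SEEN = 1;
    static final int ANSWERED = 1 << 1;
    static final int FLAGGED = 1 << 2;
    static final int DELETED = 1 << 3;
    static final int DRAFT = 1 << 4;

    /** Counts messages that have every flag in the required mask set. */
    static int countWithAll(int[] flagsPerMessage, int required) {
        int matches = 0;
        for (int flags : flagsPerMessage) {
            if ((flags & required) == required) {
                matches++;
            }
        }
        return matches;
    }
}
```

Checking five flags costs the same as checking one here, since all required bits are tested in a single mask comparison, and a million ints is only about 4MB of sequential memory to walk.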

I was also able to wire up and test the POP3 retrieval and mail storage mechanism (i.e. calling out to a POP3 server, fetching the messages, and adding them to an IMAP mailstore). It works flawlessly--as it should... POP3 is a simple protocol and my mailstore is a simple implementation.
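The fetch-and-store loop is short enough to sketch. Everything named here is hypothetical: Pop3Session stands in for a real client that would issue STAT, RETR and DELE over a socket, and MailStore stands in for the IMAP mailstore. Deleting each message from the POP3 server after storing it is also an assumption; a real setup might leave mail on the server.

```java
/** Hypothetical sketch of pulling mail from a POP3 account into a mailstore. */
final class Pop3Import {
    /** Assumed view of a POP3 session; message numbers are 1-based as in the protocol. */
    interface Pop3Session {
        int messageCount();

        byte[] retrieve(int messageNumber);

        void delete(int messageNumber);
    }

    /** Assumed mailstore API: append one raw message to a named mailbox. */
    interface MailStore {
        void append(String mailbox, byte[] rawMessage);
    }

    /** Copies every message into the mailbox and returns how many were fetched. */
    static int fetchInto(Pop3Session pop, MailStore store, String mailbox) {
        int count = pop.messageCount();
        for (int number = 1; number <= count; number++) {
            store.append(mailbox, pop.retrieve(number));
            pop.delete(number); // mark for deletion only after the message is stored
        }
        return count;
    }
}
```

Keeping both ends behind small interfaces means the loop can be exercised with in-memory fakes before it ever touches a real server.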

Still, good to know that "all" I have left to do is add Prevayler metadata persistence, implement the IMAP protocol, provide user management tools, come up with an easy install/configuration tool, add server-side mail filtering... ;)