- Easy to program
- Upgrade-durable (i.e. the on-disk data format can survive upgrades of the software)
- E-mails should be browseable/recoverable by the user
- Good performance for most common tasks
Right now I'm leaning towards using a filesystem-based scheme where IMAP folders are represented by directories, and each message gets its own text file (named by its UID) containing only the raw RFC822 message. All of the flags for both the folder and its messages could be stored in one data structure that is persisted using Java serialization.
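Here is a minimal sketch of that layout. It assumes one directory per folder, one file per message named by its numeric UID, and a single serialized state object per folder that holds the next UID and the flags. `FolderStore`, `FolderState` and the `flags.ser` file name are hypothetical names for illustration, not an existing API.

```java
import java.io.*;
import java.nio.file.*;
import java.util.*;

// Hypothetical per-folder store: a directory of raw RFC822 files plus one serialized state file.
public class FolderStore {
    // All folder-level and per-message flags, persisted together via Java serialization.
    static class FolderState implements Serializable {
        private static final long serialVersionUID = 1L;
        long nextUid = 1;                                           // next UID to hand out
        final Set<String> folderFlags = new HashSet<>();            // placeholder for folder-level flags
        final Map<Long, Set<String>> messageFlags = new TreeMap<>(); // UID -> flags such as "\\Seen"
    }

    private static final String STATE_FILE = "flags.ser"; // hypothetical name; cannot clash with numeric UIDs
    private final Path dir;
    private final FolderState state;

    public FolderStore(Path dir) throws IOException, ClassNotFoundException {
        this.dir = dir;
        Files.createDirectories(dir);
        Path stateFile = dir.resolve(STATE_FILE);
        if (Files.exists(stateFile)) {
            try (ObjectInputStream in = new ObjectInputStream(Files.newInputStream(stateFile))) {
                state = (FolderState) in.readObject();
            }
        } else {
            state = new FolderState();
        }
    }

    // Stores the raw message in a file named by its new UID and records an empty flag set.
    public long append(byte[] rfc822) throws IOException {
        long uid = state.nextUid++;
        Files.write(dir.resolve(Long.toString(uid)), rfc822);
        state.messageFlags.put(uid, new HashSet<>());
        save();
        return uid;
    }

    public byte[] read(long uid) throws IOException {
        return Files.readAllBytes(dir.resolve(Long.toString(uid)));
    }

    public void addFlag(long uid, String flag) throws IOException {
        Set<String> flags = state.messageFlags.get(uid);
        if (flags == null) throw new IllegalArgumentException("unknown UID " + uid);
        flags.add(flag);
        save();
    }

    public Set<String> flags(long uid) {
        return Collections.unmodifiableSet(state.messageFlags.getOrDefault(uid, Collections.emptySet()));
    }

    // Writes to a temp file, then moves it over the old one, so a crash mid-write
    // is less likely to leave a truncated state file behind.
    private void save() throws IOException {
        Path tmp = dir.resolve(STATE_FILE + ".tmp");
        try (ObjectOutputStream out = new ObjectOutputStream(Files.newOutputStream(tmp))) {
            out.writeObject(state);
        }
        Files.move(tmp, dir.resolve(STATE_FILE), StandardCopyOption.REPLACE_EXISTING);
    }
}
```

Keeping the flags out of the message files means each file stays a plain RFC822 message that a user could open or copy with ordinary tools. The trade-off is that Java serialization ties the on-disk format to the class definition, so the upgrade-durability requirement would depend on keeping `FolderState` compatible; the explicit `serialVersionUID` only helps with compatible class changes.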
I'm not sure yet how this approach will scale once folders start holding many thousands of messages. And if many messages are very small, it becomes wasteful of disk space, because on most filesystems every file occupies at least one whole block (well, maybe not "significantly" wasteful when you consider the size of modern hard drives).
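To put a rough number on that waste: most filesystems round each file up to a whole number of blocks. The sketch below assumes 4 KiB blocks (a common default, e.g. on ext4) and ignores inode and directory-entry overhead, as well as filesystems that store very small files inline. `BlockWaste` is a hypothetical helper for illustration.

```java
// Hypothetical helper: estimates per-file slack when every file is rounded up to whole blocks.
public class BlockWaste {
    // Bytes allocated for a file of `size` bytes, assuming no tail packing or inline storage.
    static long allocated(long size, long blockSize) {
        if (size == 0) return 0;
        return ((size + blockSize - 1) / blockSize) * blockSize; // ceiling division, then back to bytes
    }

    // Allocated-but-unused bytes in the file's last block.
    static long wasted(long size, long blockSize) {
        return allocated(size, blockSize) - size;
    }

    public static void main(String[] args) {
        long blockSize = 4096; // assumed block size
        long[] sizes = {600, 2_000, 5_000, 50_000};
        for (long s : sizes) {
            System.out.printf("%,d bytes -> %,d allocated, %,d wasted%n",
                    s, allocated(s, blockSize), wasted(s, blockSize));
        }
    }
}
```

Under these assumptions the slack per message is always less than one block, so 10,000 small messages cost at most about 40 MB of slack: real, but modest next to a modern drive.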
I'm trying to resist getting a database involved, but I have to admit it would make a lot of things easier...
*Yawn*... one to sleep on, perhaps.