Question

So my case is a closed email system.

The emails are HTML enabled.

What is needed:
Full-text search (there are over 1 million emails in the database, but searches are usually pre-filtered to users who have been active recently)
Archiving: how can I archive old emails (older than 1-2 years)?

Which is the better way to store these emails: as files on the server, or inside a database table? Or is it a combination of the two (because of archiving)?

Following on from that question: what specific tools/plugins can I use to make the job easier? I remember hearing a little about Solr, but I am not sure what other options or possibilities exist.


Solution

Solr would help you on the search side, but it has nothing to do with archiving. Look at the Solr DataImportHandler (DIH); there was a contrib module (I think) that reads from IMAP sources.
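
Since the emails are HTML, it usually helps to strip the markup before the text goes to a search engine. Below is a minimal sketch of that step, using only the Python standard library. The field names (`id`, `subject`, `sender`, `body`) and the function names are made up for illustration and would have to match whatever schema you define in Solr. Actually sending the document to Solr is not shown.

```python
from email import message_from_string, policy
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects visible text, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and not self._skip:
            self.chunks.append(text)


def html_to_text(html: str) -> str:
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def email_to_search_doc(raw: str) -> dict:
    """Build a flat dict for indexing. Field names are hypothetical."""
    msg = message_from_string(raw, policy=policy.default)
    # Prefer the HTML part, fall back to plain text.
    body = msg.get_body(preferencelist=("html", "plain"))
    content = ""
    if body is not None:
        content = body.get_content()
        if body.get_content_subtype() == "html":
            content = html_to_text(content)
    return {
        "id": str(msg.get("Message-ID", "")),
        "subject": str(msg.get("Subject", "")),
        "sender": str(msg.get("From", "")),
        "body": content.strip(),
    }
```

Indexing plain text rather than raw HTML keeps tag names and inline CSS out of the search results. Whether you also keep the original HTML in the index is a separate decision.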

Regarding archiving, that is a very large area... there are many questions you must answer:

  • Do you want to store each mail as a whole, or decompose it into its parts so that you can also deduplicate parts that are repeated across different mails?
  • I would lean towards storing on the filesystem, but watch out for the following:
      • you need to devise a way to detect duplicates
      • spread the files smartly over a tree of directories, so that no single directory grows large enough to slow down browsing
      • compress when it is worthwhile (not small files or incompressible ones)
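
The three filesystem points above can be sketched together as a small content-addressed store. This is only one possible approach, not a finished design. The SHA-256 hash, the two-level directory layout, the 1 KiB threshold and the function names are all assumptions chosen for illustration.

```python
import gzip
import hashlib
from pathlib import Path

# Assumed threshold: blobs smaller than this are not worth compressing.
MIN_COMPRESS_SIZE = 1024


def blob_path(root: Path, digest: str) -> Path:
    # Two levels of 256 directories each keep every directory small.
    return root / digest[:2] / digest[2:4] / digest


def store_blob(root: Path, data: bytes) -> Path:
    """Store data once, keyed by its hash; return the path on disk."""
    digest = hashlib.sha256(data).hexdigest()
    plain = blob_path(root, digest)
    packed_path = plain.with_name(plain.name + ".gz")

    # Deduplication: identical content hashes to the same path.
    for existing in (plain, packed_path):
        if existing.exists():
            return existing

    plain.parent.mkdir(parents=True, exist_ok=True)
    target, payload = plain, data
    if len(data) >= MIN_COMPRESS_SIZE:
        packed = gzip.compress(data)
        # Keep the compressed form only if it is actually smaller.
        if len(packed) < len(data):
            target, payload = packed_path, packed
    target.write_bytes(payload)
    return target


def load_blob(path: Path) -> bytes:
    raw = path.read_bytes()
    return gzip.decompress(raw) if path.suffix == ".gz" else raw
```

The same `store_blob` call works whether you pass a whole raw message or a single decoded attachment, so the "whole mail versus parts" question mostly decides what you feed into it. The database would then hold only the metadata plus the hash for each stored blob.
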
Licensed under: CC-BY-SA with attribution