Question

I'm looking for the best solution for caching thousands of web pages. Right now I am using flat files, which works great until there are many thousands of flat files; then the entire file system slows down (a lot) when accessing the cached files (running on CentOS with ext3 under OpenVZ). I'd like to explore other options such as Redis or MongoDB as a substitute, but would they be any faster? And if not, what would be the best suggestion?

My system dynamically creates over 40K pages per website, so it's not feasible to do a memory cache either.

Thanks!!

Solution

A file cache is fine; you just have to be smart about it. I'd aim to keep directories to, say, 500 entries or less. With 40k entries, just hashing the URL and using the first two hex characters of the hash will give you 256 folders, each of which should contain on average roughly 150 files.
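
A minimal sketch of that layout in Python (the cache root, the helper names, and the choice of MD5 are my assumptions for illustration, not part of the answer above):

    import hashlib
    import os

    CACHE_ROOT = "/var/cache/pages"  # illustrative location, adjust to your setup


    def cache_path(url):
        """Map a URL to a file path, sharded by the first two hex chars of its MD5."""
        digest = hashlib.md5(url.encode("utf-8")).hexdigest()
        shard = digest[:2]  # 256 possible subdirectories (00..ff)
        return os.path.join(CACHE_ROOT, shard, digest)


    def write_cache(url, html):
        """Write a rendered page into its shard directory, creating it if needed."""
        path = cache_path(url)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "w", encoding="utf-8") as f:
            f.write(html)


    def read_cache(url):
        """Return the cached page, or None on a cache miss."""
        try:
            with open(cache_path(url), encoding="utf-8") as f:
                return f.read()
        except FileNotFoundError:
            return None

With ~40k cached pages, each of the 256 shard directories ends up holding around 150 files, comfortably under the 500-entry target.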

Other tips

Well, I know StackExchange uses Redis on CentOS, so it should work at least as well from a LAMP stack. Redis seems to be optimized for this sort of caching, whereas MongoDB is more of a general-purpose database. You could also use memcached.
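
As a hedged sketch of that approach with the redis-py client (the key prefix and TTL are illustrative assumptions, and render_page stands in for whatever generates your pages):

    import hashlib

    import redis  # pip install redis

    r = redis.Redis(host="localhost", port=6379, db=0)
    PAGE_TTL = 3600  # seconds; tune to how often your pages change


    def page_key(url):
        """Key cached pages by a hash of the URL to keep key lengths bounded."""
        return "page:" + hashlib.sha1(url.encode("utf-8")).hexdigest()


    def get_page(url, render_page):
        """Return the cached page body, rendering and caching it on a miss."""
        key = page_key(url)
        cached = r.get(key)
        if cached is not None:
            return cached
        html = render_page(url).encode("utf-8")
        r.setex(key, PAGE_TTL, html)  # store with an expiry so stale pages age out
        return html

Keeping 40k rendered pages in Redis means they live in RAM, so check that the total cache size fits your memory budget before switching.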

I would suggest spreading the files across subdirectories, possibly grouped by the first two or three characters of the MD5 hash of the cache file's name (or just the first few characters of the file name itself). This takes a bit of load off the file system.

Have you looked at using something like Varnish? Depending on what you're caching and how complicated your invalidation is, it could work for you. You would create your pages dynamically and let the proxy layer handle any duplicate requests.

https://www.varnish-cache.org/

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow