Questions about caching in high-traffic website

https://stackoverflow.com/questions/2246111

20-09-2019
|

Question

Suppose we are building an E-commerce site that allows consumers to search for products by typing in keywords. Say there are at most 200,000 products, and there are millions of consumers using the system. Let’s say the product table is updated fairly frequently. Since the number of products is not that high and we can probably store the entire product table in memory and search against it instead of hitting the database. We are hoping to create distributed caches that store the same data but reside in different servers (for high availability and performance reason) and we need to be able to synchronize data among these caches and invalidate caches when product table is modified.

Our application is built using ASP.NET MVC and NHibernate. I am trying to understand whether NHibernate’s level-2 caching would help with my situation. I would really appreciate if you guys can shed some light on this.

I understand that level-2 caching will help cache query result so if two different users are searching using the same keyword, the L2 Cache will serve the result from the cache instead of from the database. But it doesn’t help us much since the product table is updated frequently and the cached result will be stale. My question is am I understanding L2 caching correctly and is there exists anything that help manage cache the way I would like to do (multiple caches, the same data, synchronize between cache and invalidate cache). Any thoughts is highly appreciated.

Solution

Having used both the second-level cache (using the memcached provider) and the NHibernate.Search add-on it seems to me you could benefit from both.

The NHibernate.Search component depends on Lucene.Net and keyword search is decoupled from the Database it self. A different index file is created per class mapped and optimizations can be set on the property level using attributes, giving you an extra level of granularity. Additionally, you can implement best match and propositions (check Lucene in Action and/or Hibernate Search in action). As a note, you don't have to maintain the index (unless you explicitly request an index rebuild); the implementation manages everything behind the scenes although you can manipulate the index if you wish to do so. So, adding/deleting/updating a product will automatically update the according index.

For the second-level cache you get instant performance boost. On a test environment with a data set of approx 2 mil rows i had more than 20% improvement even on an extremely low request count. The performance boost is gradually larger as the request count increases - the application first hits the 2nd level cache and if it does not find it then hits the DB to fetch the required rows and inserts them on the cache for future queries. Again you can manage stuff like cache duration and other configuration settings, as well as explicitly clear the cache (all of it, a part of it, or particular entries) if you wish to do so. Note that cache state is managed by the application during save/update/delete.

For scallability * the 2nd level cache depends on the provider (ie memcached is highly performant and scalable and supports distributed instances). * for the Lucene.Net/NHibernate.Search you will need to set up a specific place that the indexes will reside and that place must be accessible for read/write by all web-application instances. Note here that the sensitive link is I/O and file contention, so setting up a machine with a faster than light file system will prevent that from happening (i am speaking for your scenario with many thousands of search requests per second)

As a side note i would highly recommend NHibernate.Search since it is extremely faster than LIKE queries and is easier to use than implementing SQL-Server's FullText search inside the application (which i have done).

OTHER TIPS

Whether a second level cache will help depends on exactly how frequently your product table is updated in relation to cache hits. If you add 100 new products an hour but receive 10,000 queries an hour, even a 10% cache hit rate will make a big difference. If the rates are reversed, a second level cache will be of almost no value.

I suggest you set up a stress test environment that closely approximates your production environment and perform benchmarking on the various second level cache providers.

Also check that your DB is configured properly for an update-heavy scenario.

I recommend using NHibernate.Search w/ Lucene. It works together with the 2nd level cache. Lucene can do sophisticated text searching ripping fast and then return back the entity keys to NHibernate which pulls the full entity out of its 2nd level cache. The NHibernate.Search extension does the work of keeping your Lucene index in sync.

TekPub did a recent episode on your exact scenario of searching product descriptions. The episode compares NHibernate queries, SQL Full-text indexing and Lucene w/ NHibernate.Search.

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow