What is distributed caching in terms of caching persistent data for multiple applications / multiple instances of an application?

StackOverflow https://stackoverflow.com/questions/17615418

  •  02-06-2022

Question

I've been trying to study distributed caching for some time and have not been able to clarify certain concerns, listed below:

  1. Does distributed caching mean that the cache itself should be distributed, or that distributed applications make use of cached persistent data in a consistent manner?
  2. If the caches themselves are distributed, do they need to be disjoint in terms of cache entries, can they share entries, or should the entries in each cache be the same?
  3. Where should the cache reside: inside the application process or externalized? If both approaches are viable, which one should be preferred in which scenario?
  4. For in-process distributed caches, how does the coherence communication take place?
  5. If the cache sits outside the application instances, is there a benefit to distributing the caches and maintaining coherence between them in a non-disjoint manner, or is it better to maintain one single cache (or multiple disjoint ones)?
  6. If the cache instance(s) are externalized from the application instances, how significant could the network overhead be? How concerned should one be about the network overhead (process-to-cache communication) when building a distributed caching solution?

I am a bit of a novice here, so some of the concerns listed above might not even make sense. Corrections to the questions themselves are also welcome.


Solution

My best attempt for answers:

  1. I would say both. A distributed cache means that the logical idea of a cache is spread across multiple distinct machines. For example, you might have 5 nodes in your cache, and each node is on its own machine/VM.

    Usually once you need a distributed cache, your application is also distributed. Small website = one server, maybe one cache node. Big website = many web servers, distributed cache.

  2. Most distributed caches distribute cache entries evenly among the nodes. If you write an entry to one node, it gets replicated to the other nodes, so any single cache node can be taken out of the "cluster" without you losing data.

    Keeping each cache entry on exactly one machine is called sharding: you look at the cache key and then decide which cache node to store it on (a minimal sketch of this idea appears after this list).

    For existing distributed caches, you shouldn't have to manage/worry about any of this though.

  3. Distributed cache nodes should be on their own machines with no other processes running. Caches usually reside in memory, so you don't want other things competing for that precious RAM.

    You could technically put a web server on the same machine as a cache node, but just be aware they will compete for physical resources.

  4. Don't worry about it. =) Each distributed cache behaves differently, so it's good to read up on it, but they all handle the replication of data on their own. You shouldn't have to worry about it/manage it.

  5. I would maintain one logical cache that is distributed across many machines. Again, the reason for this is to handle the case where a node goes down. If a cache node goes down and it held values that don't exist anywhere else, you're in big trouble. (Your database might get overwhelmed serving requests that the cache was handling.)

  6. Good question. =) If the boxes are on the same internal network, the cost is really, really low. As long as the cache isn't on the west coast and the web servers are on the east coast, you should be fine. There is a price to pay, of course, but there are creative ways to get around it. (See the timing sketch after this list for a quick way to measure it yourself.)
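
To make the sharding idea from point 2 concrete, here is a minimal, hypothetical Python sketch of hash-based key-to-node mapping. The node names and the hash-mod scheme are illustrative assumptions, not any particular product's API; real systems typically use consistent hashing so that adding or removing a node only remaps a fraction of the keys.

    import hashlib

    # Hypothetical node names; a real deployment would use actual host addresses.
    NODES = ["cache-node-1", "cache-node-2", "cache-node-3"]

    def node_for_key(key: str) -> str:
        """Map a cache key to exactly one node using simple hash-mod sharding."""
        digest = hashlib.md5(key.encode("utf-8")).hexdigest()
        return NODES[int(digest, 16) % len(NODES)]

    # The same key always resolves to the same node, so reads find what writes stored.
    print(node_for_key("user:42:profile"))
    print(node_for_key("session:abc123"))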
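
And for point 6, a rough way to see the process-to-cache network overhead for yourself is to time round trips against your cache server. This sketch assumes a Redis-compatible server listening at a placeholder host/port and uses the plain-text PING command; swap in whatever client your actual cache provides.

    import socket
    import time

    HOST, PORT = "127.0.0.1", 6379   # placeholder address of the cache server

    def average_rtt_ms(samples: int = 100) -> float:
        """Time round trips of the inline PING command to a Redis-compatible server."""
        with socket.create_connection((HOST, PORT)) as sock:
            start = time.perf_counter()
            for _ in range(samples):
                sock.sendall(b"PING\r\n")   # inline command form of PING
                sock.recv(64)               # expect "+PONG\r\n"
            elapsed = time.perf_counter() - start
        return elapsed / samples * 1000.0

    print(f"average round trip: {average_rtt_ms():.3f} ms")

On the same internal network this typically comes out well under a millisecond per round trip; across regions it can be tens of milliseconds, which is exactly why you keep the cache close to the web servers.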

Hope that helps!

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow