Question

I might be asking a very basic question, but I could not find a clear answer by googling, so I am putting it here.

Memcached caches information in a separate process. Thus, getting the cached information requires inter-process communication (which in Java generally means serialization). That means that, to fetch a cached object, we generally need to retrieve a serialized object and transport it over the network.
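For concreteness, here is a minimal sketch of that round trip using plain java.io serialization. Real memcached clients may use their own transcoders, but the object-to-bytes-and-back cost is the same in kind; the User class is hypothetical.

```java
import java.io.*;

// Sketch of the serialization round trip the question describes: a Java
// memcached client must turn an object into bytes before it can cross the
// process/network boundary, and back again on every read.
public class SerializationRoundTrip {

    // Any value stored in memcached from Java must be serializable somehow.
    static class User implements Serializable {
        private static final long serialVersionUID = 1L;
        final String name;
        User(String name) { this.name = name; }
    }

    public static void main(String[] args) throws IOException, ClassNotFoundException {
        // "put": object -> bytes (this is what gets sent over the wire)
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(new User("alice"));
        }
        byte[] wireFormat = buffer.toByteArray();

        // "get": bytes -> object (paid again on every cache read)
        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(wireFormat))) {
            User cached = (User) in.readObject();
            System.out.println(cached.name + " (" + wireFormat.length + " bytes)");
        }
    }
}
```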

Both serialization and network communication are costly operations. If memcached needs both of them (generally speaking; there might be cases where network communication is not required), how is memcached fast? Isn't replication a better solution?

Or is this a tradeoff of distribution/platform independence/scalability vs. performance?


Solution

You are right that looking something up in a shared cache (like memcached) is slower than looking it up in a local cache (which is what I think you mean by "replication").

However, the advantage of a shared cache is that it is shared, which means each user of the cache has access to more cache than if the same memory were used for a local cache.

Consider an application with a 50 GB database, with ten app servers, each dedicating 1 GB of memory to caching. If you used local caches, then each machine would have 1 GB of cache, equal to 2% of the total database size. If you used a shared cache, then you have 10 GB of cache, equal to 20% of the total database size. Cache hits would be somewhat faster with the local caches, but the cache hit rate would be much higher with the shared cache. Since cache misses are astronomically more expensive than either kind of cache hit, slightly slower hits are a price worth paying to reduce the number of misses.
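To make that arithmetic concrete, here is a back-of-the-envelope sketch of the expected cost per lookup. All latencies are illustrative assumptions, and the hit rates crudely assume hits scale with the fraction of the database that fits in cache:

```java
// Back-of-the-envelope comparison of expected lookup cost for a local
// vs. a shared cache. All numbers are illustrative assumptions,
// not measurements.
public class CacheCostSketch {

    // Expected cost per lookup: hitRate * hitCost + (1 - hitRate) * missCost.
    static double expectedCostMs(double hitRate, double hitCostMs, double missCostMs) {
        return hitRate * hitCostMs + (1 - hitRate) * missCostMs;
    }

    public static void main(String[] args) {
        double missMs = 10.0; // assumed cost of a database read on a miss

        // Local cache: very fast hits, but only 2% of the data fits (1 GB of 50 GB).
        double local = expectedCostMs(0.02, 0.001, missMs);

        // Shared cache: slower hits (network + serialization), but 20% fits (10 GB of 50 GB).
        double shared = expectedCostMs(0.20, 0.5, missMs);

        System.out.printf("local:  %.3f ms per lookup%n", local);  // ~9.800 ms
        System.out.printf("shared: %.3f ms per lookup%n", shared); // ~8.100 ms
    }
}
```

Even though each shared hit is assumed to be 500 times slower than a local hit here, the shared cache still wins on expected cost, because the miss penalty dominates.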

Now, the exact tradeoff depends on the ratio of the costs of a local hit, a shared hit, and a miss, and also on the distribution of accesses over the database. For example, if all the accesses went to a set of 'hot' records that was under 1 GB in size, then the local caches would give a 100% hit rate and would be just as good as a shared cache. Less extreme distributions could still tilt the balance.

In practice, the optimum configuration will usually (IMHO!) be to have a small but very fast local cache for the hottest data, then a larger and slower cache for the long tail. You will probably recognise that as the shape of other cache hierarchies: consider the way that processors have small, fast L1 caches for each core, then slower L2/L3 caches shared between all the cores on a single die, then perhaps yet slower off-chip caches shared by all the dies in a system (do any current processors actually use off-chip caches?).
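As a rough illustration of that hierarchy in application code, here is a minimal two-tier lookup sketch: a small in-process LRU in front of a larger shared cache. SharedCache and Database are hypothetical interfaces standing in for a real memcached client and data store.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Minimal sketch of a two-tier lookup: a tiny, fast, per-process LRU cache
// in front of a larger, slower, shared cache, with the database as the
// miss path of last resort.
public class TieredCache<K, V> {

    interface SharedCache<K, V> { V get(K key); void put(K key, V value); }
    interface Database<K, V> { V load(K key); }

    private final Map<K, V> local;          // tier 1: fast, tiny, per-process
    private final SharedCache<K, V> shared; // tier 2: slower, large, shared
    private final Database<K, V> db;        // miss path: slowest

    TieredCache(int localCapacity, SharedCache<K, V> shared, Database<K, V> db) {
        // Access-ordered LinkedHashMap that evicts the least-recently-used entry.
        this.local = new LinkedHashMap<K, V>(localCapacity, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > localCapacity;
            }
        };
        this.shared = shared;
        this.db = db;
    }

    V get(K key) {
        V value = local.get(key);        // L1: in-process hit, no network
        if (value == null) {
            value = shared.get(key);     // L2: network + deserialization
            if (value == null) {
                value = db.load(key);    // miss: disk/database
                shared.put(key, value);
            }
            local.put(key, value);       // promote to the hot tier
        }
        return value;
    }
}
```

Note that this sketch is not thread-safe; a production local tier would use a concurrent cache, but the shape of the lookup path is the point.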

Other tips

You are neglecting the cost of disk I/O in your consideration, which is generally going to be the slowest part of any process, and is, IMO, the main driver for using in-memory caching like memcached.

Memory caches serve data from RAM over the network. Replication uses both RAM and persistent disk storage to fetch data. Their purposes are very different.

If you're only thinking of using memcached to store easily obtainable data, such as a 1-1 mapping for table records, you're going to have a bad time.

On the other hand, if your data is the entire result set of a complex SQL query that may even overflow the SQL memory pool (and need to be temporarily written to disk to be fetched), you're going to see a big speed-up.

The previous example mentions needing to write data to disk for a read operation. Yes, it happens if the result set is too big for memory (imagine a CROSS JOIN), which means that you both read from and write to that drive (thrashing comes to mind).
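As a sketch of the pattern this tip describes (caching the whole result set of an expensive query rather than individual rows), here is one way it might look. CacheClient and ReportDao are hypothetical stand-ins for a real memcached client and data-access layer:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.List;

// Cache the entire result set of an expensive query under one key, so the
// disk-heavy SQL work is paid once and subsequent reads come from RAM.
public class QueryResultCache {

    interface CacheClient { Object get(String key); void set(String key, int ttlSeconds, Object value); }
    interface ReportDao { List<String> runExpensiveReport(String region); } // the costly query

    private final CacheClient cache;
    private final ReportDao dao;

    QueryResultCache(CacheClient cache, ReportDao dao) {
        this.cache = cache;
        this.dao = dao;
    }

    @SuppressWarnings("unchecked")
    List<String> report(String region) {
        String key = "report:" + sha1(region);      // stable key for this query + params
        List<String> rows = (List<String>) cache.get(key);
        if (rows == null) {
            rows = dao.runExpensiveReport(region);  // pay the disk/SQL cost once...
            cache.set(key, 600, rows);              // ...then serve from RAM for 10 minutes
        }
        return rows;
    }

    private static String sha1(String s) {
        try {
            StringBuilder hex = new StringBuilder();
            for (byte b : MessageDigest.getInstance("SHA-1").digest(s.getBytes(StandardCharsets.UTF_8))) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The TTL and hashed key are arbitrary choices for the sketch; the point is that one cache entry replaces an entire expensive, possibly disk-spilling query.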

In a highly optimized application written in C, for example, you may have a total processing time of 1 microsecond, and you may need to wait for networking and/or serialization/deserialization (marshalling/unmarshalling) for much longer than the application's own execution time. That's when you'll begin to feel the limitations of memory caching over the network.
