Question

I set up redis and nutcracker on CentOS 6.4 and am trying to connect using the ServiceStack.Redis client, and I found a major performance issue.

For testing I left only one redis instance:

beta:
  listen: 0.0.0.0:22122
  hash: fnv1a_64
  distribution: ketama
  auto_eject_hosts: true
  #timeout: 5000
  #server_retry_timeout: 2000
  #server_failure_limit: 3
  redis: true
  servers:
  #- 127.0.0.1:6379:1
   - 127.0.0.1:6380:1

In the following unit test I'm trying to send 100k strings to redis via nutcracker.

using System;
using System.Collections.Generic;
using Microsoft.VisualStudio.TestTools.UnitTesting;
using ServiceStack.Redis;

[TestClass]
public class RedisProxyTest
{
    public string host = "192.168.56.112";
    //public int port = 6379;
    public int port = 22122;

    [TestMethod]
    public void TestMethod1()
    {
        var key = "l2";
        var count = 100000;
        using (var redisClient = new RedisClient(host, port))
        {
            var list = new List<string>();
            for (int i = 0; i < count; i++)
            {
                list.Add(Guid.NewGuid().ToString());
            }

            Utils.TimeLog("Remove", () => redisClient.Remove(key));

            Utils.TimeLog("AddRangeToList", () => redisClient.AddRangeToList(key, list));
        }

        using (var redisClient = new RedisClient(host, port))
        {
            redisClient.GetListCount(key);

            Utils.TimeLog("GetRangeFromList", () =>
            {
                var ret = redisClient.GetRangeFromList(key, count / 2, count - 1);
                Console.WriteLine(ret.Count);
            });
        }

    }
}
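Utils.TimeLog is not shown in the question; a minimal stand-in (my assumption, not the original helper) that just times an action with a Stopwatch and prints the elapsed seconds would look something like this:

using System;
using System.Diagnostics;

public static class Utils
{
    // Hypothetical timing helper: runs the action and logs the elapsed seconds,
    // matching the "Label: seconds" lines in the test output below.
    public static void TimeLog(string label, Action action)
    {
        var stopwatch = Stopwatch.StartNew();
        action();
        stopwatch.Stop();
        Console.WriteLine("{0}: {1}", label, stopwatch.Elapsed.TotalSeconds);
    }
}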

On the first few runs after nutcracker is restarted, AddRangeToList completes in 1-2 seconds. On subsequent runs AddRangeToList performance drops significantly, from a few minutes to more than 20 minutes (if no timeout is configured). I cannot reproduce the same behaviour when using redis directly, and I haven't tried any other client yet. Any ideas why?

This is what I see in the console after the unit test run:

Test Name:  TestMethod1
Test Outcome:   Passed  
Remove: 0.0331171
AddRangeToList: 806.8219166
50000
GetRangeFromList: 1.741737

Solution

If nutcracker is proxying several tens of thousands of connections or sending a multi-get request with several thousand keys, you should use an mbuf size of 512 bytes.

The following link talks about how to interpret the mbuf size: https://github.com/twitter/twemproxy/issues/141

Every client connection consumes at least one mbuf. To service a request we need two connections (one from the client to the proxy and another from the proxy to the server), so we need two mbufs.

A fragmentable request like 'get foo bar\r\n', which gets fragmented into 'get foo\r\n' and 'get bar\r\n', would consume two mbufs for the request and two mbufs for the response. So a fragmentable request with N fragments needs N * 2 mbufs.

The good thing about mbufs is that the memory comes from a reuse pool. Once an mbuf is allocated, it is never freed but just put back into the reuse pool. The bad thing is that once an mbuf is allocated it is never freed, since a freed mbuf always goes back to the reuse pool - https://github.com/twitter/twemproxy/blob/master/src/nc_mbuf.c#L23-L24 (this could be fixed by putting a threshold parameter on the reuse pool).

So, if nutcracker is handling, say, 1K client connections and 100 server connections, it would consume (max(1000, 100) * 2 * mbuf-size) memory for mbufs. If we assume that clients are sending non-pipelined requests, then with the default mbuf-size of 16K this would in total consume 32M.

Furthermore, if on average every request has 10 fragments, then the memory consumption would be 320M. Instead of handling 1K client connections, let's say you were handling 10K; then the memory consumption would be 3.2G. If you then used an mbuf-size of 512 bytes instead of the default 16K, memory consumption for the same scenario would drop to 10000 * 2 * 512 * 10 ≈ 100M.
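The arithmetic above boils down to max(client connections, server connections) * 2 * mbuf-size * fragments-per-request. A small sketch that reproduces the numbers quoted in this answer (the connection and fragment counts are just the illustrative figures used above):

using System;

class MbufEstimate
{
    // memory ≈ max(clientConns, serverConns) * 2 * mbuf-size * fragments per request
    static long Estimate(long clientConns, long serverConns, long mbufSize, long fragments)
    {
        return Math.Max(clientConns, serverConns) * 2 * mbufSize * fragments;
    }

    static void Main()
    {
        Console.WriteLine(Estimate(1000, 100, 16384, 1));   // ~32M:  1K connections, default mbuf, non-pipelined
        Console.WriteLine(Estimate(10000, 100, 16384, 10)); // ~3.2G: 10K connections, 10 fragments per request
        Console.WriteLine(Estimate(10000, 100, 512, 10));   // ~100M: same load with an mbuf-size of 512 bytes
    }
}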

This is the reason why, for a 'large number' of connections, you want to choose a small value for mbuf-size such as 512.
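Note that mbuf-size is a nutcracker command-line option rather than a per-pool setting in the YAML file. Assuming the pool above lives in /etc/nutcracker/nutcracker.yml (the path is just an example), restarting the proxy with a 512-byte mbuf would look like:

nutcracker --conf-file=/etc/nutcracker/nutcracker.yml --mbuf-size=512
# equivalently: nutcracker -c /etc/nutcracker/nutcracker.yml -m 512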

OTHER TIPS

Looks like the issue is related to high memory usage when transferring that amount of data.

By default nutcracker allocates a 16k buffer for each key. In my case that comes to 16k * 100000 ≈ 1.5GB. I saw a peak of around 2GB when watching the nutcracker process. My CentOS VM was overloaded and there was not enough memory to handle that spike.
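Not part of the original answer, but purely as an illustration of keeping each proxied request (and therefore the proxy's buffering) small, the same data could be pushed in fixed-size batches; the batch size of 1000 is an arbitrary assumption:

using System;
using System.Collections.Generic;
using System.Linq;
using ServiceStack.Redis;

public static class ChunkedListWriter
{
    // Pushes the values to the list in batches so that no single request
    // carries all 100k items through nutcracker at once.
    public static void AddRangeInBatches(IRedisClient redisClient, string key,
                                         List<string> values, int batchSize = 1000)
    {
        for (int i = 0; i < values.Count; i += batchSize)
        {
            var batch = values.Skip(i).Take(batchSize).ToList();
            redisClient.AddRangeToList(key, batch);
        }
    }
}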

Licensed under: CC-BY-SA with attribution