Question

In my HFT trading application I have several places where I receive data from the network. In most cases this is just a thread that only receives and processes data. Below is part of such processing:

    public Reciver(IPAddress mcastGroup, int mcastPort, IPAddress ipSource)
    {
        thread = new Thread(ReceiveData);

        s = new Socket(AddressFamily.InterNetwork, SocketType.Dgram, ProtocolType.Udp);
        s.ReceiveBufferSize = ReceiveBufferSize;

        // Bind to the listening interface (was IPAddress.Any originally).
        var ipPort = new IPEndPoint(LISTEN_INTERFACE, mcastPort);
        s.Bind(ipPort);

        // 12-byte option block: multicast group, source address, local interface
        // (the layout expected by a source-specific multicast join).
        option = new byte[12];
        Buffer.BlockCopy(mcastGroup.GetAddressBytes(), 0, option, 0, 4);
        Buffer.BlockCopy(ipSource.GetAddressBytes(), 0, option, 4, 4);
        Buffer.BlockCopy(LISTEN_INTERFACE.GetAddressBytes(), 0, option, 8, 4);
    }

    public void ReceiveData()
    {
        byte[] byteIn = new byte[4096];
        while (needReceive)
        {
            if (IsConnected)
            {
                int count = 0;
                try
                {
                    // Blocks until a datagram arrives.
                    count = s.Receive(byteIn);
                }
                catch (Exception ex)
                {
                    Console.WriteLine(ex.Message);
                    Log.Push(LogItemType.Error, ex.Message);
                    return;
                }
                if (count > 0)
                {
                    // Note: byteIn is reused on the next iteration, so
                    // handlers must copy any payload they keep.
                    OnNewMessage(new NewMessageEventArgs(byteIn, count));
                }
            }
        }
    }

This thread runs forever once created. I just wonder if I should configure it to run on a certain core. Since I need the lowest latency, I want to avoid context switches, and to avoid context switches I should run the same thread on the same processor core, right?

Taking into account that I need the lowest latency, is it correct that:

  • it would be better to set thread affinity for most of the "long-running" threads?
  • it would be better to set thread affinity for the thread in my example above?

I am rewriting the above code in C++ right now, in order to port it to Linux later, if that matters; however, I assume my question is more about hardware than about language or OS.


Solution

I think the lowest-latency approach would be to pin your threads each to one core and set them to real-time priority (or whatever the highest available priority is).

This will cause the OS to evict any other thread that happens to be using that core.

Hopefully the CPU cache will still contain useful data when your thread gets scheduled there. For that reason I like the idea of pinning to a core.

You should probably also set your entire process to a high priority class and minimize other activity on your box. Turn off unused hardware as well, since it can generate interrupts, and pin your NIC's interrupts to a different CPU core (the better NICs and drivers can do that).
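
For concreteness, here is a minimal sketch of that setup in C# on Windows. `Thread.Priority` and `Process.PriorityClass` are the standard managed APIs; there is no managed API for pinning a particular thread to a core, so this P/Invokes `SetThreadAffinityMask` from inside the thread itself. The class name, method name, and core index are my own placeholders, not anything from the question:

    using System;
    using System.Diagnostics;
    using System.Runtime.InteropServices;
    using System.Threading;

    static class LowLatencySetup
    {
        [DllImport("kernel32.dll")]
        static extern IntPtr GetCurrentThread();

        [DllImport("kernel32.dll")]
        static extern UIntPtr SetThreadAffinityMask(IntPtr hThread, UIntPtr mask);

        // Call at the top of ReceiveData(), before entering the receive loop.
        public static void PinAndBoost(int core)
        {
            // Raise the whole process; RealTime needs admin rights, High is safer.
            Process.GetCurrentProcess().PriorityClass = ProcessPriorityClass.High;

            // Highest per-thread priority the managed API exposes.
            Thread.CurrentThread.Priority = ThreadPriority.Highest;

            // Pin the calling OS thread to one core (Windows-only; on a normal,
            // non-hosted CLR a managed thread maps 1:1 to an OS thread).
            SetThreadAffinityMask(GetCurrentThread(), (UIntPtr)(1UL << core));
        }
    }

On Linux, where you say the C++ port is headed, the equivalents are pthread_setaffinity_np and sched_setscheduler with SCHED_FIFO.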

OTHER TIPS

To avoid context switches, I should run the same thread on the same processor core, right?

No. A context switch will not necessarily be avoided by setting affinity to one CPU. You have no control over context switches; they are in the hands of the OS thread scheduler. They occur when a thread's quantum (time slice) has elapsed or when a higher-priority thread preempts your thread.

The latency you are talking about, which I assume is network or memory latency, is not avoided at all by setting thread affinity. Memory latency can be reduced by making your code cache-friendly (i.e., so that your working set fits in the L1/L2 caches, for example). Network latency is simply a property of the network, and not something I suspect you can do much about.
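
To illustrate the cache-friendliness point with a contrived sketch (the array size and stride are arbitrary choices of mine): both loops below perform the same number of additions, but the sequential one lets the prefetcher and cache lines work for you, while the strided one defeats them:

    using System;
    using System.Diagnostics;

    class CacheDemo
    {
        static void Main()
        {
            const int N = 1 << 24;              // 16M ints, much larger than L2
            var data = new int[N];
            long sum = 0;

            var sw = Stopwatch.StartNew();
            for (int i = 0; i < N; i++)         // sequential: cache-friendly
                sum += data[i];
            sw.Stop();
            Console.WriteLine($"sequential: {sw.ElapsedMilliseconds} ms");

            sw.Restart();
            const int stride = 4096;            // jump a page at a time
            for (int start = 0; start < stride; start++)
                for (int i = start; i < N; i += stride)  // strided: mostly misses
                    sum += data[i];
            sw.Stop();
            Console.WriteLine($"strided:    {sw.ElapsedMilliseconds} ms (sum={sum})");
        }
    }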

As Tony The Lion has already answered your question, I would like to address your comment:

"why not setting thread afinity to my code? why thread from my example need to travel between cores?"

Your thread doesn't travel anywhere.

A context switch happens when the OS thread scheduler decides to give your thread a slice of time to execute. The environment is then prepared for your thread: the CPU registers are loaded with the correct values, and so on. That preparation is the context switch.

So regardless of thread affinity, the same CPU setup work has to be done, whether it happens on the same CPU/core your thread used in its previous slice or on another one. And at that moment, your computer has more information to make that choice well than you do at compile time.

You seem to believe that a thread somehow resides on the CPU, but that is not so. What you are using is a logical thread, and there can be hundreds or even thousands of them. Common CPUs, on the other hand, usually have only 1 or 2 hardware threads per core, and your logical thread gets mapped onto one of them every time it is scheduled, even if the OS always picks the same hardware thread.

EDIT: it seems that you have already picked the answer you wanted to hear, and I don't like long discussion threads on answers, so I will put this here:

  • you should try it and measure it (see the sketch after this list); I believe you will be disappointed
  • running some threads at high priority might easily mess up other processes
  • you are worried about context-switch latency, but you have no problem with the GC thread freezing your thread? By the way, on which core will your GC thread run? :)
  • what if your highest-priority thread blocks the GC thread? Memory leaks? Do you know the priority of that thread, so you can be sure this would work?
  • really, why not C or hand-optimized assembly, if microseconds matter?
  • as someone suggested, you should use an RTOS if you want to control this aspect of execution
  • it doesn't seem likely that your data travels through the data center only 4-5 times slower than it takes to set up a thread context on one machine, but who knows...
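
Since the first bullet is the important one, here is a minimal sketch of what "measure it" could look like: time each receive-and-dispatch iteration with `Stopwatch`, run it with and without pinning, and compare the percentiles. `DoReceive()` is a hypothetical stand-in for the `s.Receive(...)` / `OnNewMessage(...)` pair from the question:

    using System;
    using System.Diagnostics;

    class LatencyProbe
    {
        const int Samples = 100_000;
        static readonly long[] ticks = new long[Samples];

        static void Main()
        {
            var sw = new Stopwatch();
            for (int i = 0; i < Samples; i++)
            {
                sw.Restart();
                DoReceive();                  // replace with real receive + dispatch
                sw.Stop();
                ticks[i] = sw.ElapsedTicks;
            }

            Array.Sort(ticks);
            double toUs = 1_000_000.0 / Stopwatch.Frequency;
            Console.WriteLine($"median: {ticks[Samples / 2] * toUs:F1} us, " +
                              $"p99: {ticks[(int)(Samples * 0.99)] * toUs:F1} us");
        }

        static void DoReceive() { /* hypothetical placeholder */ }
    }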
Licensed under: CC-BY-SA with attribution