Question

I came across a problem in multithreading, Model of multithreading is 1 Producer - N Consumer.

Producer produces the data (character data around 200bytes each), put it in fixed size cache ( i.e 2Mil). The data is not relevent to all the threads. It apply the filter ( configured ) and determines no of threads qualify for the produced data.

Producer pushes the pointer to data into the queue of qualifying threads ( only pointer to the data to avoid data copy). Threads will deque and send it over TCP/IP to their clients.

Problem: Because of only pointer to data is given to multiple threads, When cache becomes full, Produces wants to delete the first item(old one). possibility of any thread still referring to the data.

Feasible Way : Use Atomic granularity, When producer determines the number of qualifying threads, It can update the counter and list of thread ids.

class InUseCounter
{
    int           m_count;
    set<thread_t> m_in_use_threads;
    Mutex         m_mutex;
    Condition     m_cond;

public:
    // This constructor used by Producer
    InUseCounter(int count, set<thread_t> tlist)
    {
        m_count          = count;
        m_in_use_threads = tlist;
    } 

    // This function is called by each threads
    // When they are done with the data, 
    // Informing that I no longer use the reference to the data.
    void decrement(thread_t tid)
    {
        Gaurd<Mutex> lock(m_mutex);
        --m_count;
        m_in_use_threads.erease(tid);
    }

    int get_count() const { return m_count; }
};

master chache

map<seqnum, Data>
              |
              v
             pair<CharData, InUseCounter>

When producer removes the element it checks the counter, is more than 0, it sends action to release the reference to threads in m_in_use_threads set.

Question

  1. If there are 2Mil records in master cache, there will be equal number of InUseCounter, so the Mutex varibles, Is this advisable to have 2Mil mutex varible in one single process.
  2. Having big single data structure to maintain the InUseCounter will cause more locking time to find and decrement
  3. What would be the best alternative to my approach to find out the references, and who all have the references with very less locking time.

Advance thanks for you advices.

Was it helpful?

Solution

  1. 2 million mutexes is a bit much. Even if they are lightweight locks, they still take up some overhead.
  2. Putting the InUseCounter in a single structure would end up involving contention between threads when they release a record; if the threads do not execute in lockstep, this might be negligible. If they are frequently releasing records and the contention rate goes up, this is obviously a performance sink.
  3. You can improve performance by having one thread responsible for maintaining the record reference counts (the producer thread) and having the other threads send back record release events over a separate queue, in effect, turning the producer into a record release event consumer. When you need to flush an entry, process all the release queues first, then run your release logic. You will have some latency to deal with, as you are now queueing up release events instead of attempting to process them immediately, but the performance should be much better.

Incidentally, this is similar to how the Disruptor framework works. It's a high performance Java(!) concurrency framework for high frequency trading. Yes, I did say high performance Java and concurrency in the same sentence. There is a lot of valuable insight into high performance concurrency design and implementation.

OTHER TIPS

Since you already have a Producer->Consumer queue, one very simple system consists in having a "feedback" queue (Consumer->Producer).

After having consumed an item, the consumer feeds the pointer back to the Producer so that the Producer can remove the item and updates the "free-list" of the cache.

This way, only the Producer ever touches the cache innards, and no synchronization is necessary there: only the queues need be synchronized.

  1. Yes, 2000000 mutexes are an overkill.
  2. 1 big structure will be locked longer, but will require much less lock/unlocks.
  3. the best approach would be to use shared_ptr smart pointers: they seem to be tailor made for this. You don't check the counter yourself, you just clean up your pointer. shared_ptr is thread-safe, not the data it points to, but for 1 producer (writer) / N consumer (readers), this should not be an issue.
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top