Question

I have a data stream that delivers data out of order. I'm trying to find a way to receive the data in arbitrary order but send it on in order.

As an example, I'll receive object4 and then object3 and then object1. I'll need my system to store object4 and object3 when they arrive and immediately send object1. In the future, when object2 arrives, I'll need the system to immediately send object2 and then recheck the array to send object3 and object4 and so on.

Some more info:

  • All of the data is guaranteed to arrive eventually, so nothing will be missing.
  • The data is numbered (e.g., object1, object20).

My current solution is:

  • When receiving a new object...
    • If the new object is in order, send it immediately.
    • If the new object is not in order:
      • Store it in a list.
      • Check whether the list contains the next object to send.
  • After sending an object...
    • Check whether the list contains the next object to send.

So this system is rechecking the list for items to send on two events:

  1. When a new not-in-order object is added.
  2. After a successful send

As for sending

After a successful send, the sent object is removed from the list.
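
To make the mechanism above concrete, here is a minimal single-threaded sketch of it. The class name `ListReorderer`, the `send` callback, and the convention that numbering starts at 1 are assumptions made for illustration, not part of any real system.

```python
class ListReorderer:
    """Buffers out-of-order items in an unsorted list and sends them in order."""

    def __init__(self, send, first_seq=1):
        self.send = send           # hypothetical callback that transmits one item
        self.next_seq = first_seq  # number of the next object to send (assumed to start at 1)
        self.pending = []          # unsorted (seq, item) pairs waiting for their turn

    def receive(self, seq, item):
        if seq == self.next_seq:
            # In order: send immediately.
            self.send(item)
            self.next_seq += 1
        else:
            # Not in order: store it.
            self.pending.append((seq, item))
        # Recheck the list on both events described above.
        self._drain()

    def _drain(self):
        # Scan the whole list for the next object; repeat after every send.
        while True:
            for i, (seq, item) in enumerate(self.pending):
                if seq == self.next_seq:
                    del self.pending[i]  # remove the object once it is sent
                    self.send(item)
                    self.next_seq += 1
                    break
            else:
                return  # next object has not arrived yet


sent = []
reorderer = ListReorderer(sent.append)
for seq in [4, 3, 1, 2]:
    reorderer.receive(seq, f"object{seq}")
```

Note that every recheck is a linear scan of the list, which is the cost the answers below try to reduce.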

As for concurrency

For the sake of argument, assume it's a producer-consumer relationship where the list is concurrently accessed by both players:

  • The producer thread is pushing new data to the list.
  • The consumer thread is checking the list, sending and deleting the sent data.
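
One possible way to arrange the producer-consumer access described above is to guard the shared buffer with a condition variable. This is only a sketch under assumptions: the names `SharedReorderBuffer`, `put`, `take_next`, and `run_demo` are hypothetical, there is a single consumer, and a dict keyed by object number stands in for the list.

```python
import threading


class SharedReorderBuffer:
    """Buffer shared by one producer thread and one consumer thread."""

    def __init__(self, first_seq=1):
        self._cond = threading.Condition()
        self._pending = {}          # seq -> item (assumption: a dict replaces the list)
        self._next_seq = first_seq  # number of the next object to hand out

    def put(self, seq, item):
        # Called by the producer for every received object.
        with self._cond:
            self._pending[seq] = item
            if seq == self._next_seq:
                self._cond.notify()  # wake the consumer only when it can make progress

    def take_next(self):
        # Called by the consumer; blocks until the next in-order object exists.
        with self._cond:
            while self._next_seq not in self._pending:
                self._cond.wait()
            item = self._pending.pop(self._next_seq)
            self._next_seq += 1
            return item


def run_demo(arrival_order):
    buf = SharedReorderBuffer()
    sent = []

    def consumer():
        for _ in arrival_order:
            sent.append(buf.take_next())  # "sending" happens outside the lock

    worker = threading.Thread(target=consumer)
    worker.start()
    for seq in arrival_order:
        buf.put(seq, f"object{seq}")
    worker.join()
    return sent
```

Calling `run_demo([4, 3, 1, 2])` feeds the objects in the order from the example and collects them in sending order. The lock is held only while the buffer is touched, so a slow send does not block the producer.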

My question is: is this a good mechanism? Is there a better data structure to help me with this issue?


Solution

I think your approach is fairly spot-on, though I would recommend that you keep the items received so far in a sorted list, inserting each new item at its sorted position as in insertion sort.

Insertion sort works best when the item is being inserted into a list that has already been sorted. By having a sorted list, you have the added advantage that after sending an object, you can immediately check the first item in the list to see if it is the next, without having to check every object each and every time.

In other words, you spend up to O(n) time per insertion keeping the list sorted, and in return checking whether the next object to send has arrived is O(1).
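
As a rough illustration of the sorted-list idea, the sketch below uses Python's standard `bisect.insort` to place each stored item at its sorted position. The class name `SortedReorderer`, the `send` callback, and numbering from 1 are assumptions; it also assumes object numbers are unique, so the stored tuples are ordered by number alone.

```python
import bisect


class SortedReorderer:
    """Keeps out-of-order items in a list sorted by object number."""

    def __init__(self, send, first_seq=1):
        self.send = send           # hypothetical callback that transmits one item
        self.next_seq = first_seq  # number of the next object to send
        self.pending = []          # (seq, item) pairs, ascending by seq

    def receive(self, seq, item):
        if seq == self.next_seq:
            self.send(item)
            self.next_seq += 1
            self._drain()
        else:
            # Binary search for the position, then an O(n) shift to insert.
            bisect.insort(self.pending, (seq, item))

    def _drain(self):
        # Only the head of the sorted list can be the next object to send.
        while self.pending and self.pending[0][0] == self.next_seq:
            _, item = self.pending.pop(0)
            self.send(item)
            self.next_seq += 1


sent = []
reorderer = SortedReorderer(sent.append)
for seq in [4, 3, 1, 2]:
    reorderer.receive(seq, f"object{seq}")
```

One caveat specific to this sketch: `pop(0)` on a Python list shifts the remaining elements, so removing the head is not O(1) here. A linked list, or a list kept in descending order and popped from the end, would avoid that.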

@Telastyn makes a good point in the comments: be mindful of thread safety if the list may be accessed by multiple threads or multiple requests at once!

OTHER TIPS

If the elements to be received are numbered 1..N, with no missing values, you can use a straight-up array of pointers to received elements. Initialize the array elements to all empty, and initialize a next_to_transmit variable to the start of the array.

When you receive an element, if its index is equal to the next_to_transmit, you will transmit it, and then walk the array forward from next_to_transmit, sending and releasing full elements and updating next_to_transmit until you encounter an empty one or you run out of array.

When you receive an element and its index is not equal to next_to_transmit, you store it in its slot in the array.

The advantage of this over a sorted list is that writing to a known array slot is O(1), as opposed to O(N) for insertion into a sorted list. If the data object size is large compared to the size of a pointer, the average storage requirement will not be any worse. (The worst-case requirement is the same: you hold N-1 elements if elements 2..N all arrive before element 1, and the arrival of element 1 then triggers bulk transmission of the whole data set.)
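
The array approach described above might look roughly like this. It is a single-threaded sketch under assumptions: the class name `SlotReorderer` and the `send` callback are hypothetical, elements are numbered 1..N with N known up front, and `None` marks an empty slot, so items themselves must not be `None`.

```python
class SlotReorderer:
    """Pre-sized array of slots, one per expected element, numbered 1..N."""

    def __init__(self, n, send):
        self.send = send           # hypothetical callback that transmits one item
        self.slots = [None] * n    # slot i holds element number i + 1; None means empty
        self.next_to_transmit = 0  # array index of the next element to send

    def receive(self, seq, item):
        index = seq - 1
        if index != self.next_to_transmit:
            self.slots[index] = item  # O(1) write into a known slot
            return
        # In order: transmit it, then walk forward over filled slots.
        self.send(item)
        i = index + 1
        while i < len(self.slots) and self.slots[i] is not None:
            self.send(self.slots[i])
            self.slots[i] = None      # release the stored element
            i += 1
        self.next_to_transmit = i


sent = []
reorderer = SlotReorderer(4, sent.append)
for seq in [4, 3, 1, 2]:
    reorderer.receive(seq, f"object{seq}")
```

If N is not known in advance, a dict keyed by element number would give the same O(1) store and lookup without pre-sizing, at the cost of hashing.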

I have not done a careful enough analysis to say what concurrency issues you might encounter with this approach. My gut feeling is that you will have minimal issues.

Licensed under: CC-BY-SA with attribution