Pregunta

The desire is to keep a set of numbers sorted (ascending or descending, but examples below show only ascending). The data structure representation for utmost speed is the question.

Say an aggregating program continously getting packets of numbers from many different monitoring agents, for example through a network. The idea is to keep them sorted at all times fast. As an example, you might get these packets (using ints but double is the actual case) in sequence:

A = [1, 3, 4, 6]
B = [1, 2, 3]
C = [2, 3, 5]
A = [2, 4, 7, 8]

and so on. After the first packet, your data structure in your aggregator will be already sorted (the data structure remembers what source each number in the sort refers to) :

[1, 3, 4, 6] => Event

after the next packet, since it is a new source, the data structure will look like this

[1, 1, 2, 3, 3, 4, 6] => Event

after the next packet,

[1, 1, 2, 2, 3, 3, 3, 4, 5, 6] => Event

and now since A sent new packet, we have to get find the old values of A, and replacing them with the new, finally ending up with a new sort. The replacing and sorting can happen seperately or not (inplace), the goal is extreme speed:

[1, 2, 2, 2, 3, 3, 4, 5, 7, 8] => Event

Note that when you get the second A, all the old As have to be "replaced" by the new As packet while maintaining a sort. After each packet is sorted into the data structure, it is copied and needs to be sent as an "event". These packets are coming furiously and continously at the merging-sorting algorithm every few microseconds.

* What is the best data structure to do this? Maybe Splay Tree or AVL tree? *

¿Fue útil?

Solución

This is not going to be the fastest data structure & algorithm for your particular purpose I guess, but it may be fast enough. Test it yourself.

Note that a std::forward_list or even a std::vector might be faster depending on the actual scenario (-> constant factors in big-O-notation).

tmyklebu mentioned another approach in the comments: depending on the scenario, it might be faster to merge on-demand, e.g. storing all data sets individually and merging them into a vector to pass to the event handler, or even using a "merging" iterator (whose increment gets the next element of the individual data sets).

Further performance improvements may be achieved by using a custom memory pool -> custom allocator.

#include <set>
#include <iostream>
#include <iterator>
#include <algorithm>

// inserts a sorted range into the `to` container
template < typename To, typename InputIt >
void insert_new_sorted(To& to,
                       InputIt beg_old, InputIt end_old,
                       InputIt beg_new, InputIt end_new)
{
    auto const& comp = to.value_comp();
    typename To::iterator i = to.begin();

    // might improve performance: don't remove elements which are in both
    // ranges (old and new)
    while(beg_old != end_old && beg_new != end_new)
    {
        if(comp(*beg_old, *beg_new))
        {
            // remove old element
            i = to.find(*beg_old);  // "slow", no hint :(
            i = to.erase(i);
            ++beg_old;
        }else if(comp(*beg_new, *beg_old))
        {
            // insert new element
            // using the hint to achieve better performance
            i = to.insert(i, *beg_new);
            ++beg_new;
        }else
        {
            // both equal, do nothing
            ++beg_new;
            ++beg_old;
        }
    }

    // remove remaining old elements
    for(; beg_old != end_old; ++beg_old)
    {
        to.erase(to.find(*beg_old));  // "slow", no hint :(
    }

    // insert remaining new elements
    for(; beg_new != end_new; ++beg_new)
    {
        i = to.insert(i, *beg_new);
    }

    std::copy(to.begin(), to.end(),
        std::ostream_iterator<typename To::value_type>(std::cout, ", "));
    std::cout << std::endl;
}

int main()
{
    using set_t = std::multiset<double>;

    set_t const A = {1, 3, 4, 6};
    set_t const B = {1, 2, 3};
    set_t const C = {2, 3, 5};
    set_t const A2 = {2, 4, 7, 8};

    set_t result;
    insert_new_sorted(result, A.end(), A.end(), A.begin(), A.end());
    insert_new_sorted(result, B.end(), B.end(), B.begin(), B.end());
    insert_new_sorted(result, C.end(), C.end(), C.begin(), C.end());
    insert_new_sorted(result, A.begin(), A.end(), A2.begin(), A2.end());
}

Output:

1, 3, 4, 6,
1, 1, 2, 3, 3, 4, 6,
1, 1, 2, 2, 3, 3, 3, 4, 5, 6,
1, 2, 2, 2, 3, 3, 4, 5, 7, 8,


A different approach: store the iterators of the inserted elements, to speed up erasing.

Licenciado bajo: CC-BY-SA con atribución
No afiliado a StackOverflow
scroll top