Question

I wonder if anybody could help me out.

I look for a data structure (such as list, queue, stack, array, vector, binary tree etc.) supporting these four operations:

  • isEmpty (true/false)
  • insert single element
  • pop (i.e. get&remove) single element
  • split into two structures e.g. take a approximately half (let's say +/- 20%) of elements and move them to another structure

Note that I don't care about order of elements at all.

Insert/pop example:

A.insert(1), A.insert(2), A.insert(3), A.insert(4), A.insert(5) // contains 1,2,3,4,5 in any order
A.pop() // 3
A.pop() // 2
A.pop() // 5
A.pop() // 1
A.pop() // 4

and the split example:

A.insert(1), A.insert(2), A.insert(3), A.insert(4), A.insert(5)
A.split(B)
// A = {1,4,3}, B={2,5} in any order

I need the structure to be be fast as possible - preferably all four operations in O(1). I doubt it have been already implemented in std so I will implement it by myself (in C++11, so std::move can be used).

Note that insert, pop and isEmpty are called about ten times more frequently than split.

I tried some coding with list and vector but with no success:

#include <vector>
#include <iostream>

// g++ -Wall -g -std=c++11
/*
output:
0 1 2 3 4 5 6 7 8 9
0 1 2 3 4 5 6 7 8 9
5 6 7 8 9
*/

int main ()
{
        std::vector<int> v1;

        for (int i = 0; i < 10; ++i) v1.push_back(i);

        for (auto i : v1) std::cout << i << " ";
        std::cout << std::endl;

        auto halfway = v1.begin() + v1.size() / 2;
        auto endItr  = v1.end();

        std::vector<int> v2;
        v2.insert(v2.end(),
                std::make_move_iterator(halfway),
                std::make_move_iterator(endItr));

        // sigsegv
        /*
        auto halfway2 = v1.begin() + v1.size() / 2;
        auto endItr2  = v1.end();
        v2.erase(halfway2, endItr2);
        */

        for (auto i : v1) std::cout << i << " ";
        std::cout << std::endl;

        for (auto i : v2) std::cout << i << " ";
        std::cout << std::endl;

        return 0;
}

Any sample code, ideas, links or whatever useful? Thanks

Related literature:

Was it helpful?

Solution

Your problems with the deletion aare due to a bug in your code.

// sigsegv
auto halfway2 = v1.begin() + v1.size() / 2;
auto endItr2  = v1.end();
v2.erase(halfway2, endItr2);

You try to erase from v2 with iterators pointing into v1. That won't work and you probably wanted to callerase on v1.

That fixes your deletion problem when splitting the vector, and vector seems to be the best container for what you want.

Note that everything except split can be done in O(1) on a vector if you insert at the end only, but since order doesn't matter for you I don't see any problem with it, split would be O(n) in your implemention once you fixed it, but that should be pretty fast since the data is right next to eachother in the vector and that's very cache friendly.

OTHER TIPS

I can’t think of a solution with all operations in O(1).

With a list you can have push and pop in O(1), and split in O(n) (due to the fact that you need to find the middle of the list).

With a balanced binary tree (not a search tree) you can have all operations in O(log n).

edit

There have been some suggestions that keeping the middle of the list would produce O(1). This is not the case as when you split the function you have to compute the middle of the left list and the middle of the right list resulting in O(n).

Some other suggestion is that a vector is preferred simply because it is cache-friendly. I totally agree with this.

For fun, I implemented a balanced binary tree container that performs all operations in O(log n). The insert and pop are obviously in O(log n). The actual split is in O(1), however we are left with the root node which we have to insert in one of the halves resulting in O(log n) for split also. No copying is involved however.

Here is my attempt at the said container (I haven’t thoroughly tested for correctness, and it can be further optimized (like transforming the recursion in a loop)).

#include <memory>
#include <iostream>
#include <utility>
#include <exception>

template <class T>
class BalancedBinaryTree {
  private:
    class Node;

    std::unique_ptr<Node> root_;

  public:
    void insert(const T &data) {
      if (!root_) {
        root_ = std::unique_ptr<Node>(new Node(data));
        return;
      }
      root_->insert(data);
    }

    std::size_t getSize() const {
      if (!root_) {
        return 0;
      }
      return 1 + root_->getLeftCount() + root_->getRightCount();
    }

    // Tree must not be empty!!
    T pop() {
      if (root_->isLeaf()) {
        T temp = root_->getData();
        root_ = nullptr;
        return temp;
      }
      return root_->pop()->getData();
    }

    BalancedBinaryTree split() {
      if (!root_) {
        return BalancedBinaryTree();
      }

      BalancedBinaryTree left_half;
      T root_data = root_->getData();
      bool left_is_bigger = root_->getLeftCount() > root_->getRightCount();

      left_half.root_ = std::move(root_->getLeftChild());
      root_ = std::move(root_->getRightChild());

      if (left_is_bigger) {
        insert(root_data);
      } else {
        left_half.insert(root_data);
      }

      return std::move(left_half);
    }
};


template <class T>
class BalancedBinaryTree<T>::Node {
  private:
    T data_;
    std::unique_ptr<Node> left_child_, right_child_;
    std::size_t left_count_ = 0;
    std::size_t right_count_ = 0;

  public:
    Node() = default;
    Node(const T &data, std::unique_ptr<Node> left_child = nullptr,
         std::unique_ptr<Node> right_child = nullptr)
        : data_(data), left_child_(std::move(left_child)),
         right_child_(std::move(right_child)) {
    }

    bool isLeaf() const {
      return left_count_ + right_count_ == 0;
    }

    const T& getData() const {
      return data_;
    }
    T& getData() {
      return data_;
    }

    std::size_t getLeftCount() const {
      return left_count_;
    }

    std::size_t getRightCount() const {
      return right_count_;
    }

    std::unique_ptr<Node> &getLeftChild() {
      return left_child_;
    }
    const std::unique_ptr<Node> &getLeftChild() const {
      return left_child_;
    }
    std::unique_ptr<Node> &getRightChild() {
      return right_child_;
    }
    const std::unique_ptr<Node> &getRightChild() const {
      return right_child_;
    }

    void insert(const T &data) {
      if (left_count_ <= right_count_) {
        ++left_count_;
        if (left_child_) {
          left_child_->insert(data);
        } else {
          left_child_ = std::unique_ptr<Node>(new Node(data));
        }
      } else {
        ++right_count_;
        if (right_child_) {
          right_child_->insert(data);
        } else {
          right_child_ = std::unique_ptr<Node>(new Node(data));
        }
      }
    }

    std::unique_ptr<Node> pop() {
      if (isLeaf()) {
        throw std::logic_error("pop invalid path");
      }

      if (left_count_ > right_count_) {
        --left_count_;
        if (left_child_->isLeaf()) {
          return std::move(left_child_);
        }
        return left_child_->pop();
      }

      --right_count_;
      if (right_child_->left_count_ == 0 && right_child_->right_count_ == 0) {
        return std::move(right_child_);
      }
      return right_child_->pop();
    }
};

usage:

  BalancedBinaryTree<int> t;
  BalancedBinaryTree<int> t2;

  t.insert(3);
  t.insert(7);
  t.insert(17);
  t.insert(37);
  t.insert(1);

  t2 = t.split();

  while (t.getSize() != 0) {
    std::cout << t.pop() << " ";
  }
  std::cout << std::endl;

  while (t2.getSize() != 0) {
    std::cout << t2.pop() << " ";
  }
  std::cout << std::endl;

output:

1 17
3 37 7

If the number of elements/bytes stored at any one time in your container is large, the solution of Youda008 (using a list and keeping track of the middle) may not be as efficient as you hope.

Alternatively, you could have a list<vector<T>> or even list<array<T,Capacity>> and keep track of the middle of the list, i.e. split only between two sub-containers, but never split a sub-container. This should give you both O(1) on all operations and reasonable cache efficiency. Use array<T,Capacity> if a single value for Capacity serves your needs at all times (for Capacity=1, this reverts to an ordinary list). Otherwise, use vector<T> and adapt the capacity for new vectors according to demand.

bolov's points out correctly that finding the middles of the lists emerging from splitting one list is not O(1). This implies that keeping track of the middle is not useful. However, using a list<sub_container> is still faster than list, because the split only costs O(n/Capacity) not O(n). The price you pay for this is that the split has a graininess of Capacity rather than 1. Thus, you must compromise between the accuracy and cost of a split.

Another option is to implement own container using a linked list and a pointer to that middle element, at which you want to split it. This pointer will be updated on every modifying operation. This way you can achieve O(1) complexicity on all operations.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top