Question

I need to handle a 2D data table of at most 100 million rows and 30 columns. The table elements contain only 0s, 1s, and dashes (-). The table is given to me in a text file. I have to store this data in some data structure and perform operations like sorting the table, deleting rows, comparing elements (both across rows and across columns), etc.

Currently I am using a 2D vector for this. I tried a vector of integers and a bit vector. Both work for 10 million rows and 25 columns, but neither works for the maximum size above (I get a bad_alloc).

I am assuming this is because the vector needs contiguous memory. What would be the right data structure for this? Timing is also one of my considerations, so I would want low access time as well. Also, if a vector is actually the right data structure, is there a better implementation I could use to make this work? I have to stick to C++ and can't use databases or anything like that.

Thanks a lot!


Solution

Check out Nstate.

It is a library of packed arrays for an arbitrary radix (so tristates qualify).

http://nstate.hostilefork.com

Disclaimer: I wrote nstate.
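
To get a feel for what radix packing buys you, here is a generic base-3 packing sketch (illustrative only, not nstate's actual API): five tristate cells fit in one byte since 3^5 = 243 <= 256, so a 100-million-row by 30-column table needs roughly 600 MB instead of the ~3 GB that one byte per cell would take.

#include <cstdint>
#include <cstddef>
#include <vector>

// Generic base-3 packing sketch (illustrative, not nstate's API).
// Cells are 0, 1 or 2 (2 standing for '-'); five cells share one byte.
class PackedTristateArray {
public:
    explicit PackedTristateArray(std::size_t n) : bytes_((n + 4) / 5, 0) {}

    void set(std::size_t i, std::uint8_t value) {          // value in {0, 1, 2}
        std::uint8_t &b = bytes_[i / 5];
        std::uint8_t p = pow3(i % 5);
        b = static_cast<std::uint8_t>(b - (b / p % 3) * p + value * p);
    }

    std::uint8_t get(std::size_t i) const {
        return static_cast<std::uint8_t>(bytes_[i / 5] / pow3(i % 5) % 3);
    }

private:
    static std::uint8_t pow3(std::size_t k) {
        static const std::uint8_t p[5] = {1, 3, 9, 27, 81};
        return p[k];
    }
    std::vector<std::uint8_t> bytes_;
};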

OTHER TIPS

You can consider an array of boost::tribool; see the Boost.Tribool documentation.
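
A minimal sketch of that approach (assuming Boost is available; note that a boost::tribool typically occupies at least a byte, so for 3 billion cells the packed representations are far more memory-friendly):

#include <vector>
#include <boost/logic/tribool.hpp>

int main() {
    using boost::logic::tribool;
    using boost::logic::indeterminate;

    // One tribool per cell; '-' in the input maps to indeterminate.
    std::vector<tribool> row(30, tribool(indeterminate));
    row[0] = true;    // '1'
    row[1] = false;   // '0'

    if (indeterminate(row[2])) {
        // cell 2 is still a dash
    }
}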

You can also consider a wrapper over std::bitset such that for every tribool there are 2 bits in bitset.

A tribool array of size N then requires a bitset of size 2*N.

Mapping:

tribool(i) maps to bitset(2*i) and bitset(2*i+1)

if tribool(i) == indeterminate, then bitset(2*i) = false
if tribool(i) == true, then bitset(2*i) = true and bitset(2*i+1) = true
if tribool(i) == false, then bitset(2*i) = true and bitset(2*i+1) = false

C++ sketch:

#include <bitset>
#include <cstddef>
#include <boost/logic/tribool.hpp>

template<std::size_t N>
class PackedTribool {
public:
  boost::logic::tribool test( std::size_t i ) const {
    if( bits_[ 2 * i ] == false ) {
        return boost::logic::indeterminate;            // "known" flag not set
    }
    else {
        return boost::logic::tribool( bits_[ 2 * i + 1 ] );  // value bit
    }
  }
private:
  std::bitset<2 * N> bits_;     // note: size is doubled
};
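
For the writing direction, the same mapping can be applied. Here is a self-contained sketch; the set() member and the small main() are illustrative additions following that mapping:

#include <bitset>
#include <cstddef>
#include <iostream>
#include <boost/logic/tribool.hpp>

// Expanded version of the sketch above; set() and main() are illustrative,
// applying the same 2-bit mapping when writing.
template<std::size_t N>
class PackedTribool {
public:
  boost::logic::tribool test( std::size_t i ) const {
    if( bits_[ 2 * i ] == false ) {
      return boost::logic::indeterminate;              // "known" flag not set
    }
    return boost::logic::tribool( bits_[ 2 * i + 1 ] );
  }

  void set( std::size_t i, boost::logic::tribool t ) {
    if( boost::logic::indeterminate( t ) ) {
      bits_[ 2 * i ] = false;                          // clear the "known" flag
    } else {
      bits_[ 2 * i ] = true;                           // mark as known...
      bits_[ 2 * i + 1 ] = static_cast<bool>( t );     // ...and store the value
    }
  }

private:
  std::bitset<2 * N> bits_;                            // all-indeterminate at start
};

int main() {
  PackedTribool<30> row;                               // one 30-column row
  row.set( 0, true );                                  // '1'
  row.set( 1, false );                                 // '0'
  if( row.test( 0 ) ) {
    std::cout << "cell 0 is 1\n";
  }
  if( boost::logic::indeterminate( row.test( 2 ) ) ) {
    std::cout << "cell 2 is still a dash\n";
  }
}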