Question

I'm using leveldb to store records (key-value), where the key is a 64-bit hash and the value is a double. To make an analogy: think of the 64-bit hash is a unique ID of a customer and the double as an account balance (i.e. how much money they have in their account). I want to sort the database by account balance and list the customers with the highest account balance first. However, the database cannot fit into memory so I have to use some other method for sorting it in order to sort by account balance.

I'm considering using STXXL, but it requires that I make a copy of the database into a single flat file, then I can use STXXL to do an external sort (which would make a bunch of smaller files, sort them and then merge them back into another single flat file). Is there a better approach to sorting the data or should I go with the STXXL sort?

Was it helpful?

Solution

How many entries do you have? Could an unsigned 32-bit integer be used as an index (would allow 4,294,967,296 indexes) which could be used to identify how to sort the original array?

i.e. create pairs of 32-bit indexes and account balances, sort on the balances then use the 32 bit indexes to work out what order the original data should be in?

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top