Question

I need to store a large number of Long values in a SortedSet implementation in a space-efficient manner. I was considering bit-set implementations and discovered Javaewah. However, the API expects int values rather than longs.

Can anyone recommend any alternatives or suggest a good way to solve this problem? I am mainly concerned with space efficiency. Upon building the set I will need to access the minimum and maximum element once. However, access time is not a huge concern (i.e. so a fully run-length encoded implementation will be fine).

EDIT

I should be clear that the implementation does not have to implement the SortedSet interface providing I can access the minimum and maximum elements of the collection.

Was it helpful?

Solution

You could use TLongArrayList which uses a long[] underneath. It supports sort() so the min and max will be the first and last value.

Or you could use a long[] with a length and do this yourself. ;)

This will use about 64 byte more than the raw values themselves. You can get more compact if you can make some assumptions about the range of long values. e.g. if they are actually limited to 48-bit.

You might consider using LongBuffer. If it is memory mapped it avoids using heap or direct memory, but you would have implement a sort routine yourself.


If they are clustered, you might be able to represent the data as a Set of ranges. The ranges could be a pure A - B, or a BitSet with a starting value. The later works well for phone numbers. ;)

OTHER TIPS

Not sure if it has Set or how efficient it is compared to regular JCF, but take a look at this:

http://commons.apache.org/primitives/

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top