Question

Is there a way to sort the entries obtained from a Scanner? I use suffix indices to avoid duplicate row ids, and when I scan I don't get a list in perfectly ascending order. For instance, I get something like the following:

RowId: 2013-08-05 15:29:45.872        Value: 0
RowId: 2013-08-05 15:29:45.879        Value: 1
RowId: 2013-08-05 15:29:45.88         Value: 2
RowId: 2013-08-05 15:29:45.881        Value: 11 
// The previous entry should instead be the following:
RowId: 2013-08-05 15:29:45.88_a       Value: 3

As you can see, .881 > .88, and yet the correct row (.88_a) is placed some 30 entries later. Is there a way to override the sort order, or a convenient way to get a Scanner back that is correctly ordered?


Solution

Entries in Accumulo are sorted lexicographically. In ASCII, '1' sorts before '_', which is why '881' comes before '88_a'. To preserve numeric sorting in Accumulo, one approach is to pad the numbers with zeros to a fixed length. If the largest number you have is 999, you would make every number 3 characters long, so '8' becomes '008' and '88' becomes '088'.
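As a rough illustration of the padding idea, here is a minimal Java sketch. It does not touch Accumulo at all: it sorts plain strings, which for ASCII text gives the same order as a lexicographic comparison. The class name `PaddedRowIds` and the helpers `pad` and `sortedCopy` are made up for this example, and the last part assumes that '.88' in the question is 880 milliseconds with the trailing zero dropped, which may not match how the row ids were actually built.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

public class PaddedRowIds {
    // Hypothetical helper: left-pad a non-negative number with zeros to a fixed width.
    public static String pad(int value, int width) {
        return String.format(Locale.ROOT, "%0" + width + "d", value);
    }

    // Return a sorted copy using plain String ordering, which for ASCII text
    // compares character by character, like a lexicographic sort.
    public static List<String> sortedCopy(List<String> ids) {
        List<String> copy = new ArrayList<>(ids);
        Collections.sort(copy);
        return copy;
    }

    public static void main(String[] args) {
        // Unpadded numbers sort as text, not as numbers.
        System.out.println(sortedCopy(Arrays.asList("8", "88", "100")));
        // prints [100, 8, 88]

        // Padded to 3 characters, text order matches numeric order.
        System.out.println(sortedCopy(Arrays.asList(pad(8, 3), pad(88, 3), pad(100, 3))));
        // prints [008, 088, 100]

        // Assumption: ".88" means 880 ms, so the fraction is written with 3 digits.
        System.out.println(sortedCopy(Arrays.asList(
                "2013-08-05 15:29:45.881",
                "2013-08-05 15:29:45.880_a",
                "2013-08-05 15:29:45.880")));
        // prints [2013-08-05 15:29:45.880, 2013-08-05 15:29:45.880_a, 2013-08-05 15:29:45.881]
    }
}
```

With a fixed-width fraction, the suffixed row id lands directly after its base row instead of after '.881'.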

OTHER TIPS

As Billie said, Accumulo sorts lexicographically. There is a project on GitHub called Orderly that you might want to check out. Its description says:

"This project serializes a wide range of simple and complex key data types into a sort-order preserving byte encoding. Sorting the serialized byte arrays produces the same ordering as the natural sort order of the underlying data type."

Unfortunately it hasn't been updated in 6 months. It's an interesting concept though.
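To give a feel for what a sort-order preserving byte encoding means, here is a small Java sketch for a single `long`. This is not Orderly's API or its actual encoding; the class `SortableLong` and its methods are invented for illustration, and it assumes keys are compared byte by byte as unsigned values.

```java
import java.nio.ByteBuffer;

public class SortableLong {
    // Flip the sign bit so negative values sort before positive ones,
    // then write the result as 8 big-endian bytes.
    public static byte[] encode(long value) {
        return ByteBuffer.allocate(Long.BYTES).putLong(value ^ Long.MIN_VALUE).array();
    }

    // Reverse of encode: read 8 big-endian bytes and flip the sign bit back.
    public static long decode(byte[] bytes) {
        return ByteBuffer.wrap(bytes).getLong() ^ Long.MIN_VALUE;
    }

    // Compare two arrays left to right, treating each byte as unsigned.
    public static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int cmp = Integer.compare(a[i] & 0xFF, b[i] & 0xFF);
            if (cmp != 0) {
                return cmp;
            }
        }
        return Integer.compare(a.length, b.length);
    }

    public static void main(String[] args) {
        // Byte order of the encoded values follows numeric order.
        System.out.println(compareUnsigned(encode(-5L), encode(3L)) < 0);   // prints true
        System.out.println(compareUnsigned(encode(88L), encode(881L)) < 0); // prints true
        System.out.println(decode(encode(-42L)));                           // prints -42
    }
}
```

Because every value encodes to the same length, no padding is needed, at the cost of row ids that are no longer human-readable.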

Licensed under: CC-BY-SA with attribution