Accumulo sorts its entries lexicographically by key. In ASCII, '1' (0x31) sorts before '_' (0x5F), which is why you get '881' before '88_a'. If you want to preserve numeric ordering in Accumulo, one approach is to left-pad the numbers with zeros to a fixed length. If the largest number you have is 999, you would make every number 3 characters long, so '8' would become '008' and '88' would become '088'.
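The padding idea can be sketched in a few lines of Python. This is only an illustration of lexicographic ordering on plain strings, not Accumulo client code; the helper name `make_row_key` and the fixed width of 3 are assumptions made for the example.

```python
def make_row_key(number: int, suffix: str = "", width: int = 3) -> str:
    """Build a row key from a zero-padded number and an optional suffix.

    Hypothetical helper for illustration: `width` must be large enough for
    the biggest number you expect (3 digits covers 0..999).
    """
    digits = str(number)
    if number < 0 or len(digits) > width:
        raise ValueError(f"{number} does not fit in {width} digits")
    key = digits.zfill(width)
    return f"{key}_{suffix}" if suffix else key


# Unpadded keys sort character by character, so '881' lands before '88_a'.
unpadded = sorted(["9", "88_a", "881", "88", "8"])
# ['8', '88', '881', '88_a', '9']

# Padded keys keep the numeric order, and the suffixed duplicate
# stays directly after the key it disambiguates.
padded = sorted([
    make_row_key(9),
    make_row_key(88, "a"),
    make_row_key(881),
    make_row_key(88),
    make_row_key(8),
])
# ['008', '009', '088', '088_a', '881']

if __name__ == "__main__":
    print(unpadded)
    print(padded)
```

The same reasoning applies to any numeric field embedded in a row ID: as long as every value has the same number of digits, comparing the strings character by character gives the same result as comparing the numbers.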
Accumulo - Getting a properly sorted Scanner result
Question
Is there a means of sorting the entries obtained from a Scanner? The problem I'm having is that I use suffix indices to avoid duplicate row IDs, and when I scan I don't get a list in perfectly ascending order. For instance, I get something that looks like the following:
RowId: 2013-08-05 15:29:45.872 Value: 0
RowId: 2013-08-05 15:29:45.879 Value: 1
RowId: 2013-08-05 15:29:45.88 Value: 2
RowId: 2013-08-05 15:29:45.881 Value: 11
//The previous should be the following:
RowId: 2013-08-05 15:29:45.88_a Value: 3
As you can see, .881 > .88, and yet the correct row is placed some 30 entries later. Is there a way to override the sort, or is there a convenient means of getting a Scanner back that is correctly ordered?
Solution
OTHER TIPS
As Billie said, Accumulo sorts lexicographically. There is a project on GitHub called Orderly that you might want to check out.
This project serializes a wide range of simple and complex key data types into a sort-order preserving byte encoding. Sorting the serialized byte arrays produces the same ordering as the natural sort order of the underlying data type.
Unfortunately it hasn't been updated in 6 months. It's an interesting concept though.
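To make the idea of a sort-order preserving byte encoding concrete, here is a minimal Python sketch for signed 64-bit integers. It is not Orderly's actual format, which I have not reproduced here; it uses a common technique (shift the value into the unsigned range, then write it big-endian), and the function names are made up for the example.

```python
import struct

_OFFSET = 1 << 63  # shifts the signed 64-bit range onto 0 .. 2**64 - 1


def encode_int64(value: int) -> bytes:
    """Encode a signed 64-bit integer so byte-wise order matches numeric order.

    Adding 2**63 makes every value non-negative without changing the
    relative order; big-endian layout puts the most significant byte first,
    so comparing the bytes left to right compares the numbers.
    """
    # struct.pack raises struct.error if the shifted value is out of range.
    return struct.pack(">Q", value + _OFFSET)


def decode_int64(data: bytes) -> int:
    """Invert encode_int64."""
    (shifted,) = struct.unpack(">Q", data)
    return shifted - _OFFSET


values = [5, -1, 0, -300, 2**40, 7]

# Sorting the encoded byte strings gives the same order as sorting the numbers.
by_bytes = [decode_int64(b) for b in sorted(encode_int64(v) for v in values)]

if __name__ == "__main__":
    print(by_bytes)
```

Compared with zero-padded strings, a binary encoding like this has a fixed size (8 bytes here) and handles negative numbers, at the cost of row IDs that are no longer human-readable.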