Question

Is there a straightforward way to find a row in Accumulo that does not have a specific column family?

For example, here is some simple sample data (omitting timestamp and visibility):

r|cf|cq|v
1|A |  | 
2|A |  | 
2|B |  |  
3|A |  | 
3|B |  |
4|C |  |

I'd like to create a scanner that looks for rows without a "B" column family. In this case, it would return rows 1 and 4.


Solution

There isn't a specific API call in Accumulo that you can use for this, but it is a great example of why Accumulo's (SortedKeyValue)Iterator concept is cool. We can write a small amount of code and perform this filtering on the server instead of on the client.

Rather than leave you hanging, here's some code: https://github.com/joshelser/RowsWithoutColumns

Specifically, you can find the iterator: https://github.com/joshelser/RowsWithoutColumns/blob/master/src/main/java/accumulo/RowsWithoutColumnIterator.java

And some code that invokes it: https://github.com/joshelser/RowsWithoutColumns/blob/master/src/test/java/test/RowsWithoutColumnIteratorTest.java

A few things to note: the RowsWithoutColumnIterator needs to buffer an entire row in memory to accomplish what you're asking, so this approach will run the server out of memory if you have rows with very many columns. For example, with 1,000 columns per row (each key-value being 1KB), the server will have to keep roughly 1MB in memory. If you don't have wide rows, this isn't an issue. The example also depends on Accumulo 1.5.0, but this code can run against any version of Accumulo (if you change some API calls in the test case).
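To make the row-buffering idea concrete, here is a minimal sketch in plain Java. It is not the code from the linked repository and it does not use the Accumulo API. The class RowsWithoutFamily, the Entry type, and the rowsWithout method are hypothetical names that only model the logic: walk entries sorted by row, buffer one row at a time, and emit the buffer only if no entry in that row had the excluded column family.

```java
import java.util.ArrayList;
import java.util.List;

public class RowsWithoutFamily {

    /** Hypothetical stand-in for an Accumulo key: just row and column family. */
    public static final class Entry {
        public final String row;
        public final String family;

        public Entry(String row, String family) {
            this.row = row;
            this.family = family;
        }
    }

    /**
     * Assumes the input is already sorted by row, as Accumulo presents it.
     * Buffers one row at a time and keeps it only if none of its entries
     * has the excluded column family.
     */
    public static List<Entry> rowsWithout(List<Entry> sorted, String excludedFamily) {
        List<Entry> result = new ArrayList<>();
        List<Entry> buffer = new ArrayList<>(); // entries of the current row
        boolean sawExcluded = false;
        String currentRow = null;

        for (Entry e : sorted) {
            if (!e.row.equals(currentRow)) {
                // Row boundary: emit the previous row if it qualified.
                if (!sawExcluded) {
                    result.addAll(buffer);
                }
                buffer.clear();
                sawExcluded = false;
                currentRow = e.row;
            }
            if (e.family.equals(excludedFamily)) {
                sawExcluded = true;
                buffer.clear(); // a rejected row no longer needs buffering
            } else if (!sawExcluded) {
                buffer.add(e);
            }
        }
        // Emit the final row if it qualified.
        if (!sawExcluded) {
            result.addAll(buffer);
        }
        return result;
    }

    public static void main(String[] args) {
        // The sample data from the question.
        List<Entry> table = List.of(
            new Entry("1", "A"),
            new Entry("2", "A"), new Entry("2", "B"),
            new Entry("3", "A"), new Entry("3", "B"),
            new Entry("4", "C"));

        // Prints "1 A" and then "4 C".
        for (Entry e : rowsWithout(table, "B")) {
            System.out.println(e.row + " " + e.family);
        }
    }
}
```

The buffer is where the memory cost comes from: every entry of a row has to be held until the row ends, because the excluded family may only show up after other columns have already been seen. Dropping the buffer as soon as the excluded family appears, as this sketch does, only helps for rows that get rejected.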

Licensed under: CC-BY-SA with attribution