Question

Is there a column store similar to Vertica that is built on top of Hadoop? I am not talking about HBase, as it is a sparse matrix store and cannot reach the level of compression that something like Vertica achieves.

Thanks


Solution

Are you looking for something like RCFile? It is a file type that uses a columnar store internally.

OTHER TIPS

RCFile is a good start. RCFile stores data in a PAX layout: rows are split into blocks (which could be as large as HDFS's block size), and within each block the data is stored column by column. There is a paper at VLDB 2011 describing another columnar storage format here and a blog post with a short comparison to RCFile here.
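To make the PAX idea a bit more concrete, here is a minimal Python sketch. It is only an illustration of "columnar within blocks", not RCFile's actual on-disk format; the function names (`to_row_groups`, `read_column`), the sample rows, and the group size are all made up for the example.

```python
from itertools import islice


def to_row_groups(rows, group_size):
    """Split rows into fixed-size groups and store each group column by column.

    Hypothetical helper: real formats size groups in bytes, not row counts.
    """
    it = iter(rows)
    groups = []
    while True:
        chunk = list(islice(it, group_size))
        if not chunk:
            break
        # zip(*chunk) transposes the rows, so each column's values sit together.
        groups.append([list(col) for col in zip(*chunk)])
    return groups


def read_column(groups, col_index):
    """Read a single column across all groups without touching other columns."""
    values = []
    for group in groups:
        values.extend(group[col_index])
    return values


# Sample rows: (id, country, amount)
rows = [(i, "US" if i % 2 == 0 else "DE", i * 10) for i in range(6)]
groups = to_row_groups(rows, group_size=4)

print(groups[0])               # first group, one list per column
print(read_column(groups, 1))  # only the country column
```

Because all values of one column within a group are adjacent, a reader can skip the columns it does not need, and similar values sitting next to each other tend to compress better than they would in a row-by-row layout.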

I haven't worked with Hadoop, but I know Vertica has been trying to integrate with Hadoop.

http://www.vertica.com/the-analytics-platform/native-bi-etl-and-hadoop-mapreduce-integration/

Take a look at Hadapt: http://hadapt.com/

It is a commercial version of HadoopDB (http://db.cs.yale.edu/hadoopdb/hadoopdb.html), which was developed at Yale University. It can work with a column-oriented database installed on every node of the cluster, while leveraging Hadoop for fault-tolerant execution.

Licensed under: CC-BY-SA with attribution