Question

Is it possible to use Hive to query a Lucene index that is distributed over Hadoop?

Solution

As far as I know, Hive lets you plug in custom "row-extraction" code (a custom SerDe or input format), so I would guess that you could. I've never used Lucene and have barely used Hive, so I can't be sure. If you find a more conclusive answer to your question, please post it!

OTHER TIPS

Hadapt is a startup whose software bridges Hadoop with a SQL front-end (like Hive) and hybrid storage engines. They offer an archival text search capability that may meet your needs.

Disclaimer: I work for Hadapt.

I know this is a fairly old post, but I thought I could offer a better alternative.

In your case, instead of going through the hassle of mapping your HDFS Lucene index to a Hive schema, it's better to push the data into Pig, because Pig can read flat files. Unless you need to store your data relationally, you could probably process it with Pig and use HBase as your database.

You could write a custom input format for Hive to access the Lucene index in Hadoop.
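The core job of such an input format can be modeled without any Hadoop or Lucene dependencies. The Python below is only a sketch, assuming the sharded index has already been loaded as lists of stored documents (field name to value). `get_splits` and `read_records` are hypothetical names standing in for the `getSplits` and record-reader roles of a Hadoop `InputFormat`. Rows are emitted in Hive's default text layout (Ctrl-A field delimiter, `\N` for NULL) in the column order of the target table. A real implementation would be written in Java against Hadoop's `InputFormat` interface and would read documents through Lucene's `IndexReader`.

```python
from typing import Dict, Iterator, List, Optional, Sequence

# Hypothetical stand-in for a stored Lucene document: field name -> value.
Document = Dict[str, str]

HIVE_FIELD_DELIM = "\x01"  # Hive's default field delimiter for text tables (Ctrl-A)
HIVE_NULL = "\\N"          # Hive's default text encoding of NULL


def get_splits(shards: Sequence[List[Document]]) -> List[List[Document]]:
    """Model of InputFormat.getSplits(): one split per index shard."""
    return [list(shard) for shard in shards]


def to_field(value: Optional[str]) -> str:
    """Encode one column value; a missing Lucene field becomes Hive NULL.
    Assumes values contain no Ctrl-A or newline characters (no escaping)."""
    return HIVE_NULL if value is None else value


def read_records(split: List[Document], columns: Sequence[str]) -> Iterator[str]:
    """Model of a RecordReader: one delimited row per stored document,
    with fields ordered like the columns of the Hive table."""
    for doc in split:
        yield HIVE_FIELD_DELIM.join(to_field(doc.get(col)) for col in columns)


if __name__ == "__main__":
    # Two hypothetical shards; the second document has no "body" field.
    shards = [
        [{"id": "1", "title": "hadoop basics", "body": "hdfs and mapreduce"}],
        [{"id": "2", "title": "lucene intro"}],
    ]
    columns = ["id", "title", "body"]  # assumed Hive table schema
    for split in get_splits(shards):
        for row in read_records(split, columns):
            print(row.split(HIVE_FIELD_DELIM))
```

Passing the column list in, rather than hard-coding field names in the reader, is a design choice that lets one reader serve different table definitions over the same index.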

Licensed under: CC-BY-SA with attribution