Question

I have set up my Hadoop cluster, HBase, and Hive. As the next step, I want to set up Cloudera Impala to query data from either HBase or HDFS. I have searched the Internet but cannot find clear, understandable instructions for setting up Cloudera Impala on top of HDFS and HBase (maybe on top of Hive). Can anybody give me a guide to setting up and configuring Cloudera Impala on top of HDFS and HBase?


Solution

First of all, it isn't clear from your question whether you have CDH or stock Apache Hadoop, HBase, etc. installed. That's important: although Impala will theoretically work on stock Hadoop, it is only tested and supported on CDH.

If you do not have Impala or CDH installed, by far the easiest way to install them is via Cloudera Manager, which automates the install/deployment of a CDH/Impala cluster. Cloudera Express, which is free, includes everything you need to do that. You will have the choice of doing an automated single-package install or downloading and installing a series of Linux packages. The options are described in detail here.

Or, if you're just looking for a demo, download and install the QuickStart VM, which contains a single-node cluster (including CDH + Impala), guest OS, and data/scripts/examples.

Downloads for any of the above can be found here.
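
Once the cluster (or the QuickStart VM) is running, a quick way to confirm that Impala is reachable is to run a trivial statement through impala-shell. The following is only a rough sketch, not part of the official setup steps: the helper names build_impala_shell_cmd and run_smoke_test are hypothetical, and the host and port (localhost, 21000) are assumptions that you should replace with the address of one of your impalad daemons.

```python
import shutil
import subprocess


def build_impala_shell_cmd(host, query, port=21000):
    """Build the argv list for a one-off impala-shell query.

    -i selects the impalad to connect to, -q passes a single statement.
    host and port are placeholders (assumptions), not values from the answer.
    """
    return ["impala-shell", "-i", f"{host}:{port}", "-q", query]


def run_smoke_test(host="localhost"):
    """Run SHOW DATABASES against Impala.

    Returns None if impala-shell is not on PATH, otherwise True or False
    depending on whether the command exited with status 0.
    """
    if shutil.which("impala-shell") is None:
        return None
    cmd = build_impala_shell_cmd(host, "SHOW DATABASES")
    return subprocess.run(cmd).returncode == 0


if __name__ == "__main__":
    print(run_smoke_test())
```

If the statement succeeds, Impala is up and can see the metastore. Tables created through Hive may need an INVALIDATE METADATA statement in Impala before they show up.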

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow