Question

I created a simple four-node Hadoop cluster with CDH 4.7, including Impala 1.1. I'm able to copy CSV files to HDFS and create and query Impala tables over them as described in the tutorial. But I can't query the same table from a different data node:

[example.com:21000] > select * from tab1;
Query: select * from tab1
ERROR: AnalysisException: Table does not exist: default.tab1

I thought perhaps I needed to reissue the CREATE TABLE statement on the second node, but then it suddenly knows the table's there:

[example.com:21000] > CREATE EXTERNAL TABLE tab1
                    > (
                    >    id INT,
                    >    col_1 BOOLEAN,
                    >    col_2 DOUBLE,
                    >    col_3 TIMESTAMP
                    > )
                    > ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
                    > LOCATION '/user/dwheeler/sample_data/tab1';
Query: create EXTERNAL TABLE tab1
(
id INT,
col_1 BOOLEAN,
col_2 DOUBLE,
col_3 TIMESTAMP
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/user/dwheeler/sample_data/tab1'
ERROR: AlreadyExistsException: Table tab1 already exists

So Impala knows the table is there, but I can't query it, or even refresh it:

[example.com:21000] > refresh tab1;
Query: refresh tab1
ERROR: AnalysisException: Table does not exist: default.tab1

Is there some command I need to execute to get all of the impalads running on the data nodes to recognize a newly created table so I can query it?

Solution

I filed a bug report and got back an answer:

In Impala 1.1 and earlier you need to issue an explicit "invalidate metadata" command to make tables created on other nodes visible to the local Impala daemon.

Starting with Impala 1.2 this won't be necessary; the new catalog service will take care of metadata distribution to all impalads in the cluster.

So it was INVALIDATE METADATA that I had failed to notice. Glad to hear it won't be necessary in 1.2.
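
For anyone stuck on 1.1, the fix looks like this from impala-shell on the node that couldn't see the table (a minimal sketch reusing the table from my session above):

[example.com:21000] > invalidate metadata;
[example.com:21000] > select * from tab1;

INVALIDATE METADATA forces the local impalad to reload the catalog from the metastore, after which tab1 resolves and the query runs.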

OTHER TIPS

I had what I thought was the same issue, but it wasn't resolved by

invalidate metadata;

It turned out that my Hive installation was using a local Derby database as its metastore, which Impala could not see.

The smoking gun:

On the system where I had imported the table through Hive, I had:

cat /etc/hive/conf/hive-site.xml
[...]
<property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:derby:;databaseName=/var/lib/hive/metastore/metastore_db;create=true</value>
    <description>JDBC connect string for a JDBC metastore</description>
</property>
[...]
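
A quick way to spot this (assuming the stock CDH config path) is to grep for the connect string:

grep -A 1 'javax.jdo.option.ConnectionURL' /etc/hive/conf/hive-site.xml

A jdbc:derby: value means Hive is talking to an embedded metastore that Impala cannot see.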

The solution:

I re-deployed the Hive client configuration from Cloudera Manager.

Afterwards:

  cat /etc/hive/conf/hive-site.xml
  [...]
  <property>
    <name>hive.metastore.local</name>
    <value>false</value>
  </property>
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://[snipped-host-name]:[snipped-port]</value>
  </property>

Apparently Cloudera Manager is supposed to deploy the client configuration automatically, but some versions occasionally fail to do so.
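
After redeploying, you can confirm the fix from impala-shell by reloading the metadata and listing the tables (a minimal check, assuming the table was created through Hive as above):

[example.com:21000] > invalidate metadata;
[example.com:21000] > show tables;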
