Question

I can easily create a table in the ORC file format in Apache Hadoop or Hortonworks' HDP:

CREATE TABLE ... STORED AS ORC

However, this doesn't work in Cloudera's CDH 4.5. (Surprise!) I get:

FAILED: SemanticException Unrecognized file format in STORED AS clause: ORC

So as an alternative, I tried to download and install the Hive jar that contains the ORC classes:

hive> add jar /opt/cloudera/parcels/CDH-4.5.0-1.cdh4.5.0.p0.30/lib/hive/lib/hive-exec-0.11.0.jar;

Then I created my ORC table:

hive> CREATE TABLE test (name STRING)
    > row format serde
    >   'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
    > stored as inputformat
    >   'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
    > outputformat
    >   'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat';
OK

But upon inserting into this table from some CSV data, I get an error:

hive> INSERT OVERWRITE TABLE test
    > SELECT name FROM textdata;

Diagnostic Messages for this Task:
java.lang.RuntimeException: Error in configuring object
    at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:413)

How should I create an ORC table in Hive in CDH?

Solution

CDH 4.5 contains Hive 0.10; see CDH Version 4.5.0 Packaging and Tarballs. ORC was only added in Hive 0.11, which is why the STORED AS ORC clause is not recognized; see the release notes and HIVE-3874: Create a new Optimized Row Columnar file format for Hive.

CDH 5 is currently in beta, but it does contain Hive 0.11; see CDH Version 5.0.0 Beta 1.
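
For reference, on a distribution that ships Hive 0.11 or later, the original statement from the question should work without adding any jar. The following is a minimal sketch, assuming the table and column names from the question (test, textdata, name) and that textdata already exists as a text-backed table:

```sql
-- Assumption: Hive 0.11 or later, where ORC is a built-in file format.
-- Table and column names are the ones used in the question.
CREATE TABLE test (name STRING)
STORED AS ORC;

-- Copy the rows from the existing text-backed table into the ORC table.
INSERT OVERWRITE TABLE test
SELECT name FROM textdata;
```

On Hive 0.10 the first statement is expected to fail with the same SemanticException shown in the question.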

Licensed under: CC-BY-SA with attribution