Question

I can easily create an ORC-format table in Apache Hadoop or Hortonworks' HDP:

CREATE TABLE ... STORED AS ORC

However, this doesn't work in Cloudera's CDH 4.5. (Surprise!) I get:

FAILED: SemanticException Unrecognized file format in STORED AS clause: ORC

So as an alternative, I tried to download and install the Hive jar that contains the ORC classes:

hive> add jar /opt/cloudera/parcels/CDH-4.5.0-1.cdh4.5.0.p0.30/lib/hive/lib/hive-exec-0.11.0.jar;

Then I created my ORC table:

hive> CREATE TABLE test (name STRING)
    > ROW FORMAT SERDE
    >   'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
    > STORED AS INPUTFORMAT
    >   'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
    > OUTPUTFORMAT
    >   'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat';
OK

But upon inserting into this table from some CSV data, I get an error:

hive> INSERT OVERWRITE TABLE test
    > SELECT name FROM textdata;

Diagnostic Messages for this Task:
java.lang.RuntimeException: Error in configuring object
    at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:413)

How should I create an ORC table in Hive in CDH?

Solution

CDH 4.5 ships Hive 0.10 (see "CDH Version 4.5.0 Packaging and Tarballs"). ORC was only added in Hive 0.11 (see the release notes and HIVE-3874: "Create a new Optimized Row Columnar file format for Hive"), which is why the STORED AS ORC clause is not recognized on CDH 4.5.

At the time of writing, CDH 5 is still in beta, but it does include Hive 0.11 (see "CDH Version 5.0.0 Beta 1").
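
As a rough sketch, assuming a cluster that runs Hive 0.11 or later, the table from the question could be created with the short form and loaded directly. The table and column names (test, textdata, name) are taken from the question; nothing else about the setup is assumed.

```sql
-- Requires Hive 0.11 or later; Hive 0.10 (CDH 4.5) rejects this clause.
CREATE TABLE test (name STRING)
STORED AS ORC;

-- Copy the rows from the existing text-backed table into the ORC table.
INSERT OVERWRITE TABLE test
SELECT name FROM textdata;

-- Inspect the table metadata; the SerDe, InputFormat and OutputFormat
-- entries should name the org.apache.hadoop.hive.ql.io.orc classes.
DESCRIBE FORMATTED test;
```

The short STORED AS ORC form stands in for the explicit SERDE / INPUTFORMAT / OUTPUTFORMAT declaration used in the question, so no jar needs to be added by hand.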

Licensed under: CC-BY-SA with attribution