Question

I have been trying to create a table from a single column of another table, but the Hive CLI consistently fails to do so.

The following is the query:

CREATE TABLE tweets_id_sample AS
SELECT
   id
FROM tweets_sample;

The error output from the Hive CLI for this query is as follows:

Total MapReduce jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_201310250853_0023, Tracking URL = http://sandbox:50030/jobdetails.jsp?jobid=job_201310250853_0023
Kill Command = /usr/lib/hadoop/libexec/../bin/hadoop job  -kill job_201310250853_0023
Hadoop job information for Stage-1: number of mappers: 7; number of reducers: 0
2013-10-26 07:40:37,273 Stage-1 map = 0%,  reduce = 0%
2013-10-26 07:41:21,570 Stage-1 map = 100%,  reduce = 100%
Ended Job = job_201310250853_0023 with errors
Error during job, obtaining debugging information...
Job Tracking URL: http://sandbox:50030/jobdetails.jsp?jobid=job_201310250853_0023
Examining task ID: task_201310250853_0023_m_000008 (and more) from job job_201310250853_0023
Examining task ID: task_201310250853_0023_m_000000 (and more) from job job_201310250853_0023

Task with the most failures(4):
-----
Task ID:
  task_201310250853_0023_m_000000

URL:
  http://sandbox:50030/taskdetails.jsp?jobid=job_201310250853_0023&tipid=task_201310250853_0023_m_000000
-----
Diagnostic Messages for this Task:

FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask
MapReduce Jobs Launched:
Job 0: Map: 7   HDFS Read: 0 HDFS Write: 0 FAIL
Total MapReduce CPU Time Spent: 0 msec

Checking the job tracker shows that the tasks, and all of their attempts (up until the job was killed), fail with the same error:

java.lang.RuntimeException: Error in configuring object
    at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:425)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:365)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232)
    at org.apache.hadoop.mapred.Child.main(Child.java:249)
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
    ... 9 more
Caused by: java.lang.RuntimeException: Error in configuring object
    at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
    at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:34)
    ... 14 more
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
    ... 17 more
Caused by: java.lang.RuntimeException: Map operator initialization failed
    at org.apache.hadoop.hive.ql.exec.ExecMapper.configure(ExecMapper.java:121)
    ... 22 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ClassNotFoundException: org.openx.data.jsonserde.JsonSerDe
    at org.apache.hadoop.hive.ql.exec.MapOperator.getConvertedOI(MapOperator.java:463)
    at org.apache.hadoop.hive.ql.exec.MapOperator.setChildren(MapOperator.java:479)
    at org.apache.hadoop.hive.ql.exec.ExecMapper.configure(ExecMapper.java:90)
    ... 22 more
Caused by: java.lang.ClassNotFoundException: org.openx.data.jsonserde.JsonSerDe
    at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:247)
    at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:810)
    at org.apache.hadoop.hive.ql.exec.MapOperator.getConvertedOI(MapOperator.java:422)
    ... 24 more

The same query, however, works in Hive Beeswax.

I have consistently been able to run this kind of query through Beeswax. The same query as above (with a different table name) succeeded and produced the following log:

13/10/26 07:51:30 WARN conf.HiveConf: DEPRECATED: Configuration property hive.metastore.local no longer has any effect. Make sure to provide a valid value for hive.metastore.uris if you are connecting to a remote metastore.
13/10/26 07:51:30 WARN conf.HiveConf: DEPRECATED: Configuration property hive.metastore.local no longer has any effect. Make sure to provide a valid value for hive.metastore.uris if you are connecting to a remote metastore.
13/10/26 07:51:30 WARN conf.HiveConf: DEPRECATED: Configuration property hive.metastore.local no longer has any effect. Make sure to provide a valid value for hive.metastore.uris if you are connecting to a remote metastore.
13/10/26 07:51:30 INFO ql.Driver: <PERFLOG method=Driver.run>
13/10/26 07:51:30 INFO ql.Driver: <PERFLOG method=TimeToSubmit>
13/10/26 07:51:30 INFO ql.Driver: <PERFLOG method=compile>
13/10/26 07:51:30 INFO parse.ParseDriver: Parsing command: use default
13/10/26 07:51:30 INFO parse.ParseDriver: Parse Completed
13/10/26 07:51:30 INFO ql.Driver: Semantic Analysis Completed
13/10/26 07:51:30 INFO ql.Driver: Returning Hive schema: Schema(fieldSchemas:null, properties:null)
13/10/26 07:51:30 INFO ql.Driver: </PERFLOG method=compile start=1382799090878 end=1382799090880 duration=2>
13/10/26 07:51:30 INFO ql.Driver: <PERFLOG method=Driver.execute>
13/10/26 07:51:30 INFO ql.Driver: Starting command: use default
13/10/26 07:51:30 INFO ql.Driver: </PERFLOG method=TimeToSubmit start=1382799090878 end=1382799090880 duration=2>
13/10/26 07:51:30 INFO ql.Driver: </PERFLOG method=Driver.execute start=1382799090880 end=1382799090924 duration=44>
OK
13/10/26 07:51:30 INFO ql.Driver: OK
13/10/26 07:51:30 INFO ql.Driver: <PERFLOG method=releaseLocks>
13/10/26 07:51:30 INFO ql.Driver: </PERFLOG method=releaseLocks start=1382799090924 end=1382799090924 duration=0>
13/10/26 07:51:30 INFO ql.Driver: </PERFLOG method=Driver.run start=1382799090878 end=1382799090924 duration=46>
13/10/26 07:51:30 INFO ql.Driver: <PERFLOG method=compile>
13/10/26 07:51:30 INFO parse.ParseDriver: Parsing command: CREATE TABLE tweets_id_sample_ui AS
   SELECT
      id
FROM tweets_sample
13/10/26 07:51:30 INFO parse.ParseDriver: Parse Completed
13/10/26 07:51:30 INFO parse.SemanticAnalyzer: Starting Semantic Analysis
13/10/26 07:51:30 INFO parse.SemanticAnalyzer: Creating table tweets_id_sample_ui position=13
13/10/26 07:51:30 INFO parse.SemanticAnalyzer: Completed phase 1 of Semantic Analysis
13/10/26 07:51:30 INFO parse.SemanticAnalyzer: Get metadata for source tables
13/10/26 07:51:31 INFO parse.SemanticAnalyzer: Get metadata for subqueries
13/10/26 07:51:31 INFO parse.SemanticAnalyzer: Get metadata for destination tables
13/10/26 07:51:31 INFO parse.SemanticAnalyzer: Completed getting MetaData in Semantic Analysis
13/10/26 07:51:31 INFO ppd.OpProcFactory: Processing for FS(286)
13/10/26 07:51:31 INFO ppd.OpProcFactory: Processing for SEL(285)
13/10/26 07:51:31 INFO ppd.OpProcFactory: Processing for TS(284)
13/10/26 07:51:31 INFO optimizer.GenMRFileSink1: using CombineHiveInputformat for the merge job
13/10/26 07:51:31 INFO physical.MetadataOnlyOptimizer: Looking for table scans where optimization is applicable
13/10/26 07:51:31 INFO physical.MetadataOnlyOptimizer: Found 0 metadata only table scans
13/10/26 07:51:31 INFO parse.SemanticAnalyzer: Completed plan generation
13/10/26 07:51:31 INFO ql.Driver: Semantic Analysis Completed
13/10/26 07:51:31 INFO ql.Driver: Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:id, type:bigint, comment:null)], properties:null)
13/10/26 07:51:31 INFO ql.Driver: </PERFLOG method=compile start=1382799090924 end=1382799091259 duration=335>
13/10/26 07:51:31 WARN conf.HiveConf: DEPRECATED: Configuration property hive.metastore.local no longer has any effect. Make sure to provide a valid value for hive.metastore.uris if you are connecting to a remote metastore.
13/10/26 07:51:31 INFO ql.Driver: <PERFLOG method=Driver.execute>
13/10/26 07:51:31 INFO ql.Driver: Starting command: CREATE TABLE tweets_id_sample_ui AS
   SELECT
      id
FROM tweets_sample
Total MapReduce jobs = 3
13/10/26 07:51:31 INFO ql.Driver: Total MapReduce jobs = 3
13/10/26 07:51:31 INFO ql.Driver: </PERFLOG method=TimeToSubmit end=1382799091337>
Launching Job 1 out of 3
13/10/26 07:51:31 INFO ql.Driver: Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
13/10/26 07:51:31 INFO exec.Task: Number of reduce tasks is set to 0 since there's no reduce operator
13/10/26 07:51:31 INFO exec.ExecDriver: Using org.apache.hadoop.hive.ql.io.CombineHiveInputFormat
13/10/26 07:51:31 INFO exec.ExecDriver: adding libjars: file:///usr/lib//hcatalog/share/hcatalog/hcatalog-core.jar,file:///usr/lib/hive/lib/json-serde-1.1.4-jar-with-dependencies.jar
13/10/26 07:51:31 INFO exec.ExecDriver: Processing alias tweets_sample
13/10/26 07:51:31 INFO exec.ExecDriver: Adding input file hdfs://sandbox:8020/data/oct25_tweets
13/10/26 07:51:31 INFO exec.Utilities: Content Summary not cached for hdfs://sandbox:8020/data/oct25_tweets
13/10/26 07:51:35 INFO exec.ExecDriver: Making Temp Directory: hdfs://sandbox:8020/tmp/hive-beeswax-hue/hive_2013-10-26_07-51-30_924_8805518057234020615/-ext-10002
13/10/26 07:51:35 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
13/10/26 07:51:35 INFO io.CombineHiveInputFormat: CombineHiveInputSplit creating pool for hdfs://sandbox:8020/data/oct25_tweets; using filter path hdfs://sandbox:8020/data/oct25_tweets
13/10/26 07:51:35 INFO mapred.FileInputFormat: Total input paths to process : 964
13/10/26 07:51:39 INFO io.CombineHiveInputFormat: number of splits 7
Starting Job = job_201310250853_0024, Tracking URL = http://sandbox:50030/jobdetails.jsp?jobid=job_201310250853_0024
13/10/26 07:51:39 INFO exec.Task: Starting Job = job_201310250853_0024, Tracking URL = http://sandbox:50030/jobdetails.jsp?jobid=job_201310250853_0024
Kill Command = /usr/lib/hadoop/libexec/../bin/hadoop job  -kill job_201310250853_0024
13/10/26 07:51:39 INFO exec.Task: Kill Command = /usr/lib/hadoop/libexec/../bin/hadoop job  -kill job_201310250853_0024
Hadoop job information for Stage-1: number of mappers: 7; number of reducers: 0
13/10/26 07:51:48 INFO exec.Task: Hadoop job information for Stage-1: number of mappers: 7; number of reducers: 0
2013-10-26 07:51:48,788 Stage-1 map = 0%,  reduce = 0%
13/10/26 07:51:48 INFO exec.Task: 2013-10-26 07:51:48,788 Stage-1 map = 0%,  reduce = 0%
2013-10-26 07:52:00,853 Stage-1 map = 1%,  reduce = 0%
13/10/26 07:52:00 INFO exec.Task: 2013-10-26 07:52:00,853 Stage-1 map = 1%,  reduce = 0%
2013-10-26 07:52:02,037 Stage-1 map = 2%,  reduce = 0%
13/10/26 07:52:02 INFO exec.Task: 2013-10-26 07:52:02,037 Stage-1 map = 2%,  reduce = 0%
2013-10-26 07:52:04,048 Stage-1 map = 3%,  reduce = 0%
13/10/26 07:52:04 INFO exec.Task: 2013-10-26 07:52:04,048 Stage-1 map = 3%,  reduce = 0%
...
2013-10-26 07:54:30,400 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 141.58 sec
13/10/26 07:54:30 INFO exec.Task: 2013-10-26 07:54:30,400 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 141.58 sec
MapReduce Total cumulative CPU time: 2 minutes 21 seconds 580 msec
13/10/26 07:54:30 INFO exec.Task: MapReduce Total cumulative CPU time: 2 minutes 21 seconds 580 msec
Ended Job = job_201310250853_0024
13/10/26 07:54:30 INFO exec.Task: Ended Job = job_201310250853_0024
13/10/26 07:54:30 INFO exec.FileSinkOperator: Moving tmp dir: hdfs://sandbox:8020/tmp/hive-beeswax-hue/hive_2013-10-26_07-51-30_924_8805518057234020615/_tmp.-ext-10002 to: hdfs://sandbox:8020/tmp/hive-beeswax-hue/hive_2013-10-26_07-51-30_924_8805518057234020615/_tmp.-ext-10002.intermediate
13/10/26 07:54:30 INFO exec.FileSinkOperator: Moving tmp dir: hdfs://sandbox:8020/tmp/hive-beeswax-hue/hive_2013-10-26_07-51-30_924_8805518057234020615/_tmp.-ext-10002.intermediate to: hdfs://sandbox:8020/tmp/hive-beeswax-hue/hive_2013-10-26_07-51-30_924_8805518057234020615/-ext-10002
Stage-4 is filtered out by condition resolver.
13/10/26 07:54:30 INFO exec.Task: Stage-4 is filtered out by condition resolver.
Stage-3 is selected by condition resolver.
13/10/26 07:54:30 INFO exec.Task: Stage-3 is selected by condition resolver.
Stage-5 is filtered out by condition resolver.
13/10/26 07:54:30 INFO exec.Task: Stage-5 is filtered out by condition resolver.
Launching Job 3 out of 3
13/10/26 07:54:30 INFO ql.Driver: Launching Job 3 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
13/10/26 07:54:30 INFO exec.Task: Number of reduce tasks is set to 0 since there's no reduce operator
13/10/26 07:54:30 INFO exec.ExecDriver: Using org.apache.hadoop.hive.ql.io.CombineHiveInputFormat
13/10/26 07:54:30 INFO exec.ExecDriver: adding libjars: file:///usr/lib//hcatalog/share/hcatalog/hcatalog-core.jar,file:///usr/lib/hive/lib/json-serde-1.1.4-jar-with-dependencies.jar
13/10/26 07:54:30 INFO exec.ExecDriver: Processing alias hdfs://sandbox:8020/tmp/hive-beeswax-hue/hive_2013-10-26_07-51-30_924_8805518057234020615/-ext-10002
13/10/26 07:54:30 INFO exec.ExecDriver: Adding input file hdfs://sandbox:8020/tmp/hive-beeswax-hue/hive_2013-10-26_07-51-30_924_8805518057234020615/-ext-10002
13/10/26 07:54:30 INFO exec.Utilities: Content Summary not cached for hdfs://sandbox:8020/tmp/hive-beeswax-hue/hive_2013-10-26_07-51-30_924_8805518057234020615/-ext-10002
13/10/26 07:54:30 INFO exec.ExecDriver: Making Temp Directory: hdfs://sandbox:8020/tmp/hive-beeswax-hue/hive_2013-10-26_07-51-30_924_8805518057234020615/-ext-10001
13/10/26 07:54:30 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
13/10/26 07:54:30 INFO mapred.FileInputFormat: Total input paths to process : 7
13/10/26 07:54:30 INFO io.CombineHiveInputFormat: number of splits 1
Starting Job = job_201310250853_0025, Tracking URL = http://sandbox:50030/jobdetails.jsp?jobid=job_201310250853_0025
13/10/26 07:54:31 INFO exec.Task: Starting Job = job_201310250853_0025, Tracking URL = http://sandbox:50030/jobdetails.jsp?jobid=job_201310250853_0025
Kill Command = /usr/lib/hadoop/libexec/../bin/hadoop job  -kill job_201310250853_0025
13/10/26 07:54:31 INFO exec.Task: Kill Command = /usr/lib/hadoop/libexec/../bin/hadoop job  -kill job_201310250853_0025
Hadoop job information for Stage-3: number of mappers: 1; number of reducers: 0
13/10/26 07:54:39 INFO exec.Task: Hadoop job information for Stage-3: number of mappers: 1; number of reducers: 0
2013-10-26 07:54:39,392 Stage-3 map = 0%,  reduce = 0%
13/10/26 07:54:39 INFO exec.Task: 2013-10-26 07:54:39,392 Stage-3 map = 0%,  reduce = 0%
2013-10-26 07:54:48,505 Stage-3 map = 87%,  reduce = 0%
13/10/26 07:54:48 INFO exec.Task: 2013-10-26 07:54:48,505 Stage-3 map = 87%,  reduce = 0%
2013-10-26 07:54:49,510 Stage-3 map = 100%,  reduce = 0%, Cumulative CPU 6.95 sec
13/10/26 07:54:49 INFO exec.Task: 2013-10-26 07:54:49,510 Stage-3 map = 100%,  reduce = 0%, Cumulative CPU 6.95 sec
2013-10-26 07:54:50,517 Stage-3 map = 100%,  reduce = 0%, Cumulative CPU 6.95 sec
13/10/26 07:54:50 INFO exec.Task: 2013-10-26 07:54:50,517 Stage-3 map = 100%,  reduce = 0%, Cumulative CPU 6.95 sec
2013-10-26 07:54:51,525 Stage-3 map = 100%,  reduce = 100%, Cumulative CPU 6.95 sec
13/10/26 07:54:51 INFO exec.Task: 2013-10-26 07:54:51,525 Stage-3 map = 100%,  reduce = 100%, Cumulative CPU 6.95 sec
MapReduce Total cumulative CPU time: 6 seconds 950 msec
13/10/26 07:54:51 INFO exec.Task: MapReduce Total cumulative CPU time: 6 seconds 950 msec
Ended Job = job_201310250853_0025
13/10/26 07:54:51 INFO exec.Task: Ended Job = job_201310250853_0025
13/10/26 07:54:51 INFO exec.FileSinkOperator: Moving tmp dir: hdfs://sandbox:8020/tmp/hive-beeswax-hue/hive_2013-10-26_07-51-30_924_8805518057234020615/_tmp.-ext-10001 to: hdfs://sandbox:8020/tmp/hive-beeswax-hue/hive_2013-10-26_07-51-30_924_8805518057234020615/_tmp.-ext-10001.intermediate
13/10/26 07:54:51 INFO exec.FileSinkOperator: Moving tmp dir: hdfs://sandbox:8020/tmp/hive-beeswax-hue/hive_2013-10-26_07-51-30_924_8805518057234020615/_tmp.-ext-10001.intermediate to: hdfs://sandbox:8020/tmp/hive-beeswax-hue/hive_2013-10-26_07-51-30_924_8805518057234020615/-ext-10001
Moving data to: hdfs://sandbox:8020/apps/hive/warehouse/tweets_id_sample_ui
13/10/26 07:54:51 INFO exec.Task: Moving data to: hdfs://sandbox:8020/apps/hive/warehouse/tweets_id_sample_ui from hdfs://sandbox:8020/tmp/hive-beeswax-hue/hive_2013-10-26_07-51-30_924_8805518057234020615/-ext-10001
13/10/26 07:54:51 INFO exec.DDLTask: Default to LazySimpleSerDe for table tweets_id_sample_ui
13/10/26 07:54:51 INFO hive.metastore: Trying to connect to metastore with URI thrift://sandbox:9083
13/10/26 07:54:51 INFO hive.metastore: Waiting 1 seconds before next connection attempt.
13/10/26 07:54:52 INFO hive.metastore: Connected to metastore.
13/10/26 07:54:53 INFO exec.StatsTask: Executing stats task
Table default.tweets_id_sample_ui stats: [num_partitions: 0, num_files: 1, num_rows: 0, total_size: 10972500, raw_data_size: 0]
13/10/26 07:54:54 INFO exec.Task: Table default.tweets_id_sample_ui stats: [num_partitions: 0, num_files: 1, num_rows: 0, total_size: 10972500, raw_data_size: 0]
13/10/26 07:54:54 INFO ql.Driver: </PERFLOG method=Driver.execute start=1382799091328 end=1382799294689 duration=203361>
MapReduce Jobs Launched: 
13/10/26 07:54:54 INFO ql.Driver: MapReduce Jobs Launched: 
Job 0: Map: 7   Cumulative CPU: 141.58 sec   HDFS Read: 1762842930 HDFS Write: 10972500 SUCCESS
13/10/26 07:54:54 INFO ql.Driver: Job 0: Map: 7   Cumulative CPU: 141.58 sec   HDFS Read: 1762842930 HDFS Write: 10972500 SUCCESS
Job 1: Map: 1   Cumulative CPU: 6.95 sec   HDFS Read: 10973519 HDFS Write: 10972500 SUCCESS
13/10/26 07:54:54 INFO ql.Driver: Job 1: Map: 1   Cumulative CPU: 6.95 sec   HDFS Read: 10973519 HDFS Write: 10972500 SUCCESS
Total MapReduce CPU Time Spent: 2 minutes 28 seconds 530 msec
13/10/26 07:54:54 INFO ql.Driver: Total MapReduce CPU Time Spent: 2 minutes 28 seconds 530 msec
OK
13/10/26 07:54:54 INFO ql.Driver: OK
13/10/26 07:54:56 WARN conf.HiveConf: DEPRECATED: Configuration property hive.metastore.local no longer has any effect. Make sure to provide a valid value for hive.metastore.uris if you are connecting to a remote metastore.
13/10/26 07:54:56 WARN conf.HiveConf: DEPRECATED: Configuration property hive.metastore.local no longer has any effect. Make sure to provide a valid value for hive.metastore.uris if you are connecting to a remote metastore.

The following cases do work in my Hive CLI:

  • The query above succeeds if a view is created instead of a table (see the sketch after this list).
  • Empty tables can be created.
  • Tables can be created from HDFS files (e.g., the tweets_sample table in the first code block was created from HDFS files).
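
For illustration, the view variant that succeeds would look like this minimal sketch (the view name tweets_id_sample_view is hypothetical):

-- Creating a view stores only the query definition; no MapReduce job
-- runs, so the SerDe class is never loaded on the task nodes.
CREATE VIEW tweets_id_sample_view AS
SELECT
   id
FROM tweets_sample;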

Here is the DDL, executed via the Hive CLI, that created tweets_sample:

CREATE EXTERNAL TABLE tweets_sample (
   id BIGINT,
   created_at STRING,
   source STRING,
   favorited BOOLEAN,
   retweet_count INT,
   retweeted_status STRUCT<
      text:STRING,
      user:STRUCT<screen_name:STRING,name:STRING>>,
   entities STRUCT<
      urls:ARRAY<STRUCT<expanded_url:STRING>>,
      user_mentions:ARRAY<STRUCT<screen_name:STRING,name:STRING>>,
      hashtags:ARRAY<STRUCT<text:STRING>>>,
   text STRING,
   user STRUCT<
      screen_name:STRING,
      name:STRING,
      friends_count:INT,
      followers_count:INT,
      statuses_count:INT,
      verified:BOOLEAN,
      utc_offset:STRING, -- was INT but nulls are strings
      time_zone:STRING>,
   in_reply_to_screen_name STRING,
   year int,
   month int,
   day int,
   hour int
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
LOCATION '/data/oct25_tweets'
;

Currently, I'm stuck on how to fix this issue.

Other notes:

The environment I'm working in is as follows:

  • Hortonworks Sandbox v1.3 on Oracle VM VirtualBox
  • I was working through Hortonworks Tutorial #13
  • Hive Beeswax queries are executed via the Hue UI as user 'hue'
  • Hive CLI queries are executed as user 'root' (and were also tested as user 'hue')

Solution

This can be solved by adding the JSON SerDe jar to Hive's classpath from within the Hive CLI, as follows:

hive> ADD JAR [path to JSON SerDe jar file];

For example:

hive> ADD JAR /usr/lib/hive/lib/json-serde-1.1.4-jar-with-dependencies.jar;

Hive confirms the addition with the following output:

Added /usr/lib/hive/lib/json-serde-1.1.4-jar-with-dependencies.jar to class path
Added resource: /usr/lib/hive/lib/json-serde-1.1.4-jar-with-dependencies.jar

This statement has to be executed at the start of every Hive CLI session.
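
To avoid retyping the statement in every session, one option (an assumption about this setup, not something verified in the original environment) is to put it in a ~/.hiverc file, which the Hive CLI sources on startup:

-- ~/.hiverc : statements here run automatically when the Hive CLI starts
ADD JAR /usr/lib/hive/lib/json-serde-1.1.4-jar-with-dependencies.jar;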

Explanation:

The query in the original question fails because of its SELECT ... FROM clause. Submitting just the following query to the Hive CLI produces the same error:

SELECT
   id
FROM tweets_sample;

The source table tweets_sample stores its rows as JSON and reads them through the JSON SerDe, as the DDL that created tweets_sample (quoted at the end of the question) shows:

CREATE EXTERNAL TABLE tweets_sample (
   id BIGINT,
   ...
   hour int
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
LOCATION '/data/oct25_tweets';

Without that jar on its classpath, Hive does not know how to parse rows in this format or extract columns from them. Note, however, that the following query works even before the JSON SerDe jar is added:

SELECT *
FROM tweets_sample;

That query works because Hive does not need to extract individual columns from the rows and thus never has to interpret the row format.

Adding the JSON SerDe jar before executing any query that depends on that format, as shown in the solution above, lets Hive execute such queries.
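
Putting it together, a complete Hive CLI session for the original CTAS would look like this sketch:

-- Run once per CLI session, before any query that reads tweets_sample
ADD JAR /usr/lib/hive/lib/json-serde-1.1.4-jar-with-dependencies.jar;

-- The CTAS from the question; the map tasks can now load
-- org.openx.data.jsonserde.JsonSerDe from the shipped jar.
CREATE TABLE tweets_id_sample AS
SELECT
   id
FROM tweets_sample;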
