Question

I am running DSE 3.2.4 with analytics enabled. I am attempting to unload one of my tables into S3 for long-term storage. I have created the following table in Hive:

CREATE EXTERNAL TABLE events_archive (
    event_id string,
    time string,
    type string,
    source string,
    value string
)
PARTITIONED BY (year string, month string, day string, hour string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION 's3n://com.mydomain.events/';

I then try to use this query to load some sample data into it:

CREATE TEMPORARY FUNCTION c_to_string AS 'org.apache.hadoop.hive.cassandra.ql.udf.UDFCassandraBinaryToString';
SET hive.exec.dynamic.partition.mode=nonstrict;
SET hive.exec.dynamic.partition=true;


INSERT OVERWRITE TABLE events_archive
PARTITION (year, month, day, hour)
SELECT c_to_string(column4, 'uuid') AS event_id,
       from_unixtime(CAST(column3/1000 AS int)) AS time,
       CASE column1
         WHEN 'pageviews-push' THEN 'page_view'
         WHEN 'score_realtime-internal' THEN 'realtime_score'
         ELSE 'social_data'
       END AS type,
       CASE column1
         WHEN 'pageviews-push' THEN 'internal'
         WHEN 'score_realtime-internal' THEN 'internal'
         ELSE split(column1, '-')[0]
       END AS source,
       value,
       year(from_unixtime(CAST(column3/1000 AS int))) AS year,
       month(from_unixtime(CAST(column3/1000 AS int))) AS month,
       day(from_unixtime(CAST(column3/1000 AS int))) AS day,
       hour(from_unixtime(CAST(column3/1000 AS int))) AS hour,
       c_to_string(key2, 'blob') AS content_id
  FROM events
 WHERE column2 = 'data'
   AND value IS NOT NULL
   AND value != ''
LIMIT 10;

I end up getting this exception:

2014-02-11 20:23:55,810 ERROR ql.Driver (SessionState.java:printError(400)) - FAILED: Hive Internal Error: org.apache.hadoop.fs.s3.S3Exception(org.jets3t.service.S3ServiceException: S3 GET failed for '/10.226.118.113/%2F' XML Error Message: <?xml version="1.0" encoding="UTF-8"?><Error><Code>NoSuchBucket</Code><Message>The specified bucket does not exist</Message><BucketName>10.226.118.113</BucketName><RequestId>FFFFBCE9711A91AE</RequestId><HostId>kXu2oMblsYKD+Jx9O5fTbjosOtTNNtyM+lbE2pmCC63Wm3abJxMvanHdSCYnUyaC</HostId></Error>)
org.apache.hadoop.fs.s3.S3Exception: org.jets3t.service.S3ServiceException: S3 GET failed for '/10.226.118.113/%2F' XML Error Message: <?xml version="1.0" encoding="UTF-8"?><Error><Code>NoSuchBucket</Code><Message>The specified bucket does not exist</Message><BucketName>10.226.118.113</BucketName><RequestId>FFFFBCE9711A91AE</RequestId><HostId>kXu2oMblsYKD+Jx9O5fTbjosOtTNNtyM+lbE2pmCC63Wm3abJxMvanHdSCYnUyaC</HostId></Error>
    at org.apache.hadoop.fs.s3.Jets3tFileSystemStore.get(Jets3tFileSystemStore.java:156)
    at org.apache.hadoop.fs.s3.Jets3tFileSystemStore.retrieveINode(Jets3tFileSystemStore.java:195)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
    at $Proxy14.retrieveINode(Unknown Source)
    at org.apache.hadoop.fs.s3.S3FileSystem.mkdir(S3FileSystem.java:148)
    at org.apache.hadoop.fs.s3.S3FileSystem.mkdirs(S3FileSystem.java:141)
    at org.apache.hadoop.fs.FileSystem.mkdirs(FileSystem.java:1126)
    at org.apache.hadoop.hive.ql.Context.getScratchDir(Context.java:165)
    at org.apache.hadoop.hive.ql.Context.getExternalScratchDir(Context.java:222)
    at org.apache.hadoop.hive.ql.Context.getExternalTmpFileURI(Context.java:315)
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genFileSinkPlan(SemanticAnalyzer.java:4049)
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPostGroupByBodyPlan(SemanticAnalyzer.java:6205)
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBodyPlan(SemanticAnalyzer.java:6136)
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:6762)
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:7531)
    at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:243)
    at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:431)
    at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:336)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:909)
    at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:258)
    at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:215)
    at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:406)
    at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:689)
    at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:557)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
Caused by: org.jets3t.service.S3ServiceException: S3 GET failed for '/10.226.118.113/%2F' XML Error Message: <?xml version="1.0" encoding="UTF-8"?><Error><Code>NoSuchBucket</Code><Message>The specified bucket does not exist</Message><BucketName>10.226.118.113</BucketName><RequestId>FFFFBCE9711A91AE</RequestId><HostId>kXu2oMblsYKD+Jx9O5fTbjosOtTNNtyM+lbE2pmCC63Wm3abJxMvanHdSCYnUyaC</HostId></Error>
    at org.jets3t.service.impl.rest.httpclient.RestS3Service.performRequest(RestS3Service.java:416)
    at org.jets3t.service.impl.rest.httpclient.RestS3Service.performRestGet(RestS3Service.java:752)
    at org.jets3t.service.impl.rest.httpclient.RestS3Service.getObjectImpl(RestS3Service.java:1601)
    at org.jets3t.service.impl.rest.httpclient.RestS3Service.getObjectImpl(RestS3Service.java:1544)
    at org.jets3t.service.S3Service.getObject(S3Service.java:2072)
    at org.jets3t.service.S3Service.getObject(S3Service.java:1310)
    at org.apache.hadoop.fs.s3.Jets3tFileSystemStore.get(Jets3tFileSystemStore.java:144)
... 33 more

Is the Hive S3 connector supported in the latest DSE, or what might I be doing wrong?


Solution

Try the following in your Hive installation. Judging from the stack trace, Hive is deriving its scratch directory from your default filesystem URI, so the host portion (10.226.118.113) ends up being treated as an S3 bucket name; pointing fs.default.name at your bucket avoids that:

hive-site.xml

<property>
  <name>fs.default.name</name>
  <value>s3n://your-bucket</value>
</property>

core-site.xml

<property>
  <name>fs.s3n.awsAccessKeyId</name>
  <value>Your AWS Key</value>
</property>

<property>
  <name>fs.s3n.awsSecretAccessKey</name>
  <value>Your AWS Secret Key</value>
</property>
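
If editing core-site.xml is inconvenient, these are ordinary Hadoop configuration properties, so setting them per-session from the Hive CLI should usually work too (a minimal sketch, with placeholder values):

SET fs.s3n.awsAccessKeyId=YOUR_AWS_KEY;
SET fs.s3n.awsSecretAccessKey=YOUR_AWS_SECRET_KEY;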

This is per the 3.1 docs: http://www.datastax.com/docs/datastax_enterprise3.1/solutions/about_hive, under "Using an external file system in Hive".

I didn't see it in the 3.2 docs. I'm not sure why it was omitted, if it was, but it looks like something essential for running Hive against S3.
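
Once the configuration is in place, one quick sanity check from the Hive CLI is to list the bucket directly; this is just a suggested verification step, using the bucket name from the question:

dfs -ls s3n://com.mydomain.events/;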

Other tips

The Hadoop implementation of the S3 file system is out of date, so writing data to S3 from Hive doesn't work well. We fixed the issue for reading, so DSE can now read S3 files, but writing still has issues. We will look into whether we can fix it soon.
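
In the meantime, one possible workaround (a sketch with placeholder paths, not a documented DSE procedure) is to stage the archive on CFS, where Hive writes do work, and copy the files up to S3 afterwards with distcp:

-- point the external table at CFS instead of S3, then rerun the INSERT
ALTER TABLE events_archive SET LOCATION 'cfs:///archive/events/';

# then, from a shell, copy the staged files up to the bucket
hadoop distcp cfs:///archive/events/ s3n://com.mydomain.events/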

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow