Question

I have a zipped file in S3 that I would like to insert into a Redshift database. The only way my research has found to do this is by launching an EC2 instance, moving the file there, unzipping it, and sending it back to S3, then inserting it into my Redshift table. But I am trying to do this all with the Java SDK from an outside machine and do not want to have to use an EC2 instance. Is there a way to just have an EMR job unzip the file? Or to insert the zipped file directly into Redshift?

The files are .zip, not gzip.


Solution

You cannot directly insert a zipped file into Redshift, as per Guy's comment.

Assuming this is not a one-time task, I would suggest using AWS Data Pipeline to perform this work. See this example of copying data between S3 buckets. Modify the example to unzip and then gzip your data instead of simply copying it.

Use the ShellCommandActivity to execute a shell script that performs the work. I would assume this script could invoke Java if you choose an appropriate AMI as your EC2 resource (YMMV).

Data Pipeline is well suited to this type of work because it starts and terminates the EC2 resource automatically, and you do not have to worry about discovering the name of the new instance in your scripts.
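
Whether the work runs inside a ShellCommandActivity or directly from an outside machine, the unzip-and-regzip step itself is small enough to do in plain Java, streaming the archive instead of staging it on an EC2 box. Here is a minimal sketch, assuming the AWS SDK for Java v1 is on the classpath; the bucket and key names are placeholders:

    import java.io.File;
    import java.io.FileOutputStream;
    import java.io.IOException;
    import java.io.OutputStream;
    import java.util.zip.GZIPOutputStream;
    import java.util.zip.ZipEntry;
    import java.util.zip.ZipInputStream;

    import com.amazonaws.services.s3.AmazonS3;
    import com.amazonaws.services.s3.AmazonS3Client;
    import com.amazonaws.services.s3.model.S3Object;

    public class ZipToGzip {
        public static void main(String[] args) throws IOException {
            String bucket = "my-bucket";         // placeholder
            String zipKey = "input/data.zip";    // placeholder
            String gzKey  = "input/data.csv.gz"; // placeholder

            // Credentials come from the default provider chain.
            AmazonS3 s3 = new AmazonS3Client();

            // Stream the .zip object from S3 and position at its first entry.
            S3Object object = s3.getObject(bucket, zipKey);
            try (ZipInputStream zin = new ZipInputStream(object.getObjectContent())) {
                ZipEntry entry = zin.getNextEntry();
                if (entry == null) {
                    throw new IOException("empty zip archive: " + zipKey);
                }

                // Re-compress the entry as gzip into a temp file; buffering to
                // disk keeps memory flat and lets putObject know the length.
                File tmp = File.createTempFile("redshift-load", ".gz");
                try (OutputStream out = new GZIPOutputStream(new FileOutputStream(tmp))) {
                    byte[] buf = new byte[8192];
                    int n;
                    while ((n = zin.read(buf)) != -1) {
                        out.write(buf, 0, n);
                    }
                }

                // Upload the gzipped copy back to S3, ready for COPY ... gzip.
                s3.putObject(bucket, gzKey, tmp);
                tmp.delete();
            }
        }
    }

This handles a single-entry archive; a zip containing several files would need a loop over getNextEntry(), writing one .gz per entry.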

Other tips

Add the gzip option; please refer to http://docs.aws.amazon.com/redshift/latest/dg/c_loading-encrypted-files.html. We can use a Java client to execute the SQL.

If your file is gzipped, try the command below:

    copy mytable from 's3://abc/def/yourfilename.gz' CREDENTIALS 'aws_access_key_id=xxxxx;aws_secret_access_key=yyyyyy' delimiter ',' gzip
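
Since COPY is just SQL, the load can be driven from Java over JDBC, which answers the "from an outside machine" part of the question. A minimal sketch, assuming the Amazon Redshift JDBC driver is on the classpath; the cluster endpoint, database, table, and credentials are placeholders:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.SQLException;
    import java.sql.Statement;

    public class RedshiftCopy {
        public static void main(String[] args) throws SQLException {
            // Placeholder endpoint and credentials -- substitute your own.
            String url = "jdbc:redshift://mycluster.abc123.us-east-1"
                       + ".redshift.amazonaws.com:5439/mydb";

            String copySql =
                "copy mytable from 's3://abc/def/yourfilename.gz' "
              + "CREDENTIALS 'aws_access_key_id=xxxxx;aws_secret_access_key=yyyyyy' "
              + "delimiter ',' gzip";

            try (Connection conn = DriverManager.getConnection(url, "dbuser", "dbpassword");
                 Statement stmt = conn.createStatement()) {
                // COPY runs server-side: Redshift pulls the file from S3 itself,
                // so the data never passes through this client machine.
                stmt.execute(copySql);
            }
        }
    }

The stock PostgreSQL driver with a jdbc:postgresql:// URL should also work against Redshift if the Redshift-specific driver is not available.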
