Question

I need to back up my DynamoDB table data to S3 using Amazon Data Pipeline.

My question is: can I use a single data pipeline to back up multiple DynamoDB tables to S3, or do I have to make a separate pipeline for each of them?

Also, since my tables have a year_month prefix (e.g. 2014_3_tableName), I was thinking of using the Data Pipeline SDK to change the table name in the pipeline definition once the month changes. Will this work? Is there an alternative or better way?

Thanks!

Solution

If you are setting up your Data Pipeline through the DynamoDB console's Import/Export button, you will have to create a separate pipeline per table. If you are using Data Pipeline directly (either through the Data Pipeline API or through the Data Pipeline console), you can export multiple tables in the same pipeline. For each table, simply add an additional DynamoDBDataNode and an EmrActivity linking that data node to the output S3DataNode.
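To illustrate the shape of such a definition, here is a minimal sketch using boto3. The pipeline ID, object IDs, and table names are placeholders, and the shared S3DataNode, EmrCluster, and Default objects (plus the actual export step) are omitted for brevity:

```python
import boto3

client = boto3.client('datapipeline')

def table_export_objects(table_name, suffix):
    """One DynamoDBDataNode plus one EmrActivity per table."""
    return [
        {
            'id': 'DDBSourceTable%s' % suffix,
            'name': 'DDBSourceTable%s' % suffix,
            'fields': [
                {'key': 'type', 'stringValue': 'DynamoDBDataNode'},
                {'key': 'tableName', 'stringValue': table_name},
            ],
        },
        {
            'id': 'TableBackupActivity%s' % suffix,
            'name': 'TableBackupActivity%s' % suffix,
            'fields': [
                {'key': 'type', 'stringValue': 'EmrActivity'},
                {'key': 'input', 'refValue': 'DDBSourceTable%s' % suffix},
                {'key': 'output', 'refValue': 'S3OutputLocation'},
                {'key': 'runsOn', 'refValue': 'EmrClusterForBackup'},
                # The 'step' field (the export command itself) is
                # omitted here; see the console-generated pipeline
                # for the exact value.
            ],
        },
    ]

pipeline_objects = []
for i, table in enumerate(['2014_3_tableA', '2014_3_tableB']):
    pipeline_objects += table_export_objects(table, i)
# ...append the shared S3OutputLocation, EmrClusterForBackup, and
# Default objects here...

client.put_pipeline_definition(
    pipelineId='df-XXXXXXXXXXXX',  # placeholder pipeline ID
    pipelineObjects=pipeline_objects,
)
```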

Regarding your year_month prefix use case, using the Data Pipeline SDK to change the table names periodically seems like the best approach. Another approach would be to copy the script that the export EmrActivity runs (the script location appears in the activity's "step" field) and modify the Hive script so that it determines the table name from the current date. You would need to host the modified script in your own S3 bucket and point the EmrActivity to that location instead of the default. I have not tried either approach, but both are theoretically possible.
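For the SDK approach, a small script run once a month (from cron, say) could recompute the prefix and rewrite the tableName field before reactivating the pipeline. A minimal sketch with boto3, assuming the pipeline ID is known and the field layout from the sketch above:

```python
import datetime
import boto3

client = boto3.client('datapipeline')
PIPELINE_ID = 'df-XXXXXXXXXXXX'  # placeholder pipeline ID

# Build the current prefix, e.g. "2014_3" for March 2014.
today = datetime.date.today()
table_name = '%d_%d_tableName' % (today.year, today.month)

# Fetch the existing definition, swap the tableName field, re-put it.
definition = client.get_pipeline_definition(pipelineId=PIPELINE_ID)
for obj in definition['pipelineObjects']:
    for field in obj.get('fields', []):
        if field.get('key') == 'tableName':
            field['stringValue'] = table_name

client.put_pipeline_definition(
    pipelineId=PIPELINE_ID,
    pipelineObjects=definition['pipelineObjects'],
)
client.activate_pipeline(pipelineId=PIPELINE_ID)
```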

More general information about exporting DynamoDB tables can be found in the DynamoDB Developer Guide, and more detailed information can be found in the AWS Data Pipeline Developer Guide.

OTHER TIPS

It's an old question, but I was looking for the answer recently. When adding multiple DynamoDBDataNodes, you can still use a single S3DataNode as the output. Just differentiate the folders in the S3 bucket by specifying a different output.directoryPath in each EmrActivity Step field.

Like this: #{output.directoryPath}/newFolder

Each new folder will be created automatically in the S3 bucket.
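As a concrete sketch, each table's EmrActivity Step can append its own subfolder to the shared output path. The jar location and class below follow the stock console-generated export step and should be treated as assumptions, not verified values:

```python
# Stock DynamoDB export step (jar path/version are assumptions); each
# activity appends its own subfolder to the shared output path.
EXPORT_STEP_TEMPLATE = (
    's3://dynamodb-emr-us-east-1/emr-ddb-storage-handler/2.1.0/'
    'emr-ddb-2.1.0.jar,'
    'org.apache.hadoop.dynamodb.tools.DynamoDbExport,'
    '#{output.directoryPath}/%(folder)s,'
    '#{input.tableName},'
    '#{input.readThroughputPercent}'
)

# One Step value per table; all activities share the same S3DataNode.
steps = {
    name: EXPORT_STEP_TEMPLATE % {'folder': name}
    for name in ('2014_3_tableA', '2014_3_tableB')
}
# Each export then lands in <directoryPath>/2014_3_tableA, and so on.
```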

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow