Question

I have an S3 bucket with about 100 GB of small files (in folders).

I have been requested to back this up to a local NAS on a weekly basis.

I have access to an EC2 instance that is attached to the S3 storage.

My NAS allows me to run an SFTP server.

I also have access to a local server on which I can run a cron job to pull the backup if need be.

How can I best go about this? If possible, I would like to only download the files that have been added or changed, or compress the data on the server end and then push the compressed file to the SFTP server on the NAS.

The end goal is to have a complete backup of the S3 bucket on my NAS with the lowest amount of transfer each week.

Any suggestions are welcome!

Thanks for your help!

Ryan


Solution

I think the most scalable way to achieve this is with AWS Elastic MapReduce (EMR) and AWS Data Pipeline.

The architecture looks like this:

You will use Data Pipeline to configure S3 as an input data node, then an EC2 instance running Pig/Hive scripts to do the required processing and send the data to SFTP. Pig can be extended with a custom UDF (user-defined function) to send data to an SFTP server. You can then schedule this pipeline to run at a periodic interval. That said, it takes quite a bit of reading to put all of this together, but it is a good skill to pick up if you foresee future data transformation needs.
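For illustration only, here is a minimal sketch of what such a UDF could look like, assuming the files to upload have already been staged locally on the EC2/EMR node and that the JSch library (com.jcraft:jsch) is on the classpath; the host name and credentials are placeholders for your NAS details.

    // Minimal sketch of a Pig UDF that uploads a local file to an SFTP server.
    // Assumes JSch is on the classpath; host and credentials below are placeholders.
    import java.io.IOException;

    import org.apache.pig.EvalFunc;
    import org.apache.pig.data.Tuple;

    import com.jcraft.jsch.ChannelSftp;
    import com.jcraft.jsch.JSch;
    import com.jcraft.jsch.Session;

    public class SftpUpload extends EvalFunc<String> {

        private static final String SFTP_HOST = "nas.example.local"; // placeholder NAS host
        private static final String SFTP_USER = "backup";            // placeholder user
        private static final String SFTP_PASS = "changeme";          // placeholder password
        private static final int    SFTP_PORT = 22;

        // Expects a tuple of (localPath, remotePath); returns the remote path on success.
        @Override
        public String exec(Tuple input) throws IOException {
            if (input == null || input.size() < 2) {
                return null;
            }
            String localPath  = (String) input.get(0);
            String remotePath = (String) input.get(1);

            Session session = null;
            ChannelSftp sftp = null;
            try {
                JSch jsch = new JSch();
                session = jsch.getSession(SFTP_USER, SFTP_HOST, SFTP_PORT);
                session.setPassword(SFTP_PASS);
                session.setConfig("StrictHostKeyChecking", "no"); // relaxed for the sketch only
                session.connect();

                sftp = (ChannelSftp) session.openChannel("sftp");
                sftp.connect();
                sftp.put(localPath, remotePath); // upload the staged file
                return remotePath;
            } catch (Exception e) {
                throw new IOException("SFTP upload failed for " + localPath, e);
            } finally {
                if (sftp != null) {
                    sftp.disconnect();
                }
                if (session != null) {
                    session.disconnect();
                }
            }
        }
    }

In a Pig script you would REGISTER the jar containing this class and call it from a FOREACH ... GENERATE statement; Data Pipeline can then run that script on your weekly schedule (for example via a PigActivity or ShellCommandActivity).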

Start reading from here:

http://aws.typepad.com/aws/2012/11/the-new-amazon-data-pipeline.html

A similar method can be used for taking periodic backups of DynamoDB to S3, reading files from FTP servers, and processing and moving data to, say, S3 or RDS.

Licensed under: CC-BY-SA with attribution