Question

I'm copying .csv files into an S3 bucket and I need to join them as I would in a relational database (RDB). Is this possible? I'm counting on your great minds. =)

Solution

You can do this using AWS Data Pipeline and EMR.

Hive on EMR can read CSV (and TSV) files, meaning it can parse them and treat each set of files as a table of data rows.

Keep the files in an S3 bucket and define Hive tables whose location is the S3 path holding each set of files; EMR can then read them much as it would read data in HDFS (Hadoop Distributed File System). Once the tables are defined, you can run Hive queries against them, including joins, and do most of what you need.
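To make that a bit more concrete, here is a minimal sketch in Python that only builds the HiveQL text you might run on the cluster. The bucket name, table names, and columns (`my-example-bucket`, `customers`, `orders`, `customer_id`, and so on) are made up for illustration, and the helper functions are hypothetical, not part of any AWS or Hive library. It also assumes plain comma-delimited files with no header row and no quoted fields.

```python
def external_csv_table_ddl(table, columns, s3_location):
    """Build HiveQL that exposes comma-delimited files under an S3 path as a table."""
    cols = ", ".join(f"{name} {hive_type}" for name, hive_type in columns)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {table} ({cols})\n"
        "ROW FORMAT DELIMITED FIELDS TERMINATED BY ','\n"
        f"LOCATION '{s3_location}'"
    )


def inner_join_query(left, right, key, select_columns):
    """Build a HiveQL inner join of two tables on a shared key column."""
    cols = ", ".join(select_columns)
    return (
        f"SELECT {cols}\n"
        f"FROM {left} l\n"
        f"JOIN {right} r ON l.{key} = r.{key}"
    )


# Hypothetical bucket, tables, and columns; replace with your own.
statements = [
    external_csv_table_ddl(
        "customers",
        [("customer_id", "INT"), ("name", "STRING")],
        "s3://my-example-bucket/customers/",
    ),
    external_csv_table_ddl(
        "orders",
        [("order_id", "INT"), ("customer_id", "INT"), ("amount", "DOUBLE")],
        "s3://my-example-bucket/orders/",
    ),
    inner_join_query(
        "customers", "orders", "customer_id",
        ["l.name", "r.order_id", "r.amount"],
    ),
]

if __name__ == "__main__":
    # Print the statements so they can be pasted into a Hive session.
    print(";\n\n".join(statements) + ";")
```

Each CSV "table" gets its own S3 folder here, because a Hive table location points at a directory rather than a single file. How you submit the statements (a Hive step on the cluster, a Data Pipeline activity, or an interactive session) depends on your setup.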

From here I will point you to the documentation. It takes some time to read and understand the entire setup, but once you have mastered it, it is very handy: http://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/dp-s3tos3hivecsv.html

License: CC-BY-SA with attribution
Not affiliated with StackOverflow