I would like to save the data produced by a MapReduce job in HDInsight in a format I can easily report on, ideally a table structure (Azure Table storage). From the research I have done, it looks like the HDInsight service can only read from and write to Azure Storage Vault (ASV). Is that correct?

I would prefer to implement the HDInsight mapper/reducer in C#.

I don't know much about Hive or Pig, and I wonder whether there is any functionality that would let me persist the reducer's results in an external data store (Azure Tables) other than ASV.

Solution

Currently the default storage backing HDInsight is ASV. You can also store data on the 'local' HDFS filesystem on your HDInsight cluster. However, this means keeping the cluster running permanently, and limits you to the storage on your compute nodes. This can get very expensive.

One solution might be to use Sqoop to export the results into something like SQL Server (or SQL Azure), depending on their size and what you plan to do with them.
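As a rough sketch of what that Sqoop export could look like, the Python helper below assembles a `sqoop export` command line. The server, database, table, and container names are hypothetical placeholders, and it assumes the reducer wrote tab-delimited text and that the target table already exists with matching columns. The flags shown are standard Sqoop export options, but check them against the Sqoop version on your cluster.

```python
import shlex
from typing import List


def build_sqoop_export_command(
    jdbc_url: str,
    username: str,
    table: str,
    export_dir: str,
    field_delimiter: str = "\t",
) -> List[str]:
    """Return the argument list for a `sqoop export` invocation.

    Assumes the files under export_dir are delimited text whose columns
    line up with the columns of the target table.
    """
    return [
        "sqoop", "export",
        "--connect", jdbc_url,        # JDBC URL of the target database
        "--username", username,
        "-P",                         # prompt for the password on the console
        "--table", table,             # existing table to insert rows into
        "--export-dir", export_dir,   # directory holding the reducer output
        "--input-fields-terminated-by", field_delimiter,
    ]


# Hypothetical values for illustration only; substitute your own.
command = build_sqoop_export_command(
    jdbc_url="jdbc:sqlserver://myserver.database.windows.net:1433;database=reports",
    username="reportuser",
    table="ReducerResults",
    export_dir="asv://mycontainer@myaccount.blob.core.windows.net/output/job1",
)

# Print a shell-quoted version that can be pasted onto the cluster head node.
print(shlex.join(command))
```

Building the arguments as a list rather than a single string keeps the tab delimiter and the semicolon in the JDBC URL from being mangled by the shell; the same list can be passed to `subprocess.run(command, check=True)` on a machine where `sqoop` is on the path.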

Alternatively, I am working on a connector between Hive and Azure Tables. At the moment it lets you read from Azure Tables into Hive (by way of an external table), and it will shortly get write support as well.

Licensed under CC BY-SA with attribution