Question

I would like to save the data produced by a MapReduce job in HDInsight in a format I can easily report on, ideally a table structure (Azure Table storage). From the research I have done, it looks like the HDInsight service can only work with Azure Storage Vault (ASV), for both reading and writing. Is that correct?

I would prefer to implement the HDInsight mapper/reducer in C#.

I don't know much about Hive or Pig, and I wonder whether there is functionality that would let me persist the reducer's results in an external data store (Azure Table) other than ASV.


Solution

Currently, the default storage backing HDInsight is ASV. You can also store data on the cluster's 'local' HDFS file system. However, this means keeping the cluster running permanently, and it limits you to the storage available on your compute nodes, which can get very expensive.

One solution might be to use Sqoop to export the results into something like SQL Server (or SQL Azure), depending on their size and what you plan to do with them.
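Sqoop export reads delimited text files from the job's output directory, so the reducer's output format matters. Below is a minimal sketch of a Hadoop streaming reducer that emits one tab-delimited row per key. It assumes the mapper emits `key<TAB>count` lines and that a per-key sum is what you want to report on; both are illustrative assumptions. It is written in Python, but the same stdin/stdout contract applies to a C# executable used as a streaming reducer.

```python
import sys
from itertools import groupby
from typing import Iterable, Iterator, Tuple


def parse(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Split each 'key<TAB>count' line produced by the (assumed) mapper."""
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        key, _, value = line.partition("\t")
        yield key, int(value)


def reduce_counts(lines: Iterable[str]) -> Iterator[str]:
    """Sum counts per key; Hadoop streaming delivers reducer input sorted by key."""
    for key, group in groupby(parse(lines), key=lambda kv: kv[0]):
        total = sum(count for _, count in group)
        # One tab-delimited row per key, easy to export as a table row
        yield f"{key}\t{total}"


if __name__ == "__main__":
    for row in reduce_counts(sys.stdin):
        print(row)
```

The resulting output directory could then be handed to `sqoop export` through `--export-dir`, together with `--input-fields-terminated-by '\t'` so that each column lands in the right field of the target table.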

Alternatively, I am working on a connector between Hive and Azure Tables. It currently lets you read from Azure Tables into Hive (by way of an external table), and write support should be added shortly.
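If you instead write the reducer's results to Azure Tables yourself, every row has to become an entity with string `PartitionKey` and `RowKey` properties, and those keys may not contain `/`, `\`, `#`, `?` or control characters. The sketch below is a separate approach from the connector described above. It only shows that mapping step, under the assumptions that the reducer emits `key<TAB>total` rows and that one partition per job run suits your queries. The `partition` argument and the `Total` property name are hypothetical. The actual upload would go through an Azure SDK (for example `TableClient.upsert_entity` in the `azure-data-tables` package) and is not shown.

```python
from typing import Dict, Iterable, Iterator

# Characters Azure Table storage does not allow in PartitionKey/RowKey values
FORBIDDEN_KEY_CHARS = set("/\\#?")


def sanitize_key(value: str) -> str:
    """Replace characters invalid in PartitionKey/RowKey (incl. control chars) with '_'."""
    return "".join(
        "_" if ch in FORBIDDEN_KEY_CHARS or ord(ch) < 0x20 or 0x7F <= ord(ch) <= 0x9F else ch
        for ch in value
    )


def to_entities(lines: Iterable[str], partition: str) -> Iterator[Dict[str, object]]:
    """Turn 'key<TAB>total' reducer rows into Azure Table entity dicts."""
    pk = sanitize_key(partition)  # hypothetical: one partition per job run
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        key, _, total = line.partition("\t")
        yield {
            "PartitionKey": pk,
            "RowKey": sanitize_key(key),
            "Total": int(total),  # hypothetical property name
        }
```

Choosing the partition key well matters more than it looks: Azure Tables queries are cheapest when they target a single partition, so it should follow the way you intend to report on the data.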

Licensed under: CC-BY-SA with attribution