Question

I am trying to extract informative patterns from large volumes of transactional data.

Typically my data is a set of records with well-defined columns (like sender, receiver, amount, currency, address, etc.; I have around 40-50 different columns). The data volume will be multiple millions (maybe hundreds of millions) of records, and my aim is to generate informative transactional patterns from it, such as: who purchases a particular item the most, which recipients receive the highest transaction volume, expense patterns, which recipients get the most transactions from the same sender, etc.
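Most of the patterns listed above boil down to group-by aggregations: sum an amount per key, or count occurrences of a key such as a (sender, receiver) pair. The sketch below shows two of them in plain single-process Python. The field names and sample records are hypothetical stand-ins for the real 40-50 column schema; at hundreds of millions of rows this in-memory approach would not fit on one machine, which is exactly the scaling problem described here.

```python
from collections import Counter, defaultdict

# Hypothetical sample records; the real data would have 40-50 columns.
transactions = [
    {"sender": "alice", "receiver": "shop_a", "amount": 120.0},
    {"sender": "bob", "receiver": "shop_a", "amount": 80.0},
    {"sender": "alice", "receiver": "shop_a", "amount": 40.0},
    {"sender": "alice", "receiver": "shop_b", "amount": 300.0},
]

def volume_by_receiver(records):
    """Total amount received per receiver, highest total first."""
    totals = defaultdict(float)
    for r in records:
        totals[r["receiver"]] += r["amount"]
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)

def top_sender_receiver_pairs(records, n=3):
    """Most frequent (sender, receiver) pairs by number of transactions."""
    pairs = Counter((r["sender"], r["receiver"]) for r in records)
    return pairs.most_common(n)

if __name__ == "__main__":
    print(volume_by_receiver(transactions))
    print(top_sender_receiver_pairs(transactions))
```

In SQL these would be `GROUP BY receiver` and `GROUP BY sender, receiver` queries; the question is where to run them once the data no longer fits a single database comfortably.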

Earlier I was planning to load the data into a relational database (Oracle/MySQL) and write complex SQL queries to fetch this information, but given the volume I saw during my proof of concept, it doesn't seem very scalable.

I was trying to get more information on distributed data processing using Hadoop, etc. I have just started reading about Hadoop; from my initial understanding, Hadoop is well suited for unstructured data processing and might not be very useful for relational data processing.

Any pointers/suggestions on open source technology that I can quickly experiment with?


Solution

Hadoop can be used for both structured and unstructured data processing. However, it is not a database: it does not maintain relationships or indexes the way a traditional RDBMS does.
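To illustrate that Hadoop's MapReduce model handles structured records fine, here is a minimal single-process simulation of one job computing total volume per recipient. The CSV layout, sample rows, and function names are hypothetical; the three functions only mirror the map, shuffle/sort, and reduce phases that Hadoop would run distributed across a cluster.

```python
import csv
import io
from itertools import groupby
from operator import itemgetter

# Hypothetical CSV layout: sender,receiver,amount,currency
RAW = """sender,receiver,amount,currency
alice,shop_a,120.00,USD
bob,shop_a,80.00,USD
alice,shop_b,300.00,USD
carol,shop_b,50.00,USD
"""

def mapper(line_iter):
    """Map phase: parse each structured record and emit (receiver, amount)."""
    for row in csv.DictReader(line_iter):
        yield row["receiver"], float(row["amount"])

def shuffle(pairs):
    """Shuffle/sort phase: Hadoop sorts mapper output by key before reducing."""
    return sorted(pairs, key=itemgetter(0))

def reducer(sorted_pairs):
    """Reduce phase: all values for one key arrive together, so sum them."""
    for key, group in groupby(sorted_pairs, key=itemgetter(0)):
        yield key, sum(amount for _, amount in group)

def run_job(text):
    """Chain the phases over an in-memory CSV string."""
    return dict(reducer(shuffle(mapper(io.StringIO(text)))))

if __name__ == "__main__":
    print(run_job(RAW))
```

In an actual Hadoop Streaming job the mapper and reducer would be separate scripts reading stdin and writing tab-separated key/value lines, with the framework performing the sort between them; this version only shows the data flow.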

With millions of rows, HBase or Cassandra, with or without Hive on top, can be used for batch querying. Batch querying in Hadoop has been around for some time and is mature.

For interactive querying, Drill or Impala can be used. Note that Drill development has just started and is in the incubator stage, while Impala has just been announced by Cloudera. Here is some interesting info on real-time engines.

Note that there are a lot of other open source frameworks that might fit the requirements, but only a couple of them are mentioned here. The appropriate framework has to be chosen based on a detailed requirements analysis and the pros and cons of the different frameworks.

Licensed under: CC-BY-SA with attribution