Question

Is there a way to replicate data from SQL Server to Hadoop, similar to native transactional replication between two SQL Server databases?

I am not sure whether Microsoft has devised a mechanism by which incremental data can be replicated from SQL Server to Hadoop in real time from the SQL Server transaction logs.

Any response will be appreciated.

Solution

I am trying to do the same thing with CDC (Change Data Capture). You can try Talend's native CDC approach.

You can download the Hortonworks – Talend sandbox from http://www.talend.com/talend-big-data-sandbox

OTHER TIPS

I don't know of a feature similar to what you're looking for, but there are a few things you should consider:

  1. If you're using plain Hadoop (HDFS + MapReduce), you should copy data in big chunks (64 MB, 128 MB, or 256 MB; generally speaking, the size of your HDFS blocks).

  2. If you want real-time data insertion into Hadoop, you should consider HBase (which complicates things at both the IT level and the programming level).

  3. In addition to data insertion, do you also want to replicate changes to data (i.e. updates and deletes)? If so, your only option would be HBase.

  4. I would try to use CDC + code (either in CLR stored procedures or in SSIS) to implement such a mechanism.
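To make points 1 and 4 a bit more concrete, here is a rough sketch of the "code" half of "CDC + code": grouping serialized change records into batches of roughly one HDFS block before shipping them. The names `batch_by_size` and `target_bytes` are made up for illustration, and reading the rows from the CDC change tables and writing each batch to HDFS are deliberately left out, so treat this as a starting point under those assumptions rather than a finished pipeline.

```python
from typing import Iterable, Iterator, List


def batch_by_size(records: Iterable[bytes], target_bytes: int) -> Iterator[List[bytes]]:
    """Group serialized change records into batches of at most target_bytes.

    Hypothetical helper: `records` stands in for rows read from a CDC
    change table and already serialized to bytes. A single record larger
    than target_bytes is emitted as its own batch.
    """
    batch: List[bytes] = []
    size = 0
    for rec in records:
        # Flush the current batch if adding this record would exceed the target.
        if batch and size + len(rec) > target_bytes:
            yield batch
            batch, size = [], 0
        batch.append(rec)
        size += len(rec)
    # Emit whatever is left over as a final, possibly small, batch.
    if batch:
        yield batch


if __name__ == "__main__":
    # Tiny demo with a 100-byte "block" instead of a real 128 MB one.
    demo = [b"a" * 40, b"b" * 40, b"c" * 40, b"d" * 10]
    for i, chunk in enumerate(batch_by_size(demo, target_bytes=100)):
        print(i, len(chunk), sum(len(r) for r in chunk))
```

In a real setup you would pass something like `128 * 1024 * 1024` as `target_bytes` (or whatever your HDFS block size is) and write each yielded batch out as one file. The trade-off is latency: bigger batches suit plain HDFS better, but changes take longer to show up on the Hadoop side.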

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow