Question

I have 100M datasets in Oracle and am trying to import all of them into Neo4j with Talend. Since these datasets are updated every day, how can I make sure Talend imports only the ones that do not already exist in the Neo4j database? In other words, Talend should import only the new or updated datasets.

For example, suppose Neo4j currently contains 38890, 38891, 38892. In Oracle, the updated datasets are 38890, 38891, 38892, 38893. The expected result is that only 38893 is imported.

The data is very large, so it does not seem efficient to import everything into Neo4j every day and then delete the duplicates. Could anyone help me out, please? Thanks in advance.


Solution

You should do two loads: one initial full load, as you do now, and a separate one for the daily incremental loads.

Check your primary keys and find a SELECT query that returns only the new or modified rows. You also need a second query that shows which rows have been deleted or modified, because you need to remove those rows from Neo4j before adding the new or modified ones.
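As a rough illustration of what those two queries have to produce, here is a small Python sketch that compares the keys on both sides. The function name `diff_keys` and the in-memory sets are assumptions made for the example; with 100M rows you would more realistically push the comparison into SQL or work in key-range chunks.

```python
def diff_keys(source_keys, target_keys):
    """Return (to_insert, to_delete) for a source and a target key collection.

    to_insert: keys present in the source (Oracle) but missing in the target (Neo4j).
    to_delete: keys present in the target but no longer in the source.
    """
    source = set(source_keys)
    target = set(target_keys)
    to_insert = sorted(source - target)
    to_delete = sorted(target - source)
    return to_insert, to_delete


# Keys taken from the example in the question.
oracle_keys = [38890, 38891, 38892, 38893]
neo4j_keys = [38890, 38891, 38892]

to_insert, to_delete = diff_keys(oracle_keys, neo4j_keys)
print(to_insert)  # [38893]
print(to_delete)  # []
```

Note that a key-only comparison finds new and deleted rows but not modified ones; for those you need something like a timestamp or checksum column.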

To run this automatically, right-click your job and select "Export Job". Talend builds the job into a Java JAR file with .sh and .bat launchers. You can then run it daily with the Windows Task Scheduler, or with cron if you have a Linux server.

Other suggestions

You most likely have an updated timestamp on your tables in Oracle, so I would use it to select only the data that was updated since the last import. That would be much less data, e.g. 1-5M rows.
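A minimal sketch of that timestamp filter, using Python's built-in `sqlite3` as a stand-in for Oracle so the example runs anywhere. The table name `datasets`, the column `updated_at`, the dates, and the helper `fetch_changed_rows` are all made up for illustration; your schema will differ.

```python
import sqlite3


def fetch_changed_rows(conn, last_import):
    """Return rows whose updated_at is strictly later than the last import."""
    cur = conn.execute(
        "SELECT id, name, updated_at FROM datasets "
        "WHERE updated_at > ? ORDER BY updated_at",
        (last_import,),  # bound parameter, never string concatenation
    )
    return cur.fetchall()


# In-memory stand-in for the Oracle table (hypothetical schema and data).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE datasets (id INTEGER PRIMARY KEY, name TEXT, updated_at TEXT)"
)
conn.executemany(
    "INSERT INTO datasets VALUES (?, ?, ?)",
    [
        (38890, "a", "2024-01-01T00:00:00"),
        (38891, "b", "2024-01-01T00:00:00"),
        (38892, "c", "2024-01-01T00:00:00"),
        (38893, "d", "2024-01-02T08:30:00"),
    ],
)

# Timestamp remembered from the previous run (ISO 8601 strings sort correctly).
last_import = "2024-01-01T12:00:00"
changed = fetch_changed_rows(conn, last_import)
print([row[0] for row in changed])  # [38893]

# Store the newest timestamp seen as the starting point for the next run.
new_watermark = max(row[2] for row in changed) if changed else last_import
```

The important part is persisting `new_watermark` between runs (a file, a small control table, or a Talend context variable), so each daily job starts where the previous one stopped.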

For those entries, create a unique constraint and then use Cypher's MERGE, which acts as a get-or-create.

Make sure to use parameters when updating the data, whether you go through the embedded or the server API:

FOREACH (p in {people} |
   MERGE (person:Person {name: p.name})
   ON CREATE SET person.age = p.age, ...
)
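With a few million changed rows per day you would probably not send them all in one parameter. The sketch below splits the rows into batches and pairs each batch with a parameterized MERGE statement. It uses UNWIND instead of the FOREACH shown above, and the `$people` parameter form used by newer Neo4j versions instead of the older `{people}` form; the helper `make_batches`, the batch size, and the sample rows are assumptions for the example. Nothing is sent to a database here; each pair would be handed to whatever driver or Talend component you use.

```python
# Parameterized get-or-create statement; the row data travels as a parameter.
MERGE_QUERY = (
    "UNWIND $people AS p "
    "MERGE (person:Person {name: p.name}) "
    "ON CREATE SET person.age = p.age"
)


def make_batches(rows, batch_size):
    """Yield (query, parameters) pairs, one per batch of at most batch_size rows."""
    for start in range(0, len(rows), batch_size):
        yield MERGE_QUERY, {"people": rows[start:start + batch_size]}


# Hypothetical rows, e.g. the output of the incremental SELECT.
people = [
    {"name": "Ann", "age": 31},
    {"name": "Bob", "age": 45},
    {"name": "Cid", "age": 27},
]

batches = list(make_batches(people, batch_size=2))
print(len(batches))  # 2
```

Because the query text never changes, the database can reuse its execution plan across batches, which is the main reason to use parameters rather than building a new string per row.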
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow