Neo4j Loading big data: Data Structures, Matrix vs Json

Question

Please take a look at one of our new features in the latest milestone of Neo4j, Cypher's LOAD CSV clause.

http://docs.neo4j.org/chunked/milestone/import-importing-data-from-a-single-csv-file.html

Generate a CSV file from the document you are analyzing that contains each unique word and its frequency. Push that CSV file to a location that can be accessed by HTTP GET from the Neo4j database server.

The Cypher query to import that file will look like this:

USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM
    "http://localhost:8888/csv/docid-ABCD-0000001.csv"
AS csvLine
MERGE (doc:Document { id: csvLine.document_id })
MERGE (word:Word { word: csvLine.word })
// LOAD CSV reads every field as a string, so convert the frequency to a number
MERGE (doc)-[:HAS_WORD { weight: toInt(csvLine.word_frequency) }]->(word)

For each row of the CSV file, this query gets or creates the document node and the word node, then connects the two with a HAS_WORD relationship whose weight property holds the frequency of that word in the document.

The header row of the CSV file would be: document_id,word,word_frequency (written without spaces after the commas, so the column names match the keys used in the query).
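As a rough illustration of the CSV generation step, here is a minimal Python sketch that counts the unique words in a document and writes them out with the header described above. The function names (word_frequencies, write_word_csv) and the tokenization rule (lowercase, split on anything that is not a letter, digit, or apostrophe) are my own assumptions, so adapt them to however you already extract words.

```python
import csv
import io
import re
from collections import Counter


def word_frequencies(text):
    """Count each unique word. Lowercasing and this regex are assumptions."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return Counter(words)


def write_word_csv(document_id, text, out):
    """Write one row per unique word, using the header the query expects."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["document_id", "word", "word_frequency"])
    # Sort by word so the output is stable between runs.
    for word, count in sorted(word_frequencies(text).items()):
        writer.writerow([document_id, word, count])


if __name__ == "__main__":
    # Write to an in-memory buffer here; pass an open file object to write to disk.
    buffer = io.StringIO()
    write_word_csv("docid-ABCD-0000001", "the cat saw the other cat", buffer)
    print(buffer.getvalue(), end="")
```

To test locally, one option is to save the output as csv/docid-ABCD-0000001.csv and run "python -m http.server 8888" from the parent directory, which would serve it at the URL used in the query.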

Note: You must download the latest milestone of Neo4j (2.1.0-M01) to use LOAD CSV as of the time I'm posting this. It's not advised to use milestones for production applications.