Split Neo4j Cypher query into smaller queries
Question
I'm trying to extract some data from my Neo4j database to a file using R.
This is what the code looks like:
library('bitops')
library('RCurl')
library('RJSONIO')
query <- function(querystring) {
  h = basicTextGatherer()
  curlPerform(url="localhost:7474/db/data/cypher",
    postfields=paste('query',curlEscape(querystring), sep='='),
    writefunction = h$update,
    verbose = FALSE
  )
  result <- fromJSON(h$value())
  #print(result)
  data <- data.frame(t(sapply(result$data, unlist)))
  print(data)
  names(data) <- result$columns
  data
}
q <-"MATCH (n:`layer_1_SB`)-[r]-> (m) WHERE m:layer_1_SB RETURN n.userid, m.userid LIMIT 18000000"
data <- query(q)
head(data)
dim(data)
names(data)
write.table(data, file = "/home/dataminer/data1.dat", append=FALSE,quote=FALSE,sep=" ",eol="\n", na="NA", dec=".", row.names=FALSE)
This works fine and returns around 147k relationships. However, when I run the same query between two different labels (layer_1 to layer_2), which should return around 18 million relationships, the program loads for a while and then returns NULL. Running the same query in the Neo4j browser and returning the count works, so I'm assuming the problem is the amount of data that R can handle.
The question is: How can I split my query into smaller queries so that my code works?
UPDATE
I tried a query with 10 million relationships and it worked. So now I want to use WITH and ORDER BY to return the first relationships and then the last ones. However, it returns NULL; I believe my query is badly formed:
MATCH (n:'layer_1_SB')-[r]-> (m) WITH n ORDER BY n.userid DESC WHERE m:layer_2_SB RETURN n.userid, m.userid LIMIT 8000000
Solution
You should use the transactional endpoint instead, or at least pass the header X-Stream:true.
Both stream the data from the server, so the result does not eat up the server's memory.
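As a rough illustration, the query function from the question could be pointed at the transactional endpoint along these lines. This is only a sketch: it assumes a Neo4j 2.x server on localhost:7474 without authentication, and the names build_payload and query_tx are made up for this example. The response layout (results, columns, data, row) is the one the transactional endpoint documents, but check it against your server version.

```r
library(RCurl)
library(RJSONIO)

# Hypothetical helper: build the JSON body for the transactional endpoint,
# i.e. {"statements": [{"statement": "<cypher>"}]}.
build_payload <- function(querystring) {
  toJSON(list(statements = list(list(statement = querystring))))
}

# Hypothetical helper: run one Cypher statement through
# /db/data/transaction/commit and return the rows as a data frame.
# Assumes no authentication and the default port.
query_tx <- function(querystring,
                     url = "localhost:7474/db/data/transaction/commit") {
  h <- basicTextGatherer()
  curlPerform(url = url,
              postfields = build_payload(querystring),
              httpheader = c(Accept = "application/json",
                             `Content-Type` = "application/json"),
              writefunction = h$update,
              verbose = FALSE)
  result <- fromJSON(h$value())
  res <- result$results[[1]]
  # Each element of res$data carries one result row under "row".
  # Note: unlist() drops NULL values, so rows with missing properties
  # would need extra handling.
  rows <- lapply(res$data, function(d) unlist(d$row))
  data <- data.frame(do.call(rbind, rows), stringsAsFactors = FALSE)
  names(data) <- res$columns
  data
}
```

It would then be called the same way as before, for example data <- query_tx(q). Streaming only relieves the server; R still has to hold the full response in memory, so a very large result may still need to be fetched in several smaller queries.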