Question

So I'm trying to extract some data from my Neo4j database to a file using R.

This is what the code looks like:

library('bitops')
library('RCurl')
library('RJSONIO')

# Send a Cypher query to Neo4j's legacy REST endpoint and parse the JSON reply
query <- function(querystring) {
  h <- basicTextGatherer()
  curlPerform(url = "localhost:7474/db/data/cypher",
              postfields = paste('query', curlEscape(querystring), sep = '='),
              writefunction = h$update,
              verbose = FALSE)
  result <- fromJSON(h$value())
  # Flatten the JSON rows into a data frame, then label the columns
  data <- data.frame(t(sapply(result$data, unlist)))
  print(data)
  names(data) <- result$columns
  data
}

q <- "MATCH (n:`layer_1_SB`)-[r]->(m) WHERE m:layer_1_SB RETURN n.userid, m.userid LIMIT 18000000"
data <- query(q)
head(data)
dim(data)
names(data)
write.table(data, file = "/home/dataminer/data1.dat", append = FALSE, quote = FALSE,
            sep = " ", eol = "\n", na = "NA", dec = ".", row.names = FALSE)

And it works fine, returning around 147k relationships. However, when I run the same query between two different labels (layer_1 to layer_2), which should return around 18 million relationships, the program loads for a while and then returns NULL. Running the same query and returning the count in the Neo4j browser works, so I'm assuming the problem has to do with the amount of data that R can handle.

The question is: How can I split my query into smaller queries so that my code works?

UPDATE: I tried doing a query with 10 million rels and it worked. So now I want to use WITH and ORDER BY to return first the first and then the last relationships. However, it's returning NULL; I believe my query is badly formatted:

MATCH (n:'layer_1_SB')-[r]-> (m) WITH n ORDER BY n.userid DESC WHERE m:layer_2_SB RETURN n.userid, m.userid LIMIT 8000000
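(Editor's note: two things stand out in the query above. Cypher delimits label names with backticks, not single quotes, and a WHERE clause must directly follow its MATCH (or WITH) rather than an ORDER BY; also, `m` is dropped from scope by `WITH n`. A corrected form, which could also be paged with SKIP/LIMIT to split the result into smaller batches, might look like:)

```cypher
MATCH (n:`layer_1_SB`)-[r]->(m)
WHERE m:layer_2_SB
WITH n, m ORDER BY n.userid DESC
RETURN n.userid, m.userid
SKIP 0 LIMIT 8000000
```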


Solution

You should use the transactional endpoint instead, or at least pass the header X-Stream: true.

Both stream data from the server instead of building the whole response in memory first, so it doesn't eat up the server's memory.
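As a sketch, the asker's function only needs one extra argument to send that header with RCurl (this assumes the same Neo4j 2.x legacy `/db/data/cypher` endpoint used in the question; the function name `query_stream` is just for illustration):

```r
library('RCurl')
library('RJSONIO')

# Same as the original query() function, but with X-Stream: true so the
# server streams the result set instead of materializing it in memory.
query_stream <- function(querystring) {
  h <- basicTextGatherer()
  curlPerform(url = "localhost:7474/db/data/cypher",
              httpheader = c('X-Stream' = 'true'),
              postfields = paste('query', curlEscape(querystring), sep = '='),
              writefunction = h$update,
              verbose = FALSE)
  result <- fromJSON(h$value())
  data <- data.frame(t(sapply(result$data, unlist)))
  names(data) <- result$columns
  data
}
```

Note that streaming only relieves the server side; R still has to hold the parsed data frame, so paging the query with SKIP/LIMIT may still be needed for very large result sets.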

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow