Question

I have a Neo4j Enterprise database running on a DigitalOcean VPS with 8 GB RAM and an 80 GB SSD. The performance of the Neo4j instance is awful at the moment:

match (n) where n.gram='0gram' AND n.word=~'a.' return n.word LIMIT 5 @ 349ms 
match (n) where n.gram='0gram' AND n.word=~'a.*' return n.word LIMIT 25 @ 1588ms

I understand regexes are expensive, but on similar queries where I replace the 'a.' or 'a.*' part with any other letter, Neo4j simply crashes. Before that, I can see memory usage building up (towards 90%) and the CPU skyrocketing.

My Neo4j is populated as follows:

Number Of Relationship Type Ids In Use: 1,
Number Of Node Ids In Use: 172412046,
Number Of Relationship Ids In Use: 172219328,
Number Of Property Ids In Use: 344453742

The VPS only runs Neo4j (on Debian 7/amd64). I use the NUMA and parallelGC flags as they're supposed to be faster. I've been tweaking my RAM settings, and although it doesn't crash as often now, I have a feeling there are still some gains to be made:

neostore.nodestore.db.mapped_memory=1024M
neostore.relationshipstore.db.mapped_memory=2048M
neostore.propertystore.db.mapped_memory=6144M
neostore.propertystore.db.strings.mapped_memory=512M
neostore.propertystore.db.arrays.mapped_memory=512M

# caching
cache_type=hpc
node_cache_array_fraction=7
relationship_cache_array_fraction=5
# node_cache_size=3G
# relationship_cache_size=1G  --> these throw a not-enough-heap-mem error

The data is essentially a series of trees. Only node0 needs a full-text search; the nodes below it are searched by a property holding floating-point values.

node0 -REL-> node0.1 -REL-> node0.1.1 ... node0.1.1.1.1
      \
       -REL-> node0.2 -REL-> node0.2.1 ... node0.2.1.1

There are approx. 5,000 top nodes like node0.

Should I reconfigure my memory/cache usage, or should I just add more RAM?

--- Edit on Indexes ---

Because all trees of nodes are always 4 levels deep, each level has a label for quick lookup. In this case all node0 nodes have a label (called 0gram). The n.gram='0gram' condition should use the index coupled to the label.

--- Edit on new Config ---

I upgraded the VPS to 16 GB. On the SSD, the nodeStore takes 2.3 GB (11%), the propertyStore 13.8 GB (64%) and the relationshipStore 5.6 GB (26%). On this basis I created a new config (detailed above). I'm waiting for the full set of queries and will do some additional testing in the meantime.


Solution

Yes, you need to create an index. What's your label called? Imagine it being called :NGram:

create index on :NGram(gram);

match (n:NGram) where n.gram='0gram' AND n.word=~'a.' return n.word LIMIT 5

match (n:NGram) where n.gram='0gram' AND n.word=~'a.*' return n.word LIMIT 25

What you're doing is not a graph search but a lookup via full scan plus property comparison with a regexp, which is not a very efficient operation. What you need is full-text search (which is not supported by the new schema indexes, but still is by the legacy indexes).

Could you run this query (after you have created the index) and say what count it returns?

match (n:NGram) where n.gram='0gram' return count(*)

which is equivalent to

match (n:NGram {gram:'0gram'}) return count(*)

I wrote a blog post about it a few days ago, please read it and see if it applies to your case.

How big is your Neo4j database on disk? What is the configured heap size (in neo4j-wrapper.conf)?

As you can see, you use more RAM than your machine has (not even counting OS or filesystem caches).
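To make that arithmetic explicit, here is a minimal Python sketch that adds up the mapped_memory values copied from the question's config. The heap size is not stated in the question, so it is left out. Both the original 8 GB and the upgraded 16 GB machine are shown, since it is not clear which one the posted config was written for.

```python
# Memory-mapped I/O settings copied from the question's config, in MB.
mapped_mb = {
    "neostore.nodestore.db": 1024,
    "neostore.relationshipstore.db": 2048,
    "neostore.propertystore.db": 6144,
    "neostore.propertystore.db.strings": 512,
    "neostore.propertystore.db.arrays": 512,
}

total_mb = sum(mapped_mb.values())

# RAM sizes mentioned in the question: 8 GB originally, 16 GB after the upgrade.
for ram_gb in (8, 16):
    ram_mb = ram_gb * 1024
    remaining_mb = ram_mb - total_mb
    # A negative remainder means mmio alone already exceeds physical RAM.
    print(f"{ram_gb} GB machine: mmio {total_mb} MB, "
          f"left for heap, OS and filesystem cache: {remaining_mb} MB")
```

On the 8 GB machine the remainder is negative before the JVM heap is even counted. On the 16 GB machine, whatever is left has to cover the heap, the operating system and the filesystem cache.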

So you would have to reduce the mmio sizes, e.g. to 500M for nodes, 2G for rels and 1G for properties.

Look at your store-file sizes and set mmio accordingly.
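One way to look at the store-file sizes is a small script like the following sketch. It assumes the store files carry the same names as the mapped_memory keys in the question's config, and that the database lives in data/graph.db. Both are assumptions; adjust them to your installation.

```python
import os

# Store file names, assumed to match the *.mapped_memory keys in the config.
STORE_FILES = [
    "neostore.nodestore.db",
    "neostore.relationshipstore.db",
    "neostore.propertystore.db",
    "neostore.propertystore.db.strings",
    "neostore.propertystore.db.arrays",
]


def store_sizes_mb(store_dir):
    """Return {file name: size in MB} for the store files found in store_dir."""
    sizes = {}
    for name in STORE_FILES:
        path = os.path.join(store_dir, name)
        if os.path.isfile(path):
            sizes[name] = os.path.getsize(path) / (1024 * 1024)
    return sizes


def print_report(store_dir):
    """Print each store file's size next to the config key it relates to."""
    for name, size_mb in store_sizes_mb(store_dir).items():
        print(f"{name}: {size_mb:.0f} MB on disk -> {name}.mapped_memory")


if __name__ == "__main__":
    # Hypothetical location of the database directory.
    print_report("data/graph.db")
```

The output only tells you how large each file is. How much of each file to map still depends on how much RAM remains after the heap and the OS are accounted for.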

OTHER TIPS

Depending on the number of nodes having n.gram='0gram', you might benefit a lot from setting a label on them and creating an index on the gram property. With this in place, an index lookup will directly return all 0gram nodes and apply regex matching only to those. Your current statement loads each and every node from the db and inspects its properties.
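The difference can be illustrated with a toy in-memory analogy in Python. This is not how Neo4j is implemented; the node list and the dictionary standing in for the index are made up for illustration. It only shows that the regex runs against far fewer candidates once the gram lookup is served by an index. In the question's data that would be roughly 5,000 top nodes instead of about 172 million.

```python
import re

# Made-up sample data standing in for nodes with 'gram' and 'word' properties.
nodes = [
    {"gram": "0gram", "word": "as"},
    {"gram": "0gram", "word": "apple"},
    {"gram": "1gram", "word": "at"},
    {"gram": "0gram", "word": "be"},
]

# Cypher's =~ has to match the whole string, hence fullmatch below.
pattern = re.compile(r"a.")


def full_scan(all_nodes):
    """Inspect every node, as a query without label and index would."""
    inspected = 0
    matches = []
    for node in all_nodes:
        inspected += 1
        if node["gram"] == "0gram" and pattern.fullmatch(node["word"]):
            matches.append(node["word"])
    return matches, inspected


def build_index(all_nodes):
    """Group nodes by their gram value; a stand-in for an index on gram."""
    index = {}
    for node in all_nodes:
        index.setdefault(node["gram"], []).append(node)
    return index


def indexed_lookup(index):
    """Fetch the 0gram nodes via the index and run the regex only on those."""
    candidates = index.get("0gram", [])
    matches = [n["word"] for n in candidates if pattern.fullmatch(n["word"])]
    return matches, len(candidates)


scan_result = full_scan(nodes)
index_result = indexed_lookup(build_index(nodes))
print("full scan:", scan_result)
print("indexed:  ", index_result)
```

Both approaches return the same words; only the number of nodes touched differs.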

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow