Question

I need to connect a node of Type "Application" with users created on the systems the application is installed on.

Normally a "Application" is installed on cluster pair (2 systems) and has only the default system as property "n.System".

A "Application" has a naming schema: <prefix><number of 5 digits>, e.g. yxz12345 or ab23456, etc.

On each system are user accounts created (sometimes up to 100 on each system). Some of them have a naming schema like the "Application": <prefix><number of 5 digits>, e.g. sdjhg12345 or tzrw23456, etc. some are not.

An "Application" as a "User" property that can contain the user its running on u.Name = n.User OR it uses all "Users" that have the same 5 digits after its prefix right(u.Name, 5) = right(n.Name, 5).

Usernames are shared across all systems, so we only need to link the users that are on the same systems.

I'm using following query to create the relationship:

MATCH (n:Application {Id: 1})
WITH n
MATCH (s:System)-[:ClusteredWith]-(c:System)
WHERE s.Name = n.System
WITH n, s, c
MATCH (u:User)
WHERE 
    ((u)-[:CreatedOn]->(s) OR (u)-[:CreatedOn]->(c))
AND
    (u.Name = n.User OR right(u.Name, 5) = right(n.Name, 5))

CREATE UNIQUE (u)-[:UsedFor]->(n)

There're 8000 Systems, 100000 Users and 30000 Applications in the neo4j database currently.

I've auto property index on Id, Name, User

This query is extremly slow on a very powerful hardware (Up to 96 GB RAM, etc). I'm using the Neo4jClient version 1.0.0.646 and Neo4j 2.0.1

How to get this query fast?

EDIT: Query Plan Added:

==> EmptyResult(_rows=0, _db_hits=0)
==> UpdateGraph(commands=[{"action": "CreateUnique", "identifiers": ["u", "n", "  UNNAMED305"]}], _rows=0, _db_hits=0)
==>   Eager(_rows=0, _db_hits=0)
==>     Filter(pred="((nonEmpty(PathExpression((u)-[  UNNAMED165:CreatedOn]->(s), true)) OR nonEmpty(PathExpression((u)-[  UNNAMED196:CreatedOn]->(c), true))) AND (Property(u,Name(0)) == Property(n,User(33)) OR RightFunction(Property(u,Name(0)),Literal(5)) == RightFunction(Property(n,Name(0)),Literal(5))))", _rows=0, _db_hits=29774466)
==>       NodeByLabel(identifier="u", _db_hits=0, _rows=4962411, label="User", identifiers=["u"], producer="NodeByLabel")
==>         ColumnFilter(symKeys=["n", "c", "s", "  UNNAMED58"], returnItemNames=["n", "s", "c"], _rows=183, _db_hits=0)
==>           Filter(pred="(Property(s,Name(0)) == Property(n,System(36)) AND hasLabel(s:System(0)))", _rows=183, _db_hits=366)
==>             SimplePatternMatcher(g="(c)-['  UNNAMED58']-(s)", _rows=183, _db_hits=4880)
==>               NodeByLabel(identifier="c", _db_hits=0, _rows=2915, label="System", identifiers=["c"], producer="NodeByLabel")
==>                 Filter(pred="Property(n,Id(0)) == Literal(1)", _rows=1, _db_hits=702)
==>                   NodeByLabel(identifier="n", _db_hits=0, _rows=702, label="Application", identifiers=["n"], producer="NodeByLabel")

This is an query for an application on 2 systems but without a matching user (currently)

Was it helpful?

Solution

Profile your query (how?) and review the execution plan with the following in mind:

  1. Is :Application(Id) unique?
  2. How many database hits are needed for the comparison :Application(System) = :System(Name)? Could they be avoided entirely by relating (:Application)-[:InstalledOn]->(:System)?
  3. Do you have duplicate paths/result items bound at WITH n, s, c?
    • ~ The pattern (s:System)-[:ClusteredWith]-(c:System) is symmetric
    • ~ Duplicates mean the rest of the query is executed multiple times
  4. How many users are matched? Is the index being used? It may not be since the WHERE clause is complex, possibly too complex for the engine to refactor to an exact lookup. Can you refactor it to allow an initial index lookup, or even better, reach the users via (:System)<-[:CreatedOn]-(:User) to avoid handling all the irrelevant users?

My guess is the big bottle neck is the user lookup. If you also have more than one result item at that point in the query, the problem is compounded. The following is guesswork, but maybe you can adapt it and get it to work.

MATCH (app:Application {Id: 1})
MATCH (:System {Name:app.System})-[:ClusteredWith*0..1]-(sys:System)<-[:CreatedOn]-(user)
WHERE  (user.Name = app.User OR right(user.Name, 5) = right(app.Name, 5))
CREATE UNIQUE (user)-[:UsedFor]->(app)
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top