I need to connect a node of type "Application" to the users created on the systems the application is installed on.
Normally an "Application" is installed on a cluster pair (2 systems) and stores only the default system in its property "n.System".
An "Application" follows the naming schema <prefix><number of 5 digits>, e.g. yxz12345 or ab23456.
On each system, user accounts are created (sometimes up to 100 per system). Some of them follow the same naming schema as the "Application", <prefix><number of 5 digits>, e.g. sdjhg12345 or tzrw23456; others do not.
An "Application" has a "User" property that can contain the user it is running as (u.Name = n.User), OR it uses all "Users" that have the same 5 digits after their prefix (right(u.Name, 5) = right(n.Name, 5)).
Usernames are shared across all systems, so we only need to link the users that are on the same systems as the application.
I'm using the following query to create the relationship:
MATCH (n:Application {Id: 1})
WITH n
MATCH (s:System)-[:ClusteredWith]-(c:System)
WHERE s.Name = n.System
WITH n, s, c
MATCH (u:User)
WHERE
((u)-[:CreatedOn]->(s) OR (u)-[:CreatedOn]->(c))
AND
(u.Name = n.User OR right(u.Name, 5) = right(n.Name, 5))
CREATE UNIQUE (u)-[:UsedFor]->(n)
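To check which users the rule picks up before writing anything, the same matching rule can be written as a read-only query. This is only a sketch and has not been profiled: the identifier `sys` and the `*0..1` step are my own additions, and it assumes that cluster partners are linked by a single ClusteredWith relationship, as in the query above.

```cypher
// Read-only sketch: lists the users the rule would link, creates nothing.
MATCH (n:Application {Id: 1})
WITH n
// s is the application's default system; *0..1 makes sys either s itself
// or its direct cluster partner (assumed: one ClusteredWith hop per pair).
MATCH (s:System)-[:ClusteredWith*0..1]-(sys:System)<-[:CreatedOn]-(u:User)
WHERE s.Name = n.System
  AND (u.Name = n.User OR right(u.Name, 5) = right(n.Name, 5))
RETURN DISTINCT sys.Name AS systemName, u.Name AS userName
```

Here the users are reached through their CreatedOn relationships rather than by checking every User node against the pattern.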
There are currently 8000 Systems, 100000 Users and 30000 Applications in the Neo4j database.
I have automatic property indexing on Id, Name and User.
This query is extremely slow, even on very powerful hardware (up to 96 GB RAM, etc.).
I'm using Neo4jClient version 1.0.0.646 and Neo4j 2.0.1.
How can I make this query fast?
EDIT: Query Plan Added:
==> EmptyResult(_rows=0, _db_hits=0)
==> UpdateGraph(commands=[{"action": "CreateUnique", "identifiers": ["u", "n", " UNNAMED305"]}], _rows=0, _db_hits=0)
==> Eager(_rows=0, _db_hits=0)
==> Filter(pred="((nonEmpty(PathExpression((u)-[ UNNAMED165:CreatedOn]->(s), true)) OR nonEmpty(PathExpression((u)-[ UNNAMED196:CreatedOn]->(c), true))) AND (Property(u,Name(0)) == Property(n,User(33)) OR RightFunction(Property(u,Name(0)),Literal(5)) == RightFunction(Property(n,Name(0)),Literal(5))))", _rows=0, _db_hits=29774466)
==> NodeByLabel(identifier="u", _db_hits=0, _rows=4962411, label="User", identifiers=["u"], producer="NodeByLabel")
==> ColumnFilter(symKeys=["n", "c", "s", " UNNAMED58"], returnItemNames=["n", "s", "c"], _rows=183, _db_hits=0)
==> Filter(pred="(Property(s,Name(0)) == Property(n,System(36)) AND hasLabel(s:System(0)))", _rows=183, _db_hits=366)
==> SimplePatternMatcher(g="(c)-[' UNNAMED58']-(s)", _rows=183, _db_hits=4880)
==> NodeByLabel(identifier="c", _db_hits=0, _rows=2915, label="System", identifiers=["c"], producer="NodeByLabel")
==> Filter(pred="Property(n,Id(0)) == Literal(1)", _rows=1, _db_hits=702)
==> NodeByLabel(identifier="n", _db_hits=0, _rows=702, label="Application", identifiers=["n"], producer="NodeByLabel")
This is the plan for a query on an application that is installed on 2 systems but currently has no matching user.