How to make CREATE UNIQUE work with a subquery?

https://stackoverflow.com/questions/20002740

20-12-2019

Question

I have a query like this:

MATCH left, right 
WHERE (ID(right) IN [1, 2, 3] AND ID(left) IN [4, 5, 6]) 
WITH left, right 
  LIMIT 1 
RETURN left, right 
UNION MATCH left, right 
WHERE (ID(right) IN [1, 2, 3] AND ID(left) IN [4, 5, 6]) 
WITH left, right 
  SKIP 4 LIMIT 1 
RETURN left, right 
UNION MATCH left, right 
WHERE (ID(right) IN [1, 2, 3] AND ID(left) IN [4, 5, 6]) 
WITH left, right 
  SKIP 8 LIMIT 1 
RETURN left, right
CREATE UNIQUE left-[rel:FRIEND]->right 
RETURN rel;

Basically, I'm just building a dataset so that I can use it later in a CREATE UNIQUE clause.
Obviously, that doesn't work: the query analyzer says I can only use the RETURN clause once. My question is: how do I compose a dataset in this case? I tried assigning an alias and using it in CREATE UNIQUE, but I can't get that to work either. What am I doing wrong? Is this scenario even possible?

Solution

I may misunderstand what you are after, but here's what occurs to me when I look at your query.

To begin with, here's an adaptation of your query that uses SKIP and LIMIT without UNION or multiple RETURN clauses.

MATCH left, right
WHERE ID(left) IN [1,2,3] AND ID(right) IN [4,5,6]
WITH left, right 
    LIMIT 1
CREATE UNIQUE left-[rel:FRIEND]->right
WITH [rel] as rels  // to return the relationship later, put it in a collection and carry it along WITH
MATCH left, right
WHERE ID(left) IN [1,2,3] AND ID(right) IN [4,5,6]
WITH left, right, rels 
    SKIP 4 LIMIT 1
CREATE UNIQUE left-[rel:FRIEND]->right
WITH rels + [rel] as rels
MATCH left, right
WHERE ID(left) IN [1,2,3] AND ID(right) IN [4,5,6]
WITH left, right, rels 
    SKIP 8 LIMIT 1
CREATE UNIQUE left-[rel:FRIEND]->right
WITH rels + [rel] as rels
RETURN LENGTH(rels), rels  // returning the relationships is optional; SKIP/LIMIT does its job even if you return nothing

But this query is a bit wild. It's really three queries, two of which have been artificially squeezed in as subqueries of the first. It matches the same nodes anew in each subquery, and nothing is really gained by running the queries this way rather than separately. It's actually slower, because in each subquery you also match the nodes you know you will not use.
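To see why the chained version does extra work, here is a plain-Python model of its three parts (not Cypher, and not how Neo4j executes anything internally). Each part rebuilds every left/right combination and keeps one row by slicing, which stands in for SKIP/LIMIT; the `rels` list mimics the collection carried along with WITH. The function name `run_part` and the assumed row order are illustrative assumptions only.

```python
from itertools import product

# Hypothetical ids, matching the answer's query (left 1..3, right 4..6).
LEFT_IDS = [1, 2, 3]
RIGHT_IDS = [4, 5, 6]


def run_part(skip: int, limit: int):
    """Model one MATCH ... WITH ... SKIP/LIMIT part of the chained query.

    Returns the rows that survive SKIP/LIMIT and how many rows were matched.
    Assumes rows arrive in (left, right) order, which Cypher does not promise.
    """
    rows = list(product(LEFT_IDS, RIGHT_IDS))  # re-matched from scratch each time
    return rows[skip:skip + limit], len(rows)


rels = []  # plays the role of the `rels` collection
rows_matched = 0
for skip in (0, 4, 8):
    kept, matched = run_part(skip, 1)
    rels = rels + kept  # like `WITH rels + [rel] as rels`
    rows_matched += matched

print(len(rels), rows_matched)  # 3 27
```

Three relationships get created, but 27 candidate rows are produced along the way to find them.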

So my first suggestion is to use START instead of MATCH...WHERE when getting nodes by id. As it stands, the query binds every node in the database as "left" and every node in the database as "right", and only then filters out the nodes bound to "left" and to "right" that don't satisfy the WHERE clause. Since this part of the query is repeated three times, every node in the database is bound a total of six times. That's expensive for creating three relationships. With START you bind just the nodes you want, right away, by their internal id. This doesn't really answer your question, but it makes the query faster and cleaner.

START left = node(1,2,3), right = node(4,5,6)
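A rough way to compare the two binding strategies is to count node bindings. The sketch below is a toy Python model that simply follows the description above (every node bound once as "left" and once as "right", then filtered); the 1,000-node database and the helper names are hypothetical, and Neo4j's real execution cost may differ.

```python
# Hypothetical database of 1,000 nodes with internal ids 0..999.
ALL_IDS = range(1000)
LEFT_IDS = {1, 2, 3}
RIGHT_IDS = {4, 5, 6}


def bind_with_match_where(all_ids, left_ids, right_ids):
    """Toy model of MATCH left, right WHERE ID(left) IN ... AND ID(right) IN ...

    Every node is bound as "left" and again as "right"; WHERE then filters.
    """
    left = [n for n in all_ids if n in left_ids]
    right = [n for n in all_ids if n in right_ids]
    bindings = 2 * len(all_ids)  # every node bound once per identifier
    return left, right, bindings


def bind_with_start(left_ids, right_ids):
    """Toy model of START left = node(...), right = node(...): direct lookup."""
    left = sorted(left_ids)
    right = sorted(right_ids)
    return left, right, len(left) + len(right)


# The chained query repeats the lookup in each of its three parts.
match_cost = sum(
    bind_with_match_where(ALL_IDS, LEFT_IDS, RIGHT_IDS)[2] for _ in range(3)
)
start_cost = sum(bind_with_start(LEFT_IDS, RIGHT_IDS)[2] for _ in range(3))
print(match_cost, start_cost)  # 6000 18
```

Both strategies end up with the same nodes; only the amount of binding differs.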

The second thing I think of is the difference between nodes and 'paths' or 'result items' when you match patterns. When you bind three nodes as "left" and three other nodes as "right", you don't get three result items but nine: for each node bound as "left" there are three results, because there are three possible "right" nodes to combine it with. If you wanted to relate every "left" to every "right", great. But I think what you are looking for are the result items (1),(4); (2),(5); (3),(6). Binding the three "left" nodes and the three "right" nodes in one query with collections of node ids seems convenient, but you then have to do all that filtering to get rid of the six unwanted matches. The query gets complex and cumbersome, and it's actually slower than running the queries separately.

Put another way, (1)-[:FRIEND]->(4) is a distinct pattern, not (relevantly) connected to the other patterns you are creating. It would be different if you wanted to create (1)-[:FRIEND]->(2)<-[:FRIEND]-(3); then you would want to handle those three nodes together. Maybe you're just exploring fringe uses of Cypher, but I thought I should point it out.

By the way, using SKIP and LIMIT in this way is a bit off key: they're not really intended for pattern matching and filtering. It's also unpredictable unless you also use ORDER BY, since there is no guarantee that the results come back in a particular order, so you don't know which result item gets passed on. Anyway, in this case I think it would be better to bind the nodes and create the relationship in three separate queries.

START left = node(1), right = node(4)
CREATE UNIQUE left-[rel:FRIEND]->right
RETURN rel

START left = node(2), right = node(5)
CREATE UNIQUE left-[rel:FRIEND]->right
RETURN rel

START left = node(3), right = node(6)
CREATE UNIQUE left-[rel:FRIEND]->right
RETURN rel

Since you already know that you want those three pairs, and not, say, (1),(4); (1),(5); (1),(6), it makes sense to query for just those pairs, and the easiest way to do that is to query separately.
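The nine-result-items point and the SKIP/LIMIT caveat can be made concrete with a small Python model (an illustration, not Cypher). `product` stands in for binding three "left" and three "right" nodes, a list index stands in for `SKIP k LIMIT 1`, and the alternative row order is an arbitrary assumption chosen to show that nothing pins the order down without ORDER BY. `zip` then produces just the pairs you actually want.

```python
from itertools import product

LEFT_IDS = [1, 2, 3]
RIGHT_IDS = [4, 5, 6]

# Binding three "left" and three "right" nodes gives 3 x 3 = 9 result rows.
rows = list(product(LEFT_IDS, RIGHT_IDS))


def pick(ordered_rows, offsets=(0, 4, 8)):
    """Emulate running `SKIP k LIMIT 1` once for each offset k."""
    return [ordered_rows[k] for k in offsets]


# Sorted by (left, right), offsets 0/4/8 land on the intended pairs...
in_order = pick(sorted(rows))
# ...but with another (equally legal) arrival order they pick different pairs.
other_order = pick(sorted(rows, key=lambda r: (r[0], -r[1])))

# Pairing the i-th left id with the i-th right id yields only the wanted pairs.
wanted = list(zip(LEFT_IDS, RIGHT_IDS))

print(len(rows), in_order, other_order, wanted)
```

Sorting before picking plays the role of ORDER BY here; without it, which pair survives depends entirely on arrival order.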

Third, since the three queries are structurally identical and differ only in a value (the node id, if you consider the id a property), you can simplify them by generalizing what distinguishes them, i.e. by using parameters.

START left = node({leftId}), right = node({rightId})
CREATE UNIQUE left-[rel:FRIEND]->right
RETURN rel

parameters: {leftId:1, rightId:4}, {leftId:2, rightId:5}, {leftId:3, rightId:6}
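If the pairs live in application code, the parameter maps can be generated instead of typed out. A minimal Python sketch, assuming the pairs are held in a list; how each map is then sent to the server depends on your client library and is deliberately left out rather than guessing at its API.

```python
# The node-id pairs from the answer; in practice these would come from your data.
PAIRS = [(1, 4), (2, 5), (3, 6)]

# One query text for every run; only the parameter map changes.
QUERY = (
    "START left = node({leftId}), right = node({rightId}) "
    "CREATE UNIQUE left-[rel:FRIEND]->right "
    "RETURN rel"
)

param_maps = [{"leftId": left, "rightId": right} for left, right in PAIRS]

for params in param_maps:
    # Hand QUERY and params to your Cypher client here (not shown).
    print(params)
```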

Since the structure is identical, Cypher can cache the execution plan. This makes for good performance, and the query is tidy, maintainable, and easy to extend if you later want to do the same operation on other pairs of nodes.
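A toy model shows why identical query text helps. Assuming, as a simplification, that the plan cache is keyed by the exact query string, splicing ids into the text forces a new plan for every pair, while the parameterized text is planned once. `PlanCache` is a hypothetical class for illustration, not part of Neo4j.

```python
class PlanCache:
    """Toy plan cache keyed by exact query text (a simplification)."""

    def __init__(self):
        self.plans = {}
        self.misses = 0

    def execute(self, query, params=None):
        if query not in self.plans:
            self.misses += 1  # unseen text: parse and plan it
            self.plans[query] = f"plan-{self.misses}"
        return self.plans[query]


PAIRS = [(1, 4), (2, 5), (3, 6)]
TEMPLATE = (
    "START left = node({leftId}), right = node({rightId}) "
    "CREATE UNIQUE left-[rel:FRIEND]->right RETURN rel"
)

literal = PlanCache()
parameterized = PlanCache()
for left, right in PAIRS:
    # Ids spliced into the text: every query string is different.
    text = TEMPLATE.replace("{leftId}", str(left)).replace("{rightId}", str(right))
    literal.execute(text)
    # Same text every time; only the parameters differ.
    parameterized.execute(TEMPLATE, {"leftId": left, "rightId": right})

print(literal.misses, parameterized.misses)  # 3 1
```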

Licensed under: CC-BY-SA with attribution