Question

I am trying to find similar movies on the basis of shared tags. I also need all the tags of the given movie and of each similar movie (to do some calculations). Surprisingly, collect(h.w) returns repeated values of h.w (where w is a property of h).

Here is the Cypher query. Please help.

MATCH (m:Movie{id:1})-[h1:Has]->(t:Tag)<-[h2:Has]-(sm:Movie),
(m)-[h:Has]->(t0:Tag), 
(sm)-[H:Has]->(t1:Tag) 
WHERE m <> sm 
RETURN distinct(sm), collect(h.w)

Basically a query like

MATCH (x)-[h]->(y), (a)-[H]->(b) 
RETURN h

returns each result for h n times, where n is the number of results for H. Is there any way around this?

Solution

I replicated the data model for this question to help answer it.

I then set up a sample dataset using Neo4j's online console: http://console.neo4j.org/?id=dakmi3

I ran the following query, adapted from your question:

MATCH (m:Movie { title: "The Matrix" })-[h1:HAS_TAG]->(t:Tag),
      (t)<-[h2:HAS_TAG]-(sm:Movie),
      (m)-[h:HAS_TAG]->(t0:Tag),
      (sm)-[H:HAS_TAG]->(t1:Tag)
WHERE m <> sm
RETURN DISTINCT sm, collect(h.weight)

It returns:

(1:Movie {title:"The Matrix: Reloaded"}) [0.31, 0.12, 0.31, 0.12, 0.31, 0.01, 0.31, 0.01]

The issue is that each relationship h is returned once for every matching combination of the other pattern parts, so its weight appears several times in the collection. The solution is to use WITH DISTINCT to reduce the rows to distinct (sm, h) pairs and then return the collection of weights of those relationships.

MATCH (m:Movie { title: "The Matrix" })-[h1:HAS_TAG]->(t:Tag),
      (t)<-[h2:HAS_TAG]-(sm:Movie),
      (m)-[h:HAS_TAG]->(t0:Tag),
      (sm)-[H:HAS_TAG]->(t1:Tag)
WHERE m <> sm
WITH DISTINCT sm, h
RETURN sm, collect(h.weight)

(1:Movie {title:"The Matrix: Reloaded"}) [0.31, 0.12, 0.01]
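
A variation on the same idea, as a sketch only: the de-duplication can also be done inside the aggregation by collecting the distinct relationships and then reading their weights. It assumes the HAS_TAG type and weight property from the sample dataset above; the alias weights is made up for illustration. Note that collect(DISTINCT h.weight) would not be equivalent, because it would also merge two different tags that happen to carry the same weight.

```cypher
MATCH (m:Movie { title: "The Matrix" })-[h1:HAS_TAG]->(t:Tag),
      (t)<-[h2:HAS_TAG]-(sm:Movie),
      (m)-[h:HAS_TAG]->(t0:Tag),
      (sm)-[H:HAS_TAG]->(t1:Tag)
WHERE m <> sm
// de-duplicate the relationships themselves, then project their weights
RETURN sm, [r IN collect(DISTINCT h) | r.weight] AS weights
```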

Other tips

I'm afraid I still don't quite get your intention, but as for the general question about duplicate results: that is just the way a disconnected pattern works. Cypher must consider something like

(:A), (:B)

as one pattern, not two. That means every combination of nodes that satisfies the whole pattern counts as a distinct match. Suppose you have the graph resulting from

CREATE (:A), (:B), (:B)

and query it for the pattern above, you get two results, namely

neo4j-sh (?)$ MATCH (a:A),(b:B) RETURN *;
==> +-------------------------------+
==> | a             | b             |
==> +-------------------------------+
==> | Node[15204]{} | Node[15207]{} |
==> | Node[15204]{} | Node[15208]{} |
==> +-------------------------------+
==> 2 rows
==> 53 ms

Similarly, when matching your pattern (x)-[h]->(y), (a)-[H]->(b), Cypher considers each combination of the two pattern parts to be a unique match for the one whole pattern, so the number of results for h is multiplied by the number of results for H.
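
To see the multiplication directly, a quick check along these lines may help. It is only a sketch and assumes the database contains nothing but the three nodes created above; the aliases are made up for illustration.

```cypher
MATCH (a:A), (b:B)
// one row per (a, b) combination, even though there is only a single :A node
RETURN count(*) AS rows,
       count(DISTINCT a) AS distinctA,
       count(DISTINCT b) AS distinctB
```

Under that assumption the row count is the product of the two distinct counts, which is exactly the repetition seen for h in the question.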

This is the way pattern matching works. To achieve what you want, first consider whether you really need to query for a disconnected pattern. If you do, or if a connected pattern also generates redundant matches, then aggregate one or more of the pattern parts. A simple case might be

CREATE (a:A), (b1:B), (b2:B)
    , (c1:C), (c2:C), (c3:C)
    , (a)-[:X]->(b1), (a)-[:X]->(b2)
    , (a)-[:Y]->(c1), (a)-[:Y]->(c2), (a)-[:Y]->(c3)

queried with

MATCH (b:B)<-[:X]-(a:A)-[:Y]->(c:C)                                // with 1 (a), 2 (b) and 3 (c) you get 6 matched paths
RETURN a, collect(DISTINCT b) AS bb, collect(DISTINCT c) AS cc     // after aggregation by (a) there is one row; DISTINCT drops the repeats

Sometimes it makes sense to do the aggregation as an intermediate step

MATCH (b)<-[:X]-(a:A)              // 2 paths
WITH a, collect(b) AS bb           // 1 row
MATCH (a)-[:Y]->(c)                // 3 paths
RETURN a, bb, collect(c) AS cc     // 1 row
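
Applied to the movie/tag model from the question, the intermediate aggregation might look roughly like the sketch below. It assumes the HAS_TAG relationship type and the title and weight properties from the sample dataset in the accepted answer; the aliases mWeights and smWeights are made up for illustration.

```cypher
// 1. find each similar movie once, no matter how many tags it shares
MATCH (m:Movie { title: "The Matrix" })-[:HAS_TAG]->(:Tag)<-[:HAS_TAG]-(sm:Movie)
WHERE m <> sm
WITH DISTINCT m, sm
// 2. collect the tag weights of the given movie
MATCH (m)-[h:HAS_TAG]->(:Tag)
WITH m, sm, collect(h.weight) AS mWeights
// 3. collect the tag weights of each similar movie
MATCH (sm)-[H:HAS_TAG]->(:Tag)
RETURN sm, mWeights, collect(H.weight) AS smWeights
```

Because each aggregation happens before the next MATCH, neither list is multiplied by the size of the other.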

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow