Fast way to count all vertices (with property x)

https://stackoverflow.com/questions/21476729

05-10-2022

Question

I'm using Titan with Cassandra and have several (related) questions about querying the database with Gremlin:

1.) Is there a faster way to count all vertices than

g.V.count()

Titan suggests using an index. But how can I use an index without a property?

WARN  c.t.t.g.transaction.StandardTitanTx - Query requires iterating over all vertices [<>]. For better performance, use indexes

2.) Is there a faster way to count all vertices with property 'myProperty' than

g.V.has('myProperty').count()

Again, Titan reports the following:

WARN  c.t.t.g.transaction.StandardTitanTx - Query requires iterating over all vertices [(myProperty<> null)]. For better performance, use indexes

But again, how can I do this? I already have an index on 'myProperty', but it needs a value to query quickly.

3.) And the same questions with edges...

Solution

Iterating all vertices with g.V.count() is the only way to get the count. It can't be done "faster". If your graph is so large that it takes hours to get an answer or your query just never returns at all, you should consider using Faunus. However, even with Faunus you can expect to wait for your answer (such is the nature of Hadoop...no sub-second response here), but at least you will get one.

Any time you do a table scan (i.e. iterate all vertices) you get that "iterating over all vertices" warning. Generally speaking, you don't want to do that, because on a large graph the query may never return. Adding an index won't help you count all vertices any faster.
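To illustrate why an index does not speed up counting, here is a toy in-memory sketch in Python. It is not Titan's API: the `vertices` list, the `index` dictionary, and the function names are all hypothetical, and it assumes (as the question describes) that the index can only answer exact-value lookups. Under that assumption, a query with a known value is a single index read, while a count of all vertices, or of all vertices that merely have the property, still has to visit every vertex.

```python
# Toy in-memory model; not Titan's API. All names here are hypothetical.
from collections import defaultdict

vertices = [
    {"id": 1, "myProperty": "a"},
    {"id": 2, "myProperty": "b"},
    {"id": 3},                      # this vertex has no myProperty
    {"id": 4, "myProperty": "a"},
]

# Assumption: the index answers only exact-value lookups,
# mapping (key, value) -> set of vertex ids.
index = defaultdict(set)
for v in vertices:
    if "myProperty" in v:
        index[("myProperty", v["myProperty"])].add(v["id"])


def lookup(key, value):
    """Exact-match query: one index read, no scan over the vertices."""
    return index.get((key, value), set())


def count_with_property(key):
    """Existence query: there is no value to look up, so every vertex is visited."""
    return sum(1 for v in vertices if key in v)


def count_all():
    """Counting everything is a full scan by definition."""
    return sum(1 for _ in vertices)
```

In this model only `lookup` benefits from the index; both counting functions do work proportional to the total number of vertices, which is the situation the warning describes.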

Edges have the same answer. Use g.E.count() in Gremlin if you can. If it takes too long, then try Faunus so you can at least get an answer.

Other suggestions

Doing a count is expensive in big distributed graph databases. You can keep a node that tracks the database's frequently needed aggregate numbers and update it from a cron job, so the values are always at hand. Usually, if you have millions of vertices, working with the count from the previous hour is not a disaster.
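A minimal sketch of that idea in Python, using an in-memory stand-in rather than a real graph: `stats_node` plays the role of the dedicated statistics vertex, `refresh_stats` is what the scheduled job would run, and `cached_vertex_count` is the cheap read used by clients. All of these names are hypothetical, and the full scan is simulated by iterating a list.

```python
import time

# Stand-in for the graph; in a real deployment this scan is the expensive part.
vertices = [{"id": i} for i in range(1000)]

# Stand-in for the dedicated "statistics" vertex holding precomputed aggregates.
stats_node = {"vertex_count": None, "updated_at": None}


def refresh_stats(now=None):
    """Run by the scheduled job: perform the full scan once and store the result."""
    stats_node["vertex_count"] = sum(1 for _ in vertices)
    stats_node["updated_at"] = time.time() if now is None else now


def cached_vertex_count():
    """Cheap read for clients; the value can be stale until the next refresh."""
    return stats_node["vertex_count"]
```

The trade-off is freshness for speed: readers never trigger a scan, but a vertex added after the last refresh is not reflected until the job runs again. Storing `updated_at` alongside the count lets readers see how old the number is.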

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow