Aggregating data while traversing from root node to all the nodes and back in OrientDB graph database

Question 1

This solution might get you closer to what you need, though I'm not sure it's an exact fit as it only let's you calculate the the average at a selected level of the hierarchy (i.e. at "A" or "A1"). Here's my Gremlin session:

gremlin> g = new TinkerGraph()
==>tinkergraph[vertices:0 edges:0]
gremlin> a = g.addVertex("a")  
==>v[a]
gremlin> a1 = g.addVertex("a1")
==>v[a1]
gremlin> a2 = g.addVertex("a2")
==>v[a2]
gremlin> a.addEdge('has',a1)    
==>e[0][a-has->a1]
gremlin> a.addEdge('has',a2)
==>e[1][a-has->a2]
gremlin> a1.addEdge('relationship',g.addVertex("a11",[points:20]))
==>e[2][a1-relationship->a11]
gremlin> a1.addEdge('relationship',g.addVertex("a12",[points:20]))
==>e[3][a1-relationship->a12]
gremlin> a2.addEdge('relationship',g.addVertex("a21",[points:100]))
==>e[4][a2-relationship->a21]
gremlin> a2.addEdge('relationship',g.addVertex("a22",[points:0]))  
==>e[5][a2-relationship->a22]
gremlin> p=g.v("a").out.loop(1){it.loops<10}{true}.path.filter{it.last().getProperty("points")!=null}.toList()               
==>[v[a], v[a2], v[a22]]
==>[v[a], v[a2], v[a21]]
==>[v[a], v[a1], v[a12]]
==>[v[a], v[a1], v[a11]]
gremlin> p.collect{[it, it.last().getProperty("points")]}._().groupBy{it[0][0]}{it[1]}{it.sum()/it.size()}.cap.next()
==>v[a]=35
gremlin> p.collect{[it, it.last().getProperty("points")]}._().groupBy{it[0][1]}{it[1]}{it.sum()/it.size()}.cap.next()
==>v[a1]=20
==>v[a2]=50

So, this line gets us the paths that matter (i.e. those that end with a leaf node that has points:

p=g.v("a").out.loop(1){it.loops<10}{true}.path.filter{it.last().getProperty("points")!=null}.toList()

I store those in p for later usage. Note that this will explore the tree to a depth of 10 as controlled by it.loops<10. From there it's pretty simple to use p to calculate the averages. Here's the example for calculating it for A:

p.collect{[it, it.last().getProperty("points")]}._().groupBy{it[0][0]}{it[1]}{it.sum()/it.size()}.cap.next()

The above basically says, for each path, convert that to a new List where the first item is the path and the second item is the points at the leaf node. Convert that list into a pipeline with the identity function and group on the first item in the path (identified by it[0][0]) and grab the points value (the second closure to groupBy) for that path. The third closure to groupBy is a reducer function which sums the points and calculates the average from that.

Another alternative, if you only need to calculate the average for a single vertex would be with this approach:

gremlin> g.v("a").out.loop(1){it.loops<10}{true}.path{it.points}.filter{it.last()!=null}
                 .transform{it.last()}.gather.transform{it.sum()/it.size()}
==>35

Note that the traversal is largely the same at the start, but uses a closure when grabbing the path. That closure converts the vertex to the value of the points property (note that using it.getProperty("points") is much more efficient than it.points). From there I again filter out paths that have a null value the last item in the path (i.e. as leaf nodes are the only ones with a points property this should leave us with paths that end with leaves). I then transform those paths to grab the points, gather them to a list and convert the list to an average points for "A".

Question 2

You could start by looking at Traverse statement and store average points: - into parent vertices or - into the run-time context

By writing few lines of Java (or Javascript) would be easier using the Java Traverse.

Example to traverse all the nodes in "out" starting from node A assuming A is the name property (I suggest to index the "name" property for faster retrieval):

traverse out('has_a') from (
  select from Node where name = 'A'
)

With this query you create a in-memory map with key = record and value = the average of the underlying node points:

select avg( out('has_a').points ) ) as total_points from (
  traverse out('has_a') from (
    select from Node where name = 'A'
  )
)

This works only if all the nodes under "A" already have points.