TF-IDF algorithm in Gremlin

I probably haven't covered this part of your question:

B: ...in relation to each applicable term in Ts.

...but the rest should work as expected. I wrote a small helper function that accepts either a single term or a list of terms:

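tfidf = { g, terms, N ->
  // For one term vertex, return [term, [documentTitle: tf-idf score, ...]]
  def closure = { termVertex ->
    // each path is [term vertex, occursIn edge, document vertex]
    def paths = termVertex.outE("occursIn").inV().path().toList()
    // document frequency: number of documents that contain the term
    def numPaths = paths.size()
    [termVertex.getProperty("term"), paths.collectEntries({ path ->
      def title = path[2].getProperty("title")
      def tf = path[1].getProperty("frequency")   // raw term count stored on the edge
      def idf = Math.log10(N / numPaths)
      [title, tf * idf]
    })]
  }
  // accept either a single term (String) or a collection of terms
  def single = terms instanceof String
  def pipe = single ? g.V("term", terms) : g.V().has("term", T.in, terms)
  def result = pipe.collect(closure).collectEntries()
  // for a single term, return its score map directly
  single ? result[terms] : result
}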
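Written out, the score the helper assigns to a term t in a document d uses the raw count from the occursIn edge as tf and a base-10 logarithm for idf. Here N is the total number of documents and n_t is the number of documents containing t (numPaths in the code), which assumes at most one occursIn edge per term/document pair, as in the example graph below:

```latex
\operatorname{tfidf}(t, d) = f_{t,d} \cdot \log_{10}\frac{N}{n_t}
```

A term that occurs in every document gets n_t = N, so its idf, and therefore every one of its scores, is 0.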

To test it, I used the two-document example from the Wikipedia article on tf-idf:

g = new TinkerGraph()

// index the properties looked up via g.V(key, value)
g.createKeyIndex("type", Vertex.class)
g.createKeyIndex("term", Vertex.class)

// one vertex per term
t1 = g.addVertex(["type":"term","term":"this"])
t2 = g.addVertex(["type":"term","term":"is"])
t3 = g.addVertex(["type":"term","term":"a"])
t4 = g.addVertex(["type":"term","term":"sample"])
t5 = g.addVertex(["type":"term","term":"another"])
t6 = g.addVertex(["type":"term","term":"example"])

// one vertex per document
d1 = g.addVertex(["type":"document","title":"Document 1"])
d2 = g.addVertex(["type":"document","title":"Document 2"])

// each occursIn edge stores the raw term count as "frequency"
t1.addEdge("occursIn", d1, ["frequency":1])
t1.addEdge("occursIn", d2, ["frequency":1])
t2.addEdge("occursIn", d1, ["frequency":1])
t2.addEdge("occursIn", d2, ["frequency":1])
t3.addEdge("occursIn", d1, ["frequency":2])
t4.addEdge("occursIn", d1, ["frequency":1])
t5.addEdge("occursIn", d2, ["frequency":2])
t6.addEdge("occursIn", d2, ["frequency":3])

// N: total number of documents in the corpus
N = g.V("type","document").count()

tfidf(g, "this", N)
tfidf(g, "example", N)
tfidf(g, ["this", "example"], N)

Output:

gremlin> tfidf(g, "this", N)
==>Document 1=0.0
==>Document 2=0.0
gremlin> tfidf(g, "example", N)
==>Document 2=0.9030899869919435
gremlin> tfidf(g, ["this", "example"], N)
==>this={Document 1=0.0, Document 2=0.0}
==>example={Document 2=0.9030899869919435}
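As a sanity check on these numbers: "this" occurs in both documents, so its idf is log10(2/2) = 0 and both scores are 0.0; "example" occurs only in Document 2, three times, so its score is 3 * log10(2/1), roughly 0.903. The same arithmetic can be reproduced without a graph. The sketch below is a hypothetical plain-Groovy cross-check (the `docs` map and `tfidfPlain` closure are illustrative names, not part of Gremlin or Blueprints) that applies the helper's formula to the same counts:

```groovy
// Hypothetical cross-check without a graph: title -> (term -> raw count),
// using the same counts as the occursIn edges in the example graph
def docs = [
  "Document 1": ["this": 1, "is": 1, "a": 2, "sample": 1],
  "Document 2": ["this": 1, "is": 1, "another": 2, "example": 3]
]

// Same formula as the helper: raw count * log10(N / number of documents containing the term)
def tfidfPlain = { String term ->
  def containing = docs.findAll { title, counts -> counts.containsKey(term) }
  if (containing.isEmpty()) return [:]   // unknown term: no scores, and no division by zero
  def idf = Math.log10((double) docs.size() / containing.size())
  containing.collectEntries { title, counts -> [title, counts[term] * idf] }
}

println tfidfPlain("this")     // both scores are 0.0
println tfidfPlain("example")  // only Document 2 appears
```

Unlike the Gremlin helper, this version returns an empty map for a term that occurs in no document instead of relying on the term vertex being absent.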

I hope this helps.

Cheers, Daniel