I probably haven't covered this part:
B: ...in relation to each applicable term in Ts.
...but the rest should work as expected. I wrote a little helper function that accepts either a single term or a list of terms:
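tfidf = { g, terms, N ->
    // For one term vertex, returns [term, [document title: tf-idf score]]
    def closure = {
        // Each path is [term vertex, occursIn edge, document vertex]
        def paths = it.outE("occursIn").inV().path().toList()
        def numPaths = paths.size() // number of documents containing the term
        [it.getProperty("term"), paths.collectEntries({
            def title = it[2].getProperty("title")
            def tf = it[1].getProperty("frequency")
            def idf = Math.log10(N / numPaths)
            [title, tf * idf]
        })]
    }
    def single = terms instanceof String
    // A single term is looked up by key/value; a list of terms is matched with T.in
    def pipe = single ? g.V("term", terms) : g.V().has("term", T.in, terms)
    def result = pipe.collect(closure).collectEntries()
    // Unwrap the result map when only one term was requested
    single ? result[terms] : result
}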
Then I took the Wikipedia example to test it:
g = new TinkerGraph()
g.createKeyIndex("type", Vertex.class)
g.createKeyIndex("term", Vertex.class)
t1 = g.addVertex(["type":"term","term":"this"])
t2 = g.addVertex(["type":"term","term":"is"])
t3 = g.addVertex(["type":"term","term":"a"])
t4 = g.addVertex(["type":"term","term":"sample"])
t5 = g.addVertex(["type":"term","term":"another"])
t6 = g.addVertex(["type":"term","term":"example"])
d1 = g.addVertex(["type":"document","title":"Document 1"])
d2 = g.addVertex(["type":"document","title":"Document 2"])
t1.addEdge("occursIn", d1, ["frequency":1])
t1.addEdge("occursIn", d2, ["frequency":1])
t2.addEdge("occursIn", d1, ["frequency":1])
t2.addEdge("occursIn", d2, ["frequency":1])
t3.addEdge("occursIn", d1, ["frequency":2])
t4.addEdge("occursIn", d1, ["frequency":1])
t5.addEdge("occursIn", d2, ["frequency":2])
t6.addEdge("occursIn", d2, ["frequency":3])
N = g.V("type","document").count()
tfidf(g, "this", N)
tfidf(g, "example", N)
tfidf(g, ["this", "example"], N)
Output:
gremlin> tfidf(g, "this", N)
==>Document 1=0.0
==>Document 2=0.0
gremlin> tfidf(g, "example", N)
==>Document 2=0.9030899869919435
gremlin> tfidf(g, ["this", "example"], N)
==>this={Document 1=0.0, Document 2=0.0}
==>example={Document 2=0.9030899869919435}
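As a sanity check, the numbers above can be reproduced without a graph. This is only a minimal plain-Groovy sketch, assuming the same weighting as the helper (tf * log10(N / number of documents containing the term)); the `freq` map and the `tfidfPlain` name are made up for illustration and are not part of the helper:

```groovy
// Hypothetical stand-in for the graph: term -> (document title -> frequency)
def freq = [
    "this"   : ["Document 1": 1, "Document 2": 1],
    "example": ["Document 2": 3]
]
def N = 2 // total number of documents

// tf-idf per document for one term, using the same formula as the helper
def tfidfPlain = { String term ->
    def docs = freq[term]
    def idf = Math.log10((N as double) / docs.size())
    docs.collectEntries { title, tf -> [title, tf * idf] }
}

// "this" occurs in both documents, so idf = log10(2 / 2) = 0
assert tfidfPlain("this")["Document 1"] == 0.0d
assert tfidfPlain("this")["Document 2"] == 0.0d
// "example" occurs 3 times in one of two documents: 3 * log10(2 / 1)
assert Math.abs(tfidfPlain("example")["Document 2"] - 3 * Math.log10(2)) < 1e-12
```

So "this" scores 0 everywhere because it occurs in every document, while "example" gets 3 * log10(2).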
I hope this already helps.
Cheers, Daniel