Question

I'm running Titan 0.4.0 and am trying to use the latest REGEX operator for the ES string search.

I've created an index on my_key for my ES index named search.

gremlin> g.makeKey("my_key").dataType(String.class).indexed("search",Vertex.class).single().make()
==>v[82]

Then I add a vertex:

gremlin> v = g.addVertex(null, ["my_key":"123-abc"])
==>v[8]
gremlin> v.map
==>{my_key=123-abc}

The REGEX seems to work...

gremlin> g.query().has("my_key", REGEX, "[12]{2}3").vertices()
==>v[8]

...but only on my tokenized "123" and "abc" independently:

gremlin> g.query().has("my_key", REGEX, "123").vertices()
==>v[8]
gremlin> g.query().has("my_key", REGEX, "abc").vertices()
==>v[8]

However, if I attempt to run a regular expression that matches my full value, my vertex is not retrieved (none of the below return results):

gremlin> g.query().has("my_key", REGEX, "123-abc").vertices()
gremlin> g.query().has("my_key", REGEX, "123.abc").vertices()
gremlin> g.query().has("my_key", REGEX, "[0-9]+.[abc]{3}").vertices()
gremlin> g.query().has("my_key", REGEX, "123.").vertices()

Is there a way in Titan to query the index in this way (regex w/o tokenized/analyzed terms)?

Solution

The way this was handled in Titan through 0.4.0 can be a little confusing, because strings are always tokenized when they are indexed in an external indexing backend. As a result, strings are "chunked" into words, and non-letter characters (as well as stop words) are ignored. That is why your value "123-abc" is indexed as the two separate terms "123" and "abc", and why a pattern that spans the hyphen finds nothing.

In the upcoming Titan 0.4.1 release we are making this more explicit. Have a look at the updated documentation: https://github.com/thinkaurelius/titan/wiki/Full-Text-and-String-Search

The gist: You can now specify whether you want your strings indexed "as-is" or as a bag of words after analysis. For your use case, it would be the former. We also straightened out the terminology: If you are looking for words in a string matching a regular expression, the predicate Text.CONTAINS_REGEX is used. If you want the entire string to match an expression, use Text.REGEX.
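To make the difference between the two predicates concrete, here is a rough simulation in plain Python. It is not Titan or Elasticsearch code: tokenize, contains_regex and regex are hypothetical helper names, and the tokenizer is an assumed stand-in for the real analyzer (it lowercases and splits on anything that is not a letter or digit, and does not model stop words). It only illustrates "match one word" versus "match the whole string", using the value and patterns from the question.

```python
import re

def tokenize(value):
    # Assumed stand-in for the index analyzer: lowercase the value and
    # split on every run of characters that is not a letter or digit.
    return [t for t in re.split(r"[^0-9a-z]+", value.lower()) if t]

def contains_regex(value, pattern):
    # "Word" semantics: true if any single token matches the whole pattern.
    return any(re.fullmatch(pattern, tok) for tok in tokenize(value))

def regex(value, pattern):
    # "As-is" semantics: the entire, untokenized string must match.
    return re.fullmatch(pattern, value) is not None

value = "123-abc"
patterns = ["[12]{2}3", "123", "abc", "123-abc",
            "123.abc", "[0-9]+.[abc]{3}", "123."]
for p in patterns:
    # Print the pattern followed by the result under each semantics.
    print(p, contains_regex(value, p), regex(value, p))
```

Under the word semantics only the first three patterns match, which is the behaviour you saw in 0.4.0. Under the whole-string semantics "123-abc", "123.abc" and "[0-9]+.[abc]{3}" match. Note that "123." still would not match in this sketch, because the whole string has to match the expression; something like "123.*" would be needed for a prefix-style match.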

Titan 0.4.1 is currently in final preview and will be released next week.

Licensed under: CC-BY-SA with attribution