Question

I'm looking for a Java/Scala library that takes a user query and a text and returns whether the text matches the query.

I'm processing a stream of information (e.g. the Twitter stream) and can't afford a batch process. I need to evaluate each tweet in real time, instead of indexing it through Lucene RAMDisk and querying it later.

It's possible to create a parser/lexer using ANTLR, but this is such a common use case that I can't believe nobody has created a library for it before.

Some samples from the TextQuery Ruby library, which does exactly what I need:

    TextQuery.new("'to be' OR NOT 'to_be'").match?("to be")   # => true

    TextQuery.new("-test").match?("some string of text")      # => true
    TextQuery.new("NOT test").match?("some string of text")   # => true

    TextQuery.new("a AND b").match?("b a")                    # => true
    TextQuery.new("a AND b").match?("a c")                    # => false

    q = TextQuery.new("a AND (b AND NOT (c OR d))")
    q.match?("d a b")                                         # => false
    q.match?("b")                                             # => false
    q.match?("a b cdefg")                                     # => true

    TextQuery.new("a~").match?("adf")                         # => true
    TextQuery.new("~a").match?("dfa")                         # => true
    TextQuery.new("~a~").match?("daf")                        # => true
    TextQuery.new("2~a~1").match?("edaf")                     # => true
    TextQuery.new("2~a~2").match?("edaf")                     # => false

    TextQuery.new("a", :ignorecase => true).match?("A b cD")  # => true

Since it is implemented in Ruby, it's not suitable for my platform, and I can't use JRuby just for this one part of our solution.

I found a similar question but couldn't get an answer from it: Boolean Query / Expression to a Concrete syntax tree

Thanks!


Solution

Given that you are doing text search, I would try to leverage some of the infrastructure provided by Lucene. Maybe you could create a QueryParser and call parse to get back a Query. Instantiable subclasses of Query are:

TermQuery
MultiTermQuery
BooleanQuery
WildcardQuery
PhraseQuery
PrefixQuery
MultiPhraseQuery
FuzzyQuery
TermRangeQuery
NumericRangeQuery
SpanQuery

Then you may be able to use pattern matching to implement what a match means for your application:

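    import org.apache.lucene.search.{BooleanClause, BooleanQuery, Query, TermQuery}
    import scala.jdk.CollectionConverters._ // Scala 2.13+

    def match_?(tweet: String, query: Query): Boolean = query match {
      case q: TermQuery => tweet.contains(q.getTerm.text)
      case q: BooleanQuery =>
        // true if every MUST clause matches and no MUST_NOT clause does;
        // match_? is called recursively on each sub-query (SHOULD clauses are ignored here)
        q.clauses.asScala.forall { clause =>
          if (clause.getOccur == BooleanClause.Occur.MUST) match_?(tweet, clause.getQuery)
          else if (clause.getOccur == BooleanClause.Occur.MUST_NOT) !match_?(tweet, clause.getQuery)
          else true
        }
      // you need to cover all subclasses above
      case _ => false
    }

    // queryParser is a Lucene QueryParser and userQuery is the user's query string
    val q = queryParser.parse(userQuery)
    val res = match_?(tweet, q)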

Here is an implementation. It surely has bugs, but you'll get the idea, and it shows a working proof of concept. It reuses the syntax, documentation and grammar of the default Lucene QueryParser.
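Separately from that implementation, the recursion over BooleanQuery can be sketched as follows. This is a minimal sketch under assumptions, not a finished matcher: it assumes Lucene 5.3 through 9.x (for `BooleanQuery.Builder`, `clauses`, `getOccur` and `getQuery`) and Scala 2.13 (for `scala.jdk.CollectionConverters`). The names `TweetMatcher`, `tokens`, `matches`, `term` and `example` are made up for illustration. It compares whole lower-cased words rather than substrings, because the TextQuery samples in the question expect "a b cdefg" not to match `c` or `d`. Treating a clause list without MUST clauses as "at least one SHOULD clause has to match", and letting a purely negative query match, are choices made here to mirror those samples.

```scala
import org.apache.lucene.index.Term
import org.apache.lucene.search.{BooleanQuery, PrefixQuery, Query, TermQuery}
import org.apache.lucene.search.BooleanClause.Occur
import scala.jdk.CollectionConverters._

object TweetMatcher {
  // Hypothetical tokenizer: lower-case, then split on anything that is not
  // a letter, a digit or an underscore.
  def tokens(tweet: String): Set[String] =
    tweet.toLowerCase.split("[^\\p{L}\\p{N}_]+").filter(_.nonEmpty).toSet

  def matches(words: Set[String], query: Query): Boolean = query match {
    case q: TermQuery   => words.contains(q.getTerm.text)
    case q: PrefixQuery => words.exists(_.startsWith(q.getPrefix.text))
    case q: BooleanQuery =>
      val clauses = q.clauses.asScala.toList
      def of(occur: Occur): List[Query] =
        clauses.filter(_.getOccur == occur).map(_.getQuery)
      val required = of(Occur.MUST)
      val optional = of(Occur.SHOULD)
      val excluded = of(Occur.MUST_NOT)
      required.forall(matches(words, _)) &&
        !excluded.exists(matches(words, _)) &&
        // Without any MUST clause, at least one SHOULD clause has to match.
        (required.nonEmpty || optional.isEmpty || optional.exists(matches(words, _)))
    case _ => false // other Query subclasses (and FILTER clauses) are not handled here
  }

  // "text" is an arbitrary field name; this matcher never looks at it.
  def term(word: String): Query = new TermQuery(new Term("text", word))

  // a AND (b AND NOT (c OR d)), built by hand instead of through a QueryParser
  val example: Query = {
    val cOrD = new BooleanQuery.Builder()
      .add(term("c"), Occur.SHOULD).add(term("d"), Occur.SHOULD).build()
    val inner = new BooleanQuery.Builder()
      .add(term("b"), Occur.MUST).add(cOrD, Occur.MUST_NOT).build()
    new BooleanQuery.Builder()
      .add(term("a"), Occur.MUST).add(inner, Occur.MUST).build()
  }
}
```

With these definitions, `TweetMatcher.matches(TweetMatcher.tokens("a b cdefg"), TweetMatcher.example)` yields true, while the same call for "d a b" or "b" yields false, in line with the TextQuery samples in the question. Tokenizing each tweet once and reusing the word set also avoids rescanning the string for every term.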

Other tips

Spring Expression Language (SpEL) supports a matches operator that returns booleans based on regular expressions. See this section of the documentation for usage.

This would also allow you to use logical operators such as and, or, and not.
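As a rough illustration of the SpEL approach, the sketch below evaluates an expression against a single tweet. It assumes the spring-expression module is on the classpath; the object name `SpelTweetMatch` and the variable name `#tweet` are choices made for this example. Note that `matches` tests the whole string against the regular expression, so a "contains" check needs `.*` on both sides.

```scala
import org.springframework.expression.spel.standard.SpelExpressionParser
import org.springframework.expression.spel.support.StandardEvaluationContext

object SpelTweetMatch {
  private val parser = new SpelExpressionParser()

  // The expression refers to the tweet through the variable #tweet.
  def matches(expression: String, tweet: String): Boolean = {
    val context = new StandardEvaluationContext()
    context.setVariable("tweet", tweet)
    val result = parser.parseExpression(expression)
      .getValue(context, classOf[java.lang.Boolean])
    result != null && result.booleanValue
  }
}
```

A call could look like `SpelTweetMatch.matches("#tweet matches '.*lucene.*' and not (#tweet matches '.*spam.*')", tweet)`. Since SpEL expressions can do far more than boolean matching, passing raw end-user input to the parser is probably not advisable; translating the user's query into an expression first would be the safer design.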

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow