Here's what I intend to do:

doc = xapian.Document()
doc.set_data(somedata)
..
..
doc.add_term("Ajohn doe")

Assume the prefix "author" is available for document author.

Now I want to be able to run this search "searchterm AND author:john doe"

This is obviously not working because "doe" is not being considered part of the author (the QueryParser is translating it to "searchterm AND author:john OR doe"). Should I do this:

doc.add_term("Ajohn_doe")

and search by "searchterm AND author:john_doe"? Are there any alternatives for searching text with spaces in general?

Solution

The most common way of doing this would be to add terms Ajohn and Adoe (probably using Xapian's TermGenerator, which will do word splitting and term creation for you). Having done this, you can then run a search author:"john doe" (a prefixed phrase search, which will be able to search across multiple terms). Something like the following:

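import xapian

# Open (or create) the database and set up a term generator.
db = xapian.WritableDatabase("my-db", xapian.DB_CREATE_OR_OPEN)
tg = xapian.TermGenerator()

# Index the author name with prefix "A": this adds the terms
# Ajohn and Adoe, along with their positions.
doc = xapian.Document()
tg.set_document(doc)
tg.index_text("John Doe", 1, "A")
db.add_document(doc)

# Map the user-facing "author" prefix onto the "A" term prefix.
qp = xapian.QueryParser()
qp.add_prefix("author", "A")
q = qp.parse_query('author:"John Doe"')

# Run the query and print the top 10 matches.
enq = xapian.Enquire(db)
enq.set_query(q)
for match in enq.get_mset(0, 10):
    print("%8.8i: %f" % (match.docid, match.weight))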

(Tested against a semi-recent Xapian trunk, although I don't believe anything is particularly new here.)
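
To see why the phrase search works where a single "Ajohn doe" term doesn't, here is a toy model in plain Python. It is not Xapian's implementation, only a sketch of the idea: the text is split into prefixed, lowercased terms that each record a position, and a phrase matches when its terms sit at consecutive positions. The names index_text and phrase_matches below are made up for this illustration.

```python
# Toy model only: this is NOT how Xapian is implemented, just the idea
# behind prefixed terms with positions and phrase matching.
import re


def index_text(text, prefix):
    """Split text into lowercase words; return {prefixed term: [positions]}."""
    postings = {}
    for pos, word in enumerate(re.findall(r"\w+", text.lower()), start=1):
        postings.setdefault(prefix + word, []).append(pos)
    return postings


def phrase_matches(postings, phrase, prefix):
    """Return True if the words of phrase occur at consecutive positions."""
    terms = [prefix + w for w in re.findall(r"\w+", phrase.lower())]
    if not terms or any(t not in postings for t in terms):
        return False
    # Try each position of the first term as the start of the phrase.
    return any(
        all(start + i in postings[t] for i, t in enumerate(terms))
        for start in postings[terms[0]]
    )


doc = index_text("John Doe", "A")
print(sorted(doc))                           # ['Adoe', 'Ajohn']
print(phrase_matches(doc, "john doe", "A"))  # True
print(phrase_matches(doc, "doe john", "A"))  # False
```

Because each word is its own term, a search for author:john or author:doe alone would also match, which a single "Ajohn_doe" term wouldn't allow.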

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow