Question

I'm using Zend Lucene, but I don't think the question is specific to that library.

Say I want to provide full-text search for a database of books. Assume the following models:

Model 1:

TABLE: book
- book_id
- name

TABLE: book_author
- book_author_id
- book_id
- author_id

TABLE: author
- author_id
- name

(a book can have 0 or more authors)

Model 2:

TABLE: book
- book_id
- name

TABLE: book_eav
- book_eav_id
- book_id
- attribute (e.g. "author")
- value (e.g. "Tom Clancy")

(a book can have 0 or more authors + information about publisher, number of pages, etc.)

What do I need to do in order to insert all the authors associated with a particular book in a document to be indexed? Do I put all the authors in one field in the document? Would I use some sort of delimiter to group author information? I'm looking for general strategies with this kind of data.


Solution

Put all the authors in one field in the document with a delimiter. So the document schema will be:

book_id
name
author: |author 1|author 2|...|author n|
other_attribute_1: |val 1|val 2|
other_attribute_2: |val 1|val 2|
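As a rough illustration of how such a document could be assembled, here is a minimal Python sketch that is independent of any search library. The function names `delimited` and `build_documents` are hypothetical, and the input is assumed to be `(book_id, attribute, value)` rows. Model 2's `book_eav` table already has that shape, and Model 1 can produce it by joining `book_author` to `author` and using "author" as the attribute.

```python
from collections import defaultdict

DELIM = "|"  # delimiter from the schema above


def delimited(values):
    """Wrap every value in the delimiter: ['a', 'b'] -> '|a|b|'."""
    values = list(values)
    if not values:
        return ""
    return DELIM + DELIM.join(values) + DELIM


def build_documents(books, attribute_rows):
    """Build one flat document (a dict of field -> string) per book.

    books:          iterable of (book_id, name)
    attribute_rows: iterable of (book_id, attribute, value)
    """
    # Group the values per book and per attribute, keeping row order.
    grouped = defaultdict(lambda: defaultdict(list))
    for book_id, attribute, value in attribute_rows:
        grouped[book_id][attribute].append(value)

    docs = []
    for book_id, name in books:
        doc = {"book_id": book_id, "name": name}
        # A book with no rows simply gets no extra fields.
        for attribute, values in grouped.get(book_id, {}).items():
            doc[attribute] = delimited(values)
        docs.append(doc)
    return docs


if __name__ == "__main__":
    # Illustrative sample data only.
    books = [(1, "Book A"), (2, "Book B")]
    rows = [
        (1, "author", "Tom Clancy"),
        (1, "author", "Jane Doe"),
        (1, "pages", "300"),
    ]
    for doc in build_documents(books, rows):
        print(doc)
```

Each resulting dict maps directly to the fields you would add to an index document. An attribute that happens to be called `book_id` or `name` would overwrite the base field in this sketch, so a real implementation would likely prefix attribute field names.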

With this schema you can search by author with different boosts with a query like:

(author:"|Tom Clancy|")^10 OR
(author:"Tom Clancy")^5 OR
author:(Tom Clancy)^1

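This query ranks exact matches first, then phrase matches, and finally any other matches. It relies on the analyzer keeping the | delimiter in the indexed tokens; if the analyzer strips punctuation, the first two clauses match the same documents.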
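If the query is generated from user input, the three clauses can be built from a single name. The following is a small sketch, not part of any library: `author_query` is a hypothetical helper, and it assumes the name contains no query-syntax special characters (quotes, parentheses, colons, and so on). The last clause uses field grouping, `author:(...)`, so that every word is scoped to the author field.

```python
def author_query(name, field="author", delim="|"):
    """Return a boosted query string: exact > phrase > individual terms.

    Assumes `name` is plain text without query-syntax special characters;
    real input would need escaping first.
    """
    exact = f'({field}:"{delim}{name}{delim}")^10'   # whole delimited value
    phrase = f'({field}:"{name}")^5'                 # words adjacent, in order
    terms = f'{field}:({name})^1'                    # individual words in the field
    return " OR ".join([exact, phrase, terms])


if __name__ == "__main__":
    print(author_query("Tom Clancy"))
```

The boost values 10, 5 and 1 are taken from the query above; only their relative order matters for ranking.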

Licensed under: CC-BY-SA with attribution