Question

My model consists of online courses. Every course has an ID number, a title, and a variable number of content files (large HTML files). I tried to represent them in Lucene using the following scheme (each line is a document):

  • course: "1", title: "Introduction to Java"
  • course: "1", content: "Chapter 1: basics..."
  • course: "1", content: "Chapter 2: collections..."
  • course: "2", title: "Java networking"
  • course: "2", content: "First part: sockets..."
  • course: "3", title: ...

But now suppose I need Lucene to give me all the courses (just the IDs) that have "Java" in the title and "collections" in at least one of their content files. A query such as title:java AND content:collections won't work, because the information is split across several documents and no single document contains both fields.
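To make the problem concrete, here is a small plain-Java simulation. It is not Lucene code. Each bullet above becomes a map from field name to text, and a term "matches" if it occurs as a case-insensitive substring, which is only a rough stand-in for Lucene's analyzers and term queries. The class and method names (CourseQueryDemo, perDocumentAnd, perCourseAnd) are hypothetical. The sketch contrasts two evaluations. In the first, both clauses are evaluated against one document at a time, which is what a boolean query over this layout does. In the second, which is the intended meaning, each clause only has to be satisfied by some document of the same course.

```java
import java.util.*;

/**
 * Plain-Java simulation (NOT the Lucene API) of the index layout above:
 * every bullet is one "document", i.e. a map from field name to text.
 */
public class CourseQueryDemo {

    // Case-insensitive substring check; a rough stand-in for a Lucene term query.
    static boolean matches(Map<String, String> doc, String field, String term) {
        String value = doc.get(field);
        return value != null && value.toLowerCase().contains(term.toLowerCase());
    }

    // The sample documents from the question, one map per document.
    static List<Map<String, String>> sampleIndex() {
        return List.of(
            Map.of("course", "1", "title", "Introduction to Java"),
            Map.of("course", "1", "content", "Chapter 1: basics"),
            Map.of("course", "1", "content", "Chapter 2: collections"),
            Map.of("course", "2", "title", "Java networking"),
            Map.of("course", "2", "content", "First part: sockets"));
    }

    // title:java AND content:collections evaluated per document:
    // a single document would need BOTH fields, and none has them.
    static Set<String> perDocumentAnd(List<Map<String, String>> index) {
        Set<String> result = new TreeSet<>();
        for (Map<String, String> doc : index) {
            if (matches(doc, "title", "java") && matches(doc, "content", "collections")) {
                result.add(doc.get("course"));
            }
        }
        return result;
    }

    // Intended semantics: collect course IDs hit by each clause separately,
    // then keep only the courses that satisfy both clauses.
    static Set<String> perCourseAnd(List<Map<String, String>> index) {
        Set<String> titleHits = new HashSet<>();
        Set<String> contentHits = new HashSet<>();
        for (Map<String, String> doc : index) {
            if (matches(doc, "title", "java")) titleHits.add(doc.get("course"));
            if (matches(doc, "content", "collections")) contentHits.add(doc.get("course"));
        }
        Set<String> result = new TreeSet<>(titleHits);
        result.retainAll(contentHits);
        return result;
    }

    public static void main(String[] args) {
        List<Map<String, String>> index = sampleIndex();
        System.out.println("per document: " + perDocumentAnd(index)); // per document: []
        System.out.println("per course:   " + perCourseAnd(index));   // per course:   [1]
    }
}
```

Running main prints an empty set for the per-document evaluation and [1] for the per-course one. The per-course version is simply an intersection of course IDs computed outside the index. It is meant to pin down the desired result, not to suggest how Lucene should compute it.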

Can anybody suggest an alternative representation or querying technique to address this problem? Note that I can't just join all the contents into a single file and index it in the same document as the title, because some chapters are added after the course has been created.

Thanks in advance.


Licensed under: CC-BY-SA with attribution