Question

My model consists of online courses. Every course has an id number and a title, and can have any number of content files (large HTML files). I tried to represent them in Lucene using the following scheme (each line is a document):

  • course: "1", title: "Introduction to Java"
  • course: "1", content: "Chapter 1: basics..."
  • course: "1", content: "Chapter 2: collections..."
  • course: "2", title: "Java networking"
  • course: "2", content: "First part: sockets..."
  • course: "3", title: ...

But now, suppose I need Lucene to return all the courses (just their ids) with "Java" in the title and "collections" in at least one of their contents. A query such as title:java AND content:collections won't work: Lucene evaluates the query against each document separately, and the title and the contents are stored in different documents, so no single document satisfies both clauses.
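To make the mismatch concrete, here is a small plain-Java sketch that uses no Lucene API. Each `Map` stands in for one document from the scheme above, and a case-insensitive substring test stands in for an analyzed term match. The class and method names are hypothetical, and course 3 is left out because its title is elided. It contrasts the per-document AND that Lucene actually performs with the per-course AND that the question is after.

```java
import java.util.*;
import java.util.stream.*;

public class CourseQueryDemo {
    // Each map mimics one flat document: a "course" field plus either "title" or "content".
    static final List<Map<String, String>> DOCS = List.of(
        Map.of("course", "1", "title", "Introduction to Java"),
        Map.of("course", "1", "content", "Chapter 1: basics..."),
        Map.of("course", "1", "content", "Chapter 2: collections..."),
        Map.of("course", "2", "title", "Java networking"),
        Map.of("course", "2", "content", "First part: sockets..."));

    // Crude stand-in for a term match: case-insensitive substring test on one field.
    static boolean matches(Map<String, String> doc, String field, String term) {
        String value = doc.get(field);
        return value != null && value.toLowerCase().contains(term.toLowerCase());
    }

    // title:java AND content:collections, evaluated document by document.
    static Set<String> perDocumentAnd() {
        return DOCS.stream()
            .filter(d -> matches(d, "title", "java") && matches(d, "content", "collections"))
            .map(d -> d.get("course"))
            .collect(Collectors.toCollection(TreeSet::new));
    }

    // Intended semantics: each clause may be satisfied by a different document,
    // as long as both documents carry the same course id.
    static Set<String> perCourseAnd() {
        Set<String> byTitle = DOCS.stream()
            .filter(d -> matches(d, "title", "java"))
            .map(d -> d.get("course"))
            .collect(Collectors.toCollection(TreeSet::new));
        Set<String> byContent = DOCS.stream()
            .filter(d -> matches(d, "content", "collections"))
            .map(d -> d.get("course"))
            .collect(Collectors.toSet());
        byTitle.retainAll(byContent); // keep only courses matched by both clauses
        return byTitle;
    }

    public static void main(String[] args) {
        System.out.println("per-document AND: " + perDocumentAnd()); // prints []
        System.out.println("per-course AND:   " + perCourseAnd());   // prints [1]
    }
}
```

The second method amounts to an application-side join on the course field; it is only meant to pin down the expected result (course 1), not to suggest how the index should be designed.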

Can somebody suggest an alternative representation or querying technique to address this problem? Note that I can't simply join all the contents into a single file and index it in the same document as the title, because some chapters are added after the course has been created.

Thanks in advance.

Licensed under: CC-BY-SA with attribution