My model consists of online courses. Every course has an id number and a title, and can have any number of content files (large HTML files). I tried to represent them in Lucene using the following scheme (every line is a document):
- course: "1", title: "Introduction to Java"
- course: "1", content: "Chapter 1: basics..."
- course: "1", content: "Chapter 2: collections..."
- course: "2", title: "Java networking"
- course: "2", content: "First part: sockets..."
- course: "3", title: ...
But now, suppose I need to ask Lucene for all the courses (just the id) with "Java" in the title and "collections" in at least one of their contents. A query such as
title:java AND content:collections
won't work, because no single document has both a title and a content field: the information for one course is split across several documents.
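To make the problem concrete, here is a minimal plain-Java sketch. It does not use Lucene at all: the documents are modelled as in-memory maps, matching is a crude case-insensitive substring test, and the class and method names (CourseSearchSketch, perDocumentAnd, perCourseAnd) are hypothetical names made up for this illustration. It only shows the difference between evaluating the AND per document and evaluating it per course, which is the behaviour I'm after.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class CourseSearchSketch {
    // Each "document" is a flat field -> value map, mirroring the scheme above.
    static final List<Map<String, String>> DOCS = List.of(
        Map.of("course", "1", "title", "Introduction to Java"),
        Map.of("course", "1", "content", "Chapter 1: basics..."),
        Map.of("course", "1", "content", "Chapter 2: collections..."),
        Map.of("course", "2", "title", "Java networking"),
        Map.of("course", "2", "content", "First part: sockets...")
    );

    // Stand-in for a term match (assumption: substring test, not real analysis).
    static boolean matches(Map<String, String> doc, String field, String term) {
        String value = doc.get(field);
        return value != null && value.toLowerCase().contains(term.toLowerCase());
    }

    // Course ids of every document whose given field matches the term.
    static Set<String> courseIds(String field, String term) {
        Set<String> ids = new TreeSet<>();
        for (Map<String, String> doc : DOCS) {
            if (matches(doc, field, term)) {
                ids.add(doc.get("course"));
            }
        }
        return ids;
    }

    // AND evaluated inside each document: no document has both fields.
    static Set<String> perDocumentAnd(String titleTerm, String contentTerm) {
        Set<String> ids = new TreeSet<>();
        for (Map<String, String> doc : DOCS) {
            if (matches(doc, "title", titleTerm) && matches(doc, "content", contentTerm)) {
                ids.add(doc.get("course"));
            }
        }
        return ids;
    }

    // AND evaluated per course: intersect the ids found by two separate searches.
    static Set<String> perCourseAnd(String titleTerm, String contentTerm) {
        Set<String> ids = courseIds("title", titleTerm);
        ids.retainAll(courseIds("content", contentTerm));
        return ids;
    }

    public static void main(String[] args) {
        System.out.println(perDocumentAnd("java", "collections")); // prints []
        System.out.println(perCourseAnd("java", "collections"));   // prints [1]
    }
}
```

The second method is the result I want from Lucene, ideally without running two searches and intersecting the ids myself.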
Can somebody suggest an alternative representation or querying technique to address this problem? Note that I can't just join all the contents into a single file and index it in the same document as the title, because some chapters are added after the course has been created.
Thanks in advance.