I would create N + 1 fields: one for each of the N languages, plus one "all" field that you put everything in.
doc.addField("EN", englishText);                     // English content only
doc.addField("NL", dutchText);                       // Dutch content only
doc.addField("all", englishText + " " + dutchText);  // English and Dutch content combined
With this setup, if you want to search within a specific language, query that language's field. If you want to search across all languages, query the all field.
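To make the layout concrete, here is a toy sketch in plain Java. It is not the Lucene or Solr API: a `Map<String, String>` stands in for a document, and matching is a naive case-insensitive word comparison instead of a real analyzer. The field names EN, NL and all come from the answer; the class and method names (`MultiLanguageFields`, `buildDoc`, `matches`) are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

/** Toy in-memory model of the N + 1 field layout (hypothetical, not the Lucene/Solr API). */
public class MultiLanguageFields {

    /** Builds one document: one field per language plus an "all" field with every language's text. */
    static Map<String, String> buildDoc(Map<String, String> textByLanguage) {
        Map<String, String> doc = new LinkedHashMap<>(textByLanguage);
        doc.put("all", String.join(" ", textByLanguage.values()));
        return doc;
    }

    /** Naive matching: true if the field contains the term as a whole word, ignoring case. */
    static boolean matches(Map<String, String> doc, String field, String term) {
        String needle = term.toLowerCase(Locale.ROOT);
        for (String token : doc.getOrDefault(field, "").toLowerCase(Locale.ROOT).split("\\W+")) {
            if (token.equals(needle)) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        Map<String, String> text = new LinkedHashMap<>();
        text.put("EN", "the house is red");
        text.put("NL", "het huis is rood");
        Map<String, String> doc = buildDoc(text);

        System.out.println(matches(doc, "NL", "huis"));  // search only the Dutch field: true
        System.out.println(matches(doc, "EN", "huis"));  // not in the English field: false
        System.out.println(matches(doc, "all", "huis")); // search across all languages: true
    }
}
```

The trade-off is index size: every piece of text is stored twice (once in its language field, once in all), in exchange for cheap queries against either a single language or all of them.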
If you want to find out which language a hit is in when searching the all field, you cannot do that easily. You could run N boolean queries, one per language, each combining the all field with that language's field: (all AND EN), (all AND NL).
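The per-language boolean queries can be sketched the same way. In Lucene each of them would be a `BooleanQuery` with two `MUST` clauses; this hypothetical plain-Java version (`LanguageOfMatch`, `languagesMatching` are illustrative names) only mimics the logic, with the same naive word matching as a stand-in for real analysis, and assumes the language list is EN and NL.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Toy sketch: one "all AND <language>" query per language (plain Java, not Lucene). */
public class LanguageOfMatch {

    static final List<String> LANGUAGES = List.of("EN", "NL");

    /** Naive matching: true if the field contains the term as a whole word, ignoring case. */
    static boolean fieldContains(Map<String, String> doc, String field, String term) {
        String needle = term.toLowerCase(Locale.ROOT);
        for (String token : doc.getOrDefault(field, "").toLowerCase(Locale.ROOT).split("\\W+")) {
            if (token.equals(needle)) return true;
        }
        return false;
    }

    /** Runs N queries: the term must match in "all" AND in that language's field. */
    static List<String> languagesMatching(Map<String, String> doc, String term) {
        List<String> hits = new ArrayList<>();
        for (String lang : LANGUAGES) {
            if (fieldContains(doc, "all", term) && fieldContains(doc, lang, term)) {
                hits.add(lang);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        Map<String, String> doc = Map.of(
            "EN", "the house is red",
            "NL", "het huis is rood",
            "all", "the house is red het huis is rood");
        System.out.println(languagesMatching(doc, "huis")); // [NL]
        System.out.println(languagesMatching(doc, "is"));   // [EN, NL]
    }
}
```

The drawback the answer points at is visible here: the cost grows with N, since every language needs its own query.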
Or, perhaps better, create facets for all the language fields. Then you can retrieve the facet counts for every language field in a single (quick) query. The facet with the highest count is most likely the language that matches the search.
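The facet idea amounts to one pass over the hits of the all query, counting how many of them also match in each language field, then picking the largest count. The sketch below models that in plain Java. It is an assumption about how to map the idea, not the Solr faceting API (in Solr, one possible mapping is one `facet.query` per language field). The class and method names and the sample documents are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Toy sketch of facet-style counts per language field (plain Java, not Solr faceting). */
public class LanguageFacets {

    static final List<String> LANGUAGES = List.of("EN", "NL");

    /** Naive matching: true if the field contains the term as a whole word, ignoring case. */
    static boolean fieldContains(Map<String, String> doc, String field, String term) {
        String needle = term.toLowerCase(Locale.ROOT);
        for (String token : doc.getOrDefault(field, "").toLowerCase(Locale.ROOT).split("\\W+")) {
            if (token.equals(needle)) return true;
        }
        return false;
    }

    /** Single pass over documents matching in "all": count matches per language field. */
    static Map<String, Integer> facetCounts(List<Map<String, String>> docs, String term) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String lang : LANGUAGES) counts.put(lang, 0);
        for (Map<String, String> doc : docs) {
            if (!fieldContains(doc, "all", term)) continue; // only hits of the "all" query
            for (String lang : LANGUAGES) {
                if (fieldContains(doc, lang, term)) counts.merge(lang, 1, Integer::sum);
            }
        }
        return counts;
    }

    /** The language whose facet count is highest (ties go to the earlier language in LANGUAGES). */
    static String mostLikelyLanguage(Map<String, Integer> counts) {
        String best = null;
        int bestCount = -1;
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() > bestCount) {
                best = e.getKey();
                bestCount = e.getValue();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<Map<String, String>> docs = List.of(
            Map.of("EN", "the hotel is open", "NL", "het hotel is open",
                   "all", "the hotel is open het hotel is open"),
            Map.of("EN", "a quiet room near the station", "NL", "een rustige kamer bij het hotel",
                   "all", "a quiet room near the station een rustige kamer bij het hotel"));
        Map<String, Integer> counts = facetCounts(docs, "hotel");
        System.out.println(counts);                     // {EN=1, NL=2}
        System.out.println(mostLikelyLanguage(counts)); // NL
    }
}
```

Compared with the N separate boolean queries, all counts come back from one pass, which is why the answer calls this approach quicker.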