Question

I have the following classes:

public class Resource
{
    public Guid? Id { get; set; }
    public IList<LocalizedValue> LocalizedValues { get; set; }
}

public class LocalizedValue
{
    public Guid? Id { get; set; }
    public Resource Resource { get; set; }
    public string Locale { get; set; }
    public string TextValue { get; set; }
}

This is used to store multilingual data in other objects, like this:

public class Job
{
    public Resource Description { get; set; }

    // some other properties...
}

So I'm able to store the description in several languages.

I would like to index the Job object (including its Resource properties) in Lucene.Net so that I can search for text either across all languages or in one specified language.

I looked at the other relevant questions on SO or elsewhere, but I'm not really sure about what to do.

I considered using several fields (one for each TextValue of each Resource in the Job class), but how can I identify which language the text is in without resorting to quite complex queries?

I suppose I'll manage one way or another, but I'm asking in case someone has a better idea that I'm missing.


Solution

I would create N + 1 fields: one for each language, plus one that holds the text from all languages.

// Pseudocode: one field per language, plus a catch-all field
doc.addField("EN", englishText);
doc.addField("NL", dutchText);
doc.addField("all", englishText + " " + dutchText);

In this scenario, if you want to search within a specific language, query that language's field. If you want to search in all languages, query the "all" field.
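As a rough sketch of how this could look with the classes from the question, the helper below collects the text for each language field plus the "all" field, and picks which field to query for a given locale. The names JobFields, Build, FieldFor and AllField are made up for illustration, and it assumes Locale holds short codes such as "en" or "nl". It only prepares the field names and values; it does not call Lucene.Net itself.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Resource
{
    public Guid? Id { get; set; }
    public IList<LocalizedValue> LocalizedValues { get; set; }
}

public class LocalizedValue
{
    public Guid? Id { get; set; }
    public Resource Resource { get; set; }
    public string Locale { get; set; }
    public string TextValue { get; set; }
}

public class Job
{
    public Resource Description { get; set; }
}

// Hypothetical helper, not part of Lucene.Net.
public static class JobFields
{
    // Lower-case so it cannot collide with the upper-cased locale field names.
    public const string AllField = "all";

    // Returns field name -> text: one entry per locale, plus the catch-all entry.
    public static Dictionary<string, string> Build(Job job)
    {
        var values = job?.Description?.LocalizedValues ?? new List<LocalizedValue>();

        // Ignore entries that have no locale or no text.
        var usable = values
            .Where(v => v != null
                        && !string.IsNullOrWhiteSpace(v.Locale)
                        && !string.IsNullOrWhiteSpace(v.TextValue))
            .ToList();

        // One field per language, e.g. "EN", "NL".
        var fields = usable
            .GroupBy(v => v.Locale.Trim().ToUpperInvariant())
            .ToDictionary(g => g.Key, g => string.Join(" ", g.Select(v => v.TextValue)));

        // The extra field that contains the text of every language.
        if (usable.Count > 0)
        {
            fields[AllField] = string.Join(" ", usable.Select(v => v.TextValue));
        }

        return fields;
    }

    // Field to query: the language field if a locale is given, otherwise "all".
    public static string FieldFor(string locale)
    {
        return string.IsNullOrWhiteSpace(locale)
            ? AllField
            : locale.Trim().ToUpperInvariant();
    }
}
```

Each name/text pair can then be added to the Lucene.Net document as an analyzed field, for example with new TextField(name, text, Field.Store.NO) in Lucene.Net 4.8, or new Field(name, text, Field.Store.NO, Field.Index.ANALYZED) in 3.0.x. If Job has more than one Resource property, the field names would need a prefix (such as "Description_EN") to keep them apart.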

If you search the "all" field and want to figure out which language a hit is in, you cannot do that easily. You can run N boolean queries, one per language: (all AND EN), (all AND NL).

Or, perhaps better, create facets for all the language fields. Then you can retrieve the facet counts for each language field in one (quick) query. The facet with the highest count will be the language that matches the search.

Licensed under: CC-BY-SA with attribution