Question

I'm trying to implement a smart search feature in my application. Use case: the user enters a search term in a text box.

Eg: Find me a christian male 28 years old from Brazil.

I need to parse the input into a map as follows:

Gender: male, Age: 28, Location: Brazil, Religion: Christian

I have already glanced at OpenNLP, Cross Validated, Java pattern matching and regex, and Information Extraction. I'm not sure which one I should look into more deeply.

Is there a Java library already available for this specific domain?


Solution

There's an API that extracts structured information (JSON) from free text: http://wit.ai

You need to train Wit with some examples of what you want to achieve.


OTHER TIPS

Just one approach (I think there are many ways to do this): split your String into a String[] and process each word as needed:

String str = "Find me a christian male 28 years old from Brazil";
for(String s : str.split(" ")){ //splits your String using space char
    processWord(s);
}

Here processWord(s) should decide, based on your business rules, whether s is a keyword or not.
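One caveat with str.split(" "): punctuation stays attached to the words, so an input ending in "Brazil." produces the token "Brazil." rather than "Brazil". A minimal sketch of a more forgiving tokenizer follows. The class name QueryTokenizer and the method tokenize are hypothetical and not part of the original answer, and upper-casing the tokens here is just one possible normalization.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper for illustration; not part of the original answer.
final class QueryTokenizer {

    // Splits free text into upper-case tokens, dropping punctuation and extra whitespace.
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        // Split on any run of characters that are neither letters nor digits.
        for (String raw : input.split("[^\\p{L}\\p{N}]+")) {
            if (!raw.isEmpty()) { // a leading separator yields one empty string
                tokens.add(raw.toUpperCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("Find me a christian male 28 years old from Brazil."));
    }
}
```

Each resulting token can then be handed to processWord in place of the raw pieces produced by split(" ").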

EDIT: Since many people consider this answer insufficient, I'll add some more tips.

Let's say you have a class in which you put some search criteria (assuming you want to get people that match these criteria):

public class SearchCriteria {
    public void setGender(String gender){...}
    public void setCountry(String country){...}
    public void setReligion(String religion){...}
    ...
    public void setWhateverYouThinkIsImportant(String str){...}
}

As @Sotirios pointed out in his comment, you may need a pool of matching words. Let's say you use a List<String> of basic matching words for each criterion:

List<String> gender = Arrays.asList(new String[]{"MALE","FEMALE","BOY","GIRL"...});
List<String> country = Arrays.asList(new String[]{"ALGERIA","ARGENTINA","AUSTRIA"...});
List<String> religion = Arrays.asList(new String[]{"CHRISTIAN","JEWISH","MUSLIM"...});

Now I'll modify processWord(s) a little (assuming this method has access to lists above):

public void processWord(String word, SearchCriteria sc){
    // normalize once so the word can be compared against the upper-case lists
    String upper = word.toUpperCase();
    if(gender.contains(upper)){
        sc.setGender(upper);
        return;
    }
    if(country.contains(upper)){
        sc.setCountry(upper);
        return;
    }
    if(religion.contains(upper)){
        sc.setReligion(upper);
        return;
    }
    ....
}

Finally, you need to process the user's input:

String usersInput = "Find me a christian girl 28 years old from Brazil"; // sorry, I changed "male" to "girl", but I like girls :P
SearchCriteria sc = new SearchCriteria();
for(String word : usersInput.split(" ")){
    processWord(word, sc);
}
// do something with your SearchCriteria object
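The word lists above cannot capture the age, because it is a number followed by a phrase rather than a single keyword. One possible way to handle it is a regular expression run over the whole input before the word-by-word pass. This is only a sketch: the class AgeExtractor, the method extractAge, and the exact phrasings the pattern accepts ("28 years old", "28 yrs old", "28-year-old") are assumptions, not something from the original answer.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of the original answer.
final class AgeExtractor {

    // Assumed phrasings: a 1-3 digit number, then "year(s)" or "yr(s)", then "old",
    // separated by spaces or hyphens, matched case-insensitively.
    private static final Pattern AGE = Pattern.compile(
            "\\b(\\d{1,3})[\\s-]*(?:years?|yrs?)[\\s-]*old\\b",
            Pattern.CASE_INSENSITIVE);

    // Returns the first age mentioned in the input, if any.
    static Optional<Integer> extractAge(String input) {
        Matcher m = AGE.matcher(input);
        if (m.find()) {
            return Optional.of(Integer.parseInt(m.group(1)));
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        String usersInput = "Find me a christian girl 28 years old from Brazil";
        System.out.println(extractAge(usersInput));
    }
}
```

The result could then be stored through an extra setter on SearchCriteria (for example a hypothetical setAge), alongside the gender, country and religion fields.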

Of course, this can be done much better; it is only one approach. If you want to make the search more accurate, read about the Levenshtein distance. It will help, for example, if somebody types "Brasil" instead of "Brazil" or "cristian" instead of "christian".
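As a rough illustration of that idea, the sketch below computes the Levenshtein (edit) distance with the standard dynamic-programming recurrence and uses it to pick the closest entry from a word list. The class FuzzyMatcher, the method closest, and the maxDistance threshold are hypothetical names and choices made for this example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical helper for illustration; not part of the original answer.
final class FuzzyMatcher {

    // Edit distance: the minimum number of single-character insertions,
    // deletions and substitutions needed to turn a into b.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; // reuse the two rows instead of a full matrix
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Returns the candidate closest to word within maxDistance edits, or null if none qualifies.
    static String closest(String word, List<String> candidates, int maxDistance) {
        String upper = word.toUpperCase(Locale.ROOT);
        String best = null;
        int bestDistance = maxDistance + 1;
        for (String candidate : candidates) {
            int d = levenshtein(upper, candidate);
            if (d < bestDistance) {
                best = candidate;
                bestDistance = d;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<String> country = Arrays.asList("ALGERIA", "ARGENTINA", "AUSTRIA", "BRAZIL");
        System.out.println(closest("Brasil", country, 1));
    }
}
```

A small threshold such as 1 keeps false positives down, but very short words can still land within one edit of a list entry, so it may be worth requiring an exact match for short tokens.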

This is a pretty huge area of research in language processing: it's called Information Extraction. If it's Java you want, GATE has pretty extensive support for IE.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow