I'm trying to implement a smart search feature in my application. Use case: the user enters a search term in a text box.

E.g.: Find me a christian male 28 years old from Brazil.

I need to parse the input into a map as follows:

Gender: male
Age: 28
Location: Brazil
Religion: Christian

I've already glanced at OpenNLP, Cross Validated, Java pattern matching and regex, and Information Extraction, but I'm not sure which one to look into more deeply.

Is there a Java library already available for this specific domain?

Solution

There's an API that extracts structured information (JSON) from free text: http://wit.ai

You need to train Wit with some examples of what you want to achieve.

Other tips

Just one approach (I think there are many ways to do this): split your String into a String[] and process each word as needed:

String str = "Find me a christian male 28 years old from Brazil";
for(String s : str.split(" ")){ //splits your String using space char
    processWord(s);
}

Here processWord(s) should determine whether or not s is a keyword, based on your business rules.

EDIT: Since many people consider this answer insufficient, I'll add some more tips.

Let's say you have a class in which you put some search criteria (assuming you want to get people that match these criteria):

public class SearchCriteria {
    public void setGender(String gender){...}
    public void setCountry(String country){...}
    public void setReligion(String religion){...}
    ...
    public void setWhateverYouThinkIsImportant(String str){...}
}

As @Sotirios pointed out in his comment, you may need a pool of matching words. Let's say you use a List<String> of basic matching words for each category:

// Each list is only a sample; the trailing comment marks where more entries go.
List<String> gender = Arrays.asList("MALE", "FEMALE", "BOY", "GIRL" /* ... */);
List<String> country = Arrays.asList("ALGERIA", "ARGENTINA", "AUSTRIA" /* ... */);
List<String> religion = Arrays.asList("CHRISTIAN", "JEWISH", "MUSLIM" /* ... */);

Now I'll modify processWord(s) a little (assuming this method has access to the lists above):

public void processWord(String word, SearchCriteria sc){
    // Normalize once so the lookup is case-insensitive.
    String upper = word.toUpperCase();
    if(gender.contains(upper)){
        sc.setGender(upper);
        return;
    }
    if(country.contains(upper)){
        sc.setCountry(upper);
        return;
    }
    if(religion.contains(upper)){
        sc.setReligion(upper);
        return;
    }
    // ... checks for any other categories go here
}

Finally, you need to process the user's input:

String usersInput = "Find me a christian girl 28 years old from Brazil"; // note: "male" replaced by "girl" here
SearchCriteria sc = new SearchCriteria();
for(String word : usersInput.split(" ")){ // one call to processWord per space-separated word
    processWord(word, sc);
}
// do something with your SearchCriteria object
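
For reference, the keyword-list idea can be put together into one self-contained sketch that returns the map the question asks for. Several things here are assumptions for illustration and not part of the answer above: the class name QueryParserSketch, the "BRAZIL" entry added to the sample country list, and the regular expression for the age, which only recognises phrases like "28 years old".

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class: keyword lists plus one regex for the age.
final class QueryParserSketch {
    // Sample vocabularies only; a real application would load complete lists.
    static final List<String> GENDERS = Arrays.asList("MALE", "FEMALE", "BOY", "GIRL");
    static final List<String> COUNTRIES = Arrays.asList("ALGERIA", "ARGENTINA", "AUSTRIA", "BRAZIL");
    static final List<String> RELIGIONS = Arrays.asList("CHRISTIAN", "JEWISH", "MUSLIM");

    // Assumption: the age is written as "<number> year(s) old".
    static final Pattern AGE =
            Pattern.compile("(\\d{1,3})\\s+years?\\s+old", Pattern.CASE_INSENSITIVE);

    static Map<String, String> parse(String input) {
        Map<String, String> result = new LinkedHashMap<>();
        // Split on anything that is not a letter or digit, so "Brazil." still matches.
        for (String token : input.split("[^\\p{L}\\p{N}]+")) {
            String word = token.toUpperCase(Locale.ROOT);
            if (GENDERS.contains(word)) {
                result.put("Gender", word);
            } else if (COUNTRIES.contains(word)) {
                result.put("Location", word);
            } else if (RELIGIONS.contains(word)) {
                result.put("Religion", word);
            }
        }
        // The age spans several words, so it is matched against the whole input.
        Matcher m = AGE.matcher(input);
        if (m.find()) {
            result.put("Age", m.group(1));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(parse("Find me a christian male 28 years old from Brazil."));
    }
}
```

Word-by-word matching cannot handle multi-word values such as "United States", so a real vocabulary would probably need phrase matching as well.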

Sure, you can do this much better; this is only one approach. If you want to make the search more accurate, read about Levenshtein distance. It will help, for example, if somebody types "Brasil" instead of "Brazil" or "cristian" instead of "christian".
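
A minimal sketch of that idea, assuming the same kind of upper-case keyword lists as above: compute the Levenshtein distance between the typed word and every entry, and accept the nearest entry if it is within a small number of edits. The class name FuzzyMatchSketch, the method closest, and the maxDistance threshold are hypothetical names chosen for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical helper class for typo-tolerant keyword lookup.
final class FuzzyMatchSketch {
    // Dynamic-programming Levenshtein distance, keeping only two rows of the table.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty prefix of a: j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // distance to the empty prefix of b: i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Returns the nearest vocabulary entry within maxDistance edits, or null if there is none.
    static String closest(String word, List<String> vocabulary, int maxDistance) {
        String upper = word.toUpperCase(Locale.ROOT);
        String best = null;
        int bestDistance = maxDistance + 1;
        for (String candidate : vocabulary) {
            int d = levenshtein(upper, candidate);
            if (d < bestDistance) {
                bestDistance = d;
                best = candidate;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<String> countries = Arrays.asList("ALGERIA", "ARGENTINA", "AUSTRIA", "BRAZIL");
        System.out.println(closest("Brasil", countries, 1));
    }
}
```

A threshold of one edit is deliberately strict: with short keywords, a larger threshold starts matching unrelated words.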

This is a pretty huge area of research in language processing: it's called Information Extraction. If it's Java you want, GATE has pretty extensive support for IE.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow