Question

I have some strings like "paddington road" and I need to extract the word "road" from each of them. How can I do that?

The problem is that I need to process a list of streets and extract words like "road", "park", "street", "boulevard" and many others.

What would be the best way to do that? Checking every street against every keyword is O(n*m), and since I process more than 5000 streets, performance matters.

I'm reading the values from a Postgres database and putting them into a List, but I'm not sure that's the best way. Maybe a hash table would be faster to query?

I tried something like this:

    // Parse selectedList
    Iterator<String> it = streets.iterator();
    Iterator<String> it_exception = exception.iterator();

    int counter = streets.size();
    while(it.hasNext()) {   

        while ( it_exception.hasNext() ) {
            // remove substring it_exception.next() from it.next()              
        }               
    }

What do you think?


Solution

You can try a Set:

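// Words to remove; a HashSet gives constant-time lookups on average
Set<String> exceptions = new HashSet<String>(...);
for (String street : streets) {
    String[] words = street.split(" ");
    StringBuilder res = new StringBuilder();
    for (String word : words) {
        // Keep only the words that are not in the exception set
        if (!exceptions.contains(word)) {
            res.append(word).append(" ");
        }
    }
    // trim() drops the trailing space left by the last append
    System.out.println(res.toString().trim());
}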

I think the complexity will be O(n), where n is the total number of words across all streets, because each lookup in a HashSet takes constant time on average.
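For reference, here is a minimal self-contained sketch of the Set approach that both strips the keywords and returns the ones it found. The class name, the method names and the keyword list are assumptions made for illustration; in the question's setup the keywords would come from the database. Matching is case-insensitive here, which is also an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

class StreetKeywordFilter {
    // Hypothetical keyword set; replace with the values loaded from the database.
    static final Set<String> KEYWORDS =
            new HashSet<>(Arrays.asList("road", "park", "street", "boulevard"));

    // Returns the street name with every keyword removed.
    static String stripKeywords(String street) {
        List<String> kept = new ArrayList<>();
        for (String word : street.trim().split("\\s+")) {
            if (!word.isEmpty() && !KEYWORDS.contains(word.toLowerCase(Locale.ROOT))) {
                kept.add(word);
            }
        }
        return String.join(" ", kept);
    }

    // Returns the keywords found in the street name, lowercased, in order of appearance.
    static List<String> extractKeywords(String street) {
        List<String> found = new ArrayList<>();
        for (String word : street.trim().split("\\s+")) {
            String lower = word.toLowerCase(Locale.ROOT);
            if (KEYWORDS.contains(lower)) {
                found.add(lower);
            }
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(stripKeywords("paddington road"));   // prints "paddington"
        System.out.println(extractKeywords("paddington road")); // prints "[road]"
    }
}
```

Each word costs one hash lookup, so the work grows with the number of words rather than with the number of words times the number of keywords.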

OTHER TIPS

You need to get a new iterator for your list of keywords at each iteration of the outer loop. The easiest way is to use the foreach syntax:

for (String streetName : streets) {
    for (String keyword : keywords) {
        // find if the string contains the keyword, and perhaps break if found to avoid searching for the other keywords
    }
}

Don't optimize prematurely. 5000 is nothing for a computer, and street names are short strings. And if you place the most frequent keywords (street, rather than boulevard) at the beginning of the keyword list, you'll have fewer iterations.
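A filled-in sketch of that nested loop might look like the following. The class name, the method name and the keyword order are assumptions for illustration only. It pads the street name with spaces so that a keyword only matches as a whole word, which is a design choice not stated above; a plain contains() would also match "park" inside "parkside".

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Optional;

class KeywordScan {
    // Hypothetical keyword list, with the (assumed) most frequent keywords first.
    static final List<String> KEYWORDS =
            Arrays.asList("street", "road", "park", "boulevard");

    // Returns the first keyword from KEYWORDS that appears as a whole word in the street name.
    static Optional<String> firstKeyword(String street) {
        String padded = " " + street.toLowerCase(Locale.ROOT) + " ";
        for (String keyword : KEYWORDS) {
            if (padded.contains(" " + keyword + " ")) {
                // Stop at the first hit to avoid testing the remaining keywords
                return Optional.of(keyword);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        for (String street : Arrays.asList("paddington road", "parkside avenue")) {
            System.out.println(street + " -> " + firstKeyword(street).orElse("(none)"));
        }
    }
}
```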

List<String> streets = new ArrayList<String>();
streets.add("paddington road");
streets.add("paddington park");

for (String street : streets) {
    String[] words = street.split(" ");
    // Skip single-word names, which have no second word
    if (words.length > 1) {
        String secondwrd = words[1];
        System.out.println("secondwrd: " + secondwrd);
    }
}

You can then collect each secondwrd in a list, a StringBuilder, or similar. Note that this assumes the keyword is always the second word of the street name.

Licensed under: CC-BY-SA with attribution