Question

My code looks like this:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AddressTest {
    public static void main(String[] args) {
        String try1 = " how abcd is a lake 3909 Witmer Road Niagara Falls NY 14305 and our adress is 120, 5th cross, 1st main, domlur, Bangalore 50071 nad 420, Fanboy Lane, NewYark, AS 12345";
        String add1 = "( \\b+[0-9]{3,5}[, ]* (.*)[, ]* (.*)[, ]* [a-zA-Z]{2} [0-9]{5})";
        Pattern p = Pattern.compile(add1);
        Matcher m = p.matcher(try1);
        if (m.find()) { // prints only the first match that find() locates
            System.out.println("Address ======> " + m.group());
        } else {
            System.out.println("Address ======>Not found ");
        }
    }
}

I want only the US addresses in the output:

[(3909 Witmer Road Niagara Falls NY 14305) and (420, Fanboy Lane, NewYark, AS 12345)]

but the output is this instead:

(3909 Witmer Road Niagara Falls NY 14305 and our adress is 120, 5th cross, 1st main, domlur, Bangalore 50071 nad 420, Fanboy Lane, NewYark, AS 12345)

Solution

You could try a regex a bit more like this:

"(\\b[0-9]{3,5},? [A-Za-z]+(?: [A-Za-z]+,?)* [a-zA-Z]{2} [0-9]{5})"

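The [A-Za-z]+,? parts match only letters (each word optionally followed by a comma), never digits, so a match cannot run past a ZIP code into the next address.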
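A minimal, runnable sketch of the suggested regex applied to the question's input. The class and method names (UsAddressFinder, findAddresses) are hypothetical. One change beyond the answer itself: the question's `if (m.find())` only ever reports the first match, so this sketch calls `find()` in a `while` loop to collect every address.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UsAddressFinder {
    // The regex suggested above: a 3-5 digit house number, an optional comma,
    // letter-only words (each optionally followed by a comma),
    // a 2-letter state and a 5-digit ZIP code.
    private static final Pattern US_ADDRESS = Pattern.compile(
            "(\\b[0-9]{3,5},? [A-Za-z]+(?: [A-Za-z]+,?)* [a-zA-Z]{2} [0-9]{5})");

    // Collects every match; each find() call resumes after the previous match.
    public static List<String> findAddresses(String text) {
        List<String> result = new ArrayList<>();
        Matcher m = US_ADDRESS.matcher(text);
        while (m.find()) {
            result.add(m.group(1));
        }
        return result;
    }

    public static void main(String[] args) {
        String try1 = " how abcd is a lake 3909 Witmer Road Niagara Falls NY 14305 and our adress is 120, 5th cross, 1st main, domlur, Bangalore 50071 nad 420, Fanboy Lane, NewYark, AS 12345";
        for (String address : findAddresses(try1)) {
            System.out.println("Address ======> " + address);
        }
    }
}
```

Run against the question's input, this should print "3909 Witmer Road Niagara Falls NY 14305" and "420, Fanboy Lane, NewYark, AS 12345". The Bangalore address is skipped because "5th" and "1st" are not letter-only words and its postcode is not preceded by a two-letter state.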


OTHER TIPS

The * quantifier is greedy, so it matches as many characters as it can. In your expression, the [a-zA-Z]{2} [0-9]{5} part, which is meant to match the state and ZIP code, ends up matching the very last state and ZIP in the input, because the .* groups earlier in the expression expand as far as they can and give characters back only when the rest of the pattern would otherwise fail to match.
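To see the greediness in isolation, here is a small sketch on a hypothetical shortened input containing two state/ZIP pairs. The class and method names are illustrative only; the sketch simply reports what a greedy (.*) group captures before the state/ZIP part.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GreedyDemo {
    // Returns what group 1 captures on the first find(), or null if nothing matches.
    public static String captureBeforeStateZip(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Hypothetical input with two state/ZIP pairs.
        String text = "Falls NY 14305 and Lane AS 12345";
        // Greedy .* first runs to the end of the input, then backtracks only until
        // " [a-zA-Z]{2} [0-9]{5}" can match, i.e. at the LAST state and ZIP.
        System.out.println(captureBeforeStateZip("(.*) [a-zA-Z]{2} [0-9]{5}", text));
        // prints: Falls NY 14305 and Lane
    }
}
```

The captured text swallows the first state and ZIP ("NY 14305"), which is exactly what happens to the question's longer input.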

Try changing each . to [^0-9], so that those groups match anything except digits and can no longer run across a ZIP code into the next address.
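A sketch of that tip applied to the question's own pattern, assuming the rest of the pattern stays as it was. Besides replacing each `.` with `[^0-9]`, it makes three assumptions of its own: it drops the `+` after `\b` (repeating a zero-width word boundary adds nothing), it loops with `while (m.find())` instead of `if` so that both addresses are reported, and it trims the leading space that the pattern's first literal space pulls into the match. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NonDigitAddressFinder {
    // The question's pattern with each "." replaced by [^0-9]: the two
    // free-text groups can no longer cross a ZIP code or house number.
    private static final Pattern ADDRESS = Pattern.compile(
            "( \\b[0-9]{3,5}[, ]* ([^0-9]*)[, ]* ([^0-9]*)[, ]* [a-zA-Z]{2} [0-9]{5})");

    public static List<String> findAddresses(String text) {
        List<String> result = new ArrayList<>();
        Matcher m = ADDRESS.matcher(text);
        // Keep calling find() to collect every address, not just the first.
        while (m.find()) {
            // The pattern starts with a literal space, so trim it off.
            result.add(m.group(1).trim());
        }
        return result;
    }

    public static void main(String[] args) {
        String try1 = " how abcd is a lake 3909 Witmer Road Niagara Falls NY 14305 and our adress is 120, 5th cross, 1st main, domlur, Bangalore 50071 nad 420, Fanboy Lane, NewYark, AS 12345";
        for (String address : findAddresses(try1)) {
            System.out.println("Address ======> " + address);
        }
    }
}
```

On the question's input this should also yield the two US addresses. The greedy [^0-9]* groups still expand as far as they can, but they now stop at the next digit, so the state/ZIP part can only match the first state and ZIP after each house number.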

Licensed under: CC-BY-SA with attribution