Question

I have been fighting with this for a while, so hopefully someone can help me out. I'm open to any and all suggestions.

When I query QGeoAddress::street(), I may receive the street number along with the street name. I would like to get just the street name.

Example:

King St W -> King St W
99 King St W -> King St W
99a King St W -> King St W ...

1st St -> 1st St
99 1st St -> 1st St
99a 1st St -> 1st St ...

315 W. 42nd -> W. 42nd
42 St. Paul Drive -> St. Paul Drive

I need to do this so that the location of two separate devices can be compared via the most recent street name. If a device is at "99 King St W", it is on the same street as "113 King St W", or "113a King St W".

As it stands, I don't believe regex is a good, reliable solution, as there are too many rules to impose and the variability of street names is working against me. Theoretically, there may be a street called "1 St", which would break a regex that normalizes "1 1st St".

Writing my own fuzzy matcher may provide better results, but may fail for shorter street names.

I have also considered querying a REST web service; however, many of the free services limit the number of requests per day or impose a minimum time between requests, which would make that method too expensive.

Like I say, I'd love to hear what you guys can come up with.

Much appreciated :)


Solution

As I said in the comments, the problem here is that the wrong question is being asked. But if you have to, and you can exclude PO boxes (the string ends in a number?), and you limit yourself to addresses in the USA (because you wouldn't believe some of the things you see in the UK), then you might start by detecting a leading number, together with any characters that aren't separated from it by a space. It's hardly perfect, because there'll always be people who write "99 A King St." rather than "99a King St.". (But then, in the first, is the name of the street "King St." or "A King St."? Unless you know the street yourself, you can't be sure.) The regular expression for this would be "\\d+\\w*". Beyond that, you can try certain heuristics on what remains: if it is a single word exactly matching "St", "Street", "Ave", etc. (there are probably about 20 different words you should check, with or without a trailing "." in the case of abbreviations), then the original string was probably just the street, with no house number.
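The approach described above could be sketched roughly as follows in standard C++ with std::regex. This is only an illustration under assumptions: the function names (stripLeadingNumber, isStreetTypeWord, streetName) and the short list of street-type words are made up for the example, the input is assumed to be trimmed, and the pattern adds a leading "^" and trailing "\s+" to the "\\d+\\w*" expression so that the spaces after the number are removed too.

```cpp
#include <algorithm>
#include <cctype>
#include <regex>
#include <string>
#include <unordered_set>

// Remove a leading house number such as "99" or "99a" plus the spaces after it.
// The input is returned unchanged when it does not start with a digit.
std::string stripLeadingNumber(const std::string& street) {
    static const std::regex leadingNumber(R"(^\d+\w*\s+)");
    return std::regex_replace(street, leadingNumber, "",
                              std::regex_constants::format_first_only);
}

// True when the word is a street-type word, with or without a trailing ".".
// The list is an illustrative subset, not a complete one.
bool isStreetTypeWord(std::string word) {
    if (!word.empty() && word.back() == '.') {
        word.pop_back();
    }
    std::transform(word.begin(), word.end(), word.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    static const std::unordered_set<std::string> types = {
        "st", "street", "ave", "avenue", "rd", "road",
        "dr", "drive", "blvd", "ln", "lane"};
    return types.count(word) > 0;
}

// Strip the leading number, unless all that remains is a single street-type
// word. In that case the "number" was probably part of the name ("1st St").
std::string streetName(const std::string& street) {
    const std::string rest = stripLeadingNumber(street);
    const bool singleWord = rest.find(' ') == std::string::npos;
    if (singleWord && isStreetTypeWord(rest)) {
        return street;
    }
    return rest;
}
```

With a Qt address, something like streetName(address.street().toStdString()) should work. Note that the sketch keeps "1 St" as it is, so the ambiguity between a street called "1 St" and house number 1 on "St" is not resolved, and "99 A King St." still comes out as "A King St.".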

But before even starting, I would insist that you query the assignment. It's well known, for example, that when inputting addresses, about all you can do is "First line:", "Second line:", etc. Even asking for a post code can be tricky.

OTHER TIPS

Description

This regex looks for the street type St or Ave and captures the word before it together with the rest of the line. I allowed both St and Ave in case you want to expand the test beyond streets called "xxx Street"; if your use case requires only St, replace (St|Ave) with just St.

(\b\S*\b\s(St|Ave)\b.*?)$


Example

I include this PHP example only to demonstrate how the expression works and what the captured groups look like.

<?php
$sourcestring="King St W 
99 King St W 
99a King St W 

1st St 
99 1st St 
99a 1st St";
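// The /m modifier makes $ match at the end of every line, so each address is tested separately.
preg_match_all('/(\b\S*\b\s(St|Ave)\b.*?)$/m',$sourcestring,$matches);
// Dump the full matches ([0]) followed by the capture groups ([1] and [2]).
echo "<pre>".print_r($matches,true);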
?>

$matches Array:
(
    [0] => Array
        (
            [0] => King St W 
            [1] => King St W 
            [2] => King St W 
            [3] => 1st St 
            [4] => 1st St 
            [5] => 1st St
        )

    [1] => Array
        (
            [0] => King St W 
            [1] => King St W 
            [2] => King St W 
            [3] => 1st St 
            [4] => 1st St 
            [5] => 1st St
        )

    [2] => Array
        (
            [0] => St
            [1] => St
            [2] => St
            [3] => St
            [4] => St
            [5] => St
        )

)
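
Since the question is about Qt, here is a rough C++ equivalent of the same expression using std::regex, applied to one address line at a time. It is only a sketch: the helper name captureStreet is made up for this example, and returning an empty string for "no match" is a simplification.

```cpp
#include <regex>
#include <string>

// Apply the expression from this answer to a single address line.
// Returns the first capture group, or an empty string when nothing matches.
std::string captureStreet(const std::string& line) {
    static const std::regex pattern(R"((\b\S*\b\s(St|Ave)\b.*?)$)");
    std::smatch match;
    if (std::regex_search(line, match, pattern)) {
        return match[1].str();
    }
    return std::string();
}
```

For "99 King St W" and "99a 1st St" this yields "King St W" and "1st St". The other examples from the question show the limits: "315 W. 42nd" contains neither St nor Ave, so nothing is captured, and "42 St. Paul Drive" is captured with its number, because "St" directly follows "42".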
Licensed under: CC-BY-SA with attribution