Regular Expressions for City name

https://stackoverflow.com/questions/11757013

regex
city

24-06-2021

Question

I need a regular expression for validating a city text box. The field should accept only letters, spaces, and dashes (-).

Solution

This can be arbitrarily complex, depending on how precise you need the match to be, and the variation you're willing to allow.

Something fairly simple like ^[a-zA-Z]+(?:[\s-][a-zA-Z]+)*$ should work.

Warning: this does not match cities like München. To handle such names, extend the [a-zA-Z] part of the expression to cover whichever characters you want to allow in your particular case.

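Keep in mind that [\s-] matches exactly one whitespace character or dash between words, so something like San----Francisco, or several consecutive spaces, is rejected; write [\s-]+ instead if you want to allow such runs.

It translates to something like: one or more letters, followed by a block made of a single whitespace character or dash and one or more letters; this block can occur zero or more times.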

Weird stuff in there: the ?: bit. If you're not familiar with regexes it might be confusing, but it simply states that the piece of regex between the parentheses is not a capturing group (I don't want to capture the part it matches to reuse later), so the parentheses are only used to group the expression, not to capture the match.
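
To see what the ?: changes in practice, here is a small sketch in Python (the choice of Python's re module is an assumption; the answer does not name a language). It runs the same pattern with and without ?: and prints what each one captures.

```python
import re

# Same pattern twice: once with a capturing group, once with (?:...).
capturing = re.compile(r"^[a-zA-Z]+([\s-][a-zA-Z]+)*$")
non_capturing = re.compile(r"^[a-zA-Z]+(?:[\s-][a-zA-Z]+)*$")

name = "San Fran Cisco"

# Both patterns accept the same strings...
assert capturing.match(name) and non_capturing.match(name)

# ...but only the capturing version stores what the group matched
# (for a repeated group, the last repetition is kept).
captured = capturing.match(name).groups()
not_captured = non_capturing.match(name).groups()
print(captured)      # (' Cisco',)
print(not_captured)  # ()
```

For pure validation the two behave identically, so the non-capturing form just avoids storing a value nobody reads.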

"New York" // passes

"San-Francisco" // passes

"San Fran Cisco" // passes (sorry, needed an example with three tokens)

"Chicago" // passes

"  Chicago" // doesn't pass, starts with spaces

"San-" // doesn't pass, ends with a dash

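The examples above can be checked with a few lines of Python. This is only a sketch: the helper name is_simple_city is made up for illustration, and Python's re module is assumed.

```python
import re

CITY_RE = re.compile(r"^[a-zA-Z]+(?:[\s-][a-zA-Z]+)*$")


def is_simple_city(value: str) -> bool:
    """Hypothetical helper: True if value matches the simple pattern."""
    return CITY_RE.match(value) is not None


examples = ["New York", "San-Francisco", "San Fran Cisco", "Chicago",
            "  Chicago", "San-"]
results = {name: is_simple_city(name) for name in examples}
for name, ok in results.items():
    print(repr(name), "passes" if ok else "doesn't pass")
```
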
Other tips

This answer assumes that the letters which @Manaysah refers to also encompass letters with diacritical marks. I've added the single quote ' since many names in Canada and France have it. I've also added the period (dot) since it's required for contracted names.

Building upon @UID's answer, I came up with:

^([a-zA-Z\u0080-\u024F]+(?:\. |-| |'))*[a-zA-Z\u0080-\u024F]*$

The list of cities it accepts:

Toronto
St. Catharines
San Francisco
Val-d'Or
Presqu'ile
Niagara on the Lake
Niagara-on-the-Lake
München
toronto
toRonTo
villes du Québec
Provence-Alpes-Côte d'Azur
Île-de-France
Kópavogur
Garðabær
Sauðárkrókur
Þorlákshöfn

And what it rejects:

A----B
------
*******
&&
()
//
\\

I didn't add in the use of brackets and other marks since it didn't fall within the scope of this question.

I've stayed away from \s for whitespace. Tabs and line feeds aren't part of a city name and shouldn't be used in my opinion.
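
A quick way to try the accept and reject lists is a short Python script. This is a sketch under two assumptions: Python's re module is used, and the period is written as \. (a literal dot), which is what the answer's description of "period (dot)" suggests.

```python
import re

# Assumption: "\. " (a literal period plus a space) is what the answer intends.
CITY_RE = re.compile(
    r"^([a-zA-Z\u0080-\u024F]+(?:\. |-| |'))*[a-zA-Z\u0080-\u024F]*$"
)

accepted = ["Toronto", "St. Catharines", "Val-d'Or", "Niagara-on-the-Lake",
            "München", "Île-de-France", "Þorlákshöfn"]
rejected = ["A----B", "------", "*******", "&&", "()", "//", "\\\\"]

accepted_ok = all(CITY_RE.match(name) for name in accepted)
rejected_ok = not any(CITY_RE.match(name) for name in rejected)
print(accepted_ok, rejected_ok)

# Caveat: every part of the pattern is optional, so "" also matches.
empty_matches = CITY_RE.match("") is not None
```

Because the empty string matches, a required field would need a separate non-empty check.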

I'm adding my answer in case anybody needs it while searching for a regex for city names, like I did.

Please use this:

^[a-zA-Z\u0080-\u024F\s\/\-\)\(\`\.\"\']+$

Many city names contain dashes, such as Soddy-Daisy, Tennessee, or special characters, like the ñ in La Cañada Flintridge, California.

Hope this helps!

Here is the one I've found works best.

For flavours that support \p{L}, such as PCRE (PHP), .NET, and Go:

/^\p{L}+(?:([\ \-\']|(\.\ ))\p{L}+)*$/u

For regex flavours that do not support \p{L}, replace it with [a-zA-Z\u0080-\u024F].

So for JavaScript (without the u flag) or Python's built-in re module, use:

/^[a-zA-Z\u0080-\u024F]+(?:([\ \-\']|(\.\ ))[a-zA-Z\u0080-\u024F]+)*$/

Whitelisting a set of characters is easy, but there are things to watch for in your regex:

Consecutive non-alphabetical characters should not be allowed, i.e. Los  Angeles should fail because it has two spaces.
Periods should have a space after them, i.e. St.Albert should fail because it's missing the space.
Names cannot start or end with non-alphabetical characters, i.e. -Chicago- should fail.
The whitespace class \s is not the same as an escaped space (\ ): a tab or line feed character would pass \s, so a literal space should be specified instead.
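
The four rules above can be turned into a small table of checks. The following is a sketch in Python; the pattern is a Python rendering of the [a-zA-Z\u0080-\u024F] variant, and the helper name is_city_name is made up for illustration.

```python
import re

# Python rendering of the [a-zA-Z\u0080-\u024F] variant (an assumption:
# the answer gives it as a JavaScript-style /.../ literal).
CITY_RE = re.compile(
    r"[a-zA-Z\u0080-\u024F]+(?:(?:[ \-']|\. )[a-zA-Z\u0080-\u024F]+)*"
)


def is_city_name(value: str) -> bool:
    # fullmatch anchors at both ends, so no ^ or $ is needed
    return CITY_RE.fullmatch(value) is not None


checks = {
    "Los Angeles": True,
    "Los  Angeles": False,   # two consecutive spaces
    "St. Albert": True,
    "St.Albert": False,      # period without a following space
    "-Chicago-": False,      # starts and ends with a non-letter
    "San\tJose": False,      # tab is whitespace but not a literal space
}
for name, expected in checks.items():
    assert is_city_name(name) is expected, name
```

Using fullmatch rather than match with $ also avoids accepting a trailing newline, which $ alone would tolerate in Python.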

Note: When building regex rules, I find https://regex101.com/tests very helpful, as you can easily create unit tests.

js: https://regex101.com/r/cgJwc0/1/tests
php: https://regex101.com/r/Yo3GV2/1/tests

Here's one that will work with most cities, and has been tested:

^[a-zA-Z\u0080-\u024F]+(?:. |-| |')*([1-9a-zA-Z\u0080-\u024F]+(?:. |-| |'))*[a-zA-Z\u0080-\u024F]*$

Python code below, including its test.

import re
import pytest


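CITY_RE = re.compile(
    # First word, then optional separators. The "." is unescaped, so ". "
    # matches any character followed by a space (including two spaces).
    r"^[a-zA-Z\u0080-\u024F]+(?:. |-| |')*"
    # Further words (digits 1-9 allowed), each followed by one separator
    r"([1-9a-zA-Z\u0080-\u024F]+(?:. |-| |'))*"
    # Optional final word
    r"[a-zA-Z\u0080-\u024F]*$"
)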


def is_city(value: str) -> bool:
    valid = CITY_RE.match(value) is not None
    return valid

# Tests
@pytest.mark.parametrize(
    "value,expected",
    (
        ("1", False),
        ("Toronto", True),
        ("Saint-Père-en-Retz", True),
        ("Saint Père en Retz", True),
        ("Saint-Père en Retz", True),
        ("Paris 13e Arrondissement", True),
        ("Paris  13e  Arrondissement ", True),
        ("Bouc-Étourdi", True),
        ("Arnac-la-Poste", True),
        ("Bourré", True),
        ("Å", True),
        ("San Francisco", True),
    ),
)
def test_is_city(value, expected):
    # is_city returns a plain bool, so compare it directly
    assert is_city(value) is expected

^[a-zA-Z\- ]+$

Also this might be useful http://www.cheatography.com/davechild/cheat-sheets/regular-expressions/

Use this regex:

^[a-zA-Z-\s]+$

After many hours of looking for a city regex matcher, I built this, and it meets my needs 100%:

(?ix)^[A-Z.-]+(?:\s+[A-Z.-]+)*$

An expression for testing city names. It matches:

City
St. City
Some Silly-City
City St.
Too Many Words City
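
As a rough check of that list, here is a Python sketch. It assumes Python's re module, which accepts the inline (?ix) flags at the start of a pattern.

```python
import re

# (?ix): i = case-insensitive, x = verbose (unescaped whitespace in the
# pattern is ignored; the pattern contains none, so x has no effect here).
CITY_RE = re.compile(r"(?ix)^[A-Z.-]+(?:\s+[A-Z.-]+)*$")

listed = ["City", "St. City", "Some Silly-City", "City St.",
          "Too Many Words City"]
all_listed_match = all(CITY_RE.match(name) for name in listed)

# Caveat: "." and "-" are allowed anywhere, so punctuation-only input matches.
punctuation_only_matches = CITY_RE.match("---") is not None
print(all_listed_match, punctuation_only_matches)
```

Whether accepting input like --- matters depends on how strict the validation needs to be.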

It seems that there are many flavors of regex; I built this one for my Java needs, and it works great:

^[a-zA-Z.-]+(?:[\s-][\/a-zA-Z.]+)*$

This will help identify city names like St. Johns, Baie-Sainte-Anne, and Grand-Sault/Grand Falls.

I like shepley's suggestion, but it has a couple of flaws.

If you change shpeley's regex to this, it will not accept other special characters:

^([a-zA-Z\u0080-\u024F]{1}[a-zA-Z\u0080-\u024F\. |\-| |']*[a-zA-Z\u0080-\u024F\.']{1})$

I use this one:

^[a-zA-Z\\u0080-\\u024F.]+((?:[ -.|'])[a-zA-Z\\u0080-\\u024F]+)*$

You can try this:

^\p{L}+(?:[\s\-]\p{L}+)*$

The above regex will:

Restrict leading and trailing spaces, hyphens
Match cities with names like Néewiller-près-lauterbourg

Here are some fun edge-cases:

's Graveland
's Gravendeel
's Gravenpolder
's Gravenzande
's Heer Arendskerke
's Heerenberg
's Heerenhoek
's Hertogenbosch
't Harde
't Veld
't Zand
100 Mile House
6 October City

So, don't forget to add ' and 0-9 as a possible first character of the city name.
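
One way to cover these edge cases is sketched below in Python. The pattern is hypothetical and not taken from any answer above: it assumes an optional leading apostrophe, digits allowed inside words, and a single separator between words.

```python
import re

# Hypothetical pattern, not from the answers above: an optional leading
# apostrophe, digits allowed inside words, single separators between words.
WORD = r"[0-9a-zA-Z\u0080-\u024F]+"
CITY_RE = re.compile(rf"'?{WORD}(?:(?:[ \-']|\. ){WORD})*")

edge_cases = ["'s Graveland", "'s Hertogenbosch", "'t Harde", "'t Zand",
              "100 Mile House", "6 October City"]
edge_results = [CITY_RE.fullmatch(name) is not None for name in edge_cases]
print(all(edge_results))

# Inputs that should still fail: leading/trailing dash, doubled dash, empty.
still_rejected = [CITY_RE.fullmatch(name) is None
                  for name in ["-Chicago-", "San--Jose", ""]]
```

The trade-off is that allowing digits everywhere also lets digit-only input such as 123 through, so stricter rules may be wanted depending on the data.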

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow