The regex language is embedded in Perl (and vice versa), but it shares no syntax with Perl¹. This means that repetition and ranges are written with a different syntax than in ordinary Perl code.
1) Regexes do share syntax with Perl strings, although the two are not fully compatible: compare the different meanings of the `\b` escape.
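To make the footnote concrete, here is a small sketch of the three meanings of `\b`. The variable names are only illustrative.

```perl
use strict;
use warnings;

# In a double-quoted string, \b is the backspace character (code point 8).
my $backspace = "\b";
print ord($backspace), "\n";    # prints 8

# In a regex, \b is a zero-width word boundary.
my $word_boundary = 'foo bar' =~ /\bbar\b/ ? 1 : 0;

# Inside a charclass, \b means backspace again.
my $class_backspace = "a\bc" =~ /[\b]/ ? 1 : 0;

print "$word_boundary $class_backspace\n";    # prints 1 1
```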
A character class defines a set of properties. It matches a single character if that character satisfies at least one of the specified properties. A charclass can contain:
- single characters, like `[aeiou]` (match lowercase vowels)
- ranges, to match continuous ranges of code points: `[A-Z]` (uppercase Latin characters)
- negation of the whole charclass: `[^']` (everything that is not a single quote)
- named charclasses like `\d`, `\w` (and a lot of fun with Unicode properties)
- (POSIX charclasses)
If a charclass contains a character multiple times, this is irrelevant, as it behaves like a set union.
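As a quick illustration of the kinds of members listed above, the following sketch pairs each kind of charclass with one sample character that it should match. The patterns and samples are my own choices, not anything from your code.

```perl
use strict;
use warnings;

# Each entry is [ pattern, sample input ]; every sample is chosen to match.
my @cases = (
    [ qr/^[aeiou]$/,     'e' ],    # single characters
    [ qr/^[A-Z]$/,       'Q' ],    # range of code points
    [ qr/^[^']$/,        'x' ],    # negated class
    [ qr/^[\d_]$/,       '7' ],    # named class mixed with a literal
    [ qr/^[[:punct:]]$/, '!' ],    # POSIX class
    [ qr/^[aabbb]$/,     'b' ],    # duplicates are harmless
);

# grep in scalar context returns the number of matching cases.
my $matched = grep { $_->[1] =~ $_->[0] } @cases;
print "$matched of ", scalar(@cases), " samples matched\n";
```

Note that `[aabbb]` and `[ab]` accept exactly the same characters.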
The metacharacters in charclasses are:
- `]`: end of the charclass. To match square brackets, one has to write `[\[\]]` or `[][]`. The latter works because a charclass cannot be empty, so a `]` in leading position is taken literally.
- The negation operator `^`, which is only special in leading position: `[~&|^]` would match any of the bitwise logical Perl operators.
- The range operator `-`. To match a literal minus, it can be put at the end of the charclass: the class `[+-*]` would be invalid (`*` comes before `+`, so the range is empty, which is illegal), but `[+*-]` works just fine.
- The backslash still is the escape character.
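The rules for these metacharacters can be checked with a short sketch like the one below. The invalid class is compiled at runtime inside `eval` so that the script itself still compiles. The variable names are only illustrative.

```perl
use strict;
use warnings;

# A leading ] is literal, so both spellings match square brackets.
my @escaped = 'a[b]c' =~ /[\[\]]/g;
my @bare    = 'a[b]c' =~ /[][]/g;

# ^ is only special in leading position.
my @operators = 'a ~ b & c | d ^ e' =~ /[~&|^]/g;

# A trailing - is a literal minus.
my @signs = '1+2*3-4' =~ /[+*-]/g;

# [+-*] asks for a range from + (0x2B) down to * (0x2A), which Perl rejects.
my $bad      = '[+-*]';
my $compiled = eval { qr/$bad/ };
my $error    = $@;

print "@escaped | @bare | @operators | @signs\n";
print "compile failed: $error" unless defined $compiled;
```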
Space is significant inside charclasses, even under the `/x` flag.
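A minimal sketch of that difference, with made-up sample strings: under `/x`, whitespace outside a charclass is dropped from the pattern, while a space inside the brackets stays a member of the set.

```perl
use strict;
use warnings;

# Outside a charclass, /x ignores the spaces: this is effectively /^ab$/.
my $outside_ab    = 'ab'  =~ /^ a b $/x ? 1 : 0;
my $outside_a_b   = 'a b' =~ /^ a b $/x ? 1 : 0;

# Inside a charclass, the space still counts: [a b] matches a, b or a space.
my $inside_space  = ' '   =~ /^ [a b] $/x ? 1 : 0;

print "$outside_ab $outside_a_b $inside_space\n";    # prints 1 0 1
```

Perl 5.26 and later also provide the `/xx` flag, which additionally ignores spaces and tabs inside bracketed charclasses.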
On your charclasses:
- `[1..9]` could also be written as `[19.]`, and matches a `1`, `9` or `.`, as the period is not a metacharacter inside charclasses.
- `[1 .. 9]` could be written `[19. ]`, and additionally matches a space. As I said above, whitespace is significant in charclasses.
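You can verify this yourself with something like the following sketch, where the input string is just an arbitrary sample of mine.

```perl
use strict;
use warnings;

my $input = '0 1 5 9 . x';

# Both classes contain exactly the characters 1, 9 and the period.
my @dotted  = $input =~ /[1..9]/g;
my @compact = $input =~ /[19.]/g;
print "@dotted\n";     # prints 1 9 .
print "@compact\n";    # prints 1 9 .

# With spaces inside the brackets, the five spaces in $input match as well.
my @spaced = $input =~ /[1 .. 9]/g;
print scalar(@spaced), "\n";    # prints 8
```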
What you probably meant:
If you want to match any of the digits `0` or `1` to `9`, you can use the range `[0-9]`. Remember that the minus is the range operator in charclasses.
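For comparison with the classes above, a short sketch of the range version on an arbitrary sample string:

```perl
use strict;
use warnings;

# The range 0-9 covers every ASCII digit, including those between the endpoints.
my @digits = 'a0b1c5d9e.' =~ /[0-9]/g;
print "@digits\n";    # prints 0 1 5 9

# The period is no longer a member of the class.
my $dot_matches = '.' =~ /[0-9]/ ? 1 : 0;
print "$dot_matches\n";    # prints 0
```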