Can anyone explain why this regex (in Python):

pattern = re.compile(r"""
^
([[a-zA-Zàáâãäåæçèéêëìíîïðñòóôõöøùúûüýþÿ]+\s{1}]+)
([a-zA-Zàáâãäåæçèéêëìíîïðñòóôõöøùúûüýþÿ]+)   # Last word.
\.{1}
$
""", re.VERBOSE + re.UNICODE)

if re.match(pattern, line):

does not match "A sentence."

I would actually like to get the entire sentence (including the period) back as a captured group, but have been failing miserably.

No accepted solution

Other answers

I think that maybe you meant to do this:

(([a-zA-Zàáâãäåæçèéêëìíîïðñòóôõöøùúûüýþÿ]+\s{1})+)
 ^                                             ^

I don't think the nested square brackets you had do what you think they do.
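To make that concrete, here is a minimal sketch of how current CPython versions appear to read the nested brackets. It uses an ASCII-only letter class for brevity (my simplification, not your full class), and the name `broken` and the test strings are just for illustration. The inner `[` is taken as a literal character inside the class, the class ends at the first `]`, and the trailing `]+` then demands one or more literal `]` characters.

```python
import re
import warnings

# ASCII-only stand-in for the question's pattern (an assumption made for brevity).
with warnings.catch_warnings():
    # Newer Pythons emit a FutureWarning ("possible nested set") for "[[".
    warnings.simplefilter("ignore", FutureWarning)
    broken = re.compile(r"^([[a-zA-Z]+\s{1}]+)([a-zA-Z]+)\.{1}$")

# The pattern is read as: the class "[[a-zA-Z]" (which includes a literal "["),
# repeated; then one whitespace character; then one or more literal "]".
no_match = broken.match("A sentence.")
odd_match = broken.match("A ]sentence.")

print(no_match)            # None: there is no "]" after the space
print(odd_match.group(1))  # the text "A ]"
```

So the original pattern only matches input that has a stray `]` after the whitespace, which is why "A sentence." fails.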

This regex works:

import re

pattern = re.compile(r"""
^
([a-zA-Zàáâãäåæçèéêëìíîïðñòóôõöøùúûüýþÿ]+\s{1})+
([a-zA-Zàáâãäåæçèéêëìíîïðñòóôõöøùúûüýþÿ]+)   # Last word.
\.{1}
$
""", re.VERBOSE + re.UNICODE)

line = "A sentence."

match = re.match(pattern, line)

>>> print("'%s'" % match.group(0))
'A sentence.'
>>> print("'%s'" % match.group(1))
'A '
>>> print("'%s'" % match.group(2))
'sentence'

To return the entire match (line in this case), use match.group(0).
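If you specifically want the sentence as a numbered group rather than via group(0), one option is to wrap everything in an outer capturing group and make the repeated part non-capturing. A rough sketch, again with an ASCII-only class for brevity (an assumption; substitute your full letter class), and with `whole` as a made-up name:

```python
import re

# Outer (...) captures the whole sentence; (?:...) groups without capturing.
whole = re.compile(r"""
^
(                       # Group 1: the entire sentence, period included.
    (?:[a-zA-Z]+\s)+    # One or more words, each followed by whitespace.
    [a-zA-Z]+           # Last word.
    \.
)
$
""", re.VERBOSE)

m = whole.match("A sentence.")
sentence = m.group(1)
print(sentence)  # A sentence.
```

Because the pattern is anchored with ^ and $, group(1) and group(0) are the same string here.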

Because the first capturing group is repeated (once for each word except the last one), it only retains its final iteration, so match.group(1) gives you just the next-to-last word (with its trailing space).
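A longer sentence shows the effect more clearly. This is only a sketch with an ASCII-only letter class (my simplification); splitting the matched text is one simple way to recover every word, not the only one.

```python
import re

# ASCII-only letter class for brevity (an assumption, not the original class).
pattern = re.compile(r"^([a-zA-Z]+\s)+([a-zA-Z]+)\.$")
match = pattern.match("This is a sentence.")

last_repeat = match.group(1)   # only the final repetition survives
last_word = match.group(2)

# To get every word, split the matched text instead of relying on group(1).
all_words = match.group(0).rstrip(".").split()

print(repr(last_repeat))  # 'a '
print(last_word)          # sentence
print(all_words)          # ['This', 'is', 'a', 'sentence']
```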

Btw, the {1} notation is not necessary here: matching exactly once is the default behavior, so it can be removed.
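A quick way to convince yourself is to run both spellings over a few inputs and compare. The sample strings and variable names below are made up for illustration, and the letter class is shortened to ASCII.

```python
import re

with_quantifier = re.compile(r"^([a-zA-Z]+\s{1})+([a-zA-Z]+)\.{1}$")
without_quantifier = re.compile(r"^([a-zA-Z]+\s)+([a-zA-Z]+)\.$")

samples = ["A sentence.", "Two  spaces.", "No period", "Three short words."]

# Pair up the verdicts of both patterns for each sample.
results = [
    (bool(with_quantifier.match(s)), bool(without_quantifier.match(s)))
    for s in samples
]
print(results)  # [(True, True), (False, False), (False, False), (True, True)]
```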

The extra set of square brackets definitely wasn't helping you :)

It turns out the following actually works and includes all the extended ASCII characters I wanted:

^
([\w+\s{1}]+\w{1}\.{1})
$
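
One caveat worth checking before relying on this: `[\w+\s{1}]` is a single character class, so the `+`, `{`, `1` and `}` inside it are literal characters rather than quantifiers, and `\w` also admits digits and underscores. The sketch below assumes Python 3, where `\w` is Unicode-aware for `str` patterns by default. The names `loose` and `strict` are made up, and `[^\W\d_]` is one common idiom for "any Unicode letter", offered as a possible tightening rather than a definitive fix.

```python
import re

loose = re.compile(r"^([\w+\s{1}]+\w{1}\.{1})$")
# Letters only: a word character that is neither a digit nor an underscore.
strict = re.compile(r"^((?:[^\W\d_]+\s)+[^\W\d_]+\.)$")

samples = ["A sentence.", "Caf\u00e9 ol\u00e9.", "a+{1}b.", "abc_123."]
loose_hits = [s for s in samples if loose.match(s)]
strict_hits = [s for s in samples if strict.match(s)]

print(len(loose_hits))   # 4: even "a+{1}b." and "abc_123." are accepted
print(len(strict_hits))  # 2: only the two real sentences
```

Note that the stricter version also requires at least two words, like the pattern in the question.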
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow