Python regular expression for a sentence does not want to match

Question 1

I think you meant to write this:

(([a-zA-Zàáâãäåæçèéêëìíîïðñòóôõöøùúûüýþÿ]+\s{1})+)
 ^                                             ^

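I don't think the nested square brackets you had do what you think they do: character classes don't nest, so an inner [ is treated as a literal character and the first unescaped ] closes the class.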

Question 2

This regex works:

import re

pattern = re.compile(r"""
^
([a-zA-Zàáâãäåæçèéêëìíîïðñòóôõöøùúûüýþÿ]+\s{1})+   # Each word plus one whitespace character.
([a-zA-Zàáâãäåæçèéêëìíîïðñòóôõöøùúûüýþÿ]+)   # Last word.
\.{1}
$
""", re.VERBOSE | re.UNICODE)  # Combine flags with | rather than +.

line = "A sentence."

match = pattern.match(line)

>>> print("'%s'" % match.group(0))
'A sentence.'
>>> print("'%s'" % match.group(1))
'A '
>>> print("'%s'" % match.group(2))
'sentence'

To return the entire match (line in this case), use match.group(0).

Because the first group is repeated (it matches once for each word except the last one), it keeps only its final repetition, so match.group(1) gives you just the next-to-last word and the earlier words are not accessible through it.
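Here is a minimal sketch of that behaviour. It assumes Python 3 and, to keep it short, uses a plain [a-zA-Z] word class instead of the full accented-letter class. The findall call is just one possible way to collect every word, not something from the original question.

```python
import re

# Simplified word class for the example (an assumption; the pattern
# above also lists accented letters).
pattern = re.compile(r"^([a-zA-Z]+\s)+([a-zA-Z]+)\.$")

line = "This is a sentence."
match = pattern.match(line)

# A repeated group keeps only its final repetition.
print(repr(match.group(1)))  # 'a '
print(repr(match.group(2)))  # 'sentence'

# To collect every word, scan the text with findall instead.
words = re.findall(r"[a-zA-Z]+", line)
print(words)  # ['This', 'is', 'a', 'sentence']
```

If you need both validation and the word list, one option is to validate with the anchored pattern first and then run findall on the same line.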

Btw, the {1} quantifier is not necessary here: matching exactly once is the default behavior, so it can be removed.
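As a quick check of that claim, the sketch below compares a pattern with {1} against the same pattern without it. It assumes Python 3 and a simplified [a-zA-Z] word class; the sample strings are made up for illustration.

```python
import re

with_quantifier = re.compile(r"^([a-zA-Z]+\s{1})+([a-zA-Z]+)\.{1}$")
without_quantifier = re.compile(r"^([a-zA-Z]+\s)+([a-zA-Z]+)\.$")

samples = ["A sentence.", "Two  spaces here.", "No period", "One."]

for text in samples:
    a = with_quantifier.match(text)
    b = without_quantifier.match(text)
    # Both patterns agree on whether the text matches and on what they capture.
    assert (a is None) == (b is None)
    if a:
        assert a.groups() == b.groups()

results = [bool(without_quantifier.match(text)) for text in samples]
print(results)  # [True, False, False, False]
```

Note that "One." is rejected by both versions, because the first group requires at least one word followed by whitespace.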

The extra set of square brackets definitely wasn't helping you :)

Question 3

It turns out the following actually works and includes all the extended ASCII characters I wanted:

^
([\w+\s{1}]+\w{1}\.{1})
$
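One caveat worth checking: inside square brackets, + and {1} are not quantifiers but literal characters, and \w also covers digits and the underscore, so this pattern accepts more than sentences. The sketch below assumes Python 3 (where \w is Unicode-aware for str patterns) and writes the pattern on one line. The stricter pattern is only a suggestion, using [^\W\d_] as a common idiom for "letters only".

```python
import re

# The pattern from above, written on a single line.
loose = re.compile(r"^([\w+\s{1}]+\w{1}\.{1})$")

# Sketch of a stricter alternative (an assumption, not the original
# pattern): [^\W\d_] matches word characters minus digits and "_".
strict = re.compile(r"^(?:[^\W\d_]+\s)+[^\W\d_]+\.$")

samples = ["A sentence.", "Une phrase fran\u00e7aise.", "a+{1}b.", "x_1 2."]

for text in samples:
    # Columns: text, accepted by loose, accepted by strict.
    print(text, bool(loose.match(text)), bool(strict.match(text)))
```

Both patterns accept the two real sentences, but only the loose one also accepts "a+{1}b." and "x_1 2.".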