Question

What regex can I use to check whether there is an excessive number of capitals in a word? For example:

AAAApples

The program should match AAAApples as having too many capital letters at the start and, using re.sub, replace the extra capitals with an empty string to leave Apples.

Using the regex r'^[A-Z]*[a-z]', I match the leading capitals together with the lowercase letter that follows them, then replace the match with an empty string to remove the capitals. But of course the match also covers the 'Ap' of 'Apples', so that gets removed too, leaving 'ples'.
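A minimal reproduction of the problem described above, assuming a single word held in a variable named word (the name is only for illustration). The whole match "AAAAp" is replaced, which is why the capital that should stay and the first lowercase letter both disappear.

```python
import re

word = "AAAApples"

# The question's pattern: every leading capital plus the first lowercase letter.
# Replacing the entire match with '' also throws away the "A" and "p" we wanted to keep.
result = re.sub(r'^[A-Z]*[a-z]', '', word)
print(result)  # ples
```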

What do I need to do to my regex to fix this?


Solution

Use a capture group to get the letters after the extra capitals.

re.sub(r'^[A-Z]+([A-Z][a-z])', r'\1', string)

This matches a sequence of uppercase letters followed by an uppercase letter and then a lowercase letter. The parentheses put the match for the last two letters (the capital to keep and the first lowercase letter) in a capture group. In the replacement string, \1 stands for the contents of the first capture group, so those two letters are put back and only the extra capitals are removed.
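A small sketch of the capture-group approach wrapped in a helper; strip_extra_capitals is a hypothetical name, not part of the original answer. It keeps the last leading capital, and words that have no extra capitals or no lowercase letter at all are left unchanged because the pattern simply does not match them.

```python
import re

# Leading run of capitals, then capture the last capital plus the first lowercase letter.
_EXTRA_CAPS = re.compile(r'^[A-Z]+([A-Z][a-z])')


def strip_extra_capitals(word: str) -> str:
    """Hypothetical helper: drop all but the last of a word's leading capitals."""
    # \1 re-inserts the captured pair, so only the surplus capitals are removed.
    return _EXTRA_CAPS.sub(r'\1', word)


for w in ["AAAApples", "AApples", "Apples", "apples", "APPLES"]:
    print(w, "->", strip_extra_capitals(w))
```

Because of the ^ anchor, the pattern only looks at the start of the string it is given.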

Or you can use lookahead:

re.sub(r'^[A-Z]+(?=[A-Z][a-z])', '', string)

A lookahead, written (?=...), makes the pattern match only if it is followed by a match for the sub-pattern, but the text matched by that sub-pattern isn't included in the match. So this matches a sequence of uppercase letters that must be followed by an uppercase letter and then a lowercase letter, but only the initial sequence of uppercase letters is part of the match, and that is what gets replaced by the empty string.
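To check that the two answers behave the same, here is a sketch that runs the lookahead pattern and the capture-group pattern side by side on a few sample words (the sample list is made up for illustration). With the lookahead, nothing needs to be put back, so the replacement is just ''.

```python
import re

# Lookahead: match only the surplus capitals; (?=[A-Z][a-z]) requires a capital
# followed by a lowercase letter next, without consuming those two characters.
LOOKAHEAD = r'^[A-Z]+(?=[A-Z][a-z])'
# Capture group: consume the pair as well, then restore it with \1.
CAPTURE = r'^[A-Z]+([A-Z][a-z])'

words = ["AAAApples", "AApples", "Apples", "apples", "APPLES", "XYZebra"]
for w in words:
    via_lookahead = re.sub(LOOKAHEAD, '', w)
    via_group = re.sub(CAPTURE, r'\1', w)
    # Both approaches should give the same result on these inputs.
    assert via_lookahead == via_group
    print(w, "->", via_lookahead)
```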

Go to regular-expressions.info to learn all about regexp.

Licensed under: CC-BY-SA with attribution