Regular expression to find a lowercase letter followed by an uppercase
14-04-2021
Question
I am having difficulty using a regular expression (Grep) in TextWrangler to find occurrences of a lowercase letter immediately followed by an uppercase letter. For example:
This announcement meansStudents are welcome.
I want to split each occurrence by adding a colon and a space, so that it becomes "means: Students".
I have tried:
[a-z][A-Z]
But this expression does not work in TextWrangler.
*EDIT: here are the exact contexts in which the occurrences appear (I mean only with these font colors).*
<font color =#48B700> - Stột jlăm wẻ baOne hundred and three<br></font>
<font color =#C0C0C0> »» Qzống pguộc lyời ba yghìm fảy dyổiTo live a life full of vicissitudes, to live a life marked by ups and downs<br></font>
"baOne" and "dyổiTo" must be "ba: One" and "dyổi: To"
Could anyone help? Many thanks.
Solution
I do believe (don't have TextWrangler at hand though) that you need to search for ([a-z])([A-Z])
and replace it with: \1: \2
Hope this helps.
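For anyone who wants to try the idea outside the editor, here is a minimal sketch in Python. It assumes Python's re module as a stand-in for TextWrangler's Grep engine, which may differ in details; the capture groups and \1, \2 backreferences work the same way here.

```python
import re

# Assumption: Python's re module stands in for TextWrangler's Grep engine.
text = "This announcement meansStudents are welcome."

# ([a-z]) captures the lowercase letter, ([A-Z]) the uppercase letter after it.
# The replacement puts ": " between the two captured letters.
result = re.sub(r"([a-z])([A-Z])", r"\1: \2", text)

print(result)
```

Python matches case-sensitively by default, so no extra option is needed in this sketch.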
Other tips
Replace ([a-z])([A-Z])
with \1:\2
(use \1: \2 if you want a space after the colon, as in "means: Students")
- I don't have TextWrangler, but it works in Notepad++
The parentheses are for capturing the data, which is referred to using the \1 syntax in the replacement string.
This question is ages old, but I stumbled upon it, so someone else might as well. The OP's comment on Igor's response clarified how the task was meant to be described (and could have been added to the description).
To match only those font-specific lines of the HTML, replace
(?<=<font color =#(?:48B700|C0C0C0)>)(.*?[a-z])([A-Z])
with \1: \2
Explanation:
- (?<=[fixed-length regex]) is a positive lookbehind and means "if my match has this just before it".
- (?:48B700|C0C0C0) is a non-capturing (unnamed) group that matches only those 2 colours. Since they are of the same length, they work in a lookbehind (which needs to be of fixed length).
- (.*?[a-z])([A-Z]) will match everything after the > of those opening font tags up to your capital letter.
- The \1: \2 replacement is the same as in Igor's response, only that \1 now holds the entire first string that needs separating.
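The lookbehind version can be checked the same way. This is only a sketch, again assuming Python's re module in place of TextWrangler's Grep engine; the sample lines are shortened ASCII variants of the ones in the question, and the #FF0000 line is a hypothetical one added to show that other colours are left alone.

```python
import re

# Assumption: Python's re module stands in for TextWrangler's Grep engine.
# Like the editor, it needs a fixed-width lookbehind; both colour codes
# are six characters long, so the alternation is accepted.
PATTERN = re.compile(r"(?<=<font color =#(?:48B700|C0C0C0)>)(.*?[a-z])([A-Z])")

lines = [
    "<font color =#48B700> - baOne hundred and three<br></font>",
    "<font color =#C0C0C0> - baTo live a life<br></font>",
    "<font color =#FF0000> - baOne hundred and three<br></font>",  # hypothetical other colour
]

# Group 1 is everything from the end of the font tag up to the lowercase
# letter; group 2 is the capital letter that follows it.
fixed = [PATTERN.sub(r"\1: \2", line) for line in lines]

for line in fixed:
    print(line)
```

Because the match has to start right after the font tag, at most one split is made per tag, and the third line comes back unchanged.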
Addition:
Your input strings contain special characters, and the part you want to split may very well end in one. In that case it won't be caught by [a-z] alone. You will need to add a character range that captures all the letters you care about, something like
(?<=<font color =#(?:48B700|C0C0C0)>)(.*?[a-zḁ-ῼ])([A-Z])
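A small sketch of the difference, assuming Python's re module rather than TextWrangler itself. The sample word is made up for illustration; the extra range is written with escapes (\u1e01 to \u1ffc, i.e. ḁ to ῼ) so the code does not depend on how the accented characters are encoded in the file.

```python
import re

# Assumption: Python's re module stands in for TextWrangler's Grep engine.
ASCII_ONLY = re.compile(r"([a-z])([A-Z])")
EXTENDED = re.compile(r"([a-z\u1e01-\u1ffc])([A-Z])")  # adds the range ḁ-ῼ

# Hypothetical sample: "w" + U+1EBB (e with hook above) + "One"
sample = "w\u1ebbOne"

ascii_result = ASCII_ONLY.sub(r"\1: \2", sample)      # no [a-z] directly before "O"
extended_result = EXTENDED.sub(r"\1: \2", sample)     # U+1EBB is inside the range

print(ascii_result)
print(extended_result)
```

Note that this range is only "something like" what you need: it also contains uppercase letters (upper and lower case alternate in that block), and letters such as ă (U+0103) lie outside it, so adjust the class to the letters that actually occur in your text.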
Your pattern, [a-z][A-Z], is correct for identifying a lowercase letter followed by an uppercase letter; however, you need to tick the Case Sensitive option in the Find/Replace dialog for it to work.
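To see why the option matters, here is a rough sketch in Python, where the re.IGNORECASE flag plays the role of a search with Case Sensitive unticked. This is an analogy only, not a description of how TextWrangler implements the option.

```python
import re

pattern = r"([a-z])([A-Z])"
word = "means"

# Case-sensitive (Python's default): no lowercase/uppercase pair, nothing changes.
case_sensitive = re.sub(pattern, r"\1: \2", word)

# Case-insensitive: [A-Z] now also matches lowercase letters,
# so every pair of adjacent letters gets split.
case_insensitive = re.sub(pattern, r"\1: \2", word, flags=re.IGNORECASE)

print(case_sensitive)
print(case_insensitive)
```

With case ignored, the pattern no longer distinguishes the two classes at all, which is why the search appears not to work.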