Question

I am having difficulty using a regular expression (Grep) in TextWrangler to find occurrences of a lowercase letter immediately followed by an uppercase letter. For example:

This announcement meansStudents are welcome.

I want to split each occurrence by adding a colon and a space, so that it becomes means: Students

I have tried:

[a-z][A-Z]

But this expression does not work in TextWrangler.

*EDIT: here are the exact contexts in which the occurrences appear (I mean only with these font colors).*

<font color =#48B700>  - Stột jlăm wẻ baOne hundred and three<br></font>

<font color =#C0C0C0>     »» Qzống pguộc lyời ba yghìm fảy dyổiTo live a life full of vicissitudes, to live a life marked by ups and downs<br></font>

"baOne" and "dyổiTo" must be "ba: One" and "dyổi: To" 

Could anyone help? Many thanks.


Solution

I believe (though I don't have TextWrangler at hand) that you need to search for ([a-z])([A-Z]) and replace it with \1: \2
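For anyone who wants to check the pattern outside an editor, here is a minimal Python sketch of the same search and replace. It assumes Python's re module, which also uses \1 and \2 in the replacement string; the variable names are only illustrative, and TextWrangler's Grep engine may differ in details.

```python
import re

# Sample sentence from the question.
text = "This announcement meansStudents are welcome."

# Group 1 captures the lowercase letter, group 2 the uppercase letter.
# Python's re module matches case sensitively unless told otherwise.
fixed = re.sub(r"([a-z])([A-Z])", r"\1: \2", text)

print(fixed)  # This announcement means: Students are welcome.
```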

Hope this helps.

OTHER TIPS

Replace ([a-z])([A-Z]) with \1: \2 (I don't have TextWrangler, but it works in Notepad++).

The parentheses capture the matched text, which the replacement string then refers to with the \1 and \2 syntax.

This question is ages old, but I stumbled upon it, so someone else might as well. The OP's comment on Igor's response clarified how the task was meant to be described (and could have been added to the description).

To match only those font-specific lines of the HTML, replace

(?<=<font color =#(?:48B700|C0C0C0)>)(.*?[a-z])([A-Z])

with \1: \2

Explanation:

  • (?<=[fixed-length regex]) is a positive lookbehind and means "if my match has this just before it"
  • (?:48B700|C0C0C0) is a non-capturing group that matches only the two colours. Since both alternatives have the same length, they work inside a lookbehind (which needs to be of fixed length)
  • (.*?[a-z])([A-Z]) matches everything after the > of those opening font tags, up to and including the first capital letter that directly follows a lowercase letter.
  • The \1: \2 replacement is the same as in Igor's response, except that \1 now holds the entire text before the split point rather than a single letter.
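The behaviour described above can be tried in Python, whose re module also requires fixed-width lookbehinds. This is only a sketch under that assumption: the sample lines are shortened to ASCII, and the FF0000 line is a made-up example of a colour that should be left alone.

```python
import re

# Fixed-width lookbehind: both colour codes are six characters long.
pattern = re.compile(
    r"(?<=<font color =#(?:48B700|C0C0C0)>)(.*?[a-z])([A-Z])"
)

lines = [
    "<font color =#48B700>  - baOne hundred and three<br></font>",
    # Hypothetical line with a different colour; it must stay unchanged.
    "<font color =#FF0000>  - baOne hundred and three<br></font>",
]

# \1 is everything up to the lowercase letter, \2 the capital after it.
fixed = [pattern.sub(r"\1: \2", line) for line in lines]

for line in fixed:
    print(line)
```

Only the first line gets the colon, because the lookbehind fails for any other colour code.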

Addition:

Your input strings contain accented (non-ASCII) letters, and the part you want to split may very well end in one. In that case [a-z] alone won't catch it. You will need to add a character range that covers all the letters you care about, something like

(?<=<font color =#(?:48B700|C0C0C0)>)(.*?[a-zḁ-ῼ])([A-Z])
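If the cleanup can be done outside the editor, a different technique avoids hand-written ranges altogether: Python's str.islower() and str.isupper() follow Unicode case properties. The sketch below is not a regex and not something TextWrangler offers; the function name is made up for illustration, and it assumes the accented letters are stored in precomposed form (one code point per letter).

```python
def split_lower_upper(text: str) -> str:
    """Insert ': ' between a lowercase letter and a directly following uppercase letter."""
    pieces = []
    # Pair every character with its successor; the last one is paired with a space.
    for current, following in zip(text, text[1:] + " "):
        pieces.append(current)
        if current.islower() and following.isupper():
            pieces.append(": ")
    return "".join(pieces)


# "\u1ed5" is the precomposed lowercase letter from "dyổi", which [a-z] does not cover.
sample = "dy\u1ed5To live"
result = split_lower_upper(sample)
print(result)
```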

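Your pattern [a-z][A-Z] is correct for a lowercase letter followed by an uppercase letter. However, you need to tick the Case Sensitive option in the Find/Replace dialogue; with case-insensitive matching, [a-z] and [A-Z] each match letters of either case, so the pattern matches any two adjacent letters.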
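The effect of the case setting can be reproduced in Python, used here only as a stand-in for the editor's option. The re.IGNORECASE flag plays the role of an unticked Case Sensitive box.

```python
import re

word = "means"  # all lowercase, so nothing should be split

# Case sensitive (Python's default): no lowercase-then-uppercase pair, no change.
case_sensitive = re.sub(r"([a-z])([A-Z])", r"\1: \2", word)

# Case insensitive: [A-Z] now also matches lowercase letters,
# so pairs of ordinary letters get split.
case_insensitive = re.sub(r"([a-z])([A-Z])", r"\1: \2", word, flags=re.IGNORECASE)

print(case_sensitive)
print(case_insensitive)
```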

Licensed under: CC-BY-SA with attribution