Regex to change to sentence case

https://stackoverflow.com/questions/1039226

22-07-2019

Question

I'm using Notepad++ to do some text replacement in a 5453-row language file. The format of the file's rows is:

variable.name = Variable Value Over Here, that''s for sure, Really

The double apostrophe is intentional.

I need to convert the value to sentence case, except for the words "Here" and "Really" which are proper and should remain capitalized. As you can see, the case within the value is typically mixed to begin with.

I've worked on this for a little while. All I've got so far is:

 (. )([A-Z])(.+)

which seems to at least select the proper strings. The replacement piece is where I'm struggling.

Solution

A plain regex replacement cannot run a function (such as a case conversion) on its matches. You'd have to script that, e.g. in PHP or JavaScript.

Update: See Jonas' answer.

I built myself a Web page called Text Utilities to do that sort of thing:

paste your text
go to "Find, regexp & replace" (or press Ctrl+Shift+F)
enter your regex (mine would be ^(.*?\=\s*\w)(.*)$)
check the "^$ match line limits" option
choose "Apply JS function to matches"
add arguments (first is the match, then sub patterns), here s, start, rest
change the return statement to return start + rest.toLowerCase();

The final function in the text area looks like this:

return function (s, start, rest) {
     return start + rest.toLowerCase();
};

Maybe add some code to capitalize some words like "Really" and "Here".
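
For anyone scripting this outside the browser, here is a minimal Python sketch of the same idea, including the extra step for words that must stay capitalized. The names sentence_case_line and KEEP_CAPITALIZED are made up for this example, and the word list is just the two words from the question. Like the JavaScript function above, it leaves the first character of the value as it is and lowercases only what follows.

```python
import re

# Hypothetical list of words that must keep their capital letter.
KEEP_CAPITALIZED = ("Here", "Really")

# Same idea as the regex above: group 1 is everything up to and including
# the first character of the value, group 2 is the rest of the line.
LINE_RE = re.compile(r"^(.*?=\s*\w)(.*)$")


def sentence_case_line(line, keep=KEEP_CAPITALIZED):
    """Lowercase the value after its first character, then restore `keep` words."""
    match = LINE_RE.match(line)
    if match is None:
        return line  # not a "name = value" row; leave it alone
    start, rest = match.group(1), match.group(2).lower()
    for word in keep:
        # \b stops the "here" inside "there" from being capitalized.
        rest = re.sub(r"\b%s\b" % re.escape(word.lower()), word, rest)
    return start + rest


print(sentence_case_line(
    "variable.name = Variable Value Over Here, that''s for sure, Really"))
# prints: variable.name = Variable value over Here, that''s for sure, Really
```

The function expects one line without its trailing newline, so split the file into lines first.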

OTHER TIPS

Find:    (. )([A-Z])(.+)
Replace: \1\U\2\L\3

This works in Notepad++ 6.0 or later, which comes with built-in PCRE support.
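
To see what that replacement does to a row, here is a rough Python emulation. Python's re module has no \U or \L in replacement strings, so a callback does the case changes instead. The function names are invented for the example. Note that "Here" and "Really" end up in lower case too, so the proper words from the question would still need a second pass.

```python
import re

# The pattern from the Find field above.
PATTERN = re.compile(r"(. )([A-Z])(.+)")


def upper_then_lower(match):
    # Emulates the replacement \1\U\2\L\3: group 1 unchanged,
    # group 2 uppercased, group 3 lowercased.
    return match.group(1) + match.group(2).upper() + match.group(3).lower()


def apply_replacement(text):
    # "." does not match a newline, so each line is handled separately.
    return PATTERN.sub(upper_then_lower, text)


line = "variable.name = Variable Value Over Here, that''s for sure, Really"
print(apply_replacement(line))
# prints: variable.name = Variable value over here, that''s for sure, really
```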

In Notepad++ you can use a plugin called PythonScript to do the job. Install the plugin and create a new script.

Then you can use the following script, replacing the regex and function variables as you see fit:

import re

# change these
regex = r"[a-z]+sym"
function = str.upper

def perLine(line, num, total):
    # Apply `function` to every match of `regex` on this line.
    new_line = re.sub(regex, lambda m: function(m.group(0)), line)
    # Write the line back only if something actually changed.
    if new_line != line:
        editor.replaceWholeLine(num, new_line)

editor.forEachLine(perLine)

This particular example works by finding all the matches on each line and applying the function to each match. If you need multiline support, the Python Script "Context-Help" explains all the functions offered, including the pymlsearch/pymlreplace functions defined under the 'editor' object.

When you're ready to run your script, go to the file you want it to run on first, then go to "Scripts >" in the Python Script menu and run yours.

Note: while you will probably be able to use Notepad++'s undo functionality if you mess up, it might be a good idea to copy the text into another file first and verify that the script works there.

P.S. You can 'find' and 'mark' every occurrence of a regular expression using Notepad++'s built-in find dialog, and if you could select them all you could use TextFX's "Characters->UPPER CASE" functionality for this particular problem. I'm not sure how to go from marked or found text to selected text, but I thought I would post this in case anyone else does.

Edit: In Notepad++ 6.0 or higher, you can use "PCRE (Perl Compatible Regular Expression) Search/Replace" (source: http://sourceforge.net/apps/mediawiki/notepad-plus/?title=Regular_Expressions). So this could have been solved using a regex like (. )([A-Za-z])(.+) with a replacement argument like \1\U\2\L\3.

The questioner had a very specific case in mind. As a general "change to sentence case" in Notepad++, the first regexp suggestion did not work properly for me. While not perfect, here is a tweaked version which was a big improvement on the original for my purposes:

find:    ([\.\r\n][ ]*)([A-Za-z\r])([^\.^\r^\n]+) 
replace: \1\U\2\L\3

You still have a problem with proper nouns (names, dates, countries, etc.) ending up in lower case, but a good spellchecker can help with that.
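
As a rough illustration of the same general approach outside Notepad++, here is a small Python sketch. It is a simplification, not a port of the pattern above. It assumes a sentence starts at the beginning of the text, after a period, or after a line break, and the name to_sentence_case is made up for the example. It shares the limitation just mentioned: proper nouns are lowercased, and abbreviations containing periods will be treated as sentence ends.

```python
import re

# Assumed simplification: a sentence starts at the beginning of the text,
# after a period, or after a line break.
SENTENCE_RE = re.compile(r"(^|[.\r\n][ ]*)([A-Za-z])([^.\r\n]*)")


def to_sentence_case(text):
    def fix(match):
        lead, first, rest = match.groups()
        # Keep the separator, capitalize the first letter, lowercase the rest.
        return lead + first.upper() + rest.lower()
    return SENTENCE_RE.sub(fix, text)


print(to_sentence_case("THIS IS ONE. and THIS is another.\nA THIRD line"))
# prints:
# This is one. And this is another.
# A third line
```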

Licensed under: CC-BY-SA with attribution
