Using regular expressions to do mass replace in Notepad++ and Vim

https://stackoverflow.com/questions/287404

08-07-2019

Question

So I've got a big text file which looks like the following:

<option value value='1' >A
<option value value='2' >B
<option value value='3' >C
<option value value='4' >D

It's several hundred lines long and I really don't want to do it manually. The expression that I'm trying to use is:

<option value='.{1,}' >

This works as intended when I run it through several online regular expression testers. I basically want to remove everything before A, B, C, etc. The issue is that when I try to use that expression in Vim and Notepad++, it can't seem to find anything.

Solution

Everything before the A, B, C, etc.

That seems so simple I must be misinterpreting you. It's just

:%s/<.*>//
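To check the greedy pattern outside an editor, here is a minimal sketch that uses Python's re module as a stand-in for Vim's :%s. The sample lines come from the question and the variable names are only illustrative. Python's regex dialect is not identical to Vim's, but this particular pattern is written the same way in both.

```python
import re

# The four sample lines from the question.
sample = "\n".join([
    "<option value value='1' >A",
    "<option value value='2' >B",
    "<option value value='3' >C",
    "<option value value='4' >D",
])

# count=1 mirrors Vim's :s without the g flag:
# only the first match on each line is replaced.
cleaned = "\n".join(
    re.sub(r"<.*>", "", line, count=1) for line in sample.splitlines()
)
print(cleaned)
```

Each line is processed separately, just as :%s applies the substitution to every line of the buffer.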

OTHER TIPS

In Notepad++ you don't need to use Regular Expressions for this.

Hold down alt to allow you to select a rectangle of text across multiple rows at once. Select the chunk you want to be rid of, and press delete.

In Notepad++ :

<option value value='1' >A
<option value value='2' >B
<option value value='3' >C
<option value value='4' >D


Find what: (.*)(>)(.)
Replace with: \3

Replace All


A
B
C
D

There is a very simple solution to this unless I have not understood the problem. The following regular expression:

(.*)(>)(.*)

will match the pattern specified in your post.

So, in Notepad++ you find (.*)(>)(.*) and replace it with \3.

Regular expressions are greedy by default: (.*) on its own will match the whole line, so you need to break the line into groups in order to extract the string you want to keep. Here the third group captures everything after the last >, and the replacement keeps only that group. It works fine in Notepad++ and Editplus3.
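The way the three groups split a line can be inspected directly. The following is a small sketch in Python (an assumption for illustration; the answer itself only concerns Notepad++ and Editplus3), using one sample line from the question.

```python
import re

line = "<option value value='1' >A"

# Group 1 is greedy and runs up to the last '>' on the line,
# group 2 is that '>' itself, group 3 is whatever follows it.
match = re.fullmatch(r"(.*)(>)(.*)", line)
groups = match.groups()

# Replacing the whole match with group 3 keeps only the trailing text.
kept = re.sub(r"(.*)(>)(.*)", r"\3", line)

print(groups)
print(kept)
```

Because the first group is greedy, a line with several > characters would be split at the last one, not the first.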

There are two problems with your original solution. Firstly, your example text:

<option value value='1' >A

has two occurrences of the word "value". Your regex has only one. Also, you need to escape the opening brace of the quantifier in your regex, or Vim will interpret it as a literal brace. This regex works:

:%s/<option value value='.\{1,}' >//g
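The first problem is independent of the editor and can be reproduced in any regex engine. Below is a minimal sketch in Python, chosen only for illustration. Note that Python, unlike Vim, does not need the brace escaped, so only the missing word differs between the two patterns.

```python
import re

line = "<option value value='1' >A"

# The pattern from the question: a single "value".
original = r"<option value='.{1,}' >"
# The same pattern with the second "value" added, as in the sample text.
corrected = r"<option value value='.{1,}' >"

original_hit = re.search(original, line)
corrected_hit = re.search(corrected, line)
result = re.sub(corrected, "", line)

print(original_hit)    # no match object, because the text has "value value"
print(corrected_hit.group(0))
print(result)
```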

This will remove the option tag and just leave the letters in vim:

:%s/<option.*>//g

It may help if you're less specific. Your expression there is "greedy", which may be interpreted differently by different programs. Try this in Vim (note that the + has to be escaped):

:%s/^<[^>]\+>//
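The difference between a greedy .* and a negated character class only shows up when a line contains more than one >. Here is a short sketch in Python, assuming a hypothetical line that is not in the question's data; in Python the + quantifier is written without a backslash.

```python
import re

# Hypothetical line with a second '>' in the visible text.
line = "<option value='1' >A > B"

# Greedy: .* runs to the LAST '>' on the line.
greedy = re.sub(r"^<.*>", "", line)

# Negated class: [^>]+ cannot cross a '>', so the match stops at the first one.
bounded = re.sub(r"^<[^>]+>", "", line)

print(repr(greedy))
print(repr(bounded))
```

On the question's own data both patterns give the same result, since each line has exactly one >.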

In vim

:%s/<option value='.\{1,}' >//

:%s/<option value='.\+' >//

In vim regular expressions you have to escape the one-or-more symbol, capturing parentheses, the bounded number curly braces and some others.

See :help /magic to see which special characters need to be escaped (and how to change that).

In notepad++

(<option value="\w\w">)\w+">(.+)

Replace with

\1\2

Having the same problem (with jQuery " done..." strings), but only in Notepad++, I asked, received good friendly replies (that made me understand what I had missed), then spent the time to build a detailed step-by-step explanation, see Finding Line Beginning using Regular expression in Notepad++

Versailles, Tue 27 Apr 2010 22:53:25 +0200

Notepad++: Search Mode = Regular expression

Find what: (.*>)(.)

Replace with: \2

This will work; I tested it in my Vim. The single quotes are the trouble.

:1,$s/^<option value value=['].['] >//

Vim:

:%s/.* >//

A little after the fact, but in case it's useful to anyone, I was able to follow one of the examples on here (by sdgfsdg) and quickly pick up Regular Expressions for Notepad++.

I had to similarly pull out some redundant data from a list of HTML select dropdown options, of the form:

<select>
  <option value="AC">saint_helena">Ascension Island</option>
  <option value="AD">andorra">Andorra</option>
  <option value="AE">united_arab_emirates">United Arab Emirates</option>
  <option value="AF">afghanistan">Afghanistan</option>
  ...
</select>

And what I really wanted was:

<select>
  <option value="AC">Ascension Island</option>
  <option value="AD">Andorra</option>
  <option value="AE">United Arab Emirates</option>
  <option value="AF">Afghanistan</option>
  ...
</select>

After some hair-pulling I realized that, as of version 5.8.5 (Sep. 2010), the Regular Expressions in Notepad++ still don't seem to allow certain loops in the expressions (unless there is another syntax). For example, the following should find even ">united_arab_emirates"> despite its additional separating underscores:

(">)([a-z]+([_]*[a-z]*)*)(">)

This query worked in most generic RegEx tools, but within Notepad++ I had to account for the maximum number of underscores (which unfortunately was 8) by hand, using the much uglier:

(">)([a-z]+[_]*[a-z]*[_]*[a-z]*[_]*[a-z]*[_]*[a-z]*)[_]*[a-z]*[_]*[a-z]*[_]*[a-z]*[_]*[a-z]*(">)

If someone knows a way to simulate a Regex loop in Notepad++'s replace feature, please let me know.

Find what: (">)([a-z]+[_]*[a-z]*[_]*[a-z]*[_]*[a-z]*[_]*[a-z]*)[_]*[a-z]*[_]*[a-z]*[_]*[a-z]*[_]*[a-z]*(">)

Replace with: ">

Result: 255 occurrences were replaced.
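For comparison, the "loop" described above can be written as a repeated group in engines that support it. The sketch below uses Python's re module, standing in for the generic regex tools the answer mentions. Whether a given Notepad++ version accepts the same syntax is not something this sketch confirms. The option lines are taken from the example above, and the non-capturing group (?:...) is a choice made for this illustration.

```python
import re

options = [
    '<option value="AC">saint_helena">Ascension Island</option>',
    '<option value="AE">united_arab_emirates">United Arab Emirates</option>',
    '<option value="AF">afghanistan">Afghanistan</option>',
]

# One lowercase word, then any number of "_word" repetitions,
# enclosed between the two "> markers.
pattern = re.compile(r'">[a-z]+(?:_[a-z]+)*">')

# Collapse the redundant segment down to a single ">.
fixed = [pattern.sub('">', option) for option in options]

for option in fixed:
    print(option)
```

The repeated group handles any number of underscores, so the maximum does not have to be counted by hand.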

Here's a nice article on Notepad++ Regular expressions
http://markantoniou.blogspot.com/2008/06/notepad-how-to-use-regular-expressions.html

Very simple, just find:

<option value value=.*?>

and click Replace

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow
