Vim: regular expression to delete all lines except those starting with a given list of numbers

StackOverflow https://stackoverflow.com/questions/21920624

  •  14-10-2022

Question

I have a csv file where every line but the first starts with a number and looks like this:

subject,parameter1,parameter2,parameter3
1,blah,blah,blah
3,blah,blah,blah
2,blah,blah,blah
44,blah,blah,blah
12,blah,blah,blah
14,blah,blah,blah
11,blah,blah,blah
10,blah,blah,blah
11,blah,blah,blah
13,blah,blah,blah
3,blah,blah,blah
...

I would like to delete every line except the first one and those that start with, say, the numbers 1, 6 or 12. I was trying something like this:

:g!/^[1 6 12]\|^subject/d

But the 12 is interpreted as "1 or 2", so this also keeps the lines that start with 2.

What am I missing, and what would be the most efficient way to do this? Btw, instead of 1, 6, 12, my real list contains many single- and two-digit numbers.


Solution 2

Use a global match:

:v/^\(subject\|1\|6\|12\),/ delete

For every line that does not match that regular expression, delete it.

It yields:

subject,parameter1,parameter2,parameter3
1,blah,blah,blah
12,blah,blah,blah
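Since the real list is long, typing every branch by hand gets tedious. The following is only a sketch of one way to assemble the same :v command from a list of numbers in the shell; the variable names and the sample list are made up for illustration.

```shell
# Hypothetical list of subject numbers to keep, separated by spaces.
numbers='1 6 12'

# Join the numbers with Vim's alternation operator "\|".
# $numbers is left unquoted on purpose so it splits into one word per number.
alternation=$(printf '%s\n' $numbers | paste -sd '|' - | sed 's/|/\\|/g')

# Assemble the Ex command; the printed line can be pasted into Vim's command line.
vim_command=$(printf ':v/^\\(subject\\|%s\\),/ delete' "$alternation")
printf '%s\n' "$vim_command"
```

For the list above this prints the same command as the one shown in this answer.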

EDIT: I just realised that you were already using the global command. Your error was in the character class: it matches any single character listed inside it, and repeated characters are ignored, so in your case it matches a one, a two, a six or a space. You must put each number in its own branch, as I did above.

OTHER TIPS

The character class [1 6 12] means "any single character that is in this class",
i.e. any one of ' ', 1, 2, 6 (the repeated 1 is ignored).
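The difference is easy to check outside Vim by running both kinds of pattern through grep. This is a sketch only: the sample file and its contents are made up, and it uses grep -E syntax rather than Vim's, so the alternation is written with unescaped parentheses and bars.

```shell
# Hypothetical sample data, written to a temporary directory.
workdir=$(mktemp -d)
printf '%s\n' 'subject,p1' '1,a' '2,b' '6,c' '12,d' '44,e' > "$workdir/sample.csv"

# Character class: matches any line whose FIRST character is ' ', 1, 2 or 6.
class_hits=$(grep -E '^[1 6 12]' "$workdir/sample.csv")

# Alternation: matches only the whole numbers 1, 6 or 12 followed by a comma.
branch_hits=$(grep -E '^(1|6|12),' "$workdir/sample.csv")

printf 'class:\n%s\n\nbranches:\n%s\n' "$class_hits" "$branch_hits"
```

The character class keeps the line starting with 2, while the alternation does not.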

You could use

:g!/^1,\|^6,\|^12,\|^subject/d

which is close to your original syntax - but it works (tested with vim on Mac OS X).

Note - it is important to include the comma, so that the line starting with 1 doesn't "protect" 11, 12345, etc.
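To illustrate the point about the comma, here is a small sketch that counts matching lines with and without it. The sample file is invented for the demonstration.

```shell
# Hypothetical sample data, written to a temporary directory.
workdir=$(mktemp -d)
printf '%s\n' '1,a' '11,b' '12345,c' '6,d' > "$workdir/sample.csv"

# Without the comma, "^1" also matches the lines for 11 and 12345.
without_comma=$(grep -c '^1' "$workdir/sample.csv")

# With the comma, only the line whose first field is exactly 1 matches.
with_comma=$(grep -c '^1,' "$workdir/sample.csv")

echo "without comma: $without_comma, with comma: $with_comma"
```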

You might want to do this differently though - using grep.

Put all the "white listed" numbers in a file, one per line, like so:

^subject
^1,
^2,
^6,
^12,

then do

grep -f whitelist csvFile

and the output will be your "edited" file (which you can pipe to a new file).
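A complete run might look like the sketch below. The file contents are made up, and filtered.csv is just an example name for the new file; the output should not be redirected onto csvFile itself, because the shell would truncate it before grep reads it.

```shell
# Hypothetical whitelist and data files, written to a temporary directory.
workdir=$(mktemp -d)
printf '%s\n' '^subject' '^1,' '^2,' '^6,' '^12,' > "$workdir/whitelist"
printf '%s\n' 'subject,parameter1' '1,blah' '3,blah' '2,blah' \
    '44,blah' '12,blah' '11,blah' > "$workdir/csvFile"

# Keep every line that matches at least one pattern from the whitelist.
grep -f "$workdir/whitelist" "$workdir/csvFile" > "$workdir/filtered.csv"

cat "$workdir/filtered.csv"
```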

If you are even more interested in "efficiency", you could make your text file (let's continue to call it whitelist) just

subject
1
2
6
12

and use the following command:

cat whitelist | xargs -I {} grep "^"{}"," csvFile

This needs a bit of explaining.

xargs            - take the input one line at a time
-I {}            - and insert that line in the command that follows, at the {}

This means that the grep command will be run n times (once per line in the whitelist file), and each time the regular expression that is fed into grep will be the concatenation of

"^"              - start of line
{}               - contents of one line of the input file (whitelist)
","              - comma that follows the number

So this is a compact way of writing

grep "^subject," csvFile; grep "^1," csvFile; grep "^2," csvFile; 

etc.
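One way to see the commands that xargs builds is to put echo in front of grep, so that each command is printed instead of being run. This is a sketch with a shortened, made-up whitelist.

```shell
# Hypothetical three-line whitelist, written to a temporary directory.
workdir=$(mktemp -d)
printf '%s\n' 'subject' '1' '2' > "$workdir/whitelist"

# echo prints each generated command line instead of executing it.
# The shell removes the quotes, so "^"{}"," reaches xargs as ^{},
expanded=$(xargs -I {} echo grep "^"{}"," csvFile < "$workdir/whitelist")

printf '%s\n' "$expanded"
```

Each printed line corresponds to one grep invocation, with {} replaced by one line of the whitelist.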

It has the advantage that you can now generate your whitelist any way you want - as long as it ends up in a file, one entry per line, you can use it. The disadvantage is that you are essentially running grep n times, so the output is grouped by whitelist entry rather than kept in the original line order. If your files get very large, and you have a large number of items in your whitelist, the repeated passes may start to be a problem; but since your OS is likely to put the file into cache after the first read-through, it is really quite fast. The use of the ^ anchor makes the regular expression very efficient - as soon as a line fails to match at its start, grep goes on to the next line.
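The two approaches can be compared side by side. The sketch below uses made-up file contents; it turns the plain whitelist into anchored patterns with sed (an extra step that is not part of the answer) so that a single grep -f pass can be run on the same input.

```shell
# Hypothetical whitelist and data files, written to a temporary directory.
workdir=$(mktemp -d)
printf '%s\n' 'subject' '1' '2' '6' '12' > "$workdir/whitelist"
printf '%s\n' 'subject,p1' '12,a' '3,b' '1,c' '2,d' > "$workdir/csvFile"

# One grep per whitelist entry. grep exits non-zero when an entry matches
# nothing (here: 6), which makes xargs report failure; "|| true" ignores that.
per_entry=$(xargs -I {} grep "^"{}"," "$workdir/csvFile" < "$workdir/whitelist" || true)

# Single pass: wrap every entry as ^entry, and hand all patterns to one grep.
sed 's/.*/^&,/' "$workdir/whitelist" > "$workdir/patterns"
single_pass=$(grep -f "$workdir/patterns" "$workdir/csvFile")

printf 'per entry:\n%s\n\nsingle pass:\n%s\n' "$per_entry" "$single_pass"
```

Both variants select the same lines here, but the per-entry loop lists them in whitelist order while the single pass keeps the order of the input file.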

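A "functional" alternative (the 2,$ range skips the header line, which would otherwise be deleted because str2nr("subject") is 0):

:2,$g/./if index([1,12,6],str2nr(split(getline("."),",")[0]))<0|exec 'normal! dd'|endif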
Licensed under: CC-BY-SA with attribution