Question

I need to identify and remove several runs of 100000 consecutive N characters from an 18 GB file. They occur within long strings. The command I want to use is:

  sed -r '/N{100000}/d' bigFile > newBigFile

The error I get is that the { is an illegal character. Decreasing the number to 10000 yields no error, and the process runs just fine.

Help is appreciated.

Solution

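I checked sed on my Fedora Linux system and found that it caps the repetition count in an interval expression such as N{count} at 2^15 - 1 = 32767. That is why N{10000} works while N{100000} is rejected. So the largest count you can write directly is 32767: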

  sed -r 's/N{32767}//g' bigFile > newBigFile
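
As a quick check of the limit described above, the system's configured maximum repetition count can be queried with getconf. This is only a sketch: it assumes getconf is installed and that your sed build uses the C library's regex limit (RE_DUP_MAX), which may not hold everywhere. On glibc-based systems the value is typically 32767.

```shell
# Query the largest repetition count allowed in an interval expression
# such as N{count}. POSIX only guarantees a value of at least 255.
re_dup_max=$(getconf RE_DUP_MAX)
echo "RE_DUP_MAX = $re_dup_max"
```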

You can also get past this limit by repeating a group, which multiplies the count. For example, a group of 3 matches 3 * 32767 = 98301 N's:

  sed -r 's/(NNN){32767}//g' bigFile > newBigFile
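
The grouping trick is easier to see at a small scale. The sketch below uses a made-up input string and deliberately tiny numbers. By the same arithmetic, a pattern such as (NNNN){25000} would cover exactly 4 * 25000 = 100000 N's, though that has not been tested here on a file of the size in the question.

```shell
# Small-scale demo: a group of 3 N's repeated 2 times matches 3 * 2 = 6 N's.
# -E is the POSIX spelling of GNU sed's -r.
result=$(printf 'aNNNNNNb\n' | sed -E 's/(NNN){2}//g')
echo "$result"   # prints: ab
```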

You can also leave the interval open-ended (no upper bound), which matches any run of 32767 or more N's, if that is acceptable in your case:

  sed -r 's/N{32767,}//g' bigFile > newBigFile
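
Note that the s///g commands above remove the N run itself, whereas the command in the question deletes the whole line. If deleting lines is the goal, one way to sidestep the regex limit entirely is a fixed-string search with grep instead of sed. This is a different technique from the answer's, sketched here on a small stand-in file; pattern.txt, demoFile and newDemoFile are hypothetical names, and head -c is assumed to be available (it is in GNU and BSD userlands).

```shell
# Build a fixed string of exactly 100000 'N' characters (no regex involved).
head -c 100000 /dev/zero | tr '\0' 'N' > pattern.txt

# Tiny stand-in for bigFile: one line containing the long N run, one without.
{ printf 'ACGT'; cat pattern.txt; printf 'ACGT\n'; printf 'ACGTNNNNACGT\n'; } > demoFile

# -F: treat the pattern as a fixed string; -v: keep only lines that do NOT
# contain it; -f: read the pattern from a file to avoid argument-length limits.
grep -vF -f pattern.txt demoFile > newDemoFile
cat newDemoFile   # prints: ACGTNNNNACGT
```

For the real data, demoFile and newDemoFile would be replaced by bigFile and newBigFile.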
Licensed under: CC-BY-SA with attribution