sed with more than 100000 characters

https://stackoverflow.com/questions/19511004

large-files
sed
largenumber

01-07-2022

Question

I need to identify and remove several occurrences of 100000 consecutive N (as in the character N) from an 18 GB file. They occur in long strings. The command I want to use is:

  sed -r '/N{100000}/d' bigFile > newBigFile

The error I get is that the { is an illegal character. Decreasing the number to 10000 yields no errors, and the process runs just fine.

Help is appreciated.

Solution

I've checked sed on my Fedora Linux system and found that it limits the repeat count inside {} to 2^15 - 1. So the largest count you can write in a regex is N = 32767:

sed -r 's/N{32767}//g' bigFile > newBigFile

You can also multiply this value by repeating a group instead of a single character. For example, to multiply it by 3 (this matches 3 * 32767 = 98301 N characters):

sed -r 's/(NNN){32767}//g' bigFile > newBigFile
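
To hit exactly 100000 characters with this trick, the target has to be factored into a group width times a count that stays under the limit. The sketch below first checks the idea at a small scale, then does the arithmetic for 100000. The group width of 4 is an assumption chosen for illustration (any divisor that keeps the count at or below 32767 would do), and `-E` is used as the equivalent of `-r` in GNU sed. The full-size command is only printed, not run.

```shell
# Small-scale check: (NNN){2} matches exactly 3 * 2 = 6 consecutive N's.
small=$(printf 'ANNNNNNB\n' | sed -E 's/(NNN){2}//g')
echo "$small"    # prints: AB

# Factor the target run length so the repeat count stays within 32767.
target=100000
group=4                        # assumed group width, a divisor of target
count=$(( target / group ))    # 25000 repetitions of the group
covered=$(( group * count ))   # 100000 characters in total

# Build the pattern and show the command that would be run on the real file.
pattern="(NNNN){$count}"
echo "sed -E 's/$pattern//g' bigFile > newBigFile"
```

Whether a pattern of this size compiles quickly on a given system is something to try on a small sample before running it against the 18 GB file.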

You can also use an interval with no upper bound, if that is acceptable in your case. This matches any run of 32767 or more N characters:

sed -r 's/N{32767,}//g' bigFile > newBigFile
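
The behaviour of an open-ended interval is easier to see at a small scale. In this sketch the threshold is 3 instead of 32767, and the sample input is made up for illustration: runs shorter than the threshold are kept, while every run at or above it is removed in full, however long it is.

```shell
# N{3,} matches runs of 3 or more N's; the 2-character run is left alone.
kept=$(printf 'NN-NNN-NNNNN\n' | sed -E 's/N{3,}//g')
echo "$kept"    # prints: NN--
```

With `N{32767,}` this means runs between 32767 and 99999 characters are removed as well, so it only fits if such shorter runs should also go.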

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow