Question

I need to identify and remove several runs of 100000 consecutive N characters (the letter N) from an 18 GB file. They occur inside long strings. The command I want to use is:

  sed -r '/N{100000}/d' bigFile > newBigFile

The error I get is that the { is an illegal character. Decreasing the number to 10000 yields no errors, and the process runs just fine.

Help is appreciated.

Solution

I checked sed on my Fedora Linux system and found that it limits the repetition count inside {} to 2^15 - 1 = 32767. So the largest fixed interval you can write for N is N{32767}:

  sed -r 's/N{32767}//g' bigFile > newBigFile
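
The limit may differ between sed builds, so it can be worth probing the local one before relying on 32767. The following is only a sketch under assumptions: accepts_count is a hypothetical helper name, -E is used as the GNU sed synonym for -r, and the check relies on sed compiling the expression before it reads any input, so an empty input is enough.

```shell
#!/bin/sh
# Report whether the local sed accepts a given repetition count in N{count}.
# accepts_count is a hypothetical helper, not part of sed.
accepts_count() {
    # sed compiles the regex before reading data, so /dev/null suffices.
    if sed -E "s/N{$1}//g" </dev/null >/dev/null 2>&1; then
        echo "accepted"
    else
        echo "rejected"
    fi
}

# Counts around the reported limit, plus the one from the question.
for count in 10000 32767 32768 100000; do
    printf '%s: %s\n' "$count" "$(accepts_count "$count")"
done
```

The output depends on the sed implementation and the regex library it was built against, so no expected result is shown here.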

You can also multiply this value by repeating a group. For example, multiplying by 3 matches 3 × 32767 = 98301 characters:

  sed -r 's/(NNN){32767}//g' bigFile > newBigFile
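
The same grouping trick can reach the exact length from the question, since 100000 = 4 × 25000 and 25000 is below the 32767 limit reported above. The sketch below checks the arithmetic and then shows the idea at toy scale with /d, as in the question. The variable names group and repeats are illustrative only.

```shell
#!/bin/sh
# 4 * 25000 = 100000, with the repeat count under the reported 32767 limit.
group='NNNN'
repeats=25000
echo "run length matched: $(( ${#group} * repeats ))"
# prints: run length matched: 100000

# Toy scale: (NNNN){2} matches a run of 8 N, so the second line is deleted.
printf 'keep\nxNNNNNNNNy\n' | sed -E '/(NNNN){2}/d'
# prints: keep
```

At full scale this would correspond to sed -r '/(NNNN){25000}/d' bigFile > newBigFile. That command is untested here on a file of this size, and memory use for such a large expression may vary between sed builds.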

You can also leave the interval without an upper bound, if that is acceptable in your case:

  sed -r 's/N{32767,}//g' bigFile > newBigFile
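
The difference between a fixed and an open-ended interval matters for substitution, and a toy-scale sketch makes it visible. Here N{3} and N{3,} stand in for N{32767} and N{32767,}, and -E is the GNU sed synonym for -r. The sample strings are made up for illustration.

```shell
#!/bin/sh
# Fixed interval: only whole chunks of 3 are removed, the remainder stays.
printf 'cNNNNNd\n' | sed -E 's/N{3}//g'
# prints: cNNd

# Open-ended interval: the whole run of 3 or more is removed at once,
# while the shorter run of 2 is left alone.
printf 'aNNb cNNNNNd\n' | sed -E 's/N{3,}//g'
# prints: aNNb cd

# Line deletion: an unanchored /N{3}/ already matches any line that
# contains at least 3 consecutive N, so both forms delete the same lines.
printf 'aNNb\ncNNNNNd\n' | sed -E '/N{3}/d'
# prints: aNNb
printf 'aNNb\ncNNNNNd\n' | sed -E '/N{3,}/d'
# prints: aNNb
```

By the same arithmetic, s/N{32767}//g applied to a run of 100000 N would remove three chunks (98301 characters) and leave 1699 N behind, whereas the open-ended form removes the whole run.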

Licensed under: CC-BY-SA with attribution