Problem

I need to find and remove several runs of 100000 consecutive N characters (the letter N) from an 18 GB file. The runs occur inside long strings. The command I want to use is:

  sed -r '/N{100000}/d' bigFile > newBigFile

The error I get says that { is an illegal character. Reducing the count to 10000 gives no error, and the command runs fine.

Help is appreciated.


Solution

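I checked sed on my Fedora Linux machine and found that the limit is not on string length but on the repetition count in an interval expression {n}: the maximum is 2^15 - 1 = 32767 (RE_DUP_MAX in the GNU regex engine that sed uses). So the largest single interval you can write is N{32767}: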

sed -r 's/N{32767}//g' bigFile > newBigFile
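The 32767 figure can differ between sed builds; POSIX only guarantees that RE_DUP_MAX is at least 255. A minimal sketch for checking your own sed uses a binary search over the count. It assumes that sed exits non-zero when it rejects the interval and that every count up to the limit is accepted. The function names accepts and max_dup_count and the search ceiling of 65536 are hypothetical choices for illustration.

```shell
#!/bin/sh
# Hypothetical helper: report the largest n for which this sed accepts
# the extended regex N{n}. Assumes a rejected regex makes sed exit
# non-zero and that every n up to the limit is accepted.
accepts() {
  # Compile s/N{n}// and run it on one empty line; only the exit status matters.
  echo | sed -E "s/N{$1}//" >/dev/null 2>&1
}

max_dup_count() {
  lo=1      # N{1} is always valid, so lo starts as a known-good count
  hi=65536  # assumed ceiling; a result of 65535 means "at least that much"
  # Invariant: lo is accepted, hi is rejected (or is the untested ceiling).
  while [ $((hi - lo)) -gt 1 ]; do
    mid=$(( (lo + hi) / 2 ))
    if accepts "$mid"; then
      lo=$mid
    else
      hi=$mid
    fi
  done
  echo "$lo"
}

max_dup_count
```

On a glibc-based GNU sed like the Fedora one described here, it should print 32767. Builds that use a different regex library may report a smaller value.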

You can also go past the limit by repeating a group. For example, (NNN){32767} matches 3 × 32767 = 98301 consecutive N characters:

sed -r 's/(NNN){32767}//g' bigFile > newBigFile
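Another way past the limit, not part of the original answer, is to concatenate several bounded intervals. N{32767}N{32767}N{32767}N{1699} matches exactly 3 × 32767 + 1699 = 100000 consecutive N characters, so it can be used in the question's /.../d address to delete the offending lines. The minimal sketch below assumes the 32767 limit reported above. The name build_regex and its optional chunk-size argument are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print an ERE matching exactly COUNT consecutive N
# characters, split into intervals no larger than MAX (default 32767).
# Usage: build_regex COUNT [MAX]   (COUNT >= 1)
build_regex() {
  remaining=$1
  max=${2:-32767}
  re=
  # Emit full-size chunks while more than MAX characters remain.
  while [ "$remaining" -gt "$max" ]; do
    re="${re}N{$max}"
    remaining=$((remaining - max))
  done
  # The last chunk covers whatever is left (1..MAX characters).
  printf '%s\n' "${re}N{$remaining}"
}

# Delete every line containing a run of at least 100000 N characters,
# like the question's sed -r '/N{100000}/d' but within the interval limit:
# sed -E "/$(build_regex 100000)/d" bigFile > newBigFile
```

Unlike the grouping trick, concatenation does not require the count to factor nicely. 100000 happens to equal 4 × 25000, so (NNNN){25000} would also work, but a prime count above 32767 has no such split.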

If it is acceptable in your case, you can also leave the upper bound open, so that every run of 32767 or more N characters is removed in one piece:

sed -r 's/N{32767,}//g' bigFile > newBigFile
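The bounded and open-ended forms behave differently under s///g. The sketch below scales the limit down to 3, an illustrative value only, and applies both forms to a run of seven N characters. The bounded pattern removes fixed-size chunks and leaves the remainder behind. The open-ended pattern removes the whole run.

```shell
#!/bin/sh
# Scaled-down illustration: 3 stands in for 32767.
line='aNNNNNNNb'   # seven N characters between a and b

# Bounded: removes two chunks of three N and leaves one N behind.
printf '%s\n' "$line" | sed -E 's/N{3}//g'    # aNb

# Open-ended: removes the entire run in a single match.
printf '%s\n' "$line" | sed -E 's/N{3,}//g'   # ab
```

Neither s command deletes the line itself. The question's /.../d address is what drops whole lines.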
License: CC-BY-SA with attribution
Not affiliated with StackOverflow