Question

I need a text processing tool that can perform search-and-replace operations PER LINE on HUGE TEXT FILES (>0.5 GB). It can be either Windows or Linux based. (I don't know if there is anything like a streamreader/writer on Linux, but I have a feeling that would be the ideal solution. The editors I have tried so far load the whole file into memory.)

Bonus question: a tool that can MERGE two huge text files on a per-line basis, with the lines separated by e.g. tabs


Solution

Sounds like you want sed. For example,

sed 's/foo/bar/' < big-input-file > big-output-file

should replace the first occurrence of foo with bar on each line of big-input-file, writing the results to big-output-file.
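If you want every occurrence on each line replaced, not just the first, add the g flag. As a sketch (assuming GNU sed, which supports in-place editing with -i):

# replace ALL occurrences of foo on each line
sed 's/foo/bar/g' < big-input-file > big-output-file

# GNU sed only: edit the file in place instead of writing a copy
sed -i 's/foo/bar/g' big-input-file

Both variants stream the input one line at a time, so memory use stays constant regardless of file size.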

Bonus answer: I just learned about paste, which seems to be exactly what you want for your bonus question.
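As a minimal sketch of its use here (file names are placeholders): by default paste joins corresponding lines of its inputs with a tab, which matches the bonus question exactly.

# merge the two files line by line, tab-separated (the default)
paste big-file-a big-file-b > merged-file

# -d selects a different delimiter, e.g. a comma
paste -d ',' big-file-a big-file-b > merged-file

Like sed, paste reads its inputs as streams, so it handles huge files without loading them into memory.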

Other tips

'sed' is built into Linux/Unix, and is available for Windows. I believe that it only loads a buffer at a time (not the whole file) -- you might try that.

What would you be trying to do with the merge -- interleaved in some way, rather than just concatenating?

Added: interleave.pl

use strict;
use warnings;

# Open both input files; abort with a message if either cannot be read.
open my $ina, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
open my $inb, '<', $ARGV[1] or die "Cannot open $ARGV[1]: $!";

# Read the two files in lockstep, printing one line from each in turn.
while (my $line_a = <$ina>) {
    print $line_a;
    my $line_b = <$inb>;
    print $line_b if defined $line_b;    # file B may run out first
}

close $ina;
close $inb;

run: perl interleave.pl fileA fileB > mergedFile

Note that this is still a very bare-bones utility. It expects the files to have the same number of lines: any extra lines in file A are printed unpaired, and any extra lines in file B are ignored.

I would use perl for this. It is easy to read a file line by line, it has great search/replace support via regular expressions, and it lets you do the merge as well, since one perl script can read from both files at once.
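As a sketch of the streaming search/replace part, a perl one-liner is enough (foo and bar are placeholder patterns):

# -p wraps the code in a read-print loop: each line is read, the
# substitution is applied, and the (possibly modified) line is printed
perl -pe 's/foo/bar/g' big-input-file > big-output-file

Like sed, this processes one line at a time, so memory use does not depend on the file size.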
