Splitting a file and its lines under Linux/bash

https://stackoverflow.com/questions/63870

09-06-2019

Question

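I have a fairly large file (150 million lines of 10 characters each). I need to split it into 150 files of 2 million lines each, with each output line being either the first 5 or the last 5 characters of the source line. I could do this in Perl rather quickly, but I was wondering whether there is an easy solution using bash. Any ideas?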

Solution

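Homework? :-)

I would think that a simple pipe with sed (to split each line in two) and split (to split the result into multiple files) would be enough.

The man command is your friend.

Added after confirmation that it is not homework:

How about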

sed 's/\(.....\)\(.....\)/\1\n\2/' input_file | split -l 2000000 - out-prefix-
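Before running the pipeline on the full file, it can be tried on a tiny sample. The sketch below is only an illustration: the data, the scratch directory, the names sample.txt, out- and fold-, and the chunk size of 4 lines (instead of 2000000) are all assumptions. It also shows fold -w 5 as an alternative to the sed step, since \n in the replacement part of s is a GNU sed feature; for lines of exactly 10 single-byte characters the two should produce the same output.

```shell
# Work in a scratch directory so no real files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Three 10-character lines standing in for the real input.
printf '%s\n' 'aaaaabbbbb' 'cccccddddd' 'eeeeefffff' > sample.txt

# Same pipeline as above, with a tiny chunk size (4 lines per file).
sed 's/\(.....\)\(.....\)/\1\n\2/' sample.txt | split -l 4 - out-

# Alternative without \n in the sed replacement: wrap every line at 5 characters.
fold -w 5 sample.txt | split -l 4 - fold-

# Show how many lines ended up in each output file.
wc -l out-* fold-*
```

split names the pieces with the suffixes aa, ab, and so on, so the real run would produce out-prefix-aa, out-prefix-ab, etc.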

Other tips

I think that something like this could work:

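in_file=input_file   # placeholder: path to the source file
out_file=1
out_pairs=0
# Each input line yields two output lines, so 1000000 pairs = 2000000 lines per file.
while IFS= read -r line; do
    if [ "$out_pairs" -ge 1000000 ]; then
        out_file=$((out_file + 1))
        out_pairs=0
    fi
    printf '%s\n' "${line%?????}" >> "out${out_file}"   # first 5 characters
    printf '%s\n' "${line#?????}" >> "out${out_file}"   # last 5 characters
    out_pairs=$((out_pairs + 1))
done < "$in_file"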

But I am not sure that it is simpler or more efficient than using Perl.
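On a file of this size a shell read loop is likely to be slow, because every line is processed by the shell interpreter. The same idea can be sketched in awk as a single process. This is a different technique from the loop above, not a transcription of it, and the variable pairs_per_file, the out prefix, and the sample data are assumptions for illustration (the real job would use 1000000 pairs per file).

```shell
# Scratch directory and a tiny stand-in for the real input.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' 'aaaaabbbbb' 'cccccddddd' 'eeeeefffff' > sample.txt

awk -v pairs_per_file=2 '
    # Start a new output file every pairs_per_file input lines.
    (NR - 1) % pairs_per_file == 0 {
        if (outfile != "") close(outfile)   # avoid piling up open files
        outfile = "out" (++n)
    }
    {
        print substr($0, 1, 5) > outfile   # first five characters
        print substr($0, 6, 5) > outfile   # last five characters
    }
' sample.txt
```

With the sample above this writes four lines to out1 and two lines to out2.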

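A variant for the first five characters of each line, assuming that the large file is called x.txt and that it is OK to create files named x.txt.* in the current directory: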

split -l 2000000 x.txt x.txt.out && (for splitfile in x.txt.out*; do outfile="${splitfile}.firstFive"; echo "$splitfile -> $outfile"; cut -c 1-5 "$splitfile" > "$outfile"; done)
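The same approach can also produce the last five characters with cut -c 6-10. The sketch below runs on a tiny made-up x.txt in a scratch directory, with 2 lines per chunk instead of 2000000; the .lastFive suffix is a hypothetical name chosen for illustration.

```shell
# Scratch directory and a tiny stand-in for the real x.txt.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' 'aaaaabbbbb' 'cccccddddd' 'eeeeefffff' > x.txt

# Split into chunks, then cut both halves out of every chunk.
# The glob is expanded once, before the loop creates any new files.
split -l 2 x.txt x.txt.out && (
    for splitfile in x.txt.out*; do
        cut -c 1-5  "$splitfile" > "${splitfile}.firstFive"
        cut -c 6-10 "$splitfile" > "${splitfile}.lastFive"
    done
)
```

With the real input and -l 2000000, split would yield 75 chunks, so the two cut passes together give 75 + 75 = 150 files of 2 million lines each, which matches the numbers in the question. The intermediate x.txt.out* chunks are left in place and can be deleted afterwards.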

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow