Splitting a text file's contents based on repeated line values

https://stackoverflow.com/questions/23373710

12-07-2023

Question

I have a single-column text file like:

A.txt

0;
1;
2;
3;
.
.
.
0;
4;
8;
.
.
.
0;
6;
9;

The goal is to split A.txt into several files based on its line values: for each line value that occurs more than once in A.txt, there should be a separate split based on that value. Here is an example of the desired output files, assuming that "0;" is the only repeated value in A.txt:

A1.txt

0;
1;
2;
3;
.
.
.

A2.txt

0;
4;
8;
.
.
.

A3.txt

0;
6;
9;
.
.
.

Any idea how to do that through Linux bash scripting?

Solution

Perl to the rescue:

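#!/usr/bin/perl
use warnings;
use strict;

# Read the whole input (file argument or STDIN) and strip newlines.
my @lines = <>;
chomp @lines;

# Count how often each line value occurs.
my %count;
$count{$_}++ for @lines;

my $OUT;    # current output filehandle
my $x;      # running number for the output file names
# Make one pass over the input per repeated value (the separator).
for my $separator (grep $count{$_} > 1, keys %count) {
    for my $line (@lines) {
        # Start a new file at the first line and at each separator.
        open $OUT, '>', 'A' . ++$x . '.txt' or die $!
            if not $OUT or $separator eq $line;
        print {$OUT} "$line\n";
    }
    undef $OUT;    # close the last file before the next separator
}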
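
Since the question asks for a shell route, here is a minimal sketch of a related approach using awk from bash. It is not the Perl script's exact algorithm: it reads the file twice in a single awk call and starts a new output file at every line whose value is repeated, whereas the Perl script makes one full pass per repeated value. When "0;" is the only repeated value, as in the question, both should give the same files. The sample input written to A.txt below is a shortened, assumed stand-in for the real file.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Assumed sample input (a shortened stand-in for the real A.txt).
printf '%s\n' '0;' '1;' '2;' '3;' '0;' '4;' '8;' '0;' '6;' '9;' > A.txt

# A.txt is passed twice so awk can read it in two passes.
awk '
    # Pass 1 (NR == FNR only while reading the first copy): count each value.
    NR == FNR { count[$0]++; next }

    # Pass 2: open a new file at the first line and at every repeated value.
    FNR == 1 || count[$0] > 1 {
        if (out != "") close(out)
        out = sprintf("A%d.txt", ++n)
    }

    # Write the current line to the current output file.
    { print > out }
' A.txt A.txt
```

With this sample, A1.txt holds the lines 0; 1; 2; 3;, A2.txt holds 0; 4; 8;, and A3.txt holds 0; 6; 9;. Closing each file before opening the next keeps the number of open file handles small on large inputs.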

Licensed under: CC-BY-SA with attribution
