Question

Take a text file with lines like:

/user$ cat ORIGFILE 
se832p41iEC.200289_EDI832I140401232506.txt 
pt832p41iEC.213631_EDI832I140401232501.txt
xe832p41iEC.201687_EDI832I140401232512.txt 
pt832p41iEC.213632_EDI832I140401232502.txt
se832p41iEC.200289_EDI832I140401232508.txt 
se832p41iEC.200289_EDI832I140401232507.txt 
xe832p41iEC.201687_EDI832I140401232513.txt 
xe832p41iEC.201687_EDI832I140401232511.txt

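If a session number (e.g. 200289) appears on more than one line, all lines sharing it should be written to a file named after the part before the underscore. Lines whose session number occurs only once should go to NEWFILE. The result should look like this: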

 /user$ cat se832p41iEC.200289
 se832p41iEC.200289_EDI832I140401232506.txt
 se832p41iEC.200289_EDI832I140401232507.txt 
 se832p41iEC.200289_EDI832I140401232508.txt

 /user$ cat xe832p41iEC.201687
 xe832p41iEC.201687_EDI832I140401232511.txt
 xe832p41iEC.201687_EDI832I140401232512.txt
 xe832p41iEC.201687_EDI832I140401232513.txt

 /user$ cat NEWFILE
 pt832p41iEC.213631_EDI832I140401232501.txt
 pt832p41iEC.213632_EDI832I140401232502.txt

Thank you in advance.

Update: Just figured it out after @Jaypal's hint (thanks man):

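  First - sort ORIGFILE | uniq -w18 -u > NEWFILE
  Second - sort ORIGFILE | uniq -w18 -D > AWKFILE
  Last - awk -F_ '{print $0 > ($1)}' AWKFILE

  (GNU uniq: -w18 compares only the first 18 characters, i.e. the part before the underscore. Without it every line counts as unique, because the timestamps differ.)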

Solution

Now that you have added your attempt, here is a way of doing it with awk:

$ ls
file

$ cat file
se832p41iEC.200289_EDI832I140401232506.txt
pt832p41iEC.213631_EDI832I140401232501.txt
xe832p41iEC.201687_EDI832I140401232512.txt
pt832p41iEC.213632_EDI832I140401232502.txt
se832p41iEC.200289_EDI832I140401232508.txt
se832p41iEC.200289_EDI832I140401232507.txt
xe832p41iEC.201687_EDI832I140401232513.txt
xe832p41iEC.201687_EDI832I140401232511.txt

$ awk -F_ '{
    a[$1] = (a[$1] ? a[$1] RS $0 : $0)
    b[$1]++
}
END {
    for(x in a) print a[x] > (b[x]>1 ? x : "NEWFILE")
}' file

$ ls
NEWFILE  file  se832p41iEC.200289  xe832p41iEC.201687

$ head *
==> NEWFILE <==
pt832p41iEC.213631_EDI832I140401232501.txt
pt832p41iEC.213632_EDI832I140401232502.txt

==> file <==
se832p41iEC.200289_EDI832I140401232506.txt
pt832p41iEC.213631_EDI832I140401232501.txt
xe832p41iEC.201687_EDI832I140401232512.txt
pt832p41iEC.213632_EDI832I140401232502.txt
se832p41iEC.200289_EDI832I140401232508.txt
se832p41iEC.200289_EDI832I140401232507.txt
xe832p41iEC.201687_EDI832I140401232513.txt
xe832p41iEC.201687_EDI832I140401232511.txt

==> se832p41iEC.200289 <==
se832p41iEC.200289_EDI832I140401232506.txt
se832p41iEC.200289_EDI832I140401232508.txt
se832p41iEC.200289_EDI832I140401232507.txt

==> xe832p41iEC.201687 <==
xe832p41iEC.201687_EDI832I140401232512.txt
xe832p41iEC.201687_EDI832I140401232513.txt
xe832p41iEC.201687_EDI832I140401232511.txt
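
The awk script above builds each group in memory (a[] holds the joined lines, b[] the count per prefix) and writes everything in the END block, so lines inside each output file keep their input order rather than the sorted order shown in the question. If sorted output matters, a two-pass variant may be easier to follow. This is only a sketch: the temporary file name sorted.tmp and the array name count are my own choices, not part of the original answer, and it is best run in a scratch directory because it writes files into the current one.

```shell
#!/bin/sh
# Sample input copied from the question (ORIGFILE is the asker's file name).
cat > ORIGFILE <<'EOF'
se832p41iEC.200289_EDI832I140401232506.txt
pt832p41iEC.213631_EDI832I140401232501.txt
xe832p41iEC.201687_EDI832I140401232512.txt
pt832p41iEC.213632_EDI832I140401232502.txt
se832p41iEC.200289_EDI832I140401232508.txt
se832p41iEC.200289_EDI832I140401232507.txt
xe832p41iEC.201687_EDI832I140401232513.txt
xe832p41iEC.201687_EDI832I140401232511.txt
EOF

# Sort once so every output file ends up in sorted order.
# sorted.tmp is a hypothetical scratch file name.
LC_ALL=C sort ORIGFILE > sorted.tmp

# The same file is read twice:
#   pass 1 (NR == FNR): count how many lines share each prefix ($1)
#   pass 2: send each line to the prefix-named file if the prefix
#           repeats, otherwise to NEWFILE
awk -F_ '
    NR == FNR { count[$1]++; next }
    { print > (count[$1] > 1 ? $1 : "NEWFILE") }
' sorted.tmp sorted.tmp

rm -f sorted.tmp
```

With the sample input this should leave NEWFILE, se832p41iEC.200289 and xe832p41iEC.201687 next to ORIGFILE, each with its lines in ascending order. Only the per-prefix counts are kept in memory, at the cost of reading the input twice.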