Question

I am using sort and uniq to process the output from a script, i.e.

script | sort | uniq -c

The script outputs data in chunks of maybe 1000 lines before performing more processing and then outputting another 1000 lines. The chunks are completely independent.

It would be useful for me to be able to process the output in 1000 line chunks through sort and uniq and output the results to the console before continuing the process.

Another way of putting this is that I want sort to clear its buffer after processing 1000 lines. Is this possible in any way?


Solution

Something like this should do what you want:

#!/bin/bash

while true; do
    i=0
    # Read up to 1000 lines, forwarding each one into the process
    # substitution, which sorts and counts this chunk on its own.
    # The `|| [[ -n $REPLY ]]` also handles a final line that lacks
    # a trailing newline.
    while IFS= read -r || [[ -n $REPLY ]]; do
        printf '%s\n' "$REPLY"
        if ((i >= 999)); then
            break
        fi
        ((i++))
    done > >(sort | uniq -c)
    if ((i == 0)); then
        # The previous `while read` consumed no lines, so stdin is
        # exhausted and we're done.
        break
    fi
done | cat

This reads up to 1000 lines at a time from stdin and applies sort | uniq -c to each 1000-line block. You'd use it something like this:

./yourscript | ./read1000.sh > output.txt

Before anyone gives me a Useless Use of Cat award: this cat is necessary to synchronize the output from the sort | uniq -c process substitutions when the output goes to the terminal instead of being redirected to a file or pipe.
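As an aside, if you have GNU coreutils (8.13 or newer), split's --filter option can do the chunk-and-pipe work in a single command. This is a sketch under that assumption; the script function here is a stand-in for your real script:

```shell
# Sketch assuming GNU coreutils >= 8.13 (`split --filter`):
# split stdin into 1000-line chunks and run each chunk through its
# own `sort | uniq -c`; each filter's stdout is forwarded to split's
# stdout in chunk order.
script() { printf 'a\nb\na\n'; }   # stand-in for your real script
script | split --lines=1000 --filter='sort | uniq -c'
```

Because split runs the filter once per chunk and waits for it, the counted output for each chunk appears as soon as that chunk is complete, which matches the incremental behavior you're after.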

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow