Question

Is there a difference in the order of uniq and sort when calling them in a shell script? I'm talking about time and space here.

grep 'somePattern' | uniq | sort

vs.

grep 'somePattern' | sort | uniq

A quick test on a 140k-line text file showed a slight speed improvement (5.5 s vs. 5.0 s) for the first method (get unique values, then sort).

I don't know how to measure memory usage, though.

The question now is: does the order make a difference, or does it depend on the lines grep returns (many vs. few duplicates)?

I'm looking forward to your answers.


Solution

The only correct order is to call uniq after sort, since the man page for uniq says:

Discard all but one of successive identical lines from INPUT (or standard input), writing to OUTPUT (or standard output).

Therefore it should be:

grep 'somePattern' | sort | uniq
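A minimal sketch that makes the difference visible, assuming a small hypothetical input in which the duplicate lines are not adjacent (the grep step is left out, since it does not affect the ordering question). With uniq first, the two "b" lines survive because nothing separates-then-collapses them; with sort first, they become adjacent and uniq removes one.

```shell
# Byte-wise comparison, so the result does not depend on the locale.
export LC_ALL=C

# Hypothetical input: the two "b" lines are separated by "a".
input='b
a
b'

# uniq first: the "b" lines are not adjacent, so uniq keeps both;
# sort then just places them next to each other.
uniq_then_sort=$(printf '%s\n' "$input" | uniq | sort)

# sort first: identical lines become adjacent, so uniq can collapse them.
sort_then_uniq=$(printf '%s\n' "$input" | sort | uniq)

printf 'uniq | sort:\n%s\n\n' "$uniq_then_sort"   # a, b, b (duplicate kept)
printf 'sort | uniq:\n%s\n' "$sort_then_uniq"     # a, b
```

This also explains the timing in the question: uniq | sort can finish sooner simply because it does less work, but its output can still contain duplicates.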

OTHER TIPS

I believe that sort -u is suited to this exact scenario: it both sorts and removes duplicate lines. That should be at least as efficient as calling sort and uniq separately in either order, since it avoids an extra process and pipe.
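A short sketch comparing sort -u with sort | uniq on a small hypothetical input. For plain input like this the two give the same lines; the benefit of sort -u is one process fewer, not a different result. Applied to the question, the pipeline would become grep 'somePattern' | sort -u.

```shell
# Byte-wise comparison, so the result does not depend on the locale.
export LC_ALL=C

# Hypothetical input with a non-adjacent duplicate.
input='b
a
b'

# sort -u sorts and drops duplicate lines in a single process.
with_sort_u=$(printf '%s\n' "$input" | sort -u)

# The equivalent two-process pipeline, for comparison.
with_uniq=$(printf '%s\n' "$input" | sort | uniq)

printf '%s\n' "$with_sort_u"   # a, b

if [ "$with_sort_u" = "$with_uniq" ]; then
  echo "sort -u and sort | uniq agree"
fi
```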

uniq relies on its input being sorted to remove all duplicates, because it only compares each line with the one immediately before it. That is why sort always runs before uniq. Try it and see.
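To see the previous-versus-current comparison concretely, here is a rough awk emulation of uniq's core rule, run next to the real uniq. It is an illustration only, not how uniq is actually implemented, and the input is hypothetical: "a" appears again after a "b", so that last "a" is not adjacent to the earlier ones.

```shell
# Byte-wise comparison, so the result does not depend on the locale.
export LC_ALL=C

# Hypothetical input: a run of "a", then "b", then "a" again.
input='a
a
b
a'

# Rough emulation of uniq's rule: print a line only if it differs from
# the line immediately before it (the first line is always printed).
# Appending "" forces a string comparison rather than a numeric one.
emulated=$(printf '%s\n' "$input" |
  awk 'NR == 1 || ($0 "") != prev { print } { prev = $0 "" }')

# The real uniq, for comparison.
real=$(printf '%s\n' "$input" | uniq)

printf '%s\n' "$emulated"   # a, b, a: the non-adjacent "a" survives
```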

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow