Question
By stream splitting I mean the ability to divide one input stream into several sub-streams, each processed by its own chain of commands.
An example is sometimes better than a long explanation. This command line uses tee and process substitution to split the stream:
$> cut -f2 file | tee >( grep "AB" | sort | ... ) | grep -v "AB" | tr A B | ...
In this example, the stream is split in two: the lines containing "AB" and the rest:
cut -f2 file ---->- line contains "AB" ->- sort ->- ...
             \--->- does not contain "AB" ->- tr A B ->- ...
But I do not like this stream splitting technique, because the stream is first duplicated (by tee) and then filtered twice (by grep and grep -v).
Therefore I wonder whether something like stream splitting is available in other languages, such as Perl, Python, Ruby or C++.
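For comparison, here is a minimal single-pass sketch of the simple example above in Python. It is an illustration, not code from the question: the names cut_f2 and split_stream are hypothetical, and it assumes tab-separated input, as cut -f2 does. Each line is tested for "AB" exactly once. Non-matching lines are translated like tr A B and written out immediately, while matching lines are collected, because sort cannot emit anything until it has seen all of its input.

```python
import sys
from typing import Iterable, List, TextIO


def cut_f2(line: str) -> str:
    """Rough equivalent of `cut -f2`: the second tab-separated field,
    or the whole line when it contains no tab."""
    fields = line.rstrip("\n").split("\t")
    return fields[1] if len(fields) > 1 else fields[0]


def split_stream(lines: Iterable[str], other_out: TextIO) -> List[str]:
    """Route each line once: "AB" lines are returned sorted,
    the others are translated A->B and written to other_out right away."""
    ab_lines: List[str] = []
    for line in lines:
        field = cut_f2(line)
        if "AB" in field:                 # one test replaces grep "AB" + grep -v "AB"
            ab_lines.append(field)        # sort needs all of its input, so buffer here
        else:
            other_out.write(field.replace("A", "B") + "\n")  # like tr A B, streamed
    return sorted(ab_lines)               # code-point order, like LC_ALL=C sort


if __name__ == "__main__":
    for field in split_stream(sys.stdin, sys.stdout):
        print(field)
```

Unlike the tee version, the two branches here write to stdout in a fixed order (the translated lines first, then the sorted ones); with tee and process substitution the interleaving is not predictable.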
I provide a more complex example below.
Bash stream splitting
counter.sh splits a stream into three sections (begin, middle and end). For each section, the stream is split again to count the occurrences of the symbols <, | and >:
#!/bin/bash
{
{ tee >( sed -n '1,/^--$/!p' >&3 ) |
sed -n '1,/^--$/p' |
tee >( echo "del at begin: $(grep -c '<')" >&4 ) |
tee >( echo "add at begin: $(grep -c '>')" >&4 ) |
{ echo "chg at begin: $(grep -c '|')"; } >&4
} 3>&1 1>&2 |
{ tee >( sed -n '/^--$/,/^--$/!p' >&3 ) |
sed -n '/^--$/,/^--$/p' |
tee >( echo "del at end: $(grep -c '<')" >&4 ) |
tee >( echo "add at end: $(grep -c '>')" >&4 ) |
{ echo "chg at end: $(grep -c '|')"; } >&4
} 3>&1 1>&2 |
tee >( echo "del in middle: $(grep -c '<')" >&4 ) |
tee >( echo "add in middle: $(grep -c '>')" >&4 ) |
echo "chg in middle: $(grep -c '|')";
} 4>&1
This script counts the number of added, changed and deleted lines in the begin, middle and end sections. Its input is a stream, as in this example:
$> cat file-A
1
22
3
4
5
6
77
8
$> cat file-B
22
3
4
42
6
77
8
99
$> diff --side-by-side file-A file-B | egrep -1 '<|\||>' | ./counter.sh
del at begin: 1
add at begin: 0
chg at begin: 0
del at end: 0
add at end: 1
chg at end: 0
del in middle: 0
add in middle: 0
chg in middle: 1
How can such a counter.sh be implemented efficiently in other programming languages, without storing the data in a temporary buffer?
As noted by Lennart Regebro, I was overthinking this question. Of course, all these languages can split input streams, as ysth answered. In pseudocode:
while input-stream
{
    case (begin section)
    {
        case (symbol <) dB++
        case (symbol |) cB++
        case (symbol >) aB++
    }
    case (middle section)
    {
        case (symbol <) dM++
        case (symbol |) cM++
        case (symbol >) aM++
    }
    case (end section)
    {
        case (symbol <) dE++
        case (symbol |) cE++
        case (symbol >) aE++
    }
}
PrintResult (aB, cB, dB, aM, cM, dM, aE, cE, dE)
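A runnable Python version of this pseudocode could look like the sketch below. It assumes the input is the egrep -1 output shown earlier, with "--" as grep's group separator, and it assumes the intended split is: first group = begin, last group = end, every other group = middle. The function names are hypothetical. Only counters are kept, never the lines themselves. Like grep -c, it counts lines containing a symbol rather than occurrences. For the three-group example it should print the same numbers as counter.sh, but with two groups or more than three the results can differ, because counter.sh's sed ranges pick out the end section differently.

```python
import sys
from typing import Dict, Iterable

# diff --side-by-side markers, mapped to the labels counter.sh prints
SYMBOLS = {"<": "del", ">": "add", "|": "chg"}
Counts = Dict[str, int]


def zero() -> Counts:
    return {"del": 0, "add": 0, "chg": 0}


def count_sections(lines: Iterable[str]) -> Dict[str, Counts]:
    """Single pass over grep-style output; assumes '--' separates groups."""
    begin, middle = zero(), zero()
    current = zero()              # counters for the group being read; lines are not stored
    seen_separator = False
    for line in lines:
        if line.rstrip("\n") == "--":
            # The group just finished is begin if it was the first one, otherwise middle
            target = middle if seen_separator else begin
            for kind, n in current.items():
                target[kind] += n
            current = zero()
            seen_separator = True
            continue
        for symbol, kind in SYMBOLS.items():
            if symbol in line:    # like grep -c: count lines containing the symbol
                current[kind] += 1
    if seen_separator:
        end = current             # the group after the last separator is the end section
    else:
        begin, end = current, zero()  # a single group is treated as begin
    return {"begin": begin, "middle": middle, "end": end}


def format_report(result: Dict[str, Counts]) -> str:
    """Same line order and wording as counter.sh's output."""
    out = []
    for section, label in (("begin", "at begin"), ("end", "at end"), ("middle", "in middle")):
        for kind in ("del", "add", "chg"):
            out.append(f"{kind} {label}: {result[section][kind]}")
    return "\n".join(out)


if __name__ == "__main__":
    print(format_report(count_sections(sys.stdin)))
```

Saved as, say, counter.py, it would replace ./counter.sh at the end of the diff | egrep pipeline.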
Conclusion: stream splitting is better done in Python/Perl/awk/C++ than with tee + process substitution.
Solution
Any of the languages you mention are perfectly suitable for this.
In Perl, I would not use the diff command, I would just use Algorithm::Diff on the original files.
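Algorithm::Diff is a Perl module. As a rough Python analogue (a swapped-in library, not what this answer used), the standard difflib module can diff the two files directly, so no diff output has to be parsed. In the sketch below, the function name section_counts is hypothetical; get_grouped_opcodes(1) is used to group changes with one line of context, similar to egrep -1, and the first group is assumed to be begin, the last one end and the rest middle. A "replace" of unequal length is counted as paired changed lines plus surplus deleted or added lines, which may not match diff --side-by-side exactly.

```python
from difflib import SequenceMatcher
from typing import Dict, Sequence

Counts = Dict[str, int]


def section_counts(a: Sequence[str], b: Sequence[str], context: int = 1) -> Dict[str, Counts]:
    """Count deleted/added/changed lines per section from a library diff of a and b."""
    result = {s: {"del": 0, "add": 0, "chg": 0} for s in ("begin", "middle", "end")}
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    groups = list(matcher.get_grouped_opcodes(context))
    for k, group in enumerate(groups):
        # First group = begin, last group = end (when there are two or more), rest = middle
        if k == 0:
            section = "begin"
        elif k == len(groups) - 1:
            section = "end"
        else:
            section = "middle"
        counts = result[section]
        for tag, i1, i2, j1, j2 in group:   # "equal" context entries are ignored
            n_old, n_new = i2 - i1, j2 - j1
            if tag == "delete":
                counts["del"] += n_old
            elif tag == "insert":
                counts["add"] += n_new
            elif tag == "replace":
                # Pair lines up as changed; any surplus counts as deleted or added
                paired = min(n_old, n_new)
                counts["chg"] += paired
                counts["del"] += n_old - paired
                counts["add"] += n_new - paired
    return result
```

With the two example files, the lists would come from something like open("file-A").read().splitlines() and open("file-B").read().splitlines().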
OTHER TIPS
Tee is just a C program built on basic system calls, so you can implement it in any language that provides access to the system libraries. A Google search for "tee in my favorite language" should find all the answers you need.
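To illustrate the point, here is a minimal tee sketch in Python that uses the same read(2)/write(2) system calls through os.read and os.write. The names tee_fds and main are hypothetical, and it deliberately ignores tee's options (such as -a for appending) and its signal handling; it only copies standard input to standard output and to every file named on the command line.

```python
import os
import sys
from typing import List


def tee_fds(src_fd: int, dst_fds: List[int], bufsize: int = 65536) -> int:
    """Copy everything readable from src_fd to every fd in dst_fds.
    Returns the number of bytes copied."""
    total = 0
    while True:
        chunk = os.read(src_fd, bufsize)      # read(2): b"" means end of file
        if not chunk:
            return total
        for fd in dst_fds:
            view = memoryview(chunk)
            while view:                       # write(2) may write fewer bytes than asked
                written = os.write(fd, view)
                view = view[written:]
        total += len(chunk)


def main(paths: List[str]) -> None:
    # Like `tee FILE...`: create or truncate each file and also copy to stdout
    fds = [os.open(p, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644) for p in paths]
    try:
        tee_fds(sys.stdin.fileno(), [sys.stdout.fileno(), *fds])
    finally:
        for fd in fds:
            os.close(fd)


if __name__ == "__main__":
    main(sys.argv[1:])
```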