Are you sure that the shell isn't fast enough? Maybe your bash just needs improved. :)
It appears that you want to print every line after a line with just a (
until you get to a closing )
. So...
#!/usr/bin/ksh
print=0
while read
do
if [[ "$REPLY" == ')' ]]
then
print=0
elif [[ "$print" == 1 ]]
then
echo "${REPLY//[()]/}"
elif [[ "$REPLY" == '(' ]]
then
print=1
fi
done
exit 0
And, with your provided test data:
danny@machine:~$ ./test.sh < file
$data $data $data
$data $data $data
$data $data $data
$data $data $data
$data $data $data
I'll bet you'll find that to be roughly as fast as anything else you would write. If I was going to be using this often, I'd be inclined to add several more error checks - but if your data is well-formed, this will work fine.
Alternatively, you could just use sed.
danny@machine:~$ sed -n '/^($/,/^)$/{/^[()]$/d;s/[()]//gp}' file
$data $data $data
$data $data $data
$data $data $data
$data $data $data
$data $data $data
performance note edit:
I was comparing python implementations below, so I thought I'd test these as well. The sed solution runs about identically to the fastest python implementation on the same data - less than one second (0.9 seconds) to filter ~80K lines. The bash version takes 42.5 seconds to do it. However, just replacing #!/bin/bash
with #!/usr/bin/ksh
above (which is ksh93, on Ubuntu 13.10) and making no other changes to the script reduces runtime down to 10.5 seconds. Still slower than python or sed, but that's part of why I hate scripting in bash.
I also updated both solutions to remove the opening and closing parens, to be more consistent with the other answers.