Question

I have a file with lines like this:

"def{word}  def{word}"
"def{worad} def{worads}"

and I want to report braced words that occur twice on a line. So the output should, in this case, be just "word". What I have is:

#!/bin/bash
arr=(
   "def{word}  def{word}"
   "def{worad} def{worads}"
)
for i in "${arr[@]}"; do 
   [[ $i =~ def\{([a-z]+)\}.*def\{\1\} ]] || continue
   echo ${BASH_REMATCH[1]}
done

i.e., I try to match the first word with \1 (a back-reference). However, the script prints nothing. How can I do this?


Solution

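I find that Bash regular expressions behave better when the pattern is kept in a variable. You have to be a bit careful, because quoting the pattern directly on the right-hand side of =~ makes Bash match it as a literal string instead of a regex. In your version, the unquoted \1 is most likely treated by Bash as an escaped literal 1 before the pattern ever reaches the regex library, so the back-reference is lost. To get around this, assign the quoted regular expression to a variable and then reference that variable, unquoted, in your =~ expression: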

#!/bin/bash
arr=(
   "def{word}  def{word}"
   "def{worad} def{worads}"
)
re='def\{([a-z]+)\}.*def\{\1\}'   # single quotes keep every backslash as typed
for i in "${arr[@]}"; do
   [[ $i =~ $re ]] || continue
   echo "${BASH_REMATCH[1]}"
done

Output:

$ ./worad.sh 
word
$ 
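
To illustrate the quoting point, here is a minimal sketch (the two function names are made up for this example). It assumes Bash 3.2 or later, where a quoted right-hand side of =~ is compared literally. The same pattern should match as a regex when the variable is unquoted and fail as a literal string when it is quoted:

```bash
#!/bin/bash
# Hypothetical helpers: $1 is the string to test, $2 is the pattern.
matches_as_regex()   { local re=$2; [[ $1 =~ $re ]]; }    # unquoted: regex match
matches_as_literal() { local re=$2; [[ $1 =~ "$re" ]]; }  # quoted: literal substring match

pattern='def\{[a-z]+\}'
matches_as_regex   'def{word}' "$pattern" && echo "unquoted: matched as a regex"
matches_as_literal 'def{word}' "$pattern" || echo "quoted: no match, compared literally"
```

Both messages should be printed, since the literal text of the pattern (backslashes included) does not occur in the test string.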

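The script with the back-reference only seems to work in Bash v4, though. Back-references are not required by POSIX for extended regular expressions, so whether \1 is honoured also depends on the system's regex library.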
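
If back-references turn out not to be available, one possible fallback is to capture two words and compare them in Bash itself. This is only a sketch: the function name same_word_twice is hypothetical, and because .* is greedy it typically compares the first and the last def{...} on a line, which is enough for lines with exactly two entries like the ones in the question.

```bash
#!/bin/bash
# Hypothetical helper: reads lines on stdin and prints the braced word when
# the two captured def{...} entries on a line hold the same word.
same_word_twice() {
    local re='def\{([a-z]+)\}.*def\{([a-z]+)\}'   # two groups, no back-reference
    local line
    while IFS= read -r line; do
        [[ $line =~ $re ]] || continue
        # Compare the two captures as plain strings (right side quoted).
        if [[ ${BASH_REMATCH[1]} == "${BASH_REMATCH[2]}" ]]; then
            printf '%s\n' "${BASH_REMATCH[1]}"
        fi
    done
}
```

It reads standard input, so it could be run as same_word_twice < file.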

OTHER TIPS

Using sed

sed -n '/\({[^{]*}\).*\1/p' file

"def{word}  def{word}"

To print only the word:

sed  -n 's/.*{\([^{]*\)}.*{\1}.*/\1/p' file

word

For loops in Bash are really slow, and this is probably a little complicated for Bash. I'd recommend Python or awk for this. Here's some Python code to do what you want:

#!/usr/bin/env python3

import re
import sys
from collections import Counter

def repeated(words, n=2):
    """Return the words that occur exactly n times, in first-seen order."""
    counts = Counter(words)
    return [w for w, c in counts.items() if c == n]

for line in sys.stdin:
    # Capture the text inside each pair of braces on this line.
    words = re.findall(r'\{([^}]*)\}', line)
    # Prints an empty line when no word occurs exactly twice.
    print(' '.join(repeated(words)))

Assuming this script is in a file called two.py, run it like this:

python3 two.py < yourfile

Now that it's in Python, you have something that's much easier to extend and maintain.
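
Since awk was mentioned as the other option, here is a rough sketch of the same per-line counting in awk, wrapped in a shell function so it can be dropped into a script. The function name braced_twice is hypothetical, and unlike the Python version this sketch prints nothing for lines without a word that occurs exactly twice.

```bash
#!/bin/bash
# Hypothetical helper: for each input line, print the braced words that
# occur exactly twice on that line, in first-seen order.
braced_twice() {
    awk '{
        split("", count)              # portable way to empty the array
        n = 0
        rest = $0
        # Walk through the line one {...} group at a time.
        while (match(rest, /[{][^}]*[}]/)) {
            w = substr(rest, RSTART + 1, RLENGTH - 2)   # text between the braces
            if (!(w in count)) order[++n] = w
            count[w]++
            rest = substr(rest, RSTART + RLENGTH)
        }
        out = ""
        for (i = 1; i <= n; i++)
            if (count[order[i]] == 2)
                out = out (out == "" ? "" : " ") order[i]
        if (out != "") print out
    }'
}
```

Like the other examples it reads standard input, for instance braced_twice < yourfile.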

Yes, there are many ways to do this, including:

perl -lne '/def\{(.+?)\}.*def\{\1\}/ and print $1' filename
Licensed under: CC-BY-SA with attribution