Question

I have a file, abc.txt, like this:

<ra><r>12.34</r><e>235</e><a>34.908</a><r>23</r><a>234.09</a><p>234</p><a>23</a></ra>
<hello>sadfaf</hello>
<hi>hiisadf</hi>
<ra><s>asdf</s><qw>345</qw><a>345</a><po>234</po><a>345</a></ra>

I need to find each <ra> tag and, for every <a> tag inside it, store the value in variables that I can process further. How should I do this?

The values of the <a> tags inside each <ra> tag are:
34.908,234.09,23
345,345

Solution

This awk command should do it:

cat file
<ra><r>12.34</r><e>235</e><a>34.908</a><r>23</r><a>234.09</a><p>234</p><a>23</a></ra><a>12344</a><ra><e>45</e><a>666</a></ra>
<hello>sadfaf</hello>
<hi>no print from this line</hi><a>256</a>
<ra><s>asdf</s><qw>345</qw><a>345</a><po>234</po><a>345</a></ra>

awk -v RS="<" -F">" '/^ra/,/\/ra/ {if (/^a>/) print $2}' file
34.908
234.09
23
666
345
345

It also handles multiple <ra>...</ra> groups on one line, and it ignores <a> tags that sit outside an <ra> group.


A small variation:

awk -v RS=\< -F\> '/\/ra/ {f=0} f&&/^a/ {print $2} /^ra/ {f=1}' file
34.908
234.09
23
666
345
345
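The question lists the values grouped per <ra> block, while the commands above print one value per line. A minimal sketch of a grouped variant, assuming the same RS/FS trick; the names `inside`, `line` and `grouped` are made up for this example, and the sample data is the file from the question:

```shell
# Sample input taken from the question, written to a temporary file for the demo.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
<ra><r>12.34</r><e>235</e><a>34.908</a><r>23</r><a>234.09</a><p>234</p><a>23</a></ra>
<hello>sadfaf</hello>
<hi>hiisadf</hi>
<ra><s>asdf</s><qw>345</qw><a>345</a><po>234</po><a>345</a></ra>
EOF

grouped=$(awk -v RS='<' -F'>' '
    # opening <ra>: start a new, empty group
    $1 == "ra"  { inside = 1; line = ""; next }
    # closing </ra>: print the collected group and leave it
    $1 == "/ra" { if (inside) print line; inside = 0; next }
    # an <a> tag inside a group: append its value, comma separated
    inside && $1 == "a" { line = (line == "" ? $2 : line "," $2) }
' "$tmp")

printf '%s\n' "$grouped"
rm -f "$tmp"
```

Each output line then corresponds to one <ra> block, which matches the grouping shown in the question.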

How it works:

awk -v RS="<" -F">" '   # Set the record separator to <, so every < starts a new record
/^ra/,/\/ra/ {          # from a record starting with "ra" to a record containing "/ra" do
    if (/^a>/)          # if the record starts with "a>" do
    print $2}' file     # print field 2 (the text after the >)
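The flag-based variation can be read the same way. The sketch below is an annotated version of it, not the exact command from the answer: it compares the tag name with `$1 == "a"` instead of the regex `/^a/`, on the assumption that tags such as <abc> should not be picked up. The variable names `sample` and `values` are made up for the demo.

```shell
# Sample input from the answer, written to a temporary file for the demo.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<ra><r>12.34</r><e>235</e><a>34.908</a><r>23</r><a>234.09</a><p>234</p><a>23</a></ra><a>12344</a><ra><e>45</e><a>666</a></ra>
<hello>sadfaf</hello>
<hi>no print from this line</hi><a>256</a>
<ra><s>asdf</s><qw>345</qw><a>345</a><po>234</po><a>345</a></ra>
EOF

values=$(awk -v RS='<' -F'>' '
    $1 == "/ra"    { f = 0 }       # closing tag: clear the flag (checked first)
    f && $1 == "a" { print $2 }    # flag set and the tag is exactly "a": print its value
    $1 == "ra"     { f = 1 }       # opening tag: set the flag (checked last)
' "$sample")

printf '%s\n' "$values"
rm -f "$sample"
```

The order of the three rules matters: the flag is cleared before the print rule runs and set after it, so the <ra> and </ra> records themselves are never printed.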

To see how changing RS works try:

awk -v RS="<" '$1=$1' file
ra>
r>12.34
/r>
e>235
/e>
a>34.908
/a>
r>23
/r>
a>234.09
/a>
p>234
...

To store it in a variable you can do as BMW suggested:

var=$(awk ...)
var=$(awk -v RS=\< -F\> '/\/ra/ {f=0} f&&/^a/ {print $2} /^ra/ {f=1}' file)
echo $var
34.908 234.09 23 666 345 345
echo "$var"
34.908
234.09
23
666
345
345

Since there are many values, you can use an array:

array=($(awk -v RS=\< -F\> '/\/ra/ {f=0} f&&/^a/ {print $2} /^ra/ {f=1}' file))
echo ${array[2]}
23
echo ${array[0]}
34.908
echo ${array[*]}
34.908 234.09 23 666 345 345
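For the "process further" part, a loop over the captured values is often enough. The following is a minimal sketch in plain POSIX shell, using the file from the question; the names `values`, `count` and `report` are made up for the example, and the loop body is only a placeholder for real processing.

```shell
# Sample input taken from the question, written to a temporary file for the demo.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<ra><r>12.34</r><e>235</e><a>34.908</a><r>23</r><a>234.09</a><p>234</p><a>23</a></ra>
<hello>sadfaf</hello>
<hi>hiisadf</hi>
<ra><s>asdf</s><qw>345</qw><a>345</a><po>234</po><a>345</a></ra>
EOF

# Capture the values, one per line, with the command from the answer.
values=$(awk -v RS=\< -F\> '/\/ra/ {f=0} f&&/^a/ {print $2} /^ra/ {f=1}' "$sample")
rm -f "$sample"

count=0
report=""
# Feed the values through a here-document so the loop runs in the current
# shell and the variables set inside it are still visible afterwards.
while IFS= read -r v; do
    count=$((count + 1))
    report="${report}value ${count}: ${v}
"
done <<EOF
$values
EOF

printf '%s' "$report"
printf 'total: %s\n' "$count"
```

In bash 4 or later, `mapfile -t array < <(awk ...)` is another way to fill an array line by line without relying on word splitting.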

OTHER TIPS

Use GNU grep's lookahead and lookbehind zero-length assertions (available with the -P option):

grep -oP "(?<=<ra>).*?(?=</ra>)" file |grep -Po "(?<=<a>).*?(?=</a>)"

Explanation

  • The first grep extracts the content of each ra tag. Even if there are several ra tags on one line, each one is matched separately.

  • The second grep extracts the content of each a tag from that output.
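To make the two stages visible, the pipeline can be split and the intermediate result printed. This is a sketch that assumes a grep built with PCRE support (the -P option, as in GNU grep); the names `sample`, `blocks` and `values` are made up for the demo, and the input is the sample file from the awk answer.

```shell
# Sample input from the awk answer, written to a temporary file for the demo.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<ra><r>12.34</r><e>235</e><a>34.908</a><r>23</r><a>234.09</a><p>234</p><a>23</a></ra><a>12344</a><ra><e>45</e><a>666</a></ra>
<hello>sadfaf</hello>
<hi>no print from this line</hi><a>256</a>
<ra><s>asdf</s><qw>345</qw><a>345</a><po>234</po><a>345</a></ra>
EOF

# Stage 1: one output line per <ra>...</ra> block, without the enclosing tags.
blocks=$(grep -oP '(?<=<ra>).*?(?=</ra>)' "$sample")
printf '%s\n' "$blocks"

# Stage 2: extract the <a> values from those blocks only.
values=$(printf '%s\n' "$blocks" | grep -oP '(?<=<a>).*?(?=</a>)')
printf '%s\n' "$values"
rm -f "$sample"
```

Because stage 1 keeps only text found between <ra> and </ra>, the stray <a>12344</a> and <a>256</a> never reach stage 2.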

Licensed under: CC-BY-SA with attribution