Question

Morning all, I am writing a bash script to extract the values of certain XML tags from all files in a given directory. I have decided to do this by tokenising each line and returning the relevant token. The problem is that it isn't tokenising correctly and I can't quite work out why. Here is the smallest example I could make that reproduces the issue:

#!/bin/bash
for file in `ls $MY_DIRECTORY`
do
    for line in `cat $MY_DIRECTORY/$file`
    do
        LOCALIFS=$IFS
        IFS=<>\"

        TOKENS=( $line )
        IFS=$LOCALIFS
        echo "Token 0: ${TOKENS[0]}" 
        echo "Token 1: ${TOKENS[1]}" 
        echo "Token 2: ${TOKENS[2]}" 
        echo "Token 3: ${TOKENS[3]}" 

    done
 done

I'm guessing the issue is to do with my fiddling with IFS inside a loop which itself uses IFS (i.e. the cat operation), but this has never been a problem before.
Any ideas?

Thanks, Rik


Solution

Use a better tool to parse XML. Ideally that means a real XML parser, but if your requirement is simple and you know how your XML is structured, plain string manipulation might suffice. For example, say you have the following XML file and want the value of tag3:

$  cat file
blah
<tag1>value1 </tag1>
<tag2>value2 </tag2>
<tag3>value3
</tag3>
blah

$ awk -v RS="</tag3>" '/<tag3>/{ gsub(/.*<tag3>/,"");print}' file
value3
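
If awk is not an option, the same string-manipulation idea can probably be done with bash parameter expansion alone. What follows is only a rough sketch: extract_tag is a made-up helper name, and it assumes the opening tag carries no attributes, the tag is not nested inside itself, and only the first occurrence is wanted.

```shell
#!/bin/bash
# Hypothetical helper (not part of any library): print the text between the
# first <tag> and the next </tag> in a file.
# Assumptions: no attributes on the opening tag, no nesting of the same tag.
extract_tag() {
    local tag=$1 file=$2 content
    content=$(<"$file")               # whole file; trailing newlines are dropped
    case $content in
        *"<$tag>"*"</$tag>"*) ;;      # opening and closing tag both present, in order
        *) return 1 ;;                # otherwise report failure
    esac
    content=${content#*"<$tag>"}      # strip everything up to and including the first <tag>
    content=${content%%"</$tag>"*}    # strip from the first remaining </tag> onwards
    printf '%s\n' "$content"
}
```

With the sample file above, extract_tag tag3 file should print value3 followed by an empty line, because the newline in front of </tag3> is part of the value.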

To apply the awk command above to every XML file in your directory:

for file in *.xml
do
  value="$(awk -v RS="</tag3>" '/<tag3>/{ gsub(/.*<tag3>/,"");print}' "$file" )"
  echo "$value"
done 
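
As for why the script in the question misbehaves, my reading (not something the asker confirmed) is twofold. First, the unquoted IFS=<>\" is parsed by bash as an empty assignment followed by a <> redirection to a file named ", so IFS ends up empty and nothing gets split. Second, for line in `cat ...` iterates over whitespace-separated words rather than whole lines. If you do want to keep the tokenising approach, a minimal sketch could look like the following; tokenise_line and tokenise_dir are hypothetical names chosen for illustration.

```shell
#!/bin/bash
# Hypothetical helper: split one line on <, > and " and print each token.
tokenise_line() {
    local line=$1 i
    local -a tokens
    # Quoting the delimiters stops the shell treating < and > as redirections.
    # Prefixing read with IFS=... changes IFS for that single command only,
    # so there is nothing to save and restore afterwards.
    IFS='<>"' read -r -a tokens <<< "$line"
    for i in "${!tokens[@]}"; do
        printf 'Token %d: %s\n' "$i" "${tokens[i]}"
    done
}

# Hypothetical helper: tokenise every line of every regular file in a directory.
tokenise_dir() {
    local dir=$1 file line
    for file in "$dir"/*; do          # glob instead of parsing the output of ls
        [ -f "$file" ] || continue    # skip subdirectories and unmatched globs
        # Read whole lines; the || test keeps a final line that lacks a newline.
        while IFS= read -r line || [ -n "$line" ]; do
            tokenise_line "$line"
        done < "$file"
    done
}
```

Calling tokenise_dir "$MY_DIRECTORY" would then print the tokens for each line. Note that a line starting with < yields an empty Token 0, since there is nothing in front of the first delimiter.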
Licensed under: CC-BY-SA with attribution