Shell Script - list files, read files and write data to new file

Question 1

All you need is:

find . -name '*.nfo' | xargs awk -F'[><]' '{print FILENAME,$3}'

If you have more in your file than just what you show in your sample input then this is probably all you need:

... awk -F'[><]' '/<user>/{print FILENAME,$3}' file
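To see why $3 is the value between the tags, here is a small sketch of how -F'[><]' splits a line. The sample line is made up to match the <user>...</user> shape from the question.

```shell
# Hypothetical sample line in the <user>...</user> shape used in the question.
line='<user>test1</user>'

# With -F'[><]' every "<" or ">" is a field separator, so the line splits into:
#   $1 = ""   $2 = "user"   $3 = "test1"   $4 = "/user"   $5 = ""
fields=$(printf '%s\n' "$line" | awk -F'[><]' '{ print NF; print $2; print $3 }')
printf '%s\n' "$fields"
```

The tag name lands in $2 and the value in $3, which is what the scripts further down rely on when they index an array by $2.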

Try this (untested):

> outfile
find . -name '*.nfo' -printf "%p %Tc\n" |
# default IFS here, so read splits the path from the timestamp at the first space
while read -r fname tstamp
do
      awk -v tstamp="$tstamp" -F'[><]' -v OFS=":::" '
          { a[$2] = a[$2] sep[$2] $3; sep[$2] = ", " }
          END {
              print a["string1"], FILENAME, tstamp, a["string4"], a["string3"], a["hobby"], a["string2"]
          }
      ' "$fname" >> outfile
done

The above will only work if your file names do not contain spaces. If they can, we'd need to tweak the loop.
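One possible tweak for names with spaces, as a sketch only: print the timestamp first, separate it from the path with a tab, and split on the tab alone. This assumes GNU find (for -printf) and still breaks on file names containing tabs or newlines. The temporary directory and file name below are invented for the demonstration.

```shell
# Sketch only: assumes GNU find; the temp directory and file name are made up.
tmp=$(mktemp -d)
printf '<string1>abc</string1>\n' > "$tmp/my file.nfo"
tab=$(printf '\t')

# Timestamp first, then a tab, then the path. With IFS set to a tab only,
# read puts everything after the first tab into fname, so spaces survive.
find "$tmp" -name '*.nfo' -printf '%Tc\t%p\n' |
while IFS="$tab" read -r tstamp fname
do
    printf '%s\n' "$fname"
done > "$tmp/names.txt"

names=$(cat "$tmp/names.txt")
expected="$tmp/my file.nfo"
rm -rf "$tmp"
printf '%s\n' "$names"
```

Inside the loop, "$fname" and "$tstamp" can then be passed to awk exactly as in the script above.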

Alternative if your find doesn't support -printf (suggestion - seriously consider getting a modern "find"!):

> outfile
find . -name '*.nfo' -print |
while IFS= read -r fname
do
      # %y is the last modification time, the same timestamp find's %T reports
      tstamp=$(stat -c "%y" "$fname")
      awk -v tstamp="$tstamp" -F'[><]' -v OFS=":::" '
          { a[$2] = a[$2] sep[$2] $3; sep[$2] = ", " }
          END {
              print a["string1"], FILENAME, tstamp, a["string4"], a["string3"], a["hobby"], a["string2"]
          }
      ' "$fname" >> outfile
done

If you don't have "stat" then look for another way to get a timestamp from a file, or as a last resort parse the output of ls -l. Parsing ls is unreliable, but if it's all you've got...
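The awk body used in both loops joins repeated tags into one comma-separated value. A minimal sketch of just that idiom, on made-up input where the hobby tag appears twice (the tag values are assumptions, not taken from the question):

```shell
# Hypothetical input: one <string1> line and two <hobby> lines.
joined=$(printf '%s\n' '<string1>abc</string1>' '<hobby>chess</hobby>' '<hobby>golf</hobby>' |
awk -F'[><]' -v OFS=':::' '
    # First value for a tag: sep[$2] is still empty, so nothing is prepended.
    # Later values for the same tag: sep[$2] is ", ", so they are appended.
    { a[$2] = a[$2] sep[$2] $3; sep[$2] = ", " }
    END { print a["string1"], a["hobby"] }
')
printf '%s\n' "$joined"
```

This prints abc:::chess, golf. Tags that never occur in the file simply produce an empty field in the output line.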

Question 2

The pat1,pat2 notation of sed is line based. Think of it like this: pat1 sets an enable flag for its commands and pat2 clears it. sed only starts looking for pat2 on the line after the one that matched pat1, so if both patterns are on the same line the flag is set but not cleared by that line. In your case that prints the <user> line and everything following it, up to the next line that matches pat2 (if there is one). See grymoire's sed howto for more.
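A small sketch that reproduces the behaviour on made-up input, where the opening and closing tags sit on the same line:

```shell
# Hypothetical three-line input; <user> and </user> are on the same line.
out=$(printf '%s\n' 'random data' '<user>test1</user>' 'more data' |
      sed -n '/<user>/,/<\/user>/p')
printf '%s\n' "$out"
```

The range opens on the <user> line, but the closing pattern is only tested from the next line onward, so "more data" is printed as well and the range runs to the end of the input.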

An alternative to sed, in this case, would be to use a grep that supports look-around assertions, e.g. GNU grep:

find . -type f -name '*.nfo' | xargs grep -oP '(?<=<user>).*(?=</user>)'

If grep doesn't support -P, you can use a combination of grep and sed (the \? in the sed expression is a GNU extension):

find . -type f -name '*.nfo' | xargs grep -o '<user>.*</user>' | sed 's:</\?user>::g'

Output:

./file1.nfo:test1
./file2.nfo:test2

Note that piping file names from find to xargs breaks on names that contain whitespace or quotes, so you may prefer find's -exec ... instead.
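A sketch of the -exec form, assuming a grep that supports -o. The extra /dev/null argument is only there so that grep prints the file name even when find hands it a single file. The temporary directory and the file name with a space are invented for the demonstration.

```shell
# Sketch only: temp directory and file name are made up for the demonstration.
tmp=$(mktemp -d)
printf '%s\n' 'random data' '<user>test1</user>' > "$tmp/file 1.nfo"

# find passes the names directly to grep, so spaces in names are harmless.
# sed then strips the opening and closing tags ("/*" = zero or more slashes).
out=$(cd "$tmp" &&
      find . -type f -name '*.nfo' -exec grep -o '<user>.*</user>' /dev/null {} + |
      sed 's:</*user>::g')
rm -rf "$tmp"
printf '%s\n' "$out"
```

The other common option is find -print0 piped to xargs -0, where both tools support it.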

Question 3

It so happens that grep outputs in the format you need and is enough for a one-liner.

By default, grep '' *.nfo outputs something like:

file1.nfo:random data
file1.nfo:<user>test1</user>
file1.nfo:some more random data
file2.nfo:not needed
file2.nfo:<user>test2</user>
file2.nfo:etc etc

By giving grep a real pattern (here with the -P option, which enables Perl-compatible regular expressions) you can restrict the output to the matching lines only:

grep -P "<user>\w+<\/user>" *.nfo

output:

file1.nfo:<user>test1</user>
file2.nfo:<user>test2</user>

Now the -o option (only show what matched) saves the day, but we'll need a slightly more advanced RegEx, using look-around assertions, so that the tags are not part of the match:

grep -oP "(?<=<user>)\w+(?=<\/user>)" *.nfo > /test/database.txt

output of cat /test/database.txt:

file1.nfo:test1
file2.nfo:test2

Explained RegEx here: http://regex101.com/r/oU2wQ1

And your whole script just became a single command.

Update:

If you don't have the --perl-regexp option try:

grep -oE "<user>\w+<\/user>" *.nfo | sed -E 's#</?user>##g' > /test/database.txt
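A self-contained sketch of this fallback on two throwaway files that mirror the sample data above. It assumes a grep that understands \w under -E (GNU grep does) and a sed that accepts -E, which is needed so that ? means "optional" rather than a literal question mark. The temporary directory is an assumption for the demonstration.

```shell
# Sketch only: recreate the two sample files in a temp directory.
tmp=$(mktemp -d)
printf '%s\n' 'random data' '<user>test1</user>' 'some more random data' > "$tmp/file1.nfo"
printf '%s\n' 'not needed' '<user>test2</user>' 'etc etc' > "$tmp/file2.nfo"

# grep -o keeps only the tagged part of each matching line, prefixed with the
# file name; sed -E then removes the opening and closing tags.
result=$(cd "$tmp" && grep -oE '<user>\w+</user>' *.nfo | sed -E 's#</?user>##g')
rm -rf "$tmp"
printf '%s\n' "$result"
```

The result has the same file:value shape as the output of the -oP version.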