R: How to extract info from XML tags

Question 1

You can do this with the XML package, for example:

tt <- '<?xml version="1.0" encoding="utf-8"?>
<item id="rt" name ="th">
  <point1>1254</point1>
  <point2>1254</point2>
</item>
'

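library(XML)
# Parse the string into a document; asText = TRUE marks tt as XML content, not a file name.
doc <- xmlParse(tt, asText = TRUE)
xpathSApply(doc,'//item',xmlGetAttr,'id')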
[1] "rt"
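
The same approach extends to the other attributes and to the child elements. Here is a minimal sketch, assuming the XML package is installed; the sample string is repeated so the block runs on its own, and the names `attrs` and `points` are only illustrative:

```r
library(XML)

# Same sample document as above.
tt <- '<?xml version="1.0" encoding="utf-8"?>
<item id="rt" name ="th">
  <point1>1254</point1>
  <point2>1254</point2>
</item>
'

# asText = TRUE tells xmlParse that tt holds XML content, not a file name.
doc <- xmlParse(tt, asText = TRUE)

# All attributes of the root <item> element as a named character vector.
attrs <- xmlAttrs(xmlRoot(doc))
attrs[["name"]]

# Text content of every child element of <item>.
points <- xpathSApply(doc, "//item/*", xmlValue)
points
```

`xmlAttrs` is handy when you do not know the attribute names in advance, while `xmlGetAttr` is the shorter choice when you want one specific attribute.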

EDIT

If your data is not well-formed XML, either fix it up as I did above, or read it line by line and extract the information with a regular expression (using regex on XML tags is generally not recommended).

tt <- '<item1 id=rt name ="th">
<point1>1254</point1>
<point2>1254</point2>
</item>
'

ll <- readLines(textConnection(tt))
gsub('.*id=(.*)[ ]name.*','\\1',ll[1])
[1] "rt"

Question 2

How about a regex?

/=\K\W?\K\w+/g

=\K matches the = but drops it from the result.

\W?\K matches an optional non-word character (the quotation mark that may precede the value) and drops it from the result as well.

\w+ is the attribute value itself.

You can read the file line by line and save your matches into an array, something like:

my @matches = $line =~ /=\K\W?\K\w+/g;

Then use $matches[0], $matches[1], and so on to access the individual elements.
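
If you would rather stay in R, the same pattern can be tried with base R's PCRE mode, which understands \K. This is a minimal sketch, assuming a sample line modelled on the data above; the names `line`, `m` and `matches` are only illustrative:

```r
# Example line with one unquoted and one quoted attribute value.
line <- '<item1 id=rt name ="th">'

# perl = TRUE switches to PCRE, which is needed for \K.
m <- gregexpr("=\\K\\W?\\K\\w+", line, perl = TRUE)

# regmatches returns a list with one element per input string.
matches <- regmatches(line, m)[[1]]
matches
```

Note that the backslashes have to be doubled inside an R string literal.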

Here is the regex in action if you want to play with it further: http://regexr.com?37im8