So your XML is still not well-formed (it is missing the closing tag for <items>), but it is close enough to be usable.
The code below creates a data frame from the contents of the <tags> element, with one row for each <tag> element and with columns for <class>, <hexepc>, and each of the <prop> elements. The column names for the <prop> elements are parsed out of their text (so RF_PHASE, READ_COUNT, etc.). Note that this works only if every <tag> has the same <props>.
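As a minimal sketch of that parsing step in isolation: the prop strings below are hypothetical and assume each <prop> holds text of the form NAME:value, which is what splitting on ":" implies. The variable names (props, parts, values) are only for illustration.

```r
# Hypothetical prop strings, assumed to be in "NAME:value" form
props <- c("RF_PHASE:154", "READ_COUNT:1", "RSSI:-55")

# Split each string at the colon into a (name, value) pair
parts <- strsplit(props, ":", fixed = TRUE)

# First piece becomes the element name, second piece the value
values <- setNames(
  vapply(parts, function(x) x[2], character(1)),
  vapply(parts, function(x) x[1], character(1))
)
values
```

The result is a named character vector, so combining one such vector per <tag> with rbind yields a matrix whose column names are the prop names.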
In this example, the XML you provided (corrected) is called xml.text.
library(XML)
# parse the XML string (not a file name) into an internal document
xml <- xmlInternalTreeParse(xml.text, asText=TRUE, useInternalNodes=TRUE)
# add a few extra tag nodes - you have this already
tags <- xml["//data/inventory/items/item/tags"]
tag  <- xml["//data/inventory/items/item/tags/tag"]
addChildren(node=tags[[1]], xmlClone(tag[[1]]))
addChildren(node=tags[[1]], xmlClone(tag[[1]]))
addChildren(node=tags[[1]], xmlClone(tag[[1]]))
# this is where you start
tags <- xml["//data/inventory/items/item/tags/tag"]
# build one named character vector per <tag>, then stack them as rows
result <- do.call(rbind, lapply(tags, function(tag) {
  class  <- xmlValue(tag["class"][[1]])
  hexepc <- xmlValue(tag["hexepc"][[1]])
  # text of each <prop>, split at ":" into name and value
  props <- sapply(tag["props"]$props["prop"], xmlValue)
  props <- strsplit(props, ":")
  props <- setNames(sapply(props, function(x) x[2]), sapply(props, function(x) x[1]))
  c(class=class, hexepc=hexepc, props)
}))
result <- data.frame(result)
#              class                   hexepc RF_PHASE READ_COUNT RSSI    TIME_STAMP ANTENNA_PORT
# 1 CONTEXT_TAG_DATA 00000000000000000000A200      154          1  -55 1396964708122            1
# 2 CONTEXT_TAG_DATA 00000000000000000000A200      154          1  -55 1396964708122            1
# 3 CONTEXT_TAG_DATA 00000000000000000000A200      154          1  -55 1396964708122            1
# 4 CONTEXT_TAG_DATA 00000000000000000000A200      154          1  -55 1396964708122            1
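
If the tags do not all carry the same props, one possible workaround is to align each per-tag vector to the union of all names before binding, so that absent props become NA. This is only a sketch: the rows list stands in for what the lapply step returns, its values are invented for illustration, and the names all.names, aligned and aligned.df are hypothetical.

```r
# Hypothetical per-tag vectors, shaped like what the lapply() step returns,
# but with different props in each tag (values invented for illustration)
rows <- list(
  c(class = "CONTEXT_TAG_DATA", hexepc = "00000000000000000000A200",
    RF_PHASE = "154", RSSI = "-55"),
  c(class = "CONTEXT_TAG_DATA", hexepc = "00000000000000000000A201",
    RSSI = "-60", READ_COUNT = "2")
)

# Union of all names seen, in order of first appearance
all.names <- unique(unlist(lapply(rows, names)))

# Reindex each vector by the full name set; absent props become NA
aligned <- lapply(rows, function(r) setNames(r[all.names], all.names))

# Stack the aligned vectors as rows of a data frame
aligned.df <- data.frame(do.call(rbind, aligned), stringsAsFactors = FALSE)
aligned.df
```

Indexing a named vector by a name it lacks returns NA, which is what fills the gaps here; all columns remain character, as in the original code.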