Question

I have a question about parsing XML files. My function and a sample XML file are below. I can parse the "item" and "tag" subnodes without problems, but when I try to parse the "prop" subnodes, I get a single string with all the values run together. The XML parsing function does not distinguish between them because they all share the same tag name, "prop". I need the values of these subnodes stored in separate columns of a data.frame. Is there any way to do that?

My function:

PARSE_INVENTORY_items<-function(DF_DEVICE_IDE_value, URL_DEVICE_value){
  require(XML)
  require(RCurl)
  host<-URL_DEVICE_value
  device<-"/devices/"
  ID_devices<-DF_DEVICE_IDE_value[1,1]

  inventory<-"/inventory"
  start_device<-"/start"

  FULL_url<-paste(host, device, ID_devices, inventory, sep="")
  FULL_url_start<-paste(host, device, ID_devices, start_device, sep="")

  URL_inventory<-gsub(" ","", FULL_url, fixed=TRUE)
  URL_start_device<-gsub(" ","", FULL_url_start, fixed=TRUE)

  httpGET(URL_start_device)

  XML_inventory_exists = url.exists(URL_inventory)
  # Regular HTTP
  if( XML_inventory_exists) {
    inventory = getURL(URL_inventory)
    inventory_xml <- xmlInternalTreeParse(inventory) 
    items <- getNodeSet(inventory_xml,"//data/inventory/items/item")
    DataFrame_inventory_items <- xmlToDataFrame(items)

    items_tags<-getNodeSet(inventory_xml, "//data/inventory/items/item/tags/tag")
    DataFrame_inventory_tags_subnode <- xmlToDataFrame(items_tags)

    #items_tags_props<-getNodeSet(inventory_xml, "//data/inventory/items/item/tags/tag/props/prop")
    #DataFrame_inventory_props_subnode_tag <- xmlToDataFrame(items_tags_props)

    DataFrame_inventory_items<-cbind(DataFrame_inventory_items,DataFrame_inventory_tags_subnode)
    #aux<-DataFrame_inventory_items
    #DataFrame_inventory_items<-subset(DataFrame_inventory_items, select=(-tags))
    return(DataFrame_inventory_items)
  }
}

Example of XML file

<?xml version="1.0" encoding="UTF-8"?>
<inventory>
    <type>inventory</type>
    <ts>1396964708000</ts>
    <status>OK</status>
    <msg-version>2.0.0</msg-version>
    <op>inventory</op>
    <data>
        <advanNetId>AdvanNet-instance-00:26:b9:08:cd:e1-3161</advanNetId>
        <deviceId>adrd1</deviceId>
        <inventory>
            <class>INVENTORY</class>
            <deviceId>adrd1</deviceId>
            <timeWindow>2500</timeWindow>
            <items>
                <item>
                    <class>READ_EVENT</class>
                    <epc>00000000000000000000A200</epc>
                    <ts>1396964708122</ts>
                    <deviceId>adrd1</deviceId>
                    <tags>
                        <tag>
                            <class>CONTEXT_TAG_DATA</class>
                            <hexepc>00000000000000000000A200</hexepc>
                            <props>
                                <prop>RF_PHASE:154</prop>
                                <prop>READ_COUNT:1</prop>
                                <prop>RSSI:-55</prop>
                                <prop>TIME_STAMP:1396964708122</prop>
                                <prop>ANTENNA_PORT:1</prop>
                            </props>
                        </tag>
                    </tags>
                    <tag-rssi>-55.0</tag-rssi>
                    <tag-readcount>1</tag-readcount>
                    <tag-phase>154.0</tag-phase>
                </item>
            </items>
        </inventory>
    </data>
</inventory>

Solution

Your XML is still not well-formed (the closing tag for <items> is missing), but it is close enough to be usable.

The code below creates a data frame from the contents of the <tags> element, with one row for each <tag> element and columns for <class>, <hexepc>, and each of the <prop> elements. The column names for the <prop> elements are parsed out of their text (so RF_PHASE, READ_COUNT, etc.). Note that this works only if every <tag> has the same <prop> entries in the same order, because rbind(...) stacks the rows by position and does not match them by name.

In this example, the XML you provided (corrected) is stored in a variable called xml.text.

library(XML)
xml <- xmlInternalTreeParse(xml.text,useInternalNodes=TRUE)

# add a few extra <tag> nodes for demonstration - your real data already has several
tags <- xml["//data/inventory/items/item/tags"]
tag  <- xml["//data/inventory/items/item/tags/tag"]
addChildren(node=tags[[1]],xmlClone(tag[[1]]))
addChildren(node=tags[[1]],xmlClone(tag[[1]]))
addChildren(node=tags[[1]],xmlClone(tag[[1]]))

# this is where you start: build one row per <tag> node
tags  <- xml["//data/inventory/items/item/tags/tag"]
result <- do.call(rbind,lapply(tags,function(tag){
  class  <- xmlValue(tag["class"][[1]])
  hexepc <- xmlValue(tag["hexepc"][[1]])
  # text of every <prop> child, e.g. "RF_PHASE:154"
  props  <- sapply(tag["props"]$props["prop"],xmlValue)
  # split each "NAME:value" string at the colon
  props  <- strsplit(props,":")
  # second piece is the value, first piece becomes the column name
  props  <- setNames(sapply(props,function(x)x[2]),sapply(props,function(x)x[1]))
  c(class=class,hexepc=hexepc,props)
}))
result <- data.frame(result)
#              class                   hexepc RF_PHASE READ_COUNT RSSI    TIME_STAMP ANTENNA_PORT
# 1 CONTEXT_TAG_DATA 00000000000000000000A200      154          1  -55 1396964708122            1
# 2 CONTEXT_TAG_DATA 00000000000000000000A200      154          1  -55 1396964708122            1
# 3 CONTEXT_TAG_DATA 00000000000000000000A200      154          1  -55 1396964708122            1
# 4 CONTEXT_TAG_DATA 00000000000000000000A200      154          1  -55 1396964708122            1
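
If the <tag> elements do not all carry the same <prop> entries, the rows can be aligned by name before stacking them. The following is only a minimal sketch of that idea, not part of the approach above. The input xml.text2 and the names rows, cols and result2 are hypothetical, made up for illustration, and the sketch assumes every <prop> has the form NAME:value.

```r
library(XML)

# Hypothetical input: two <tag> nodes whose <prop> lists differ
xml.text2 <- '<tags>
  <tag><class>CONTEXT_TAG_DATA</class><hexepc>A200</hexepc>
    <props><prop>RF_PHASE:154</prop><prop>RSSI:-55</prop></props></tag>
  <tag><class>CONTEXT_TAG_DATA</class><hexepc>A201</hexepc>
    <props><prop>RSSI:-60</prop><prop>ANTENNA_PORT:1</prop></props></tag>
</tags>'
doc <- xmlParse(xml.text2, asText = TRUE)

# One named character vector per <tag>
rows <- lapply(getNodeSet(doc, "//tag"), function(tag) {
  props <- xpathSApply(tag, "./props/prop", xmlValue)
  # split at the first ":" only, so a value containing ":" stays intact
  keys <- sub(":.*$", "", props)
  vals <- sub("^[^:]*:", "", props)
  c(class  = xmlValue(tag["class"][[1]]),
    hexepc = xmlValue(tag["hexepc"][[1]]),
    setNames(vals, keys))
})

# Union of all column names, in order of first appearance
cols <- unique(unlist(lapply(rows, names)))

# Index every row by the full column set; names a row lacks become NA
result2 <- as.data.frame(
  do.call(rbind, lapply(rows, function(r) setNames(r[cols], cols))),
  stringsAsFactors = FALSE)
```

Here result2 has one column for every property name seen in any <tag>, and a <tag> that lacks a given property gets NA in that column. All columns are character, so numeric ones such as RSSI would still need as.numeric().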
Licensed under: CC-BY-SA with attribution