Question

I'm parsing an XML document with SAX, and so far so good. However, the XML uses a Microsoft schema called rowset, so the columns are named C0, C1, C2 and so on, while the actual name of each column is stored in the schema, which is in the same XML file.

Please see the following example:

<?xml version="1.0" encoding="utf-8"?>
<RS xmlns="urn:schemas-microsoft-com:xml-analysis:rowset">
    <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:saw-sql="urn:saw-sql" targetNamespace="urn:schemas-microsoft-com:xml-analysis:rowset">
        <xsd:complexType name="R">
            <xsd:sequence>
                <xsd:element name="C0" type="xsd:string" minOccurs="1" maxOccurs="1" saw-sql:type="varchar" saw-sql:columnHeading="Opportunity ID" saw-sql:columnID="c55007aeb4fd2c909"/>
                <xsd:element name="C1" type="xsd:string" minOccurs="0" maxOccurs="1" saw-sql:type="varchar" saw-sql:columnHeading="Opportunity Name" saw-sql:columnID="c9e466deb3ad5a949"/>
                <xsd:element name="C2" type="xsd:string" minOccurs="0" maxOccurs="1" saw-sql:type="varchar" saw-sql:columnHeading="Closed Flag" saw-sql:columnID="c4f1b71539820e65c"/>
                <xsd:element name="C3" type="xsd:dateTime" minOccurs="0" maxOccurs="1" saw-sql:type="timestamp" saw-sql:columnHeading="Actual Close Date" saw-sql:columnID="c007ad0eb3ebae22d"/>
                <xsd:element name="C4" type="xsd:dateTime" minOccurs="0" maxOccurs="1" saw-sql:type="timestamp" saw-sql:columnHeading="Creation Date" saw-sql:columnID="c8bf417f59e742c63"/>
                <xsd:element name="C5" type="xsd:double" minOccurs="0" maxOccurs="1" saw-sql:type="double" saw-sql:columnHeading="Line Win Probability" saw-sql:columnID="c49c9eb926449bccc"/>
                <xsd:element name="C6" type="xsd:string" minOccurs="0" maxOccurs="1" saw-sql:type="varchar" saw-sql:columnHeading="First Name" saw-sql:columnID="c7c2e5dc4832c9e81"/>
                <xsd:element name="C7" type="xsd:string" minOccurs="0" maxOccurs="1" saw-sql:type="varchar" saw-sql:columnHeading="Last Name" saw-sql:columnID="c0fc239ab56275027"/>
                <xsd:element name="C8" type="xsd:string" minOccurs="1" maxOccurs="1" saw-sql:type="varchar" saw-sql:columnHeading="Current Base Resource Name" saw-sql:columnID="c7c53a4daa37f1637"/>
                <xsd:element name="C9" type="xsd:string" minOccurs="1" maxOccurs="1" saw-sql:type="varchar" saw-sql:columnHeading="Current Level - 1 Resource Name" saw-sql:columnID="ccb4a337a024e6838"/>
                <xsd:element name="C10" type="xsd:string" minOccurs="1" maxOccurs="1" saw-sql:type="varchar" saw-sql:columnHeading="Current Level - 2 Resource Name" saw-sql:columnID="cfbde2b815500dd87"/>
                <xsd:element name="C11" type="xsd:string" minOccurs="1" maxOccurs="1" saw-sql:type="varchar" saw-sql:columnHeading="Current Level - 3 Resource Name" saw-sql:columnID="cef1347354936a6a8"/>
                <xsd:element name="C12" type="xsd:string" minOccurs="1" maxOccurs="1" saw-sql:type="varchar" saw-sql:columnHeading="Current Level - 4 Resource Name" saw-sql:columnID="cbdc3fbd438cafb0f"/>
                <xsd:element name="C13" type="xsd:string" minOccurs="1" maxOccurs="1" saw-sql:type="varchar" saw-sql:columnHeading="Current Top-Level Resource Name" saw-sql:columnID="c7d7bc50e00cb94ae"/>
                <xsd:element name="C14" type="xsd:string" minOccurs="0" maxOccurs="1" saw-sql:type="varchar" saw-sql:columnHeading="Name" saw-sql:columnID="c08c8bd262c9cfaac"/>
                <xsd:element name="C15" type="xsd:int" minOccurs="0" maxOccurs="1" saw-sql:type="integer" saw-sql:columnHeading="# Days in Stage" saw-sql:columnID="c5b0dbd396ffe3a62"/>
                <xsd:element name="C16" type="xsd:double" minOccurs="0" maxOccurs="1" saw-sql:type="double" saw-sql:columnHeading="Revenue" saw-sql:columnID="cdc65489635ccdda0"/>
                <xsd:element name="C17" type="xsd:dateTime" minOccurs="0" maxOccurs="1" saw-sql:type="timestamp" saw-sql:columnHeading="Close Date" saw-sql:columnID="c4bd9fe19ff33072d"/>
                <xsd:element name="C18" type="xsd:string" minOccurs="0" maxOccurs="1" saw-sql:type="varchar" saw-sql:columnHeading="Opportunity Number" saw-sql:columnID="c6429e0148a929b79"/>
            </xsd:sequence>
        </xsd:complexType>
    </xsd:schema>
    <R>
        <C0>100000009829205</C0>
        <C1>Upgrading Network Servers</C1>
        <C2>Y</C2>
        <C3>2013-01-16T00:00:00</C3>
        <C4>2012-05-01T00:00:00</C4>
        <C5>100</C5>
        <C6>Lisa</C6>
        <C7>Jones</C7>
        <C8>Pepe papas</C8>
        <C9>Pepe papas</C9>
        <C10>Pepe papas</C10>
        <C11>Pepe papas</C11>
        <C12>Pepe papas</C12>
        <C13>Perro Hunter</C13>
        <C14>Amazing customer (SEATTLE, US)</C14>
        <C15></C15>
        <C16></C16>
        <C17></C17>
        <C18>RUP3-Opty-1</C18>
    </R>
    ...

You can see that the actual name of each column is stored in the saw-sql:columnHeading attribute of the schema. I can match the name attribute to the column tag name, but I don't know how to access the schema from the SAX handler.

So far I have this:

try {
    SAXParserFactory factory = SAXParserFactory.newInstance();
    SAXParser saxParser = factory.newSAXParser();

    DefaultHandler handler = new DefaultHandler() {

        // Called for every opening tag, including the xsd:* schema elements.
        @Override
        public void startElement(String uri, String localName, String qName,
                                 Attributes attributes) throws SAXException {
            System.out.println("Column: " + qName);
        }

        // May be called more than once for the text of a single element.
        @Override
        public void characters(char[] ch, int start, int length) throws SAXException {
            String value = new String(ch, start, length);
            System.out.println("value: " + value);
        }
    };

    // xmlContent is already a String; setEncoding() has no effect on a character stream.
    InputSource is = new InputSource(new StringReader(xmlContent));

    saxParser.parse(is, handler);
} catch (Exception e) {
    e.printStackTrace();
}

Any ideas?

Solution

The SAX handler receives startElement events in document order: first RS, then xsd:schema, then xsd:complexType, and so on. SAX is a low-level parser and does not keep the document structure by default, so you will need to track each event yourself and write the logic that works out where in the document you are (inside the schema or inside a row).

I would expect you to keep an indexed local data structure (perhaps a table) for the schema, adding an entry for each column as its xsd:element is read. Then, for each row as it is read, you would look up the details in that structure (e.g. through getColumn("C1")).
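A minimal sketch of that idea, assuming the schema always appears before the first R element (as in the question's sample) and that the parser is namespace-aware. The class name RowsetSaxDemo, the handler RowsetHandler, the parse method, and the shortened SAMPLE document are all made up for illustration; a plain Map stands in for the getColumn lookup.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class RowsetSaxDemo {
    static final String XSD_NS = "http://www.w3.org/2001/XMLSchema";
    static final String ROWSET_NS = "urn:schemas-microsoft-com:xml-analysis:rowset";
    static final String SAW_NS = "urn:saw-sql";

    // Shortened, hypothetical sample in the same shape as the question's XML.
    static final String SAMPLE =
        "<RS xmlns=\"urn:schemas-microsoft-com:xml-analysis:rowset\">"
      + "<xsd:schema xmlns:xsd=\"http://www.w3.org/2001/XMLSchema\" xmlns:saw-sql=\"urn:saw-sql\">"
      + "<xsd:complexType name=\"R\"><xsd:sequence>"
      + "<xsd:element name=\"C6\" type=\"xsd:string\" saw-sql:columnHeading=\"First Name\"/>"
      + "<xsd:element name=\"C7\" type=\"xsd:string\" saw-sql:columnHeading=\"Last Name\"/>"
      + "</xsd:sequence></xsd:complexType></xsd:schema>"
      + "<R><C6>Lisa</C6><C7>Jones</C7></R>"
      + "</RS>";

    /** Collects column headings from the schema, then maps each row's values to them. */
    static class RowsetHandler extends DefaultHandler {
        final Map<String, String> headings = new LinkedHashMap<>(); // "C6" -> "First Name"
        final List<Map<String, String>> rows = new ArrayList<>();
        private Map<String, String> currentRow;   // non-null while inside an R element
        private String currentColumn;             // non-null while inside a Cn element
        private final StringBuilder text = new StringBuilder();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            if (XSD_NS.equals(uri) && "element".equals(localName)) {
                // Schema part: remember the heading for this column name.
                String name = atts.getValue("", "name");
                String heading = atts.getValue(SAW_NS, "columnHeading");
                if (name != null && heading != null) {
                    headings.put(name, heading);
                }
            } else if (ROWSET_NS.equals(uri) && "R".equals(localName)) {
                currentRow = new LinkedHashMap<>();
            } else if (currentRow != null && ROWSET_NS.equals(uri)) {
                currentColumn = localName;
                text.setLength(0);
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // Text can arrive in several chunks, so accumulate it.
            if (currentColumn != null) {
                text.append(ch, start, length);
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if (currentColumn != null && ROWSET_NS.equals(uri) && currentColumn.equals(localName)) {
                // Fall back to the raw tag name if the schema had no heading for it.
                currentRow.put(headings.getOrDefault(currentColumn, currentColumn), text.toString());
                currentColumn = null;
            } else if (ROWSET_NS.equals(uri) && "R".equals(localName)) {
                rows.add(currentRow);
                currentRow = null;
            }
        }
    }

    static List<Map<String, String>> parse(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true); // needed so uri/localName are populated
        RowsetHandler handler = new RowsetHandler();
        factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
        return handler.rows;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(parse(SAMPLE));
    }
}
```

Because the headings map is filled while the schema is being read, it is already complete by the time the first R element starts. If a file ever placed rows before the schema, this sketch would fall back to the raw C0, C1, ... names.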

For small documents you might prefer to read the whole document into a DOM (I use XOM), where XPath makes the cross-referencing easy to write.
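As a rough illustration of the DOM route, here is a sketch that uses the JDK's built-in DOM and javax.xml.xpath instead of XOM, purely to keep the example dependency-free. The class name RowsetDomDemo, the parse method, the "rs" and "xsd" prefixes bound in the NamespaceContext, and the shortened SAMPLE document are assumptions made for the example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class RowsetDomDemo {
    static final String ROWSET_NS = "urn:schemas-microsoft-com:xml-analysis:rowset";
    static final String XSD_NS = "http://www.w3.org/2001/XMLSchema";
    static final String SAW_NS = "urn:saw-sql";

    // Shortened, hypothetical sample in the same shape as the question's XML.
    static final String SAMPLE =
        "<RS xmlns=\"urn:schemas-microsoft-com:xml-analysis:rowset\">"
      + "<xsd:schema xmlns:xsd=\"http://www.w3.org/2001/XMLSchema\" xmlns:saw-sql=\"urn:saw-sql\">"
      + "<xsd:complexType name=\"R\"><xsd:sequence>"
      + "<xsd:element name=\"C6\" type=\"xsd:string\" saw-sql:columnHeading=\"First Name\"/>"
      + "<xsd:element name=\"C7\" type=\"xsd:string\" saw-sql:columnHeading=\"Last Name\"/>"
      + "</xsd:sequence></xsd:complexType></xsd:schema>"
      + "<R><C6>Lisa</C6><C7>Jones</C7></R>"
      + "</RS>";

    static List<Map<String, String>> parse(String xml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        Document doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));

        XPath xpath = XPathFactory.newInstance().newXPath();
        // Bind the prefixes used in the XPath expressions below (prefix names are our choice).
        xpath.setNamespaceContext(new NamespaceContext() {
            @Override
            public String getNamespaceURI(String prefix) {
                switch (prefix) {
                    case "rs":  return ROWSET_NS;
                    case "xsd": return XSD_NS;
                    default:    return XMLConstants.NULL_NS_URI;
                }
            }
            @Override
            public String getPrefix(String uri) { return null; }
            @Override
            public Iterator<String> getPrefixes(String uri) { return null; }
        });

        // 1. Build the "C6" -> "First Name" lookup from the embedded schema.
        NodeList defs = (NodeList) xpath.evaluate(
            "/rs:RS/xsd:schema/xsd:complexType[@name='R']/xsd:sequence/xsd:element",
            doc, XPathConstants.NODESET);
        Map<String, String> headings = new LinkedHashMap<>();
        for (int i = 0; i < defs.getLength(); i++) {
            Element def = (Element) defs.item(i);
            headings.put(def.getAttribute("name"), def.getAttributeNS(SAW_NS, "columnHeading"));
        }

        // 2. Walk each row and key its values by heading instead of by tag name.
        NodeList rowNodes = (NodeList) xpath.evaluate("/rs:RS/rs:R", doc, XPathConstants.NODESET);
        List<Map<String, String>> rows = new ArrayList<>();
        for (int i = 0; i < rowNodes.getLength(); i++) {
            Map<String, String> row = new LinkedHashMap<>();
            NodeList cells = rowNodes.item(i).getChildNodes();
            for (int j = 0; j < cells.getLength(); j++) {
                Node cell = cells.item(j);
                if (cell.getNodeType() != Node.ELEMENT_NODE) {
                    continue; // skip whitespace text nodes between cells
                }
                String tag = cell.getLocalName();
                row.put(headings.getOrDefault(tag, tag), cell.getTextContent());
            }
            rows.add(row);
        }
        return rows;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(parse(SAMPLE));
    }
}
```

The trade-off is memory: the whole document is held as a tree, which is fine for small result sets but is exactly what SAX avoids for large ones.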

Licensed under: CC-BY-SA with attribution