Question

Edit: This code is fine. The real problem was a logic bug elsewhere that doesn't appear in my pseudocode; I had been blaming it on my lack of Java experience.

In the pseudocode below, I'm trying to parse the XML shown. It's a silly example, maybe, but my real code was too large and too specific for anyone to get much value from seeing it or from the answers. This version is more entertaining, and hopefully others can learn from the answers as well as me.

I'm new to Java but an experienced C++ programmer, which makes me suspect my problem lies in my understanding of the Java language.

Problem: When the parser finishes, my Vector is full of uninitialized Cows. I create the Vector of Cows with a default capacity (which shouldn't affect its size if it's anything like the C++ STL vector). When I print the contents of the Cow Vector after the parse, it has the right size, but none of the values appear to have been set.
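For comparison with C++: Java's Vector(int) constructor sets the initial capacity, much like std::vector::reserve(), and does not create any elements, so size() starts at 0 and only addElement() grows it. A minimal sketch (the class and method names are made up for illustration):

```java
import java.util.Vector;

public class VectorCapacityDemo {
    // Builds a Vector with room for `expected` elements, then adds `actual` of them.
    static Vector buildHerd(int expected, int actual) {
        Vector herd = new Vector(expected); // initial capacity only; size() is still 0
        for (int i = 0; i < actual; i++) {
            herd.addElement("cow" + i);     // each addElement grows size() by one
        }
        return herd;
    }

    public static void main(String[] args) {
        Vector empty = buildHerd(3, 0);
        System.out.println(empty.size() + " / " + empty.capacity()); // prints "0 / 3"
        Vector two = buildHerd(3, 2);
        System.out.println(two.size() + " / " + two.capacity());     // prints "2 / 3"
    }
}
```

So pre-sizing the Vector is harmless here: it neither creates Cows nor changes what size() reports.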

Info: I have done this successfully with other parsers that don't have Vector fields, but in this case I'd like to use a Vector to accumulate Cow properties.

MoreInfo: I can't use generics (Vector< Cow >) so please don't point me there. :)
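Without generics, a Vector stores plain Object references, so every element has to be cast back to Cow when it is read. A sketch in pre-generics style, assuming a hypothetical Cow with color and age fields:

```java
import java.util.Enumeration;
import java.util.Vector;

public class RawVectorDemo {
    // Hypothetical stand-in for the question's Cow class.
    static class Cow {
        String color;
        int age;
        Cow(String color, int age) { this.color = color; this.age = age; }
    }

    // Without generics, nextElement() returns Object, so each element
    // must be cast back to Cow before its fields can be read.
    static String describe(Vector cows) {
        StringBuffer out = new StringBuffer();
        for (Enumeration e = cows.elements(); e.hasMoreElements(); ) {
            Cow c = (Cow) e.nextElement();
            if (out.length() > 0) out.append(", ");
            out.append(c.color).append('/').append(c.age);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Vector cows = new Vector(2);
        cows.addElement(new Cow("black", 1));
        cows.addElement(new Cow("brown", 2));
        System.out.println(describe(cows)); // prints "black/1, brown/2"
    }
}
```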

Thanks in advance.

<pluralcow>
        <cow>
            <color>black</color>
            <age>1</age>
        </cow>
        <cow>
            <color>brown</color>
            <age>2</age>
        </cow>
        <cow>
            <color>blue</color>
            <age>3</age>
        </cow>
</pluralcow>

public class Handler extends DefaultHandler{
    // vector to store all the cow knowledge
    private Vector  m_CowVec;

    // temp variable to store cow knowledge until
    // we're ready to add it to the vector
    private Cow     m_WorkingCow;

    // flags to indicate when to look at char data
    private boolean m_bColor;
    private boolean m_bAge;

    public void startElement(...tag...)
    {
        if(tag == pluralcow){   // rule: there is only 1 pluralcow tag in the doc
            // I happen to magically know how many cows there are here.
            m_CowVec = new Vector(numcows);
        }else if(tag == cow ){  // rule: multiple cow tags exist
            m_WorkingCow = new Cow();
        }else if(tag == color){ // rule: single color within cow
            m_bColor = true;
        }else if(tag == age){   // rule: single age within cow
            m_bAge = true;
        }
    }

    public void characters(...chars...)
    {
        if(m_bColor){
            m_WorkingCow.setColor(chars);   
        }else if(m_bAge){
            m_WorkingCow.setAge(chars);
        }
    }

    public void endElement(...tag...)
    {
        if(tag == pluralcow){
            // that's all the cows
        }else if(tag == cow ){
            m_CowVec.addElement(m_WorkingCow);      
        }else if(tag == color){
            m_bColor = false;
        }else if(tag == age){
            m_bAge = false;
        }
    }
}
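For reference, here is one way the pseudocode above might look against the real SAX API, still using a raw Vector. This is a sketch, not the poster's actual code: the Cow class and its fields are hypothetical, tags are compared with equals() on qName, and character data is collected in a StringBuffer and only read in endElement().

```java
import java.io.ByteArrayInputStream;
import java.util.Vector;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class CowHandler extends DefaultHandler {
    // Hypothetical stand-in for the question's Cow class.
    static class Cow {
        String color;
        int age;
    }

    private Vector cows;                                   // raw Vector, no generics
    private Cow workingCow;                                // cow currently being filled in
    private final StringBuffer text = new StringBuffer();  // text of the current element

    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (qName.equals("pluralcow")) {
            cows = new Vector();                           // grows as cows are added
        } else if (qName.equals("cow")) {
            workingCow = new Cow();
        }
        text.setLength(0);                                 // start collecting fresh text
    }

    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);                    // text may arrive in several chunks
    }

    public void endElement(String uri, String localName, String qName) {
        String value = text.toString().trim();
        if (qName.equals("color")) {
            workingCow.color = value;
        } else if (qName.equals("age")) {
            workingCow.age = Integer.parseInt(value);
        } else if (qName.equals("cow")) {
            cows.addElement(workingCow);
        }
    }

    // Parses an XML string and returns the Vector of Cow objects.
    static Vector parse(String xml) {
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            CowHandler handler = new CowHandler();
            parser.parse(new ByteArrayInputStream(xml.getBytes("UTF-8")), handler);
            return handler.cows;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        Vector cows = parse("<pluralcow><cow><color>black</color><age>1</age></cow>"
                + "<cow><color>brown</color><age>2</age></cow></pluralcow>");
        for (int i = 0; i < cows.size(); i++) {
            Cow c = (Cow) cows.elementAt(i);
            System.out.println(c.color + " " + c.age);
        }
    }
}
```

Because the text is consumed in endElement(), this version also tolerates a parser that splits an element's text across several characters() calls, and new Vector() without a size argument avoids needing to know the cow count in advance.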

Solution

The code looks fine to me. I'd set breakpoints at the start of each method and step through it in the debugger, or add some print statements. My gut tells me that either characters() is never being called or setColor() and setAge() don't work correctly, but that's just a guess.
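One way to follow that advice without a debugger is a throwaway handler that prints every callback, which makes it obvious whether characters() fires for each <color> and <age>. A sketch (TracingHandler and trace() are hypothetical names):

```java
import java.io.StringReader;
import java.util.Vector;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class TracingHandler extends DefaultHandler {
    // Every callback is printed and also kept so it can be inspected afterwards.
    final Vector events = new Vector();

    private void log(String event) {
        System.out.println(event);
        events.addElement(event);
    }

    public void startElement(String uri, String localName, String qName, Attributes atts) {
        log("start " + qName);
    }

    public void characters(char[] ch, int start, int length) {
        String chunk = new String(ch, start, length);
        if (chunk.trim().length() > 0) {   // skip whitespace-only indentation
            log("chars '" + chunk + "'");
        }
    }

    public void endElement(String uri, String localName, String qName) {
        log("end " + qName);
    }

    // Parses the XML with a fresh TracingHandler and returns the recorded events.
    static Vector trace(String xml) {
        TracingHandler handler = new TracingHandler();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        return handler.events;
    }

    public static void main(String[] args) {
        trace("<cow><color>black</color><age>1</age></cow>");
    }
}
```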

OTHER TIPS

When you say that the Cows are uninitialized, are the String properties initialized to null? Or empty Strings?

I know you mentioned that this is pseudo-code, but I just wanted to point out a few potential problems:

public void startElement(...tag...)
{
    if(tag == pluralcow){   // rule: there is only 1 pluralcow tag in the doc
        // I happen to magically know how many cows there are here.
        m_CowVec = new Vector(numcows);
    }else if(tag == cow ){  // rule: multiple cow tags exist
        m_WorkingCow = new Cow();
    }else if(tag == color){ // rule: single color within cow
        m_bColor = true;
    }else if(tag == age){   // rule: single age within cow
        m_bAge = true;
    }
}

You really should be using tag.equals(...) instead of tag == ... here: in Java, == on objects compares references, not string contents.
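A quick illustration of the difference. Some parsers intern element names, so == can appear to work with one parser and fail with another; equals() does not depend on that implementation detail. The class below is a hypothetical demo:

```java
public class StringCompareDemo {
    // == on objects compares references; equals() compares character content.
    static boolean sameReference(String a, String b) { return a == b; }
    static boolean sameContent(String a, String b) { return a.equals(b); }

    public static void main(String[] args) {
        String literal = "cow";
        // A string built at runtime is a distinct object unless it is interned.
        String built = new String(new char[] {'c', 'o', 'w'});
        System.out.println(sameReference(literal, built)); // prints "false"
        System.out.println(sameContent(literal, built));   // prints "true"
    }
}
```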

public void characters(...chars...)
{
    if(m_bColor){
        m_WorkingCow.setColor(chars);
    }else if(m_bAge){
        m_WorkingCow.setAge(chars);
    }
}

I'm assuming you're aware of this, but this method is actually called with a character array plus a start index and a length, not with a ready-made string.
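In other words, the parser hands over a window into its own buffer, and only ch[start] through ch[start + length - 1] belong to the current call. A small sketch using a simulated buffer (the helper name slice is made up):

```java
public class CharactersSliceDemo {
    // characters(ch, start, length) hands over a window into a buffer:
    // only ch[start] .. ch[start + length - 1] belong to this call.
    static String slice(char[] ch, int start, int length) {
        return new String(ch, start, length);
    }

    public static void main(String[] args) {
        // Simulated buffer; "black" starts at index 7 and is 5 characters long.
        char[] buffer = "<color>black</color>".toCharArray();
        System.out.println(slice(buffer, 7, 5)); // prints "black"
    }
}
```

Copying the range into a String (or appending it to a StringBuffer) is safer than holding on to the array, since the parser may reuse that buffer for later calls.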

Note also that characters(...) can be called multiple times for a single text block, returning small chunks in each call: http://java.sun.com/j2se/1.4.2/docs/api/org/xml/sax/ContentHandler.html#characters(char[],%20int,%20int)

"...SAX parsers may return all contiguous character data in a single chunk, or they may split it into several chunks..."

I doubt you'll run into that problem in the simple example you provided, but you also mentioned that this is a simplified version of a more complex problem. If in your original problem, your XML consists of large text blocks, this is something to consider.
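The usual defensive pattern is to append every chunk to a buffer and read the text only in endElement(), instead of calling setColor() directly from characters(), where a split would leave only the last chunk. A sketch with a simulated two-chunk delivery (ChunkSafeHandler and lastColor are hypothetical):

```java
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class ChunkSafeHandler extends DefaultHandler {
    private final StringBuffer text = new StringBuffer();
    String lastColor; // hypothetical field: text of the most recent <color> element

    public void startElement(String uri, String localName, String qName, Attributes atts) {
        text.setLength(0);                 // new element: discard earlier text
    }

    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);    // append each chunk, never overwrite
    }

    public void endElement(String uri, String localName, String qName) {
        if ("color".equals(qName)) {
            lastColor = text.toString();   // the complete text is only known here
        }
    }

    public static void main(String[] args) {
        ChunkSafeHandler h = new ChunkSafeHandler();
        // Simulate a parser that delivers "black" in two chunks.
        char[] buf = "black".toCharArray();
        h.startElement("", "", "color", null);
        h.characters(buf, 0, 2);           // "bl"
        h.characters(buf, 2, 3);           // "ack"
        h.endElement("", "", "color");
        System.out.println(h.lastColor);   // prints "black"
    }
}
```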

Finally, as others have mentioned, if you can, it's a good idea to consider an XML marshalling library (e.g., JAXB, Castor, JiBX, XMLBeans, or XStream, to name a few).

I have to say that I'm not a big fan of this design. However, are you sure that your characters() method is ever called? (A few System.out.println calls would help.) If it's never called, you would end up with an uninitialized cow.

Also, I would not try to implement an XML parser myself like this, since you need to be more robust against validation issues.

You can use SAX or DOM4J, or, even better, Apache Commons Digester.

Also, if I have a schema, I use JAXB or another code generator to speed up development of XML interface code. The code generators hide a lot of the complexity of working directly with SAX or DOM4J.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow