Question

I need to parse a large XML file (probably using StAX in Java) and write its contents to a delimited text file, and I have a couple of design questions. First, here is an example of the XML:

    <demographic>
        <value>001</value>
        <question>Name?</question>
        <value>Bob</value>
        <question>Last Name?</question>
        <value>Smith</value>
        <followUpQuestions>
            <question>Middle Init.</question>
            <value>J</value>
        </followUpQuestions>
    </demographic>

This would need to be written to the delimited output file as:

    001~Bob~Smith~J

So here are my questions:

  1. How can I distinguish between all the different "value" tags, since the tag names are not unique? Currently I try to resolve this with 'state' variables that turn on once the parser passes question text such as "Name?". However, this approach doesn't really work for the first value, since I have to check that the 'name' and 'lastName' states are off to be sure I'm getting the first value.

  2. Every time the client changes the text of the questions (which happens), I have to change the code and recompile it. Is there any way to avoid this? Maybe save the question text in a text file that the program reads in?

  3. Can this be scalable? I need to extract over 100 values, and the XML files are usually about 2 GB in size.

Thank you, in advance, for your help (from a Java and XML newbie)!!

UPDATE: Here is my attempt to code the solution. Can someone please help me streamline it? There has to be a less messy way to do this:

    import javax.xml.stream.XMLInputFactory;
    import javax.xml.stream.XMLStreamConstants;
    import javax.xml.stream.XMLStreamException;
    import javax.xml.stream.XMLStreamReader;
    import java.io.*;

    class TestJavaForStackOverflow {

        boolean nameState = false,
                lastNameState = false,
                middleInitState = false;

        String name = "",
               lastName = "",
               middleInit = "",
               value = "";

        public void parse() throws IOException, XMLStreamException {
            XMLInputFactory factory = XMLInputFactory.newInstance();
            XMLStreamReader streamReader = factory.createXMLStreamReader(
                    new FileReader("/n04/data/revmgmt/anthony/scripts/Java_Programs/TestJavaForStackOverflow.xml"));

            while (streamReader.hasNext()) {
                streamReader.next();

                if (streamReader.getEventType() == XMLStreamReader.START_ELEMENT) {
                    if ("demographic".equals(streamReader.getLocalName())) {
                        parseDemographicInformation(streamReader);
                    }
                }
            }
            System.out.println(value + "~" + name + "~" + lastName + "~" + middleInit);
        }

        public void parseDemographicInformation(XMLStreamReader streamReader) throws XMLStreamException {
            while (streamReader.hasNext()) {
                streamReader.next();

                if (streamReader.getEventType() == XMLStreamReader.END_ELEMENT) {
                    if ("demographic".equals(streamReader.getLocalName())) {
                        return;
                    }
                } else if (streamReader.getEventType() == XMLStreamReader.START_ELEMENT) {
                    if ("question".equals(streamReader.getLocalName())) {
                        streamReader.next();
                        if ("Name?".equals(streamReader.getText())) {
                            nameState = true;
                        } else if ("Last Name?".equals(streamReader.getText())) {
                            lastNameState = true;
                        } else if ("Middle Init.".equals(streamReader.getText())) {
                            middleInitState = true;
                        }
                    } else if ("value".equals(streamReader.getLocalName())) {
                        streamReader.next();
                        if (nameState) {
                            name = streamReader.getText();
                            nameState = false;
                        } else if (lastNameState) {
                            lastName = streamReader.getText();
                            lastNameState = false;
                        } else if (middleInitState) {
                            middleInit = streamReader.getText();
                            middleInitState = false;
                        } else {
                            value = streamReader.getText();
                        }
                    }
                }
            }
        }

        public static void main(String[] args) {
            TestJavaForStackOverflow t = new TestJavaForStackOverflow();
            try {
                t.parse();
            } catch (IOException e1) {
            } catch (XMLStreamException e2) {
            }
        }
    }

Solution

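I think the flags are not very scalable if you have a lot of different questions to parse, and neither are the fields that hold the results. With 100 questions you would need 100 variables, and keeping them up to date as the questions change would be a chore. I would use one map to hold the result and another to hold the correspondence between each question text and the field you are trying to capture. A sketch in Java with StAX (it assumes the reader has just read the opening demographic tag and that values for unmapped questions should be skipped):

    // Requires imports: java.util.HashMap, java.util.Map, javax.xml.stream.*
    public Map<String, String> parseDemographicInformation(XMLStreamReader xml, Map<String, String> questionMap)
            throws XMLStreamException {
        Map<String, String> record = new HashMap<>();
        String field = "id";  // the first <value> has no preceding <question>
        while (xml.hasNext()) {
            int event = xml.next();
            if (event == XMLStreamConstants.END_ELEMENT && "demographic".equals(xml.getLocalName())) {
                break;  // end of this record
            }
            if (event != XMLStreamConstants.START_ELEMENT) {
                continue;
            }
            if ("question".equals(xml.getLocalName())) {
                field = questionMap.get(xml.getElementText().trim());  // null if the question is not mapped
            } else if ("value".equals(xml.getLocalName()) && field != null) {
                record.put(field, xml.getElementText().trim());
            }
        }
        return record;
    }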
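
For the second question (avoiding a recompile when the question text changes), the question map can be loaded from a plain text file at startup. Below is a minimal sketch, assuming a made-up file format of one "question text=fieldName" pair per line. The class name QuestionMapLoader and the format itself are illustrative and not part of the original code. java.util.Properties would also work, but it treats an unescaped space in a key as the end of the key, so question text such as "Last Name?" would need escaping there.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: the class name and the file format are assumptions.
class QuestionMapLoader {

    // Parses lines of the form "question text=fieldName".
    // Blank lines and lines starting with '#' are ignored.
    static Map<String, String> load(Reader source) throws IOException {
        Map<String, String> map = new LinkedHashMap<>();
        BufferedReader reader = new BufferedReader(source);
        String line;
        while ((line = reader.readLine()) != null) {
            line = line.trim();
            if (line.isEmpty() || line.startsWith("#")) {
                continue;
            }
            // Split on the LAST '=' so question text may itself contain '='.
            int sep = line.lastIndexOf('=');
            if (sep <= 0 || sep == line.length() - 1) {
                throw new IOException("Expected 'question text=fieldName' but got: " + line);
            }
            map.put(line.substring(0, sep).trim(), line.substring(sep + 1).trim());
        }
        return map;
    }

    public static void main(String[] args) throws IOException {
        // In-memory stand-in for the configuration file.
        String config = "# question text=field name\n"
                + "Name?=firstName\n"
                + "Last Name?=lastName\n"
                + "Middle Init.=middleInit\n";
        Map<String, String> questionMap = load(new StringReader(config));
        System.out.println(questionMap);
    }
}
```

With a real file, the reader could come from Files.newBufferedReader(Paths.get("questions.txt")), where questions.txt is a placeholder name. When the client rewords a question, only that file changes.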

Then you can output the result with something like this:

    // Field names are illustrative. Ideally read this list from a file too,
    // so it can be changed without recompiling.
    String[] fieldsToOutput = { "id", "firstName", "lastName", "middleInit" };

    // ...

    for (int i = 0; i < fieldsToOutput.length; i++) {
        if (i > 0) {
            System.out.print("~");
        }
        // Print an empty column rather than "null" when a field is missing.
        System.out.print(record.getOrDefault(fieldsToOutput[i], ""));
    }
    System.out.println();
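
On the scalability question: StAX is a pull parser, so only the current record needs to be held in memory. A 2 GB file should be workable as long as each record is written out as soon as its closing tag is seen. The following is a rough end-to-end sketch that combines the map-driven parsing and the output loop. Several things in it are assumptions made for illustration: the class name DemographicExport, the records wrapper element, and the field names. The question map is hard-coded only to keep the example short.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical class name; a sketch under the assumptions stated above.
class DemographicExport {

    // Reads one record; the reader must be positioned on the <demographic> start tag.
    static Map<String, String> parseRecord(XMLStreamReader xml, Map<String, String> questionMap)
            throws XMLStreamException {
        Map<String, String> record = new HashMap<>();
        String field = "id"; // assumption: the leading <value> with no <question> is the id
        while (xml.hasNext()) {
            int event = xml.next();
            if (event == XMLStreamConstants.END_ELEMENT && "demographic".equals(xml.getLocalName())) {
                break;
            }
            if (event != XMLStreamConstants.START_ELEMENT) {
                continue; // skip whitespace, end tags of inner elements, comments
            }
            String tag = xml.getLocalName();
            if ("question".equals(tag)) {
                field = questionMap.get(xml.getElementText().trim()); // null for unmapped questions
            } else if ("value".equals(tag)) {
                String text = xml.getElementText().trim();
                if (field != null) {
                    record.put(field, text);
                }
            }
        }
        return record;
    }

    // Streams the document and writes one '~'-delimited line per <demographic>.
    // Returns the number of records written.
    static int export(XMLStreamReader xml, Map<String, String> questionMap,
                      List<String> fields, Writer out) throws XMLStreamException, IOException {
        int count = 0;
        while (xml.hasNext()) {
            if (xml.next() == XMLStreamConstants.START_ELEMENT
                    && "demographic".equals(xml.getLocalName())) {
                Map<String, String> record = parseRecord(xml, questionMap);
                StringJoiner line = new StringJoiner("~");
                for (String f : fields) {
                    line.add(record.getOrDefault(f, "")); // missing fields become empty columns
                }
                out.write(line.toString());
                out.write("\n");
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws Exception {
        // Two records under a made-up <records> root element.
        String xmlText = "<records>"
                + "<demographic><value>001</value>"
                + "<question>Name?</question><value>Bob</value>"
                + "<question>Last Name?</question><value>Smith</value>"
                + "<followUpQuestions><question>Middle Init.</question><value>J</value></followUpQuestions>"
                + "</demographic>"
                + "<demographic><value>002</value>"
                + "<question>Name?</question><value>Ann</value>"
                + "</demographic>"
                + "</records>";

        Map<String, String> questionMap = new HashMap<>();
        questionMap.put("Name?", "firstName");
        questionMap.put("Last Name?", "lastName");
        questionMap.put("Middle Init.", "middleInit");

        XMLStreamReader xml = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xmlText));
        StringWriter out = new StringWriter();
        export(xml, questionMap, Arrays.asList("id", "firstName", "lastName", "middleInit"), out);
        System.out.print(out);
    }
}
```

For a real file, the StringWriter would be replaced by a BufferedWriter on the output file. It is probably also better to hand the factory a buffered InputStream rather than a FileReader, so that the parser can pick up the character encoding from the XML declaration.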
Licensed under: CC-BY-SA with attribution