Parse a UTF-8 XML file with XmlSlurper

https://stackoverflow.com/questions/7807333

26-10-2019

Question

I'm trying to parse a Google Atom entry with XmlSlurper. My use case is as follows:

1) Send an Atom XML document to the server with a REST client.

2) Handle the request and parse the XML on the server side.

I develop my server in Groovy and use XmlSlurper as the parser, but parsing fails with a "Content is not allowed in prolog" exception. To find out why, I saved my Atom XML to a file encoded as UTF-8, then read the file and parsed it, and got the same exception. When I saved the same XML to a file encoded as ANSI instead, it parsed successfully. So I think the problem has to do with XmlSlurper and UTF-8.

Is there a known limitation here? My Atom XML has to be UTF-8, so how can I parse it? Thanks for your help.

XML:

<?xml version="1.0" encoding="UTF-8"?>
<entry xmlns:atom='http://www.w3.org/2005/Atom'
    xmlns:gd='http://schemas.google.com/g/2005'>
  <category scheme='http://schemas.google.com/g/2005#kind'
    term='http://schemas.google.com/contact/2008#contact' />
  <title type='text'>Elizabeth Bennet</title>
  <content type='text'>Notes</content>
  <gd:email rel='http://schemas.google.com/g/2005#work'
    address='liz@gmail.com' />
  <gd:email rel='http://schemas.google.com/g/2005#home'
    address='liz@example.org' />
  <gd:phoneNumber rel='http://schemas.google.com/g/2005#work'
    primary='true'>
    (206)555-1212
  </gd:phoneNumber>
  <gd:phoneNumber rel='http://schemas.google.com/g/2005#home'>
    (206)555-1213
  </gd:phoneNumber>
  <gd:im address='liz@gmail.com'
    protocol='http://schemas.google.com/g/2005#GOOGLE_TALK'
    rel='http://schemas.google.com/g/2005#home' />
  <gd:postalAddress rel='http://schemas.google.com/g/2005#work'
    primary='true'>
    1600 Amphitheatre Pkwy Mountain View
  </gd:postalAddress>
</entry>

Reading the file and parsing it:

String file = "C:\\Documents and Settings\\user\\Desktop\\create.xml";
String line = "";
StringBuilder sb = new StringBuilder();
BufferedReader br = new BufferedReader(new InputStreamReader(new FileInputStream(file)));
while ((line = br.readLine()) != null) {
    sb.append(line);
}
System.out.println("sb.toString() = " + sb.toString());

def xmlf = new XmlSlurper().parseText(sb.toString())
    .declareNamespace(gContact:'http://schemas.google.com/contact/2008',
        gd:'http://schemas.google.com/g/2005')

println xmlf.title

Solution

Try:

String file = "C:\\Documents and Settings\\user\\Desktop\\create.xml"

def xmlf = new XmlSlurper().parse( new File( file ) ).declareNamespace( 
        gContact:'http://schemas.google.com/contact/2008',
        gd:'http://schemas.google.com/g/2005' )
println xmlf.title

You're going the long way round: XmlSlurper can read the file itself, so there's no need to build the string by hand.
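XmlSlurper is Groovy, so as a rough illustration of the same idea in plain Java, here is a sketch that hands a `File` straight to the JDK's DOM parser (`DocumentBuilder.parse(File)`) instead of XmlSlurper. The sample file is an assumption: a trimmed copy of the entry written as UTF-8 with a byte order mark (BOM), on the guess that the asker's editor added one (the question doesn't say). `writeSample`, `title` and `firstEmail` are hypothetical helpers; turning on namespace awareness plays roughly the role `declareNamespace` plays for the `gd:` prefix.

```java
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class ParseAtomFile {
    // Namespace URI bound to the gd: prefix in the Atom entry.
    static final String GD = "http://schemas.google.com/g/2005";

    // Hands the File straight to the parser, which reads the raw bytes and
    // works out the encoding itself (BOM and XML declaration).
    static Document parse(File file) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // needed for getElementsByTagNameNS below
            return factory.newDocumentBuilder().parse(file);
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse " + file, e);
        }
    }

    static String title(Document doc) {
        return doc.getElementsByTagName("title").item(0).getTextContent();
    }

    static String firstEmail(Document doc) {
        Element email = (Element) doc.getElementsByTagNameNS(GD, "email").item(0);
        return email.getAttribute("address");
    }

    // Hypothetical stand-in for create.xml: a trimmed entry saved as UTF-8 with a BOM.
    static File writeSample() {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
                + "<entry xmlns:gd='" + GD + "'>\n"
                + "  <title type='text'>Elizabeth Bennet</title>\n"
                + "  <gd:email rel='" + GD + "#work' address='liz@gmail.com' />\n"
                + "</entry>\n";
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            bytes.write(new byte[] {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF}); // UTF-8 BOM
            bytes.write(xml.getBytes(StandardCharsets.UTF_8));
            File file = File.createTempFile("create", ".xml");
            file.deleteOnExit();
            Files.write(file.toPath(), bytes.toByteArray());
            return file;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse(writeSample());
        System.out.println(title(doc));      // Elizabeth Bennet
        System.out.println(firstEmail(doc)); // liz@gmail.com
    }
}
```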

OTHER TIPS

This is the problem:

BufferedReader br = new BufferedReader(
    new InputStreamReader(new FileInputStream(file)));
while ((line = br.readLine()) !=null) {
    sb.append(line);
}

That reads the file with the platform default encoding. If that isn't the encoding the file was actually saved in, you'll decode the data incorrectly.
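A minimal sketch of what decoding the data incorrectly can look like, assuming the file was saved as UTF-8 with a BOM (the question doesn't say so, but some Windows editors add one) and that the platform default is a Western "ANSI" code page. ISO-8859-1 stands in for windows-1252 here; the two agree on every byte this example uses. `utf8WithBom` is a hypothetical helper.

```java
import java.nio.charset.StandardCharsets;

public class WrongCharsetDemo {
    // Hypothetical helper: the bytes a UTF-8 file would contain if the editor
    // wrote a byte order mark (EF BB BF) in front of the text.
    static byte[] utf8WithBom(String text) {
        byte[] body = text.getBytes(StandardCharsets.UTF_8);
        byte[] out = new byte[body.length + 3];
        out[0] = (byte) 0xEF;
        out[1] = (byte) 0xBB;
        out[2] = (byte) 0xBF;
        System.arraycopy(body, 0, out, 3, body.length);
        return out;
    }

    public static void main(String[] args) {
        byte[] raw = utf8WithBom("<?xml version=\"1.0\"?><title>Caf\u00E9</title>");

        // Decoded as Latin-1 (standing in for windows-1252), the BOM turns into three
        // ordinary characters in front of "<?xml", which an XML parser rejects.
        String asAnsi = new String(raw, StandardCharsets.ISO_8859_1);
        System.out.println(asAnsi.startsWith("\u00EF\u00BB\u00BF<?xml")); // true
        // The two UTF-8 bytes of e-acute (C3 A9) also become two separate characters.
        System.out.println(asAnsi.contains("Caf\u00C3\u00A9"));           // true

        // Decoding as UTF-8 fixes the accent, but Java keeps the BOM as U+FEFF.
        String asUtf8 = new String(raw, StandardCharsets.UTF_8);
        System.out.println(asUtf8.charAt(0) == '\uFEFF');                   // true
        System.out.println(asUtf8.contains("Caf\u00E9"));                   // true
    }
}
```

The XML in the question is pure ASCII, so a UTF-8 save without a BOM would be byte-for-byte identical to the ANSI save; if the two files behaved differently, a BOM is the most likely difference between them.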

What you should do is let the XML parser handle it for you. It should be able to detect the encoding itself from the start of the data (a byte order mark, if present, and the XML declaration).

I'm not familiar with XmlSlurper, but I'd expect it to be able either to parse an input stream (in which case just give it the FileInputStream) or to handle the file name itself.
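To illustrate the difference in plain Java (with the JDK's DOM parser standing in for XmlSlurper), the sketch below feeds the same BOM-prefixed UTF-8 bytes to the parser twice: once as an `InputStream`, so the parser picks the encoding itself, and once as text already decoded with the wrong charset, roughly what the question's `readLine` loop does. Latin-1 stands in for the platform default; the class and method names are hypothetical, and the exact error message may vary by parser.

```java
import java.io.ByteArrayInputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

public class StreamVsText {
    // Hypothetical sample: an ASCII-only entry saved as UTF-8 with a BOM (EF BB BF).
    static byte[] sampleBytes() {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                + "<entry><title>Elizabeth Bennet</title></entry>";
        byte[] body = xml.getBytes(StandardCharsets.UTF_8);
        byte[] out = new byte[body.length + 3];
        out[0] = (byte) 0xEF;
        out[1] = (byte) 0xBB;
        out[2] = (byte) 0xBF;
        System.arraycopy(body, 0, out, 3, body.length);
        return out;
    }

    static DocumentBuilder newBuilder() throws ParserConfigurationException {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // DefaultHandler rethrows fatal errors instead of also printing them to stderr.
        builder.setErrorHandler(new DefaultHandler());
        return builder;
    }

    // Bytes in: the parser reads the BOM and XML declaration and decodes UTF-8 itself.
    static String titleFromStream(byte[] raw) {
        try {
            return newBuilder().parse(new ByteArrayInputStream(raw))
                    .getElementsByTagName("title").item(0).getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Text in: the bytes were already decoded with the wrong charset, so the parser
    // never gets to choose. Returns the parser's error message, or null if it parsed.
    static String errorFromMisdecodedText(byte[] raw) {
        String text = new String(raw, StandardCharsets.ISO_8859_1);
        try {
            newBuilder().parse(new InputSource(new StringReader(text)));
            return null;
        } catch (SAXParseException e) {
            return e.getMessage();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] raw = sampleBytes();
        System.out.println(titleFromStream(raw));         // Elizabeth Bennet
        System.out.println(errorFromMisdecodedText(raw)); // the parser's complaint
    }
}
```

In Groovy terms this is roughly the difference between `new XmlSlurper().parse(new File(file))` (or `parse` on an `InputStream`) and calling `parseText` on a string that was decoded beforehand.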

Licensed under: CC-BY-SA with attribution
