Question

I'm working with some XML files as part of a team. Since some people have different indentation settings, the formatting sometimes gets screwed up, and it's convenient to have an automated tool re-pretty-print the file. Is there a way to pretty-print XML, without deleting all of the newlines in empty lines? These are human-readable/edited XML files I'm working with (Ant scripts, configuration files, a proprietary XHTML-like thing, etc.). The newlines in these files are to break up the text/code flow into blocks, and are really important for making the file easily readable.

I'm using EditPad Pro as my text editor (it can call external tools just fine) and HTML Tidy as my XML formatter, but I don't like that Tidy deletes newlines. What tool can I use that will correctly format/pretty-print XML without deleting newlines?

Example annoying XML:

<thing>
  <frob>
    </frob>

  <!-- Done frobbing; now for BAZ. -->
        <baz />
</thing>

Preferred output:

<thing>
  <frob>
  </frob>

  <!-- Done frobbing; now for BAZ. -->
  <baz />
</thing>

Solution

You can use an XML parser to parse it and dump it again. Here's the code in Python:

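from xml.parsers.expat import ParserCreate
from xml.sax.saxutils import escape, quoteattr

class process:
    """Expat handlers that re-emit each element with two-space indentation."""
    def __init__(self):
        self.level = 0  # current nesting depth
    def start_element(self, name, attrs):
        attr = ''
        for i, j in attrs.items():
            # quoteattr() adds the surrounding quotes and escapes &, < and >
            attr += ' {0}={1}'.format(i, quoteattr(j))
        print('{0}<{1}{2}>'.format('  '*self.level, name, attr))
        self.level += 1
    def end_element(self, name):
        self.level -= 1
        print('{0}</{1}>'.format('  '*self.level, name))
    def char_data(self, data):
        # whitespace-only text (including blank lines) is discarded here
        data = data.strip()
        if data:
            print('  '*self.level + escape(data))

if __name__ == '__main__':
    import sys
    for f in sys.argv[1:]:
        p = ParserCreate()
        p.buffer_text = True  # deliver each run of text in a single callback
        q = process()
        p.StartElementHandler = q.start_element
        p.EndElementHandler = q.end_element
        p.CharacterDataHandler = q.char_data
        # ParseFile() needs a file opened in binary mode on Python 3
        with open(f, 'rb') as xml_file:
            p.ParseFile(xml_file)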

Save it as xml_prettifier.py and run python xml_prettifier.py <file>.xml. The reformatted XML is printed to standard output; the input file itself is left untouched.
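
The script above only registers handlers for elements and text, so comments and blank lines don't make it into the output. Here is a rough Python 3 sketch of one way to keep them with the same expat parser. The names BlankLinePreservingPrinter and pretty_print are made up for this example, and the rule "two or more newlines in a whitespace-only run become one blank line" is an assumption about what you'd want, not something expat does for you:

```python
from xml.parsers.expat import ParserCreate
from xml.sax.saxutils import escape, quoteattr


class BlankLinePreservingPrinter:
    """Re-indent XML while keeping comments and blank-line separators."""

    def __init__(self, indent="  "):
        self.indent = indent
        self.level = 0
        self.lines = []
        self.just_opened = False  # True right after a start tag, before anything else

    def _emit(self, text):
        self.lines.append(self.indent * self.level + text)

    def start_element(self, name, attrs):
        attr_text = "".join(
            " {0}={1}".format(key, quoteattr(value)) for key, value in attrs.items()
        )
        self._emit("<{0}{1}>".format(name, attr_text))
        self.level += 1
        self.just_opened = True

    def end_element(self, name):
        self.level -= 1
        if self.just_opened:
            # Nothing at all between start and end tag: collapse to <name />.
            self.lines[-1] = self.lines[-1][:-1] + " />"
        else:
            self._emit("</{0}>".format(name))
        self.just_opened = False

    def comment(self, data):
        self.just_opened = False
        self._emit("<!--{0}-->".format(data))

    def char_data(self, data):
        self.just_opened = False
        stripped = data.strip()
        if stripped:
            self._emit(escape(stripped))
        elif data.count("\n") >= 2:
            # Whitespace-only run containing an empty line: keep one blank line.
            self.lines.append("")


def pretty_print(xml_text):
    """Return xml_text re-indented, with comments and blank lines kept."""
    printer = BlankLinePreservingPrinter()
    parser = ParserCreate()
    parser.buffer_text = True  # hand over each run of text in one piece
    parser.StartElementHandler = printer.start_element
    parser.EndElementHandler = printer.end_element
    parser.CommentHandler = printer.comment
    parser.CharacterDataHandler = printer.char_data
    parser.Parse(xml_text, True)
    return "\n".join(printer.lines) + "\n"
```

To hook it up as an external tool you could read the document from standard input and write pretty_print(sys.stdin.read()) to standard output. Keep in mind this is only a sketch: the XML declaration, DOCTYPE, and processing instructions are dropped, several consecutive blank lines collapse into one, blank lines directly next to text are lost, and mixed content (text interleaved with elements) gets split onto separate lines.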

OTHER TIPS

The Eclipse XML editor does this when you select all and reindent (Ctrl+A, Ctrl+I). It's a bit of overkill, since Eclipse is a complete IDE rather than a lightweight text editor, but if you're desperate, it's a solution.

Note that reindenting with the XML editor does other things too, such as splitting elements across multiple lines when they are longer than 80 characters by default. This can of course be tweaked or disabled.

Licensed under: CC-BY-SA with attribution