Question

I'm writing out XML files using the MSXML parser, with a wrapper I downloaded from here: http://www.codeproject.com/KB/XML/JW_CXml.aspx. Works great except that when I create a new document from code (so not load from file and modify), the result is all in one big line. I'd like elements to be indented nicely so that I can read it easily in a text editor.

Googling shows many people with the same question - asked around 2001 or so. Replies usually say 'apply an XSL transformation' or 'add your own whitespace nodes'. Especially the last one makes me go %( so I'm hoping that in 2008 there's an easier way to pretty MSXML output. So my question; is there, and how do I use it?

Was it helpful?

Solution

Try this, I found this years ago on the web.

#include <msxml2.h>

bool FormatDOMDocument (IXMLDOMDocument *pDoc, IStream *pStream)
{

    // Create the writer

    CComPtr <IMXWriter> pMXWriter;
    if (FAILED (pMXWriter.CoCreateInstance(__uuidof (MXXMLWriter), NULL, CLSCTX_ALL)))
    {
        return false;
    }
    CComPtr <ISAXContentHandler> pISAXContentHandler;
    if (FAILED (pMXWriter.QueryInterface(&pISAXContentHandler)))
    {
        return false;
    }
    CComPtr <ISAXErrorHandler> pISAXErrorHandler;
    if (FAILED (pMXWriter.QueryInterface (&pISAXErrorHandler)))
    {
        return false;
    }
    CComPtr <ISAXDTDHandler> pISAXDTDHandler;
    if (FAILED (pMXWriter.QueryInterface (&pISAXDTDHandler)))
    {
        return false;
    }

    if (FAILED (pMXWriter ->put_omitXMLDeclaration (VARIANT_FALSE)) ||
        FAILED (pMXWriter ->put_standalone (VARIANT_TRUE)) ||
        FAILED (pMXWriter ->put_indent (VARIANT_TRUE)) ||
        FAILED (pMXWriter ->put_encoding (L"UTF-8")))
    {
        return false;
    }

    // Create the SAX reader

    CComPtr <ISAXXMLReader> pSAXReader;
    if (FAILED (pSAXReader.CoCreateInstance (__uuidof (SAXXMLReader), NULL, CLSCTX_ALL)))
    {
        return false;
    }

    if (FAILED (pSAXReader ->putContentHandler (pISAXContentHandler)) ||
        FAILED (pSAXReader ->putDTDHandler (pISAXDTDHandler)) ||
        FAILED (pSAXReader ->putErrorHandler (pISAXErrorHandler)) ||
        FAILED (pSAXReader ->putProperty (
        L"http://xml.org/sax/properties/lexical-handler", CComVariant (pMXWriter))) ||
        FAILED (pSAXReader ->putProperty (
        L"http://xml.org/sax/properties/declaration-handler", CComVariant (pMXWriter))))
    {
        return false;
    }

    // Perform the write

    return 
       SUCCEEDED (pMXWriter ->put_output (CComVariant (pStream))) &&
       SUCCEEDED (pSAXReader ->parse (CComVariant (pDoc)));
}

OTHER TIPS

Here's a modified version of the accepted answer that will transform in-memory (changes only in the last few lines but I'm posting the whole block for the convenience of future readers):

bool CXml::FormatDOMDocument(IXMLDOMDocument *pDoc)
{
    // Create the writer
    CComPtr <IMXWriter> pMXWriter;
    if (FAILED (pMXWriter.CoCreateInstance(__uuidof (MXXMLWriter), NULL, CLSCTX_ALL))) {
        return false;
    }
    CComPtr <ISAXContentHandler> pISAXContentHandler;
    if (FAILED (pMXWriter.QueryInterface(&pISAXContentHandler))) {
        return false;
    }
    CComPtr <ISAXErrorHandler> pISAXErrorHandler;
    if (FAILED (pMXWriter.QueryInterface (&pISAXErrorHandler))) {
        return false;
    }
    CComPtr <ISAXDTDHandler> pISAXDTDHandler;
    if (FAILED (pMXWriter.QueryInterface (&pISAXDTDHandler))) {
        return false;
    }

    if (FAILED (pMXWriter->put_omitXMLDeclaration (VARIANT_FALSE)) ||
        FAILED (pMXWriter->put_standalone (VARIANT_TRUE)) ||
        FAILED (pMXWriter->put_indent (VARIANT_TRUE)) ||
        FAILED (pMXWriter->put_encoding (L"UTF-8")))
    {
        return false;
    }

    // Create the SAX reader
    CComPtr <ISAXXMLReader> pSAXReader;
    if (FAILED(pSAXReader.CoCreateInstance(__uuidof (SAXXMLReader), NULL, CLSCTX_ALL))) {
        return false;
    }

    if (FAILED(pSAXReader->putContentHandler (pISAXContentHandler)) ||
        FAILED(pSAXReader->putDTDHandler (pISAXDTDHandler)) ||
        FAILED(pSAXReader->putErrorHandler (pISAXErrorHandler)) ||
        FAILED(pSAXReader->putProperty (L"http://xml.org/sax/properties/lexical-handler", CComVariant (pMXWriter))) ||
        FAILED(pSAXReader->putProperty (L"http://xml.org/sax/properties/declaration-handler", CComVariant (pMXWriter))))
    {
        return false;
    }

    // Perform the write
    bool success1 = SUCCEEDED(pMXWriter->put_output(CComVariant(pDoc.GetInterfacePtr())));
    bool success2 = SUCCEEDED(pSAXReader->parse(CComVariant(pDoc.GetInterfacePtr())));

    return success1 && success2;
}

Even my 2 cents arrive 7 years later I think the question still deserves a simple answer wrapped in just a few lines of code, which is possible by using Visual C++'s #import directive and the native C++ COM support library (offering smart pointers and encapsulating error handling).

Note that like the accepted answer it doesn't try to fit into the CXml class the OP is using but rather shows the core idea. Also I assume msxml6.

Pretty-printing to any stream

void PrettyWriteXmlDocument(MSXML2::IXMLDOMDocument* xmlDoc, IStream* stream)
{
    MSXML2::IMXWriterPtr writer(__uuidof(MSXML2::MXXMLWriter60));
    writer->encoding = L"utf-8";
    writer->indent = _variant_t(true);
    writer->standalone = _variant_t(true);
    writer->output = stream;

    MSXML2::ISAXXMLReaderPtr saxReader(__uuidof(MSXML2::SAXXMLReader60));
    saxReader->putContentHandler(MSXML2::ISAXContentHandlerPtr(writer));
    saxReader->putProperty(PUSHORT(L"http://xml.org/sax/properties/lexical-handler"), writer.GetInterfacePtr());
    saxReader->parse(xmlDoc);
}

File stream

If you need a stream writing to a file you need an implementation of the IStream interface.
wtlext has got a class, which you can use or from which you can deduce how you can write your own.

Another simple solution that has worked well for me is utilising the Ado Stream class:

void PrettySaveXmlDocument(MSXML2::IXMLDOMDocument* xmlDoc, const wchar_t* filePath)
{
    ADODB::_StreamPtr stream(__uuidof(ADODB::Stream));
    stream->Type = ADODB::adTypeBinary;
    stream->Open(vtMissing, ADODB::adModeUnknown, ADODB::adOpenStreamUnspecified, _bstr_t(), _bstr_t());
    PrettyWriteXmlDocument(xmlDoc, IStreamPtr(stream));
    stream->SaveToFile(filePath, ADODB::adSaveCreateOverWrite);
}

Glueing it together

A simplistic main function shows this in action:

#include <stdlib.h>
#include <objbase.h>
#include <comutil.h>
#include <comdef.h>
#include <comdefsp.h>
#import <msxml6.dll>
#import <msado60.tlb> rename("EOF", "EndOfFile")  // requires: /I $(CommonProgramFiles)\System\ado


void PrettyWriteXmlDocument(MSXML2::IXMLDOMDocument* xmlDoc, IStream* stream);
void PrettySaveXmlDocument(MSXML2::IXMLDOMDocument* xmlDoc, const wchar_t* filePath);


int wmain()
{
    CoInitializeEx(nullptr, COINIT_MULTITHREADED);

    try
    {
        MSXML2::IXMLDOMDocumentPtr xmlDoc(__uuidof(MSXML2::DOMDocument60));
        xmlDoc->appendChild(xmlDoc->createElement(L"root"));

        PrettySaveXmlDocument(xmlDoc, L"xmldoc.xml");
    }
    catch (const _com_error&)
    {
    }

    CoUninitialize();

    return EXIT_SUCCESS;
}


// assume definitions of PrettyWriteXmlDocument and PrettySaveXmlDocument go here

Unless the library has a format option then the only other way is to use XSLT, or an external pretty printer ( I think htmltidy can also do xml) There doen't seem to be an option in the codeproject lib but you can specify an XSLT stylesheet to MSXML.

I've written a sed script a while back for basic xml indenting. You can use it as an external indenter if all else fails (save this to xmlindent.sed, and process your xml with sed -f xmlindent.sed <filename>). You might need cygwin or some other posix environment to use it though.

Here's the source:

:a
/>/!N;s/\n/ /;ta
s/  / /g;s/^ *//;s/  */ /g
/^<!--/{
:e
/-->/!N;s/\n//;te
s/-->/\n/;D;
}
/^<[?!][^>]*>/{
H;x;s/\n//;s/>.*$/>/;p;bb
}
/^<\/[^>]*>/{
H;x;s/\n//;s/>.*$/>/;s/^    //;p;bb
}
/^<[^>]*\/>/{
H;x;s/\n//;s/>.*$/>/;p;bb
}
/^<[^>]*[^\/]>/{
H;x;s/\n//;s/>.*$/>/;p;s/^/ /;bb
}
/</!ba
{
H;x;s/\n//;s/ *<.*$//;p;s/[^    ].*$//;x;s/^[^<]*//;ba
}
:b
{
s/[^    ].*$//;x;s/^<[^>]*>//;ba
}

Hrmp, tabs seem to be garbled... You can copy-waste from here instead: XML indenting with sed(1)

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top