Question

After a lot of reference from stack overflow answers I have the following XSL which converts a CSV into XML using the column headings as the node name for each appropriate cell.

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" 
    exclude-result-prefixes="xsl">
    <xsl:output method="xml" encoding="utf-8" />

    <xsl:variable name="newline" select="'&#10;'" />
    <xsl:variable name="comma" select="','" />
    <xsl:variable name="csv" select="." />
    <xsl:variable name="fields" select="substring-before( concat( $csv, $newline ), $newline )" />

    <xsl:template match="/">
        <xsl:element name="EXCHANGE">
            <xsl:element name="DDM">
                <xsl:call-template name="write-row">
                    <xsl:with-param name="rows" select="substring-after( $csv, $newline)"/>
                </xsl:call-template>
            </xsl:element>
        </xsl:element>
    </xsl:template>

    <xsl:template name="write-row">
        <xsl:param name="rows"/>

        <xsl:variable name="this-row" select="substring-before( concat( $rows, $newline ), $newline )" />
        <xsl:variable name="remaining-rows" select="substring-after( $rows, $newline )" />

        <xsl:if test="string-length($this-row) > 1">
            <xsl:element name="DDMSRS">
                <xsl:call-template name="write-item">
                    <xsl:with-param name="columns" select="$fields"/>
                    <xsl:with-param name="row" select="$this-row" />
                </xsl:call-template>
            </xsl:element>
        </xsl:if>

        <xsl:if test="string-length( $remaining-rows ) > 0">
            <xsl:call-template name="write-row">
                <xsl:with-param name="rows" select="$remaining-rows" />
            </xsl:call-template>
        </xsl:if>
    </xsl:template>


    <xsl:template name="write-item">
        <xsl:param name="row"/>
        <xsl:param name="columns"/>

        <xsl:variable name="col" select="substring-before( concat( $columns, $comma ), $comma)" />
        <xsl:variable name="remaining-items" select="substring-after( $row, $comma )" />
        <xsl:variable name="remaining-columns" select="substring-after( $columns, $comma )" />

        <xsl:if test="$col != ''">
            <xsl:element name="{$col}">
                <xsl:value-of select="substring-before( concat( $row, $comma ), $comma)" /> 
            </xsl:element>
        </xsl:if>

        <xsl:if test="string-length( $remaining-items ) > 0">
            <xsl:call-template name="write-item">
                <xsl:with-param name="columns" select="$remaining-columns"/>
                <xsl:with-param name="row" select="$remaining-items" />
            </xsl:call-template>
        </xsl:if>
    </xsl:template>

</xsl:stylesheet>

Running the XSL on a csv like this (line breaks being the row separator):

<root><![CDATA[COL_HEAD1,COL_HEAD2,COL_HEAD3
123456789,Peter,My address
]]></root>

Will return the following xml:

<?xml version="1.0" encoding="utf-8"?>
<EXCHANGE>
    <DDM>
        <DDMSRS>
            <COL_HEAD1>123456789</COL_HEAD1>
            <COL_HEAD2>Peter</COL_HEAD2>
            <COL_HEAD3>My address</COL_HEAD3>
        </DDMSRS>
    </DDM>
</EXCHANGE>

The issue I now have is when I want to process a lot of rows in a csv (1000 or more) I run out of memory.

I have seen reference to divide and conquer in other stackoverflow questions but I can't figure out how to split my string in half.

So my questions are:

  1. How do I perform divide and conquer in this scenario?
  2. Are there any other ways of improving the performance of this XSL using XSLT 1.0?
Was it helpful?

Solution

Performing a divide and conquer in this scenario with XSLT is not trivial. There are ways to optimize XSLT processing and you might want to try a different processor which does things differently, but your code is not easy to improve, since it practically doesn't do any XML processing. It operates on one big single element node containing a string. You are practically only using XPath functions to parse Strings and XSLT variables to store them. It would be more efficient to get the XSLT overhead out of the way.

Your options include:

  1. Increase your memory (using the -Xmx options).
  2. Break your file into smaller parts (which need to be well formed XML) and process each part separately in XSLT.

But XSLT can't help you with option no. 2, since to start processing your file it will need to load it all into memory. It can't load part of the text and split it, since each fragment has to be well-formed XML. Not even a SAX parser might be that efficient, since you only have one node. You are better off using an efficient string parser where you could split your CSV and then wrap each fragment inside XML tags.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top