Question

I need to merge two XML log files. One contains a trace with position updates; the other contains the received messages. Several messages can be received between two consecutive position updates.

Both logs have timestamps:

  • The trace log uses <date> (e.g. 14.7.2012 11:08:07)
  • The message log uses a Unix timestamp <timeStamp> (e.g. 1342264087)

The structure of the trace looks like:

<item>
        <date>14.7.2012 11:08:07.222</date>
        <MyPosition>
        // Position data
        </MyPosition>
</item>
<item>
        <date>14.7.2012 12:13:07.112</date>
        <MyPosition>
        // Position data
        </MyPosition>
</item>
...

The message log is structured like this:

<Message>
    // some content of the message
    <subTag>
        <timeStamp>1342264087</timeStamp>
    </subTag>
    // other content of the message
</Message>
<Message>
    // same as above
</Message>
...

During the merge, the timestamps of both logs have to be read and compared (which means converting between the <date> and <timeStamp> formats), and all positions and messages written out in chronological order.

The position data can be copied as is. Each message, however, should be wrapped in <item> tags, given a <date> tag (derived from the message's Unix time), and have its <Message> tag replaced by <m:Message type="received">.

Unfortunately this is not a "simple" merge, especially as the log files are between 5 MB and 700 MB in size... :-/

A result could look like this:

<item>
        <date>14.7.2012 11:08:07.222</date>
        <MyPosition>
        // Position data
        </MyPosition>
</item>
<item>
        <date>14.7.2012 11:09:10.867</date>
        <m:Message type="received">
        // content of the <Message>
        </m:Message>
</item>
<item>
        <date>14.7.2012 12:10:11.447</date>
        <m:Message type="received">
        // content of the former <Message>
        </m:Message>
</item>
<item>
        <date>14.7.2012 12:13:07.112</date>
        <MyPosition>
        // Position data
        </MyPosition>
</item>
<item>
        <date>14.7.2012 12:17:11.227</date>
        <m:Message type="received">
        // content of the former <Message>
        </m:Message>
</item>
...

Are there any tools that support this kind of merge? Or is there a simple way to solve this using Java?

I really appreciate any tips on how to solve this matter.


Solution

This XSLT 2.0 transformation (for convenience containing the small message-log sample inline):

<xsl:stylesheet version="2.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
 xmlns:xs="http://www.w3.org/2001/XMLSchema"
 xmlns:m="some:M" exclude-result-prefixes="xs m">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>

 <!-- Unix epoch: the message timestamps are seconds counted from this instant -->
 <xsl:variable name="vDateU0" select="xs:dateTime('1970-01-01T00:00:00')"/>

 <xsl:variable name="vMessages">
    <Message>     // some content of the message
        <subTag>
            <timeStamp>1342264087</timeStamp>
        </subTag>     // other content of the message
    </Message>
    <Message>     // some content of the message2
        <subTag>
            <timeStamp>1342264089</timeStamp>
        </subTag>     // other content of the message2
    </Message>
 </xsl:variable>

 <xsl:template match="node()|@*">
  <xsl:copy>
   <xsl:apply-templates select="node()|@*"/>
  </xsl:copy>
 </xsl:template>

 <xsl:template match="/">

  <xsl:variable name="vProcessedMessages">
   <xsl:apply-templates select="$vMessages/*"/>
  </xsl:variable>

  <xsl:variable name="vProcessedTrace">
   <xsl:apply-templates select="/*/*"/>
  </xsl:variable>


  <xsl:perform-sort select="$vProcessedMessages/*|$vProcessedTrace/*">
    <xsl:sort select="xs:dateTime(date)"/>
  </xsl:perform-sort>

 </xsl:template>

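 <!-- Wraps each message in an item and adds a date computed from its Unix
      timestamp; only the first text node of the message is copied -->
 <xsl:template match="Message">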
  <xsl:variable name="vUnixDuration" select=
   "concat('PT', */timeStamp, 'S')"/>
  <item>
   <date><xsl:sequence select=
    "$vDateU0 + xs:dayTimeDuration($vUnixDuration)"/>
   </date>
   <m:Message type="received">
     <xsl:sequence select="text()[1]"/>
   </m:Message>
  </item>
 </xsl:template>

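 <!-- Rewrites a trace date such as 14.7.2012 11:08:07.222 as
      2012-07-14T11:08:07.222 so that xs:dateTime() can parse it -->
 <xsl:template match="date/text()">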
  <xsl:variable name="vdatePart" select="substring-before(., ' ')"/>

  <xsl:variable name="vYear" select=
  "substring-after(substring-after($vdatePart, '.'), '.')"/>

  <xsl:variable name="vMonth" select=
  "substring-before(substring-after($vdatePart, '.'), '.')"/>

  <xsl:variable name="vDay" select="substring-before(., '.')"/>

  <xsl:variable name="vFormattedMonth" select=
  "if(string-length($vMonth) eq 1)
    then concat('0', $vMonth)
    else $vMonth
    "/>

  <xsl:variable name="vFormattedDay" select=
  "if(string-length($vDay) eq 1)
    then concat('0', $vDay)
    else $vDay
    "/>

  <xsl:value-of select=
  "concat($vYear,
          '-',
          $vFormattedMonth,
          '-',
          $vFormattedDay,
          'T',
          substring-after(., ' ')
          )"/>
 </xsl:template>
</xsl:stylesheet>

when performed on the provided Trace-log XML document:

<items>
    <item>
        <date>14.7.2012 11:08:07.222</date>
        <MyPosition>         // Position data         </MyPosition>
    </item>
    <item>
        <date>14.7.2012 12:13:07.112</date>
        <MyPosition>         // Position data         </MyPosition>
    </item>
</items>

merges the two logs as required:

<item>
   <date>2012-07-14T11:08:07</date>
   <m:Message xmlns:m="some:M" type="received">     // some content of the message
        </m:Message>
</item>
<item>
        <date>2012-07-14T11:08:07.222</date>
        <MyPosition>         // Position data         </MyPosition>
    </item>
<item>
   <date>2012-07-14T11:08:09</date>
   <m:Message xmlns:m="some:M" type="received">     // some content of the message2
        </m:Message>
</item>
<item>
        <date>2012-07-14T12:13:07.112</date>
        <MyPosition>         // Position data         </MyPosition>
</item>
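
The dates in this result are in ISO 8601 form, which is what allows xs:dateTime() to sort them, whereas the question shows dates in the trace log's own d.M.yyyy layout. If that layout is wanted in the output, one option is the XSLT 2.0 format-dateTime() function. The following is only a minimal sketch of that idea, not part of the transformation above: it converts the sample timestamp the same way the Message template does and then formats it. The picture string is my reading of the desired layout, and the conversion assumes the timestamps count seconds since 1970-01-01 in the same time zone as the trace dates.

```xml
<xsl:stylesheet version="2.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
 xmlns:xs="http://www.w3.org/2001/XMLSchema"
 exclude-result-prefixes="xs">
 <xsl:output method="text"/>

 <!-- Unix epoch, as in the transformation above -->
 <xsl:variable name="vDateU0" select="xs:dateTime('1970-01-01T00:00:00')"/>

 <!-- Runs against any source document; the input itself is not read -->
 <xsl:template match="/">
  <!-- Sample timestamp from the message log, turned into a duration in seconds -->
  <xsl:variable name="vUnixDuration" select="concat('PT', '1342264087', 'S')"/>
  <xsl:variable name="vDate" select=
   "$vDateU0 + xs:dayTimeDuration($vUnixDuration)"/>

  <!-- [D] and [M]: day and month without leading zeros; [Y]: year;
       [H01], [m01], [s01]: two-digit time fields; [f001]: fractional seconds -->
  <xsl:value-of select=
   "format-dateTime($vDate, '[D].[M].[Y] [H01]:[m01]:[s01].[f001]')"/>
 </xsl:template>
</xsl:stylesheet>
```

The formatting would have to be applied after the sort, since the sort key in the transformation above relies on the ISO form of the dates.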

Note: In the real case the Message-log will be obtained using the document() function.
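
A minimal sketch of what that could look like, under two assumptions that are not in the question: the message log is stored in a file called messages.xml (a hypothetical name, passed here through a hypothetical parameter pMessagesUri), and its Message elements are wrapped in a single root element so that the file is well-formed. The sketch only loads the log and lists the timestamps it finds.

```xml
<xsl:stylesheet version="2.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>

 <!-- Hypothetical parameter: URI of the message log; a relative URI is
      resolved against the location of the stylesheet -->
 <xsl:param name="pMessagesUri" select="'messages.xml'"/>

 <!-- Takes the place of the inline sample; document() returns the document node -->
 <xsl:variable name="vMessages" select="document($pMessagesUri)"/>

 <xsl:template match="/">
  <timeStamps>
   <!-- The first * step is the assumed root element of the message log -->
   <xsl:copy-of select="$vMessages/*/Message/subTag/timeStamp"/>
  </timeStamps>
 </xsl:template>
</xsl:stylesheet>
```

In the transformation above, the same change would mean declaring $vMessages this way and selecting $vMessages/*/Message instead of $vMessages/* when building $vProcessedMessages, because the Message elements then sit one level deeper.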

Licensed under: CC-BY-SA with attribution