Processing HL7-type messages using XSLT or regex, or a combination of the two (XSLT 1.0)

Question 1

Given that your HL7 message is "|^~\&" encoded and not in an XML format, it is not clear how you will be using an XSLT 1.0 processor for your task. Can you describe your processing pipeline in greater detail? Your snippets are not complete messages, and it is not clear whether you will be starting with complete messages or parsing isolated fields that a larger processing task hands to the stylesheet, for example through parameters.

If your processing starts with a complete HL7 message, I would suggest looking into the HAPI project, or a similar set of libraries, to convert the messages from the |^~\& (pipe-delimited) encoding to the </> (XML) format, and then invoking your XSLT on that version of the data. (You could also use the HAPI libraries in a full-Java solution. In either case, there are code examples at the HAPI site and at an Apache site on HL7.) If you are not interested in using Java at all, but are open to partial non-XSLT solutions, there are other projects that provide similar serialization options (e.g., Net::HL7 for Perl, nHAPI for VB/C#, etc.).

If you have isolated "|^~\&" encoded data in an otherwise XML-formatted file, then I would suggest looking into the str:tokenize function from the EXSLT strings module, which is available as an extension in some XSLT 1.0 processors. (XSLT 2.0 has a built-in tokenize function.) You can have str:tokenize split your data on the field separator and then on the component separator, and create elements from the tokenized substrings.

Here is a stylesheet

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet 
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:str="http://exslt.org/strings"
    extension-element-prefixes="str"
    version="1.0">

    <xsl:output method="xml" indent="yes"/>

    <!-- Split the text of <data> on the field separator '|';
         each resulting token is handed to handle-field. -->
    <xsl:template match="data">
        <Location>
        <xsl:for-each select="str:tokenize(.,'|')">
            <xsl:call-template name="handle-field">
                <xsl:with-param name="field" select="."/>
            </xsl:call-template>
        </xsl:for-each>
        </Location>
    </xsl:template>

    <!-- Split one field on the component separator '^' and map the
         components to elements by position. A missing component
         produces an empty element. -->
    <xsl:template name="handle-field">
        <xsl:param name="field"/>
        <xsl:variable name="components" select="str:tokenize($field,'^')"/>
        <item>
            <when><xsl:value-of select="$components[1]"/></when>
            <UnitName><xsl:value-of select="$components[2]"/></UnitName>
            <room><xsl:value-of select="$components[3]"/></room>
            <bed><xsl:value-of select="$components[4]"/></bed>
        </item>
    </xsl:template>

</xsl:stylesheet>

that runs over this input

<?xml version="1.0" encoding="UTF-8"?>
<data>20130601003203^GBMC^XXYZ^110|20130602130600^Sanai^ABC^|20130602150003^John Hopkins^J615^A|</data>

to produce this output with xsltproc:

<?xml version="1.0"?>
<Location>
  <item>
    <when>20130601003203</when>
    <UnitName>GBMC</UnitName>
    <room>XXYZ</room>
    <bed>110</bed>
  </item>
  <item>
    <when>20130602130600</when>
    <UnitName>Sanai</UnitName>
    <room>ABC</room>
    <bed/>
  </item>
  <item>
    <when>20130602150003</when>
    <UnitName>John Hopkins</UnitName>
    <room>J615</room>
    <bed>A</bed>
  </item>
</Location>

Question 2

Your source message is a string, so you need to write a parser that uses regex to split the message first on pipes (|) and then on carets (^). Refer to the question "Unable to parse ^ character", which has my original code for the parser; the solution there takes a different approach.

Once you have the individual elements, you need to add them to your XML as nodes.
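
As a rough illustration of that split-then-build idea, here is a minimal sketch in Python. It is not the parser from the linked question. The function name location_to_xml and the element names (taken from the XSLT output in Question 1) are assumptions made for the example. Note that both | and ^ are regex metacharacters, so they have to be escaped to match literally.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical element names, borrowed from the XSLT output in Question 1.
COMPONENT_NAMES = ["when", "UnitName", "room", "bed"]


def location_to_xml(raw):
    """Split a repeating location string on '|' and then '^', and build XML nodes."""
    root = ET.Element("Location")
    # '|' means alternation in a regex, so escape it to match a literal pipe.
    for field in re.split(r"\|", raw):
        if not field:
            continue  # a trailing '|' leaves an empty final field; skip it
        # '^' is the start-of-string anchor in a regex, so escape it as well.
        components = re.split(r"\^", field)
        item = ET.SubElement(root, "item")
        for index, name in enumerate(COMPONENT_NAMES):
            child = ET.SubElement(item, name)
            # Missing or empty components are left as empty elements.
            value = components[index] if index < len(components) else ""
            child.text = value or None
    return root


if __name__ == "__main__":
    sample = "20130601003203^GBMC^XXYZ^110|20130602130600^Sanai^ABC^|"
    print(ET.tostring(location_to_xml(sample), encoding="unicode"))
```

The sketch only handles field and component separators. A real HL7 parser would also need to deal with the repetition (~), escape (\) and subcomponent (&) characters, which is one reason to prefer a library such as HAPI for complete messages.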