Question

I have a HUGE file containing a lot of HL7 segments. It must be split into 1000 (or so) smaller files. Since it is HL7 data, there is a pattern (logic) to go by: each data chunk starts at a line beginning with "MSH|" and ends just before the next line that begins with "MSH|".

The script must be windows (cmd) based or VBS as I cannot install any software on that machine.

File structure:

MSH|abc|123|....
s2|sdsd|2323|
...
..
MSH|ns|43|...
...
..
.. 
MSH|sdfns|4343|...
...
..
asds|sds

MSH|sfns|3|...
...
..
as|ss

The file in the above example must be split into 2 or 3 files. Also, the file comes from UNIX, so the newlines must remain exactly as they are in the source file.

Any help?

Solution

This is a sample script that I used to split large HL7 files into separate files, with the new file names based on the name of the data file. It uses REBOL, which does not require installation, i.e. the core version does not make any registry entries.

I have a more generalised version that scans an incoming directory, splits each file it finds into single-message files, and then waits for the next file to arrive.

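Rebol [
    file: %split-hl7.r
    author: "Graham Chiu"
    date: 17-Feb-2010
    purpose: {split HL7 messages into single messages}
]

; input file and output directory (edit these for your data)
fn: %05112010_0730.dat
outdir: %05112010_0730/

if not exists? outdir [
    make-dir outdir
]

; read/binary and write/binary skip the text-mode line-ending conversion,
; so the UNIX newlines are written back unchanged
data: to-string read/binary fn
cnt: 0
; output name prefix: the input name without its 4-character extension, plus "-"
filename: join copy/part form fn -4 + length? form fn "-"
; a new message begins wherever a newline is followed by "MSH"
separator: rejoin [ newline "MSH"]
parse/all data [
    some [
        ; take everything up to the next message, or the rest of the file
        [ copy result to separator | copy result to end ]
        (
            write/binary to-file rejoin [ outdir filename cnt ".txt" ] result
            ; debug output, safe to remove for large files
            print "Got result"
            ?? result
            cnt: cnt + 1
        )
        ; step over the newline so the next pass starts at "MSH"
        1 skip
    ]
]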

OTHER TIPS

HL7 has a lot of segment types. I assume you know that your file uses only MSH segments to mark the start of each message. So, have you tried scanning the file for the string "(newline)MSH|"? Just keep a running buffer of complete messages and dump it into an output file when it gets too big.
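
The running-buffer idea can be sketched as follows. This is only an illustration in Python, which is not available on a locked-down Windows machine, so treat it as a description of the logic to port to VBS or REBOL. The names split_messages, group_messages and split_file, the max_bytes limit and the output naming are all hypothetical. It assumes the whole file fits in memory and that every message starts with "MSH|" at the beginning of a line; anything before the first "MSH|" is dropped.

```python
import re
from pathlib import Path
from typing import Iterable, Iterator, List

# Assumption: a message starts at "MSH|" at the very start of the data
# or immediately after a newline (re.MULTILINE makes ^ match there).
MSG_START = re.compile(rb"^MSH\|", re.MULTILINE)


def split_messages(data: bytes) -> List[bytes]:
    """Cut the raw bytes into one chunk per message, newlines untouched."""
    starts = [m.start() for m in MSG_START.finditer(data)]
    ends = starts[1:] + [len(data)]
    return [data[s:e] for s, e in zip(starts, ends)]


def group_messages(messages: Iterable[bytes], max_bytes: int) -> Iterator[bytes]:
    """Collect whole messages in a buffer; emit it before it would exceed max_bytes."""
    buffer = bytearray()
    for msg in messages:
        if buffer and len(buffer) + len(msg) > max_bytes:
            yield bytes(buffer)
            buffer.clear()
        buffer += msg
    if buffer:
        yield bytes(buffer)


def split_file(src: str, outdir: str, max_bytes: int = 1_000_000) -> List[Path]:
    """Write the grouped messages to numbered files (hypothetical naming scheme)."""
    src_path, out_path = Path(src), Path(outdir)
    out_path.mkdir(parents=True, exist_ok=True)
    # Binary reads and writes keep the UNIX line endings exactly as they are.
    data = src_path.read_bytes()
    written = []
    for i, chunk in enumerate(group_messages(split_messages(data), max_bytes)):
        target = out_path / f"{src_path.stem}-{i}.txt"
        target.write_bytes(chunk)
        written.append(target)
    return written
```

A message is never cut in half: a single message larger than max_bytes simply ends up in a file of its own. To get roughly 1000 output files, max_bytes could be set to about the input size divided by 1000.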

Licensed under: CC-BY-SA with attribution