HL7 v2X and v3 data modeling

Question 1

I would under no circumstances attempt to model anything using the HL7 v3 RIM. The reason is that this schema is very generic, deferring much of the metadata to the message itself. Are you familiar with an EAV table? The RIM is like that.

On the other hand, HL7 v2 should be a fairly simple basis for a DB schema. You can create tables around segment types, and columns around field names.

I think the problem of pulling in everything kills the project and you should not do it. Typically, HL7 v2 messages carry a small subset of the whole, so it would be an utter waste to build out the whole thing, and it would be very confusing.

Further, the version of v2 you model would impact your schemas dramatically, with later versions, more and more fields become repeating fields, and your join relationships would change.

I recommend that you put a stake in the sand and start with v2.4 which is pretty easy yet still more complicated than most interfaces actually in use. Focus on a few segments and a few fields. MSH and PID first.

Add an EAV table to capture what may come in that you don't yet have in your tables. You can then look at what comes into this table over time and use it to decide what to build next. Your EAV could look like this MSG_ID, SEGMENT, SET_ID, FIELD_NAME, FIELD VALUE. Just store the unparsed HL7 contents of the field value.

Question 2

HL7 is not a "tight" standard inputs and expected outputs vary depending on the system you are talking to. In this case the adding in a broker such as Mirth, Rhaposdy or BizTalk is a very good idea.

What ever solution you employ make sure you can cope with "non standard" input and output as you will soon find things vary. On the HL7 versions 2X and 3 be aware that very few hospitals have the version 3 most still run 2X.

I have been down the road of working with a database that tried to follow the HL7 structure, it can work however it will take time and effort. Given that you have a tight dead line maybe break out the bits of the data you will need to search on and have fields (e.g. PID segment 3 is the patient id would be useful to have) the rest can go in your varchar. Also if you are not indexing on the column you could use varchar(max).

As for your Guids in the database, this can work fine, but be careful not to cluster any indexes using the Guid as this will fragment your data. Do your research here and if in doubt go for identity columns instead.

I'll recommend the entity framework too, excellent ORM, well worth learning.

So my overall advice. Go for a hybrid for now, breaking out what you need. Expect it to evolve over time breaking out the pieces of HL7 into their own areas as needed. Do write a generic HL7 parser (not too difficult I've done it a couple of times) and keep it flexible. But most of all expect the HL7 to vary in structure don't treat the specification as 100% truth you will get variations.

Question 3

I can only comment on the CDA (and some very limited HL7v2) side of things based on personal experience.

We receive and send CDA documents wrapped in HL7v3 wrappers from external vendors (as well as internal systems -- see below). The wrappers contain the metadata for things like sending/receiving systems/dates and other high-level data. The very limited message metadata is stripped and stored in the message data repository. Inside the wrapper, is the actual CDA, which is then taken and stored as XML datatype in the SQL database.

Using this model we can then search at the metadata level, but also narrow it down based on the CDA using Xpath queries. It makes the database much simpler...I can't even imagine creating columns based on the CDA schema.

As for making clients follow the CDA schema, as a part of the project we've created an implementation guide which clients must follow if they want to have their messages accepted.

Using the implementation guide + schematron + BizTalk and XSD validation, we only accept messages which follow the CDA schema. We then check some data fields using schematron validation and reject if any of those fail. This is relayed to the sender using an HL7v3 message back to them with the specific error message and/or fields that are invalid. This is a point at which a message will be stored in the database.

This is all done in BizTalk/SQL Server. And since the CDA schema is very much pre-defined by the HL7 group, you can make the consumers of this system follow the schema. This is unlike what I've seen with HL7v2 where it seems people just bend the schema as needed.

For the HL7v2 side of things, I'm 99% certain that "we" (as in, "my company") are storing the messages much in the same way. Except since since the HL7v2 schema is so open, we're not validating and just accepting/storing all messages. An HL7v2 parser has been written to parse the HL7v2 using the variations of schemas we know about.

In my project's case, we are sending HL7v2 from our HCIS --> Mirth --> BizTalk which then follows the Implementation guide + CDA Schema along with an XSLT transform to map the HL7v2 to CDA THEN submits it to the OTHER BizTalk CDA Submission service as though it was an external vendor.

That's a ton of reading right now, so please ask questions, as I'd like to talk about it.

Question 4

In most cases it's a waste of time to try to create a normalized relational data model to persist HL7 V2 or V3 data. I would recommend just storing entire messages or documents as single XML column values. Then query using SQLXML and/or XQuery. All modern relational databases support this now.

Question 5

Modeling on HL7 can be a pain.

I would do the following;

use the standards described in HL7 for staging tables, that way even if you have varchar(1024) and they are null it does not hurt you
create your actual table to be populated from the staging table as per the standards that you have enforced or will enforce.

This means that you have 500+ columns from the message but only 10 or 50 make sense, you will need to model only your 50. Yes, this has a lopside, tomorrow you want to make more meaning then it will increase from 50 to 75, the historical messages will not have information; which is fine but you will need to factor into the design.