XML element names

https://stackoverflow.com/questions/3721507

03-10-2019

Question

I need to redefine an XML document and schema for my company. The document in question is split into a number of sections that each contain information about a medication, for example:

<dosage>overview of dose info
   <elderly>doses for elderly patients</elderly>
   <children>doses for children</children>
</dosage>
<administration>info about administering the med...</administration>

I strongly believe that the element names should be changed to reflect what the element is, e.g. <section>, with an attribute describing the content: <section displayName='dosage'>. Not all of my colleagues agree.

Is my thinking correct and can anyone provide guiding principles for element nomenclature that they have found useful in practice?
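To make the two options concrete, here is a small sketch (not from the original thread) that mechanically rewrites descriptive element names into the proposed <section displayName='...'> form. It assumes a flat fragment wrapped in a hypothetical <medication> root so it parses as one document, and it only handles one level of nesting. For flat content like this, both forms carry the same information; what changes is whether the meaning lives in the element name or in an attribute value.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment using descriptive element names, wrapped in a
# <medication> root (an assumption) so it parses as a single document.
SPECIFIC = """<medication>
  <dosage>overview of dose info</dosage>
  <administration>info about administering the med...</administration>
</medication>"""

def to_generic_sections(root: ET.Element) -> ET.Element:
    """Rewrite each top-level child <X> as <section displayName='X'>.

    Only the text of each child is copied; nested children are ignored.
    """
    out = ET.Element(root.tag)
    for child in root:
        section = ET.SubElement(out, "section", displayName=child.tag)
        section.text = child.text
    return out

generic = to_generic_sections(ET.fromstring(SPECIFIC))
print(ET.tostring(generic, encoding="unicode"))
```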

Solution

Consider the case of elderly and children. The tag should define what the element is: in this case both are dosage instructions specific to a certain type of patient. But the names children and elderly don't communicate this; they express no relationship between the two. If instead each were <instructions target="elderly">...</instructions>, that relationship is preserved: both are instructions, just for different targets.
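A minimal sketch of why the shared element name helps a consumer: because every entry is an <instructions> element, one loop can collect them into a {target: text} mapping. The markup below is a hypothetical rendering of the suggestion above; the target values are just the two groups from the question.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup following the suggestion: one element type,
# with the patient group carried in a "target" attribute.
DOSAGE = """<dosage>
  <instructions target="elderly">doses for elderly patients</instructions>
  <instructions target="children">doses for children</instructions>
</dosage>"""

def instructions_by_target(dosage: ET.Element) -> dict[str, str]:
    """Collect every <instructions> child into a {target: text} mapping."""
    return {
        el.get("target"): (el.text or "").strip()
        for el in dosage.findall("instructions")
    }

table = instructions_by_target(ET.fromstring(DOSAGE))
print(table)
```

Adding another patient group is then just another attribute value; the parsing code does not change, whereas with <elderly>/<children> each new group would be a new element name to handle.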

The dosage and administration sections could both be considered properties of the medication. What you do here depends on the structure of the whole document and how it will be parsed. It seems to me that dosage is very distinct from administration. If you were defining this as an object in an object-oriented language, you would have:

using System.Collections.Generic;

class Medication
{
    Dictionary<string, string> dosageInstructions; // or <PersonType, string>, preferably
    string administrationInfo;
}

Both of these are different properties, and there's no real parallel between them (well, other than that they're both properties of the medication). I don't think it would be useful to abstract that any more than it already is, but it's something that could be argued either way based on the structure of the entire document and how it's going to be used.

For example, if you are going to print out a list of key-value pairs for a bunch of different properties (say, one key is administration and its value is the info), then the generic abstraction is the way to go. But dosage has a different structure from administration, so I don't think that particular abstraction would be useful here. If every medication has a fixed set of possible properties (dosage, administration info, etc.) that will each be treated differently, then in my opinion it is logical to use a distinct tag for each of them.
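The trade-off can be sketched in a few lines. The generic <property name="..."> form below is a hypothetical illustration (the "storage" property is invented for the example); it can be listed with one uniform loop. The distinct-tag form from the question needs per-tag handling, which is exactly what you want when dosage has sub-groups and administration is plain text.

```python
import xml.etree.ElementTree as ET

# Hypothetical generic form: every property is a flat name/value pair.
GENERIC = """<medication>
  <property name="administration">info about administering the med...</property>
  <property name="storage">store in a cool, dry place</property>
</medication>"""

# Distinct-tag form from the question (nested dosage, flat administration).
SPECIFIC = """<medication>
  <dosage>
    <elderly>doses for elderly patients</elderly>
    <children>doses for children</children>
  </dosage>
  <administration>info about administering the med...</administration>
</medication>"""

def list_generic(root: ET.Element) -> list[str]:
    # One uniform loop works because every <property> has the same shape.
    return [f"{p.get('name')}: {p.text}" for p in root.findall("property")]

def list_specific(root: ET.Element) -> list[str]:
    # Each tag is handled on its own terms: dosage has sub-groups,
    # administration is plain text.
    lines = [f"dosage/{group.tag}: {group.text}" for group in root.find("dosage")]
    lines.append(f"administration: {root.findtext('administration')}")
    return lines

print(list_generic(ET.fromstring(GENERIC)))
print(list_specific(ET.fromstring(SPECIFIC)))
```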

As for general guiding principles, I usually ask "how would I define this document as an object?", then consider what the XML serialization of that object would be. This works for me because I'm far more used to working with objects, but your mileage may vary. And there are certainly cases where that's not the best approach: for example, if you're truly representing a document, like HTML, then that's not the way to go. But if you're using XML to define a regular data structure, it should generally work.
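One way to apply the "define it as an object first" heuristic is sketched below: a Python analogue of the Medication class above, using an enum for the patient group (the "PersonType, preferably" option), serialized in the shape the object suggests. The enum members, element names and the <instructions target="..."> layout are assumptions for illustration, not a prescribed format.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from enum import Enum

class PersonType(Enum):
    # Hypothetical patient groups, taken from the question's example.
    ELDERLY = "elderly"
    CHILDREN = "children"

@dataclass
class Medication:
    dosage_instructions: dict[PersonType, str] = field(default_factory=dict)
    administration_info: str = ""

def to_xml(med: Medication) -> ET.Element:
    """Serialize a Medication so the XML mirrors the object's structure."""
    root = ET.Element("medication")
    dosage = ET.SubElement(root, "dosage")
    for person, text in med.dosage_instructions.items():
        # One element type; the enum value becomes the target attribute.
        ET.SubElement(dosage, "instructions", target=person.value).text = text
    ET.SubElement(root, "administration").text = med.administration_info
    return root

med = Medication(
    {PersonType.ELDERLY: "doses for elderly patients",
     PersonType.CHILDREN: "doses for children"},
    "info about administering the med...",
)
print(ET.tostring(to_xml(med), encoding="unicode"))
```

The dictionary property becomes a repeated element with a key attribute, and the plain string property becomes a simple text element, which lines up with the distinct-tags-for-distinct-properties argument above.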

OTHER TIPS

I have found that it is generally a bit clearer to define the XML as in the example you provided:

<dosage>
   <elderly>doses for elderly patients</elderly>
   <children>doses for children</children>
</dosage>
<administration>info about administering the med...</administration>

As an extreme example of your proposed nomenclature, you could end up with this:

<field name="dosage">
    <field name="elderly">doses for elderly patients</field>
    <field name="children">doses for children</field>
</field>
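A quick way to feel the cost of the generic form is to write the same query against both. The sketch below uses ElementTree's limited XPath support, including the [@name='value'] predicate; both fragments are wrapped in a hypothetical <medication> root so the paths start from the same place.

```python
import xml.etree.ElementTree as ET

SPECIFIC = """<medication>
  <dosage>
    <elderly>doses for elderly patients</elderly>
    <children>doses for children</children>
  </dosage>
</medication>"""

GENERIC = """<medication>
  <field name="dosage">
    <field name="elderly">doses for elderly patients</field>
    <field name="children">doses for children</field>
  </field>
</medication>"""

specific = ET.fromstring(SPECIFIC)
generic = ET.fromstring(GENERIC)

# Descriptive names: the path reads like the question being asked.
a = specific.findtext("dosage/elderly")
# Generic names: every step needs an attribute predicate to say what it means.
b = generic.findtext("field[@name='dosage']/field[@name='elderly']")
print(a)
print(b)
```

Both return the same text, but the generic path is longer, and a schema for it cannot easily say "an elderly field may only appear inside a dosage field".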

Of course, in the end it all depends on the specific application, but generally I would try to map real-world entities and properties to XML with as much abstraction as is needed, but no more.

So in this example, the "section" element is an over-abstraction.

I think that's going a bit far. I follow one rule: does the name make semantic sense out of context? Section might make sense out of context, but you know you're losing relevant semantic information. So what do we need to know about the element? That it contains dosage information. So perhaps dosageinfo would be better?

Following the same approach for elderly and children, we would assume these elements represent elderly people and children. Um... not really. If their names reflected what they contain, they'd be something more like:

<dosageinfo>
    <dosage recipient="elderly">Blah</dosage>
    <dosage recipient="children"></dosage>
</dosageinfo>
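With recipients as attribute values, the set of allowed groups becomes data you can check rather than a list of element names. In a real schema this would more likely be an enumeration on the recipient attribute; the hand-rolled check below is only a stand-in sketch. ALLOWED_RECIPIENTS and the extra "infants" entry are hypothetical, added to show a failure.

```python
import xml.etree.ElementTree as ET

# Hypothetical controlled vocabulary for the recipient attribute.
ALLOWED_RECIPIENTS = {"elderly", "children"}

DOC = """<dosageinfo>
    <dosage recipient="elderly">Blah</dosage>
    <dosage recipient="children"></dosage>
    <dosage recipient="infants">Blah</dosage>
</dosageinfo>"""

def check_recipients(root: ET.Element) -> list[str]:
    """Return a problem description for each <dosage> that breaks the rules."""
    problems = []
    for i, dosage in enumerate(root.findall("dosage"), start=1):
        recipient = dosage.get("recipient")
        if recipient not in ALLOWED_RECIPIENTS:
            problems.append(f"dosage #{i}: unknown recipient {recipient!r}")
        if not (dosage.text or "").strip():
            problems.append(f"dosage #{i}: empty instructions")
    return problems

print(check_recipients(ET.fromstring(DOC)))
```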

That said, this is certainly not a formal method; I've never actually seen a formal method proposed.

Whilst I'm here, and having significant experience handling clinical data in various ways, I'd also suggest you try to get some of your free text into formalised XML data, even if you have to use natural language parsing to glean some of it. Any formalised data, even AI-gleaned data as long as it's properly marked as such, can make querying the information much easier in future. It might not be relevant to your scenario, but I feel it's worth considering.
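As a rough illustration of "properly represented as such", the sketch below keeps the original free text and adds machine-extracted <dose> elements tagged with a provenance attribute. The regular expression is deliberately naive and stands in for real NLP; the element and attribute names (dose, amount, unit, source="automated") are assumptions, not an established clinical format.

```python
import re
import xml.etree.ElementTree as ET

# Deliberately naive pattern: a number followed by a common unit.
# Real clinical text would need proper NLP; this only shows the output shape.
DOSE_PATTERN = re.compile(r"(\d+(?:\.\d+)?)\s*(mg|mcg|g|ml)\b", re.IGNORECASE)

def extract_doses(free_text: str) -> ET.Element:
    """Keep the original text and add machine-extracted <dose> elements,
    each marked source="automated" so consumers know it was not curated."""
    root = ET.Element("dosage")
    ET.SubElement(root, "text").text = free_text
    for amount, unit in DOSE_PATTERN.findall(free_text):
        ET.SubElement(root, "dose", amount=amount, unit=unit.lower(),
                      source="automated")
    return root

result = extract_doses("Elderly: 2.5 mg once daily. Children: 5 ml twice daily.")
print(ET.tostring(result, encoding="unicode"))
```

Keeping the free text alongside the extracted values means nothing is lost if the extraction is wrong, and the source attribute lets queries include or exclude unreviewed data.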

Data in free text is only useful as information. Data in relationships is data and information.

Licensed under: CC-BY-SA with attribution
