Java, dealing with XML and JPA Annotated Classes

https://softwareengineering.stackexchange.com/questions/415882

15-03-2021

Question

I use xjc to compile XSD files to Java Classes, and want to edit/extend them to make them persistable through JPA.

I can't figure out what the best "coupling" would be or how to organize it. If I modify the xjc-generated classes to make them persistable, I lose the ability to recompile the XSD whenever it changes.

Even if I didn't need to recompile, in some cases I still wouldn't be able to serialize/deserialize data that was collected before my schema changed.

I don't want to "decouple" completely, since I want to keep the ability to persist an imported XML object with just a few lines of code (without having to write some sort of "converter" for each XML entity).

Has anyone faced the same issue? How did you manage to solve it?

Solution

I think the biggest problem you're facing here is that classes which have to be "visible" to something outside your code need one source of truth for what counts as valid data. Otherwise you'll spend endless hours keeping the many definitions and their implementations in sync: either the specifications are out of sync (or drift apart over time), or the converters are incompatible with the latest versions of the specs, and so on. In your case, the classes are visible both

- to whatever is producing/consuming the XML data that is converted to/from the Java classes, and
- to the database behind JPA, which imposes its own type and data integrity requirements on your data.

Your approach is in danger of violating this important principle if you're not careful about it. Either the XSD is the source of truth, or the JPA definitions are, or something else entirely is; but according to this principle, n-1 of the n definitions have to be derived from the n-th.

So what is the solution? As always, it strongly depends on what you want to do.

1. Maybe there aren't that many classes, and maybe they never change or change only very little and very infrequently. In that case you may decide to accept the cost of having two sources of truth.

   How would you do that? As you've already said, modifying auto-generated code is very fragile. It's also needlessly work-intensive to have separate (sub)classes just for the JPA annotations, because that would require lots of boilerplate code for converters. Luckily, the JPA spec allows the ORM mappings to be configured in XML instead of through the usual annotations. Have a look at the orm.xml file. With it you can declare a class to be a JPA entity without any modifications to its source code.
2. If your XSD is simple enough, it may be possible to generate the database structure directly from the XSD. There are tools for that. That doesn't give you the ORM mapping, though. But if the XSD is simple enough, it may also be possible to auto-generate the orm.xml file from the XSD. This would re-establish the XSD as your one source of truth.
3. If 1. and 2. are not options for you because your data structure is too complicated, maybe it is possible to have a third specification that serves as the source of truth for both and can generate both the XSDs and the JPA definitions in some form. In principle you could have an augmented XML file that contains both the XSD and the ORM definitions, plus an XSLT that extracts the two halves. (Of course, you could use any other (meta-)language instead of XML for the master spec if you like.) Your compilation process would then need another step in front. But beware: depending on the complications in your data structure, this may be head-scratchingly hard to debug or modify if the need arises.
4. You could have the Java code be the source of truth and export the XSD instead. Of course, you could then switch to JSON export instead of XML much more easily. I don't think that's very likely, because XSDs are usually given by external specifications and not generated on a whim, but hey, maybe you are the exception to the rule?
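
To make the idea of deriving orm.xml from the XSD a bit more concrete, here is a minimal sketch that uses only the JDK's DOM parser. Everything specific in it is an assumption for illustration: the class name OrmXmlSketch, the sample schema, the package com.example, the rule that an element named "id" becomes the primary key, and the JPA 2.2 mapping namespace. It also assumes flat, named complex types whose xjc-generated class carries the same name as the type. Real schemas (nested types, references, collections) would need considerably more logic.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class OrmXmlSketch {
    static final String XS = "http://www.w3.org/2001/XMLSchema";

    // Hypothetical sample schema: one flat complex type with simple-typed elements.
    static final String SAMPLE_XSD =
          "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:complexType name='Customer'><xs:sequence>"
        + "<xs:element name='id' type='xs:long'/>"
        + "<xs:element name='name' type='xs:string'/>"
        + "</xs:sequence></xs:complexType>"
        + "</xs:schema>";

    // Emits one <entity> per named complex type; the XSD stays the only hand-edited file.
    static String generateOrmXml(String xsd, String javaPackage) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document schema = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xsd)));

        StringBuilder orm = new StringBuilder();
        orm.append("<entity-mappings xmlns=\"http://xmlns.jcp.org/xml/ns/persistence/orm\"")
           .append(" version=\"2.2\">\n");

        NodeList types = schema.getElementsByTagNameNS(XS, "complexType");
        for (int i = 0; i < types.getLength(); i++) {
            Element type = (Element) types.item(i);
            if (!type.hasAttribute("name")) {
                continue; // anonymous types are out of scope for this sketch
            }
            // Field access, because xjc-generated classes keep their state in fields.
            orm.append("  <entity class=\"").append(javaPackage).append('.')
               .append(type.getAttribute("name")).append("\" access=\"FIELD\">\n");
            orm.append("    <attributes>\n");

            // orm.xml expects <id> entries before <basic> entries, so collect them separately.
            StringBuilder ids = new StringBuilder();
            StringBuilder basics = new StringBuilder();
            NodeList fields = type.getElementsByTagNameNS(XS, "element");
            for (int j = 0; j < fields.getLength(); j++) {
                String field = ((Element) fields.item(j)).getAttribute("name");
                if (field.isEmpty()) {
                    continue; // element references (ref=...) are not handled here
                }
                if (field.equals("id")) { // assumed naming convention for the primary key
                    ids.append("      <id name=\"id\"/>\n");
                } else {
                    basics.append("      <basic name=\"").append(field).append("\"/>\n");
                }
            }
            orm.append(ids).append(basics);
            orm.append("    </attributes>\n  </entity>\n");
        }
        orm.append("</entity-mappings>\n");
        return orm.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(generateOrmXml(SAMPLE_XSD, "com.example"));
    }
}
```

Running main prints a mapping file that declares com.example.Customer as an entity with an id attribute and one basic attribute, without touching the generated class. Hooked into the build next to xjc, a generator along these lines would regenerate both the classes and their mapping whenever the XSD changes.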

In any case, you need to be aware that the finer points of XML and of ORMs like JPA are not necessarily compatible with each other, and your desire for complete unification may be doomed from the start. It may simply be the case that some aspects of your XML data cannot be carried over into database land, and vice versa. For example, a concept of transactions does not exist in XML land. JPA has (limited) knowledge of stored procedures, database triggers and other (semi-)advanced database features; I am not aware of anything in XML land that is even somewhat similar. Conversely, there is no close cousin of XSLT in database land.

On top of that: JPA can be very inefficient if you're handling large amounts of data; it sometimes writes horrendous SQL if your configuration isn't optimised; it has some not-completely-straight-forward behaviour with transactions, lazy-loading etc. It is great for abstracting away the details of databases, but as with any such abstraction it can cost you if you don't know what you're doing. It may simply be impossible to get all the finer points correct when you leave all of that to an automated tool.

You should keep that in mind so that you can provide good interfaces both for adding missing features on either end and for fine-tuning the mappings.

Licensed under: CC-BY-SA with attribution

Not affiliated with softwareengineering.stackexchange