Question

I'm refactoring a Java EE project to use MarkLogic, and would like to take advantage of MarkLogic's built-in HTML Tidy functionality. Is it possible to make use of HTML Tidy from the MarkLogic Java API? Or am I going to need to use a third-party API that lets me run XQuery commands directly?

Or is this a fool's errand, and I should just use HTML Tidy in my code?

Solution

Please see the com.marklogic.client.example.cookbook.DocumentWriteTransform example distributed with the Java API, which uses a server-side XQuery transform on document write to invoke xdmp:tidy().

OTHER TIPS

The example Erik cited installs an XQuery transform. Its name is html2xhtml.xqy, and it's packaged somewhere in the distribution. The example has one method for installing the transform, followed by an example of how to invoke it. The invocation is at line 126:

writeMgr.write(docId, writeHandle, transform);

Just above that you'll see how the transform is created and configured.

The idea is that you use REST (via the Java API) to install the transform at /v1/transforms/html2xhtml, and then invoke it during a document PUT (using this Java write method) by passing the transform name as a parameter.

So the reference you're looking for is in the XQuery transform, not the Java source file.
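To make the "transform name as parameter" part concrete, here is a minimal sketch of the URL that such a document PUT would target at the REST level. It uses only the JDK and does not contact a server. The host, port, and document URI are hypothetical placeholders, and the helper name documentWriteUrl is mine, not part of the Java API. The Java API builds the equivalent request for you when you pass the transform to write().

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class TransformWriteUrl {

    // Builds the URL for a document PUT that names a server-side transform.
    // baseUrl (host and port) is an assumption; use your own REST instance.
    static String documentWriteUrl(String baseUrl, String docUri, String transformName) {
        // Percent-encode both values so they are safe inside a query string.
        String encUri = URLEncoder.encode(docUri, StandardCharsets.UTF_8);
        String encName = URLEncoder.encode(transformName, StandardCharsets.UTF_8);
        return baseUrl + "/v1/documents?uri=" + encUri + "&transform=" + encName;
    }

    public static void main(String[] args) {
        // Hypothetical server address and document URI, for illustration only.
        System.out.println(
            documentWriteUrl("http://localhost:8011", "/example/page.html", "html2xhtml"));
    }
}
```

The body of the PUT would be the HTML to tidy; the transform named in the query string runs on the server before the document is stored.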

I would be inclined to leave tidy in the Java layer, as long as you are planning to keep Java in the picture anyway. Running tidy in the JVM gives you more control: you can install whatever version of JTidy you like, and even patch it yourself. Also, tidy can be fairly CPU-intensive, so running it in the JVM layer keeps it from competing with database queries.

Of course, you might have other strong incentives to run tidy in MarkLogic. For example, you might be planning to allow direct REST integration with your MarkLogic code.

Licensed under: CC-BY-SA with attribution