Content Validation in JCR

https://stackoverflow.com/questions/6019159

14-11-2019

Question

We are evaluating a few technologies to build a repository of the WSDLs and XSDs used within our organization. One option is Apache Jackrabbit, which implements JCR 1.0 and 2.0. It almost meets our expectations for uploading content, authentication and versioning. However, we also plan to upload several pieces of metadata (e.g., createdBy, lastModifiedBy, lastModifiedTime, etc.) with the WSDLs and XSDs. We have read several posts on StackOverflow, the JCR specs and the wiki pages on the Jackrabbit website, but we still don't understand how the metadata we upload can be validated. For example, if we upload the metadata as XML-formatted content, we want the repository to validate the XML against an XML schema. In terms of the JCR API, is there a way to enable validation of XML while importing it through Session.importXML?

Solution

You might try looking at ModeShape. It's also an open source (LGPL-licensed) JCR implementation, but it has the notion of 'sequencers' that automatically derive information from uploaded files and store that information as structured content (e.g., subgraphs of nodes and properties) in the repository, where it can be searched, queried, and accessed like any other repository content. ModeShape already has quite a few sequencers, but doesn't yet have WSDL or XSD sequencers (they're scheduled to appear in the next release, around the end of May 2011).
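To picture what a sequencer does with an uploaded XSD, here is a small JDK-only Java sketch. It is not ModeShape's sequencer API: the class name XsdSummaryExtractor is hypothetical, and it only collects the names of top-level declarations, the kind of derived data a sequencer would then store as nodes and properties.

```java
import java.io.ByteArrayInputStream;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Hypothetical "sequencer-like" extractor; NOT ModeShape's sequencer API.
class XsdSummaryExtractor {
    static final String XSD_NS = "http://www.w3.org/2001/XMLSchema";

    // Maps "element", "complexType" and "simpleType" to the names of top-level declarations.
    static Map<String, List<String>> extract(byte[] xsd) {
        Element schema;
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);   // needed to recognize the XSD namespace
            schema = factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xsd))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot parse XSD", e);
        }
        Map<String, List<String>> summary = new LinkedHashMap<>();
        for (String kind : List.of("element", "complexType", "simpleType")) {
            summary.put(kind, new ArrayList<>());
        }
        // Only direct children of <xs:schema> are top-level declarations.
        for (Node n = schema.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE && XSD_NS.equals(n.getNamespaceURI())) {
                List<String> bucket = summary.get(n.getLocalName());
                if (bucket != null) {
                    bucket.add(((Element) n).getAttribute("name"));
                }
            }
        }
        return summary;
    }
}
```

A real sequencer would write this summary into the repository (for example as child nodes under the uploaded file) so that it can be queried like any other content.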

I'm the project lead for ModeShape, and I too am using it for storage of WSDL and XSD files (as well as other file formats). In fact, we're using JCR repositories to store all kinds of structured metadata.

As you mention, JCR does provide a way to import content, but the imported XML files must use one of the two formats defined by the JCR specification: system view or document view. The System View XML format uses JCR-specific elements and attributes, whereas the Document View maps elements into nodes and attributes into properties (it's actually a bit more nuanced). And because the import process creates repository content (nodes and properties), JCR repositories do validate that structure using JCR's node type mechanism.
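As a rough illustration of the Document View mapping (element to node, attribute to property), here is a simplified JDK-only Java sketch. DocumentViewMapper is a hypothetical name; it only lists node and property paths, ignores text content and property types, and decodes `_xHHHH_` name escapes such as `_x0020_` (a space), which JCR uses for characters that are not legal in XML names. It is not how a repository actually implements Session.importXML.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;

// Hypothetical, simplified illustration of the Document View mapping.
class DocumentViewMapper {
    private static final Pattern ESCAPE = Pattern.compile("_x([0-9A-Fa-f]{4})_");

    // Decodes escapes such as "_x0020_" back into the original character.
    static String decodeName(String xmlName) {
        Matcher m = ESCAPE.matcher(xmlName);
        StringBuilder sb = new StringBuilder();
        while (m.find()) {
            char c = (char) Integer.parseInt(m.group(1), 16);
            m.appendReplacement(sb, Matcher.quoteReplacement(String.valueOf(c)));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    // Returns one line per node ("/Hybrid") and per property ("/Hybrid/jcr:primaryType = ...").
    static List<String> map(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Element root = factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
            List<String> out = new ArrayList<>();
            walk(root, "", out);
            return out;
        } catch (Exception e) {
            throw new IllegalArgumentException("Not well-formed XML", e);
        }
    }

    private static void walk(Element e, String parentPath, List<String> out) {
        String path = parentPath + "/" + decodeName(e.getTagName());
        out.add(path);                                              // element -> node
        NamedNodeMap attrs = e.getAttributes();
        for (int i = 0; i < attrs.getLength(); i++) {
            Node a = attrs.item(i);
            String name = a.getNodeName();
            if (name.equals("xmlns") || name.startsWith("xmlns:")) {
                continue;                                           // namespace declarations are not properties
            }
            out.add(path + "/" + name + " = " + a.getNodeValue()); // attribute -> property
        }
        for (Node c = e.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (c.getNodeType() == Node.ELEMENT_NODE) {
                walk((Element) c, path, out);                      // child element -> child node
            }
        }
    }
}
```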

Here's an example of an XML file in Document View format:

<?xml version="1.0" encoding="UTF-8"?>
<Hybrid xmlns:car="http://www.modeshape.org/examples/cars/1.0" 
        xmlns:jcr="http://www.jcp.org/jcr/1.0" 
        xmlns:nt="http://www.jcp.org/jcr/nt/1.0" 
        xmlns:mix="http://www.jcp.org/jcr/mix/1.0" 
        jcr:primaryType="nt:unstructured" 
        jcr:uuid="7e999653-e558-4131-8889-af1e16872f4d"
        jcr:mixinTypes="mix:referenceable">
    <Toyota_x0020_Prius jcr:primaryType="car:Car" 
          jcr:mixinTypes="mix:referenceable" 
          jcr:uuid="e92eddc1-d33a-4bd4-ae36-fe0a761b8d89" 
          car:year="2008" car:msrp="$21,500" car:mpgHighway="45" 
          car:model="Prius" car:valueRating="5" car:maker="Toyota" 
          car:mpgCity="48" car:userRating="4"/>
    <Toyota_x0020_Highlander jcr:primaryType="car:Car" 
          jcr:mixinTypes="mix:referenceable" 
          jcr:uuid="f6348fbe-a0ba-43c4-9ae5-3faff5c0f6ec" 
          car:year="2008" car:msrp="$34,200" car:mpgHighway="25" 
          car:model="Highlander" car:valueRating="5" car:maker="Toyota" 
          car:mpgCity="27" car:userRating="4"/>
</Hybrid>

Here, 'Hybrid' is an 'nt:unstructured' node that contains two 'car:Car' nodes. The 'car:Car' node type is defined as follows, in CND (Compact Namespace and Node Type Definition) notation:

[car:Car] > nt:unstructured, mix:created
  - car:maker (string)
  - car:model (string)
  - car:year (string) < '(19|20)\d{2}'  // any 4 digit number starting with '19' or '20'
  - car:msrp (string) < '[$]\d{1,3}[,]?\d{3}([.]\d{2})?'   // of the form "$X,XXX.ZZ", "$XX,XXX.ZZ" or "$XXX,XXX.ZZ" 
                                                           // where '.ZZ' is optional
  - car:userRating (long) < '[1,5]'                        // any value from 1 to 5 (inclusive)
  - car:valueRating (long) < '[1,5]'                       // any value from 1 to 5 (inclusive)
  - car:mpgCity (long) < '(0,]'                            // any value greater than 0
  - car:mpgHighway (long) < '(0,]'                         // any value greater than 0
  - car:lengthInInches (double) < '(0,]'                   // any value greater than 0
  - car:wheelbaseInInches (double) < '(0,]'                // any value greater than 0
  - car:engine (string)
  - car:alternateModels (reference)  < 'car:Car'

If this node type is registered within the JCR repository, it will ensure that your imported content structure is valid per the node type definition.
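Once the node type is registered, the repository performs these checks itself. Purely to show how the constraint strings above read, here is a Java sketch that mimics them, assuming (as I understand the JCR 2.0 spec) that STRING constraints are regular expressions matched against the whole value, and that range constraints use '[' / ']' for inclusive and '(' / ')' for exclusive bounds, with an empty bound meaning unbounded. The class CarConstraints is hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical sketch mimicking the car:Car value constraints; a repository enforces them itself.
class CarConstraints {
    // STRING constraints: the regular expression must match the entire value.
    static final Pattern YEAR = Pattern.compile("(19|20)\\d{2}");
    static final Pattern MSRP = Pattern.compile("[$]\\d{1,3}[,]?\\d{3}([.]\\d{2})?");

    static boolean matches(Pattern p, String value) {
        return p.matcher(value).matches();   // full match, not find()
    }

    // Range constraints such as "[1,5]" (1..5 inclusive) or "(0,]" (greater than 0).
    static boolean inRange(String constraint, long value) {
        String c = constraint.trim();
        boolean lowInclusive = c.charAt(0) == '[';
        boolean highInclusive = c.charAt(c.length() - 1) == ']';
        String[] bounds = c.substring(1, c.length() - 1).split(",", -1);  // keep empty bounds
        String low = bounds[0].trim();
        String high = bounds[1].trim();
        if (!low.isEmpty()) {
            long lo = Long.parseLong(low);
            if (lowInclusive ? value < lo : value <= lo) return false;
        }
        if (!high.isEmpty()) {
            long hi = Long.parseLong(high);
            if (highInclusive ? value > hi : value >= hi) return false;
        }
        return true;
    }
}
```

With these rules, car:year="2008" and car:msrp="$21,500" from the example pass, while a year of "1899" or a car:mpgCity of 0 would cause the import to be rejected.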

If you're talking about validating the values of content (e.g., metadata values, the structure of binary files, etc.), I'm not aware of any JCR repository implementation that can do this out of the box. JCR repositories are more general purpose, so this is something your application would have to do: use JCR event listeners to observe when new XML files (or other content) are uploaded into the repository, fetch the binary content that was just uploaded, and use other libraries to perform the validation.
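The "other libraries" step can be as small as the JDK's own javax.xml.validation API. Here is a minimal sketch; MetadataValidator is a hypothetical name, and obtaining the uploaded bytes (for example, reading the file's binary property inside an event listener) is assumed rather than shown.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Hypothetical helper: validates uploaded XML bytes against an XSD using only the JDK.
class MetadataValidator {
    private final Schema schema;   // compiled once; Schema objects are thread-safe

    MetadataValidator(byte[] xsd) {
        try {
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            schema = factory.newSchema(new StreamSource(new ByteArrayInputStream(xsd)));
        } catch (SAXException e) {
            throw new IllegalArgumentException("Invalid XSD", e);
        }
    }

    // Returns the validation problems found; an empty list means the document is valid.
    List<String> validate(byte[] xml) {
        List<String> problems = new ArrayList<>();
        Validator validator = schema.newValidator();   // Validators are not thread-safe
        validator.setErrorHandler(new ErrorHandler() {
            @Override public void warning(SAXParseException e) { }
            @Override public void error(SAXParseException e) { problems.add(e.getMessage()); } // keep going
            @Override public void fatalError(SAXParseException e) throws SAXException { throw e; }
        });
        try {
            validator.validate(new StreamSource(new ByteArrayInputStream(xml)));
        } catch (SAXException | IOException e) {
            problems.add(e.getMessage());   // e.g. the upload is not well-formed XML
        }
        return problems;
    }
}
```

Collecting all errors instead of stopping at the first one makes it easier to report back to whoever uploaded the metadata exactly what was wrong.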

Finally, you mention storing extra properties on your uploaded files. I wrote a blog post some time ago about how to define and use mixin node types to do this with JCR 'nt:file' and 'nt:folder' nodes.

Hope this helps.

OTHER TIPS

As Randall says, the JCR API doesn't provide hooks to validate content while you're storing it.

One common pattern is to upload data to an intermediate location in the JCR tree, say /incoming, and have JCR observers watch this incoming data, validate it and move it to its final location if valid.
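The staging pattern can be sketched as an in-memory simulation, since wiring up a real repository takes more setup. IncomingStager, the /approved destination and leaving invalid content in /incoming are all assumptions for illustration. In JCR, the observer would be an EventListener registered through ObservationManager.addEventListener for Event.NODE_ADDED under /incoming, and the move would be Session.move followed by Session.save.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Predicate;

// In-memory simulation of the "/incoming" staging pattern (hypothetical names, no JCR API).
// Keys are repository paths, values are the uploaded bytes.
class IncomingStager {
    static final String INCOMING = "/incoming/";
    static final String APPROVED = "/approved/";   // assumed final location

    final Map<String, byte[]> repo = new LinkedHashMap<>();
    private final Predicate<byte[]> validator;

    IncomingStager(Predicate<byte[]> validator) {
        this.validator = validator;
    }

    // Clients may only write into the staging area.
    void upload(String name, byte[] content) {
        String path = INCOMING + name;
        repo.put(path, content);
        // In JCR, an EventListener would receive NODE_ADDED asynchronously after save();
        // here the callback is invoked directly to keep the sketch deterministic.
        onNodeAdded(path);
    }

    // Observer logic: validate, then move valid content; invalid content stays in /incoming.
    void onNodeAdded(String path) {
        byte[] content = repo.get(path);
        if (content != null && validator.test(content)) {
            String dest = APPROVED + path.substring(INCOMING.length());
            repo.put(dest, repo.remove(path));   // stands in for Session.move(path, dest) + save()
        }
    }
}
```

Keeping a separate staging area means anything reading the final location only sees content that passed validation. The trade-off is that validation happens after the upload has been accepted, so clients need some other way to learn that their upload was rejected.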

Another option is to use Apache Sling [1] which provides an OSGi-based scriptable application layer on top of a JCR repository. With Sling you can intercept HTTP POST requests, for example, to validate data before storing it.

[1] http://sling.apache.org

Licensed under: CC-BY-SA with attribution
