Question

In my JSON serialized data I have nested objects:

{
  "A" : { "A1": 1,
          "A2": 2 },
  "B" : { "B1": 3,
          "B2": 4 }
}

Due to constraints I cannot influence, I need to flatten the structure. This means that every object at a depth greater than 1 has to be encoded as a string. Applied to the example above, this would look like so:

{
  "A" : "{\"A1\": 1, \"A2\": 2}",
  "B" : "{\"B1\": 3, \"B2\": 4}"
}
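
As a rough illustration of the flattening described above, here is a minimal Python sketch. The function name `flatten` is hypothetical, and it assumes only top-level values that are objects need to be string-encoded:

```python
import json

def flatten(document: dict) -> dict:
    """Encode every nested object as a JSON string; leave other values alone."""
    return {
        key: json.dumps(value) if isinstance(value, dict) else value
        for key, value in document.items()
    }

nested = {"A": {"A1": 1, "A2": 2}, "B": {"B1": 3, "B2": 4}}
flat = flatten(nested)

# Serializing the flattened document escapes the inner quotes automatically.
print(json.dumps(flat, indent=2))
```

Letting the serializer produce the escapes avoids hand-written backslashes, which are easy to get wrong.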

Since I need to express this constraint in JSON Schema, I am pretty much bound to its syntactical rules. I guess the type for these objects will then be either string or object.

{
  "title": "My Schema",
  "type": "object",
  "properties": {

    "A": {
      "type": "string vs. object"
    },
    "B": {
      "type": "string vs. object"
    }
  }
}

Solution

I agree: you have to choose either the object or the string type. I have looked into the JSON Schema documentation and could not find anything that expresses the constraint as clearly as needed. Hence a short discussion of the two approaches as I see them.

Type String

JSON Schema defines seven primitive types, including object. A string is simply defined as a JSON string. RFC 4627 defines a JSON string as follows:

A string is a sequence of zero or more Unicode characters

This would apply to your case, even though the content of the string has to be restricted. The question is how to communicate the restriction. I would use a description to refer to another subschema. You could even define a pattern for the string and encode the subschema as a regular expression. This, however, would be very error-prone and not human-readable at all. It could, however, be used for stricter validation of the data.

{
  "title": "My Schema",
  "type": "object",
  "properties": {

    "A": {
      "type": "string",
      "description": "Please refer to http://... for the subschema."
    },
    "B": {
      "type": "string",
      "description": "Please refer to http://... for the subschema."
    }
  }
}
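
To show why encoding the subschema as a regular expression is error-prone, here is a small Python sketch. The pattern `PATTERN_A` is a hypothetical example for property "A", and Python's `re` module stands in for the regular expression dialect a schema validator would use:

```python
import json
import re

# Hypothetical pattern for property "A": it hard-codes key order and spacing.
PATTERN_A = re.compile(r'^\{"A1": \d+, "A2": \d+\}$')

canonical = '{"A1": 1, "A2": 2}'
reordered = '{"A2": 2, "A1": 1}'   # same data, different key order
compact = '{"A1":1,"A2":2}'        # same data, no whitespace

for candidate in (canonical, reordered, compact):
    same_data = json.loads(candidate) == json.loads(canonical)
    matches = PATTERN_A.fullmatch(candidate) is not None
    print(f"{candidate!r}: same data={same_data}, matches pattern={matches}")
```

All three strings decode to the same object, but only the first one matches. A pattern that tolerated every valid spelling would quickly become unreadable.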

This has the advantage that it is unmistakably clear that the JSON provider has to put a string into that property. The disadvantages are that the complete schema cannot be viewed at once, the description might be overlooked, and looking up the subschema is cumbersome. In the end it will be confusing to see type string when an object is defined in the subschema.

Type Object

By simply using the type as it is, you avoid all the disadvantages of using a string. The problem here is that the description stating that the value has to be string-encoded may be overlooked.

{
  "title": "My Schema",
  "type": "object",
  "properties": {

    "A": {
      "description": "Must be encoded as string",
      "type": "object",
      "properties": { "A1": { "type": "integer" }, "A2": { "type": "integer" } }
    },
    "B": {
      "description": "Must be encoded as string",
      "type": "object",
      "properties": { "B1": { "type": "integer" }, "B2": { "type": "integer" } }
    }
  }
}

You could always do something completely bogus, like using the type string and defining properties for it, but that will not do what it suggests: the properties keyword only applies to object instances, so it has no effect on a string.


I would recommend the Type Object approach. Even though this constraint exists, using the string type will always degrade the data behind it. Constraints can be enforced in other ways too: watch who provides the data, communicate the constraint to all parties, block data that is not valid with respect to this constraint, etc.

And who knows, maybe this constraint will not be there forever. If that changes, you would need to change the schema again with the string approach, whereas with the object approach you would only need to drop the description stating the string-encoding requirement.
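
One way to enforce the constraint outside the schema is to treat the string encoding as a transport step and decode it before validating against the object schema. The following Python sketch assumes that only objects were string-encoded. The helper name `unflatten` is hypothetical:

```python
import json

def unflatten(document: dict) -> dict:
    """Decode string values that hold a JSON object; keep everything else."""
    result = {}
    for key, value in document.items():
        decoded = value
        if isinstance(value, str):
            try:
                candidate = json.loads(value)
            except json.JSONDecodeError:
                candidate = None  # an ordinary string, not encoded JSON
            # Only objects were flattened, so only dicts are swapped back in.
            if isinstance(candidate, dict):
                decoded = candidate
        result[key] = decoded
    return result

wire = {"A": '{"A1": 1, "A2": 2}', "B": '{"B1": 3, "B2": 4}'}
restored = unflatten(wire)
print(restored)
```

The restored document has the nested shape that the object schema describes, so it can be validated without the schema knowing about the encoding.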

Other tips

I know you've already selected an answer, but I thought I'd just mention the principles at work here.

JSON Schema tries to avoid doing "semantic" validation - by that, they mean validation of data within scalar types (such as enforcing string formats or numerical precision).

If you want to document the internal format of a string value like this, you can either use a "format" value (a custom one presumably, as there is no suitable one in the standard).

... OR you could use "media". The value of this schema keyword is an object, which can have a property "type", which specifies a media type for string values. So your properties would look something like:

{
    "type": "string",
    "media": {
        "type": "application/json"
    }
}

Validators are allowed to ignore "format", and are highly unlikely to validate using "media" (it's not even tangentially mentioned), but in terms of describing your flattened data format, "media" is the most accurate way to go.
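
Since validators are unlikely to act on "media", a consumer that wants to honor it would need its own check. The Python sketch below is only an assumption about how such a check could look. `check_media` is a hypothetical helper, not part of any validator library:

```python
import json

def check_media(schema: dict, value) -> bool:
    """Hypothetical check: honor "media": {"type": "application/json"} on strings."""
    if schema.get("type") == "string" and not isinstance(value, str):
        return False
    media_type = schema.get("media", {}).get("type")
    if media_type == "application/json":
        try:
            json.loads(value)  # the string must itself be parseable JSON
        except (json.JSONDecodeError, TypeError):
            return False
    return True

schema = {"type": "string", "media": {"type": "application/json"}}

print(check_media(schema, '{"A1": 1, "A2": 2}'))
print(check_media(schema, '{"A1": 1,'))
```

This only confirms that the string contains well-formed JSON. Checking the decoded content against a subschema would be a separate step.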

(However, as the accepted answer states, treating this as a bizarre serialisation method rather than a data constraint is in many ways a more elegant solution.)

Licensed under: CC-BY-SA with attribution