Suggestions for structuring complex JSON structures?

https://softwareengineering.stackexchange.com/questions/307206

11-12-2020

Question

I can't find many tips on how to design complex JSON structures beyond the obvious ones: don't nest too deeply, use defined data types, etc.

For example, if I have a location that needs to have security scanning done on all of its segments and devices within the segments, there are many options of how I could do this.

{
    "site": "Site 1",
    "segments": [
        {
            "name": "Segment 1",
            "devices": [
                {
                    "name": "Device 1",
                    "scans": [
                        {
                            "type": "discovery",
                            "date": "2016-01-12",
                            "phase": "10",
                            "remediate": "0"
                        },
                        {} ...
                    ]
                },
                {} ...
            ]
        },
        {} ...
    ]
}

For this example, a few questions come to mind:

Is it okay to use the property "name" twice, since they are on different levels? I've read that it's better to keep the property names short for parsers. Therefore, should you use it twice? Or change them to "seg_name" and "dev_name", for example?
You can see a clear pattern for "segments", "devices", and "scans" where they are each an array of objects.

I could change it to something like this:

{
    "site": "Site 1",
    "segments": {
        "Segment 1": {
            "Device 1": {
                "discovery": {
                    "date": "2016-01-12",
                    "phase": "10",
                    "remediate": "0"
                },
                "exploit": {} ...
            },
            "Device 2": {} ...
        },
        "Segment 2": {} ...
    }
}

The issue I can see popping up with this format is that if you wanted to have a property for all of the segments, you would have to put it at the root level, instead of inside the "segments" property, since the property name could possibly conflict with a segment name. However, it is less nested, which is a plus.
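A minimal sketch of that conflict, assuming a hypothetical `lastAudit` property that is meant to apply to all segments (the name is invented for illustration): once it sits inside "segments", any code that treats every key as a segment name picks it up as well.

```javascript
// Keyed format from the second example, with a hypothetical
// "lastAudit" property placed next to the segment names.
const site = {
  site: "Site 1",
  segments: {
    lastAudit: "2016-01-12", // metadata, not a segment
    "Segment 1": { "Device 1": {} },
    "Segment 2": { "Device 2": {} },
  },
};

// A consumer that assumes every key is a segment name
// also gets the metadata key back.
const segmentNames = Object.keys(site.segments);
console.log(segmentNames); // [ 'lastAudit', 'Segment 1', 'Segment 2' ]
```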

I'm wondering if there are guidelines on which situations are best suited to which format.

If it's really dependent on what language you are using it for, I would be sending the data between JavaScript and PHP.

Solution

I'm no expert on JS or PHP so there might be some caveats I'm not aware of, but these are some ideas that come to my mind when I see this.

First off, I think your first idea is great. To answer the points you are raising:

Is it okay to use the property "name" twice, since they are on different levels? I've read that it's better to keep the property names short for parsers. Therefore, should you use it twice? Or change them to "seg_name" and "dev_name", for example?

If I have a type MyType and want to store its name, I'd really expect it to be MyType.Name, not MyType.MyTypeName. Generally, a property should not be concerned with what its parent is; that's the parent's responsibility.

In JSON, key-value pairs are scoped to the object that contains them, so json["key1"].Name and json["key1"].Elements[0].Name are two separate entities living in different nodes. If some parser doesn't respect this, it's probably broken far beyond what is acceptable for JSON anyway.

On the other hand, if parser speed is the only reason, I don't think adding a few characters to a property name would make much of a difference.
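As a small illustration of the scoping point, here is a sketch in JavaScript using a trimmed-down version of the first structure from the question. The two "name" properties live in different objects, so they never clash.

```javascript
// Trimmed-down version of the first structure from the question.
const payload = JSON.parse(`{
  "site": "Site 1",
  "segments": [
    {
      "name": "Segment 1",
      "devices": [ { "name": "Device 1", "scans": [] } ]
    }
  ]
}`);

const segment = payload.segments[0];
const device = segment.devices[0];

// Same property name, different objects, different values.
console.log(segment.name); // Segment 1
console.log(device.name);  // Device 1
```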

You can see a clear pattern for "segments", "devices", and "scans" where they are each an array of objects.

The correct way to model a 1:N relationship (i.e. one segment has N devices) is a collection, which in the case of JSON is an array of objects, so you were originally on the right track.

If I wanted to go through all the devices on a given segment, I'd rather do it in a simple loop over an array than check whether device1 is defined on the segment, then whether device2 is defined on the segment, ...
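A rough sketch of what that loop could look like in JavaScript, assuming the array-based layout from the question (the sample data is made up):

```javascript
// One segment in the array-based layout; the data is illustrative only.
const segment = {
  name: "Segment 1",
  devices: [
    { name: "Device 1", scans: [{ type: "discovery", date: "2016-01-12" }] },
    { name: "Device 2", scans: [] },
  ],
};

// One plain loop visits every device, however many there are,
// without knowing any device name in advance.
const summary = [];
for (const device of segment.devices) {
  summary.push(`${device.name}: ${device.scans.length} scan(s)`);
}

console.log(summary); // [ 'Device 1: 1 scan(s)', 'Device 2: 0 scan(s)' ]
```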

Concerning the deep nesting, the issue with that is most likely human readability. This can be mitigated by splitting the JSON into subparts, with child nodes in the current structure becoming the new roots.

To achieve that, it might be a good idea to think about how you query and send the data instead of formatting it differently (or even making it poorly structured). Consider this communication scenario:

Client queries available segments
Server replies with a list of available segmentIds
Client queries a segment by segmentId
Server replies with the segment data which in turn contains deviceIds
Client queries a device by segmentId and deviceId

...

This is obviously the extreme opposite of what you have right now: a very heavy client-server ping-pong that would be really inefficient in the real world.
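To make the exchange above more concrete, here is a sketch with an in-memory object standing in for the server. The function names, the ids (`seg1`, `dev1`) and the response shapes are all assumptions for illustration; a real implementation would put an HTTP endpoint behind each function.

```javascript
// In-memory stand-in for the server's data; ids and shapes are assumptions.
const db = {
  seg1: {
    name: "Segment 1",
    devices: {
      dev1: { name: "Device 1", scans: [{ type: "discovery", date: "2016-01-12" }] },
    },
  },
};

// Step 1: list the available segment ids.
function listSegmentIds() {
  return Object.keys(db);
}

// Step 2: return one segment, with device ids instead of nested device data.
function getSegment(segmentId) {
  const segment = db[segmentId];
  return { name: segment.name, deviceIds: Object.keys(segment.devices) };
}

// Step 3: return one device, including its scans.
function getDevice(segmentId, deviceId) {
  return db[segmentId].devices[deviceId];
}

// The client walks down one level per request.
const [segmentId] = listSegmentIds();
const segment = getSegment(segmentId);
const device = getDevice(segmentId, segment.deviceIds[0]);

console.log(JSON.stringify(segment)); // {"name":"Segment 1","deviceIds":["dev1"]}
console.log(device.name); // Device 1
```

Each response here is shallow, at the cost of three round trips to reach a single device.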

It's up to you to decide how each payload should look, considering both the number of requests needed to acquire a sensible amount of data and the size of the data in each response.

But this really feels to me like more of a networking architecture problem, not really connected to whether you serialize to JSON, XML or binary.

If you are really concerned with the width due to nesting, there is no globally accepted standard. There was an 80-character width "limit" in the early days due to terminal screen size. Apart from strict hardware limits, I don't think it's necessary to stick to a particular number; maybe around 120-150 characters could be a nice guideline these days.

However, everything to the left of your values in JSON is insignificant whitespace, so if you press your decrease-indent shortcut a few times, nothing bad happens. Given that your JSON will be handled by computers most of the time anyway, I think you can drop the readability requirement altogether in favor of a good design.
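A quick way to convince yourself of this in JavaScript, using the standard `JSON.stringify` and `JSON.parse` (the sample object is made up): the indented and the compact form differ in size but parse back to the same data.

```javascript
// Illustrative data in the array-based layout.
const data = { site: "Site 1", segments: [{ name: "Segment 1", devices: [] }] };

const pretty = JSON.stringify(data, null, 4); // indented with 4 spaces
const compact = JSON.stringify(data);         // no whitespace at all

// Parsing both and re-serializing compactly yields identical strings,
// so the indentation carried no information.
const same =
  JSON.stringify(JSON.parse(pretty)) === JSON.stringify(JSON.parse(compact));

console.log(pretty.length > compact.length); // true
console.log(same); // true
```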

Licensed under: CC-BY-SA with attribution

Not affiliated with softwareengineering.stackexchange