Whether to embed linked resources in REST API

https://softwareengineering.stackexchange.com/questions/313079

13-12-2020

Question

I am building a REST API where clients can query user-sent messages, like this:

GET http://example.com/api/v1/messages?from=0&to=100

Response:

[
    {
        "id": 12345,
        "text": "Hello, world!"
    },
    {
        "id": 12346,
        "text": "Testing, testing"
    },
    ...
]

Now I need to include the name of the user who sent each message. I can think of four ways to represent this information, but I can't decide which is best:

Option 1:

[
    {
        "id": 12345,
        "sender_id": 16,
        "text": "Hello, world!"
    }
]

This method is the most efficient at scale: if the client queries the API many times, it can cache a map of user IDs to names and reuse it. However, for one-off queries, it doubles the number of API calls the client has to make (one for the message list, and another to find the name for a given user ID).
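The caching idea described for option 1 can be sketched roughly as follows. This is a minimal Python illustration, not part of the original question: `UserNameCache`, `attach_sender_names`, and the `fetch_name` callback are hypothetical names, and the callback stands in for whatever extra request resolves a user ID to a name.

```python
from typing import Callable, Dict, List


class UserNameCache:
    """Caches user ID -> name so each user is looked up at most once."""

    def __init__(self, fetch_name: Callable[[int], str]) -> None:
        # fetch_name is a hypothetical stand-in for the extra API call.
        self._fetch_name = fetch_name
        self._names: Dict[int, str] = {}
        self.lookups = 0  # how many times the extra call was actually made

    def name_for(self, user_id: int) -> str:
        if user_id not in self._names:
            self.lookups += 1
            self._names[user_id] = self._fetch_name(user_id)
        return self._names[user_id]


def attach_sender_names(messages: List[dict], cache: UserNameCache) -> List[dict]:
    """Returns copies of the messages with a sender_name field added."""
    return [{**m, "sender_name": cache.name_for(m["sender_id"])} for m in messages]


# Example data in the shape of option 1; the lambda fakes the user lookup.
messages = [
    {"id": 12345, "sender_id": 16, "text": "Hello, world!"},
    {"id": 12346, "sender_id": 16, "text": "Testing, testing"},
]
cache = UserNameCache(lambda user_id: {16: "John Smith"}[user_id])
named = attach_sender_names(messages, cache)
```

With two messages from the same sender, the lookup runs once and the second message is served from the cache, which is the saving the paragraph above describes.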

Option 2:

[
    {
        "id": 12345,
        "sender_name": "John Smith",
        "text": "Hello, world!"
    }
]

This method is simplest for client consumption, but if the API ever needs to be changed to include the sender ID (e.g. for linking from message to user), I would need to have two "sender" fields in the message object, sender_id and sender_name, which is essentially a worse version of option 3.

Option 3:

[
    {
        "id": 12345,
        "sender": {
            "id": 16,
            "name": "John Smith"
        },
        "text": "Hello, world!"
    }
]

This approach embeds the sender object into each message, which makes it future-proof and only requires a single API call per query. However, it adds a lot of redundant information if there are many messages and few users (for example, querying all the messages sent by a single user).

Option 4:

{
    "users": [
        {
            "id": 16,
            "name": "John Smith"
        },
        ...
    ],
    "messages": [
        {
            "id": 12345,
            "sender_id": 16,
            "text": "Hello, world!"
        }
    ]
}

This solves the redundancy problem of option 3, but it adds a lot of complexity to the structure of the response object.
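To give a sense of the extra client-side work option 4 implies, here is a small Python sketch that joins each message with its sender from the side-loaded `users` list. The function name `resolve_senders` is hypothetical; the data is the example above with the elided entries left out.

```python
from typing import Dict, List

# Response in the shape of option 4 (values taken from the example above).
response = {
    "users": [{"id": 16, "name": "John Smith"}],
    "messages": [{"id": 12345, "sender_id": 16, "text": "Hello, world!"}],
}


def resolve_senders(payload: dict) -> List[dict]:
    """Joins each message with its sender from the side-loaded users list."""
    users_by_id: Dict[int, dict] = {u["id"]: u for u in payload["users"]}
    resolved = []
    for message in payload["messages"]:
        # Copy every field except sender_id, then attach the full user object.
        merged = {k: v for k, v in message.items() if k != "sender_id"}
        merged["sender"] = users_by_id[message["sender_id"]]
        resolved.append(merged)
    return resolved


resolved = resolve_senders(response)
```

The result has the same shape as option 3, so the complexity is a single indexing step on the client rather than repeated sender objects on the wire.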

Solution

You should probably go for option 1, but made more RESTful. It may also make sense to provide something like option 3/4. There's nothing forcing you to pick only one.

However, what you should do is replace ids with links. Instead of having a "sender_id", you should just have a URI pointing at that user resource. Notice how this means I don't need to know what URI to append the "sender_id" to or how to format it. Unfortunately, JSON does not have a type for links. You will want to look at hypermedia formats embedded in JSON, e.g. HAL or JSON-LD.
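As a rough illustration of replacing `sender_id` with a link, the Python sketch below builds a message in a HAL-style shape, where links live under a reserved `_links` property and each link object carries an `href`. The `sender` relation name, the helper names, and the URIs are assumptions made for illustration; the answer does not specify them.

```python
import json


def message_with_links(message_id: int, text: str, self_href: str, sender_href: str) -> dict:
    """Builds a HAL-style message; 'sender' is a hypothetical link relation."""
    return {
        "id": message_id,
        "text": text,
        "_links": {
            "self": {"href": self_href},
            "sender": {"href": sender_href},
        },
    }


def sender_url(message: dict) -> str:
    """The client follows the link as given instead of building a URI from an ID."""
    return message["_links"]["sender"]["href"]


# Hypothetical URIs; the server could change them without breaking this client.
msg = message_with_links(
    12345,
    "Hello, world!",
    "http://example.com/api/v1/messages/12345",
    "http://example.com/api/v1/users/16",
)
print(json.dumps(msg, indent=4))
```

The client code only knows the relation name `sender`, not the URI layout, which is the decoupling the paragraph above argues for.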

Now, you are right that if I want to fetch the record, I need to make two requests. The good news, though, is that those requests will be cached by the normal mechanisms of HTTP. A content delivery network can even cache resources across users, reducing load on your servers and latency for the users. Of course, I'm not precluded from having a resource that returns all the users, or the data for all the users given a list of user URIs, so that I can batch requests. You could consider OData as an organized means of providing such query capabilities, but you can certainly do something simpler. For example, when returning the messages, you could also add a link to, say, "recent contacts" that has the expanded data for all the users in the current list of messages. Notice how I say provide a link and not add a messages/users/ resource. You may well add such a resource, but I shouldn't need to know your URI structure to find it. This makes it easier for me and allows you to change your URIs in the future without breaking consumers.

Sometimes there will be transactional guarantees you'll want to enforce. For example, in the above, I could get the messages, then request the recent contacts, and that list may differ from the list of sender URIs I have in the messages due to a race condition. In cases where it is important to guarantee that kind of atomicity, you should embed the resources. Some of the hypermedia formats I referenced above provide a mechanism for that. Usually, these situations will naturally correspond to a new resource. Remember, you do not need a 1-1 correspondence between resources and "entities" or whatever. In fact, such a 1-1 correspondence probably means you are doing something wrong. Overall, though, you should try to minimize points that need such transactional guarantees. In the above example, it would be unnecessary, since even if a sender was missing from that list of "recent contacts", I could just request their information individually.
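For the case where atomicity matters, HAL is one format that supports embedding: related resources go under a reserved `_embedded` property, so the message and its sender arrive in the same response. The following Python sketch assumes a hypothetical `sender` relation, hypothetical helper names, and made-up URIs.

```python
def message_with_embedded_sender(message: dict, sender: dict) -> dict:
    """Returns a HAL-style copy of the message with the sender embedded."""
    return {
        **message,
        "_embedded": {"sender": sender},
    }


def sender_name(message: dict) -> str:
    """Reads the name from the embedded copy; no second request is needed."""
    return message["_embedded"]["sender"]["name"]


# Hypothetical resources; in HAL an embedded resource can carry its own links.
sender = {
    "name": "John Smith",
    "_links": {"self": {"href": "http://example.com/api/v1/users/16"}},
}
message = {
    "id": 12345,
    "text": "Hello, world!",
    "_links": {"self": {"href": "http://example.com/api/v1/messages/12345"}},
}
embedded = message_with_embedded_sender(message, sender)
```

Because the embedded sender still has a `self` link, a client can treat it as a snapshot and re-fetch the canonical resource later if it needs fresher data.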

Other tips

I think it depends on the needs as well. Let's say the client app shows a list of 100 messages, and each message has to display its sender. If we place all the required data in the JSON (i.e. sender ID and sender name), we make just one request. If there are links to senders instead, we make 101 requests. Don't tell me this is efficient :)

But I agree that URIs are good practice, for example when you implement a feature where the user can click the sender's name and see more information about them. So my take on this structure would be the following:

[
    {
        "id": 12345,
        "sender": {
            "link": "http://www.api.example.com/users/16",
            "name": "John Smith"
        },
        "text": "Hello, world!"
    }
]

Licensed under: CC-BY-SA with attribution

Not affiliated with softwareengineering.stackexchange