Is it a bad idea to pass JSON objects on the query string for an API “search” operation?

https://softwareengineering.stackexchange.com/questions/403251

06-03-2021

Question

I'm building an API endpoint for a UI grid to search, filter, and display a list of domain objects, let's call them "widgets." In the past, I would have built this with a list of named query string parameters, like this:

GET /api/v1/widgets?type=2&name=what&from=2019-12-31&to=2020-01-03&pagesize=25&page=2&sort=name,-createdate

This would result in SQL something like

SELECT <selectlist>
FROM widget
WHERE type = 2
    AND name LIKE 'what%'
    AND createdate >= '2019-12-31'
    AND createdate <= '2020-01-03'
ORDER BY name ASC, createdate DESC
LIMIT 25, 25;
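
As a rough illustration of how the named parameters map onto the ORDER BY and LIMIT clauses above, here is a minimal sketch. The helper names `parseSort` and `parsePaging` and the list of sortable columns are assumptions made for this example, and the LIMIT form is the MySQL-style `LIMIT offset, count`:

```javascript
// Columns the grid may sort by (assumed for this example).
const SORTABLE = new Set(["name", "createdate"]);

// Turn "name,-createdate" into "name ASC, createdate DESC".
function parseSort(sort) {
  return sort
    .split(",")
    .map((term) => {
      const descending = term.startsWith("-");
      const column = descending ? term.slice(1) : term;
      if (!SORTABLE.has(column)) throw new Error(`Cannot sort by: ${column}`);
      return `${column} ${descending ? "DESC" : "ASC"}`;
    })
    .join(", ");
}

// Turn a 1-based page number and a page size into an offset and a row count.
function parsePaging(page, pagesize) {
  const size = Number.parseInt(pagesize, 10);
  const number = Number.parseInt(page, 10);
  if (!(size > 0) || !(number > 0)) throw new Error("Invalid paging");
  // Page 2 with 25 rows per page skips the first 25 rows.
  return { offset: (number - 1) * size, count: size };
}

const query = new URLSearchParams("type=2&name=what&pagesize=25&page=2&sort=name,-createdate");
console.log(`ORDER BY ${parseSort(query.get("sort"))}`);
console.log(parsePaging(query.get("page"), query.get("pagesize")));
```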

I've had a co-worker propose that instead of a long list of parameters, we pass a couple of JSON objects on the query string, like this:

"filters": [
    { "column": "type", "operator": "=", "value": 2 },
    { "column": "name", "operator": "like", "value": "what" },
    { "column": "createdate", "operator": ">=", "value": "2019-12-31" },
    { "column": "createdate", "operator": "<=", "value": "2020-01-03" }
],
"pagination": {
    "page": "2",
    "pagesize": "25",
    "sort": [ "name", "createdate" ],
    "sortdirection": [ "asc", "desc" ]
}

Which, after encoding, looks something like this as a URL (sorry if I messed up the encoding, I made this up as an example):

GET /api/v1/widgets?filters=[%7B%22column%22:%22type%22,%22operator%22:%22%3D%22,%22value%22:2%7D,%7B%22column%22:%22name%22,%22operator%22:%22like%22,%22value%22:%22what%22%7D,%7B%22column%22:%22createdate%22,%22operator%22:%22%3E%3D%22,%22value%22:%222019-12-31%22%7D,%7B%22column%22:%22createdate%22,%22operator%22:%22%3C%3D%22,%22value%22:%222020-01-03%22%7D]&pagination=%7B%22page%22:%222%22,%22pagesize%22:%2225%22,%22sort%22:[%22name%22,%22createdate%22],%22sortdirection%22:[%22asc%22,%22desc%22]%7D
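
The encoding does not have to be done by hand. A minimal sketch, assuming `filters` is a JSON array and `pagination` a JSON object; `buildSearchUrl` and `parseSearchQuery` are hypothetical helper names, while `URLSearchParams` and `URL` are the standard URL APIs available in browsers and Node.js:

```javascript
// Hypothetical helper: serialize the two objects into a query string.
function buildSearchUrl(basePath, filters, pagination) {
  const params = new URLSearchParams();
  // URLSearchParams percent-encodes the JSON text when it is serialized.
  params.set("filters", JSON.stringify(filters));
  params.set("pagination", JSON.stringify(pagination));
  return `${basePath}?${params.toString()}`;
}

// Hypothetical helper: recover the objects on the receiving side.
function parseSearchQuery(url) {
  // The base is only needed because `url` is a relative path.
  const { searchParams } = new URL(url, "http://localhost");
  return {
    filters: JSON.parse(searchParams.get("filters")),
    pagination: JSON.parse(searchParams.get("pagination")),
  };
}

const filters = [
  { column: "type", operator: "=", value: 2 },
  { column: "name", operator: "like", value: "what" },
];
const pagination = { page: 2, pagesize: 25, sort: ["name"], sortdirection: ["asc"] };

const url = buildSearchUrl("/api/v1/widgets", filters, pagination);
console.log(url.length, url);
console.log(parseSearchQuery(url));
```

Printing the length makes it easy to see how quickly the encoded form grows as filters are added.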

We are in JavaScript on both client and server, so parsing the JSON object is not a difficult task. And I realize that the structure of the JSON object for querying would have to be altered, as it does not account for some things. Disregarding that, the basic question is:

Is this a bad/good idea? I see advantages and disadvantages, but I'm trying to look past my personal bias.

Solution

It's not a brilliant idea. The URI is not really a good place for data of unpredictable length, and although there is no 'official' maximum length, many web servers and clients apply their own limit (IIS limits the query string to 2,048 bytes by default, and the often-quoted figure of 2,083 characters is Internet Explorer's maximum URL length). Some web servers also restrict which characters are acceptable in URIs.

There are also other considerations that are generic to URIs. Are the contents of the query in any way sensitive? Many servers log URIs and query string parameters, which may leave sensitive data in places you don't want it.

For this kind of variable-length data, using HTTP POST would be the best solution.
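
A minimal sketch of what the POST variant could look like on the client. The `/api/v1/widgets/search` path and the `buildSearchRequest` helper are assumptions for illustration, not part of the original API:

```javascript
// Hypothetical helper: describe a POST search request without sending it.
function buildSearchRequest(filters, pagination) {
  return {
    url: "/api/v1/widgets/search", // assumed endpoint
    options: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      // A request body is not subject to URL length limits and
      // usually stays out of access logs.
      body: JSON.stringify({ filters, pagination }),
    },
  };
}

const { url, options } = buildSearchRequest(
  [{ column: "name", operator: "like", value: "what" }],
  { page: 2, pagesize: 25 }
);
console.log(url, options.method, options.body);
// In a browser or Node.js 18+, this could be sent with: fetch(url, options)
```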

If you did still want to use the query string, another option would be to Base64-encode the JSON string (using the URL-safe variant, since standard Base64 can emit +, / and =), meaning you wouldn't need to worry about escaping spaces and special characters. This would mean the URI was not 'human readable', but I would argue that URI-encoded JSON isn't particularly easy to read either.
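
A minimal sketch of the Base64 option, assuming a recent Node.js version whose `Buffer` supports the "base64url" encoding; `encodeQueryJson` and `decodeQueryJson` are hypothetical helper names. Keep in mind that Base64 output is roughly a third longer than its input, so it does not help with length limits:

```javascript
// "base64url" (RFC 4648 section 5) uses "-" and "_" instead of "+" and "/"
// and omits "=" padding, so the result needs no percent-encoding.
function encodeQueryJson(value) {
  return Buffer.from(JSON.stringify(value), "utf8").toString("base64url");
}

function decodeQueryJson(encoded) {
  return JSON.parse(Buffer.from(encoded, "base64url").toString("utf8"));
}

const filters = [{ column: "createdate", operator: ">=", value: "2019-12-31" }];
const encoded = encodeQueryJson(filters);
console.log(`/api/v1/widgets?filters=${encoded}`);
console.log(decodeQueryJson(encoded));
```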

Other tips

What's bad is passing massive amounts of data in the URI, so a POST would be better in my opinion, because of possible size limitations, problems passing arbitrary data, and security concerns. That's of course a simple change: you generate the JSON, then you decide how to send it.

Check with the database guys that it is not difficult to generate efficient queries from the data in your JSON. I assume it isn't. And generating the JSON shouldn't be difficult. JSON gives you a chance to create more complex queries if you feel the need.
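
A minimal sketch of that translation, assuming the filter structure from the question, MySQL-style `?` placeholders, and a hypothetical `buildWhereClause` helper. Column and operator names are checked against allow-lists because they end up in the SQL text, while values travel as bound parameters:

```javascript
// Allow-lists: only these columns and operators may reach the SQL text.
// The column names come from the example in the question.
const ALLOWED_COLUMNS = new Set(["type", "name", "createdate"]);
const ALLOWED_OPERATORS = new Map([
  ["=", "="],
  [">=", ">="],
  ["<=", "<="],
  ["like", "LIKE"],
]);

function buildWhereClause(filters) {
  const clauses = [];
  const values = [];
  for (const { column, operator, value } of filters) {
    if (!ALLOWED_COLUMNS.has(column)) throw new Error(`Unknown column: ${column}`);
    const sqlOperator = ALLOWED_OPERATORS.get(operator);
    if (!sqlOperator) throw new Error(`Unknown operator: ${operator}`);
    clauses.push(`${column} ${sqlOperator} ?`);
    // Values are never concatenated into the SQL; "like" becomes a prefix match.
    values.push(operator === "like" ? `${value}%` : value);
  }
  return { where: clauses.length ? `WHERE ${clauses.join(" AND ")}` : "", values };
}

const { where, values } = buildWhereClause([
  { column: "type", operator: "=", value: 2 },
  { column: "name", operator: "like", value: "what" },
  { column: "createdate", operator: ">=", value: "2019-12-31" },
]);
console.log(where);
console.log(values);
```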

I went through almost the same analysis with a former coworker. He was adding a new API endpoint and wanted to pass a complex JSON object in the query string instead of separate parameters.

Like yourself, I had reservations.

Exactly what the pros and cons are will depend on your specific product needs, architecture, team structure, and other factors.

In our case, I found the tradeoffs were simply:

Pro: Saved half a line of code in the API implementation

Cons:

Swagger (our API documentation tool) would have no intuitive way to document the structure of this API. API documentation, accessibility, and discoverability were important to us.
Inconsistent with the rest of the world. Adhering to common conventions and expectations simplifies things and helps avoid mistakes.

That said, I don't think this pattern is necessarily bad. In certain cases, like building up a dynamic set of search filters, it might make sense. Your use case might be a good candidate for this.

I don't totally agree with the pushback against GET that you have received in other answers. GET is the semantically correct HTTP verb because you are getting data. You're not mutating any state. It's idempotent and cacheable. POST is not.

GET requests can accept a body, just like POST requests, and POST requests can accept a query string. There is a misconception that GET means query string and POST means body. They are orthogonal.

Factors I'd consider in your situation:

Documentability, if this is important to your organization and/or product
Tooling - Most HTTP routing frameworks have middleware to automatically parse query string parameters and expose them as a map. For example, in a Node.js Express app, I can check req.query.filters to get the filters. You'd have to do your own parsing if you go the JSON route, but that's not hard. It's a simple JSON.parse call.
Validation - There are modules which can automatically validate a request based on a schema you supply. Moving your inputs to a JSON object may create a black box, forcing you to validate inputs yourself (and we all know that inadequate input validation is one of the leading causes of security vulnerabilities).
Complexity - Do you really need to support infinite filters or even a dynamic list of filters? Do the operators need to be configurable like that? This JSON object design is highly extensible. If there are only 4 things to filter on and they always work the same way, hard coding will save you a ton of time.
Testing overhead - following from the complexity point above, every combination of criteria needs to be tested. If your API allows for any operator across any field, you need to test each one of these cases. If your clients only use the API a single way, then you have a bunch of unused use cases you are stuck supporting. For example, if the frontend always performs a wildcard search, then testing the = case for name is a waste because it's never really used, but your API supports it, so it better work.
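
To make the tooling and validation points concrete, here is a minimal sketch of the parsing and shape checking you would own with the JSON approach. It assumes the raw value has already been percent-decoded by the framework (as with req.query.filters); `parseFilters`, the operator list, and the `MAX_FILTERS` limit are hypothetical:

```javascript
// Hypothetical limits; tune them to what the UI grid actually needs.
const MAX_FILTERS = 10;
const OPERATORS = ["=", ">=", "<=", "like"];

// Parse the raw `filters` value and check its shape before anything else uses it.
function parseFilters(raw) {
  let parsed;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return { ok: false, error: "filters is not valid JSON" };
  }
  if (!Array.isArray(parsed) || parsed.length > MAX_FILTERS) {
    return { ok: false, error: `filters must be an array of at most ${MAX_FILTERS} items` };
  }
  for (const f of parsed) {
    const valid =
      f !== null &&
      typeof f === "object" &&
      typeof f.column === "string" &&
      OPERATORS.includes(f.operator) &&
      ["string", "number"].includes(typeof f.value);
    if (!valid) return { ok: false, error: "malformed filter entry" };
  }
  return { ok: true, filters: parsed };
}

console.log(parseFilters('[{"column":"type","operator":"=","value":2}]'));
console.log(parseFilters('{"column":"type"}'));
console.log(parseFilters("not json"));
```

Every branch here is a case that also needs a test, which is the overhead described above.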

Despite all my concerns with this approach, Elasticsearch does the JSON pattern and it works quite well for them. But Elasticsearch is a search engine. The set of filters being passed in, operators, etc. need to be dynamic because the JSON structure is actually meant to be a search query. Elasticsearch supports any sort of user-defined schema, so they expose a general query language as JSON.

If you have a few inputs on a web page and they map directly to known SQL predicates, you may be going overkill with the JSON pattern.

This is why it really matters what your product actually is.

Solve for the problem you have, not a problem that you may have some day.

Note that you don't need to use the standard form data format (with & and =; see the URL Standard, section 5) for the query string (see RFC 7320, section 2.4). You can just make up a format, as long as it only uses allowable characters: letters, numbers, -, ., _, ~, !, $, &, ', (, ), *, +, ,, ;, =, :, @, /, and ? (see RFC 3986). Anything else has to be percent-encoded.
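
As an illustration only, here is a sketch of one such made-up format, with entries separated by ";" and fields by ":". The format itself, the operator abbreviations, and the `parseCompactFilters` helper are all invented for this example:

```javascript
// Example query string: type:eq:2;name:like:what;createdate:ge:2019-12-31
function parseCompactFilters(query) {
  if (query === "") return [];
  return query.split(";").map((entry) => {
    const [column, operator, ...rest] = entry.split(":");
    if (!column || !operator || rest.length === 0) {
      throw new Error(`Malformed filter: ${entry}`);
    }
    // Rejoin so that values may contain ":"; a literal ";" in a value must be
    // sent as %3B, which is decoded only after splitting.
    return { column, operator, value: decodeURIComponent(rest.join(":")) };
  });
}

console.log(parseCompactFilters("type:eq:2;name:like:what;createdate:ge:2019-12-31"));
```

The trade-off is that standard query string middleware will not understand such a format, so the parsing and its tests are entirely yours.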

Licensed under: CC-BY-SA with attribution

Not affiliated with softwareengineering.stackexchange