Question

Is there a (roughly) SQL or XQuery-like language for querying JSON?

I'm thinking of very small datasets that map nicely to JSON where it would be nice to easily answer queries such as "what are all the values of X where Y > 3" or to do the usual SUM / COUNT type operations.

As completely made-up example, something like this:

[{"x": 2, "y": 0}}, {"x": 3, "y": 1}, {"x": 4, "y": 1}]

SUM(X) WHERE Y > 0     (would equate to 7)
LIST(X) WHERE Y > 0    (would equate to [3,4])

I'm thinking this would work both client-side and server-side with results being converted to the appropriate language-specific data structure (or perhaps kept as JSON)

A quick Googling suggests that people have thought about it and implemented a few things (JAQL), but it doesn't seem like a standard usage or set of libraries has emerged yet. While each function is fairly trivial to implement on its own, if someone has already done it right I don't want to re-invent the wheel.

Any suggestions?

Edit: This may indeed be a bad idea or JSON may be too generic a format for what I'm thinking.. The reason for wanting a query language instead of just doing the summing/etc functions directly as needed is that I hope to build the queries dynamically based on user-input. Kinda like the argument that "we don't need SQL, we can just write the functions we need". Eventually that either gets out of hand or you end up writing your own version of SQL as you push it further and further. (Okay, I know that is a bit of a silly argument, but you get the idea..)

Was it helpful?

Solution

Sure, how about:

They all seem to be a bit work in progress, but work to some degree. They are also similar to XPath and XQuery conceptually; even though XML and JSON have different conceptual models (hierarchic vs object/struct).

EDIT Sep-2015: Actually there is now JSON Pointer standard that allows very simple and efficient traversal of JSON content. It is not only formally specified, but also supported by many JSON libraries. So I would call it actual real useful standard, although due to its limited expressiveness it may or may not be considered Query Language per se.

OTHER TIPS

I'd recommend my project I'm working on called jLinq. I'm looking for feedback so I'd be interested in hearing what you think.

If lets you write queries similar to how you would in LINQ...

var results = jLinq.from(records.users)

    //you can join records
    .join(records.locations, "location", "locationId", "id")

    //write queries on the data
    .startsWith("firstname", "j")
    .or("k") //automatically remembers field and command names

    //even query joined items
    .equals("location.state", "TX")

    //and even do custom selections
    .select(function(rec) {
        return {
            fullname : rec.firstname + " " + rec.lastname,
            city : rec.location.city,
            ageInTenYears : (rec.age + 10)
        };
    });

It's fully extensible too!

The documentation is still in progress, but you can still try it online.

Update: XQuery 3.1 can query either XML or JSON - or both together. And XPath 3.1 can too.

The list is growing:

jmespath works really quite easy and well, http://jmespath.org/ It is being used by Amazon in the AWS command line interface, so it´s got to be quite stable.

jq is a JSON query language, mainly intended for the command-line but with bindings to a wide range of programming languages (Java, node.js, php, ...) and even available in the browser via jq-web.

Here are some illustrations based on the original question, which gave this JSON as an example:

 [{"x": 2, "y": 0}}, {"x": 3, "y": 1}, {"x": 4, "y": 1}]

SUM(X) WHERE Y > 0 (would equate to 7)

map(select(.y > 0)) | add

LIST(X) WHERE Y > 0 (would equate to [3,4])

map(.y > 0)

jq syntax extends JSON syntax

Every JSON expression is a valid jq expression, and expressions such as [1, (1+1)] and {"a": (1+1)}` illustrate how jq extends JSON syntax.

A more useful example is the jq expression:

{a,b}

which, given the JSON value {"a":1, "b":2, "c": 3}, evaluates to {"a":1, "b":2}.

The built-in array.filter() method makes most of these so-called javascript query libraries obsolete

You can put as many conditions inside the delegate as you can imagine: simple comparison, startsWith, etc. I haven't tested but you could probably nest filters too for querying inner collections.

If you are using .NET then Json.NET supports LINQ queries over the top of JSON. This post has some examples. It supports filtering, mapping, grouping, etc.

ObjectPath is simple and ligthweigth query language for JSON documents of complex or unknown structure. It's similar to XPath or JSONPath, but much more powerful thanks to embedded arithmetic calculations, comparison mechanisms and built-in functions.

Example

Python version is mature and used in production. JS is still in beta.

Probably in the near future we will provide a full-fledged Javascript version. We also want to develop it further, so that it could serve as a simpler alternative to Mongo queries.

Another way to look at this would be to use mongoDB You can store your JSON in mongo and then query it via the mongodb query syntax.

OK, this post is a little old, but... if you want to do SQL-like query in native JSON (or JS objects) on JS objects, take a look at https://github.com/deitch/searchjs

It is both a jsql language written entirely in JSON, and a reference implementation. You can say, "I want to find all object in an array that have name==="John" && age===25 as:

{name:"John",age:25,_join:"AND"}

The reference implementation searchjs works in the browser as well as as a node npm package

npm install searchjs

It can also do things like complex joins and negation (NOT). It natively ignores case.

It doesn't yet do summation or count, but it is probably easier to do those outside.

Here's some simple javascript libraries that will also do the trick:

  • Dollar Q is a nice lightweight library. It has a familiar feel to the chaining syntax made popular by jQuery and is only 373 SLOC.
  • SpahQL is a fully featured query language with a syntax similar to XPath (Homepage, Github
  • jFunk is an in progress query language, with a syntax similar to CSS/jQuery selectors. It looked promising, but hasn't had any development beyond its in initial commit.

  • (added 2014): the jq command line tool has a neat syntax, but unfortunately it is a c library. Example usage:

    < package.json jq '.dependencies | to_entries | .[] | select(.value | startswith("git")) | .key'

In MongoDB, this is how it would work (in the mongo shell, there exist drivers for a language of your choice).

db.collection.insert({"x": 2, "y": 0}); // notice the ':' instead of ','
db.collection.insert({"x": 3, "y": 1});
db.collection.insert({"x": 4, "y": 1});

db.collection.aggregate([{$match: {"y": {$gt: 0}}}, 
                         {$group: {_id: "sum", sum: {$sum: "$x"}}}]);
db.collection.aggregate([{$match: {"y": {$gt: 0}}}, 
                         {$group: {_id: "list", list: {$push: "$x"}}}]);

The first three commands insert the data into your collection. (Just start the mongod server and connect with the mongo client.)

The next two process the data. $match filters, $group applies the sum and list, respectively.

SpahQL is the most promising and well thought out of these, as far as I can tell. I highly recommend checking it out.


I've just finished a releaseable version of a clientside JS-lib (defiant.js) that does what you're looking for. With defiant.js, you can query a JSON structure with the XPath expressions you're familiar with (no new syntax expressions as in JSONPath).

Example of how it works (see it in browser here http://defiantjs.com/defiant.js/demo/sum.avg.htm):

var data = [
       { "x": 2, "y": 0 },
       { "x": 3, "y": 1 },
       { "x": 4, "y": 1 },
       { "x": 2, "y": 1 }
    ],
    res = JSON.search( data, '//*[ y > 0 ]' );

console.log( res.sum('x') );
// 9
console.log( res.avg('x') );
// 3
console.log( res.min('x') );
// 2
console.log( res.max('x') );
// 4

As you can see, DefiantJS extends the global object JSON with a search function and the returned array is delivered with aggregate functions. DefiantJS contains a few other functionalities but those are out of the scope for this subject. Anywho, you can test the lib with a clientside XPath Evaluator. I think people not familiar with XPath will find this evaluator useful.
http://defiantjs.com/#xpath_evaluator

More information about defiant.js
http://defiantjs.com/
https://github.com/hbi99/defiant.js

I hope you find it useful... Regards

  1. Google has a project called lovefield; just found out about it, and it looks interesting, though it is more involved than just dropping in underscore or lodash.

    https://github.com/google/lovefield

Lovefield is a relational query engine written in pure JavaScript. It also provides help with persisting data on the browser side, e.g. using IndexedDB to store data locally. It provides SQL-like syntax and works cross-browser (currently supporting Chrome 37+, Firefox 31+, IE 10+, and Safari 5.1+...


  1. Another interesting recent entry in this space called jinqJs.

    http://www.jinqjs.com/

    Briefly reviewing the examples, it looks promising, and the API document appears to be well written.


function isChild(row) {
  return (row.Age < 18 ? 'Yes' : 'No');
}

var people = [
  {Name: 'Jane', Age: 20, Location: 'Smithtown'},
  {Name: 'Ken', Age: 57, Location: 'Islip'},
  {Name: 'Tom', Age: 10, Location: 'Islip'}
];

var result = new jinqJs()
  .from(people)
  .orderBy('Age')
  .select([{field: 'Name'}, 
     {field: 'Age', text: 'Your Age'}, 
     {text: 'Is Child', value: isChild}]);

jinqJs is a small, simple, lightweight and extensible javaScript library that has no dependencies. jinqJs provides a simple way to perform SQL like queries on javaScript arrays, collections and web services that return a JSON response. jinqJs is similar to Microsoft's Lambda expression for .Net, and it provides similar capabilities to query collections using a SQL like syntax and predicate functionality. jinqJs’s purpose is to provide a SQL like experience to programmers familiar with LINQ queries.

Check out https://github.com/niclasko/Cypher.js (note: I'm the author)

It's a zero-dependency Javascript implementation of the Cypher graph database query language along with a graph database. It runs in the browser (tested with Firefox, Chrome, IE).

With relevance to the question. It can be used to query JSON endpoints:

load json from "http://url/endpoint" as l return l limit 10

Here's an example of querying a complex JSON document and performing analysis on it:

Cypher.js JSON query example

You could use linq.js.

This allows to use aggregations and selectings from a data set of objects, as other structures data.

var data = [{ x: 2, y: 0 }, { x: 3, y: 1 }, { x: 4, y: 1 }];

// SUM(X) WHERE Y > 0     -> 7
console.log(Enumerable.From(data).Where("$.y > 0").Sum("$.x"));

// LIST(X) WHERE Y > 0    -> [3, 4]
console.log(Enumerable.From(data).Where("$.y > 0").Select("$.x").ToArray());
<script src="https://cdnjs.cloudflare.com/ajax/libs/linq.js/2.2.0.2/linq.js"></script>

PythonQL offers an embedded syntax that IMHO is an improvement on SQL, principally because group, window, where, let, etc. can be freely intermixed.

$ cat x.py
#coding: pythonql
data = [{"x": 2, "y": 0}, {"x": 3, "y": 1}, {"x": 4, "y": 1}]
q = [x match {'x': as x, 'y': as y} in data where y > 0]
print(sum(q))
print(list(q))

q = [x match {'x': as x, 'y': as y} as d in data where d['y'] > 0]
print(sum(q))

This code shows two different answers to your question, depending on your need to handle the entire structure or just the value. Execution gives you the expected result.

$ python x.py
7
[3, 4]
7

if you want to use pure javascript try this:

var object = { result: { data: { point1: "x", value: 2 }, foo: { bar: 7 } } },
    path = 'result.data.value',
    getValue = (o, p) => p.split('.').reduce((r, k) => r[k], o);

console.log(getValue(object, path));

I'll second the notion of just using your own javascript, but for something a bit more sophisticated you might look at dojo data. Haven't used it but it looks like it gives you roughly the kind of query interface you're looking for.

The current Jaql implementation targets large data processing using a Hadoop cluster, so it might be more than you need. However, it runs easily without a Hadoop cluster (but still requires the Hadoop code and its dependencies to get compiled, which are mostly included). A small implementation of Jaql that could be embedded in Javascript and the a browser would be a great addition to the project.

Your examples above are easily written in jaql:

$data = [{"x": 2, "y": 0}, {"x": 3, "y": 1}, {"x": 4, "y": 1}];

$data -> filter $.y > 0 -> transform $.x -> sum(); // 7

$data -> filter $.y > 0 -> transform $.x; // [3,4]

Of course, there's much more too. For example:

// Compute multiple aggregates and change nesting structure:
$data -> group by $y = $.y into { $y, s:sum($[*].x), n:count($), xs:$[*].x}; 
    // [{ "y": 0, "s": 2, "n": 1, "xs": [2]   },
    //  { "y": 1, "s": 7, "n": 2, "xs": [3,4] }]

// Join multiple data sets:
$more = [{ "y": 0, "z": 5 }, { "y": 1, "z": 6 }];
join $data, $more where $data.y == $more.y into {$data, $more};
    // [{ "data": { "x": 2, "y": 0 }, "more": { "y": 0, "z": 5 }},
    //  { "data": { "x": 3, "y": 1 }, "more": { "y": 1, "z": 6 }},
    //  { "data": { "x": 4, "y": 1 }, "more": { "y": 1, "z": 6 }}]

Jaql can be downloaded/discussed at http://code.google.com/p/jaql/

You can also use Underscore.js which is basically a swiss-knife library to manipulate collections. Using _.filter, _.pluck, _.reduce you can do SQL-like queries.

var data = [{"x": 2, "y": 0}, {"x": 3, "y": 1}, {"x": 4, "y": 1}];

var posData = _.filter(data, function(elt) { return elt.y > 0; });
// [{"x": 3, "y": 1}, {"x": 4, "y": 1}]

var values = _.pluck(posData, "x");
// [3, 4]

var sum = _.reduce(values, function(a, b) { return a+b; });
// 7

Underscore.js works both client-side and server-side and is a notable library.

You can also use Lo-Dash which is a fork of Underscore.js with better performances.

Whenever possible I would shift all of the querying to the backend on the server (to the SQL DB or other native database type). Reason being is that it will be quicker and more optimized to do the querying.

I know that jSON can be stand alone and there may be +/- for having a querying language but I cannot see the advantage if you are retrieving data from the backend to a browser, as most of the JSON use cases. Query and filter at the backend to get as small a data that is needed.

If for whatever reason you need to query at the front-end (mostly in a browser) then I would suggest just using array.filter (why invent something else?).

That said what I think would be more useful is a transformation API for json...they are more useful since once you have the data you may want to display it in a number of ways. However, again, you can do much of this on the server (which can be much easier to scale) than on the client - IF you are using server<-->client model.

Just my 2 pence worth!

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top