Question

I'm looking for alternative data storage methods to SQL (That is to say, I do not want to use SQL, even for queries) and came across a few based on JSON. Talking with friends who do database work, they said I shouldn't consider these, but wouldn't elaborate. What are the potential (and practical) drawbacks to using JSON as a data storage file format?


I figured JSON would be better than SQL for these reasons:

  1. JSON is strictly defined and doesn't have flavors (Oracle, Microsoft, MySQL, etc.)
  2. Since Google started making Chrome, JS interpreters have made reading, parsing, and outputting JS (and thus JSON) a very fast and easy process.
  3. Database output could be pure JSON, erasing the need for a middle-man interpreter for browsers, etc.

among others...

Was it helpful?

Solution

I think you might want to take a look at NO-SQL databases: https://en.wikipedia.org/wiki/NoSQL

If you like using JSON-like data, then one I have personally used is MongoDB.

I have not used it as a main/single source of my app data, but only for secondary purposes. But, I guess, you can try using it as your main data storage too (I think many people do).

What I have tried, and was quite satisfying, was MongoDB with C# and using MongoVue as a GUI application for executing queries and interaction with the DB. I was not very happy with MongoVUE, but it seems that it was the best option at the time.

However, SQL DBs are very good at defining relationships in your data. E.g. referencing an entry on table A from an entry on table B, and that kind of stuff. Using those relationships, you can join tables and do many interesting things. I think, it is good for you to get some experience on this field as well.

MongoDB is not build for defining relationships (as far as I understand). It has the concept of "documents", where you store information in a JSON like format (with nested key/values). You can query documents, but joining seems like hacking your way around its normal usage: How do I perform the SQL Join equivalent in MongoDB? Also, ensuring data consistency (in a truly reliable manner) when using relationships in MongoDB seems pretty impossible to me. But even if I am wrong and it is possible, it will be 10 times harder achieving it than with SQL DBs.

But you can have a look at the list in WikiPedia and there might be a better alternative than MongoDB for you.

But you can use pure JSON as well with no DB system.

So, in summary JSON-like storage has (at least) these issues:

  • Not good at defining and utilizing relationships
  • When using relationships, data integrity (or more likely, reference integrity) is hard.
  • If you are not using a good DB system, but you just dump JSON into a file, when that file becomes too big you will have performance issues. Imagine querying a 1GB JSON encoded array of objects to get the ones you want. You will have to load the entire array on memory, run through the whole of it (since you will have no indices) and then (if you have not run out of memory and your connection -when using a network- has not expired) you will get a result. Most NO-SQL DBs like MongoDB and most SQL DBs have no such problems (at least within reasonable amounts of data). They are fine-tuned, they support indexing, references, permissions, roles and you can also define executing code at the DB level (like triggers and stored procedures). Certainly they are more complex, but that complexity may be required most of the times to achieve the end result.

OTHER TIPS

JSON, or JavaScript Object Notation, is an open standard format that uses human-readable text to transmit data objects consisting of attribute–value pairs. It is used primarily to transmit data between a server and web application, as an alternative to XML.

You are more looking at the comparison between database vs flat-file storage really.

Even when using a relational DB, data integrity (or referential integrity) is still hard because rows are, usually, timestamped. Quite often foreign keys are not enforced because of this. When an row update occurs you have 2 choices. Firstly, 'forget' the previous version. Secondly, update the original row and copy the previous version into a timestamped 'non-relational' history table where foreign keys are useless. Most business data requires updates. Features for maintaining referential integrity in Relational databases are useless for this type of business data (which represents most enterprise data). What is needed is a Temporal Database, or an abstraction layer which presents a user with the appropriate version of a row based on a Time context. Ideally in 2 dimensions i.e. transaction time and business time (aka valid time).

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top