Structuring a database to handle unknown name/value pairs

https://stackoverflow.com/questions/11007165

14-06-2021

Question

Here's the idea: I expect to receive thousands of queries, each containing some number of name/value pairs (NVPs). These start off as associative arrays, so I have fairly good control over what happens to the data. The NVPs vary depending on the source. For example, if the source is "A", I could receive the array (in JSON for ease of explanation): {"Key1":"test1","key2":"test2"} but if the source is "B", I could receive {"DifferentKey1":"test1","DifferentKey2":"test2"}. I select which keys I want to store in my database, so in this case I might keep only DifferentKey1 from source B's array and discard the rest.
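As a rough illustration of the "select which keys to store" step, here is a minimal Python sketch. The `WANTED_KEYS` table and the `select_pairs` helper are hypothetical names made up for this example; only the sources "A" and "B" and their key names come from the question.

```python
# Hypothetical per-source whitelist: which keys to keep from each source.
WANTED_KEYS = {
    "A": {"Key1", "key2"},
    "B": {"DifferentKey1"},
}


def select_pairs(source, pairs):
    """Return only the whitelisted name/value pairs for this source."""
    wanted = WANTED_KEYS.get(source, set())  # unknown source: keep nothing
    return {name: value for name, value in pairs.items() if name in wanted}


from_b = {"DifferentKey1": "test1", "DifferentKey2": "test2"}
kept = select_pairs("B", from_b)  # DifferentKey2 is discarded
```

Whatever storage is chosen later, filtering at this point keeps unwanted keys out of the database entirely.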

My main issue here is that these arrays could technically be completely unrelated content-wise. They have a very general association (they're both arrays containing stats) but they're very different (in that the sources are different, i.e. different games/sports).

I was thinking SQL: storing a table filled with games and their respective ids would be a good way of linking general NVP strings. For example:

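Games table:
| id | name   |
|----|--------|
| 1  | golf   |
| 2  | soccer |

NVP table:
| id | game_id | nvp                                                       |
|----|---------|-----------------------------------------------------------|
| 1  | 1       | team1score=87;team2score=94;team3score=73;                |
| 2  | 2       | team1score=2;team2score=1;extratime=200;numyellowcards=4; |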

Hope that's clear enough. Do you see what I mean though? If there's an indeterminate amount of data that I may use, how can I structure a table? Thanks.

Edit: I guess I should note, obviously this setup WOULD work, but is it the best performance-wise? Maybe not? I'm not sure, let's see what you guys can come up with!
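To make the trade-off concrete, here is a small Python sketch of what reading the proposed nvp column back would involve. The `parse_nvp` helper is a hypothetical name; the string is the soccer row from the example table. With this layout the database only sees one opaque string per row, so comparing or sorting on a single value would most likely have to happen in application code after parsing.

```python
def parse_nvp(nvp):
    """Split 'name=value;name=value;' into a dict of strings."""
    pairs = {}
    for item in nvp.split(";"):
        if not item:
            continue  # skip the empty piece after the trailing ';'
        name, _, value = item.partition("=")
        pairs[name] = value  # values stay strings; the caller must convert
    return pairs


row = "team1score=2;team2score=1;extratime=200;numyellowcards=4;"
stats = parse_nvp(row)
yellow_cards = int(stats["numyellowcards"])
```

Note that this sketch assumes names and values never contain `;` or `=`; real data would need an escaping rule.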

Solution

SQL databases are great for highly relational data - but in a case like this, where the data is not relational and there is no fixed schema, you might be better off using a NoSQL solution. There are many, and I haven't used them enough to be sure which would work best for you. If your data fits in RAM, Redis is a great option.

OTHER TIPS

The common way of storing name/value pairs in a relational database is known as "Entity/Attribute/Value". You'll find a lot of discussion on Stack Overflow.

It all depends on what your application wants to do with the data. Storing it is easy - querying is much harder.
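A minimal sketch of an Entity/Attribute/Value layout, using Python's built-in `sqlite3` module, shows both halves of that point. The table names `records` and `record_values` are assumptions chosen for this example; the data is the sample from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE games (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    -- One row per incoming set of stats (the "entity").
    CREATE TABLE records (
        id      INTEGER PRIMARY KEY,
        game_id INTEGER NOT NULL REFERENCES games(id)
    );
    -- One row per name/value pair (the "attribute" and "value").
    CREATE TABLE record_values (
        record_id INTEGER NOT NULL REFERENCES records(id),
        name      TEXT NOT NULL,
        value     TEXT NOT NULL,
        PRIMARY KEY (record_id, name)
    );
""")

# Storing is easy: one insert per pair, no schema change for new names.
conn.executemany("INSERT INTO games VALUES (?, ?)", [(1, "golf"), (2, "soccer")])
conn.executemany("INSERT INTO records VALUES (?, ?)", [(1, 1), (2, 2)])
conn.executemany(
    "INSERT INTO record_values VALUES (?, ?, ?)",
    [
        (1, "team1score", "87"), (1, "team2score", "94"), (1, "team3score", "73"),
        (2, "team1score", "2"), (2, "team2score", "1"),
        (2, "extratime", "200"), (2, "numyellowcards", "4"),
    ],
)

# Querying is harder: comparing two attributes of the same record needs one
# join per attribute, plus a cast because every value is stored as text.
rows = conn.execute("""
    SELECT r.id, CAST(t1.value AS INTEGER), CAST(t2.value AS INTEGER)
    FROM records r
    JOIN record_values t1 ON t1.record_id = r.id AND t1.name = 'team1score'
    JOIN record_values t2 ON t2.record_id = r.id AND t2.name = 'team2score'
    WHERE CAST(t1.value AS INTEGER) > CAST(t2.value AS INTEGER)
    ORDER BY r.id
""").fetchall()
```

Here `rows` holds the records where team 1 outscored team 2. Each additional attribute in a question like this adds another join.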

If you're building a sports application, you are likely to have domain concepts you want to support - for football, show a league position based on games played. For golf, show the number of birdies or eagles. You will probably want to show all the games a particular team/player has played in a season.

Some things are easy to build in a relational database, and have amazing performance over huge data sets. Find the highest-scoring game ever, find the last game in the 1998 season, find all the games featuring player x - all a great fit, as long as you can build a schema that represents those domain concepts.

From what you write, it does sound like you will have a fixed number of sports; the data coming in to your system sounds like it's not particularly structured, but you should be able to structure that to a domain model. If that's true, I recommend building a relational schema that reflects the domain logic of each sport.
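For comparison, here is a sketch of what a domain-shaped table for one sport might look like, again using Python's `sqlite3`. The `soccer_matches` table, its columns, and the sample rows are all invented for illustration, and the third query filters by team rather than by player to keep the schema small.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE soccer_matches (
        id         INTEGER PRIMARY KEY,
        season     INTEGER NOT NULL,
        played_on  TEXT    NOT NULL,  -- ISO date, so text order is date order
        home_team  TEXT    NOT NULL,
        away_team  TEXT    NOT NULL,
        home_goals INTEGER NOT NULL,
        away_goals INTEGER NOT NULL
    )
""")
# Made-up sample rows, only to give the queries something to return.
conn.executemany(
    "INSERT INTO soccer_matches VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        (1, 1998, "1998-03-01", "Reds", "Blues", 2, 1),
        (2, 1998, "1998-05-10", "Blues", "Greens", 4, 3),
        (3, 1999, "1999-02-14", "Greens", "Reds", 0, 0),
    ],
)

# Highest-scoring game ever.
highest = conn.execute("""
    SELECT id, home_goals + away_goals AS total
    FROM soccer_matches ORDER BY total DESC LIMIT 1
""").fetchone()

# Last game in the 1998 season.
last_1998 = conn.execute("""
    SELECT id FROM soccer_matches
    WHERE season = ? ORDER BY played_on DESC LIMIT 1
""", (1998,)).fetchone()

# All games featuring a given team.
reds_games = [row[0] for row in conn.execute(
    "SELECT id FROM soccer_matches WHERE home_team = ? OR away_team = ? ORDER BY id",
    ("Reds", "Reds"),
)]
```

With typed columns, each of these questions is a single short query with no joins or casts, and the columns can be indexed directly.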

If that's not true - if you can't reason about the domain in advance - the relational model is a bad fit, and NoSQL is probably better. But you will run into the same problem - extracting meaning from name/value pairs is going to be hard!

Licensed under: CC-BY-SA with attribution
