
I'm pulling in JSON data from two different sources that have roughly the same information, but in different shapes. Let's say I'm pulling in data about music albums, and one of the API responses looks like this:

{
  "id": "6289d24d-1a49-499b-90a7-8f9e43b4d6af",
  "artist-credit": [{
    "name": "The Hot Sardines"
  }],
  "title": "The Hot Sardines",
  "date": "2014",
  "media": [{
    "tracks": [{
      "position": 1,
      "id": "5928bc3b-a5b9-4149-8a1e-b224707dab5a",
      "title": "Bei Mir Bist Du Schoen",
      "artist-credit": [{
        "name": "The Hot Sardines"
      }],
      "length": 246000
    },  {
      "length": 246000,
      "number": "11",
      "artist-credit": [{
        "name": "The Hot Sardines"
      }],
      "title": "(I Don't Stand) A Ghost of a Chance With You",
      "id": "be97c530-d2d4-4a68-a777-637a2c23852e",
      "position": 11
    }],
    "track-count": 11
  }],
  "packaging-id": "ec27701a-4a22-37f4-bfac-6616e0f9750a"
}

While the other looks like:

{
  "album_type" : "album",
  "artists" : [ {
    "id" : "2BTZIqw0ntH9MvilQ3ewNY",
    "name" : "Cyndi Lauper"
  } ],
  "id" : "0sNOF9WDwhWunNAHPD3Baj",
  "name" : "She's So Unusual",
  "release_date" : "1983",
  "tracks" : {
    "items" : [ {
      "artists" : [ "some artists"],
      "disc_number" : 1,
      "duration_ms" : 305560,
      "id" : "3f9zqUnrnIq0LANhmnaF0V",
      "name" : "Money Changes Everything",
      "track_number" : 1,
      "type" : "track"
    }, {
    } ],
    "total" : 13
  },
  "type" : "album",
  "uri" : "spotify:album:0sNOF9WDwhWunNAHPD3Baj"
}

I'd like to be able to query this data via SQL in PostgreSQL, with the table structure looking like:

album               (id, title,    release_date, spotify_id, musicbrainz_id)
track               (id, title,    album_id,     duration,   spotify_id, musicbrainz_id, position)
artist              (id, name,     spotify_id,   musicbrainz_id)
artist_track_credit (id, track_id, artist_id)
artist_album_credit (id, album_id, artist_id)

From what I've gathered there are a couple of ways to do this.

One would be to take each API response, convert it into SQL INSERT queries in software, and then send those queries to the database like so:

INSERT INTO album (title, release_date, etc) VALUES (?, ?, ? ...);

-- Get the album ID back and use it in the following queries

INSERT INTO track (title, album_id, etc) VALUES (?, ?, ? ...), (?, ?, ? ...), ...;

INSERT INTO artist (name, etc) VALUES (?, ...), ... ON CONFLICT DO NOTHING

-- Get the new artist ids back (if created) and use them in the following queries

INSERT INTO artist_track_credit VALUES ...
INSERT INTO artist_album_credit VALUES ...

The problem here is that it is pretty messy, and requires several round trips to the DB in order to get the album ID to insert with the tracks, then the artist IDs to insert with the artist and album credits.
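Whichever route is taken for the inserts, the "convert in software" step could start by mapping both response shapes onto one common structure that mirrors the target tables. The following is only a minimal Python sketch under some assumptions: the function names and the common dict layout are hypothetical, MusicBrainz "length" is assumed to be in milliseconds like Spotify's "duration_ms", and real Spotify track artists are assumed to be objects with a "name" key (the sample above elides them as strings).

```python
def normalize_musicbrainz(release):
    """Map a MusicBrainz-style release (first sample) to the common shape."""
    tracks = []
    for medium in release.get("media", []):
        for t in medium.get("tracks", []):
            tracks.append({
                "title": t["title"],
                "duration": t.get("length"),  # assumption: milliseconds
                "position": t.get("position"),
                "spotify_id": None,
                "musicbrainz_id": t["id"],
                "artists": [a["name"] for a in t.get("artist-credit", [])],
            })
    return {
        "title": release["title"],
        "release_date": release.get("date"),
        "spotify_id": None,
        "musicbrainz_id": release["id"],
        "artists": [a["name"] for a in release.get("artist-credit", [])],
        "tracks": tracks,
    }


def normalize_spotify(album):
    """Map a Spotify-style album (second sample) to the common shape."""

    def artist_name(a):
        # Assumption: real entries are objects with a "name" key; the
        # sample's placeholder strings are passed through unchanged.
        return a["name"] if isinstance(a, dict) else a

    tracks = [
        {
            "title": t["name"],
            "duration": t.get("duration_ms"),
            "position": t.get("track_number"),
            "spotify_id": t["id"],
            "musicbrainz_id": None,
            "artists": [artist_name(a) for a in t.get("artists", [])],
        }
        for t in album.get("tracks", {}).get("items", [])
        if t  # skip empty entries such as the elided "{}" in the sample
    ]
    return {
        "title": album["name"],
        "release_date": album.get("release_date"),
        "spotify_id": album["id"],
        "musicbrainz_id": None,
        "artists": [artist_name(a) for a in album.get("artists", [])],
        "tracks": tracks,
    }
```

With both sources reduced to the same keys, the code that builds the INSERT statements only has to be written once.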

One alternative is to insert the API response directly into the database, and then use database functions to manipulate the data and insert it into the relational structure. The downside here is that I'm not as familiar with writing functions for PostgreSQL, but if this is the best solution I'd go for it.

Another alternative is to take advantage of PostgreSQL's JSON handling abilities and insert the API responses directly into the database, and then develop indices and views on that JSON data to simulate ordinary relational data. This seems promising, but I'm not sure how to handle the data having different shapes. Is it possible to write a set of views or indices for each schema, and then some kind of union that will take, say, SELECT * FROM album and convert that into SELECT * FROM musicbrainz_album_view UNION SELECT * FROM spotify_album_view without having to write it out every time?

What is the best way to handle this scenario?



I recommend against storing the data as JSON in the database, particularly if you want to query it with SQL. Storing the data is easy that way, but your queries will be much more complicated and slower.

Whether you run several SQL statements from client code or write a function in the database should depend mostly on whether you want to keep your business logic in the application or in the database.

There shouldn't be too many round trips necessary if you use RETURNING as in

INSERT INTO artist (name)
VALUES ('Jethro Tull')
RETURNING id;
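From client code, using RETURNING this way might look roughly like the Python sketch below. It is an illustration only: the function name and the album dict layout are hypothetical, the column names are taken from the table structure in the question, and the cursor is assumed to be a DB-API cursor using "%s" placeholders (as psycopg2 does).

```python
def insert_album_with_tracks(cur, album):
    """Insert one album and its tracks; return the generated album id.

    `cur` is assumed to be a DB-API cursor with "%s" placeholders.
    `album` is a hypothetical dict with title, release_date, spotify_id,
    musicbrainz_id and a "tracks" list of similar dicts.
    """
    # RETURNING hands back the generated key in the same statement,
    # so no separate SELECT is needed to look it up.
    cur.execute(
        "INSERT INTO album (title, release_date, spotify_id, musicbrainz_id) "
        "VALUES (%s, %s, %s, %s) RETURNING id",
        (
            album["title"],
            album["release_date"],
            album["spotify_id"],
            album["musicbrainz_id"],
        ),
    )
    album_id = cur.fetchone()[0]

    # Every track row reuses the album id obtained above.
    cur.executemany(
        "INSERT INTO track "
        "(title, album_id, duration, position, spotify_id, musicbrainz_id) "
        "VALUES (%s, %s, %s, %s, %s, %s)",
        [
            (
                t["title"],
                album_id,
                t["duration"],
                t["position"],
                t["spotify_id"],
                t["musicbrainz_id"],
            )
            for t in album["tracks"]
        ],
    )
    return album_id
```

Artists and the two credit tables would follow the same pattern, with the caveat that deduplicating artists needs some extra thought that this sketch leaves out.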

You can even insert into several tables in a single statement with a data-modifying CTE like

WITH new_artist AS (
   INSERT INTO artist (name)
   VALUES ('Jethro Tull')
   RETURNING id
)
INSERT INTO album (name, artist_id)
SELECT 'Catfish Rising', id
FROM new_artist;

Licensed under: CC BY-SA with attribution
Not affiliated with dba.stackexchange