Question

I have different types of posts a user can create:

  • TextPost
  • ImagePost
  • VideoPost

The frontend client needs to retrieve the last 10 posts from a user. I am wondering how to model this structure correctly in a relational database and the data structure for sending the data to the frontend.

My proposed solution:

A post table with columns:

  • user_name
  • create_date
  • text_post_id
  • video_post_id
  • image_post_id

A row has only one of the foreign key _id columns set. To get the last 10 posts, the backend will select the most recent rows from the post table. Next, it will run another select query for each of the post types. Example:

select * from text_post where text_post_id in (?1)

The data is then mapped into a list of post objects. Each post object has the fields textPost, videoPost and imagePost; again, only one of them is set at a time. The list is then ordered by each post's create date. Finally, we send the response to the frontend, which iterates over the list and displays each post type accordingly:

[
  {
    "videoPost": null,
    "textPost" : {},
    "imagePost": null
  },
  {
    "videoPost": {},
    "textPost" : null,
    "imagePost": null
  }
]

Is this a good solution that will scale and allow for the possibility of adding more post types in the future?

Solution

Yes, this is a decent way to handle this problem on the back-end side:

  • you handle each class as a distinct table (Post, TextPost, ImagePost and VideoPost),
  • you use composition to model the association between a Post and its specialisation XxxPost, through the Xxx_Post_id columns,
  • the only problem is that if you add more post types, you will have to add more id columns to your table and review/update all the code that uses the Post table.

A slightly more flexible variant would be to map the classes using Class Table Inheritance, reusing the Post_id (i.e. the Post_id is used as the unique primary key for the specialised post data as well, and there are no longer any Xxx_Post_id columns):

  • This solution is pretty close to yours
  • You can add new types of post by creating the relevant tables. The existing code would work unchanged. Only the code where you need to work on specific post data would have to evolve.
  • You could use table information to determine the kind of posts if you do not want to have a column that tells what kind of post you're dealing with.
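As a rough illustration of the Class Table Inheritance variant, here is a minimal sketch using Python's built-in sqlite3 module. The column names body and url, the sample rows and the last_posts helper are assumptions made for the example, not part of the original design; the kind of each post is inferred from which child table holds a row for its post_id.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE post (
    post_id     INTEGER PRIMARY KEY,
    user_name   TEXT NOT NULL,
    create_date TEXT NOT NULL
);
-- Each specialisation reuses post_id as its own primary key.
CREATE TABLE text_post (
    post_id INTEGER PRIMARY KEY REFERENCES post(post_id),
    body    TEXT NOT NULL          -- hypothetical column
);
CREATE TABLE video_post (
    post_id INTEGER PRIMARY KEY REFERENCES post(post_id),
    url     TEXT NOT NULL          -- hypothetical column
);
""")

# Sample data, invented for the example.
conn.execute("INSERT INTO post VALUES (1, 'alice', '2024-01-01')")
conn.execute("INSERT INTO text_post VALUES (1, 'hello')")
conn.execute("INSERT INTO post VALUES (2, 'alice', '2024-01-02')")
conn.execute("INSERT INTO video_post VALUES (2, 'https://example.com/v/2')")


def last_posts(conn, user_name, limit=10):
    """Return the newest posts; the kind comes from whichever child row exists."""
    rows = conn.execute(
        """
        SELECT p.post_id, p.create_date,
               CASE WHEN t.post_id IS NOT NULL THEN 'text'
                    WHEN v.post_id IS NOT NULL THEN 'video' END AS kind,
               t.body, v.url
        FROM post p
        LEFT JOIN text_post  t ON t.post_id = p.post_id
        LEFT JOIN video_post v ON v.post_id = p.post_id
        WHERE p.user_name = ?
        ORDER BY p.create_date DESC
        LIMIT ?
        """,
        (user_name, limit),
    )
    return rows.fetchall()


posts = last_posts(conn, "alice")
```

Adding a new post type in this sketch means adding one child table and one more LEFT JOIN; the post table itself stays unchanged.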

The front-end interface should in any case undergo a critical review:

  • the interface could provide a unique id to retrieve posts based on their id
  • disclosing the inner details by replicating different post categories is not necessarily the best option. If you'd add a new category of post, all the front-end code would have to evolve.
  • a better option would IMHO be to provide the generic common part, plus a set of additional data that is relevant for the type of post but well isolated.

It would then look like (only as example):

[
   {
      Post_id: NNNNN,
      date: XXXX-XX-XX, 
      content: {
                  type: Text, 
                   ...           // additional text specific
               }
   },
   {
      Post_id: MMMMMM,
      date: XXXX-XX-XX, 
      content: {
                  type: Video, 
                   ...           // additional video specific
               }
   }
] 
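One way the backend might assemble such a response is sketched below. The helper name to_response_item and the detail fields body and url are hypothetical; the point is only that the common fields sit at the top level while everything type-specific is isolated under content.

```python
import json


def to_response_item(post_id, date, kind, details):
    """Put common fields at the top level and type-specific ones under 'content'."""
    return {
        "post_id": post_id,
        "date": date,
        "content": {"type": kind, **details},
    }


# Sample values, invented for the example.
items = [
    to_response_item(1, "2024-01-01", "Text", {"body": "hello"}),
    to_response_item(2, "2024-01-02", "Video", {"url": "https://example.com/v/2"}),
]
payload = json.dumps(items)
```

With this shape, a front end that does not yet know a new type can still show the common part and skip or stub the content.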

OTHER TIPS

You have started off in the right direction by having separate tables for posts, images and videos, but your data model has gone awry in two ways:

  1. Add a "post_text" column to your "posts" table and make it nullable instead of a separate "text_post" table. If this is optional (which it is for video and image posts) then the database can reflect that with a NULL value.

  2. Foreign keys are on the "posts" table pointing to the other tables.

A more normalized data model would have a "post_id" column on the image and video tables. If you want images and videos associated with other entities than posts (which wouldn't be too far fetched) then the image and video tables should not have a foreign key back to the posts table. Instead, create JOIN tables associating a post with an image, and a post with a video. This gives you a properly normalized data model.

Posts table

  • post_id (primary key)
  • post_text (nullable)
  • user_name
  • create_date

Images table

  • image_id (primary key)
  • image_data or image Id to external hosting service

Videos table

  • video_id (primary key)
  • video_data or video Id to external hosting service

Post Images table

  • post_id (foreign key to posts, composite primary key)
  • image_id (foreign key to images, composite primary key)

Post Videos table

  • post_id (foreign key to posts, composite primary key)
  • video_id (foreign key to videos, composite primary key)

If you want posts limited to 1 video each, then add a unique constraint to the post_id column in Post Videos. Same thing for the post_id column in Post Images if you want only 1 image per post. This gives you a great amount of flexibility in your data model. And until you measure an actual problem with JOIN-ing on multiple tables, go for the normalized model first. Modern relational databases should be acceptably fast if you have proper indexes and foreign keys on the tables. Only de-normalize if all other means fail to give you acceptable performance, which could include archiving old data or optimizing SQL queries first.
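A minimal sketch of the join-table layout with the optional one-video-per-post constraint, again using sqlite3, might look like this. The video_url column and the sample rows are assumptions for the example; only the posts, videos and post_videos tables are shown, and the images side would follow the same pattern.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE posts (
    post_id     INTEGER PRIMARY KEY,
    post_text   TEXT,                 -- nullable, as described above
    user_name   TEXT NOT NULL,
    create_date TEXT NOT NULL
);
CREATE TABLE videos (
    video_id  INTEGER PRIMARY KEY,
    video_url TEXT NOT NULL           -- hypothetical column
);
CREATE TABLE post_videos (
    post_id  INTEGER NOT NULL REFERENCES posts(post_id),
    video_id INTEGER NOT NULL REFERENCES videos(video_id),
    PRIMARY KEY (post_id, video_id),
    UNIQUE (post_id)                  -- optional: at most one video per post
);
""")

conn.execute("INSERT INTO posts VALUES (1, NULL, 'alice', '2024-01-01')")
conn.execute("INSERT INTO videos VALUES (23, 'http://domain.com/videos/23')")
conn.execute("INSERT INTO videos VALUES (24, 'http://domain.com/videos/24')")
conn.execute("INSERT INTO post_videos VALUES (1, 23)")

# A second video for the same post violates the UNIQUE constraint.
try:
    conn.execute("INSERT INTO post_videos VALUES (1, 24)")
    second_video_rejected = False
except sqlite3.IntegrityError:
    second_video_rejected = True

linked = conn.execute("SELECT COUNT(*) FROM post_videos").fetchone()[0]
```

Dropping the UNIQUE (post_id) line is all it takes to allow several videos per post later.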

The frontend data model could use a little modification too, depending on the constraints you place in the database or application:

[
    {
        "post_id": 1,
        "date": "YYYY-MM-DD",
        "post_text": null,
        "images": [],
        "videos": [
            {
                "video_id": 23,
                "url": "http://domain.com/videos/23"
            }
        ]
    }, {
        "post_id": 2,
        "date": "YYYY-MM-DD",
        "post_text": "Blah blah blah",
        "images": [],
        "videos": []
    }, {
        "post_id": 3,
        "date": "YYYY-MM-DD",
        "post_text": null,
        "images": [
            {
                "image_id": 14,
                "url": "http://domain.com/images/14"
            }
        ],
        "videos": []
    }
]

Alternatively, if you only want one video or image per post, change "videos" to "video" and make it a single object instead of an array, and do the same for "images".

I would challenge you to think of the "post type" as a reflection of the data a post contains rather than an attribute of a post, but there isn't anything wrong with your middle tier adding a "type" attribute after inspecting the data for a post to make things easier on the front end.
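As a small sketch of that idea, the middle tier could derive a "type" from the data before sending it. The function name derive_post_type and the precedence (video, then image, then text) are assumptions made for the example; a post carrying both videos and images would need whatever rule suits your application.

```python
def derive_post_type(post):
    """Infer a display type from what the post actually contains."""
    if post.get("videos"):
        return "video"
    if post.get("images"):
        return "image"
    return "text"


# Sample posts, shaped like the response above.
posts = [
    {"post_id": 1, "post_text": None, "images": [], "videos": [{"video_id": 23}]},
    {"post_id": 2, "post_text": "Blah blah blah", "images": [], "videos": []},
    {"post_id": 3, "post_text": None, "images": [{"image_id": 14}], "videos": []},
]
for post in posts:
    post["type"] = derive_post_type(post)
```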

With your current model, you have to change a lot of stuff whenever you want to add a new post type:

  • Add a new _id column to the post table
  • Create a new table to store content for the new post type
  • Add a new field to your response structure

That doesn't seem very scalable to me. Instead, I would just store a type column in the post table indicating the type of post and a content column that stores the post content (either a blob or a pointer to an external storage location). Then mirror the same structure in the response, for example:

[
  {
    "type": "text",
    "content": ...
  },
  {
    "type": "video",
    "content": ...
  }
]

You no longer have to change your database/response schemas to support a new post type; you simply introduce a new post type identifier and change the front-end so that it can render the new type. And if you store the content in the same table, you only have to do a single query instead of N+1 (where N is the number of post types).
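A minimal sketch of this single-table variant with sqlite3 is shown below. The exact column set, the sample data and the last_posts helper are assumptions for the example; content is stored as text here, but it could equally be a blob or a pointer to external storage.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE post (
    post_id     INTEGER PRIMARY KEY,
    user_name   TEXT NOT NULL,
    create_date TEXT NOT NULL,
    type        TEXT NOT NULL,   -- e.g. 'text', 'video'
    content     TEXT NOT NULL    -- inline content or a pointer to storage
)
""")

# Twelve sample posts, invented for the example, one per day.
conn.executemany(
    "INSERT INTO post (user_name, create_date, type, content) VALUES (?, ?, ?, ?)",
    [
        ("alice", f"2024-01-{day:02d}", "text" if day % 2 else "video", f"content-{day}")
        for day in range(1, 13)
    ],
)


def last_posts(conn, user_name, limit=10):
    """Fetch the newest posts of every type with a single query."""
    rows = conn.execute(
        "SELECT type, content FROM post "
        "WHERE user_name = ? ORDER BY create_date DESC LIMIT ?",
        (user_name, limit),
    )
    return [{"type": kind, "content": content} for kind, content in rows]


response = last_posts(conn, "alice")
```

Supporting a new post type in this sketch only means inserting rows with a new type value; neither the table nor the query changes.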

Licensed under: CC-BY-SA with attribution