Question

Background

I am working on an app for a client that includes some social networking features. I was originally developing the mobile front-end, but circumstances have left me in charge of developing the back end as well.

As a general background, our system allows users to follow other users and receive notifications about the users they follow, as you'd expect from a social network. One caveat is that only a small subset of users (at most a few hundred) will be followable, and we expect most of the user base to follow at least one of these individuals.

On the UI side, we will have a notification button with a number on it, and clicking the button will take you to the notification screen.

The problem

I've been researching strategies for implementing notifications and most resources I have found point to creating one or more notification tables in the database. (An example I like is the accepted answer here: https://stackoverflow.com/questions/9735578/building-a-notification-system ).

The thing that is throwing me off is that most database-driven strategies for notifications require inserting a row for each notification for each follower. So if a thousand people are following Sally, we insert a thousand rows into the corresponding table. Is that scalable? What happens if we get to the point where tens or hundreds of thousands of users are following Sally and she's making a few dozen posts per day?
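To make the one-row-per-follower ("fan-out on write") approach concrete, here is a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for MySQL. The table and column names (follow, post, notification, followee_id and so on) and the function names are hypothetical, not taken from any existing schema.

```python
import sqlite3

def create_schema(conn: sqlite3.Connection) -> None:
    """Create a minimal, hypothetical schema (sqlite3 standing in for MySQL)."""
    conn.executescript("""
        CREATE TABLE follow (
            follower_id INTEGER NOT NULL,
            followee_id INTEGER NOT NULL,
            PRIMARY KEY (follower_id, followee_id)
        );
        CREATE TABLE post (
            id INTEGER PRIMARY KEY,
            author_id INTEGER NOT NULL,
            body TEXT NOT NULL
        );
        CREATE TABLE notification (
            id INTEGER PRIMARY KEY,
            user_id INTEGER NOT NULL,            -- the follower who receives it
            post_id INTEGER NOT NULL,            -- the post it is about
            is_read INTEGER NOT NULL DEFAULT 0
        );
    """)

def publish_post(conn: sqlite3.Connection, author_id: int, body: str) -> int:
    """Insert a post, then fan it out: one notification row per follower."""
    with conn:  # one transaction: the post and its notifications commit together
        cur = conn.execute(
            "INSERT INTO post (author_id, body) VALUES (?, ?)", (author_id, body))
        post_id = cur.lastrowid
        # INSERT ... SELECT copies the follower list inside the database in a
        # single statement instead of looping over followers in application code.
        conn.execute(
            """INSERT INTO notification (user_id, post_id)
               SELECT follower_id, ? FROM follow WHERE followee_id = ?""",
            (post_id, author_id))
    return post_id

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_schema(conn)
    sally = 1
    # 1,000 hypothetical followers (user ids 2..1001) all following Sally.
    conn.executemany("INSERT INTO follow VALUES (?, ?)",
                     [(follower, sally) for follower in range(2, 1002)])
    publish_post(conn, sally, "Hello, followers!")
    print(conn.execute("SELECT COUNT(*) FROM notification").fetchone()[0])  # 1000
```

One post by an author with 1,000 followers produces 1,000 notification rows, which is exactly the write amplification the question is worried about.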

My original idea had been to handle everything with queries: the number on the notification button would be obtained by requesting row-counts on content posted more recently than the last time you visited the notification screen, while individual notifications would be generated from more detailed queries when you visited the notification screen. This approach would require no writes or extra storage, but is inflexible and would probably hammer the server pretty hard.
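The query-only idea is often called "fan-out on read": nothing is written per follower, and the badge number is computed from posts by followed users that are newer than the user's last visit. Below is a minimal sketch, again with sqlite3 standing in for MySQL. The last_seen_at watermark column and all table, column and function names are assumptions for illustration.

```python
import sqlite3

SCHEMA = """
CREATE TABLE app_user (
    id INTEGER PRIMARY KEY,
    last_seen_at INTEGER NOT NULL DEFAULT 0  -- hypothetical: last visit to the notification screen
);
CREATE TABLE follow (
    follower_id INTEGER NOT NULL,
    followee_id INTEGER NOT NULL,
    PRIMARY KEY (follower_id, followee_id)
);
CREATE TABLE post (
    id INTEGER PRIMARY KEY,
    author_id INTEGER NOT NULL,
    created_at INTEGER NOT NULL              -- Unix timestamp
);
CREATE INDEX idx_post_author_time ON post (author_id, created_at);
"""

def unread_count(conn: sqlite3.Connection, user_id: int) -> int:
    """Badge number: posts by followed users created after the user's last visit.
    Computed entirely at read time; nothing is stored per follower."""
    return conn.execute(
        """SELECT COUNT(*)
           FROM follow f
           JOIN post p ON p.author_id = f.followee_id
           JOIN app_user u ON u.id = f.follower_id
           WHERE f.follower_id = ? AND p.created_at > u.last_seen_at""",
        (user_id,)).fetchone()[0]

def mark_seen(conn: sqlite3.Connection, user_id: int, now: int) -> None:
    """Opening the notification screen only moves the watermark forward."""
    with conn:
        conn.execute("UPDATE app_user SET last_seen_at = ? WHERE id = ?", (now, user_id))

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO app_user (id) VALUES (1), (2)")
    conn.execute("INSERT INTO follow VALUES (2, 1)")  # user 2 follows user 1
    conn.executemany("INSERT INTO post (author_id, created_at) VALUES (1, ?)",
                     [(100,), (200,), (300,)])
    print(unread_count(conn, 2))  # 3
    mark_seen(conn, 2, 250)
    print(unread_count(conn, 2))  # 1
```

The cost moves to read time: every badge refresh joins the user's follows against recent posts. There is also nowhere to keep per-item state (for example, "this particular notification was read"), which is part of why the approach is inflexible.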

SETUP

The backend (as established by the previous developer) uses CodeIgniter and a MySQL database. It is currently running on a crappy GoDaddy shared hosting account, but I assume (hope?) this will be upgraded before we go into production and the hosting package will be scaled with user growth.

Currently our only front-end is a mobile app, but we plan to later build a website as well. I am not concerned at this time with obtaining real-time push updates from the server about the notifications.

ADDENDUM

I do not specialize in backends and I'm in over my head in that department. The client knows it, and I've done my best to try to explain the scope of a project of this nature, but they have made it clear that at this point they will not trust anyone else to work on the project. We probably have another month of work to do before we can start adding testers and I can get any kind of performance metrics. I really can't estimate how many users we might have or what hardware we might be on in the next 5 years, but I think the client is hoping for hundreds of thousands of users or more.

I hope this is specific enough of a problem to be posted here; I can refine it if need be. Please ask if you have any questions or I've omitted important details.

tl;dr

  • Does a database-driven notification system have negative implications for long-term scalability when all of the users are only following some of the same few hundred people?
  • Is there a way to make the notifications database-driven without needing a separate notification row for each notification for each follower?
  • Would an entirely query-driven notification system be scalable, or have any advantages besides not writing any data to the DB?
  • Am I overthinking this too early? Given that the client has a limited budget and we don't know yet whether the final product will be popular, should I just build something that works for now and worry about optimizing it if it becomes a problem?

Solution

So if a thousand people are following Sally, we insert a thousand rows into the corresponding table. Is that scalable?

Yes, provided the database tables are properly indexed.
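As one illustration of what "properly indexed" might mean here: the hot query filters notifications by recipient and read state, so a composite index on those columns lets the database seek directly to one user's rows instead of scanning the whole table. This sketch checks that with SQLite's EXPLAIN QUERY PLAN (MySQL has its own EXPLAIN with different output); the index and column names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE notification (
        id INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL,
        post_id INTEGER NOT NULL,
        is_read INTEGER NOT NULL DEFAULT 0
    );
    -- Hypothetical composite index matching the badge query's WHERE clause.
    CREATE INDEX idx_notification_user_unread ON notification (user_id, is_read);
""")

BADGE_QUERY = "SELECT COUNT(*) FROM notification WHERE user_id = ? AND is_read = 0"

def query_plan(sql: str, params: tuple) -> list:
    """Return the human-readable detail column of SQLite's query plan."""
    # Each EXPLAIN QUERY PLAN row has four columns; the fourth is the description.
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]

if __name__ == "__main__":
    for line in query_plan(BADGE_QUERY, (42,)):
        print(line)  # expect a SEARCH using idx_notification_user_unread, not a full SCAN
```

With the index in place, counting one user's unread notifications touches only that user's index entries, so the cost grows with that user's notifications rather than with the size of the whole table.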

What happens if we get to the point where tens or hundreds of thousands of users are following Sally and she's making a few dozen posts per day?

You'll generate a few dozen times tens or hundreds of thousands of notification records per day for Sally, assuming you want to keep track of every notification in perpetuity. The percentage of users like Sally with that kind of traffic is almost always very small.
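A back-of-the-envelope estimate makes those numbers concrete. The inputs below (30 posts per day, 100,000 followers, 100 bytes per row including index overhead) are illustrative assumptions, not measurements; the bytes-per-row figure in particular is a rough placeholder that should be measured on the real MySQL tables.

```python
def notification_load(posts_per_day: int, followers: int,
                      bytes_per_row: int = 100, days: int = 365):
    """Rows and approximate storage produced by fan-out on write for one author.

    bytes_per_row is a hypothetical placeholder (row data plus index overhead).
    """
    rows_per_day = posts_per_day * followers
    rows_total = rows_per_day * days
    gigabytes_total = rows_total * bytes_per_row / 1e9
    return rows_per_day, rows_total, gigabytes_total

if __name__ == "__main__":
    rows_day, rows_year, gb_year = notification_load(posts_per_day=30, followers=100_000)
    print(f"{rows_day:,} rows/day, {rows_year:,} rows/year, ~{gb_year:.1f} GB/year")
    # -> 3,000,000 rows/day, 1,095,000,000 rows/year, ~109.5 GB/year
```

Since this is for a single very popular author, the long-term total depends heavily on whether old notifications are ever discarded.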

My original idea had been to handle everything with queries: the number on the notification button would be obtained by requesting row-counts on content posted more recently than the last time you visited the notification screen, while individual notifications would be generated from more detailed queries when you visited the notification screen.

This seems unnecessarily complicated. If you need detailed statistics about notifications, just store the notifications.

Does a database-driven notification system have negative implications for long-term scalability when all of the users are only following some of the same few hundred people?

No, that's why it works: a small number of people typically generate the vast majority of the traffic.

Is there a way to make the notifications database-driven without needing a separate notification row for each notification for each follower?

Yes. Don't store the notifications at all; just send the notification emails in fire-and-forget style. Or store the notifications for a certain period of time, and then discard them. Or discard each notification after it has been read.
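The last two options amount to a periodic cleanup job. Here is a minimal sketch, with sqlite3 standing in for MySQL. The 30-day window, the created_at column and the function name are hypothetical choices. A job like this would typically run on a schedule (for example from cron) rather than on every request.

```python
import sqlite3
import time
from typing import Optional

RETENTION_SECONDS = 30 * 24 * 3600  # hypothetical 30-day retention window

def purge_notifications(conn: sqlite3.Connection, now: Optional[int] = None,
                        delete_read: bool = True) -> int:
    """Delete notifications older than the retention window and, optionally,
    every notification that has already been read. Returns rows deleted."""
    now = int(time.time()) if now is None else now
    cutoff = now - RETENTION_SECONDS
    sql = "DELETE FROM notification WHERE created_at < ?"
    if delete_read:
        sql += " OR is_read = 1"
    with conn:
        cur = conn.execute(sql, (cutoff,))
    return cur.rowcount

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("""CREATE TABLE notification (
        id INTEGER PRIMARY KEY, user_id INTEGER NOT NULL,
        is_read INTEGER NOT NULL DEFAULT 0, created_at INTEGER NOT NULL)""")
    now = 10_000_000
    conn.executemany(
        "INSERT INTO notification (user_id, is_read, created_at) VALUES (?, ?, ?)",
        [(1, 0, now - 40 * 24 * 3600),  # unread but older than 30 days -> deleted
         (1, 1, now - 3600),            # recent but already read -> deleted
         (1, 0, now - 3600)])           # recent and unread -> kept
    print(purge_notifications(conn, now=now))  # 2
```

On a large MySQL table you would probably want an index on created_at so the cleanup does not scan the whole table, and you might delete in batches (MySQL's DELETE accepts a LIMIT clause) to keep each transaction short.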

Would an entirely query-driven notification system be scalable, or have any advantages besides not writing any data to the DB?

I'm not sure what you mean by this. If you want to query notifications, you have to store them in the database. Otherwise, there is nothing to query.

Am I overthinking this too early?

Talk to someone who can help you design a properly normalized, indexed database with the correct tables in it. I see no reason why such a database couldn't effectively handle the scenarios you describe.
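As a rough illustration of "normalized and indexed" for this case: each per-follower notification row holds only keys and a read flag and points at the post. The post's content is therefore stored once no matter how many followers it fans out to, and the notification screen is assembled with a join. All names are hypothetical, and sqlite3 stands in for MySQL.

```python
import sqlite3

SCHEMA = """
CREATE TABLE app_user (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE post (
    id INTEGER PRIMARY KEY,
    author_id INTEGER NOT NULL REFERENCES app_user(id),
    body TEXT NOT NULL,
    created_at INTEGER NOT NULL
);
-- One narrow row per (recipient, post): keys and a flag, no copied content.
-- The composite primary key also serves as the index for per-user lookups.
CREATE TABLE notification (
    user_id INTEGER NOT NULL REFERENCES app_user(id),
    post_id INTEGER NOT NULL REFERENCES post(id),
    is_read INTEGER NOT NULL DEFAULT 0,
    PRIMARY KEY (user_id, post_id)
);
"""

def notification_screen(conn: sqlite3.Connection, user_id: int, limit: int = 20):
    """Newest notifications for one user, joined to the post and its author."""
    return conn.execute(
        """SELECT a.name, p.body, n.is_read
           FROM notification n
           JOIN post p ON p.id = n.post_id
           JOIN app_user a ON a.id = p.author_id
           WHERE n.user_id = ?
           ORDER BY p.created_at DESC
           LIMIT ?""",
        (user_id, limit)).fetchall()

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO app_user VALUES (?, ?)", [(1, "Sally"), (2, "Bob")])
    conn.executemany("INSERT INTO post VALUES (?, 1, ?, ?)",
                     [(10, "First post", 100), (11, "Second post", 200)])
    conn.executemany("INSERT INTO notification (user_id, post_id) VALUES (2, ?)",
                     [(10,), (11,)])
    for row in notification_screen(conn, 2):
        print(row)
    # ('Sally', 'Second post', 0)
    # ('Sally', 'First post', 0)
```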

A real-life example

As far as I know, Stack Exchange stores everything in perpetuity, including all notifications. They use database technology similar to MySQL, plus some caching technologies. While their hardware and storage space are substantial, the amount of traffic they get is a good problem to have.

Licensed under: CC-BY-SA with attribution