Question

I am building a large web app that will help a "region manager" to manage multiple schools in multiple districts.

In total, there are about 400,000 students & teachers.

On top of managing the obvious things like grades, we will also have to manage daily attendance.

I am used to building web apps at a smaller scale, which I deploy to Heroku. Given a system of this scale, should I be thinking about using a non-relational DB from the start, or should I just stick to PostgreSQL and do specific optimizations to ensure high speed and data integrity?

If it isn't clear, my main concern is that the system will be slow when managing so many records across so many tables in a relational DB.

Also, if the recommendation is to use a relational DB, what are some common optimizations I can do to ensure speed? The biggest, most obvious one is using indexes on the most commonly accessed information. Anything else like that would be greatly appreciated.

Thanks.

P.S. My team is split on what we should go with, so you guys will lend a useful voice in helping tip the balance :)


Solution

Stick with PostgreSQL. Why would something else be better?

With the little info you provided, my guess is that your performance will probably come down to two things:

  1. Proper indexes on the right columns
  2. Caching with Rails and probably Redis

PostgreSQL stores its data on disk. Caching with Redis keeps database query results and rendered fragments of HTML in memory, so repeated requests avoid touching disk.
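To make the caching idea concrete, here is a minimal sketch of the pattern in Python. It is an illustration only: `TTLCache` is a hypothetical in-process stand-in for Redis, and `load_attendance_summary` is a made-up placeholder for an expensive database query, not anything from Rails or a Redis client library.

```python
import time
from typing import Any, Callable, Dict, Hashable, Tuple


class TTLCache:
    """Hypothetical in-memory stand-in for Redis: values expire after ttl_seconds."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._store: Dict[Hashable, Tuple[float, Any]] = {}

    def fetch(self, key: Hashable, compute: Callable[[], Any]) -> Any:
        """Return the cached value for key, calling compute() only on a miss or after expiry."""
        now = self._clock()
        entry = self._store.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # cache hit: the slow path is skipped
        value = compute()  # cache miss: do the expensive work once
        self._store[key] = (now + self._ttl, value)
        return value


db_calls = 0  # counts how often the (pretend) database is queried


def load_attendance_summary(school_id: int) -> dict:
    """Placeholder for a slow query; a real app would hit PostgreSQL here."""
    global db_calls
    db_calls += 1
    return {"school_id": school_id, "present": 950, "absent": 50}  # made-up numbers


cache = TTLCache(ttl_seconds=60)
first = cache.fetch(("attendance", 7), lambda: load_attendance_summary(7))
second = cache.fetch(("attendance", 7), lambda: load_attendance_summary(7))
print(db_calls)  # the second fetch is served from memory
```

The trade-off is staleness: with a 60 second TTL, a page can show attendance that is up to a minute old, so pick the expiry per kind of data.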

OTHER TIPS

Designing proper indexes is an important part of performance architecture, but you don't design indexes for information, you design indexes for queries. And it has little to do with the choice between relational and non-relational databases, since both demand that you design "proper" indexes. For more details, see my presentation How to Design Indexes, Really.
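As a rough illustration of designing an index for a query, the sketch below uses Python's built-in `sqlite3` module as a stand-in for PostgreSQL (the planner output differs, but the idea carries over). The `attendance` table, its columns, and the index name are all assumptions made up for this example. The query filters on `student_id` and a date range, so the index covers exactly those two columns, in that order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE attendance ("
    " student_id INTEGER NOT NULL,"
    " day TEXT NOT NULL,"
    " status TEXT NOT NULL)"
)

# Made-up sample data: 200 students, 28 days each.
rows = [(s, f"2024-01-{d:02d}", "present") for s in range(1, 201) for d in range(1, 29)]
conn.executemany("INSERT INTO attendance VALUES (?, ?, ?)", rows)

# The query we want to be fast: one student's attendance from a given day onward.
QUERY = "SELECT day, status FROM attendance WHERE student_id = ? AND day >= ?"


def query_plan(connection: sqlite3.Connection) -> str:
    """Return the planner's description of how it would run QUERY."""
    plan_rows = connection.execute(
        "EXPLAIN QUERY PLAN " + QUERY, (42, "2024-01-15")
    ).fetchall()
    return " | ".join(row[-1] for row in plan_rows)  # last column holds the detail text


plan_before = query_plan(conn)  # no index yet: expect a full table scan

# Index shaped for the query: equality column first, range column second.
conn.execute("CREATE INDEX idx_attendance_student_day ON attendance (student_id, day)")

plan_after = query_plan(conn)  # the planner should now search via the index

print(plan_before)
print(plan_after)
```

In PostgreSQL the equivalent check is running `EXPLAIN` on the real query before and after creating the index.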

In addition to performance, please be sure to consider security concerns. Not that NoSQL databases are necessarily bad, but they are different. Approaches to securing non-sensitive data can be different.

If you are storing any Personally Identifiable Information, weigh your options carefully, and if you're not sure what the differences are, go with what you know how to secure.

Also, it might not hurt to consider segregating data, with some in a relational store and some not. If you have the flexibility to architect the system from scratch, whatever works best in your situation is what's right for you.


Facebook runs MySQL. I don't know if 400K people means 400K users for this system (I don't think so), but either way Facebook is orders of magnitude larger and uses MySQL.

Here is the fact: scaling is hard. If a NoSQL backend were enough to scale easily, no one nowadays would start with a relational database, don't you think? I know this is not really an answer to your question, but I think there is simply no answer to this.

Use whatever you are comfortable with, use what excites you most, use what you think you'll be using for the next several years, or use what you think is easy to buy support for. Don't factor scalability into this choice, because until you face a problem you can't know how to solve it. You can't even know whether it will exist at all.

BTW, there are lots of considerations around fault tolerance, caching and other things which will have a far greater impact on your performance than SQL vs NoSQL. Also, not all relational database engines are the same (and neither are all NoSQL datastores).

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow