Question

I am building an app to support 200,000+ registered users, and want to add an addressbook functionality for each user to import their own contacts (e.g. name, address, email, etc). Each user will have c.150 different contacts, with 10-15 fields for each record.

My question is simple: given the volume of users and the number of contacts for each user, is it better to create individual tables for each user's addressbook, or one single table with a user_id lookup for that associated user account?

If you could explain why from a performance perspective, that would be much appreciated.

UPDATE: Specifications

In response to questions in comments, here are the specifications: I will be hosting the database on AWS RDS (http://aws.amazon.com/rds). It will primarily be a heavy read load, rather than write. When write is accessed, it will be a balance between INSERT and UPDATE, with few deletes. Imagine the number of times you view vs edit your own addressbook.

Thanks


Solution

Specific answer in response to specifications

One table for contacts' data, with an indexed foreign key column back to the user. Finding a particular user's contacts will require about 3 seeks, a relatively small number. Use an SSD if seeks are bottlenecking you.
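As a rough illustration of the single-table layout, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for MySQL on RDS. The table, column, and index names (users, contacts, idx_contacts_user) are assumptions made up for the example, and SQLite's query planner is not MySQL's, so treat the plan output as indicative only.

```python
import sqlite3

# In-memory SQLite database standing in for the real MySQL instance.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (
        user_id  INTEGER PRIMARY KEY,
        username TEXT NOT NULL
    );
    CREATE TABLE contacts (
        contact_id INTEGER PRIMARY KEY,
        user_id    INTEGER NOT NULL REFERENCES users(user_id),
        name       TEXT,
        email      TEXT,
        phone      TEXT
    );
    -- The index that makes "all contacts of one user" a cheap lookup.
    CREATE INDEX idx_contacts_user ON contacts(user_id);
""")

conn.executemany("INSERT INTO users (user_id, username) VALUES (?, ?)",
                 [(1, "alice"), (2, "bob")])
conn.executemany(
    "INSERT INTO contacts (user_id, name, email, phone) VALUES (?, ?, ?, ?)",
    [(1, "Carol", "carol@example.com", "555-0100"),
     (1, "Dave", "dave@example.com", "555-0101"),
     (2, "Erin", "erin@example.com", "555-0102")],
)

def contacts_for(user_id):
    """Return every contact that belongs to one user, sorted by name."""
    rows = conn.execute(
        "SELECT name, email, phone FROM contacts WHERE user_id = ? ORDER BY name",
        (user_id,),
    )
    return rows.fetchall()

# Ask the planner how it would run the per-user lookup.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT name, email, phone FROM contacts WHERE user_id = ?",
    (1,),
).fetchall()
plan_text = " ".join(row[-1] for row in plan)  # last column is the description

print(contacts_for(1))
print(plan_text)
```

On MySQL the equivalent check would be an EXPLAIN on the same SELECT, looking for the user_id index in the key column.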

If your 15 columns hold 100 bytes each, and you have 150 contacts, then your maximum data transfer per user is on the order of 225 KB (15 * 100 * 150 bytes). I would design the application to show only the contact data required up front (say the 3 most useful contact points: name, email, phone), then pull more specifics when requested for particular contacts. In the (presumably) rare cases when you need all contacts' info (e.g. export to CSV), consider SELECT ... INTO OUTFILE if you have that access. vCard output would be less performant: you'd need to get all the data, then stuff it into the right format. If you need vCard often, consider writing the vCard out whenever the database is updated (a caching approach).
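The arithmetic and the "summary first, details on request" idea can be sketched as follows. This is only an illustration using sqlite3 in place of MySQL; the column names beyond name, email, and phone (street, city, notes) and the helper names list_contacts and get_contact are assumptions invented for the example.

```python
import sqlite3

# Back-of-the-envelope transfer sizes from the figures in the answer.
FIELDS_PER_CONTACT = 15
BYTES_PER_FIELD = 100
CONTACTS_PER_USER = 150

full_bytes = FIELDS_PER_CONTACT * BYTES_PER_FIELD * CONTACTS_PER_USER
summary_bytes = 3 * BYTES_PER_FIELD * CONTACTS_PER_USER  # name, email, phone only
print(full_bytes, summary_bytes)  # 225000 45000

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # lets us turn rows into dicts
conn.execute("""
    CREATE TABLE contacts (
        contact_id INTEGER PRIMARY KEY,
        user_id    INTEGER NOT NULL,
        name TEXT, email TEXT, phone TEXT,
        street TEXT, city TEXT, notes TEXT
    )
""")
conn.execute("CREATE INDEX idx_contacts_user ON contacts(user_id)")
conn.execute(
    "INSERT INTO contacts (user_id, name, email, phone, street, city, notes) "
    "VALUES (?, ?, ?, ?, ?, ?, ?)",
    (1, "Carol", "carol@example.com", "555-0100",
     "1 Main St", "Springfield", "met at conference"),
)

def list_contacts(user_id):
    """Address book overview: only the three up-front columns."""
    rows = conn.execute(
        "SELECT contact_id, name, email, phone FROM contacts "
        "WHERE user_id = ? ORDER BY name",
        (user_id,),
    )
    return [dict(r) for r in rows]

def get_contact(user_id, contact_id):
    """Full record, fetched only when the user opens one contact."""
    row = conn.execute(
        "SELECT * FROM contacts WHERE user_id = ? AND contact_id = ?",
        (user_id, contact_id),
    ).fetchone()
    return dict(row) if row else None

print(list_contacts(1))
print(get_contact(1, 1))
```

Filtering the detail query on user_id as well as contact_id also keeps one user from reading another user's contact by guessing an id.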

If performance requirements are still not met, consider partitioning on the user id.

General answer

Design your schema around KISS and your performance requirements, while documenting the scalability plan.

In this particular situation, the volume of data does not strike me as extreme, so I would lean KISS toward one table. However, it's not clear to me what kind of queries you will be making: JOIN is the usual performance hog, not a straight SELECT. Your SELECT/UPDATE mix is also unclear to me. If the load is read-heavy and by user, a single table will do it.

Anyway, if after implementation you find the performance requirements aren't met, I would suggest you consider scaling by faster hardware, a different engine (e.g. MyISAM vs. InnoDB; know what the differences are for your particular MySQL version!), materialized views, or partitioning (e.g. around the first letter of the corresponding username, presuming you have one).

OTHER TIPS

Have a single table, but partition it by the first letter of the user's last name: all last names starting with A go into one partition, all names starting with B go into another, and so on.

You could also do some amount of profiling to find the right distribution key.
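One way to do that profiling, sketched under heavy assumptions: count how many rows each candidate key would send to each partition and compare the skew. The sample names and ids below are made up purely for illustration, and the helper names (partition_by_first_letter, partition_by_user_id, skew) are hypothetical; in practice you would run the same counts against your real user table.

```python
from collections import Counter

def partition_by_first_letter(last_name):
    """Candidate key 1: the first letter of the last name."""
    return last_name[0].upper()

def partition_by_user_id(user_id, num_partitions=4):
    """Candidate key 2: numeric user id modulo the partition count."""
    return user_id % num_partitions

def skew(counts):
    """Largest partition divided by the mean size of the non-empty partitions.

    1.0 means perfectly even; larger values mean one partition is hot.
    """
    sizes = list(counts.values())
    return max(sizes) / (sum(sizes) / len(sizes))

# Made-up sample of (user_id, last_name) pairs.
sample = [(1, "Smith"), (2, "Sanchez"), (3, "Singh"), (4, "Stewart"),
          (5, "Adams"), (6, "Baker"), (7, "Suzuki"), (8, "Zhang")]

by_letter = Counter(partition_by_first_letter(name) for _, name in sample)
by_id = Counter(partition_by_user_id(uid) for uid, _ in sample)

print(dict(by_letter), skew(by_letter))
print(dict(by_id), skew(by_id))
```

In this toy sample the letter-based key piles most rows into "S", while the id-based key spreads them evenly; real name distributions tend to be uneven in a similar way, which is why measuring first is worthwhile.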

I'm not a DBA, but I suggest you properly normalize the database, add indexes, etc., and not bugger it up to meet a possibly nonexistent performance issue. If possible, have a DBA review your schema. I don't think 200,000 users is excessive. All 200,000 users are not likely to hit the update button in the same x milliseconds it takes to process one person's input. Only a few will be logged in at any time, and most of them will be filling out data or staring at existing data on the web page rather than hitting that update button. If by chance a bunch of them do hit it at the same time, there will probably be a performance wait rather than a crash. Here is a rough layout for your schema (mileage may vary):

User
long userID primary key
String firstName
String lastName

Contact
long contactID primary key
long userID foreign key
String firstName
String lastName

Address
long addressID primary key
long contactID foreign key
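
A hedged translation of that layout into runnable DDL, again using sqlite3 as a stand-in for MySQL. The street and city columns, the index names, and the two helper functions are assumptions added for the example; the three tables and their keys follow the layout above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript("""
    CREATE TABLE user (
        user_id    INTEGER PRIMARY KEY,
        first_name TEXT,
        last_name  TEXT
    );
    CREATE TABLE contact (
        contact_id INTEGER PRIMARY KEY,
        user_id    INTEGER NOT NULL REFERENCES user(user_id),
        first_name TEXT,
        last_name  TEXT
    );
    CREATE TABLE address (
        address_id INTEGER PRIMARY KEY,
        contact_id INTEGER NOT NULL REFERENCES contact(contact_id),
        street     TEXT,  -- assumed column
        city       TEXT   -- assumed column
    );
    CREATE INDEX idx_contact_user ON contact(user_id);
    CREATE INDEX idx_address_contact ON address(contact_id);
""")

conn.execute("INSERT INTO user VALUES (1, 'Alice', 'Smith')")
conn.execute("INSERT INTO contact VALUES (10, 1, 'Carol', 'Jones')")
conn.execute("INSERT INTO address VALUES (100, 10, '1 Main St', 'Springfield')")

def addresses_for_user(user_id):
    """All addresses of all contacts owned by one user."""
    return conn.execute(
        """
        SELECT c.first_name, c.last_name, a.street, a.city
        FROM contact AS c
        JOIN address AS a ON a.contact_id = c.contact_id
        WHERE c.user_id = ?
        ORDER BY c.last_name, c.first_name
        """,
        (user_id,),
    ).fetchall()

def try_orphan_contact():
    """Try to insert a contact whose owner does not exist; report whether it succeeded."""
    try:
        conn.execute("INSERT INTO contact VALUES (11, 999, 'No', 'Owner')")
        return True
    except sqlite3.IntegrityError:
        return False

print(addresses_for_user(1))
print(try_orphan_contact())
```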

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow