Question

I have a collection of users.

Every user has some basic attributes: name, address, zipcode ...

But every user also has some attributes that take up much more space: statistics.

The statistics attribute contains an array that is more than 10 times bigger than all other attributes put together.

My question is the following:

Should I split my collection into a user collection and a user_stats collection? The user collection would contain the basic attributes, and the user_stats collection would hold the users with their statistics. Then I could use the user collection when I only want to retrieve basic information, and the user_stats collection when I really need the stats.

If, for example, I search for the names of all users:

userCollection.find({},{"name":true});

Will there be a difference in performance between having only one collection and having two? Will this difference be linear (in my opinion it will be, with the linear factor being the ratio of the document sizes)?

The general question this raises is: does MongoDB lose performance as documents get bigger (when selecting the same attributes)?

UPDATE

The attributes in the statistics are in an array that will grow over time (the more the user uses the app). There is no limit to this array, but most users (90%) won't have a statistics attribute larger than 10 times the other attributes. However, a small share of users (1-2%) have a statistics attribute that is 500 times the size of the other attributes.

I need this data so I'm not concerned about storage, but more about performance.

There are basically two cases where I fetch data from the users collection:

  • When I show a list of users, I will not fetch the statistics: I only project the name attribute and a few other small ones.

  • When I show one user, I fetch the statistics: Basically I will project all statistics attributes + some others.


Solution

Well, it seems to me that if the user record is much smaller than the statistics, you gain nothing by moving the statistics to another collection. If the statistics are large, they are large in any collection, right? Besides, everything has limits on growth; you cannot just grow it forever (a single MongoDB document is capped at 16 MB). I don't know how granular your statistics need to be, but could you record summarized statistics instead of detailed ones? Maybe your statistics can be summarized per hour or per day?
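As a rough illustration of the per-day summary idea, here is a minimal sketch in plain JavaScript. The event shape (`at`, `value`) and the function name `summarizeByDay` are assumptions made for the example; the question does not say what a statistics entry actually looks like.

```javascript
// Hypothetical raw entry: { at: "2024-01-01T10:15:00Z", value: 3 }.
// Field names are assumptions, not taken from the question.
function summarizeByDay(events) {
  const byDay = new Map();
  for (const event of events) {
    // Use the UTC calendar day (YYYY-MM-DD) as the bucket key.
    const day = new Date(event.at).toISOString().slice(0, 10);
    const bucket = byDay.get(day) || { day, count: 0, total: 0 };
    bucket.count += 1;
    bucket.total += event.value;
    byDay.set(day, bucket);
  }
  // One small entry per day instead of one entry per raw event.
  return Array.from(byDay.values()).sort((a, b) => a.day.localeCompare(b.day));
}
```

Storing the returned array instead of the raw entries would keep the statistics attribute growing by one element per active day, at the cost of losing the per-event detail.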

OTHER TIPS

It depends on how the statistics data affects your query performance. If your query latency is already low, the split may not be worth it. Even if you project only the fields you need, the entire document is loaded into memory (unless an index covers the query).
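To put a number on that, here is a back-of-the-envelope sketch. It assumes a scan that has to load every document in full, and it uses made-up sizes (500 bytes of basic attributes, statistics 10 times larger, as in the question's typical case). It is a crude model of bytes read, not a benchmark.

```javascript
// Rough model: bytes read by a full scan that must load every document.
// All sizes below are illustrative assumptions, not measurements.
function bytesScanned(userCount, basicBytes, statsBytes, split) {
  // With a split, the list query only touches the small user documents.
  const perDocument = split ? basicBytes : basicBytes + statsBytes;
  return userCount * perDocument;
}

const single = bytesScanned(100000, 500, 5000, false);  // one collection
const separate = bytesScanned(100000, 500, 5000, true); // user + user_stats
const ratio = single / separate; // 11, i.e. (basic + stats) / basic
```

Under this model the cost does scale linearly with document size, which matches the asker's guess. Whether it shows up as latency depends on whether the data fits in memory and whether an index can answer the query without touching the documents.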

Note: Unbounded arrays are not a good design.
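One common way to keep arrays bounded is to spread the entries over several fixed-size "bucket" documents. The sketch below is plain JavaScript that only shows the document shapes; the names `toBuckets`, `userId`, `seq`, `count` and `entries` are hypothetical, and this is one possible alternative rather than something the answer itself prescribes.

```javascript
// Split one long statistics array into bucket documents of at most
// bucketSize entries each. All field names are hypothetical.
function toBuckets(userId, entries, bucketSize) {
  if (!Number.isInteger(bucketSize) || bucketSize <= 0) {
    throw new RangeError("bucketSize must be a positive integer");
  }
  const buckets = [];
  for (let start = 0; start < entries.length; start += bucketSize) {
    const slice = entries.slice(start, start + bucketSize);
    // seq orders the buckets; count records how full each one is.
    buckets.push({ userId, seq: buckets.length, count: slice.length, entries: slice });
  }
  return buckets;
}
```

Each bucket would be stored as its own document in a statistics collection, so no single document grows without limit and the user document itself stays small.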

Licensed under: CC-BY-SA with attribution