Question

Take the following statement for example:

'Get results for ALL Active students of a Course'.

Naturally, this query has 2 parts,

  1. Get all active students of the course.
  2. Get results of those active students.

The end result can be achieved with a single query using JOIN. However, some students may have their result missing (the teacher has not yet entered the mark). In that case, their result should be shown as 0 (or as some other placeholder, it doesn't matter). With a plain JOIN, those students are dropped, so the output won't contain all the active students. This can be handled by using LEFT JOIN, which always returns every active student.
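
For illustration, the single-query approach could look roughly like the sketch below. The schema is an assumption, since the question does not show it: `students`, `enrollments` (with a `status` column) and `results` are hypothetical tables, and the real "active" rule may involve more joins. The `?` is a parameter placeholder of the kind used by common Node.js MySQL drivers.

```typescript
// Hypothetical schema (not from the question):
//   students(id, name)
//   enrollments(student_id, course_id, status)
//   results(student_id, course_id, mark)
const ACTIVE_STUDENT_RESULTS_SQL = `
  SELECT s.id, s.name, COALESCE(r.mark, 0) AS mark
  FROM students AS s
  JOIN enrollments AS e
    ON e.student_id = s.id
   AND e.course_id = ?
   AND e.status = 'active'
  LEFT JOIN results AS r
    ON r.student_id = s.id
   AND r.course_id = e.course_id
  ORDER BY s.id
`;
```

Note that the conditions on `results` sit in the ON clause. Moving them to a WHERE clause would filter out the rows with no result and turn the LEFT JOIN back into an inner join.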

Now, the conditions that determine Active Students currently require joining 2 or 3 tables and may change in the future. Similarly, the logic for calculating the mark may also change.

So, there are 2 ways to solve this. I am currently using the Single Query approach, as it gets all the necessary data with a single DB query. Also, DBs are best at handling and generating data.

On the other hand, writing separate SQL queries to retrieve Students and Results and combining them in the service logic would make the code very easy to maintain. But this will probably be slower, especially for large result sets.
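
As a rough sketch of this "break and combine" approach, the service layer could merge the rows of the two queries as shown below. The row shapes (`Student`, `Result`) and the function name are hypothetical, chosen only for illustration; the two arrays stand in for whatever the two separate queries return.

```typescript
interface Student { id: number; name: string; }
interface Result { studentId: number; mark: number; }
interface StudentResult { id: number; name: string; mark: number; }

// Combines the rows of two separate queries.
// Students without a result row get a mark of 0, like a LEFT JOIN with a default.
function combineResults(students: Student[], results: Result[]): StudentResult[] {
  const markByStudent = new Map<number, number>();
  results.forEach((r) => markByStudent.set(r.studentId, r.mark));
  return students.map((s) => ({
    id: s.id,
    name: s.name,
    mark: markByStudent.get(s.id) ?? 0,
  }));
}

// Usage with in-memory rows standing in for the two query results.
const combined = combineResults(
  [{ id: 1, name: "Ann" }, { id: 2, name: "Bob" }],
  [{ studentId: 1, mark: 85 }],
);
```

The lookup map keeps the merge linear in the number of rows, but both full result sets still have to travel from the DB to the application first.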

Another thing to consider is that there can be more queries in the form of "Get (some data) for All Active students of a Course". So, a change in requirements will affect ALL the queries.

So, considering execution speed and ease of coding, what would be the better approach? I like the break and combine approach but am unsure about the speed overhead. What would you do in this situation?

Platform: NodeJs, DB: MySQL


Solution

In the case of an SQL engine such as MySQL, the best option is to let the DB engine and its optimizer do the job: send the complex query and get the results.

There are several reasons for that:

  • SQL engines generally have an optimizer that can choose the best access strategy, dynamically taking into account the size of the tables, the existence of indexes and previous joins. The more complex the query, the more the engine will outperform your handcrafted code.
  • Letting the engine do the work also has performance advantages: if your query has conditions and you evaluate them in the application instead, you may have to transfer much more useless data between the DB and your app than if you delegate the work to the DB engine.
  • Of course, managing smaller queries in the application services would facilitate database independence. But the tighter change control that you expect could also backfire: if you start to use aggregate functions and grouping, you will have to write much more code, so the "easy maintenance" argument is offset by the extra code to maintain and the longer development time.
  • Furthermore, some data-structure evolutions are easier to tackle on the SQL side.
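
To make the point about aggregates concrete, here is a minimal sketch of what a single GROUP BY costs once it is moved into application code. It assumes, hypothetically, that a student can have several marks (for example one per exam); the `MarkRow` shape and the function name are illustrative only.

```typescript
interface MarkRow { studentId: number; mark: number; }

// Application-side equivalent of:
//   SELECT student_id, AVG(mark) AS avg_mark FROM results GROUP BY student_id
function averageMarkByStudent(rows: MarkRow[]): Map<number, number> {
  // First pass: accumulate sum and count per student.
  const totals = new Map<number, { sum: number; count: number }>();
  rows.forEach((row) => {
    const t = totals.get(row.studentId) ?? { sum: 0, count: 0 };
    t.sum += row.mark;
    t.count += 1;
    totals.set(row.studentId, t);
  });
  // Second pass: turn the totals into averages.
  const averagesByStudent = new Map<number, number>();
  totals.forEach((t, studentId) => averagesByStudent.set(studentId, t.sum / t.count));
  return averagesByStudent;
}

const averages = averageMarkByStudent([
  { studentId: 1, mark: 80 },
  { studentId: 1, mark: 90 },
  { studentId: 2, mark: 70 },
]);
```

Every raw row has to be fetched before it can be averaged, whereas the SQL version returns one row per student.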

You can very well combine the benefits of both worlds:

  • To facilitate maintenance on the DB side, you can encapsulate frequently used joins in database views.
  • Not everything has to be joined every time. You could very well keep lazy loading complex domain objects, without joins, when you access individual domain objects one at a time.
  • However, for mass operations (like your student scores), you could consider a query pattern that exploits the DB engine's capabilities to their maximum extent. This does not prevent the use of proxy objects.
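
As a sketch of the view idea, the "active student" rule could live in one view that every "Get (some data) for all active students of a course" query then reuses. The table, column and view names below are hypothetical, and the real rule would replace the simple `status = 'active'` filter.

```typescript
// Hypothetical view that keeps the "active student" rule in one place.
const CREATE_ACTIVE_STUDENTS_VIEW_SQL = `
  CREATE OR REPLACE VIEW active_course_students AS
  SELECT s.id AS student_id, s.name, e.course_id
  FROM students AS s
  JOIN enrollments AS e
    ON e.student_id = s.id
  WHERE e.status = 'active'
`;

// Queries built on the view do not repeat the rule.
const RESULTS_VIA_VIEW_SQL = `
  SELECT a.student_id, a.name, COALESCE(r.mark, 0) AS mark
  FROM active_course_students AS a
  LEFT JOIN results AS r
    ON r.student_id = a.student_id
   AND r.course_id = a.course_id
  WHERE a.course_id = ?
`;
```

If the definition of "active" changes, only the view needs to be updated, while the queries that select from it stay as they are.
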
Licensed under: CC-BY-SA with attribution