Question

I have a somewhat complex query with multiple (nested) sub-queries that I want to make available to application developers. The query is generic and generates a view with computed values over a collection of data sets, and a developer is expected to need only some of the records it returns (i.e. they will limit the result to some entity's ID, a date range, or the like).

I can see 3 ways to implement this:

  1. Let the developers embed the query into each application and add their own WHERE clauses as needed.
  2. Create a stored procedure that accepts as parameters all the conditions I expect developers to need (for the sake of argument, let's say I can predict what will be needed for the foreseeable future); the procedure runs the complex query and filters it according to the parameters passed.
  3. Implement the query as a view with several sub-views (because MySQL doesn't allow sub-queries in views) and have the developers use it as a table, with each application applying the WHERE filters it needs. Currently I'm looking at 3 additional sub-views, mostly because some sub-queries are used multiple times and factoring them out as sub-views prevents duplication - otherwise it could have been worse ;-).

Which will be better performance-wise (assuming all indexing is equivalent in all cases)? Go for worst-case scenarios, if you like.

And which do you think will be better in terms of code maintenance?


Solution

I like questions that define "good" - you've specifically asked about performance and maintainability, which allows answers to talk about that trade-off.

From a performance point of view, I don't think there's likely to be any difference between the 3 options, as long as the queries and data fit within your expected scenarios. I'd test with 100 times more data, and potentially widening the "where" clause to see what happens, but the indexing structure etc. is more likely to affect the performance than whether you execute the same SQL from a stored proc, through a view, or from a client application.

The best way to answer that question is to test it - there are, of course, many specific details that could invalidate the general "I'd expect x, y, or z" answers we overflowers can give. If performance is a critical concern, use a database-filling tool (Redgate makes one; I've used DBMonster in the past) and try all 3 options.

From a maintenance point of view, I'd propose an option 4, which - in my view - is by far the best.

Option 4: build a data access library which encapsulates access to your data. Have the library expose methods and parameters to refine the selection of records. Consider using the specification pattern (http://en.wikipedia.org/wiki/Specification_pattern). Use whatever queries are best inside the library, and don't bother the developers with the implementation details.
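To make option 4 concrete, here is a minimal sketch of the specification pattern in a data access library, using SQLite as a stand-in for MySQL. All names (`Spec`, `ReportRepository`, `by_entity`, `since`, and the `report` schema) are illustrative assumptions, not from the question; the point is that the complex SQL lives in one place and callers only compose filters.

```python
# Sketch of the specification pattern for a data access library.
# Names and schema are illustrative; SQLite stands in for MySQL.
import sqlite3

class Spec:
    """A filter that renders to a SQL condition plus bind parameters."""
    def __init__(self, clause, params):
        self.clause, self.params = clause, params

    def and_(self, other):
        return Spec(f"({self.clause}) AND ({other.clause})",
                    self.params + other.params)

def by_entity(entity_id):
    return Spec("entity_id = ?", [entity_id])

def since(day):
    return Spec("day >= ?", [day])

class ReportRepository:
    # The complex, generic query lives in one place; callers only pass specs.
    BASE_QUERY = "SELECT entity_id, day, value FROM report"

    def __init__(self, conn):
        self.conn = conn

    def find(self, spec=None):
        sql, params = self.BASE_QUERY, []
        if spec:
            sql += " WHERE " + spec.clause
            params = spec.params
        return self.conn.execute(sql, params).fetchall()

# Demo with an in-memory database standing in for the real schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE report (entity_id INT, day TEXT, value INT)")
conn.executemany("INSERT INTO report VALUES (?, ?, ?)",
                 [(1, "2024-01-01", 10), (1, "2024-02-01", 20),
                  (2, "2024-01-15", 30)])

repo = ReportRepository(conn)
rows = repo.find(by_entity(1).and_(since("2024-01-15")))
print(rows)  # only entity 1's rows on/after 2024-01-15
```

Developers never see the SQL at all; when the query changes, only the repository changes.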

If that doesn't work - heterogeneous application code, too much of a change for a simple requirement - I'd evaluate the options as follows:

  1. Embedded SQL: depending on the number of times this SQL is re-used, this may be okay. If there's only one part of the code that runs the SQL, it's logically similar to the data access library. If, however, the same snippet needs to get re-used in lots of places, it's a likely source for bugs - a small change in the SQL would need to be repeated in several places.

  2. Stored procedure: I generally dislike stored procedures for maintenance reasons - they tend to become brittle through overloading, and they encourage a procedural way of thinking. For instance, if you have other requirements for using this SQL calculation in a separate stored procedure, you very quickly end up with a procedural programming model, with stored procs calling each other.

  3. Views: this is probably the best choice. It puts the specific data logic in a single place, and promotes the use of set-based logic because the access route is a SELECT statement, rather than the execution of procedural statements. Views are also easy to incorporate into other queries.
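A minimal sketch of the view approach, again using SQLite in place of MySQL (table, view, and column names are illustrative assumptions). A repeated sub-query is factored into a named sub-view, the top-level view composes it, and each application filters the result like a table:

```python
# Sketch of option 3: encapsulate the computed query in views, then let
# each application add its own WHERE clause. SQLite stands in for MySQL.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE readings (entity_id INT, day TEXT, value INT);
INSERT INTO readings VALUES (1, '2024-01-01', 10), (1, '2024-01-02', 30),
                            (2, '2024-01-01', 50);

-- A sub-view factoring out a repeated aggregation (since MySQL disallows
-- sub-queries in a view's FROM clause, these must be named views).
CREATE VIEW entity_totals AS
    SELECT entity_id, SUM(value) AS total FROM readings GROUP BY entity_id;

-- The top-level view composes the sub-view; callers treat it as a table.
CREATE VIEW report AS
    SELECT r.entity_id, r.day, r.value, t.total
    FROM readings r JOIN entity_totals t ON t.entity_id = r.entity_id;
""")

# Each application filters the view exactly as it would a table.
rows = conn.execute(
    "SELECT day, value, total FROM report WHERE entity_id = ? ORDER BY day",
    (1,)
).fetchall()
print(rows)
```

The computed logic lives entirely in the database, and the WHERE clause stays in each application's hands.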

OTHER TIPS

If well implemented, any of the three solutions would be fine for maintenance, but bear in mind how you would treat each of them in a migration process (code or database migration).

If the query is big, the stored procedure will give you a bit of extra performance thanks to lower bandwidth overhead, since the client sends a much smaller statement. You may also gain a little extra security with this solution.

For maintenance, I would prefer the 1st or 2nd solution, because you can change the query without making any database schema changes. If you choose the 1st solution, I would wrap the query call in a function so you have only one place to make changes.

From a developer's point of view, I would choose the view solution because it is the most transparent one: it's like querying a regular table - you can check its structure with a DESCRIBE command, select just the fields and conditions you need, join it with another table, etc.

As for WHERE-clause flexibility, you can achieve it with any of the proposed solutions: you can add a where parameter to your wrapping function (1), you can add a where parameter to the stored procedure - but be cautious with SQL injection (2) - or the developer can add a WHERE clause as usual with the view (3).
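A hedged sketch of that wrapping-function idea, showing how optional filters can become bound parameters rather than concatenated strings (which is what makes injection a non-issue). The function and schema names are illustrative, and SQLite stands in for MySQL:

```python
# Sketch of tip (1): expose filtering through bind parameters rather than
# splicing raw WHERE strings, which would invite SQL injection.
import sqlite3

def fetch_report(conn, entity_id=None, start_day=None):
    """Wrap the complex query; optional filters become bound parameters."""
    sql = "SELECT entity_id, day, value FROM report"
    clauses, params = [], []
    if entity_id is not None:
        clauses.append("entity_id = ?")
        params.append(entity_id)
    if start_day is not None:
        clauses.append("day >= ?")
        params.append(start_day)
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return conn.execute(sql, params).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE report (entity_id INT, day TEXT, value INT)")
conn.executemany("INSERT INTO report VALUES (?, ?, ?)",
                 [(1, "2024-01-01", 10), (2, "2024-01-02", 20)])

rows = fetch_report(conn, entity_id=1)
print(rows)
```

Because every filter value travels as a bind parameter, the query shape is fixed and user input can never rewrite it.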

Bear in mind that in MySQL views are not temporary tables, so if the query is very complex this solution wouldn't be the best when the query is used a lot and in different ways (defeating the query-cache performance boost). I would consider a summary-table solution: a precomputed table that is refreshed at some interval by a scheduled task / cron job (daily, weekly, whenever needed) or kept up to date with the proper triggers. This solution could improve performance quite a bit.
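A minimal sketch of that summary-table idea, with SQLite standing in for MySQL and illustrative names throughout. Here the refresh is an explicit function call; in MySQL it would be driven by a cron job, an EVENT, or triggers:

```python
# Sketch of the summary-table tip: precompute the expensive aggregation
# into a real table and refresh it periodically. Names are illustrative.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE readings (entity_id INT, value INT);
CREATE TABLE entity_totals (entity_id INT PRIMARY KEY, total INT);
INSERT INTO readings VALUES (1, 10), (1, 20), (2, 5);
""")

def refresh_totals(conn):
    # Rebuild the summary in one transaction so readers never see it half-filled.
    with conn:
        conn.execute("DELETE FROM entity_totals")
        conn.execute("""
            INSERT INTO entity_totals
            SELECT entity_id, SUM(value) FROM readings GROUP BY entity_id
        """)

refresh_totals(conn)
# Applications now hit the cheap precomputed table instead of the big query.
totals = dict(conn.execute("SELECT entity_id, total FROM entity_totals"))
print(totals)  # {1: 30, 2: 5}
```

The trade-off is staleness: the table is only as fresh as its last refresh, which is why the refresh interval (or trigger strategy) has to match how current the data needs to be.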

Hope this helps. I like the view solution the most, but maybe it's more complex to develop from a database point of view.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow