Question

I have a system that end users will need to query, but they are not guaranteed to be well versed in query writing. We want to ensure that every SELECT statement ends with a LIMIT 100000. I have thought of some regex to do this, and of some tools. The native Query Rewrite seems like a good option, but we need to use ProxySQL for other things, so I wanted to see if anyone knows of a way to force this.

It seems like the pattern or regex would be something like: anything that starts with SELECT needs to end with LIMIT, followed by a space and an integer less than or equal to 100000, then maybe a semicolon.
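
Roughly, I picture the pattern as something like this (case-insensitive, with the dot matching newlines); the "integer less than or equal to 100000" part would still have to be checked outside the regex:

(?is)^\s*select\b.*\blimit\s+\d+\s*;?\s*$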

Has anyone had any luck doing this?

Solution

Forget it. There are ways to write "simple" queries, even with a small LIMIT, that can take hours to run.

However, if you switch to MariaDB 5.5.21 (or later), there is a LIMIT ROWS EXAMINED clause that is relatively effective at stopping runaway queries.
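
A minimal sketch of that clause, using invented table and column names; once 100000 rows have been examined, the server stops, issues a warning, and returns whatever (possibly partial) result it has so far:

SELECT *
    FROM big_table
    WHERE some_condition
    ORDER BY some_column
    LIMIT 100 ROWS EXAMINED 100000;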

An example of where LIMIT does not help much:

SELECT ...
    FROM big_table
    WHERE lots_of_rows_kept
    GROUP BY one_column
    ORDER BY another_column
    LIMIT 1

It will

  1. Fetch lots of rows.
  2. GROUP BY, possibly involving a sort of all the rows.
  3. Sort again, this time for the ORDER BY.
  4. Deliver one row.

That is, lots of time and effort went into steps 1, 2, and 3; the final LIMIT had very little impact on the overall time.

Check out Proxy servers -- some of them might have a feature wherein they kill any process running longer than X seconds.
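
Since ProxySQL is already in the mix, here is a rough sketch of such a rule via ProxySQL's admin interface (the rule_id and the 10-second limit are illustrative; the timeout column is in milliseconds):

INSERT INTO mysql_query_rules (rule_id, active, match_digest, timeout, apply)
    VALUES (10, 1, '^SELECT', 10000, 1);
LOAD MYSQL QUERY RULES TO RUNTIME;
SAVE MYSQL QUERY RULES TO DISK;

Queries matching the rule that run longer than the timeout are killed by the proxy.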

Query Rewrite

My old thoughts on Query Rewrite: http://mysql.rjweb.org/doc.php/queryrewrite
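
For completeness, MySQL 5.7+ also ships a Rewriter plugin, but it matches whole statement templates (literal text plus ? placeholders), so a blanket "append LIMIT to every SELECT" rule is not really expressible there. A sketch of a single-template rule, assuming the plugin is installed and mydb.big_table is an invented name:

INSERT INTO query_rewrite.rewrite_rules (pattern, replacement)
    VALUES ('SELECT * FROM mydb.big_table WHERE id = ?',
            'SELECT * FROM mydb.big_table WHERE id = ? LIMIT 100000');
CALL query_rewrite.flush_rewrite_rules();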

What's the worst?

So, someone writes a long-running query. It will hog some resources and slow down other queries, but most likely won't kill the system in any way. Hopefully, he will be embarrassed and try harder next time.

Anecdotes

I have dealt with several time-series applications. The first thing I do is build a web site that provides the information users are likely to want. And, behind the scenes, I build Summary tables so that queries against them are much faster (sometimes 10x) than against the raw (Fact) table.

The web pages present the data in an easier-to-read form than non-programmers can get via clumsy SQL. And I can test the pages to verify that they won't harm the system. When I am finished, I have no fear of people pounding on my web pages that, in turn, hit a billion-row dataset.

Yes, I build in LIMITs in various places -- after all, who wants to scroll through a million-row web page (should it ever finish rendering)? I even give them the ability to change the limit from the sane default I provide. But they rarely do.

And I listen to their requests. I try to quickly build whatever they ask for. (This keeps them from demanding direct SQL access.)

Summary Tables are the key to success. Users don't want item-by-item results; they want sums/averages by day (or week, or hour, or ...).
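
A minimal sketch of the idea, with purely hypothetical fact and summary table names (item_id, amount, and created_at are invented columns):

CREATE TABLE daily_summary (
    dy       DATE NOT NULL,
    item_id  INT NOT NULL,
    ct       BIGINT NOT NULL,          -- row count for that item/day
    total    DECIMAL(14,2) NOT NULL,   -- SUM(amount) for that item/day
    PRIMARY KEY (dy, item_id)
);

-- Re-run (e.g., nightly) to fold yesterday's fact rows into the summary
INSERT INTO daily_summary (dy, item_id, ct, total)
    SELECT DATE(created_at), item_id, COUNT(*), SUM(amount)
        FROM fact_table
        WHERE created_at >= CURDATE() - INTERVAL 1 DAY
          AND created_at <  CURDATE()
        GROUP BY DATE(created_at), item_id;

User-facing pages then query daily_summary instead of the raw Fact table.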

It is easier for me to write the SQL than to explain the nuances of the table.

And, yes, there is always normalization, so JOINs are required.

Licensed under: CC-BY-SA with attribution