Question

I got confused by a seemingly simple concept. MySQL defines a deterministic function as one that

always produces the same result for the same input parameters

So in my understanding, functions like

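-- DELIMITER lets the mysql client pass the whole body as one statement
DELIMITER //
CREATE FUNCTION foo (val INT) RETURNS INT  -- the RETURNS clause is mandatory
READS SQL DATA
BEGIN
   DECLARE retval INT;
   SET retval = (SELECT COUNT(*) FROM table_1 WHERE field_1 = val);
   RETURN retval;
END //
DELIMITER ;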

are not deterministic (there is no guarantee that a delete/update/insert does not happen between 2 calls to the function). At the same time, I have seen many functions that do pretty much the same thing, i.e. return a value based on the result of queries, and are declared DETERMINISTIC. It looks like I'm missing something very basic.

Could anyone clarify this issue?

Thanks.

Update: Thanks to those who answered (+1); so far it looks like there is widespread misuse of the DETERMINISTIC keyword. It is still hard for me to believe that so many people do it, so I'll wait a bit for other answers.


Solution

From the MySQL 5.0 Reference:

Assessment of the nature of a routine is based on the “honesty” of the creator: MySQL does not check that a routine declared DETERMINISTIC is free of statements that produce nondeterministic results. However, misdeclaring a routine might affect results or affect performance. Declaring a nondeterministic routine as DETERMINISTIC might lead to unexpected results by causing the optimizer to make incorrect execution plan choices. Declaring a deterministic routine as NONDETERMINISTIC might diminish performance by causing available optimizations not to be used. Prior to MySQL 5.0.44, the DETERMINISTIC characteristic is accepted, but not used by the optimizer.

So there you have it, you can tag a stored routine as DETERMINISTIC even if it is not, but it might lead to unexpected results or performance problems.
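As a minimal sketch of what such a declaration looks like, the question's function could spell out the characteristic explicitly. The name foo_count is hypothetical; table_1 and field_1 come from the question. The second statement reads back what MySQL recorded for the routine, which is only the declaration and not a verdict on the body.

```sql
DELIMITER //

-- Hypothetical variant of the question's function with the characteristic spelled out.
-- NOT DETERMINISTIC is also what MySQL assumes when neither keyword is given.
CREATE FUNCTION foo_count (val INT) RETURNS INT
    NOT DETERMINISTIC
    READS SQL DATA
BEGIN
    RETURN (SELECT COUNT(*) FROM table_1 WHERE field_1 = val);
END //

DELIMITER ;

-- Inspect the stored characteristics. IS_DETERMINISTIC reflects the declaration only;
-- MySQL does not analyse the body to confirm it.
SELECT ROUTINE_NAME, IS_DETERMINISTIC, SQL_DATA_ACCESS
FROM information_schema.ROUTINES
WHERE ROUTINE_SCHEMA = DATABASE()
  AND ROUTINE_NAME = 'foo_count';
```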

OTHER TIPS

DETERMINISTIC does not refer to different result sets being returned at different times (depending on what data has been added in the meantime). Rather, it refers to the result sets produced on different machines using the same data. For example, if you have 2 machines which run a function that calls uuid() or references server variables, that function should be considered NOT DETERMINISTIC. This matters for replication, for example, because the function calls are stored in the binary log (master) and then also executed by the slave. For details and examples see http://dev.mysql.com/doc/refman/5.0/en/stored-programs-logging.html
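To make the contrast concrete, here is a small sketch with two hypothetical functions, add_vat and new_token (neither appears in the original thread). The first depends only on its argument; the second calls UUID(), so two servers running it would get different values.

```sql
DELIMITER //

-- Depends only on its input: every server computes the same value.
CREATE FUNCTION add_vat (net DECIMAL(10,2)) RETURNS DECIMAL(10,2)
    DETERMINISTIC
    NO SQL
BEGIN
    RETURN net * 1.20;
END //

-- Calls UUID(): each server (and each call) produces a different value,
-- so the function is declared NOT DETERMINISTIC.
CREATE FUNCTION new_token () RETURNS CHAR(36)
    NOT DETERMINISTIC
    NO SQL
BEGIN
    RETURN UUID();
END //

DELIMITER ;
```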

The use of DETERMINISTIC is thus (99% of the time) correct, not to be considered misuse.

I think that your routine is deterministic. The documentation is not very clear and this has led to many people being very confused about this issue, which is actually more about replication than anything else.

Consider a situation where you have replication set up between two databases. The master database keeps a log of all the stored routines that were executed, including their input parameters, and ships this log to the slave. The slave executes the same stored routines in the same order with the same input parameters. Will the slave database now contain identical data to the master database? If the stored routines create GUIDs and store these in the database then no: the master and slave databases will be different and replication will be broken.

The main purpose of the DETERMINISTIC flag is to tell MySQL whether including calls to this stored routine in the replication log will result in differences between the master database and the replicated slaves, and is therefore unsafe.

When deciding if the DETERMINISTIC flag is appropriate for a stored routine, think of it like this: if I start with two identical databases and execute my routine on both with the same input parameters, will my databases still be identical? If they are, then my routine is deterministic.
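The "two identical databases" test can be sketched with a pair of procedures. The table audit_log and both procedure names are hypothetical and only serve to illustrate the reasoning above.

```sql
-- Hypothetical table used by both routines.
CREATE TABLE audit_log (
    id   CHAR(36)     NOT NULL PRIMARY KEY,
    note VARCHAR(100) NOT NULL
);

DELIMITER //

-- Passes the test: the stored row depends only on the parameters,
-- so two identical databases stay identical after the same call.
CREATE PROCEDURE log_note_fixed (IN p_id CHAR(36), IN p_note VARCHAR(100))
    DETERMINISTIC
    MODIFIES SQL DATA
BEGIN
    INSERT INTO audit_log (id, note) VALUES (p_id, p_note);
END //

-- Fails the test: UUID() yields a different id on each server,
-- so the same call leaves the two databases with different rows.
CREATE PROCEDURE log_note_random (IN p_note VARCHAR(100))
    NOT DETERMINISTIC
    MODIFIES SQL DATA
BEGIN
    INSERT INTO audit_log (id, note) VALUES (UUID(), p_note);
END //

DELIMITER ;
```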

If you declare your routine as deterministic when it is not, then replicas of your main database might not be identical to the original, because MySQL will only add the procedure call to the replication log, and executing the procedure on the slave does not produce identical results.

If your routine is non-deterministic, then MySQL must include the affected rows in the replication log instead. If you declare your routine as non-deterministic when it is in fact deterministic, this will not break anything, but the replication log will contain all of the affected rows when just the procedure call would have been enough, and this could impact performance.

You're not missing anything. This function is non-deterministic. Declaring it deterministic won't cause your database to melt, but it might affect performance. From the MySQL site: "Declaring a nondeterministic routine as DETERMINISTIC might lead to unexpected results by causing the optimizer to make incorrect execution plan choices." But MySQL does not enforce or check whether a routine you declared deterministic actually is deterministic. MySQL trusts that you know what you are doing.

Determinism matters if you have replication turned on or may use it one day. A non-deterministic function call that causes a row change (an update or insert, for instance) needs to be replicated using row-based binary logging, whereas a deterministic function can be replicated statement-based. This becomes interesting when looking at the SQL example in the question: would it give the same result when replicated statement-based, or should it be replicated using the result obtained on the master (row-based)? If the statements are executed with the appropriate locking and can be guaranteed to execute in the same order on the slave, then they are indeed deterministic. If the locking or statement order that the slave uses (no concurrency, serial processing of statements in the order they were started) means the answer can be different, then the function should be declared non-deterministic.
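To see which logging mode a server is using, the following statements can help. This is a sketch: which variables are available and which privileges are needed to change them depend on the MySQL version, so treat the SET line as an assumption to verify on your own server.

```sql
-- Show how changes are written to the binary log: STATEMENT, ROW, or MIXED.
SHOW VARIABLES LIKE 'binlog_format';

-- Switch the current session to row-based logging, so that row changes
-- rather than statement text are replicated (requires suitable privileges).
SET SESSION binlog_format = 'ROW';

-- With binary logging enabled and this variable OFF, CREATE FUNCTION is rejected
-- unless the function is declared DETERMINISTIC, NO SQL, or READS SQL DATA.
SHOW VARIABLES LIKE 'log_bin_trust_function_creators';
```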

Licensed under: CC-BY-SA with attribution