Question

I have many so-called summary columns throughout my database that count the number of rows in other tables. For example, the user table has a column counting the number of comments by each user. Currently, I update this column with an extra query:

UPDATE user_table SET comments = comments + 1 WHERE user_id='x'

Now I want to introduce Triggers to my database instead of running the second query. Two Triggers for each column: one DELETE and one INSERT. Then, I will have 30 - 40 Triggers!!!

My benchmarks show that using Triggers is about 50% faster than executing a second query, so it seems reasonable to use the modern features of MySQL. However, the reason I am asking this question is that many programmers are strongly against Triggers, because they make the code harder to follow. Moreover, adding lots of Triggers may make Information_Schema messy.

Am I on the right track?

Solution

The major problems with triggers are:

a) They are difficult to debug

b) Trigger cascades can be fatal to your database. For example, if an after update trigger on one table updates another table that also has an after update trigger, a single statement can set off a chain of hidden writes that is hard to predict and to troubleshoot.

A few questions:

a) How often are the "computed" summaries used within your application?

b) How often are the "computed" summaries required to be updated?

c) Would you consider creating a "reporting" table which is refreshed on a regular basis with the major "computations" that you need?
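
As a rough sketch of option (c), a reporting table could be refreshed by the MySQL event scheduler. Everything here is an assumption made for illustration: the comment_table name, the user_comment_report table, the column types, and the one-hour interval. The event scheduler must also be enabled (event_scheduler=ON) for the event to run.

```sql
-- Hypothetical reporting table; names and types are assumptions.
CREATE TABLE user_comment_report
(
    user_id INT NOT NULL,
    comments INT NOT NULL,
    PRIMARY KEY (user_id)
);

-- Recompute the per-user counts once an hour (interval is an assumption).
-- comment_table is a hypothetical table holding one row per comment.
CREATE EVENT refresh_user_comment_report
ON SCHEDULE EVERY 1 HOUR
DO
    REPLACE INTO user_comment_report (user_id, comments)
    SELECT user_id, COUNT(*)
    FROM comment_table
    GROUP BY user_id;
```

Note that this simple version never removes the row of a user whose comments have all been deleted; a fuller refresh would clear the report table first.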

OTHER TIPS

There are several options for summary fields such as you have.

  • Calculate when needed. This is the preferred starting point until usage indicates you need to de-normalize the value.
  • Use triggers to maintain the de-normalized value in the appropriate table. This trades off the overhead of maintaining the value versus the cost of retrieving it as needed.
  • Use a batch process to update de-normalized values on a periodic basis (when system load is low). This trades off accuracy against resource usage during peak times.
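
To make the first two options concrete, here is a minimal sketch. The user_table, user_id and comments names come from the question; comment_table (one row per comment, with a user_id column) and the trigger names are assumptions.

```sql
-- Option 1: calculate when needed (no summary column at all).
SELECT COUNT(*) AS comments
FROM comment_table
WHERE user_id = 'x';

-- Option 2: triggers keep the de-normalized counter in step.
-- Single-statement trigger bodies need no DELIMITER change.
CREATE TRIGGER comment_table_ai AFTER INSERT ON comment_table
FOR EACH ROW
    UPDATE user_table SET comments = comments + 1
    WHERE user_id = NEW.user_id;

CREATE TRIGGER comment_table_ad AFTER DELETE ON comment_table
FOR EACH ROW
    UPDATE user_table SET comments = comments - 1
    WHERE user_id = OLD.user_id;
```

If a comment can be reassigned to another user, an update trigger would be needed as well; that case is left out of this sketch.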

You will note I didn't include your option of a second update to maintain the value. That approach is prone to being missed in some code paths, which leaves the data permanently inaccurate. The remedy is a batch audit/update that detects and corrects inaccuracies. You need such a job anyway to set the initial values when introducing triggers, and it should be kept around in case the triggers are disabled at any time.
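
A batch audit/update of this kind might look like the following sketch. It assumes a hypothetical comment_table with one row per comment; user_table, user_id and comments are the names from the question.

```sql
-- Recount from the source rows and fix only the counters that are wrong.
UPDATE user_table u
LEFT JOIN (
    SELECT user_id, COUNT(*) AS actual
    FROM comment_table
    GROUP BY user_id
) c ON c.user_id = u.user_id
SET u.comments = COALESCE(c.actual, 0)
-- <=> is MySQL's NULL-safe equality, so NULL counters are corrected too.
WHERE NOT (u.comments <=> COALESCE(c.actual, 0));
```

Running it against an empty or freshly added comments column also covers the initial value setting.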

As has been noted, you don't want the de-normalized values to have triggers on them. These are the cases where I use triggers:

  • Maintain audit fields or tables. (Create/Update dates and users. Before image snapshots in an audit table.)
  • Perform cross field table validations.
  • Validate state transformations on state fields.
  • Maintain de-normalized fields. (Added after validation that they will increase performance. Most commonly when the value changes far less often than it is retrieved.)
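
As an illustration of the state-validation case, a trigger can reject a disallowed transition with SIGNAL (available from MySQL 5.5). The orders table, its status column, and the particular forbidden transition are all hypothetical.

```sql
-- DELIMITER is a mysql client command, needed because the body
-- contains semicolons.
DELIMITER //

CREATE TRIGGER orders_bu BEFORE UPDATE ON orders
FOR EACH ROW
BEGIN
    -- Hypothetical rule: a shipped order may not go back to pending.
    IF OLD.status = 'shipped' AND NEW.status = 'pending' THEN
        SIGNAL SQLSTATE '45000'
            SET MESSAGE_TEXT = 'Invalid status transition: shipped -> pending';
    END IF;
END//

DELIMITER ;
```

The UPDATE that attempts the transition fails with the message above and the row is left unchanged.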

The reason many are against the use of triggers really has to do with the lack of control introduced by the need for a second query.

The second query needed for recording information is subjected to the MySQL environment without you having the ability to control storage engine specific aspects.

For example, I once tried to help troubleshoot a MySQL event whose root cause simply boiled down to the storage engine of the table being recorded. From my empirical analysis of that problem, I have sworn off recording audit trail info into InnoDB tables from within a trigger.

I have written earlier posts about the good, the bad, and the ugly of Triggers.

If you must have many triggers, my strong recommendation would be for you to use a combination of the BLACKHOLE storage engine and MySQL Replication.

EXAMPLE

Let's say you want to record the number of comments for each user. Have a table called audit_user with the userid of the user and a comments column:

CREATE DATABASE auditinfo;
USE auditinfo;
CREATE TABLE audit_user
(
    userid INT NOT NULL,
    comments INT NOT NULL DEFAULT 1,
    PRIMARY KEY (userid)
) ENGINE=BLACKHOLE;

Next, set up a trigger that performs this query:

INSERT INTO auditinfo.audit_user (userid) VALUES (myuserid)
ON DUPLICATE KEY UPDATE comments = comments + 1;
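
Wrapped in an actual trigger, that query could look like the sketch below. The comment_table name, its user_id column (standing in for the myuserid placeholder), and the trigger name are assumptions; only auditinfo.audit_user comes from the example above.

```sql
-- Fires once per new comment row and bumps that user's audit counter.
CREATE TRIGGER comment_table_audit_ai AFTER INSERT ON comment_table
FOR EACH ROW
    INSERT INTO auditinfo.audit_user (userid) VALUES (NEW.user_id)
    ON DUPLICATE KEY UPDATE comments = comments + 1;
```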

OK great, but why have the audit_user table use the BLACKHOLE storage engine when nothing is stored in the table? Here is where MySQL Replication comes into play.

Set up a Slave whose sole purpose in life is to catch all audit trail data. Have the Slave replicate from the Master with the replicate-do-db=auditinfo option. Create the auditinfo database and audit_user table on the Slave as follows:

CREATE DATABASE auditinfo;
USE auditinfo;
CREATE TABLE audit_user
(
    userid INT NOT NULL,
    comments INT NOT NULL DEFAULT 1,
    PRIMARY KEY (userid)
) ENGINE=MyISAM;

The binary logs of the Master have the command INSERT INTO auditinfo.audit_user (userid) VALUES (myuserid) ON DUPLICATE KEY UPDATE comments = comments + 1; recorded. That INSERT is transmitted to the Slave's relay logs. Replication causes that INSERT query to be executed on the Slave.

So the question remains: What is the benefit of setting this up?

When it comes strictly to recording audit info, the Master does not have any heavy write I/O to slow it down. It simply records the SQL needed to do the audit. That SQL is sent over to another machine (the Slave) for actual recording. Naturally, the Master is able to handle a lot more triggers when you set up auditing in this manner. If you do not use this combination of BLACKHOLE and MySQL Replication, every query doing an INSERT will slow itself down, and seeing hundreds of such queries will unveil significantly poor DB performance.

Licensed under: CC-BY-SA with attribution
Not affiliated with dba.stackexchange