Question

I have the following tables:

CREATE TABLE base_event (
    id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    created_by ... -- some columns
);

CREATE TABLE transaction_events (
    event_id BIGINT UNSIGNED NOT NULL,
    transaction_time TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    merchant_id BIGINT UNSIGNED NULL DEFAULT NULL,   
    merchant_city VARCHAR (...) NULL DEFAULT NULL, -- Denormalize
    customer_id BIGINT UNSIGNED NULL DEFAULT NULL,
    customer_ip_address VARCHAR(...) NULL DEFAULT NULL, -- Denormalize
    ...
    FOREIGN KEY (event_id) REFERENCES base_event(id),
    FOREIGN KEY (customer_id) REFERENCES customers(id),
    FOREIGN KEY (merchant_id) REFERENCES merchants(id)
);

CREATE TABLE customers (
    id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    customer_ip_address VARCHAR(...) NULL DEFAULT NULL,
    ...
);

CREATE TABLE merchants (
    id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    ...
);

And my SELECT:

(SELECT t.*, c.name AS customer_name ...
FROM transaction_events t
JOIN customers c ON t.customer_id = c.id
JOIN merchants m ON t.merchant_id = m.id
WHERE t.customer_ip_address = 'abc' AND t.transaction_time > 'abc')
UNION DISTINCT
(SELECT t.*, c.name AS customer_name ...
FROM transaction_events t
JOIN customers c ON t.customer_id = c.id
JOIN merchants m ON t.merchant_id = m.id
WHERE t.merchant_city = 'abc' AND t.transaction_time > 'abc')

And my indexes are:

ALTER TABLE transaction_events 
ADD INDEX index_1 (customer_ip_address, transaction_time),
ADD INDEX index_2 (merchant_city, transaction_time);

Notes:

  1. My query is in this form to avoid OR.
  2. I've denormalized to a degree for the sake of the indexes.
  3. I do not need to reference my base_event table for this query.
  4. The relationship from transaction_events to customers and to merchants is not 1-to-1 but 1-to-0-or-1.

My questions:

  1. I can get rid of the wildcard, but transaction_events has around 20 columns. Would creating any further indexes help speed up the query?
  2. Do I need to add any other composite indexes (potentially ones that reference my FKs) to further improve this query?

Solution

The WHERE clauses refer to t, so it is very likely that the Optimizer will start with t in each SELECT. You have the optimal indexes for them.

Then it needs to reach into the other two tables (merchants and customers) and get 1 (or 0) row from them. Those tables have the optimal index for the JOIN, namely PRIMARY KEY(id) in each case. (The FKs do not play any role in this query.)
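One way to check this reasoning is to look at the plan MySQL actually picks. The sketch below assumes MySQL or MariaDB (which the DDL suggests), drops the elided "..." part of the select list, and uses made-up literal values in place of the 'abc' placeholders.

```sql
-- Hypothetical literals; substitute real values from your data.
EXPLAIN
SELECT t.*, c.name AS customer_name
FROM transaction_events t
JOIN customers c ON t.customer_id = c.id
JOIN merchants m ON t.merchant_id = m.id
WHERE t.customer_ip_address = '203.0.113.7'
  AND t.transaction_time > '2024-01-01 00:00:00'
UNION DISTINCT
SELECT t.*, c.name AS customer_name
FROM transaction_events t
JOIN customers c ON t.customer_id = c.id
JOIN merchants m ON t.merchant_id = m.id
WHERE t.merchant_city = 'Springfield'
  AND t.transaction_time > '2024-01-01 00:00:00';
```

If the description above holds, the rows for t would typically show index_1 and index_2 in the key column, and the rows for c and m would show PRIMARY with an eq_ref access type. A different join order or a full scan on t would be the signal to dig further.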

t.* might slow things down if it is fetching large TEXT columns that you then ignore.

If you do need all the columns, the only possible inefficiency is that both SELECTs may fetch the same row, only for UNION DISTINCT to deduplicate it afterwards. I do not think that problem is worth fixing. (The fix would be to have the UNION find and deduplicate only the row identifier, t.event_id in the schema shown, and then join back to t to get the other 19 columns. The cost of the extra work may outweigh the benefit; I cannot tell.)
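For reference, that fix could look roughly like the sketch below. It rests on assumptions that are not in the question: that event_id uniquely identifies a row in transaction_events (the DDL shown declares no primary key), and that the elided "..." part of the select list can be left out. The literal values are made up.

```sql
-- Assumes event_id is unique per row in transaction_events.
SELECT t.*, c.name AS customer_name
FROM (
    -- Each branch reads only the identifier, so the dedup works on narrow rows.
    SELECT event_id
    FROM transaction_events
    WHERE customer_ip_address = '203.0.113.7'
      AND transaction_time > '2024-01-01 00:00:00'
    UNION DISTINCT
    SELECT event_id
    FROM transaction_events
    WHERE merchant_city = 'Springfield'
      AND transaction_time > '2024-01-01 00:00:00'
) AS ids
-- Join back once per distinct id to fetch the wide row.
JOIN transaction_events t ON t.event_id = ids.event_id
JOIN customers c ON t.customer_id = c.id
JOIN merchants m ON t.merchant_id = m.id;
```

If event_id were the primary key of an InnoDB table, the two existing secondary indexes would already contain it, so the inner branches could be answered from the indexes alone. Whether that beats the original form is something only a measurement on real data can settle.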

OTHER TIPS

Do you need all 20 columns from transaction_events? If not, replacing * with only the columns you need reduces the amount of data pulled back on each call and also reduces the chance of a sub-optimal query plan, because the plan the optimizer generates can vary with the columns in your SELECT clause.
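As an illustration only, one branch with an explicit column list might look like this. Which columns are actually needed is an assumption; the ones listed are simply those visible in the question, and the literal values are made up.

```sql
-- Hypothetical column choice: only columns that appear in the posted DDL.
SELECT t.event_id,
       t.transaction_time,
       t.merchant_id,
       t.merchant_city,
       t.customer_id,
       t.customer_ip_address,
       c.name AS customer_name
FROM transaction_events t
JOIN customers c ON t.customer_id = c.id
JOIN merchants m ON t.merchant_id = m.id
WHERE t.customer_ip_address = '203.0.113.7'
  AND t.transaction_time > '2024-01-01 00:00:00';
```

Both branches of the UNION need the same column list in the same order for this to work.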

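You could test adding an index on the customer_id column and another on the merchant_id column for your JOIN clauses and see whether it improves performance and produces a better query plan. Note that if this is MySQL with InnoDB, declaring the foreign keys already creates an index on each of those columns when none exists, so check what is there first. Either way, this requires testing and comparing the EXPLAIN output for each case.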
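A possible way to run that experiment, assuming MySQL or MariaDB. The index names index_3 and index_4 are made up for illustration, as are the literal values.

```sql
-- Step 1: see which indexes already exist on the table.
SHOW INDEX FROM transaction_events;

-- Step 2 (only if no index leading with these columns is listed):
-- add the candidate indexes under hypothetical names.
ALTER TABLE transaction_events
    ADD INDEX index_3 (customer_id),
    ADD INDEX index_4 (merchant_id);

-- Step 3: re-run EXPLAIN on one branch and compare with the earlier plan.
EXPLAIN
SELECT t.*, c.name AS customer_name
FROM transaction_events t
JOIN customers c ON t.customer_id = c.id
JOIN merchants m ON t.merchant_id = m.id
WHERE t.merchant_city = 'Springfield'
  AND t.transaction_time > '2024-01-01 00:00:00';
```

If the plan and timings do not change, the extra indexes only add write overhead and can be dropped again.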

Licensed under: CC-BY-SA with attribution
Not affiliated with dba.stackexchange