Complex query with two tables and multilpe data and price ranges
-
04-10-2019 - |
Question
Let's suppose that I have these tables:
[ properties ]
id (INT, PK)
name (VARCHAR)
[ properties_prices ]
id (INT, PK)
property_id (INT, FK)
date_begin (DATE)
date_end (DATE)
price_per_day (DECIMAL)
price_per_week (DECIMAL)
price_per_month (DECIMAL)
And my visitor runs a search like: List the first 10 (pagination) properties where the price per day (price_per_day field) is between 10 and 100 on the period for 1st may until 31 december
I know thats a huge query, and I need to paginate the results, so I must do all the calculation and login in only one query... that's why i'm here! :)
Questions about the problem
If there are gaps, would that be an acceptable property?
There're no gaps. All the possible dates are in the database.
If the price is between 10 and 100 in some sup-periods, but not in others, do you want to get that property?
In the perfect world, no... We'll need to calculate the "sum" of that type of price in that period considering all the variations/periods.
Also, what are the "first 10"? How are they ordered? Lowest price first? But there could be more than one price.
This is just an example of pagination with 10 results per page... Can be ordered by the FULLTEXT search that I'll add with keywords and these things... As I said, it's a pretty big query.
Solution
What you tell us is not precise enough. From your data structure and your question I assume:
- the price of a property can change in that period, and there would be a properties_price entry for each sub-period
- there should be no overlaps in the sub-periods, but the data structure does not guarantee that
- there can be gaps in the sub-periods
But there are still questions:
- If there are gaps, would that be an acceptable property?
- If the price is between 10 and 100 in some sup-periods, but not in others, do you want to get that property?
- Also, what are the "first 10"? How are they ordered? Lowest price first? But there could be more than one price.
Depending on the answers, there might be no single query doing the trick. But if you accept the gaps, that could return what you want:
SELECT *
FROM properties AS p
WHERE EXISTS -- property is available in the price range
(SELECT * FROM properties_prices AS pp1
WHERE p.id = pp1.property_id AND
pp1.price_per_day between 10 and 100 AND
(pp1.date_begin <= "2010-12-31" OR pp1.date_end >= "2010-05-01")) AND
NOT EXISTS -- property is in the price range in all sup-periods, but there might be gaps
(SELECT * FROM properties_prices AS pp2
WHERE p.id = pp2.property_id AND
pp2.price_per_day not between 10 and 100 AND
(pp2.date_begin <= "2010-12-31" OR pp2.date_end >= "2010-05-01"))
ORDER BY name --- ???
LIMIT 10
That query doesn't give you the prices or other details. That would need to go in an extra query. But perhaps my assumptions are all wrong anyway.
OTHER TIPS
This is similar to the answer given by @mdma, but I use a condition in the join clause for the price range, instead of the HAVING
trick.
SELECT p.id, MAX(p.name),
MIN(v.price_per_day) AS price_low,
MAX(v.price_per_day) AS price_high
FROM properties p
JOIN properties_prices v ON p.id = v.property_id
AND v.price_per_day BETWEEN 10 AND 100
AND v.date_begin < '2010-12-31' AND v.date_end > '2010-05-01'
GROUP BY p.id
ORDER BY ...
LIMIT 10;
I would also recommend creating a covering index:
CREATE INDEX prices_covering ON properties_prices
(property_id, price_per_day, date_begin, date_end);
This allows your query to run as optimally as possible, because it can read the values directly from the index. It won't have to read the rows of data from the table at all.
+----+-------------+-------+-------+-----------------+-----------------+---------+-----------+------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+-----------------+-----------------+---------+-----------+------+--------------------------+
| 1 | SIMPLE | p | index | PRIMARY | PRIMARY | 4 | NULL | 1 | |
| 1 | SIMPLE | v | ref | prices_covering | prices_covering | 4 | test.p.id | 6 | Using where; Using index |
+----+-------------+-------+-------+-----------------+-----------------+---------+-----------+------+--------------------------+
This can also be done as a GROUP BY, which I think will be quite efficient, and we get some aggregates as part of the package:
SELECT
prperty_id, MIN(price_per_day), MAX(price_per_day)
FROM
properties_prices
WHERE
date_begin <= "2010-12-31" AND date_end >= "2010-05-01"
GROUP BY
property_id
HAVING MIN(IF( (price_per_day BETWEEN 10 AND 100), 1, 0))=1
ORDER BY ...
LIMIT 10
(I don't have MySQL to hand so I haven't tested. I was unsure about the MIN(IF ...) but a mock-up using a CASE worked on SQLServer.)