Question

I have the tables below, and I'm trying to pivot the subscriber fields table from rows into columns and filter the result on multiple AND/OR conditions, like the following:

WHERE  first_name LIKE 'm%' AND email LIKE '%com'

This is the fiddle

These are my two tables:

Fields Table

+----+------------+
| id |label       |
+----+------------+
|  1 | email      |
|  2 | first_name |
|  3 | last_name  |
+----+------------+

Subscribers Fields Table

+----+--------------+----------+---------------+-------------------+
| id | mail_list_id | field_id | subscriber_id | value             |
+----+--------------+----------+---------------+-------------------+
|  1 |            1 |        1 |             1 | mark@examble.com  |
|  2 |            1 |        2 |             1 | Mark              |
|  3 |            1 |        3 |             1 | Wood              |
|  4 |            1 |        1 |             2 | luan@domain.com   |
|  3 |            1 |        2 |             2 | Luan              |
|  4 |            1 |        3 |             2 | Charles           |
|  5 |            1 |        1 |             3 | marry@domain.com  |
|  6 |            1 |        2 |             3 | Anna              |
|  7 |            1 |        3 |             3 | Marry             |
|  8 |            2 |        1 |             4 | kevin@domain.com  |
|  9 |            2 |        2 |             4 | Kevin             |
| 10 |            2 |        3 |             4 | Faustino          |
| 11 |            2 |        1 |             5 | frank@examble.com |
| 12 |            2 |        2 |             5 | Frank             |
| 13 |            2 |        3 |             5 | Denis             |
| 14 |            2 |        1 |             6 | max@example.com   |
| 15 |            2 |        2 |             6 | Max               |
| 16 |            2 |        3 |             6 | Ryan              |
+----+--------------+----------+---------------+-------------------+

This is what I tried, but email and first_name come back as 0 instead of the stored value. It also doesn't work with the AND operator:

select 
  subscriber_id,
  MAX(case when field_id = '1' then value else 0 end) as email,
  MAX(case when field_id = '2' then value else 0 end) as first_name,
  MAX(case when field_id = '3' then value else 0 end) as last_name
from test_fields_table
WHERE (field_id = 3 AND value LIKE 'm%') OR (field_id = 1 AND value = '%com')
group by subscriber_id limit 100;

If I remove the WHERE condition, the query works with good performance.

I also tried wrapping my query in a subquery, giving it an alias, and then searching the derived table by the aliased column names instead of the field id. In that case, though, I have to remove the LIMIT from the subquery so that the search covers the full table rather than just the first 100 records. That causes very bad performance, since this table will be very large (100-500 million records) and I need the query result in under 4 seconds.


Solution

You can use HAVING to filter the columns you created:

select 
  subscriber_id,
  -- no ELSE branch: non-matching rows yield NULL, which MAX() ignores
  MAX(case when field_id = 1 then value end) as email,
  MAX(case when field_id = 2 then value end) as first_name,
  MAX(case when field_id = 3 then value end) as last_name
from test_fields_table
group by subscriber_id
HAVING email LIKE '%com' 
AND last_name LIKE 'M%'
limit 100;

See result
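If the conditions are always ANDed, another option is to self-join the table once per field, so that each filter sits in the WHERE clause and an index can narrow the rows before anything is pivoted. This is only a rough sketch: it assumes value is a VARCHAR column, and the two indexes are my own suggestion rather than part of the original schema.

```sql
-- Assumed indexes; not in the original schema.
-- If value is a TEXT column, the first index needs a prefix length, e.g. value(50).
ALTER TABLE test_fields_table
  ADD INDEX idx_field_value_sub (field_id, value, subscriber_id),
  ADD INDEX idx_sub_field (subscriber_id, field_id);

SELECT f.subscriber_id,
       e.value AS email,
       f.value AS first_name,
       l.value AS last_name
FROM test_fields_table AS f              -- f: the first_name row drives the search
JOIN test_fields_table AS e              -- e: the email row of the same subscriber
  ON e.subscriber_id = f.subscriber_id AND e.field_id = 1
LEFT JOIN test_fields_table AS l         -- l: last_name, optional
  ON l.subscriber_id = f.subscriber_id AND l.field_id = 3
WHERE f.field_id = 2
  AND f.value LIKE 'm%'                  -- prefix match, can use idx_field_value_sub
  AND e.value LIKE '%com'                -- leading wildcard, checked row by row
ORDER BY f.subscriber_id
LIMIT 100;
```

With the sample rows and a case-insensitive collation, this should match subscribers 1 (Mark) and 6 (Max). The '%com' test still cannot use an index; it is only applied to the rows that survive the first_name prefix filter.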

OTHER TIPS

Your field_1 schema is likely to be slower than EAV or JSON. WordPress, for example, uses an EAV schema pattern, and WP users often grumble on this and other forums about poor performance. JSON has pros and cons.

For performance, you must have the more common search columns in a single table with suitable datatypes. Less common search columns can be buried in EAV or JSON and tested by the application.

To allow a customer to add a commonly-searched column requires teaching him about a few datatypes (date, datetime, money, float, integer, string), and fabricating an ALTER to add the column to the table. Adding an index gets messier because it should involve multiple columns. For example, INDEX(last_name), INDEX(first_name) is handy if you only search on one of those columns. But, if the user usually searches on both columns, then you need INDEX(last_name, first_name). This is hard to anticipate.
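As a rough illustration of that layout, here is a sketch of a wide table holding the commonly searched columns. The table name subscribers, the column types, and the signup_date column are all hypothetical and chosen only for the example.

```sql
-- Hypothetical wide table: one row per subscriber, typed columns.
CREATE TABLE subscribers (
  subscriber_id INT UNSIGNED NOT NULL PRIMARY KEY,
  mail_list_id  INT UNSIGNED NOT NULL,
  email         VARCHAR(255) NOT NULL,
  first_name    VARCHAR(100) NULL,
  last_name     VARCHAR(100) NULL,
  -- Composite index for searches on both names together.
  -- It also serves searches on last_name alone (leftmost prefix),
  -- but not searches on first_name alone.
  INDEX idx_last_first (last_name, first_name)
);

-- The kind of ALTER the application would generate when a customer
-- adds a new commonly searched column (signup_date is a made-up example).
ALTER TABLE subscribers
  ADD COLUMN signup_date DATE NULL,
  ADD INDEX idx_signup_date (signup_date);

-- A search on both name columns that can use idx_last_first.
SELECT subscriber_id, email
FROM subscribers
WHERE last_name = 'Wood'
  AND first_name LIKE 'm%'
ORDER BY subscriber_id
LIMIT 100;
```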

If your customers will have only a thousand rows, none of this matters much for performance. But long before a million rows, every approach on this table suffers somewhat, or a lot, in performance.

Tell me more about the application space. (Documents / General products / Specific products (eg cameras) / Weather sensors / Geographic locations / ...) Maybe I can give some more concrete tips.

"Find the nearest coffee shop" via latitude and longitude is especially tricky; it needs its own discussion. Its performance optimization does not apply to other applications, and vice versa.

Comments on your SQL:

WHERE (field_id = 3 AND value LIKE 'm%')
   OR (field_id = 1 AND value = '%com')
group by subscriber_id
limit 100;

Notes:

  • OR is especially hard to optimize; it is likely to lead to a full table scan, checking every row.
  • value LIKE '%com' cannot use INDEX(value) because of the leading wildcard. (REVERSE() may be part of a workaround.)
  • Because of the GROUP BY, the entire table will be scanned before getting to the LIMIT. That is, the query will be slow regardless of the LIMIT.
  • LIMIT without an ORDER BY does not say which rows you will get.
  • The "field_N" technique fails to make it easy to test numeric data. The numbers 1,2,3,15,26,108 will sort as 1,108,15,2,26,3. (+0 is a workaround, but it defeats the use of any index. WP has this problem.)
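One way the REVERSE() workaround from the notes could look is sketched below. It assumes MySQL 5.7 or later (for generated columns) and that value is a VARCHAR(255); the value_rev column and the index name are made up for the example.

```sql
-- Store the reversed string so that a suffix search becomes a prefix search.
-- Assumes MySQL 5.7+ and value VARCHAR(255); value_rev is a hypothetical name.
ALTER TABLE test_fields_table
  ADD COLUMN value_rev VARCHAR(255) AS (REVERSE(value)) STORED,
  ADD INDEX idx_field_value_rev (field_id, value_rev);

-- value LIKE '%com' is rewritten as value_rev LIKE 'moc%',
-- which has no leading wildcard and can use idx_field_value_rev.
SELECT subscriber_id, value AS email
FROM test_fields_table
WHERE field_id = 1
  AND value_rev LIKE CONCAT(REVERSE('com'), '%')
ORDER BY subscriber_id   -- makes the 100 returned rows deterministic
LIMIT 100;
```

A suffix as common as 'com' matches most email rows, so the index saves little there; the rewrite pays off more for selective suffixes such as '@example.com'.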
Licensed under: CC-BY-SA with attribution
Not affiliated with dba.stackexchange