Question

I have the following tables:

"crawlresults"
id  |   url                 | fk_crawljobs_id
---------------------------------------------
1   |   shop*com/notebooks  |   1
2   |   shop*com/fridges    |   1
3   |   website*com/lists   |   2


"extractions"
id  | fk_extractors_id  | data          |   fk_crawlresults_id
---------------------------------------------------------------
1   |   1               | 123.45        |   1
2   |   2               | notebook      |   1
3   |   3               | ibm.jpg       |   1
4   |   1               | 44.5          |   2
5   |   2               | fridge        |   2
6   |   3               | picture.jpg   |   3
7   |   4               | hello         |   3
8   |   4               | world         |   3
9   |   5               | hi            |   3
10  |   5               | my            |   3
11  |   5               | friend        |   3


"extractors"
id  |   extractorname
----------------------
1   |   price
2   |   article
3   |   imageurl
4   |   list_1
5   |   list_2

I need to construct a select statement to get columns for each extractor in the extractors table that is used in the extractions table.

Example:

url                 | price     | article   | imageurl
--------------------------------------------------------
shop*com/notebooks  | 123.45    | notebook  | ibm.jpg
shop*com/fridges    | 44.5      | fridge    | NULL

I don't how much extractornames exists when I execute the select statement so it have to be dynamically built.

EDIT: I forgot to mention that it is possible that I have multiple "lists" in my extractions. In this case I need a the following result set.

Example 2:

url                 | list_1    | imageurl      | list_2
--------------------------------------------------------
website*com/lists   | hello     | picture.jpg   | NULL
website*com/lists   | world     | picture.jpg   | NULL
website*com/lists   | NULL      | picture.jpg   | hello
website*com/lists   | NULL      | picture.jpg   | my
website*com/lists   | NULL      | picture.jpg   | friend

Thank you!

Was it helpful?

Solution

You are looking for Dynamic pivot tables.

Code:

SET @sql = NULL;
SELECT
  GROUP_CONCAT(DISTINCT
    CONCAT(
      'MAX(IF(pa.extractorname = ''',
      extractorname,
      ''', p.data, NULL)) AS ',
      extractorname
    )
  ) INTO @sql
FROM extractors;

SET @sql = CONCAT('SELECT c.url, ', 
  @sql, 
  ' FROM crawlresults c', 
  ' INNER JOIN extractions p on (c.id = p.fk_crawlresults_id)', 
  ' INNER JOIN extractors pa on (p.fk_extractors_id = pa.id)'
  ' WHERE c.fk_crawljobs_id = 1',
  ' GROUP BY c.id');

PREPARE stmt FROM @sql;
EXECUTE stmt;
DEALLOCATE PREPARE stmt;

Working fiddle


Basically your original query was generating a bogus @sql variable which didn't really extract data for each extractorname. You also don't need all of those joins for creating @sql. You only need each one of the property names (from extractor table) and a reference to the column holding the expect values (data).

When in doubt about the structure, write out a simple pivot query for a fixed set of properties. This way it becomes easy to identify the pattern for writing the dynamic query.

SELECT c.url, 
  MAX(IF(pa.extractorname = 'price', p.data, NULL)) AS price,
  MAX(IF(pa.extractorname = 'article', p.data, NULL)) AS article,
  MAX(IF(pa.extractorname = 'imageurl', p.data, NULL)) AS imageurl 
FROM crawlresults c 
  LEFT JOIN extractions p on (c.id = p.fk_crawlresults_id) 
  LEFT JOIN extractors pa on (p.fk_extractors_id = pa.id) 
WHERE c.fk_crawljobs_id = 1
GROUP BY c.id

As for the rest of your query it is fine, just keep in mind that the LEFT JOINS could be useful if there are no extractions for some crawlresults. Also if your table can contain more than one crawlresult per url / fk_crawljobs_id, grouping by url is a bad idea (MAX can potentially mix the results from multiple extractions).

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top