Pergunta

I have 3 Mysql tables:

[block_value]

  • id_block_value
  • file_id

[metadata]

  • id_metadata
  • metadata_name

[metadata_value]

  • meta_id
  • value
  • blockvalue_id

In these tables, there are pairs: metadata_name = value And list of pairs are put in blocks (id_block_value)

(A) If I want height = 1080:

SELECT DISTINCT file_id 
FROM metadata_value MV 
     INNER JOIN metadata M ON MV.meta_id = M.id_metadata 
     INNER JOIN block_value BV ON MV.blockvalue_id = BV.id_block_value 
WHERE (metadata_name = "height" and value = "1080");

+---------+
| file_id |
+---------+
|      21 | 
|      22 |
(...)
|    6962 |
(...)
|    8146 | 
|    8147 | 
+---------+
794 rows in set (0.06 sec)

(B) If I want file extension = mpeg:

SELECT DISTINCT file_id 
FROM metadata_value MV 
     INNER JOIN metadata M ON MV.meta_id = M.id_metadata 
     INNER JOIN block_value BV ON MV.blockvalue_id = BV.id_block_value 
WHERE (metadata_name = "file extension" and value = "mpeg");

+---------+
| file_id |
+---------+
|    6889 | 
|    6898 | 
|    6962 | 
+---------+
3 rows in set (0.06 sec)

BUT, if I want:

  • A and B
  • A or B
  • A and not B

Then, I don't know what is the best.

For A or B, I tried A union B which seems to do the trick.

SELECT DISTINCT file_id 
FROM metadata_value MV 
     INNER JOIN metadata M ON MV.meta_id = M.id_metadata 
     INNER JOIN block_value BV ON MV.blockvalue_id = BV.id_block_value 
WHERE (metadata_name = "height" and value = "1080")
UNION
SELECT DISTINCT file_id 
FROM metadata_value MV 
     INNER JOIN metadata M ON MV.meta_id = M.id_metadata 
     INNER JOIN block_value BV ON MV.blockvalue_id = BV.id_block_value 
WHERE (metadata_name = "file extension" and value = "mpeg");
+---------+
| file_id |
+---------+
|      21 | 
|      22 | 
|      34 |
(...)
|    6889 | 
|    6898 | 
+---------+
796 rows in set (0.13 sec)

For A and B, since there are no intersect in Mysql, I tried A and file_id in(B), but look at perfs (>4mn)...

SELECT DISTINCT file_id 
FROM metadata_value MV 
     INNER JOIN metadata M ON MV.meta_id = M.id_metadata 
     INNER JOIN block_value BV ON MV.blockvalue_id = BV.id_block_value 
WHERE (metadata_name = "height" and value = "1080")
and file_id in(
SELECT DISTINCT file_id 
FROM metadata_value MV 
     INNER JOIN metadata M ON MV.meta_id = M.id_metadata 
     INNER JOIN block_value BV ON MV.blockvalue_id = BV.id_block_value 
WHERE (metadata_name = "file extension" and value = "mpeg"));

+---------+
| file_id |
+---------+
|    6962 | 
+---------+
1 row in set (4 min 36.22 sec)

I tried B and file_id in(A) too, which is a lot better, but I will never know how which one to put first.

SELECT DISTINCT file_id 
FROM metadata_value MV 
     INNER JOIN metadata M ON MV.meta_id = M.id_metadata 
     INNER JOIN block_value BV ON MV.blockvalue_id = BV.id_block_value 
WHERE (metadata_name = "file extension" and value = "mpeg")
and file_id in(
SELECT DISTINCT file_id 
FROM metadata_value MV 
     INNER JOIN metadata M ON MV.meta_id = M.id_metadata 
     INNER JOIN block_value BV ON MV.blockvalue_id = BV.id_block_value 
WHERE (metadata_name = "height" and value = "1080"));

+---------+
| file_id |
+---------+
|    6962 | 
+---------+
1 row in set (0.75 sec)

So... what do I do now? Is there any better way for boolean operations? Any tip? Did I miss something?

EDIT: what data looks like:

This database contains a row in FILE table for each audio/video file inserted:

  • 10, /path/to/file.ts
  • 11, /path/to/file2.mpeg

There is a row in METADATA table for each potential information:

  • 301, height
  • 302, file extension

Then, a row in BLOCK table define a container:

  • 101, Video
  • 102, Audio
  • 104, General

A file can have several blocks of metadata, a BLOCK_VALUE table contains instances of BLOCKS:

  • 402, 101, 10 // Video 1
  • 403, 101, 10 // Video 2
  • 404, 101, 10 // Video 3
  • 405, 102, 10 // Audio
  • 406, 104, 10 // General

In this example, file 10 has 5 blocks: 3 Video (101) + 1 Audio (102) + 1 General (104)

Values are stored in METADATA_VALUE

  • 302, 406, "ts" // file extension, General
  • 301, 402, "1080" // height, Video 1
  • 301, 403, "720" // height, Video 2
  • 301, 404, "352" // height, Video 3
Foi útil?

Solução

I'm opening a new post only to keep the "correct" solution tidy..

Ok, sorry, it seemed that I was making the wrong assumption. I never thought about two blocks being defined exactly the same way.

So, since I'm a copycat, and I like my getting the AND from OR solution (:P), I got to these two solutions..

ORing: I like Chris's solution better...

SELECT DISTINCT file_id 
  FROM metadata_value MV 
    INNER JOIN metadata M ON MV.meta_id = M.id_metadata 
    INNER JOIN block_value BV ON MV.blockvalue_id = BV.id_block_value 
   WHERE (metadata_name = "height" and value = "1080") 
      OR (metadata_name = "file extension" and value = "mpeg")

ANDing: I'll use your ORing version (the one with the UNION all

  SELECT FILE_ID FROM (
     SELECT DISTINCT 1, file_id 
             FROM metadata_value MV 
       INNER JOIN metadata M ON MV.meta_id = M.id_metadata 
       INNER JOIN block_value BV ON MV.blockvalue_id = BV.id_block_value 
              WHERE (metadata_name = "height" and value = "1080")
     UNION ALL
     SELECT DISTINCT 2, file_id 
             FROM metadata_value MV 
       INNER JOIN metadata M ON MV.meta_id = M.id_metadata 
       INNER JOIN block_value BV ON MV.blockvalue_id = BV.id_block_value 
              WHERE (metadata_name = "file extension" and value = "mpeg")
   ) IHATEAND
   GROUP BY FILE_ID
   HAVING COUNT(1)>1

Which gives:

+---------+
| FILE_ID |
+---------+
|    6962 |
+---------+
1 row in set (0.24 sec)

it should be a little less fast than the ORing seeing the performances you pasted and mines (I am 3 times as slow, time to upgrade -.-), but still significantly faster than the previous queries ;)

Anyway, how does the ANDing work? Put pretty simply, it just does the two separate queries and names the records according to the branch they come from, then counts the different file ids coming from them

UPDATE: another way of doing it without having to "name" the branches:

SELECT FILE_ID FROM (
    SELECT file_id 
        FROM metadata_value MV 
        INNER JOIN metadata M ON MV.meta_id = M.id_metadata 
        INNER JOIN block_value BV ON MV.blockvalue_id = BV.id_block_value 
            WHERE (metadata_name = "height" and value = "1080")
    GROUP BY FILE_ID
    UNION ALL
    SELECT file_id 
        FROM metadata_value MV 
        INNER JOIN metadata M ON MV.meta_id = M.id_metadata 
        INNER JOIN block_value BV ON MV.blockvalue_id = BV.id_block_value 
    WHERE (metadata_name = "file extension" and value = "mpeg")
    GROUP BY FILE_ID
    ) IHATEAND
GROUP BY FILE_ID
HAVING COUNT(1)>1

Here the results are the same (and performances as well) and I'm exploiting the fact that while UNION automatically sorts the duplicates and removes the duplicates, UNION ALL does not... which is perfect since I don't want them removed (and in general union all is also faster than union :) ), this way I can forget about naming.

Outras dicas

For "OR" why not try it without the UNION... am I missing something?

SELECT DISTINCT file_id 
FROM metadata_value MV 
     INNER JOIN metadata M ON MV.meta_id = M.id_metadata 
     INNER JOIN block_value BV ON MV.blockvalue_id = BV.id_block_value 
WHERE (metadata_name = "height" and value = "1080") 
OR (metadata_name = "file extension" and value = "mpeg")

For "AND", use an inner join on the metadata table twice to ensure to get only file_id's that meet both conditions...

SELECT DISTINCT file_id 
FROM metadata_value MV 
     INNER JOIN metadata M ON MV.meta_id = M.id_metadata 
     AND (M.metadata_name = "height" and MV.value = "1080")
     INNER JOIN metadata M2 ON MV.meta_id = M2.id_metadata
     AND (M2.metadata_name = "file extension" and MV.value = "mpeg")
     INNER JOIN block_value BV ON MV.blockvalue_id = BV.id_block_value 

"A" and not "B", use a left join rather than an inner join on the "B" condition. Add a WHERE clause specifying that you expect no results for "B"

SELECT DISTINCT file_id 
FROM metadata_value MV 
     INNER JOIN metadata M ON MV.meta_id = M.id_metadata 
     AND (M.metadata_name = "height" and MV.value = "1080") 
     LEFT JOIN metadata M2 ON MV.meta_id = M2.id_metadata
     AND (M2.metadata_name = "file extension" and MV.value = "mpeg")
     INNER JOIN block_value BV ON MV.blockvalue_id = BV.id_block_value 
WHERE M2.id_metadata is NULL

OR version: (shameless copy and paste from ChrisCamp's answer)

 SELECT distinct file_id 
   FROM metadata_value MV 
      INNER JOIN metadata M ON MV.meta_id = M.id_metadata 
      INNER JOIN block_value BV ON MV.blockvalue_id = BV.id_block_value 
WHERE (metadata_name = "height" and value = "1080") 
   OR (metadata_name = "file extension" and value = "mpeg") 

AND Version:

SELECT file_id 
  FROM metadata_value MV 
   INNER JOIN metadata M ON MV.meta_id = M.id_metadata 
   INNER JOIN block_value BV ON MV.blockvalue_id = BV.id_block_value 
   WHERE (metadata_name = "height" and value = "1080") 
      OR (metadata_name = "file extension" and value = "mpeg") 
  group by file_id having count(1)>1

2 quicks notes about the AND version:

This is actually a way to define that Intersection in terms of the previous ORing..

When ANDind you have 3 possibilities:

  • none of the requested condition is satisfied (in the ORing it would not appear)
  • only one of them is satisfied (in the ORing it would appear once)
  • both are satisfied (in the ORing it would appear twice, if distinct is not specified).

So I just removed the distinct clause, put a group by, and selected the records being present twice.

Or just keep using the exists clause :)


Edit following comments:

ok, trying to keeping things simple... id_block_values satisfying one of the two conditions:

SELECT BLOCK_VALUE_ID
   FROM METADATA_VALUE MV
     INNER JOIN 
        METADATA M
     ON MV.META_ID=M.METADATA_ID
  WHERE (METADATA_NAME='height' AND VALUE='1080')
     OR (METADATA_NAME='file extension' AND VALUE='mpeg')

if you have more than 2 records here, u have a problem (duplication of metadata).

Now the ANDing

SELECT FILE_ID
  FROM BLOCK_VALUE BV
    INNER JOIN   
      (   SELECT BLOCK_VALUE_ID
            FROM METADATA_VALUE MV
            INNER JOIN 
                 METADATA M
              ON MV.META_ID=M.METADATA_ID
           WHERE (METADATA_NAME='height' AND VALUE='1080')
              OR (METADATA_NAME='file extension' AND VALUE='mpeg')
      ) X
  ON BV.ID_BLOCK_VALUE=X.BLOCK_VALUE_ID
 GROUP BY FILE_ID HAVING COUNT(1)>1

Still, I cannot understand why the previous query did not work.. I fear that if you remove the DIstinct clause in the or query as well, u would see some records more than twice, which does not make sense. Btw, just to be sure, could you please tell me what the primary keys of the tables are?

Licenciado em: CC-BY-SA com atribuição
Não afiliado a StackOverflow
scroll top