Question

I've a scheme of two main tables: problem and tag and a relation (which is a many to many connector) table: problem_tags which excerpt of them is like:

Problem table:

----+------------------------------------+--------
 id |                name                | rating 
----+------------------------------------+--------
  1 | Special Permutation                |   1600
  2 | Binary String Reconstruction       |   1500
  3 | Special Elements                   |   1500
  4 | Alice, Bob and Candies             |   1300
  5 | K-th Not Divisible by n            |   1200
  6 | Same Parity Summands               |   1200
  7 | Sum of Round Numbers               |    800
  8 | Skier                              |   1400
  9 | Square?                            |    900

Tag table:

 id |           name            
----+---------------------------
  1 | constructive algorithms
  2 | dfs and similar
  3 | math
  4 | brute force
  5 | implementation
  6 | two pointers
  7 | binary search
  8 | data structures

Problem tags table:

 problem_id | tag_id 
------------+--------
          1 |      1
          2 |      1
          2 |      2
          2 |      3
          3 |      4
          3 |      5
          3 |      6
          4 |      5
          5 |      3
          5 |      7

My question is how can I filter out problems based on multiple tags, i.e. all problems that are tagged math and binary search and brute force; or all problems that are tagged math but not constructive algorithms; or for a more complex one all problems that are only tagged with math and implementation and nothing else?

Currently I've come up with something like this:

  1. Find all problem's ids that are tagged math (joining tag and problem_tags table)
  2. Find all problem's ids that are tagged binary search
  3. Find all problem's ids that are tagged brute force
  4. Get intersection of all above ids
  5. Select problems where their ids is in the above intersection

But my solution lacks when it reaches the second example (only tagged with selected tags) and I think it's not the most optimal and SQL-ish way for doing it.

Was it helpful?

Solution

I had just asked that on a different forum. The best solution for a 3rd Normal Form schema design was to select all that is needed in the linking table and compare the Cardinality (count) of the matching results.

The other forum was for Oracle. The answer you seek is the PostgreSQL version of this post: https://community.oracle.com/message/15613817#15613817

-- copy paste from original post
-- Oracle version
select *     
from my_recipes rp    
where rp.recipe_id  in  
(    
    select tg.recipe_id     
    from my_recipe_tags tg    
    where tg.tag_id in    (
                            -- expands string '1:2:3' into a table w/ 3 row    
                            select to_number(regexp_substr(:p_search_tags,'[^:]+', 1, level) )    
                            from  dual    
                            connect by regexp_substr(:p_search_tags, '[^:]+', 1, level) is not null    
                        )   
    group by tg.recipe_id                         
                      -- counts cardinality of Tags in search
    having count(*) = regexp_count(:p_search_tags, '[^:]+')                               
) ;

Untested (needs to be fixed)

select *
from PROBLEMS p
where p.id in (
    select pt.problem_id
    from PROBLEM_TAGS pt
        join TAGS t on pt.tag_id = t.tag_id
    where t.name in ('math','binary search', 'brute force') -- should be from a CTE
    group by pt.problem_id
    having count(*) = 3 -- count(*) from CTE
)

For the 2nd version, add AND p.id NOT IN clause.

For the 3rd version add a filter that ensures pt.problem_id has the exact cardinality of the queried tags.

and p.id in (
    select pt.problem_id
    from PROBLEM_TAGS pt
    group by pt.problem_id
    having count(*) = 2 -- you have two tags for 3rd one
                        -- you should pull Cardinality from CTE also
)
Licensed under: CC-BY-SA with attribution
Not affiliated with dba.stackexchange
scroll top