Question

UPDATE: I am using the SQL query shown in my question in production, but you are welcome to read the entire thread if you want to see an alternative approach that uses a UNION.

I've experimented and built a query whose result set is used in a content search, but I want to make sure its performance is as good as it can be.

I have a table named SECTIONS that holds two levels of sections, i.e. level 1 (a section) and level 2 (a subsection), in an adjacency list model:

SECTIONS: id, parent_id, name

I query that table twice to get the columns in this arrangement:

sec_id, sec_name, subsec_id, subsec_name

(This is so I can create URI links like /section_id/subsection_id.)
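The self-join described above can be sketched as follows. It uses Python's built-in sqlite3 module as a stand-in for MySQL; the sample rows and the helper names (build_db, section_links) are hypothetical. As with the LEFT JOIN in the question, a section that has subsections only yields /section_id/subsection_id links, while a section without subsections yields /section_id.

```python
import sqlite3


def build_db() -> sqlite3.Connection:
    """Create a hypothetical SECTIONS table in an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE sections (
            id        INTEGER PRIMARY KEY,
            parent_id INTEGER NOT NULL,  -- 0 means level 1 (a section)
            name      TEXT NOT NULL
        );
        -- sample rows: two sections, two subsections under section 1
        INSERT INTO sections VALUES
            (1, 0, 'News'), (2, 0, 'Products'),
            (3, 1, 'Local'), (4, 1, 'World');
        """
    )
    return conn


# Self-join: s = level-1 rows, ss = their level-2 children (NULL if none).
SECTION_QUERY = """
SELECT s.id AS sec_id, s.name AS sec_name,
       ss.id AS subsec_id, ss.name AS subsec_name
FROM sections AS s
LEFT JOIN sections AS ss ON ss.parent_id = s.id
WHERE s.parent_id = 0
ORDER BY s.id, ss.id
"""


def section_links(conn: sqlite3.Connection) -> list[str]:
    """Build /section_id or /section_id/subsection_id paths from the self-join."""
    links = []
    for sec_id, _sec_name, subsec_id, _subsec_name in conn.execute(SECTION_QUERY):
        if subsec_id is None:  # section without any subsections
            links.append(f"/{sec_id}")
        else:
            links.append(f"/{sec_id}/{subsec_id}")
    return links


if __name__ == "__main__":
    print(section_links(build_db()))  # ['/1/3', '/1/4', '/2']
```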

Now I join a separate table named PAGES, where a page can be related to either a section or a subsection (but not both) through the field section_id:

-- columns to return
SELECT
s.id as section_id,
s.name as section_name,
ss.id as subsection_id,
ss.parent_id as subsection_parent_id,
ss.name as subsection_name,
p.section_id as page_section_id,
p.name as page_name

-- join SECTIONS into Sections and SubSections
FROM 
( select id, name from sections where parent_id=0 ) as s

LEFT JOIN
( select id, parent_id, name from sections where parent_id!=0 ) as ss

ON
ss.parent_id = s.id

-- now join to PAGES table
JOIN 
( select id, section_id, name from pages where active=1 ) as p

ON
(
p.section_id = s.id
OR
p.section_id = ss.id 
)
-- need to use GROUP BY to eliminate duplicate pages
GROUP BY p.id

I get duplicate pages in the result set, so I use GROUP BY p.id to remove them, but it degrades performance a little.
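Where the duplicates come from can be seen on a tiny data set. In this sketch (SQLite via Python's sqlite3 standing in for MySQL, with hypothetical sample rows), section 1 has two subsections, so the LEFT JOIN yields two rows for it, and page 1, which is attached directly to section 1, matches both; GROUP BY p.id collapses them back to one row.

```python
import sqlite3
from collections import Counter

# Hypothetical sample data: section 1 has subsections 3 and 4,
# page 1 is attached directly to section 1, page 5 is inactive.
SCHEMA = """
CREATE TABLE sections (id INTEGER PRIMARY KEY, parent_id INTEGER NOT NULL, name TEXT NOT NULL);
CREATE TABLE pages (id INTEGER PRIMARY KEY, section_id INTEGER NOT NULL,
                    name TEXT NOT NULL, active INTEGER NOT NULL);
INSERT INTO sections VALUES (1, 0, 'News'), (2, 0, 'Products'), (3, 1, 'Local'), (4, 1, 'World');
INSERT INTO pages VALUES
    (1, 1, 'News home', 1), (2, 3, 'Local story', 1), (3, 4, 'World story', 1),
    (4, 2, 'Product list', 1), (5, 4, 'Old story', 0);
"""

# Same join shape as the question's query, returning only the page id.
BASE_QUERY = """
SELECT p.id
FROM (SELECT id, name FROM sections WHERE parent_id = 0) AS s
LEFT JOIN (SELECT id, parent_id, name FROM sections WHERE parent_id != 0) AS ss
    ON ss.parent_id = s.id
JOIN (SELECT id, section_id, name FROM pages WHERE active = 1) AS p
    ON (p.section_id = s.id OR p.section_id = ss.id)
"""


def page_counts(group_by: bool) -> Counter:
    """Count how many result rows each page id produces."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    sql = BASE_QUERY + ("GROUP BY p.id" if group_by else "")
    return Counter(row[0] for row in conn.execute(sql))


if __name__ == "__main__":
    print(page_counts(group_by=False))  # page 1 appears twice
    print(page_counts(group_by=True))   # every active page appears once
```

The sketch selects only p.id. The question's query also returns section and subsection columns; for a page like page 1, GROUP BY p.id keeps just one of its two rows, so which subsection is reported alongside it is not well defined, and MySQL accepts such a query only when the ONLY_FULL_GROUP_BY SQL mode is disabled.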

Can you suggest a better way to eliminate duplicates?

I've thought of creating a column in the SECTIONS join that holds either the section ID or the subsection ID (depending on whether the row is a section or a subsection), and then using that column to relate to the PAGES section_id, so there would be no duplicate rows, but I can't figure out how to do it.
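One possible way to express that idea is a derived table with one row per section and one row per subsection, plus a column (called node_id here, a hypothetical name) that carries whichever id PAGES.section_id would point to. Because every sections.id then appears exactly once as node_id, the join to PAGES cannot fan out. This is a sketch against SQLite with hypothetical sample data; whether it beats GROUP BY on MySQL would need to be measured.

```python
import sqlite3

# Hypothetical sample data (same shape as the question's tables).
SCHEMA = """
CREATE TABLE sections (id INTEGER PRIMARY KEY, parent_id INTEGER NOT NULL, name TEXT NOT NULL);
CREATE TABLE pages (id INTEGER PRIMARY KEY, section_id INTEGER NOT NULL,
                    name TEXT NOT NULL, active INTEGER NOT NULL);
INSERT INTO sections VALUES (1, 0, 'News'), (2, 0, 'Products'), (3, 1, 'Local'), (4, 1, 'World');
INSERT INTO pages VALUES
    (1, 1, 'News home', 1), (2, 3, 'Local story', 1), (3, 4, 'World story', 1),
    (4, 2, 'Product list', 1), (5, 4, 'Old story', 0);
"""

# node_id = section id on section rows, subsection id on subsection rows.
QUERY = """
SELECT p.id AS page_id, p.name AS page_name,
       n.sec_id, n.sec_name, n.subsec_id, n.subsec_name
FROM pages AS p
JOIN (
    SELECT s.id AS node_id, s.id AS sec_id, s.name AS sec_name,
           NULL AS subsec_id, NULL AS subsec_name
    FROM sections AS s
    WHERE s.parent_id = 0
    UNION ALL  -- the two halves are disjoint, so no de-duplication is needed
    SELECT ss.id, s.id, s.name, ss.id, ss.name
    FROM sections AS ss
    JOIN sections AS s ON s.id = ss.parent_id
    WHERE ss.parent_id <> 0
) AS n ON n.node_id = p.section_id
WHERE p.active = 1
ORDER BY p.id
"""


def pages_with_sections() -> list[tuple]:
    """Return one row per active page with its section and optional subsection."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn.execute(QUERY).fetchall()


if __name__ == "__main__":
    for row in pages_with_sections():
        print(row)
```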

Thanks


Solution 2

This is gonna be long :(

Note that I didn't use this approach in the end, because its performance was worse than my original attempt using GROUP BY.

I had to modify the design of the PAGES table to include a new column, subsection_id, holding the id of the subsection the page belongs to (0 for pages linked directly to a section), so the PAGES table now records both the section and the subsection a page belongs to. That structure change was only for testing, and I did not use it in the final version.

Here is the query I created, using a UNION of two queries:

SELECT
* 
FROM
  pages AS p
JOIN
-- create derived table of sections and subsections
  ( -- separate query to get sections (parent id = 0 )
    SELECT 
        s.id AS page_sec_id,
        s.id AS sec_id,
        s.name AS sec_name,
        NULL AS subsec_id,
        NULL AS subsec_name,
        s.parent_id AS parent_id
    FROM
        sections AS s
    WHERE
        s.parent_id = 0
   UNION
    -- separate query to get subsection (parent id != 0)
    SELECT
        ss.id AS page_sec_id,
        ss.parent_id AS sec_id,
        -- need to get section name, so had to use weird subquery
        (SELECT name FROM sections WHERE parent_id =0 AND id = ss.parent_id) AS sec_name,
        ss.id AS subsec_id,
        ss.name AS subsec_name,
        ss.parent_id AS parent_id
    FROM
        sections AS ss
    WHERE
        ss.parent_id != 0
   )  AS sss

ON
    -- specify how PAGES table is joined to this derived table of sections and subsections

    -- pages linked to sections only
        ( p.section_id = sss.sec_id AND p.subsection_id = 0 AND sss.parent_id = 0)
        OR
    -- pages linked to subsections only
        ( p.section_id = sss.sec_id AND p.subsection_id = sss.subsec_id )

This UNION query took 0.0388 seconds for 5 rows of pages and 4 rows of sections/subsections, versus 0.0017 seconds for the original query, so I stuck with the original shown above in my question. BTW, in my dev environment MySQL runs on a Pentium III Katmai 450 MHz with 256 MB of RAM, to force me to write efficient queries :)
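The timings above come from MySQL on that machine and are not reproduced here, but the UNION query's correctness can be checked. This sketch runs its derived table and ON clause, with SELECT * narrowed to three columns, against an in-memory SQLite database holding hypothetical sample rows and the test-only subsection_id column (0 for pages linked directly to a section, as the ON clause implies).

```python
import sqlite3

# Hypothetical test schema: PAGES carries both section_id and subsection_id.
SCHEMA = """
CREATE TABLE sections (id INTEGER PRIMARY KEY, parent_id INTEGER NOT NULL, name TEXT NOT NULL);
CREATE TABLE pages (id INTEGER PRIMARY KEY, section_id INTEGER NOT NULL,
                    subsection_id INTEGER NOT NULL, name TEXT NOT NULL);
INSERT INTO sections VALUES (1, 0, 'News'), (2, 0, 'Products'), (3, 1, 'Local'), (4, 1, 'World');
INSERT INTO pages VALUES
    (1, 1, 0, 'News home'), (2, 1, 3, 'Local story'),
    (3, 1, 4, 'World story'), (4, 2, 0, 'Product list');
"""

# Derived table and ON clause as in the UNION query; SELECT * narrowed.
QUERY = """
SELECT p.id, sss.sec_name, sss.subsec_name
FROM pages AS p
JOIN (
    SELECT s.id AS page_sec_id, s.id AS sec_id, s.name AS sec_name,
           NULL AS subsec_id, NULL AS subsec_name, s.parent_id AS parent_id
    FROM sections AS s
    WHERE s.parent_id = 0
    UNION
    SELECT ss.id, ss.parent_id,
           (SELECT name FROM sections WHERE parent_id = 0 AND id = ss.parent_id),
           ss.id, ss.name, ss.parent_id
    FROM sections AS ss
    WHERE ss.parent_id != 0
) AS sss
ON (p.section_id = sss.sec_id AND p.subsection_id = 0 AND sss.parent_id = 0)
   OR (p.section_id = sss.sec_id AND p.subsection_id = sss.subsec_id)
ORDER BY p.id
"""


def run() -> list[tuple]:
    """Return (page id, section name, subsection name) for every page."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn.execute(QUERY).fetchall()


if __name__ == "__main__":
    for row in run():
        print(row)
```

Because the two halves of the UNION are disjoint (parent_id = 0 versus != 0), UNION ALL would return the same rows without the duplicate-elimination step that plain UNION performs; whether that narrows the gap to the GROUP BY query on MySQL would have to be measured.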

Thanks for reading; if you have additional thoughts or comments, please add them.

OTHER TIPS

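You get duplicate pages because you do not distinguish pages related to a level-1 section from those related to a level-2 section: the LEFT JOIN returns one row per subsection, so a page attached directly to a section with several subsections matches each of those rows. Instead, treat pages as two separate groups: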

-- pages related to a level-2 section (subsection)
SELECT
    p.id, p.section_id AS page_section_id, p.name,
    l1.id AS section_id, l1.name AS section_name,
    l2.id AS subsection_id, l2.name AS subsection_name
FROM pages AS p
JOIN sections AS l2 ON (
    l2.id = p.section_id AND
    l2.parent_id <> 0
)
JOIN sections AS l1 ON (
    l1.id = l2.parent_id
)
WHERE p.active = 1

-- the two halves never return the same page (parent_id <> 0 vs. parent_id = 0),
-- so UNION ALL is safe and skips UNION's duplicate-removal step
UNION ALL

-- pages related to a level-1 section
SELECT
    p.id, p.section_id, p.name,
    l1.id AS section_id, l1.name AS section_name,
    NULL, NULL -- do not join with sub-sections, so as to avoid duplicates
FROM pages AS p
JOIN sections AS l1 ON (
    l1.id = p.section_id AND
    l1.parent_id = 0
)
WHERE p.active = 1
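Here is a self-contained sketch of the two-group approach, run against an in-memory SQLite database (a stand-in for MySQL) with hypothetical sample rows: section 1 has subsections 3 and 4, page 1 hangs directly off section 1, and page 5 is inactive. Each active page should come back exactly once.

```python
import sqlite3

# Hypothetical sample data (same shape as the question's tables).
SCHEMA = """
CREATE TABLE sections (id INTEGER PRIMARY KEY, parent_id INTEGER NOT NULL, name TEXT NOT NULL);
CREATE TABLE pages (id INTEGER PRIMARY KEY, section_id INTEGER NOT NULL,
                    name TEXT NOT NULL, active INTEGER NOT NULL);
INSERT INTO sections VALUES (1, 0, 'News'), (2, 0, 'Products'), (3, 1, 'Local'), (4, 1, 'World');
INSERT INTO pages VALUES
    (1, 1, 'News home', 1), (2, 3, 'Local story', 1), (3, 4, 'World story', 1),
    (4, 2, 'Product list', 1), (5, 4, 'Old story', 0);
"""

# Group 1: pages whose section_id is a level-2 row.
# Group 2: pages whose section_id is a level-1 row (no subsection join).
QUERY = """
SELECT p.id, p.section_id AS page_section_id, p.name,
       l1.id AS section_id, l1.name AS section_name,
       l2.id AS subsection_id, l2.name AS subsection_name
FROM pages AS p
JOIN sections AS l2 ON (l2.id = p.section_id AND l2.parent_id <> 0)
JOIN sections AS l1 ON (l1.id = l2.parent_id)
WHERE p.active = 1
UNION ALL
SELECT p.id, p.section_id, p.name,
       l1.id, l1.name,
       NULL, NULL
FROM pages AS p
JOIN sections AS l1 ON (l1.id = p.section_id AND l1.parent_id = 0)
WHERE p.active = 1
"""


def two_group_rows() -> list[tuple]:
    """Run the two-group query; sort by page id so the order is predictable."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return sorted(conn.execute(QUERY).fetchall())


if __name__ == "__main__":
    for row in two_group_rows():
        print(row)
```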
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow