Question

I need to find a way to eliminate a dependent sub-query.

I have a table of articles that can have multiple languages. The simplified table structure is as follows:

id, title, language, translation_set_id

1 A    en 0
2 B    en 2
3 B_ru ru 2
4 C    en 4
5 C_ru ru 4
6 D    en 6
7 D_fr fr 6

The translation_set_id is 0 when an article doesn't have translations, or is set to the id of the base translation. So B is the original English article, and B_ru is the Russian translation of the article.

I need a query that returns all Russian articles or, where a Russian version doesn't exist, the original-language article. So it would return:

1 A    en 0
3 B_ru ru 2
5 C_ru ru 4
6 D    en 6

So far I have this:

SELECT id, title, language, translation_set_id
FROM articles a
WHERE 
  a.translation_set_id = 0
  OR (a.language = 'ru')
  OR (a.id = a.translation_set_id AND
       0 = (SELECT COUNT(ac.id)
            FROM articles ac
            WHERE ac.translation_set_id = a.translation_set_id 
            AND ac.language = 'ru')
     )

But this executes the sub-query for each row, creating a dependent query. Is there a way to eliminate the dependent query?

UPDATE: It seems that Neels's solution works, thanks!

But I was wondering if there is a way to generalize the solution to multiple language fallbacks? First try to get French, if that's not present, try Russian, and if that's not present, show base translation (English, or any other, depending on the original creation language)?
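One way to think about a fallback chain is to give every row a rank (0 for the first preferred language, 1 for the second, and so on, then the base article) and keep a row only when no row in the same translation set has a better rank. The following is only a sketch under assumptions: it runs against an in-memory SQLite database rather than MySQL, row 8 (`D_ru`) is made up so that one set has both French and Russian, and `rank_sql` and `pick_ids` are hypothetical helper names, not anything from the answers here.

```python
import sqlite3

# Rows 1-7 are the question's sample data; row 8 is a made-up Russian
# translation of D so that one set contains both 'fr' and 'ru'.
ROWS = [
    (1, "A", "en", 0), (2, "B", "en", 2), (3, "B_ru", "ru", 2),
    (4, "C", "en", 4), (5, "C_ru", "ru", 4), (6, "D", "en", 6),
    (7, "D_fr", "fr", 6), (8, "D_ru", "ru", 6),
]

def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE articles (id INT, title VARCHAR(20), "
                 "language VARCHAR(20), translation_set_id INT)")
    conn.executemany("INSERT INTO articles VALUES (?, ?, ?, ?)", ROWS)
    return conn

def rank_sql(alias, n):
    # 0..n-1 for the preferred languages in order, n for the base row,
    # n+1 for any other translation.
    whens = " ".join(f"WHEN {alias}.language = ? THEN {i}" for i in range(n))
    return (f"CASE {whens} WHEN {alias}.id = {alias}.translation_set_id "
            f"THEN {n} ELSE {n + 1} END")

def pick_ids(conn, preferred):
    n = len(preferred)
    # Anti-join: keep a row only if no better-ranked row shares its set.
    # Rows with translation_set_id = 0 never match the ON clause, so they stay.
    sql = f"""
        SELECT a.id
        FROM articles a
        LEFT JOIN articles b
          ON a.translation_set_id <> 0
         AND b.translation_set_id = a.translation_set_id
         AND {rank_sql('b', n)} < {rank_sql('a', n)}
        WHERE b.id IS NULL
        ORDER BY a.id
    """
    # The placeholders appear twice (once for b, once for a), in the same order.
    params = list(preferred) * 2
    return [row[0] for row in conn.execute(sql, params)]

conn = make_db()
print(pick_ids(conn, ["fr", "ru"]))
print(pick_ids(conn, ["ru", "fr"]))
```

If a set contained two rows of the same rank (say, two Russian translations), both would be returned, so this sketch assumes at most one article per language per set.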

UPDATE2: I've built the query I needed for the updated question using Neel's solution and DRapp's solution. It can be found here http://www.sqlfiddle.com/#!2/28ca8/18 but I'll also paste the queries here, for completeness' sake.

Revised Data:

CREATE TABLE articles (
  id INT,
  title VARCHAR(20),
  language VARCHAR(20),
  translation_set_id INT);

INSERT INTO articles values
  (1,'A','en',0),
  (2,'B','en',2),
  (3,'B_ru','ru',2),
  (4,'C','en',4),
  (5,'C_ru','ru',4),
  (6,'D','en',6),
  (7,'D_fr','fr',6),
  (8,'E_ru','ru', 0),
  (9,'F_fr','fr', 0),
  (10,'G_ru','ru', 10),
  (11,'G_fr','fr', 10),
  (12,'G_en','en', 10);

Original query with 2 correlated sub-queries:

SELECT id, title, language, translation_set_id
FROM articles a
WHERE
  a.translation_set_id = 0
  OR (a.language = 'fr')
  OR (a.language = 'ru' AND
       0 = (SELECT COUNT(ac.id)
            FROM articles ac
            WHERE ac.translation_set_id = a.translation_set_id
            AND ac.language = 'fr'))
  OR (a.id = a.translation_set_id AND
       0 = (SELECT COUNT(ac.id)
            FROM articles ac
            WHERE ac.translation_set_id = a.translation_set_id
            AND (ac.language = 'fr' OR ac.language = 'ru'))
     );

Revised query:

SELECT  a.*
FROM articles a
LEFT JOIN articles ac ON ac.translation_set_id = a.id
  AND ac.language = 'fr'
LEFT JOIN articles ac2 ON ac2.translation_set_id = a.id
  AND ac2.language = 'ru'
WHERE a.translation_set_id = 0
  OR a.language = 'fr'
  OR (a.language = 'ru' AND ac.id IS NULL)
  OR (a.id = a.translation_set_id AND ac2.id IS NULL AND ac.id IS NULL);
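To check that the join-based rewrite agrees with the correlated version, both can be run side by side. This is a rough check only, executed in SQLite rather than MySQL. Rows 13 to 15 are hypothetical additions (an English base with both a Russian and a French translation), a shape the revised data does not contain, and the `by_set` variant that joins on `a.translation_set_id` is my own assumption, not part of the posted query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE articles (id INT, title VARCHAR(20), "
             "language VARCHAR(20), translation_set_id INT)")
conn.executemany("INSERT INTO articles VALUES (?, ?, ?, ?)", [
    (1, "A", "en", 0), (2, "B", "en", 2), (3, "B_ru", "ru", 2),
    (4, "C", "en", 4), (5, "C_ru", "ru", 4), (6, "D", "en", 6),
    (7, "D_fr", "fr", 6), (8, "E_ru", "ru", 0), (9, "F_fr", "fr", 0),
    (10, "G_ru", "ru", 10), (11, "G_fr", "fr", 10), (12, "G_en", "en", 10),
])

CORRELATED = """
SELECT a.id FROM articles a
WHERE a.translation_set_id = 0
  OR a.language = 'fr'
  OR (a.language = 'ru' AND
      0 = (SELECT COUNT(ac.id) FROM articles ac
           WHERE ac.translation_set_id = a.translation_set_id
             AND ac.language = 'fr'))
  OR (a.id = a.translation_set_id AND
      0 = (SELECT COUNT(ac.id) FROM articles ac
           WHERE ac.translation_set_id = a.translation_set_id
             AND (ac.language = 'fr' OR ac.language = 'ru')))
"""

# {key} is the expression each translation's set id is matched against.
JOINED = """
SELECT a.id FROM articles a
LEFT JOIN articles ac ON ac.translation_set_id = {key}
  AND ac.language = 'fr'
LEFT JOIN articles ac2 ON ac2.translation_set_id = {key}
  AND ac2.language = 'ru'
WHERE a.translation_set_id = 0
  OR a.language = 'fr'
  OR (a.language = 'ru' AND ac.id IS NULL)
  OR (a.id = a.translation_set_id AND ac2.id IS NULL AND ac.id IS NULL)
"""

posted = JOINED.format(key="a.id")  # the join condition as posted
# Assumed variant: match on the set id, skipping rows that have no set.
by_set = JOINED.format(key="a.translation_set_id AND a.translation_set_id <> 0")

def ids(sql):
    return sorted(row[0] for row in conn.execute(sql))

before = (ids(CORRELATED), ids(posted), ids(by_set))

# Hypothetical set: English base with Russian and French translations.
conn.executemany("INSERT INTO articles VALUES (?, ?, ?, ?)", [
    (13, "H", "en", 13), (14, "H_ru", "ru", 13), (15, "H_fr", "fr", 13),
])
after = (ids(CORRELATED), ids(posted), ids(by_set))
print(before)
print(after)
```

On the revised data all three agree. With the extra set, the version joined on `a.id` also returns `H_ru`, because a Russian translation that is not the base of its set has no rows pointing at its own id, so its French sibling is never seen.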

Solution 2

Check out this SQL Fiddle:

http://www.sqlfiddle.com/#!2/c05d0/15

You can use this simple query to achieve your result.

SELECT  a.*
FROM articles a
LEFT OUTER JOIN articles ac ON ac.translation_set_id = a.translation_set_id 
AND ac.language = 'ru'
WHERE a.translation_set_id = 0
OR a.language = 'ru'
OR (a.id = a.translation_set_id AND ac.id IS NULL); 
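If you want to try this without the fiddle, here is a small sketch that runs the same query against the question's seven sample rows. It uses an in-memory SQLite database instead of MySQL, which is an assumption on my part; the query text itself is unchanged.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE articles (id INT, title VARCHAR(20), "
             "language VARCHAR(20), translation_set_id INT)")
conn.executemany("INSERT INTO articles VALUES (?, ?, ?, ?)", [
    (1, "A", "en", 0), (2, "B", "en", 2), (3, "B_ru", "ru", 2),
    (4, "C", "en", 4), (5, "C_ru", "ru", 4), (6, "D", "en", 6),
    (7, "D_fr", "fr", 6),
])

QUERY = """
SELECT a.*
FROM articles a
LEFT OUTER JOIN articles ac ON ac.translation_set_id = a.translation_set_id
AND ac.language = 'ru'
WHERE a.translation_set_id = 0
OR a.language = 'ru'
OR (a.id = a.translation_set_id AND ac.id IS NULL)
"""

# Sort in Python so the comparison does not depend on the engine's row order.
rows = sorted(conn.execute(QUERY).fetchall())
for row in rows:
    print(row)
```

The base article of a set survives only when the join finds no Russian row in that set, which is what `ac.id IS NULL` tests.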

OTHER TIPS

Following Ypercube's suggestion of a simpler WHERE clause, and given that you cannot use COALESCE(), I have revised the query to the one below.

Get all articles where translation_set_id is 0, or where the id equals translation_set_id, which means the row is the original document that was later translated into something else. That guarantees you get every original document.

Now, the left join. If there is a corresponding "Russian" article (or a translation in whatever other language you are interested in), grab that id and its translated title along with it. So the returned record has both the original AND the translated references.

SELECT
      a1.id as OriginalArticleID,
      a1.title as OriginalTitle,
      a1.language as OriginalLanguage,
      a2.id as TranslatedArticleID,
      a2.title as TranslatedTitle
   from
      articles a1
         LEFT JOIN articles a2
            ON a1.id = a2.translation_set_id
            AND a2.language = 'ru'
   where
         a1.translation_set_id = 0
      OR a1.id = a1.translation_set_id 

It goes through the table once and produces no duplicates. The left join points to the same articles table, but ONLY for the Russian-language rows that belong to the original article's set.
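Since each returned record carries both the original and the (possibly NULL) translation, the choice between them can be made on the application side. A minimal sketch, assuming an in-memory SQLite copy of the question's seven sample rows; the `ORDER BY` and the `chosen` list are additions for the example only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE articles (id INT, title VARCHAR(20), "
             "language VARCHAR(20), translation_set_id INT)")
conn.executemany("INSERT INTO articles VALUES (?, ?, ?, ?)", [
    (1, "A", "en", 0), (2, "B", "en", 2), (3, "B_ru", "ru", 2),
    (4, "C", "en", 4), (5, "C_ru", "ru", 4), (6, "D", "en", 6),
    (7, "D_fr", "fr", 6),
])

QUERY = """
SELECT a1.id, a1.title, a1.language, a2.id, a2.title
FROM articles a1
LEFT JOIN articles a2
  ON a1.id = a2.translation_set_id
 AND a2.language = 'ru'
WHERE a1.translation_set_id = 0
   OR a1.id = a1.translation_set_id
ORDER BY a1.id
"""

rows = conn.execute(QUERY).fetchall()

# Prefer the translated id/title when the join found one, else the original.
chosen = [
    (orig_id, orig_title) if tr_id is None else (tr_id, tr_title)
    for orig_id, orig_title, _lang, tr_id, tr_title in rows
]
print(chosen)
```

This does in client code what COALESCE() would otherwise do inside the query.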

You could use a LEFT JOIN:

SELECT a.id, a.title, a.language, a.translation_set_id
  FROM articles a
 LEFT JOIN articles ac ON ac.translation_set_id = a.translation_set_id 
                      AND ac.language = 'ru'
 WHERE a.translation_set_id = 0
    OR (a.language = 'ru')
    OR (    a.id = a.translation_set_id 
        AND ac.id IS NULL
       )
 GROUP BY a.id, a.title, a.language, a.translation_set_id
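The GROUP BY matters when the join can match more than one row. For example, articles with translation_set_id = 0 all share the same join key, so several untranslated Russian articles would multiply each other. A sketch of that effect, assuming SQLite in place of MySQL; rows 8 and 9 are made-up standalone Russian articles added only to trigger it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE articles (id INT, title VARCHAR(20), "
             "language VARCHAR(20), translation_set_id INT)")
conn.executemany("INSERT INTO articles VALUES (?, ?, ?, ?)", [
    (1, "A", "en", 0), (2, "B", "en", 2), (3, "B_ru", "ru", 2),
    (4, "C", "en", 4), (5, "C_ru", "ru", 4), (6, "D", "en", 6),
    (7, "D_fr", "fr", 6),
    # Hypothetical rows: two Russian articles that belong to no set.
    (8, "E_ru", "ru", 0), (9, "H_ru", "ru", 0),
])

BASE = """
SELECT a.id, a.title, a.language, a.translation_set_id
  FROM articles a
  LEFT JOIN articles ac ON ac.translation_set_id = a.translation_set_id
                       AND ac.language = 'ru'
 WHERE a.translation_set_id = 0
    OR (a.language = 'ru')
    OR (a.id = a.translation_set_id AND ac.id IS NULL)
"""
GROUPED = BASE + " GROUP BY a.id, a.title, a.language, a.translation_set_id"

def ids(sql):
    return sorted(row[0] for row in conn.execute(sql))

without_group = ids(BASE)
with_group = ids(GROUPED)
print(without_group)
print(with_group)
```

Every row with translation_set_id = 0 joins to both standalone Russian rows here, so it appears twice until the GROUP BY collapses the copies.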

Rewrite this part:

AND
       0 = (SELECT COUNT(ac.id)
            FROM articles ac
            WHERE ac.translation_set_id = a.translation_set_id 
            AND ac.language = 'ru')

into an anti-join condition:

AND NOT EXISTS (
                SELECT 1
                FROM articles ac
                WHERE ac.translation_set_id = a.translation_set_id 
                AND ac.language = 'ru'
)

This may speed up the query, because to compute COUNT() MySQL has to read every row that matches the condition,
but when using NOT EXISTS (or EXISTS) it can stop reading the table as soon as it finds the first row that meets the criteria.
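The two forms should select the same rows, which is easy to confirm on the question's sample data. A quick sketch, run in SQLite rather than MySQL (an assumption for convenience); it checks only that the results match and says nothing about speed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE articles (id INT, title VARCHAR(20), "
             "language VARCHAR(20), translation_set_id INT)")
conn.executemany("INSERT INTO articles VALUES (?, ?, ?, ?)", [
    (1, "A", "en", 0), (2, "B", "en", 2), (3, "B_ru", "ru", 2),
    (4, "C", "en", 4), (5, "C_ru", "ru", 4), (6, "D", "en", 6),
    (7, "D_fr", "fr", 6),
])

# Shared outer query; {test} is the "no Russian row in this set" condition.
TEMPLATE = """
SELECT id FROM articles a
WHERE a.translation_set_id = 0
  OR (a.language = 'ru')
  OR (a.id = a.translation_set_id AND {test})
"""

COUNT_TEST = """0 = (SELECT COUNT(ac.id)
            FROM articles ac
            WHERE ac.translation_set_id = a.translation_set_id
            AND ac.language = 'ru')"""

NOT_EXISTS_TEST = """NOT EXISTS (
            SELECT 1
            FROM articles ac
            WHERE ac.translation_set_id = a.translation_set_id
            AND ac.language = 'ru')"""

def ids(test):
    return sorted(row[0] for row in conn.execute(TEMPLATE.format(test=test)))

count_ids = ids(COUNT_TEST)
not_exists_ids = ids(NOT_EXISTS_TEST)
print(count_ids, not_exists_ids)
```

Whether NOT EXISTS is actually faster depends on the optimizer and the available indexes, so it is worth comparing the EXPLAIN output of both on your real table.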

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow