Pregunta

I have a complex database scheme for a dictionary. Each object (essentially a translation) is similar to this:

Entry {
   keyword;
   examples;
   tags;
   Translations;
}

with

Translation {
    text;
    tags;
    examples;
}

and

 Example {
    text;
    translation;
    phonetic_script;
 }

i.e. tags (i.e. grammar) can belong to either the keyword itself, or the translation (grammar of the foreign language), and similar examples can belong either to the translation itself (i.e. explaining the foreign word) or the the text in the entry. I ended up with such kind of relational design:

entries(id,keyword,)
tags(tag)
examples(id,text,...)
entrytags(entry_id,tag)
entryexamples(entry_id,example_id)
translations(id,belongs_to_entry,...)
translationtags(transl_id, tag)
translationexamples(transl_id,example_id)

My main task is querying this database. Say I search for "foo", my current way of handling is:

query all entries with foo, get ids A
foreach id in A
   query all examples belonging to id
   query all tags belonging to id
   query all translations belonging to A, store their ids in B
   foreach tr_id in B
       query all tags belonging to tr_id
       query all examples belonging to tr_id

to rebuild my objects. This looks cumbersome to me, and is slow. I do not see how i could significantly improve this by using joins, or otherwise. I have a hard time modeling these objects to relations in the database. Is this a proper design?

How can I make this more efficient to improve query time?

¿Fue útil?

Solución

Each query being called in the loop takes at a minimum a certain base duration of time to execute, even for trivial queries. Many environment factors contribute to what this duration is but for now let's assume it's 10 milliseconds. If the first query matches 100 entries then there are at a minimum 301 total queries being called, each taking 10 ms, for a total of 3 seconds. The number of loop iterations varies which can contribute to a substantial variation in the performance.

Restructuring the queries with joins will create more complex queries but the total number of queries being called can be reduced down to a fixed number, 4 in the queries below. Suppose now that each query takes 50 ms to execute now that it is more complex and the total duration becomes 200 ms, a substantial decrease from 3000 ms.

The 4 queries show below should come close to achieving the desired result. There are other ways to write the queries such as using subquery or including the tables in the FROM clause but these show how to do it with JOINs. The condition entries.keyword = 'foo' is used to represent the condition in the original query to select the entries.

It is worth noting that if the foo condition on entries is very expensive to compute then other optimizations may be needed to further improve performance. In these examples the condition is a simple comparison which is quick to lookup in an index but using LIKE which may require a full table scan may not work well with these queries.

The following query selects all examples matching the original query. The condition from the original query is expressed as a WHERE clause on the entries.keyword column.

SELECT entries.id, examples.text
  FROM entries
 INNER JOIN entryexamples
    ON (entries.id = entryexamples.entry_id)
 INNER JOIN examples
    ON (entryexamples.example_id = examples.id)
 WHERE entries.keyword = 'foo';

This query selects tags matching the original query. Only two joins are used in this case because the entrytags.tag column is what is needed and joining with tags would only provide the same value.

SELECT entries.id, entrytags.tag
  FROM entries
 INNER JOIN entrytags
    ON (entries.id = entrytags.entry_id)
 WHERE entries.keyword = 'foo'';

This query selects the translation tags for the original query. This is similar to the previous query to select the entrytags but another layer of joins is used here for the translations.

SELECT entries.id, translationtags.tag
  FROM entries
 INNER JOIN translations
    ON (entries.id = translations.belongs_to_entry)
 INNER JOIN translationtags
    ON (translations.id = translationtags.transl_id)
 WHERE entries.keyword = 'foo';

The final query does the same as the first query for the examples but also includes the additional joins. It's getting to be a lot of joins but in general should perform significantly better than looping through and executing individual queries.

SELECT entries.id, examples.text
  FROM entries
 INNER JOIN translations
    ON (entries.id = translations.belongs_to_entry)
 INNER JOIN translationexamples
    ON (translations.id = translationexamples.transl_id)
 INNER JOIN examples
    ON (translationexamples.example_id = examples.id)
 WHERE entries.keyword = 'foo';
Licenciado bajo: CC-BY-SA con atribución
No afiliado a StackOverflow
scroll top