質問

I have a many-to-many relationship between Books and Genres. For example "The Hobbit" Book may have the Genres "Kids", "Fiction" and "Fantasy".

Here's the schema:

CREATE TABLE "genre" (
    "id" integer NOT NULL PRIMARY KEY,
    "name" varchar(50) NOT NULL
)
;
CREATE TABLE "book_genres" (
    "book_id" integer NOT NULL REFERENCES "book" ("id"),
    "genre_id" integer NOT NULL REFERENCES "genre" ("id"),
    CONSTRAINT book_genres_pkey PRIMARY KEY (book_id, genre_id)
)
;
CREATE TABLE "book" (
    "id" integer NOT NULL PRIMARY KEY,
    "name" varchar(255) NOT NULL,
    "price" real NOT NULL
)
;

And the indexes:

CREATE INDEX "book_genres_36c249d7" ON "book_genres" ("book_id");
CREATE INDEX "book_genres_33e6008b" ON "book_genres" ("genre_id");
CREATE INDEX "book_5a5255da" ON "book" ("price");

Row counts:

  • genre: 30
  • book_genres: 800,000
  • book: 200,000

I am trying to write a query in SQL which brings back all the Books for specific Genres ordered by price without duplicates.

Here's my query which does this:

SELECT name, price 
FROM book 
WHERE book.id 
IN 
    (SELECT book_id 
    FROM book_genres
    WHERE genre_id = 1
    OR genre_id = 2)
ORDER BY price LIMIT 10

My problem is performance. This query can take up to 2000ms to execute. How can I improve the performance?

I have full control over the database (Postgres 9.3) so can add views, indexes or denormalise. I am also using Django so could perform multiple queries perform operations in memory using Python/Django.

役に立ちましたか?

解決

SELECT b.name, b.price
FROM book b
WHERE EXISTS (
    SELECT *
    FROM book_genres bg
    WHERE bg.book_id = b.id 
    AND bg.genre_id IN( 1 , 2)
    )
ORDER BY b.price 
LIMIT 10
        ;

The order by price+LIMIT can be a performance killer: check the query plan.

PLUS: replace the one-column indices by a "reversed" index: make book_id a FK into books.id and (maybe) omit the surrogate key id


CREATE TABLE book_genres
        ( book_id integer NOT NULL REFERENCES book (id)
        , genre_id integer NOT NULL REFERENCES genre (id)
        , PRIMARY KEY (book_id, genre_id)
        ) ;
CREATE INDEX ON book_genres  (genre_id,book_id);

他のヒント

In most cases you can improve you performance using JOIN instead of subqueries (Although it depends on many factors so ) :

SELECT * 
FROM 
(
   SELECT b.name, b.price 
   FROM book b JOIN book_genres g ON b.book.id = g.book_id 
                              AND g.genre_id = 1
   UNION

   SELECT b.name, b.price 
   FROM book b JOIN book_genres g ON b.book.id = g.book_id 
                              AND g.genre_id = 2
)
ORDER BY price LIMIT 10
ライセンス: CC-BY-SA帰属
所属していません StackOverflow
scroll top