Question

This is quite a confusing issue for me. I have a database full of baseball statistics. Running this query:

SELECT * FROM hits
JOIN stadiums ON stadiums.gameName = hits.gameName
JOIN players ON (players.gameName = hits.gameName AND players.id = hits.batter)
JOIN games ON games.gameName = hits.gameName
WHERE games.type = 'R'
LIMIT 50

Returns:

/* 0 rows affected, 50 rows found. Duration for 1 query: 0.218 sec. */

But running this query:

SELECT * FROM hits
JOIN stadiums ON stadiums.gameName = hits.gameName
JOIN players ON (players.gameName = hits.gameName AND players.id = hits.batter)
JOIN games ON games.gameName = hits.gameName
WHERE games.leagueLevel = 'mlb'
LIMIT 50

Hangs for a long time. The index on the games table is only games.gameName and nothing else.

SELECT DISTINCT type FROM games gives 8 single-character rows (VARCHAR 1) including one NULL.

SELECT DISTINCT leagueLevel FROM games gives 6 three-character rows (VARCHAR 5) including one NULL.

I have no idea why the second query would be extraordinarily slow while the first one runs just fine.

Thanks for your help.


Solution

VIEWPOINT #1 : You may need to take a look at how the column values are distributed

SELECT COUNT(1) rowcount,type FROM games GROUP BY type WITH ROLLUP;
SELECT COUNT(1) rowcount,leagueLevel FROM games GROUP BY leagueLevel WITH ROLLUP;

From your question, I gather two things:

  1. The number of rows in games with type='R' must be small relative to the total number of rows in the games table.
  2. The number of rows in games with leagueLevel='mlb' must be large (more than about 5% of the table) relative to the total number of rows in the games table. (5% is a rule-of-thumb threshold in the eyes of Query Optimizers.)
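To compare each value against the 5% rule of thumb directly, the counts can be turned into percentages. This is only a sketch: it assumes the games table and leagueLevel column from the question, and the alias pct_of_table is a name I made up for illustration.

```sql
-- Share of the games table held by each leagueLevel value.
-- pct_of_table is a hypothetical alias, not a column in the schema.
SELECT leagueLevel,
       COUNT(*) AS rowcount,
       ROUND(100 * COUNT(*) / (SELECT COUNT(*) FROM games), 2) AS pct_of_table
FROM games
GROUP BY leagueLevel
ORDER BY rowcount DESC;
```

Swapping leagueLevel for type gives the same breakdown for the column used in the fast query.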

VIEWPOINT #2 : You may need to refactor this query

Notice that the query, as written, applies the WHERE clause after all the JOINs are complete. If the WHERE filter could be applied earlier, that could reduce the running time. Try reorganizing the query like this:

SELECT * FROM hits
JOIN stadiums ON stadiums.gameName = hits.gameName
JOIN players ON (players.gameName = hits.gameName AND players.id = hits.batter)
JOIN (SELECT * FROM games WHERE leagueLevel = 'mlb') games
ON games.gameName = hits.gameName
LIMIT 50;

VIEWPOINT #3 : Retrieve only the columns you really need

I see you have SELECT * and you have four tables (hits, stadiums, players, games). You will have a lot of duplicate data to drag into the query, particularly when dragging the gameName column from all four tables.

You should reorganize the query to bring only one gameName column:

SELECT hits.gameName,hits.*,players.*,stadiums.*,games.* FROM hits
JOIN stadiums ON stadiums.gameName = hits.gameName
JOIN players ON (players.gameName = hits.gameName AND players.id = hits.batter)
JOIN (SELECT * FROM games WHERE leagueLevel = 'mlb') games
ON games.gameName = hits.gameName
LIMIT 50;

Additionally, if you do not need every column from the hits table, then only include the columns you know you will access. The same goes for players, stadiums, and games.

In other words, as an example, if you only need the playerName from the players table, then you do not need players.* in the SELECT. You will need just players.playerName.
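As a rough sketch of what an explicit column list might look like: the columns below are the ones named in the question, plus playerName, which is only the hypothetical example used above. Adjust the list to the columns your application actually reads.

```sql
-- Explicit column list: gameName is returned once instead of four times.
-- players.playerName is a hypothetical column used only as an example.
SELECT hits.gameName,
       hits.batter,
       players.playerName,
       games.type,
       games.leagueLevel
FROM hits
JOIN stadiums ON stadiums.gameName = hits.gameName
JOIN players ON (players.gameName = hits.gameName AND players.id = hits.batter)
JOIN (SELECT gameName, type, leagueLevel
      FROM games
      WHERE leagueLevel = 'mlb') games
ON games.gameName = hits.gameName
LIMIT 50;
```

The stadiums join is kept even though no stadiums column is selected, because dropping it could change which rows come back.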

VIEWPOINT #4 : You may need to index the leagueLevel column

You will need to do the following to make the needed index:

ALTER TABLE games ADD INDEX (leagueLevel);

Before doing so, run this:

SELECT COUNT(1) rowcount,leaguelevel FROM games GROUP BY leaguelevel WITH ROLLUP;

If any value of leagueLevel accounts for more than about 5% of the table, the MySQL Query Optimizer will probably not use the index for that value.
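After adding the index, it is worth checking whether the optimizer actually picks it up. A minimal sketch, assuming the games table from the question; the index name idx_games_leagueLevel is my own choice, since the ALTER TABLE above lets MySQL name the index itself.

```sql
-- Create the index with an explicit (hypothetical) name.
ALTER TABLE games ADD INDEX idx_games_leagueLevel (leagueLevel);

-- Confirm the index exists and look at its Cardinality column.
SHOW INDEX FROM games;

-- Check the "key" and "rows" columns of the plan to see whether
-- the new index is chosen for this filter.
EXPLAIN SELECT gameName FROM games WHERE leagueLevel = 'mlb';
```

If the key column comes back NULL, the optimizer has decided a full scan is cheaper, which would be consistent with 'mlb' covering a large share of the table.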

Other tips

"I have no idea why the second query would be extraordinarily slow while the first one runs just fine."

You are not alone - it is very common to have to delve deeper into the specific whys and wherefores of query execution.

You need to learn to use the tools you have available, starting with EXPLAIN and EXPLAIN EXTENDED. Let us know how you get on...
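For example, prefixing the slow query from the question with EXPLAIN shows the plan without running the query. This is just a sketch of how to start. EXPLAIN EXTENDED applies to older MySQL versions; as far as I know it was deprecated in 5.7 and removed in 8.0, where plain EXPLAIN already includes the extra information.

```sql
-- Show the execution plan for the slow query instead of running it.
-- Compare the "type", "key", "rows" and "Extra" columns with the
-- plan for the fast query (WHERE games.type = 'R').
EXPLAIN
SELECT * FROM hits
JOIN stadiums ON stadiums.gameName = hits.gameName
JOIN players ON (players.gameName = hits.gameName AND players.id = hits.batter)
JOIN games ON games.gameName = hits.gameName
WHERE games.leagueLevel = 'mlb'
LIMIT 50;

-- Right after an EXPLAIN, this displays the query as the
-- optimizer rewrote it.
SHOW WARNINGS;
```

A different join order or a much larger rows estimate between the two plans is usually the first thing to look for.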

You were just lucky that the first query returned quickly. I agree with @Jack Douglas: use EXPLAIN, add the needed indexes, and repeat until both queries perform well.

Licensed under: CC-BY-SA with attribution
Not affiliated with dba.stackexchange