Question

I have a table with the columns userID, department, and er. I've created a simple query to gather all this information:

SELECT table.userID, table.department, table.er FROM table;

Now I want to group all er values that belong to the same department and perform this calculation:

select sum(table.er)/3 as department_er from table group by table.department;

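I then want to add this result as a new column to my first query. To do this, I created a UDF that looks like this:

DELIMITER //
CREATE FUNCTION dptER(dpt VARCHAR(50)) -- parameter and return types assumed
RETURNS FLOAT
READS SQL DATA
BEGIN
  DECLARE department_er FLOAT;
  SET department_er = (SELECT SUM(er) FROM table WHERE table.department = dpt);
  RETURN department_er;
END //
DELIMITER ;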

Then I used that UDF in this query:

SELECT table.userID, table.department, (select dptER(table.department)/3) as department_er FROM table

I've indexed my tables, and more complex queries dropped from 4+ minutes to under 1 second. This one seems pretty simple, but it has been running for 10 minutes. Is there a better way to do this, or a way to optimize my UDF?

Forgive my n00b-ness :)


Solution

Try a query that has no dependent aggregate subquery in the SELECT clause:

select table.userID, 
       table.department as dpt,
       x.department_er 
from table 
join (
  select department,
         (sum(table.er)/3) As department_er 
  from table
  group by department
) x
ON x.department = table.department

This UDF cannot be optimized. It may seem to work in simple queries, but in general it can hurt your database performance.

Imagine that we have a query like this one:

SELECT ....., UDF( some parameters )
FROM table
....

MySQL must call this function for each record the query retrieves from the table. If the table contains 1,000 records, the function is fired 1,000 times, and the query inside the function is also fired 1,000 times. With 10,000 records, the function is called 10,000 times.
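The per-row cost can be observed directly. The following is a minimal MySQL sketch, assuming a hypothetical `users` table in place of the question's `table` and a hypothetical function `counted_dptER` that mirrors the question's UDF but also increments the session variable `@calls` every time it runs.

```sql
-- Hypothetical table and sample data.
DROP TABLE IF EXISTS users;
CREATE TABLE users (
  userID     INT PRIMARY KEY,
  department VARCHAR(50),
  er         DOUBLE
);
INSERT INTO users (userID, department, er) VALUES
  (1, 'A', 3),
  (2, 'A', 6),
  (3, 'B', 6);

-- Hypothetical variant of dptER that counts its own invocations.
DROP FUNCTION IF EXISTS counted_dptER;
DELIMITER //
CREATE FUNCTION counted_dptER(dpt VARCHAR(50))
RETURNS DOUBLE
READS SQL DATA
BEGIN
  SET @calls = @calls + 1;  -- one increment per invocation
  RETURN (SELECT SUM(er) FROM users WHERE department = dpt);
END //
DELIMITER ;

SET @calls = 0;
SELECT userID, department, counted_dptER(department) / 3 AS department_er
FROM users;

-- Number of times the function ran during the SELECT above.
SELECT @calls AS udf_calls;
```

For a plain SELECT like this one, `@calls` should end at 3 (one call per returned row), even though only two departments exist.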

Even if you optimize the function so that it runs twice as fast, the query above will still call it 1,000 times.
If 500 users belong to the same department, the function is still called 500 times (once per user) and calculates the same value every time. That is 499 redundant calls, because only one call is needed to compute this value.
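The redundant-call arithmetic can be checked against any table with a single query. A minimal sketch, assuming a hypothetical `users` table in place of the question's `table`: `COUNT(*)` is how often a per-row UDF runs, and `COUNT(DISTINCT department)` is how many distinct results it can actually produce (NULL departments are not counted).

```sql
-- Hypothetical table and sample data.
DROP TABLE IF EXISTS users;
CREATE TABLE users (userID INT PRIMARY KEY, department VARCHAR(50), er DOUBLE);
INSERT INTO users (userID, department, er) VALUES
  (1, 'A', 3),
  (2, 'A', 6),
  (3, 'B', 6);

-- Per-row UDF calls vs. one calculation per department.
SELECT COUNT(*)                              AS udf_calls,
       COUNT(DISTINCT department)            AS calls_needed,
       COUNT(*) - COUNT(DISTINCT department) AS redundant_calls
FROM users;
-- Sample data: 3 calls, 2 needed, 1 redundant.
-- With 500 users in one department: 500 calls, 1 needed, 499 redundant.
```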

The only way to optimize such queries is to move the "inner" query out of the UDF and combine it with the main query using a join or a similar construct.
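On MySQL 8.0 or later, a window function is another way to combine the per-department aggregate with the main query without calling a function per row. This is a swapped-in alternative, not part of the original answer. A minimal sketch, assuming a hypothetical `users` table in place of the question's `table`:

```sql
-- Requires window function support (MySQL 8.0+; SQLite 3.25+ also accepts it).
DROP TABLE IF EXISTS users;
CREATE TABLE users (userID INT PRIMARY KEY, department VARCHAR(50), er DOUBLE);
INSERT INTO users (userID, department, er) VALUES
  (1, 'A', 3),
  (2, 'A', 6),
  (3, 'B', 6);

-- SUM(...) OVER (PARTITION BY department) computes the department total
-- and repeats it on every row of that department.
SELECT userID,
       department,
       SUM(er) OVER (PARTITION BY department) / 3 AS department_er
FROM users
ORDER BY userID;
-- Users 1 and 2: (3 + 6) / 3 = 3; user 3: 6 / 3 = 2.
```

Unlike the join, this keeps every row and adds the aggregate as an extra column in a single pass, which matches what the question asked for.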

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow