I'm working on an application that takes data from various sources and generates reports. I'm currently changing it to produce reports based on the data as of a given date in history; previously it only showed the state of things today.

One of my data sources is Bugzilla, so I need to get the Bugzilla data for a given date in history. I have a read-only connection to the Bugzilla database but no easy way to do anything else to the server (like install plug-ins, or put procedures in the database). Also the connection between the report server and the Bugzilla server is slow, so I'd like to do the calculations on the server rather than fetch the data and work things out on the reports server.

I actually have this working at a mostly acceptable speed, but I'm not sure I'm doing it the best or 'right' way, and I'm concerned that the speed might cease to be acceptable as we add more issues to the database.

So, my solution is below -- how would you do it?

For a bit of background, Bugzilla stores the current state of all bugs in a table (called 'bugs') and a history of the changes to each field in a table ('bugs_activity') that looks something like this:

bug_id    INTEGER,   -- References the bugs table
fieldid   INTEGER,   -- References the fielddefs table
bug_when  TIMESTAMP, -- Time the change happened
added     TEXT,      -- New text for the field
removed   TEXT,      -- Old text for the field
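
To make the role of the history table concrete, here is a minimal sketch of the rule the queries below rely on: the value of a field at a past date is the 'removed' text of the earliest change made on or after that date, and the current value if there was no such change. The function name and the sample history are made up for illustration; they are not part of Bugzilla.

```python
from datetime import datetime

def value_at(current_value, changes, as_of):
    """Return a field's value as of `as_of`.

    `changes` holds (bug_when, removed, added) tuples for one bug and one
    field, mirroring the bugs_activity columns.
    """
    later = [c for c in changes if c[0] >= as_of]
    if not later:
        # Nothing changed on or after the date, so today's value still applies
        return current_value
    # Earliest later change; on a timestamp tie, min() keeps the first row listed
    return min(later, key=lambda c: c[0])[1]

# Hypothetical history of one bug's bug_status field
history = [
    (datetime(2011, 6, 1), "NEW", "ASSIGNED"),
    (datetime(2012, 3, 1), "ASSIGNED", "RESOLVED"),
]
print(value_at("RESOLVED", history, datetime(2012, 1, 1)))  # ASSIGNED
print(value_at("RESOLVED", history, datetime(2012, 6, 1)))  # RESOLVED
```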

The Bugzilla database is MySQL. I think the right way to do it is either with a stored procedure or a temporary table, but I don't have either option available to me. I know there are also reporting tools for Bugzilla, but I don't have access to install them, and the reports I'm generating also tie in data from other sources (and have specific formatting).

There's a local PostgreSQL database on the reports server so I could just periodically mirror all the data over to there, but I really don't want to do that as it seems a bit wasteful to store identical data in two places.

My solution is to build a derived table in a subselect that looks like the normal bugs table (for the data I'm interested in for a given report), and then use that subselect as the source for the normal select, which works the same as the query for the reports based on today's data.

SELECT bug_status, bug_id, op_sys, resolution, rep_platform
  FROM (SELECT b.bug_id,
        -- Value at the date = 'removed' text of the first change on or after it;
        -- if the field never changed afterwards, the current value still applies.
        IFNULL((SELECT a.removed FROM bugs_activity a, fielddefs f
                 WHERE a.fieldid = f.id
                   AND a.bug_id = b.bug_id AND f.name = 'bug_status'
                   AND a.bug_when >= '2012-01-01 00:00:00'
                 ORDER BY a.bug_when ASC LIMIT 1), b.bug_status) AS bug_status,
        -- Repeat IFNULL clause for op_sys, resolution and rep_platform
        FROM bugs b
        WHERE b.creation_ts <= '2012-01-01 00:00:00'
        -- Some other filters to reduce the bugs (specific product, etc.)
       ) bug_subselect
    -- More filters based on the new values that have been derived
     ;

Then I use that as an input to a select that counts the different statuses, etc.

This query turns out to be way too slow. I assume that's because it fetches the entire result of each inner select so that it can order the rows and give me the top one.

I did try doing it by LEFT JOINing the bugs_activity table onto the bugs table several times and then applying IFNULL to the results. That was fast but a little complex to maintain in the generation code, so I adapted it to this:

SELECT bug_status, bug_id, op_sys, resolution, rep_platform
  FROM (SELECT b.bug_id,
    IFNULL((SELECT a.removed FROM bugs_activity a, fielddefs f
             WHERE a.fieldid = f.id AND a.bug_id = b.bug_id AND f.name = 'bug_status'
               AND a.bug_when = (
                     -- Timestamp of the first change on or after the report date
                     SELECT MIN(a2.bug_when) FROM bugs_activity a2, fielddefs f2
                      WHERE a2.fieldid = f2.id
                        AND a2.bug_id = b.bug_id
                        AND f2.name = 'bug_status'
                        AND a2.bug_when >= '2012-01-01 00:00:00'
                      LIMIT 1
                   )
             LIMIT 1), b.bug_status) AS bug_status,
        -- Repeat IFNULL clause for op_sys, resolution and rep_platform
        FROM bugs b
        WHERE b.creation_ts <= '2012-01-01 00:00:00'
        -- Some other filters to reduce the bugs (specific product, etc.)
       ) bug_subselect
    -- More filters based on the new values that have been derived
      ;

You need both LIMIT 1's in there (I think) as some fields have managed to have two changes on the same timestamp (either a database glitch, maybe from an upgrade, or two users editing the same bug -- I'm not sure, I just know that it's in there and I need to deal with it).

This runs in around 3 seconds with no filters to reduce the bug list (which is the worst case and will almost never happen), and it runs faster with filters. The LEFT JOIN version runs at roughly the same speed (slightly slower), so I went with the one above. It's OK for the moment, but I can see it getting slow in the future -- I'll add a loading message in the GUI, and there's already a message saying these reports can take longer to generate. I'm just wondering if I'm missing some trick to make it faster.


Solution

If I understand you correctly, you could try this:

SET @tdate = '2012-01-01 00:00:00';

SELECT  
  b.bug_id
  ,CASE 
    WHEN s.removed IS NULL THEN b.bug_status
    ELSE s.removed
  END AS statusAtDate
  ,CASE 
    WHEN o.removed IS NULL THEN b.op_sys
    ELSE o.removed
  END AS opSysAtDate
FROM
  bugs AS b 
  LEFT OUTER JOIN (
    SELECT 
      a.bug_id
      ,a.bug_when
      ,a.removed
      ,@row_num := IF(@last=a.bug_id,@row_num+1,1) AS rnk
      ,@last:=a.bug_id
    FROM 
      bugs_activity AS a
      INNER JOIN fielddefs AS f
        ON a.fieldid = f.id
          AND f.name = 'bug_status'
    WHERE
        a.bug_when >= @tdate  -- first change on or after the date; its 'removed' is the value at the date
    ORDER BY 
      a.bug_id
      ,a.bug_when
    ) AS s
      ON b.bug_id = s.bug_id
      AND s.rnk=1
  LEFT OUTER JOIN (
    SELECT 
      a.bug_id
      ,a.bug_when
      ,a.removed
      ,@row_num := IF(@last=a.bug_id,@row_num+1,1) AS rnk
      ,@last:=a.bug_id
    FROM 
      bugs_activity AS a
      INNER JOIN fielddefs AS f
        ON a.fieldid = f.id
          AND f.name = 'op_sys'
    WHERE
        a.bug_when >= @tdate  -- first change on or after the date; its 'removed' is the value at the date
    ORDER BY 
      a.bug_id
      ,a.bug_when
    ) AS o
      ON b.bug_id = o.bug_id
      AND o.rnk=1

-- repeat for resolution and rep_platform

Sorry, I don't have a database here to verify the code, so apologies if there are typos or similar.

I don't know if that's how you were doing the left outer join before, but does it help if you use a session variable for re-use?

I'm not sure this will be of any assistance, seeing as you said your left outer join method ran at about the same speed anyway. Maybe the MySQL query optimiser can figure out a better way of doing it without this.

I am no optimisation expert, by the way (far from it). I'm just saying what I would try, other than the good suggestion to get some indexes in place.
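
On a server with window functions (MySQL 8.0 or later), the ranking that the session variables emulate in the query above can be written directly with ROW_NUMBER(). The sketch below is not the approach from this answer but a swapped-in alternative, and it runs against an in-memory SQLite database (3.25 or later) purely so it can be executed. The three tables are cut down to the columns used here and the rows are invented. Whether your Bugzilla server is new enough is an assumption you would need to check.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Cut-down stand-ins for the Bugzilla tables, with made-up rows
con.executescript("""
CREATE TABLE bugs (bug_id INTEGER PRIMARY KEY, bug_status TEXT, creation_ts TEXT);
CREATE TABLE fielddefs (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE bugs_activity (bug_id INTEGER, fieldid INTEGER, bug_when TEXT,
                            added TEXT, removed TEXT);
INSERT INTO fielddefs VALUES (1, 'bug_status');
INSERT INTO bugs VALUES (1, 'RESOLVED', '2011-05-01 00:00:00'),
                        (2, 'NEW',      '2011-07-01 00:00:00');
INSERT INTO bugs_activity VALUES
  (1, 1, '2011-06-01 00:00:00', 'ASSIGNED', 'NEW'),
  (1, 1, '2012-03-01 00:00:00', 'RESOLVED', 'ASSIGNED');
""")

# Rank each bug's status changes on or after the date, oldest first, and keep
# only rank 1. ROW_NUMBER() gives exactly one rank-1 row per bug even when two
# changes share a timestamp.
QUERY = """
SELECT b.bug_id, COALESCE(s.removed, b.bug_status) AS status_at_date
  FROM bugs AS b
  LEFT JOIN (
        SELECT a.bug_id, a.removed,
               ROW_NUMBER() OVER (PARTITION BY a.bug_id
                                  ORDER BY a.bug_when) AS rnk
          FROM bugs_activity AS a
          JOIN fielddefs AS f ON a.fieldid = f.id AND f.name = 'bug_status'
         WHERE a.bug_when >= :tdate
       ) AS s ON s.bug_id = b.bug_id AND s.rnk = 1
 WHERE b.creation_ts <= :tdate
 ORDER BY b.bug_id
"""
rows = con.execute(QUERY, {"tdate": "2012-01-01 00:00:00"}).fetchall()
print(rows)  # [(1, 'ASSIGNED'), (2, 'NEW')]
```

Bug 1 was changed after the report date, so its old status comes from 'removed'; bug 2 has no history, so the current status is used.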

EDIT:

Another thing you could try; I think this should work:

SELECT
  bug_id
  ,bug_status
  ,op_sys
  ,max(old_status) AS old_status
  ,max(old_opSys) AS old_opSys
FROM (
SELECT
  b.bug_id, b.bug_status, b.op_sys, s.old_status, s.old_opSys
FROM
  bugs AS b 
  LEFT OUTER JOIN (
    SELECT 
      a.bug_id
      ,a.bug_when
      ,if(f.name = 'bug_status',a.removed,NULL) AS old_status
      ,if(f.name = 'op_sys',a.removed,NULL) AS old_opSys
      ,@row_num := IF(@last=a.bug_id AND @lastField=f.name, @row_num+1, 1) AS rnk
      ,@last:=a.bug_id
      ,@lastField:=f.name
    FROM 
      bugs_activity AS a
      INNER JOIN fielddefs AS f
        ON a.fieldid = f.id

    WHERE
        a.bug_when >= '2012-01-01 00:00:00'  -- first change on or after the date
        AND f.name in( 'bug_status','op_sys')
    ORDER BY 
      a.bug_id
      ,f.name
      ,a.bug_when
    ) AS s
      ON b.bug_id = s.bug_id
      AND s.rnk=1
) AS T
  GROUP BY
    bug_id
    ,bug_status
    ,op_sys

I have left the CASE or IF expression out of the outer select here. Whichever solution you use, it might be worth testing how it runs when the final check is done in application code rather than in the database. Even if that works you might not opt for it, but it is worth checking.

Something like:

<?= $row->old_status ?: $row->bug_status ?>

(Sorry if my PHP is off; I haven't really used it much.)
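
One thing to watch when moving the fallback into application code: the short `?:` form treats an empty string as "missing", yet an empty string can be a real old value (for instance a resolution that had not been set yet). A minimal sketch of a stricter check, assuming the database driver returns NULL as None; the function name is made up for illustration.

```python
def value_for_report(old_value, current_value):
    """Pick the value to show for one field of one result row."""
    # None means the query found no later change, so the current value applies.
    # An empty string is kept as-is because it can be a genuine old value.
    return current_value if old_value is None else old_value

print(value_for_report(None, "FIXED"))        # FIXED
print(repr(value_for_report("", "FIXED")))    # ''
```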

It seems like it should work: http://sqlfiddle.com/#!2/eff8c/1

Other tips

I suggest using the Bugzilla REST API rather than accessing the Bugzilla database directly. Here is an example API call that retrieves bugs created on a particular date:

https://api-dev.bugzilla.mozilla.org/test/1.3/bug?creation_date=2008-03-31

References:
https://wiki.mozilla.org/Bugzilla:REST_API
https://wiki.mozilla.org/Bugzilla:REST_API:Objects
https://wiki.mozilla.org/Bugzilla:REST_API:Search

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow