How to calculate overall progress in independent phases?

https://softwareengineering.stackexchange.com/questions/382003

16-02-2021
|

Question

When using a call-back to show progress on a multi-phase task, I don't know how to calculate the progress per phase well. One of the phases is collecting data, which differs in size from run to run.

Putting global percentages into the calls all over the code needs changes as soon as a phase is added/removed, and it does not work well with changing data size.
Using a fixed increment per phase does not help either, as the sum might end up being below or above 100% - or you again fiddle with the increments all over the code.
I even started thinking about recording the time taken, and using these numbers to make dynamic estimations during the next run - but that's overkill, I think.

Are there good ways to determine the progress per phase, somehow automatically?

Are there design patterns to avoid putting fixed numbers into the code?

Solution

A simple solution here if applicable (doesn't consolidate progress for asynchronous operations running independently of each other, though you can still use this method for those to compute percentages within them) and you can anticipate the number of phases in advance in an outer context (even if you can't anticipate how much work is required in each phase prior to invoking them) which leads to a fairly smooth progress bar* is like this:

It might not increment in progress in a perfectly consistent way across phases, but should still keep moving forward without going past 100%.

Pseudocode:

func progress(n):
    if stack_size < n:
        return (steps[n] / total[n]) +
               (1 / total[n]) * progress(n+1);
    else:
        return 0;

Hopefully the diagram and code already gives you an idea of the implementation using a "progress stack" and some simple arithmetic to compute the progress so far, and you can nest these as deep as you want (phases which break down into sub-phases which break down into sub-sub phases) without worrying about overflowing the bar and going past 100% or anything like that.

All you need to do is in an outer context, anticipate how many phases you have (3 in this case), in which case you increment the progress in 33.3% increments per phase. But when we call the function for phase 1, we might discover that this first phase needs 100 sub-steps to complete after calling the function, at which point that 33.3% chunk of progress, to speak, then gets incremented in 100 steps to advance from 0% to 33%, and so forth. The client code might take on this shape:

function some_outer_operation(progress)
{
    // begin performing a total of 3 units/steps of work
    progress.begin(3)
    phase1()
    progress.step()

    phase2()
    progress.step()

    phase3()
    progress.step()
}

function phase1()
{
     // begin performing a total of 100 units/steps of work
     progress.begin(100)
     for j in range(1, 100):
     {
         some_work()
         progress.step()
     }
}

... something of this sort and recursive calls to begin "subdivide" the outer chunk of progress to be incremented. When the number of steps reaches the total for a phase or sub-phase or sub-sub-phase or whatever, then you can pop off the stack. This does still require specifying numbers but doesn't require you to necessarily have that much information upfront about all the steps involved in all these nested function calls, only the number of steps/phases involving in a particular function and not the "sub-functions".

Oddly as simple as this solution is, I point it out because I've seen very convoluted "flat" solutions (without this kind of stack for nested operations) where the developers had to anticipate the total number of steps for everything upfront, and that sometimes requires x-raying the implementation of the functions being called and the function calls of the functions being called and so on. So this avoids that extensive amount of knowledge required upfront and only requires that a function needs to know how many steps it performs, not all the functions deeper into the call stack.

Of course it might not be perfectly consistent in the way it increments (ex: it might take 1 second to go from 0 to 33% and 3 seconds to go from 33% to 66%), but it can stay very smooth and responsive and constantly moving forward in progress provided the steps are granular enough in your "leaf" calls (your phase or sub-phase or sub-sub-sub-sub-phase or whatever).

I even started thinking about recording the time taken, and using these numbers to make dynamic estimations during the next run - but that's overkill, I think.

If it comes to that I'd just go with the advice to just show something going on without a progress bar, per se. That's just getting so fancy for so little in exchange if you can't anticipate even the number of phases involved in advance (ex: a heuristic which requires N passes over data where N cannot be anticipated in advance prior to the algorithm actually completing).

Chunky progress bars which don't increment very smoothly also don't feel so bad if you have like some animating thing next to it and can cancel the operation instantly, for example. It sufficiently communicates that the application hasn't just stalled and become completely unresponsive. So you might not even need to bother to "subdivide" the progress bar for sub-steps of the phases and just instead make sure the UI continues to update and shows something going on at a reasonable frame rate and responds quickly enough if the user can abort even if the bar isn't moving at such a smooth and rapid pace.

OTHER TIPS

Computers are bursty. Hard coding predictions is foolish.

Since you think recording performance is overkill the best thing you can do is offload that task to the user. Just clearly show what phase you're in. You can show a time elapsed to help the user get used to how long this takes on their system.

If your phases have a calculable number of iterations you can show where they are within that. But that will be deceptive if the work done after iterating is significant.

Really most users are happy if it looks like the computer is doing something. Prove to them that the computer isn't hung and they'll wait while you work.

The usual way to do this is to determine how many units of work are going to occur, then having the API return either the number completed or the percentage complete of the total. For something like a download manager, the number of units of work would be the sum of the number of bytes of all files selected for download, for example, or possibly just the total number of files to download. (Both are reasonable things to use for unit of work, but with just the number of files, one file might be 100 bytes, and another 100GB, meaning users of your API won't be getting the best information from your API in that case.)

In your case, you have the option of either breaking it into phases (it sounds like there's a collection phase and a processing phase, at least, maybe more?), where each phase is one unit of work. Do you have a way to tell how many things will be collected in the collection phase? If so, you could add up the number of things that need to be collected, and know how many units of work are done during the collection phase. (Then likewise, each of those needs to be processed presumably, so you know how many units of work in the processing phase, etc.) If not, then you need to either leave the phase as one whole unit or use an indeterminate progress indicator of some sort.

One thing I would avoid doing is adding more units of work after you've already told a client how many units of work there will be. It is very confusing for users (and no doubt developers using your API) when a progress bar regresses as time goes on instead of progressing monotonically.

Licensed under: CC-BY-SA with attribution

Not affiliated with softwareengineering.stackexchange