What is the method to identify bottlenecks in a software engineering programme?

https://softwareengineering.stackexchange.com/questions/336896

02-01-2021

Question

I'm on a software development programme in financial services - with 100 developers, plus testers, BAs, PMs and other support staff.

We've read through Implementing Lean Software Development and The Phoenix Project, both of which talk about identifying the bottleneck in your flow and optimising it away (with some similarities to the critical path in a project).

Intuitively, we can name several candidate bottlenecks: the number of testing environments, the time and effort required for regression testing, the size of the monolith, the number of developers, and so on. What we're trying to do is boil it down to the one bottleneck that holds everything else up (as in a manufacturing process flow).

Applying Lean Software Development talks about value stream analysis - but doesn't quite go far enough to identify the one blocker that is critical to the whole system.

My question is: What is the method to identify the key bottleneck in a software engineering programme?

EDIT: Additional Assumptions:

In my environment, funding is allocated for large chunks of scope to be delivered at a specific date. In essence, the quality, scope and time are locked in up front (with some variance in scope and time if absolutely required).
This means there is no concept of 'small pieces moving through the system'. There are only large projects with lots of stories (60+ stories, each with 10 days of work in them).
This is a somewhat waterfall-like environment (as much as Sarbanes-Oxley dictates) with separate System Integration Test and User Acceptance Test phases.

Solution

One method to identify the most important bottlenecks is to make visible the stages that the work items go through.

As a start, try to follow a couple of work items (new features, bugs, improvements, etc.) through the complete cycle, from the item becoming known to the team until the point where it has been successfully deployed into production. Write down the steps needed to get all the way to production, and note where the ticket might end up on a pile, waiting for someone else to continue work on it or waiting for some other reason.

This can all be made visible by using a kanban board. In its simplest form, a kanban board consists of a number of columns representing the work stages and wait states in the development process, and sticky notes for the work items.
Each sticky note gets moved across the board according to where it is in the development process or what it is waiting for.

Using a kanban board, you can identify bottlenecks by seeing tickets pile up in a column or by seeing that tickets get pulled out of a column faster than new tickets come in.

If a waiting column fills up faster than tickets get removed, that is an indicator that the resources the tickets are waiting for are overloaded.
If a "doing work" column contains significantly more tickets than there are team members who can work on them, that is an indication that the team is working on too many things at the same time (leading to inefficiency due to context switching) or that a waiting state was missed.
If a waiting column regularly runs completely empty (tickets go out faster than they come in), then that is an indication that the team pulling those tickets is under-utilized and/or over-staffed.

The key bottleneck is the column where these effects are most strongly visible.
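If the board is kept in a tool rather than on a wall, the three checks above could be sketched roughly as follows. This is only an illustration: the `Column` class, its fields, the column names and all the numbers are made up for the example, and "strongest effect" is approximated here as the largest net inflow over the observation period.

```python
from dataclasses import dataclass


@dataclass
class Column:
    name: str
    kind: str         # "wait" or "work" (hypothetical labels)
    arrivals: int     # tickets that entered during the observation period
    departures: int   # tickets that left during the same period
    current: int      # tickets in the column right now
    workers: int = 0  # people able to work on this column (work columns only)


def diagnose(col: Column) -> str:
    """Apply the three rules of thumb to a single column."""
    if col.kind == "wait":
        if col.arrivals > col.departures:
            return "filling up: the resource these tickets wait for is overloaded"
        # Simplification: one empty snapshot stands in for "regularly runs empty".
        if col.current == 0:
            return "runs empty: the team pulling from here may be under-utilized"
    elif col.kind == "work" and col.current > col.workers:
        return "too much work in progress, or a wait state was missed"
    return "balanced"


def key_bottleneck(columns: list[Column]) -> Column:
    """Pick the column with the largest net inflow (assumed severity measure)."""
    return max(columns, key=lambda c: c.arrivals - c.departures)


# Made-up numbers for a four-week observation period.
columns = [
    Column("Analysis", "work", 20, 20, current=3, workers=4),
    Column("Ready for dev", "wait", 20, 20, current=0),
    Column("Development", "work", 20, 18, current=12, workers=8),
    Column("Waiting for test environment", "wait", 18, 9, current=14),
    Column("System Integration Test", "work", 9, 9, current=2, workers=5),
]

for col in columns:
    print(f"{col.name}: {diagnose(col)}")
print("Key bottleneck:", key_bottleneck(columns).name)
```

With these invented numbers the wait for a test environment stands out, but the thresholds are a judgment call and should be tuned to what you actually see on your board.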

OTHER TIPS

I think that Bart van Ingen Schenau's answer is very good. It's essentially a real-time value stream map. However, I do have some other suggestions that may help you on top of that answer.

First, consider tracking the time each task spends in each state. Tools should be able to provide this. If your tool doesn't, or you are using a physical board, you can write the date of each transition on the card. This will allow you to calculate the average time per state and identify phases that take a long time. However, something else that you need to capture is time spent waiting versus time spent active. Again, making notes on the card can help with this, and at the end of a cycle you can put the times from the cards into a spreadsheet or similar tool for analysis.
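As a rough sketch of what that analysis could look like once the dates are off the cards: the card IDs, state names, dates and the `WAIT_STATES` set below are all invented, and the code counts calendar days rather than working days.

```python
from collections import defaultdict
from datetime import date

# Hypothetical cards: (state, date the card entered that state), in order.
# The final "Done" entry only marks when the last real state ended.
cards = {
    "STORY-1": [
        ("Ready for dev", date(2021, 1, 4)),
        ("Development", date(2021, 1, 6)),
        ("Waiting for SIT", date(2021, 1, 15)),
        ("SIT", date(2021, 1, 25)),
        ("Done", date(2021, 1, 28)),
    ],
    "STORY-2": [
        ("Ready for dev", date(2021, 1, 5)),
        ("Development", date(2021, 1, 11)),
        ("Waiting for SIT", date(2021, 1, 19)),
        ("SIT", date(2021, 2, 2)),
        ("Done", date(2021, 2, 5)),
    ],
}

# Assumption: which states count as waiting rather than active work.
WAIT_STATES = {"Ready for dev", "Waiting for SIT"}


def days_per_state(transitions):
    """Calendar days one card spent in each state."""
    result = defaultdict(int)
    for (state, entered), (_, left) in zip(transitions, transitions[1:]):
        result[state] += (left - entered).days
    return dict(result)


def average_days(all_cards):
    """Average days per state across all cards that passed through it."""
    totals, counts = defaultdict(int), defaultdict(int)
    for transitions in all_cards.values():
        for state, days in days_per_state(transitions).items():
            totals[state] += days
            counts[state] += 1
    return {state: totals[state] / counts[state] for state in totals}


def waiting_vs_active(all_cards):
    """Total (waiting, active) days summed over all cards."""
    waiting = active = 0
    for transitions in all_cards.values():
        for state, days in days_per_state(transitions).items():
            if state in WAIT_STATES:
                waiting += days
            else:
                active += days
    return waiting, active


print(average_days(cards))
print(waiting_vs_active(cards))
```

For the two invented cards, the longest average is the wait before SIT, and more days are spent waiting than working, which is exactly the kind of split the raw cycle time hides.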

Second, consider the size of the activities. If you are using a Kanban board like the one Bart suggested, you may want to consider having finer-grained columns or creating value stream maps for what happens inside a column.

Once you know your times, optimize the longest ones first. Start by reducing "waste" time, that is, the time a task spends outside an active state. Then look at reducing the time in process.
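One way to turn that ordering into a worklist is sketched below. The stage names and average times are invented, and the flow efficiency figure (active time divided by total time) is an extra metric I'm adding for illustration, not something the steps above require.

```python
from typing import NamedTuple


class Stage(NamedTuple):
    name: str
    avg_days: float
    waiting: bool  # True for wait states, False for active work


# Made-up averages, for example taken from dates written on the cards.
stages = [
    Stage("Ready for dev", 4.0, waiting=True),
    Stage("Development", 8.5, waiting=False),
    Stage("Waiting for SIT", 12.0, waiting=True),
    Stage("SIT", 3.0, waiting=False),
]


def improvement_order(all_stages):
    """Wait states first, then active stages; longest first within each group."""
    return sorted(all_stages, key=lambda s: (not s.waiting, -s.avg_days))


def flow_efficiency(all_stages):
    """Share of the total time in which someone is actually working on the item."""
    active = sum(s.avg_days for s in all_stages if not s.waiting)
    total = sum(s.avg_days for s in all_stages)
    return active / total


for stage in improvement_order(stages):
    label = "waiting" if stage.waiting else "active"
    print(f"{stage.name}: {stage.avg_days} days ({label})")
print(f"Flow efficiency: {flow_efficiency(stages):.0%}")
```

Sorting strictly by "waiting first" is a simplification: if an active stage dwarfs every wait state, it may deserve attention sooner than this ordering suggests.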

Licensed under: CC-BY-SA with attribution

Not affiliated with softwareengineering.stackexchange