Normalizing story points across teams, is there a big problem?

https://softwareengineering.stackexchange.com/questions/380806

15-02-2021

Question

We have been thinking about comparing product size/effort at least roughly and this is what some suggested:

There are multiple products, each with one scrum team
All the scrum teams estimate their stories relative to a common reference story. For example, in Project A, the team looks at a story and estimates it as requiring twice as much effort as the reference story.
In the end, all the projects are estimated relative to this reference story and are somewhat comparable in terms of expected effort - if one project has 200 SP and the other 400 SP, the second could be expected to be roughly twice as much WORK.
Individual teams have their own velocities, nobody compares that because productivity is of course different.

An analogy: when digging a hole, I can say that a hole 10 meters deep will be 10x more work than a reference hole (which is 1 meter deep). One team will use an excavator and dig their 10-meter hole in 30 minutes. The other team will use a spade and spend 2 hours digging their 1-meter hole. But the relative amount of work is unchanged and can be compared (1 vs 10), regardless of productivity. Sure, SW is far from that simple to estimate, but it should not be completely off.

Is there a problem with that? To me it seems fine, as the teams only need to compare their work with a common reference story and assign points relative to it (as they would when estimating with their own reference story). It is completely fine if team A takes a day to finish a story point while team B takes two days; what matters is that the estimation is consistent.

Solution

I have tried this approach with several teams and it does not lead to cross-team efficiency. We always used a reference story that everybody understood as the basis, and gave it a value other than 1 (2 in our case) so that anything we knew was even smaller than the reference story could still get a 1.

The concept falls apart as soon as the team is brought into the estimation room and has to size something that is BIGGER than the reference story. Some teams, for whatever reason, see an item as 4 times the reference story; others estimate it at 2 times the reference story. They are both correct, because STORY POINTS ARE NOT HOURS.

As long as a team is consistent in its sizings, its estimates are valid and you can make predictions from the velocity that team achieves. But comparing across teams does not work. Team A, regularly using larger increments from the reference, will always seem to be accomplishing more story points in a sprint. Team B will be evaluated as 'low performing', when in reality they just estimate on a different scale.

This became even clearer to me when I took about 3 weeks of vacation, during which two estimation rounds were run in my absence by a team lead who also ran other teams that had 'higher' estimates. When I returned, our velocity had suddenly jumped dramatically and all the story points for the team were much higher, but we had not actually accomplished any more work.

(This also showed that I had a disproportionate influence on the size of the team's estimates.)
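
To make the scale effect concrete, here is a minimal sketch in Python. All the numbers are made up for illustration, and the names (`scale_a`, `work_per_sprint`, `sprints_needed`) are hypothetical. It assumes both teams do exactly the same real work and differ only in how large their numbers run.

```python
import math

def sprints_needed(backlog_points, velocity):
    """Forecast the number of whole sprints needed to finish a backlog."""
    return math.ceil(backlog_points / velocity)

# Assumption: Team A habitually sizes everything 2x larger than Team B.
scale_a, scale_b = 2.0, 1.0

# Assumption: identical real throughput and backlog, expressed in Team B's units.
work_per_sprint = 15
backlog = 120

# Reported velocity is just real throughput seen through each team's scale.
velocity_a = work_per_sprint * scale_a
velocity_b = work_per_sprint * scale_b

# Each team's forecast uses its own points and its own velocity.
forecast_a = sprints_needed(backlog * scale_a, velocity_a)
forecast_b = sprints_needed(backlog * scale_b, velocity_b)

print(f"Team A: velocity {velocity_a}, forecast {forecast_a} sprints")
print(f"Team B: velocity {velocity_b}, forecast {forecast_b} sprints")
```

Team A reports twice the velocity of Team B, yet both forecasts come out at 8 sprints. Each team's velocity predicts its own backlog just fine; putting the two velocities side by side says nothing about who is doing more work.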

In conclusion, using a reference story is very helpful. The organization can understand the estimation process and feel like everybody is standardizing. However, beyond that, I would not trust any alignment of estimate sizes across teams unless you also have the same people doing all the estimates and remove the team from the equation.

OTHER TIPS

This is a bad idea. The entire point of estimating in story points is to have an abstract unit that isn't directly comparable to time or across teams. You don't learn anything useful by trying to size story points the same way between teams. You need a reasonably accurate velocity as well as total story points to make any real comparison: if team A estimates its project at 200 points, that doesn't necessarily mean it's smaller than a project team B estimates at 400 points. It's possible that it's 4 sprints of work for team A and 5 for team B. The more you try to make things similar, the more comparisons will happen; even informally, some people will start making them. There will always be subtle pressure not to be the lowest-velocity team in any multi-team environment, and attempting to keep story points consistent across teams makes that pressure more pronounced.
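
As a rough sketch of that arithmetic: the 200 and 400 point totals and the 4 and 5 sprint outcomes come from the paragraph above, while the velocities (50 and 80 points per sprint) are assumed values chosen to produce them.

```python
import math

def sprints_needed(total_points, velocity):
    """Whole sprints needed to burn down total_points at a given velocity."""
    return math.ceil(total_points / velocity)

# Point totals from the example; velocities are hypothetical.
team_a = {"total_points": 200, "velocity": 50}
team_b = {"total_points": 400, "velocity": 80}

sprints_a = sprints_needed(team_a["total_points"], team_a["velocity"])
sprints_b = sprints_needed(team_b["total_points"], team_b["velocity"])

# Comparing raw points suggests one project is twice the other...
point_ratio = team_b["total_points"] / team_a["total_points"]
# ...while comparing forecast duration tells a different story.
sprint_ratio = sprints_b / sprints_a

print(f"Points ratio B/A:  {point_ratio}")
print(f"Sprints ratio B/A: {sprint_ratio}")
```

With these assumed velocities, the point totals differ by a factor of 2 but the forecasts differ by only one sprint (4 vs 5), which is why the totals alone don't tell you which project is bigger.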

The first thing I wonder about this is: what do you hope to gain by doing this?

Are you looking to figure out the relative size of projects? Well, I don't see this helping much (aside from giving you a very rough guess as to whether something is bigger or smaller). Story points are, by their nature, only applicable to the people who estimated them.

Story time to help illustrate that. One project I worked on was a website that was mostly CRUD pages but had one super complicated interface. This project was supposed to be a replacement for an old mainframe app and they wanted their main page to mimic the old system in terms of being able to handle rapid keyboard input. And this page had tons of moving parts where changing one input might enable or disable tons of other fields, change validation requirements, hide or show parts of the page, you get the idea.

We had one junior dev who did mostly frontend stuff and me to handle the UI. The junior dev could handle HTML, CSS and some basic jQuery. Had she estimated the effort it would have taken her, it would have been 100+ story points.

I was a senior dev at this point and I suggested we use Angular for that page. And I estimated the work at a total of 30-ish points. Even though there was a learning curve for our junior dev, we ended up with a better UI with less effort than we would have otherwise.

The whole point of that story is that the effort depends a lot on the people doing the work. An experienced team can come up with better designs, work faster or have fewer problems. Some of this is absorbed into team velocity, some isn't.

Back to what you are trying to gain. Are you trying to compare how teams work and how good they are? Because I'm pretty sure you are going to get that comparison whether you want it or not.

When story points are relative to a team, the only thing you can legitimately compare a team to is itself. You can track increases in velocity, more accurate estimating, etc. for a given team. But you can't say "Team A gets 50 points done in a 2 week sprint and Team B only gets 30 done. Why is Team B slacking so much?". Any attempt at this can get shut down instantly by remembering that 1 Team A point =/= 1 Team B point (plus a whole bunch of other arguments). But as soon as you have a way to say 1 point is 1 point across teams, now direct comparisons are inevitable. Whether people consciously want to make those comparisons or not, they will happen. And I can promise some manager somewhere will do the same and use it to declare that a team is "under performing".

I may be missing something here, but all I can see is that this will end up with little benefit and a lot of potential downsides. In short, don't do it.

Licensed under: CC-BY-SA with attribution

Not affiliated with softwareengineering.stackexchange