Question

As the title says, why do developers (especially, but not only, new developers) habitually underestimate the work involved in 'greenfield' projects or 'total rewrites'?

We all know that software estimation is not a science, but most problems to be solved are not new, and many elementary problems have been getting solved again and again for several decades, so collectively there is a fair amount of accumulated experience of how long things take (certainly to a granularity of man-years or man-decades).

Very often a developer will look at something that has taken a 3-man team 5 years to deliver, and insist that it is a total mess. But that's 15 man-years to do it badly. Redoing it well may take even more, especially if it hopes to achieve even more in terms of complexity or integration, but even if it takes less, one would be braced for it to still take on the order of several years to redo from scratch.

The obvious answer to the question is ignorance, but I'm looking for a more structural explanation for why ignorance in this area of estimation appears so often to crop up. We have academic courses and we have professional forums where developers exchange knowledge and experience.

Why do we fail to systematically reproduce at least gross rules-of-thumb about how much time various kinds of project tend to consume?


Solution

Estimation typically happens at a stage where the project is not specified in detail. The less detail there is in the specification, the simpler the project looks.

When you start specifying in detail, you discover it is much more complex than you first thought: all the special cases, error conditions, and so on.

This is the case even when rewriting existing software. Looked at from the outside, a software system appears a lot simpler than it really is; most of the complexity is hidden. Unless the existing system has a complete and very detailed specification of its behavior, you will underestimate its complexity.

The solution might seem to be to specify everything in detail up front - the infamous waterfall method. But this has other problems. Most importantly, the development process is a process of discovery. When the system is presented to the customer, they might realize that it doesn't actually solve their problem, and the later this happens, the more costly it is. Therefore agile processes use short iterations with continuous feedback from the product owner. But this also means the project scope is changing all the time, and the initial estimate will be useless.

In addition to these fundamental problems of estimates there are some common pitfalls:

  • Conflating development time with calendar time. If the estimate is 5 days of development, this does not mean the task can be finished in five calendar days! There will be overhead like meetings, installing the development environment, hardware trouble, and having to assist with other projects, and people get sick and take holidays. (A rough sketch of this gap follows this list.)

  • Thinking developers are interchangeable. The person making the estimate might estimate how long it would take them to solve the task, but this will of course vary from developer to developer, depending on experience, familiarity with the system, and so on.

  • New framework/platform. The new framework always looks amazing in the proof-of-concept prototype and seems like it could cut down on development time. But when it is used in a large real-world project, it turns out there are some scenarios it doesn't handle well, requiring complex workarounds.
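
To put rough numbers on the first pitfall, here is a minimal Python sketch converting a development-time estimate into calendar time; the focus factor and absence buffer are assumptions chosen for illustration, not figures from the answer.

    # A minimal sketch of the development-time vs calendar-time gap.
    # The 5-day estimate comes from the bullet above; the focus factor
    # and absence buffer are assumed numbers, for illustration only.

    dev_days = 5.0         # the "pure development" estimate
    focus_factor = 0.6     # assumed fraction of a day spent on the task
                           # (the rest: meetings, setup, interruptions)
    absence_buffer = 1.10  # assumed allowance for sickness and holidays

    calendar_days = dev_days / focus_factor * absence_buffer
    print(f"{calendar_days:.1f} calendar days")  # ~9.2 - nearly double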


"most problems to be solved are not new, and many elementary problems have been getting solved again and again for several decades, so collectively there is a fair amount of accumulated experience of how long things take"

Solutions to recurring problems become standardized in libraries and tools. Using these standard solutions saves a lot of time, but it also means the majority of the development time will be spent on the problems that do not have an out-of-the-box solution available. So estimation does not become easier, even though productivity improves.

Other tips

Human nature. People in general suffer from biases and reasoning fallacies. The bigger the project, the more unknown unknowns there are. Combine that with a culture where it is not, strictly speaking, acceptable to refuse to give a delivery date (be it for a feature or for the project as a whole) and you end up with a rough estimate. The more unknowns there are, the further the estimate departs from reality.

What I'd argue, though, is that the actual problem here is in the definition of "bad code". We tend to give that label to everything we don't understand (immediately or after some effort), yet sometimes, somehow, "bad code" becomes "OK code" when someone explains it. A 100-foot view helps a lot, but that is rarely available. Trying to understand how a system works by only looking at the code is like looking at a chair through a microscope and trying to determine what it is. So a failure to properly estimate a project delivery date reflects a failure to understand the project in the first place.

What is a piece of software?

No seriously, answer that and we might be able to get around to estimating changes to it.

What most people will tell you is that it's the code and digital artefacts used by that code. Fair enough.

But... It needs something to run on. Say an OS, or some other platform.

But... The platform needs more libraries, platforms, frameworks, and an expertly crafted grain of sand.

But... The expertly crafted grain of sand needs many more grains of sand interlinked by wires.

But... The wires need electrical current, so we need a power plant and a power distribution system.

... I could keep on going down. Where exactly does the software end?

How about up? That software talks to other things, like humans and robots.

... Those humans need training, and those robots need software to actually be able to use this software.

... And those humans need organising into businesses and the like to really guide that software into being useful.

... and we can keep going up. Again, where does it end?

Leaky Abstractions

My point here is that each of these levels is an abstraction of sorts, and, like all abstractions, details leak.

Even if we take the original definition that software is just the code and digital artefacts (as good a definition as any), it has to deal with those leaky abstractions going up, and the leaky abstractions going down.

What leaks in is messy reality:

  • Bugs, flaws, glitches, and power outages from below.
  • Common sense and stupidity, accidents, and environmental systems from above, with positive and negative feedback loops ranging in size and power from tiny to large, and in speed from fast to slow.

System Types

So, realising that software is one aspect of a system, it makes sense to look at what sorts of systems we have: the S, P, and E types of Lehman's classification.

S Systems are our nice well-behaved problems like sorting. We can define the problem, and we have many ways of solving that problem that are well described and researched.

P Systems are ones where it's clear what we want to solve: we have good problem definitions, and we might even know an algorithm that would eventually solve the problem given good inputs, enough time, and a machine actually capable of running it. Weather prediction fits here:

  • We don't have perfect scanners that can detect the state of every atom on earth at once, so we have poor/approximate input.
  • If physics is correct then we have an algorithm that will solve it; it's just not runnable on any machine that exists.
  • The algorithms that are runnable are approximations and can't solve the problem, but can give reasonable guesses.

But we have a very simple way to check whether the answer was right: we observe the real weather.
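
To make the solve/verify asymmetry concrete, here is a toy P System sketch in Python; the forecast function and all numbers are invented for illustration.

    # A toy P System sketch in Python: the exact computation is out of
    # reach, so we run a cheap approximation, while verification is just
    # comparing against reality. All names and numbers are illustrative.

    def approximate_forecast(history):
        """Stand-in for the intractable exact simulation: predict
        tomorrow's temperature as a weighted average of recent days."""
        weights = [0.5, 0.3, 0.2]  # most recent day weighted highest
        return sum(w * t for w, t in zip(weights, reversed(history)))

    def verify(prediction, observed, tolerance=2.0):
        """Checking an answer is trivial: wait a day and compare."""
        return abs(prediction - observed) <= tolerance

    history = [18.0, 19.5, 21.0]           # last three days, degrees Celsius
    guess = approximate_forecast(history)  # cheap and approximate: 19.95
    print(guess, verify(guess, observed=20.1))  # 19.95 True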

E Systems are the worst. Solving the problem changes the problem. Even partially solving the problem changes the problem. It's easier to describe this kind of system using an example: air traffic control.

There are five planes wanting to land. The software tracks this and assigns them runways, queueing two of them up because there are only three runways. Except now a plane wants to take off, so it is directed to the third runway. Except on landing one of the planes crashes, blocking runway three. So the departing plane has to be rerouted back to runway two, except one of the landing planes is currently using the taxiway. And fire crews need to get to runway three, so an operator needs to stop the next plane from landing on runway two, to keep it clear for the crews heading to runway three. And we still have to operate landings and takeoffs without runway three for a while. Maybe some planes need to be redirected to another airport, or need to be told not to take off from other airports because they won't be able to land.

See how quickly the problem went from assigning planes to runways, to stopping planes at other airports from taking off.

Now try estimating for that.
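
To make the feedback loop concrete, here is a toy E System sketch in Python; the state, events, and scheduling rule are all invented, and real air traffic control is of course far more involved.

    # A toy E System sketch in Python: acting on the plan, and reality
    # intervening, both change the state the plan was computed from, so
    # only the next step can usefully be planned. All names and events
    # are invented; this is not real air-traffic-control logic.

    state = {
        "open_runways": ["R1", "R2", "R3"],
        "landing_queue": ["A", "B", "C", "D", "E"],
    }

    def plan_next(state):
        """Assign the next queued plane to the first open runway, if any.
        (In this toy, a landed plane frees its runway immediately.)"""
        if state["open_runways"] and state["landing_queue"]:
            return state["landing_queue"][0], state["open_runways"][0]
        return None

    # Each tick, the world may change before the next step is executed.
    events = [None, ("blocked", "R3"), None, None]
    for event in events:
        if event is not None:
            _, runway = event
            state["open_runways"].remove(runway)  # the problem just changed
        step = plan_next(state)
        if step is None:
            break  # nothing plannable: divert to other airports, etc.
        plane, runway = step
        state["landing_queue"].remove(plane)      # acting changes it again
        print(f"land {plane} on {runway}; queue now {state['landing_queue']}")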

Novelty and Estimation

The core problem is this: if something already existed that solved the problem at hand, we would just copy it and be done. We do this every day with applications like Word. The estimate would be just for the time to locate the solution and copy some files - something language, library, framework, and application authors have been working on for over half a century.

Unfortunately it's not always possible to find the solution to the already-solved problem in a reasonable amount of time (say, less than the time to just make it again). And sometimes when we do find the solution, something else gets in the way: a business wanting to keep hold of its trade secrets, a developer who wants to be paid, or a dependency on a component that is no longer obtainable.

But copying only works for S and P type systems that have been explored and are no longer novel. You can't copy E systems because that means copying the world around them, so every E system is by definition novel.

Novel means we haven't explored it - which means learning.

Now estimate how much learning you have to do to understand X.

You can't. The best we can do is find some examples that look similar and approximate.

That makes estimation itself a P System algorithm: we have no usable exact algorithm, and we only know how to verify the answer - after the fact.
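
That approximation step can at least be made explicit. Here is a Python sketch of estimating from a reference class of similar past projects; the project names and durations are invented for illustration.

    import statistics

    # A sketch of "find similar examples and approximate": estimate from
    # a reference class of past projects. All data below is invented.

    past_projects = {
        # name: actual duration, in months, of roughly similar work
        "billing_rewrite": 26,
        "crm_migration": 18,
        "portal_v2": 31,
    }

    def estimate_months(reference_class):
        """Return a median and a range, not a single date; the estimate
        can only be verified once the real project has run."""
        durations = sorted(reference_class.values())
        return statistics.median(durations), (durations[0], durations[-1])

    print(estimate_months(past_projects))  # (26, (18, 31))

Returning a range rather than a single number keeps the uncertainty visible instead of hiding it inside one date.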


To boot, we developers are also human, and humans are, generally speaking, optimists. You can go and review the psychology research on optimism bias and the planning fallacy to verify that.

Developers give the estimate they think will allow the project to be started.

Businesses are reluctant to refactor. Programmers see a need for refactoring, but know that any estimate not considered "so short that it won't delay anything" will kill the refactoring before it starts.

Or maybe in a less cynical way, you could say they have a bias for being optimistic about projects they see as necessary.

I would add that there is probably a confirmation bias in your assumption: when estimates are too short it's a problem; when they are too long, you never notice.

There is always an element of the unknown

Even if you could estimate exactly how to do something, the estimate is only valid with the information at hand at the time. Often the things we want to do change with time, or we missed some details. Also, people are people, not machines: you can't get the same work output every day, all year long, and that's not even accounting for the fact that people on a team come and go. Talking about man-days and man-years seems like a good abstraction for sizing a team and forecasting delivery, until you find out that no two people are the same or work the same.
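
As a toy illustration of that point, the Python sketch below runs the same "10 man-day" task through invented per-person variability; the spread, not any single number, is the honest output.

    import random

    # A toy Monte Carlo illustration of why man-days leak: with invented
    # per-person variability and absences, the same "10 man-day" task
    # finishes at very different times.

    random.seed(1)

    def simulate(task_days=10, runs=1000):
        durations = []
        for _ in range(runs):
            speed = random.uniform(0.5, 1.5)             # no two people work the same
            absence = 3 if random.random() < 0.2 else 0  # sickness, leave
            durations.append(task_days / speed + absence)
        durations.sort()
        return durations[0], durations[runs // 2], durations[-1]

    best, median, worst = simulate()
    print(f"best {best:.1f}, median {median:.1f}, worst {worst:.1f} days")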

You're not applying a cooking recipe but inventing the recipe

No two pieces of software are the same. You can't really say "OK, we did this once, we can redo it exactly the same way", because if it was already done, why would you want to redo it? If you want to do something, it's because it's not already done.

Hierarchical pressure

Project management and managers will always want the lowest cost possible; that's understandable. However, they will often pressure developers, or bargain for lower estimates, instead of trusting the developers' numbers.

License: CC-BY-SA with attribution
Not affiliated with softwareengineering.stackexchange