Question

There has been a considerable amout of discussion about code metrics (e.g.: What is the fascination with code metrics?). I (as a software developer) am really interested in those metrics because I think that they can help one to write better code. At least they are helpful when it comes to finding areas of code that need some refactoring.

However, what I would like to know is the following. Are there some evaluations of those source code metrics that prove that they really do correlate with the bug-rate or the maintainability of a method. For example: Do methods with a very high cyclomatic-complexity really introduce more bugs than methods with a low complexity? Or do methods with a high difficulty level (Halstead) really need much more amount to maintain them than methods with a low one?

Maybe someone knows about some reliable research in this area.

Thanks a lot!

OTHER TIPS

Good question, no straight answer.

There are research papers available that show relations between, for example, cyclomatic complexity and bugs. The problem is that most research papers are not freely available.

I have found the following: http://www.pitt.edu/~ckemerer/CK%20research%20papers/CyclomaticComplexityDensity_GillKemerer91.pdf. Though it shows a relation between cyclomatic complexity and productivity. It has a few references to other papers however, and it is worth trying to google them.

Have a look at this article from Microsoft research. In general I'm dubious of development wisdom coming out of Microsoft, but they do have the resources to be able to do long-term studies of large products. The referenced article talks about the correlation they've found between various metrics and project defect rate.

Finally I did find some papers about the correlation between software metrics and the error-rate but none of them was really what I was looking for. Most of the papers are outdated (late 80s or early 90s).

I think that it would be quite a good idea to start an analysis of current software. In my opinion it should be possible to investigate some populare open source systems. The source code is available and (what I think is much more important) many projects use issue trackers and some kind of version control system. Probably it would be possible to find a strong link between the log of the versioning systems and the issue trackers. This would lead to a very interesting possibility of analyzing the relation between some software metrics and the bug rate.

Maybe there still is a project out there that does exactly what I've described above. Does anybody know about something like that?

We conducted an empirical study about the bug prediction capabilities of the well-known Chidamber and Kemerer object-oriented metrics. It turned out these metrics combined can predict bugs with an accuracy of above 80% when we applied proper machine learning models. If you are interested, you can ready the full study in the following paper:

"Empirical Validation of Object-Oriented Metrics on Open Source Software for Fault Prediction. In IEEE Transactions on Software Engineering, Vol. 31, No. 10, October 2005, pages 897-910."

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top