Question

I want to compare the source code of two project stages, e.g. a web application's source code before it was made scalable and the scalable version.

I want to show how many lines had to be changed, removed, or added to get from one stage to the other. I'm looking for a good distance metric that rewards less code and small code changes. The one I imagine would output a relative value:

0%   = "Both projects are the same"
50%  = "Half of the source code has been changed"
100% = "Both projects have nothing in common"

Intuitively, I came up with a few options:

  1. diff: Concatenate all files of each stage into a single source file and run diff on the two results. The problem is that less code is better, but diff counts a removed line as a plain change and therefore punishes code removal.
  2. Levenshtein distance: Calculates the edits needed to transform source code a into source code b. The result is a number of character-level changes. The problem, again, is that code removal is punished rather than rewarded.
  3. Unified Code Count: Sets up rules for counting lines of code consistently, but it is not a descriptive distance metric between projects.

So I'm looking for a metric that is descriptive, rewards code removal, and only counts code changes or additions. It doesn't have to be source-code specific; both projects use the same language. My personal feeling leans toward the diff approach, but I have not come up with a satisfactory descriptive metric.

What would you propose?

Solution

If you want, you can make this into a really difficult research problem:

http://gate.ac.uk/sale/dd/related-work/tao-related/2007+Kagdi+Survey+for+mining+software+repositories.pdf

This approach: http://www.cs.kent.edu/~jmaletic/papers/icsm04.pdf looks like it's under active development here: http://www.srcml.org/

There are other, more general code metrics tools listed here: http://www.aniche.com.br/wp-content/uploads/2013/04/scam2013.pdf (though it looks like the tool advertised by the paper is down now). Apparently Sonar can track metrics over time: https://en.wikipedia.org/wiki/SonarQube
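
For a quick approximation of the metric described in the question, a line-based diff that simply ignores deletions might already be enough. The following is only a rough sketch under several assumptions, not an established metric: it uses Python's standard difflib module, the function name change_ratio is made up for illustration, and the score is defined as the share of lines in the new version that were added or changed. Identical inputs give 0%, inputs with no common lines give 100%, and pure deletions do not raise the score.

```python
import difflib


def change_ratio(old_lines, new_lines):
    """Share of lines in the new version that were added or changed.

    Deleted lines are ignored, so removing code never raises the score.
    """
    if not new_lines:
        # Assumption: an empty new version contains no new code, so score 0.
        return 0.0
    matcher = difflib.SequenceMatcher(None, old_lines, new_lines, autojunk=False)
    touched = 0
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        if tag in ("insert", "replace"):
            # Lines of the new version that were not carried over unchanged.
            touched += j2 - j1
    return touched / len(new_lines)


# Toy example: one line kept, one line rewritten, one line dropped.
old = ["a = 1", "b = 2", "print(a + b)"]
new = ["a = 1", "print(a)"]
print(f"{change_ratio(old, new):.0%}")
```

To apply this to whole projects, each stage's files would have to be read into lists of lines first (for example concatenated in a fixed order, as in option 1 of the question). Note that the sketch treats a moved line as changed, which may or may not be what you want.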

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow