Counting identifiers and operators as a code size metric

https://softwareengineering.stackexchange.com/questions/415595

15-03-2021

Question

I'm looking for a code metric to monitor and track over time the size of several projects and their components.

I would also like to use it to:

- evaluate size reduction after refactoring
- compare the size/length of two implementations of the same specification, even across languages

I know there are cyclomatic complexity and ABC metrics for complexity, but in addition to those I want a separate metric for the length/size/volume/extent of some code, regardless of its complexity.

Being aware of the advantages and disadvantages of SLOC, I wouldn't use it for these purposes, mainly because I'm trying to measure code that is in different styles or languages.

For example, this method body has 3 SLOC:

  public static String threeLines(String arg1) {
    String var1 = arg1 + " is";
    String var2 = var1 + " something";
    return var2;
  }

So does this one:

  public String otherThreeLines(String arg1) {
    IntStream stream1 = Arrays.stream(arg1.split(";"))
      .sequential()
      .map(s -> s.replaceAll("\\[element", ""))
      .map(s2 -> s2.replaceAll("]", ""))
      .mapToInt(Integer::parseInt);

    double var1 = stream1.mapToDouble(Double::new).map(d -> d / 2).sum();

    return String.valueOf(var1);
  }

Clearly, the second one is "bigger" or "longer", has more to read and think about, so I would like it to have a higher value in the metric.

The metric is not meant to judge whether a piece of code is good or bad; it is only for statistical analysis.

It would also be nice if it were simple to implement, without the need to fully parse each file's language.

So, I'm thinking of counting identifiers, keywords, and operators. For example, this fragment:

String var2 = var1 + " something";

could be analyzed as [String] [var2] [=] [var1] [+] [" something"]; and have a score of 6.

And this fragment from the second method:

double var1 = stream1.mapToDouble(Double::new).map(d -> d / 2).sum();

could be analyzed as [double] [var1] [=] [stream1].[mapToDouble]([Double]::[new]).[map]([d] [->] [d] [/] [2]).[sum](); and receive a score of 14.

So the size/length of the second fragment should be roughly twice that of the first.
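
As a rough illustration of this counting rule, here is a minimal Java sketch that uses a regular expression rather than a real parser. The class name TokenCounter and the exact token pattern are assumptions made for this example: it counts string, character, and simple numeric literals, identifiers and keywords, and operators, and it skips punctuation such as `.`, `::`, parentheses, and semicolons, in line with the bracketed breakdowns above. It does not strip comments and would count generic angle brackets as operators, so it is only a starting point.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: counts "meaningful" tokens without fully parsing the language.
class TokenCounter {
    private static final Pattern TOKEN = Pattern.compile(
          "\"(?:\\\\.|[^\"\\\\])*\""        // string literal, with escapes
        + "|'(?:\\\\.|[^'\\\\])'"           // character literal
        + "|[A-Za-z_$][A-Za-z0-9_$]*"       // identifier or keyword
        + "|\\d+(?:\\.\\d+)?"               // simple integer or decimal literal
        + "|->|\\+\\+|--|&&|\\|\\|"         // multi-character operators
        + "|[-+*/%=<>!&|^]=?"               // single operators, optionally followed by '='
        + "|[~?]");                         // remaining single-character operators

    // Returns the number of tokens found; unmatched characters
    // (dots, parentheses, semicolons, whitespace) are simply skipped.
    static int count(String source) {
        Matcher m = TOKEN.matcher(source);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        String first = "String var2 = var1 + \" something\";";
        String second =
            "double var1 = stream1.mapToDouble(Double::new).map(d -> d / 2).sum();";
        System.out.println(count(first));   // prints 6
        System.out.println(count(second));  // prints 14
    }
}
```

On the two fragments above, this sketch reproduces the scores of 6 and 14.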

Are there any known code metrics that would show similar results?

Solution

Indeed, SLOC is used for its simplicity and does not fully express the real complexity of code.

Using increment, ternary, or comma operators, or putting multiple statements on a line, significantly affects the SLOC count in one direction, whereas comments, clean functional decomposition, and access control influence it in the other. Moreover, the design effort behind well-crafted, reusable OO code is not the same as that of copy/pasting pieces of code in several places.

So, whatever measurement you use, it will never be fully representative of the effort and will always carry some bias, even more so if such indicators are promoted to performance indicators.

That said, as long as you remain aware of their limitations, indicators can help:

- Cyclomatic complexity goes further than your counting approach, since it also analyses the complexity of the control flow in a function/module.
- OOP has significantly affected the relevance of such analysis. Chidamber & Kemerer's CK set of metrics could therefore be an interesting start: it uses methods per class, depth of inheritance tree, coupling between classes, and similar factors that express some facets of the complexity of OOP deliverables.
- Another set of metrics is MOOD, which analyses inheritance, coupling, and polymorphism, as well as encapsulation.

Most of these methods calculate size-independent indicators, but they are based on measurable quantities that could be of interest to you.
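
To give an idea of what such measurable quantities look like, here is a minimal Java sketch that reads two of the raw counts behind the CK metrics (depth of inheritance tree and number of methods declared in a class) through reflection. The names CkCounts, SampleBase, and SampleDerived are hypothetical and exist only for this example, and the sketch works on compiled classes rather than on source text, so it is an illustration under those assumptions and not a full CK implementation.

```java
import java.lang.reflect.Method;

// Hypothetical sample hierarchy used only to exercise the counters below.
class SampleBase {
    void a() {}
}

class SampleDerived extends SampleBase {
    void b() {}
    void c() {}
}

class CkCounts {
    // Depth of inheritance tree: number of superclass links up to Object
    // (Object itself has depth 0).
    static int depthOfInheritance(Class<?> type) {
        int depth = 0;
        for (Class<?> c = type.getSuperclass(); c != null; c = c.getSuperclass()) {
            depth++;
        }
        return depth;
    }

    // Methods declared directly in the class, ignoring compiler-generated ones.
    static int declaredMethodCount(Class<?> type) {
        int n = 0;
        for (Method m : type.getDeclaredMethods()) {
            if (!m.isSynthetic()) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        System.out.println(depthOfInheritance(SampleDerived.class));  // prints 2
        System.out.println(declaredMethodCount(SampleDerived.class)); // prints 2
    }
}
```

Summing such per-class counts over a project gives a size-like figure that can be tracked over time alongside a token count.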

Licensed under: CC-BY-SA with attribution

Not affiliated with softwareengineering.stackexchange