Significant difference between functional and procedural collection handling [closed]

https://softwareengineering.stackexchange.com/questions/401882

05-03-2021

Question

I'm planning an empirical study on function passing - specifically lambdas aka anonymous functions aka arrow functions. Now, although functional or even object-oriented approaches are highly favored over procedural/imperative programming nowadays, there seems to be very little empirical evidence for their superiority. Even though many different claims exist on why you're better off with higher-order programming¹, it's hard for me to construct cases that pose a chance for statistically significant differences.

The code is more expressive, telling what to do and not how

Claims like these are nice from a subjective and aesthetic point, but they'd have to map to empirical differences in productivity or maintainability in order to have leverage.

Currently I'm focused on Java's Stream API since it brought a major shift in how you write Java. For the industry this was accompanied by big rewrites, demand for employee training and updates to IDEs without much evidence that it's far better than what we had before.

Here's an example of what I mean where in my opinion the lambda-based implementation likely won't yield better results - even taking into account that participants would have to know the Stream API on top of the language API.

// loop-based
public int inactiveSalaryTotal(List<Employee> employees) {
    int total = 0;
    for (Employee employee : employees) {
        if (!employee.isActive()) {
            total += employee.getSalary();
        }
    }
    return total;
}

// lambda-based
public int inactiveSalaryTotal(List<Employee> employees) {
    return employees.stream()
                   .filter(e -> !e.isActive())
                   .mapToInt(Employee::getSalary)
                   .sum();
}

Personally, I suspect the advantages lie more in collecting streams, but I doubt that the average Java developer knows the API surface well enough to avoid getting stuck on certain tasks.

// loop-based
public Map<String, Double> averageSalaryByPosition(List<Employee> employees) {
    Map<String, List<Employee>> groups = new HashMap<>();
    for (Employee employee : employees) {
        String position = employee.getPosition();
        if (groups.containsKey(position)) {
            groups.get(position).add(employee);
        } else {
            List<Employee> group = new ArrayList<>();
            group.add(employee);
            groups.put(position, group);
        }
    }
    Map<String, Double> averages = new HashMap<>();
    for (Map.Entry<String, List<Employee>> group : groups.entrySet()) {
        double sum = 0;
        List<Employee> groupEmployees = group.getValue();
        for (Employee employee : groupEmployees) {
            sum += employee.getSalary();
        }
        averages.put(group.getKey(), sum / groupEmployees.size());
    }
    return averages;
}

// lambda-based (assumes static imports of java.util.stream.Collectors.groupingBy and averagingInt)
public Map<String, Double> averageSalaryByPosition(List<Employee> employees) {
    return employees.stream().collect(
            groupingBy(Employee::getPosition, averagingInt(Employee::getSalary))
    );
}

Specific Question

Can you construct an exemplary case where lambda-based collection handling clearly outperforms procedural handling (looping) in comprehension time, writing time, ease of change, or the time it takes to perform a specific task such as fixing a bug or counting certain calls or parameters? I'm also very much interested in how you think it performs better, because fewest LOC isn't really what I'm looking for here, but rather something that can be measured in time, eventually with the average Java developer. "Outperforming" is also explicitly not meant in regard to runtime performance.

Examples can be pseudo-code or any language supporting both paradigms such as Java, JavaScript, Scala, C#, Kotlin.

¹ With the main focus on function passing, I'm assuming OOP and FP to be somewhat isomorphic for this purpose (type systems aside), since you could view an object as just a tuple of functions. With objects being able to accept and return other objects, you've basically got higher-order functions.

Solution

least LOC isn't really what I'm looking for here

But why? Fewest LOC is what you should be looking for here. While lines of code are not a truly reliable measure of maintainability, you will be hard pressed to disagree that it takes you more time to read 20 lines than 5 lines. Programming is not only about writing. It is also about reading.

Let me give you a rough empirical way of judging the usefulness of "arrows".

Here is your code, copied almost verbatim, but with a minor change that probably makes all the difference in the world. This is what you will use to find out how useful "arrow-style" methods are in the end. This is your experimental material!

// loop-based
public Map<String, Double> whatDoesThisFunctionDo(List<Employee> employees) {
    Map<String, List<Employee>> groups = new HashMap<>();
    for (Employee employee : employees) {
        String position = employee.getPosition();
        if (groups.containsKey(position)) {
            groups.get(position).add(employee);
        } else {
            List<Employee> group = new ArrayList<>();
            group.add(employee);
            groups.put(position, group);
        }
    }
    Map<String, Double> averages = new HashMap<>();
    for (Map.Entry<String, List<Employee>> group : groups.entrySet()) {
        double sum = 0;
        List<Employee> groupEmployees = group.getValue();
        for (Employee employee : groupEmployees) {
            sum += employee.getSalary();
        }
        averages.put(group.getKey(), sum / groupEmployees.size());
    }
    return averages;
}

// lambda-based (assumes static imports of java.util.stream.Collectors.groupingBy and averagingInt)
public Map<String, Double> whatDoesThisFunctionDo(List<Employee> employees) {
    return employees.stream().collect(
            groupingBy(Employee::getPosition, averagingInt(Employee::getSalary))
    );
}

Show the first function to some people and measure the average time it takes them to understand the intended meaning. Then show the second function to a different group of people, again measuring the average time to full comprehension. To check comprehension, simply ask them to produce a suitable name for the function. Compare the results and calculate the comprehension improvement ratio. Maybe you can get some statistically significant differences there!
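The ratio mentioned above could be computed along these lines. This is only a minimal sketch: the class and method names are hypothetical, and the timings in `main` are made-up placeholders, not measurements from any study. A ratio of means alone also says nothing about statistical significance, which would still need a proper test on the raw samples.

```java
import java.util.Arrays;

class ComprehensionRatio {

    /** Arithmetic mean of the measured comprehension times, in seconds. */
    static double mean(double[] seconds) {
        return Arrays.stream(seconds)
                .average()
                .orElseThrow(() -> new IllegalArgumentException("no measurements"));
    }

    /** How many times longer the loop group needed, on average, than the lambda group. */
    static double improvementRatio(double[] loopSeconds, double[] lambdaSeconds) {
        return mean(loopSeconds) / mean(lambdaSeconds);
    }

    public static void main(String[] args) {
        // Placeholder values for illustration only, NOT real data.
        double[] loopSeconds = {60, 90, 120};
        double[] lambdaSeconds = {20, 30, 40};
        System.out.println(improvementRatio(loopSeconds, lambdaSeconds)); // prints 3.0
    }
}
```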

The code is more expressive, telling what to do and not how

Yes, it definitely is!

Claims like these are nice from a subjective and aesthetic point, but they'd have to map to empirical differences in productivity or maintainability in order to have leverage.

Well, apply the ratio obtained above (which is purely empirical, based on the specific functions you showed to the coders) to a class with 5 or 10 methods. The differences will tend to become significant pretty quickly!

As an outsider, I read the names of the functions you posted and I "get the point". But this should not distract you or anyone from the fact that someone, somewhere, will have to "get the point" not from the name, but from the content. Someone will also have to produce the content. Someone will have to review the content. There is nothing aesthetic or subjective about less effort typing, less effort thinking, less effort reviewing, less room for bugs, less training effort for newcomers, etc. These things translate directly into productivity and, as a result, correlate inversely with the resources spent!

I understand this may not be the answer you are looking for, but, IMHO, there is objective, nontrivial and measurable value in keeping it short, simple and expressive.

OTHER TIPS

You have already accepted an answer, and this is a bit of a side note, but I have never been really happy with the Stream API in Java. The design is very OO-centric and (ironically) somewhat procedural, which can lead to very ugly (and unreadable) code, IMO. That's unfortunate, since the method references that were introduced at the same time can avoid a lot of that noise. For the same reason, it's pretty easy to create reusable functions that clean things up. Here's how the first example looks once you introduce them:

public int inactiveSalaryTotal(List<Employee> employees) {
  // the predicate is negated so that only inactive employees are counted
  return sum(map(filter(employees, e -> !e.isActive()), Employee::getSalary));
}

This gives you something that is a bit more like how things are done in a true functional language. It's more concise and, I think, understandable. There are various ways these can be defined depending on your needs. I can provide some example implementations if needed.
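One possible way to define the `sum`, `map`, and `filter` helpers used above is sketched here. These are hypothetical implementations, not part of the JDK or of the original answer, and the minimal `Employee` class is invented just to make the example self-contained. They eagerly build new lists, which is simple but less efficient than a lazy stream pipeline.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;
import java.util.function.Predicate;

class ListFunctions {

    // Hypothetical stand-in for the Employee type from the question.
    static final class Employee {
        private final int salary;
        private final boolean active;
        Employee(int salary, boolean active) { this.salary = salary; this.active = active; }
        int getSalary() { return salary; }
        boolean isActive() { return active; }
    }

    /** Returns a new list holding only the elements that satisfy the predicate. */
    static <T> List<T> filter(List<T> items, Predicate<? super T> keep) {
        List<T> result = new ArrayList<>();
        for (T item : items) {
            if (keep.test(item)) {
                result.add(item);
            }
        }
        return result;
    }

    /** Returns a new list with the function applied to every element. */
    static <T, R> List<R> map(List<T> items, Function<? super T, ? extends R> fn) {
        List<R> result = new ArrayList<>();
        for (T item : items) {
            result.add(fn.apply(item));
        }
        return result;
    }

    /** Adds up a list of integers. */
    static int sum(List<Integer> numbers) {
        int total = 0;
        for (int n : numbers) {
            total += n;
        }
        return total;
    }

    /** The first example again, with the predicate negated to select inactive employees. */
    static int inactiveSalaryTotal(List<Employee> employees) {
        return sum(map(filter(employees, e -> !e.isActive()), Employee::getSalary));
    }
}
```

With these in place (for example via static imports), the call site reads inside-out: filter first, then map, then sum.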

As a general note, it would be incorrect to say that the "functional" code (or the code using Java streams) is always better than the "imperative" code (or the code using Java loops). In some cases it is much better, but in some cases it can make the code less readable. For a detailed analysis, see Item 45 (Use streams judiciously) in the third edition of the "Effective Java" book.

That being said, it is easy to find cases where your first stream example could be easily improved, while your first loop example couldn't. Imagine that you also need a function that calculates the total salary of the blue-haired employees, and similar functions for green-haired, orange-haired, etc. employees. In the imperative version you would end up with a copy-paste nightmare, but in the functional version you could just pass the filtering predicate as an additional function argument.

public int salaryTotal(List<Employee> employees, Predicate<Employee> condition) {
    return employees.stream()
                    .filter(condition)
                    .mapToInt(Employee::getSalary)
                    .sum();
}
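A short usage sketch of the method above, showing how callers might supply and combine conditions. The `hairColor` field, the `hasHair` helper, and the sample data are assumptions made up for illustration; only `salaryTotal` comes from the answer.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

class SalaryTotals {

    // Hypothetical minimal Employee; hairColor is invented for this example.
    static final class Employee {
        private final String hairColor;
        private final int salary;
        private final boolean active;
        Employee(String hairColor, int salary, boolean active) {
            this.hairColor = hairColor;
            this.salary = salary;
            this.active = active;
        }
        String getHairColor() { return hairColor; }
        int getSalary() { return salary; }
        boolean isActive() { return active; }
    }

    static int salaryTotal(List<Employee> employees, Predicate<Employee> condition) {
        return employees.stream()
                        .filter(condition)
                        .mapToInt(Employee::getSalary)
                        .sum();
    }

    /** Hypothetical predicate factory: one line per new filtering rule. */
    static Predicate<Employee> hasHair(String color) {
        return e -> color.equals(e.getHairColor());
    }

    public static void main(String[] args) {
        List<Employee> staff = Arrays.asList(
                new Employee("blue", 1000, true),
                new Employee("green", 2000, false),
                new Employee("blue", 3000, false));

        System.out.println(salaryTotal(staff, hasHair("blue")));          // prints 4000
        System.out.println(salaryTotal(staff, e -> !e.isActive()));       // prints 5000
        // Predicates compose, so no new method is needed for combined conditions.
        System.out.println(salaryTotal(staff,
                hasHair("blue").and(e -> !e.isActive())));                // prints 3000
    }
}
```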

Of course you could pass the Predicate to the procedural/imperative method (as a popular comment pointed out), but then it would become functional (a higher-order function). The essence of functional programming is to find declarative, readable ways to combine small, obviously correct functions into one complex functionality, and the Stream API is just one way to achieve this.
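For comparison, the loop-based variant with a `Predicate` parameter might look like the sketch below (the class name and the minimal `Employee` are hypothetical). The body is still an imperative loop, but because the method now takes a function as an argument, it is itself a higher-order function.

```java
import java.util.List;
import java.util.function.Predicate;

class LoopWithPredicate {

    // Hypothetical minimal Employee for illustration.
    static final class Employee {
        private final int salary;
        private final boolean active;
        Employee(int salary, boolean active) { this.salary = salary; this.active = active; }
        int getSalary() { return salary; }
        boolean isActive() { return active; }
    }

    /** Imperative body, functional signature: the caller decides who is counted. */
    static int salaryTotal(List<Employee> employees, Predicate<Employee> condition) {
        int total = 0;
        for (Employee employee : employees) {
            if (condition.test(employee)) {
                total += employee.getSalary();
            }
        }
        return total;
    }
}
```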

Another example would be the handling of future events. Imagine that you want to process all employees who will be employed in the future: an unknown and potentially infinite sequence of events. They can't be iterated over in an ordinary loop, but they can still be processed in a declarative way using a reactive stream library. The same logic would be very unreadable in imperative code.

Licensed under: CC-BY-SA with attribution
