Question

Some base points:

  • Python method calls are "expensive" due to its interpreted nature. In theory, if your code is simple enough, breaking down Python code has a negative impact; its only upsides are readability and reuse (a big gain for developers, not so much for users).
  • The single responsibility principle (SRP) keeps code readable and makes it easier to test and maintain.
  • The project has a particular background in which we want readable code, tests, and runtime performance.

For instance, code like the following, which invokes several methods (four calls), is slower than the code after it, which is a single function.

from operator import add

class Vector:
    def __init__(self, list_of_3):
        self.coordinates = list_of_3

    def move(self, movement):
        self.coordinates = list(map(add, self.coordinates, movement))
        return self.coordinates

    def revert(self):
        self.coordinates = self.coordinates[::-1]
        return self.coordinates

    def get_coordinates(self):
        return self.coordinates

## Operation with one vector
vec3 = Vector([1,2,3])
vec3.move([1,1,1])
vec3.revert()
vec3.get_coordinates()

In comparison to this:

from operator import add

def move_and_revert_and_return(vector, movement):
    return list(map(add, vector, movement))[::-1]

move_and_revert_and_return([1,2,3],[1,1,1])

If I parallelize something like that, it is fairly objective that I lose performance. Mind that this is just an example; my project has several mini-routines with math like this. While it is much easier to work with, our profilers dislike it.
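
For reference, a minimal way to quantify that gap with timeit (a sketch; it assumes the Vector class and move_and_revert_and_return defined above are in scope, and the exact numbers are machine-dependent):

import timeit

# Benchmark sketch: compare the four-call, method-based version against the
# single inlined function. Both snippets are evaluated in this module's globals.
method_version = (
    "v = Vector([1, 2, 3]); v.move([1, 1, 1]); "
    "v.revert(); v.get_coordinates()"
)
inline_version = "move_and_revert_and_return([1, 2, 3], [1, 1, 1])"

print(timeit.timeit(method_version, globals=globals(), number=1_000_000))
print(timeit.timeit(inline_version, globals=globals(), number=1_000_000))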


How and where do we embrace the SRP without compromising performance in Python, as its inherent implementation directly impacts it?

Are there workarounds, like some sort of pre-processor that puts things in-line for release?

Or is Python simply poor at handling code breakdown altogether?

Solution

is Python simply poor at handling code breakdown altogether?

Unfortunately yes, Python is slow and there are many anecdotes about people drastically increasing performance by inlining functions and making their code ugly.

There is a workaround, Cython, which is a compiled variant of Python and much faster.
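
As a rough sketch of what that workaround looks like (file names here are illustrative): move the hot function into a .pyx file and compile it once, then import it from CPython like any other module. Adding static C types (cdef/cpdef) to the .pyx file speeds it up further.

# vector_ops.pyx -- plain Python is valid Cython; it gets compiled to a C extension
from operator import add

def move_and_revert_and_return(vector, movement):
    return list(map(add, vector, movement))[::-1]

# setup.py -- build in place with: python setup.py build_ext --inplace
from setuptools import setup
from Cython.Build import cythonize

setup(ext_modules=cythonize("vector_ops.pyx"))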

Edit: I just wanted to address some of the comments and other answers, although the thrust of them is perhaps not Python-specific but about optimisation in general.

  1. Don't optimise until you have a problem, and then look for bottlenecks

    Generally good advice. But the assumption is that 'normal' code is usually performant, and this isn't always the case. Individual languages and frameworks have their own idiosyncrasies; in this case, function calls.

  2. It's only a few milliseconds; other things will be slower

    If you are running your code on a powerful desktop computer you probably don't care as long as your single user code executes in a few seconds.

    But business code tends to run for multiple users and require more than one machine to support the load. If your code runs twice as fast it means you can have twice the number of users or half the number of machines.

    If you own your machines and data centre then you generally have a big chunk of overhead in CPU power. If your code runs a bit slow, you can absorb it, at least until you need to buy a second machine.

    In these days of cloud computing, where you use exactly the compute power you require and no more, there is a direct cost for non-performant code.

    Improving performance can drastically cut the main expense for a cloud based business and performance really should be front and centre.

OTHER TIPS

Many potential performance concerns are not really a problem in practice, and the issue you raise may be one of them. In the vernacular, worrying about those problems without proof that they are actual problems is called premature optimization.

If you are writing a front-end for a web service, your performance is not going to be significantly affected by function calls, because the cost of sending data over a network far exceeds the time it takes to make a method call.

If you are writing a tight loop that refreshes a video screen sixty times a second, then it might matter. But at that point, I claim you have larger problems if you're trying to use Python to do that, a job for which Python is probably not well-suited.

As always, the way you find out is to measure. Run a performance profiler or some timers over your code. See if it's a real problem in practice.
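
For example (a minimal sketch, assuming your entry point is a function called main):

import cProfile
import pstats

# Profile the hypothetical main() entry point, then print the ten calls with
# the highest cumulative time.
cProfile.run("main()", "profile.out")
pstats.Stats("profile.out").sort_stats("cumulative").print_stats(10)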


The Single Responsibility Principle is not a law or mandate; it is a guideline or principle. Software design is always about trade-offs; there are no absolutes. It is not uncommon to trade off readability and/or maintainability for speed, so you may have to sacrifice SRP on the altar of performance. But don't make that tradeoff unless you know you have a performance problem.

First, some clarifications: Python is a language. There are several different interpreters which can execute code written in the Python language. The reference implementation (CPython) is usually what is being referenced when someone talks about "Python" as if it is an implementation, but it is important to be precise when talking about performance characteristics, as they can differ wildly between implementations.
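
If you are unsure which implementation you are running, the interpreter can tell you:

import platform
import sys

print(platform.python_implementation())  # e.g. 'CPython' or 'PyPy'
print(sys.version)                        # full interpreter version string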

How and where do we embrace the SRP without compromising performance in Python, as its inherent implementation directly impacts it?

Case 1.) If you have pure Python code (Python language version <= 3.5; 3.6 has "beta level support") which relies only on pure Python modules, you can embrace SRP everywhere and use PyPy to run it. PyPy (https://morepypy.blogspot.com/2019/03/pypy-v71-released-now-uses-utf-8.html) is a Python interpreter with a Just-in-Time compiler (JIT) that can remove function call overhead as long as it has sufficient time to "warm up" by tracing the executed code (a few seconds IIRC). **
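
To illustrate, a hot loop like the sketch below, run unchanged under PyPy (for example pypy3 bench.py), gives the JIT enough iterations to trace the Vector methods and largely remove their call overhead; under CPython the same code pays that overhead on every call. The import is hypothetical and just stands in for the class from the question.

# bench.py -- illustrative warm-up-friendly hot loop
from vectors import Vector  # hypothetical module holding the Vector class above

vec = Vector([1, 2, 3])
for _ in range(1_000_000):
    vec.move([1, 1, 1])
    vec.revert()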

If you are restricted to the CPython interpreter, you can extract the slow functions into extensions written in C, which are pre-compiled and do not suffer from any interpreter overhead. You can still use SRP everywhere, but your code will be split between Python and C. Whether this is better or worse for maintainability than selectively abandoning SRP while sticking to pure Python code depends on your team, but if you have performance-critical sections of your code, it will undoubtedly be faster than even the most optimized pure Python code interpreted by CPython. Many of Python's fastest mathematical libraries use this method (numpy and scipy, IIRC), which is a nice segue into Case 2...
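
For instance, before moving on to Case 2, this is roughly what delegating the arithmetic to such a library looks like (a sketch; the array shapes are illustrative): batch the work so that numpy's compiled C code handles the inner loop instead of paying Python-level call overhead per tiny vector.

import numpy as np

# Operate on 10,000 length-3 vectors at once: one Python-level call, with the
# element-wise addition and the reversal performed by numpy's C implementation.
vectors = np.arange(30_000).reshape(-1, 3)
movement = np.array([1, 1, 1])

moved_and_reverted = (vectors + movement)[:, ::-1]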

Case 2.) If you have Python code which uses C extensions (or relies on libraries which use C extensions), PyPy may or may not be useful depending on how they're written. See http://doc.pypy.org/en/latest/extending.html for details, but the summary is that CFFI has minimal overhead, while ctypes is slower (using it with PyPy may be even slower than with CPython).
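
For illustration, the minimal CFFI pattern looks like this (a sketch that calls strlen from the C standard library; ffi.dlopen(None) loads the standard C library on POSIX systems):

from cffi import FFI

ffi = FFI()
ffi.cdef("size_t strlen(const char *);")  # declare the C signature we need
libc = ffi.dlopen(None)                   # load the standard C library

print(libc.strlen(b"hello"))              # prints 5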

Cython (https://cython.org/) is another option, one I don't have as much experience with. I mention it for the sake of completeness so my answer can "stand on its own", but I don't claim any expertise. From my limited usage, it felt like I had to work harder to get the same speed improvements I could get "for free" with PyPy, and if I needed something better than PyPy, it was just as easy to write my own C extension (with the benefit that if I re-use the code elsewhere or extract part of it into a library, all my code can still run under any Python interpreter and is not required to be run by Cython).

I'm scared of being "locked into" Cython, whereas any code written for PyPy can run under CPython as well.

** Some extra notes on PyPy in Production

Be very careful about making any choices that have the practical effect of "locking you in" to PyPy in a large codebase. Because some (very popular and useful) third-party libraries do not play nicely with it for the reasons mentioned earlier, you can face very difficult decisions later if you realize you need one of those libraries. My experience is primarily in using PyPy to speed up some (but not all) microservices which are performance-sensitive, in a company environment where it adds negligible complexity to our production environment (we already have multiple languages deployed, some with different major versions like 2.7 vs 3.5 running anyway).

I have found using both PyPy and CPython regularly forced me to write code which only relies on guarantees made by the language specification itself, and not on implementation details which are subject to change at any time. You may find thinking about such details to be an extra burden, but I found it valuable in my professional development, and I think it is "healthy" for the Python ecosystem as a whole.

Licensed under: CC-BY-SA with attribution