Preferred structure of Python class: subclassing and dependencies

https://stackoverflow.com/questions/23228276

07-07-2023
|

Question

I am constructing a class to handle drawing a figure based on string input. I want to be able to output the figure to both png and svg images. I process the input pattern to create an abstraction of the figure that will be used in constructing both of the output formats.

This abstraction is an array of objects defining shapes, that will be iterated by two separate methods, one that uses their data to construct svg elements using a custom module I wrote for the task, and another that uses the PIL library to create shapes to output to png. There is no problem with either of these methods, the code is all working correctly, and my output is as expected.

My question is one of efficiency, since this code will be run on a server, and will be used to return the image data from an API. As it stands the structure of my code is as follows:

class Pattern:
  def __init__(self, input_pattern):
    self.pattern = input_pattern
    self.data = self.__process()
  def __process(self):
    # processes the input, returning an array
    # of custom shape objects
    # which will be read by the render methods
  def renderSVG(self):
    # uses self.data to render an svg
  def renderPNG(self):
    # uses self.data to render a png

The render methods each have separate dependencies; libraries which must be loaded to do the drawing. Only one of the rendering methods will be called for any given API response, so I would like to avoid loading the dependencies for the other method each time Pattern is instantiated.

Two possible solutions come to mind:

Solution 1. Import the libraries within the render methods.

for example:

class Pattern:
  # initialization etc...
  def renderPNG:
    import PIL
    # use PIL to render png

Is this discouraged? I have always imported modules at the top of the file, and never within it. It seems like this would work, but I am wondering if there are good reasons to avoid doing this, since I rarely see it.

Solution 2. Make subclasses for the different formats.

for example, remove both render methods from the base class, and in a separate file:

from theModuleDescribedAbove import Pattern
import PIL

class PNGPattern(Pattern):
  def __init__(self, input_pattern):
    Pattern.__init__(self, input_pattern)
  def render(self):
    # use PIL to render png

This will work as well, but what is the cost of subclassing like this? The code to process the input within the Pattern class is much longer than either of the rendering methods, but not as large as the PIL module. It must be imported before it can be used, is that going to slow things down significantly more than if it were used from within the same file as in Solution 1?

Which of these methods is likely to be the most efficient? Is there a still better method that I didn't think of? Any suggestions or comments would be greatly appreciated.

Solution

If this is an API on a server, presumably it's part of a web framework and served via WSGI. In which case, you are wrong to think that dependencies are loaded for each request. They are not: a process is instantiated to serve requests, and that will persist for many many requests. Within that process, once modules are imported they stay in memory: subsequent import statements do not cause the code to be loaded again.

So, in summary, you are worrying unnecessarily and this is not a source of inefficiency.

OTHER TIPS

It should be fine to use the solution that imports the modules inline. There is not really any performance penalty associated with this. When an import statement is encountered, python executes all of the top-level code in the module that is referenced. When finished with this, it caches the module in a standard library variable sys.modules. Subsequently, the module is retrieved from sys.modules instead of being re-parsed/re-executed. People tend to favor imports at the top of files mostly for clarity. It makes it easier for people reading the code to see exactly what modules are being used in a given file. As a side note, using local imports can be faster since local variables in python are retrieved more quickly than global variables.

As Daniel mentioned, WSGI apps tend to be served by a number of processes which have been preforked by an application server like Gunicorn or Uwsgi. As long as the processes keep running, they will accumulate any modules which have been imported in sys.modules. These modules won't be cached in sys.modules until they've been imported by the process's bootup code or by view code which is executed when handling a request.

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow