Referencing custom Python modules and data files
https://softwareengineering.stackexchange.com/questions/375324
07-02-2021
Question
I want to deploy my Python code and relevant static files such that only a copy of a folder is needed. That is, all the paths inside are relative. The release is to a web server, which calls the scripts in a subprocess.
Naturally, everything works in my environment in PyCharm. Problems arise when Python is invoked in a subprocess and the scripts execute from an arbitrary working directory. This can be avoided in a few ways:
- Use absolute paths when referencing files
- Use relative imports with submodules
- Append paths to sys.path to use relative paths
- Change the working directory of the host
- Change the project file structure
I have issues with these approaches.
- Absolute paths won't work when the code is moved
- Relative imports only work with submodules, not up the file tree
- Appending to sys.path at the beginning of every script is ugly and cumbersome (sketched below)
- Having all source files in the root of the project is messy
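For reference, the sys.path boilerplate I mean looks something like this at the top of every script:

#Sources/Needs_data.py (and nearly every other script):
import os
import sys

# compute the project root from this file's location, then make it importable
_ROOT = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
sys.path.append(_ROOT)

# data files are then found relative to the root, not the working directory
DATA_CSV = os.path.join(_ROOT, 'Data', 'Data.csv')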
These might not be the strongest reasons to complain, but in my experience other people have more experience with almost everything, and might know a trick I've yet to encounter.
How would you approach this? (gasp, an opinion!) Many thanks! Any help is greatly appreciated.
Edit: Example file structures
Root\
    Data\
        Data.csv
        Text.txt
    Sources\
        Script.py
        Module.py
        Needs_data.py
        Data_and_module.py

Root\
    Data\
        Data.csv
        Text.txt
    Sources\
        Script.py
    Module.py
    Needs_data.py
    Data_and_module.py

Root\
    Data\
        Data.csv
        Text.txt
    Script.py
    Module.py
    Needs_data.py
    Data_and_module.py
These are some options I came up with to set up a file structure. Almost all scripts need to import a module and open files in Data. Ideally the structure would be the first kind, possibly with subfolders.
Solution
What's crucially missing from your file hierarchy is __init__.py. This is the distinctive difference between a collection of Python scripts/data and a Python package.
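As a minimal illustration, assuming the first structure from the question: dropping an __init__.py (even an empty one) into Sources is what turns it from a loose folder of scripts into an importable package:

#Sources/__init__.py is present and may be empty
#then, with Root on sys.path:
from Sources import Module   # works: Sources is now a package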
I'm also wondering why you are moving your code. Coding is not a furniture-moving business (moving stuff as-is to another place); it's an architecture business (you create the plan, then you create the stuff at that other place).
In the Python world, this is done by creating Python packages, and you deploy a Python package by (both steps are sketched below):
- building a distributable bundle (an egg, a wheel, whatever)
- installing that bundle
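A minimal sketch of those two steps, assuming setuptools plus the wheel package are installed and using the data_treatment example described below (the exact wheel filename is illustrative):

# from the Root directory, build a wheel into ./dist/
python setup.py bdist_wheel

# then install the bundle on the target machine, virtualenv, etc.
pip install dist/data_treatment-0.1-py3-none-any.whl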
Nowadays the modern solution relies on setuptools and wheel distribution packages, but there have been different solutions with different tools over the history of Python.
You will need to understand what a setup.py is and why you need one, what an __init__.py is, and how to create an actual Python package.
In the end, you should get something like:
Root
    setup.py
    data_treatment
        __init__.py
        data
            Data.csv
            Text.txt
        scripts
            __init__.py
            Script.py
        Module.py
        Needs_data.py
        Data_and_module.py
In your setup.py, you will use the package_data directive (or similar) to include your data files in the distributable package, then access that data through pkg_resources.
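For instance, a module inside the package could locate Data.csv like this (a sketch, assuming the layout above; pkg_resources ships with setuptools):

#data_treatment/Needs_data.py:
import pkg_resources

# resolve the data file relative to the installed package,
# not the current working directory
csv_path = pkg_resources.resource_filename('data_treatment', 'data/Data.csv')

with open(csv_path) as f:
    rows = f.read().splitlines()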
Your scripts can be made accessible as console_scripts entry points, defined in your setup.py.
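Putting both together, a minimal setup.py might look like this (the version, the file patterns, the command name, and the assumption that Script.py defines a main() function are all illustrative):

#setup.py:
from setuptools import setup, find_packages

setup(
    name='data_treatment',
    version='0.1',
    packages=find_packages(),
    # ship the data files that live inside the package
    package_data={'data_treatment': ['data/*.csv', 'data/*.txt']},
    # expose Script.py's main() as a command on PATH after installation
    entry_points={
        'console_scripts': [
            'data-treatment = data_treatment.scripts.Script:main',
        ],
    },
)

After a pip install, the web server can call the data-treatment command from any working directory.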
You'll deploy your code somewhere else by installing it there, be it in a virtualenv or something else, using pip or not; you choose.
But once the structure is like this, relative imports work fine, and it doesn't matter where or how the package was installed: you can import data_treatment and reach all the code and data underneath without having to fiddle with any paths.
OTHER TIPS
Relative imports do work up the tree, as long as you stay within the same package:
#my_package/subpackage1/c.py:
print('hi')
#my_package/subpackage2/d.py:
from ..subpackage1 import c
Relative imports are really the way to go here.
If you add a module in the same directory as my_package, you can then do:
#run_my_package.py:
from my_package import main
main() #just an example of course
And then, no matter the working directory, you can run that script.
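For completeness, here is a sketch of the package side of that last example, assuming main is defined in my_package/__init__.py:

#my_package/__init__.py:
def main():
    # a relative import is fine here, since we are inside the package
    from .subpackage2 import d   # importing d runs d.py, which prints 'hi'
    print('main ran')

This works from any working directory because Python puts the directory containing run_my_package.py (and therefore my_package) at the front of sys.path.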