Referencing custom Python modules and data files
https://softwareengineering.stackexchange.com/questions/375324
07-02-2021
Question
I want to deploy my Python code and relevant static files such that only a copy of a folder is needed. That is, all the paths inside are relative. The release is to a web server, which calls the scripts in a subprocess.
Naturally, everything works in my environment in PyCharm. Problems arise when Python is invoked in a subprocess and the scripts execute from an arbitrary working directory. This can be avoided in a few ways:
- Use absolute paths when referencing files
- Use relative imports with submodules
- Append paths to sys.path to use relative paths
- Change the working directory of the host
- Change the project file structure
I have issues with these approaches.
- Absolute paths won't work when the code is moved
- Relative imports only work with submodules, not up the file tree
- Appending to sys.path at the beginning of every script is ugly and cumbersome (sketched below)
- Having all source files in the root of the project is messy
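For reference, the sys.path boilerplate I mean looks something like this at the top of every script:

#Sources/Needs_data.py (and nearly every other script):
import os
import sys

# compute the project root from this file's location, then make it importable
_ROOT = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
sys.path.append(_ROOT)

# data files are then found relative to the root, not the working directory
DATA_CSV = os.path.join(_ROOT, 'Data', 'Data.csv')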
These might not be the strongest reasons to complain, but in my experience other people have more experience with almost everything, and might know a trick I've yet to encounter.
How would you approach this? (gasp, an opinion!) Many thanks! Any help is greatly appreciated.
Edit: Example file structures
Root\
    Data\
        Data.csv
        Text.txt
    Sources\
        Script.py
        Module.py
        Needs_data.py
        Data_and_module.py

Root\
    Data\
        Data.csv
        Text.txt
    Sources\
        Script.py
    Module.py
    Needs_data.py
    Data_and_module.py

Root\
    Data\
        Data.csv
        Text.txt
    Script.py
    Module.py
    Needs_data.py
    Data_and_module.py
These are some options I came up with to set up a file structure. Almost all scripts need to import a module and open files in Data. Ideally the structure would be the first kind, possibly with subfolders.
Solution
What's crucially missing from your file hierarchy is __init__.py. This is the distinctive difference between a collection of Python scripts/data and a Python package.
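As a minimal illustration, assuming the first structure from the question: dropping an __init__.py (even an empty one) into Sources is what turns it from a loose folder of scripts into an importable package:

#Sources/__init__.py is present and may be empty
#then, with Root on sys.path:
from Sources import Module   # works: Sources is now a package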
I'm also wondering why you are moving your code. Coding is not a furniture-moving business (moving stuff as-is to another place); it's an architecture business (you create the plan, then you create the stuff at that other place).
In the Python world, this is done by creating Python packages, and you deploy a Python package by (both steps are sketched below):
- building a distributable bundle (an egg, a wheel, whatever)
- installing that bundle
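A minimal sketch of those two steps, assuming setuptools plus the wheel package are installed and using the data_treatment example described below (the exact wheel filename is illustrative):

# from the Root directory, build a wheel into ./dist/
python setup.py bdist_wheel

# then install the bundle on the target machine, virtualenv, etc.
pip install dist/data_treatment-0.1-py3-none-any.whl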
Nowadays the modern solution relies on setuptools and wheel distribution packages, but there have been different solutions with different tools over the history of Python.
You will need to understand what a setup.py is and why you need one, what an __init__.py is, and how to create an actual Python package.
In the end, you should get something like:
Root
    setup.py
    data_treatment
        __init__.py
        data
            Data.csv
            Text.txt
        scripts
            __init__.py
            Script.py
        Module.py
        Needs_data.py
        Data_and_module.py
In your setup.py, you will use the package_data directive (or similar) to include your data files in the distributable package, then access that data through pkg_resources.
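For instance, a module inside the package could locate Data.csv like this (a sketch, assuming the layout above; pkg_resources ships with setuptools):

#data_treatment/Needs_data.py:
import pkg_resources

# resolve the data file relative to the installed package,
# not the current working directory
csv_path = pkg_resources.resource_filename('data_treatment', 'data/Data.csv')

with open(csv_path) as f:
    rows = f.read().splitlines()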
Your scripts can be made accessible as console_scripts entry points, defined in your setup.py.
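Putting both together, a minimal setup.py might look like this (the version, the file patterns, the command name, and the assumption that Script.py defines a main() function are all illustrative):

#setup.py:
from setuptools import setup, find_packages

setup(
    name='data_treatment',
    version='0.1',
    packages=find_packages(),
    # ship the data files that live inside the package
    package_data={'data_treatment': ['data/*.csv', 'data/*.txt']},
    # expose Script.py's main() as a command on PATH after installation
    entry_points={
        'console_scripts': [
            'data-treatment = data_treatment.scripts.Script:main',
        ],
    },
)

After a pip install, the web server can call the data-treatment command from any working directory.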
You'll deploy your code somewhere else by installing it there, be it in a virtualenv or something else, using pip or not; you choose.
But once the structure is like this, relative imports work fine, and it doesn't matter where or how the package was installed: you can import data_treatment and reach all the code and data underneath without having to fiddle with any paths.
OTHER TIPS
Relative imports do work up the tree, as long as you stay within the same package:
#my_package/subpackage1/c.py:
print('hi')
#my_package/subpackage2/d.py:
from ..subpackage1 import c
Relative imports are really the way to go here.
If you add a module in the same directory as my_package, you can then do:
#run_my_package.py:
from my_package import main
main() #just an example of course
And then, no matter the working directory, you can run that script.
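For completeness, here is a sketch of the package side of that last example, assuming main is defined in my_package/__init__.py:

#my_package/__init__.py:
def main():
    # a relative import is fine here, since we are inside the package
    from .subpackage2 import d   # importing d runs d.py, which prints 'hi'
    print('main ran')

This works from any working directory because Python puts the directory containing run_my_package.py (and therefore my_package) at the front of sys.path.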