Question
I'm trying to deploy my Scrapy crawlers, but I have a YAML file that I load from inside the spider. This works when the spider is run from the shell with: scrapy crawl <spider-name>
But when the spider is deployed to scrapyd, the path to the YAML file must be absolute.
Is there a way to use a relative path for the YAML file, even when spiders are deployed with scrapyd?
P.S. The spider is deployed to scrapyd with:
scrapyd-deploy default -p <project-name>
curl http://127.0.0.1:6800/schedule.json -d project=<project-name> -d spider=<spider-name>
And the YAML file is loaded with:
with open('../categories/categories.yaml', 'r') as f:
    pass
Solution 2
I found the answer here: scrapyd and file (pkgutil.get_data)
Briefly, you have to register the paths to these static files in setup.py.
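As a rough sketch of what that could look like on the loading side: the helper below reads a packaged data file with pkgutil.get_data, which goes through the package's loader rather than the current working directory. The helper name load_packaged_text and the package name myproject are hypothetical, and the sketch assumes setup.py already ships the file, for example through the setuptools package_data argument.

```python
import pkgutil


def load_packaged_text(package, resource, encoding="utf-8"):
    """Return the text of a data file shipped inside `package`.

    `resource` is a '/'-separated path relative to the package directory.
    pkgutil.get_data asks the package's loader for the bytes, so the result
    does not depend on the directory the process was started from.
    """
    data = pkgutil.get_data(package, resource)
    if data is None:
        # get_data returns None when the package cannot be located or its
        # loader does not support reading data files.
        raise FileNotFoundError(
            f"{resource!r} not found in package {package!r}"
        )
    return data.decode(encoding)


# Hypothetical usage inside a spider, assuming the project package is named
# "myproject" and setup.py contains something like:
#     package_data={"myproject": ["categories/*.yaml"]}
#
# yaml_text = load_packaged_text("myproject", "categories/categories.yaml")
```

The returned string can then be passed to whichever YAML parser the project already uses.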
Other tips
Relative paths are resolved against the current working directory (the directory the process was started from), not against the location of the script. If you want to load a file from a path relative to the current script's location, you can try the following:
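import os

# Resolve the path against this file's directory, not the working directory.
root_dir = os.path.abspath(os.path.dirname(__file__))
yaml_path = os.path.join(root_dir, '../categories/categories.yaml')
with open(yaml_path, 'r') as f:
    pass  # parse the YAML here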