Scrapy - Load a yaml file with a relative path inside the spider

https://stackoverflow.com/questions/23196562

06-07-2023

Question

I'm trying to deploy my Scrapy crawlers, but I have a YAML file that I load from inside the spider. This works when the spider is run from the shell with scrapy crawl <spider-name>, but when the spider is deployed to scrapyd, the path to the YAML file must be absolute.

Is there a way to use a relative path for the yaml file, even when spiders are deployed with scrapyd?

P.S:
The spider is deployed on scrapyd with:

scrapyd-deploy default -p <project-name>
curl http://127.0.0.1:6800/schedule.json -d project=<project-name> -d spider=<spider-name>

And the yaml file is loaded with:

with open('../categories/categories.yaml', 'r') as f:
    pass

Solution

I have found the answer here: scrapyd and file (pkgutil.get_data)

Briefly, you have to register the paths to these static files in setup.py so that they are packaged with the project, then read them with pkgutil.get_data instead of open.
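As a rough sketch of that approach (not the exact code from the linked answer), the example below builds a throwaway package containing a YAML file and reads it with pkgutil.get_data. The package name demo_project, the file layout, and the helper load_categories_text are hypothetical; the setup.py fragment in the comment assumes setuptools and its package_data option.

```python
import os
import pkgutil
import sys
import tempfile

# Assumed setup.py fragment (setuptools), shown as a comment only:
#   setup(
#       name="demo_project",
#       packages=find_packages(),
#       package_data={"demo_project": ["categories/*.yaml"]},
#   )

# Build a throwaway package on disk so this example is self-contained.
# "demo_project" is a hypothetical stand-in for the real Scrapy project package.
tmp_root = tempfile.mkdtemp()
pkg_dir = os.path.join(tmp_root, "demo_project")
os.makedirs(os.path.join(pkg_dir, "categories"))
with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
    f.write("")
with open(os.path.join(pkg_dir, "categories", "categories.yaml"), "w") as f:
    f.write("books:\n  - fiction\n")

# Make the package importable, as it would be once deployed.
sys.path.insert(0, tmp_root)


def load_categories_text():
    """Return the YAML file's contents as text (hypothetical helper).

    pkgutil.get_data locates the resource through the package's loader,
    so the result does not depend on the current working directory.
    """
    data = pkgutil.get_data("demo_project", "categories/categories.yaml")
    return data.decode("utf-8")


text = load_categories_text()
print(text)
```

The returned text can then be handed to a YAML parser (for instance yaml.safe_load, if PyYAML is installed) in place of the file object that open would have provided.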

Other tips

Relative paths are resolved against the current working directory (the directory the process was started from), not against the location of the script. If you want to load a file from a path relative to the current script's location, you can try the following:

import os

# Directory containing the current script, independent of the working directory
root_dir = os.path.abspath(os.path.dirname(__file__))
yaml_path = os.path.join(root_dir, '../categories/categories.yaml')
with open(yaml_path, 'r') as f:
    pass
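To illustrate why anchoring on the script's location helps, here is a small pathlib-based sketch. The helper resolve_next_to and the temporary spiders/ and categories/ layout are made up for the demo; in a real spider you would pass __file__ as the anchor.

```python
import os
import tempfile
from pathlib import Path


def resolve_next_to(anchor_file, relative_path):
    """Resolve relative_path against the directory containing anchor_file."""
    return (Path(anchor_file).resolve().parent / relative_path).resolve()


# Hypothetical project layout created in a temporary directory.
root = Path(tempfile.mkdtemp()).resolve()
(root / "spiders").mkdir()
(root / "categories").mkdir()
spider_file = root / "spiders" / "my_spider.py"
spider_file.write_text("")
yaml_file = root / "categories" / "categories.yaml"
yaml_file.write_text("books: []\n")

# Change the working directory to show that the lookup does not depend on it.
original_cwd = os.getcwd()
os.chdir(tempfile.gettempdir())
try:
    found = resolve_next_to(spider_file, "../categories/categories.yaml")
finally:
    os.chdir(original_cwd)

print(found)
```

One caveat, stated as an assumption rather than a guarantee: if the deployed project is packaged as an egg, __file__ may point inside the archive, in which case a plain open cannot read the file and the pkgutil.get_data approach is likely the safer option.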

Licensed under: CC-BY-SA with attribution
