Scrapy - Load a yaml file with a relative path inside the spider

https://stackoverflow.com/questions/23196562

06-07-2023

Problem

I'm trying to deploy my Scrapy crawlers, but I have a YAML file that I load from inside the spider. This works when the spider is run from the shell with scrapy crawl <spider-name>. When the spider is deployed inside scrapyd, however, the path to the YAML file must be absolute.

Is there a way to use a relative path for the yaml file, even when spiders are deployed with scrapyd?

P.S.: The spider is deployed on scrapyd and scheduled with:

scrapyd-deploy default -p <project-name>
curl http://127.0.0.1:6800/schedule.json -d project=<project-name> -d spider=<spider-name>

The YAML file is loaded with:

with open('../categories/categories.yaml', 'r') as f:
    pass

Solution 2

I found the answer in another question: "scrapyd and file (pkgutil.get_data)".

Briefly, you have to register the paths to these static files in setup.py, so that they are packaged together with the project when it is deployed.
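
As a rough sketch of what that can look like: the project name "myproject" and the location of the YAML file inside that package are assumptions made for illustration, not details from the question. The idea is to declare the data files through the setuptools package_data option and then read them with pkgutil.get_data instead of open(), since the deployed project may not exist as plain files on disk. Note that pkgutil.get_data only accepts resource paths inside the package (no ".." components), so the file would need to live under the package directory.

```python
import pkgutil

# Assumed setup.py entry (hypothetical package name "myproject"), so that
# scrapyd-deploy packs the YAML files together with the code:
#
#     setup(
#         name='project',
#         version='1.0',
#         packages=find_packages(),
#         package_data={'myproject': ['categories/*.yaml']},
#         entry_points={'scrapy': ['settings = myproject.settings']},
#     )


def read_package_text(package, resource, encoding='utf-8'):
    """Return the decoded contents of a data file shipped inside `package`.

    `resource` is a '/'-separated path relative to the package directory.
    """
    data = pkgutil.get_data(package, resource)
    if data is None:
        # get_data returns None when the package cannot be located or its
        # loader does not support reading data.
        raise FileNotFoundError(
            'could not load %r from package %r' % (resource, package)
        )
    return data.decode(encoding)
```

Inside the spider, the text returned by read_package_text('myproject', 'categories/categories.yaml') could then be handed to whichever YAML parser the project already uses.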

Other tips

Relative paths are resolved against the current working directory (the directory from which the process was started), not against the location of the script. If you want to load a file from a path relative to the current script location, you can try the following:

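import os

# Resolve the path relative to this source file, not the working directory.
root_dir = os.path.abspath(os.path.dirname(__file__))
yaml_path = os.path.join(root_dir, '../categories/categories.yaml')
with open(yaml_path, 'r') as f:
    pass  # parse the YAML content here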

License: CC-BY-SA with attribution
