Cannot crawl this URL with Scrapy: "http://list.tmall.com/search_shopitem.htm?user_id=753523703&n=60&s=0"

https://stackoverflow.com/questions/23562054

18-07-2023

Question

I am using Scrapy to extract some elements from the HTML of this page: "http://list.tmall.com/search_shopitem.htm?user_id=753523703&n=60&s=0"

However, the request is redirected several times, the final request is filtered as a duplicate, and the spider closes without crawling the page. I would like to know why this happens and how to solve it.

The log is as follows:

2014-05-09 18:08:46+0800 [crawlitemfromshop] DEBUG: Crawled (200) <GET http://s.taobao.com/search?q=%E6%AF%94%E6%9C%88%E6%97%97%E8%88%B0%E5%BA%97&app=shopsearch> (referer: http://www.taobao.com)
2014-05-09 18:08:46+0800 [crawlitemfromshop] DEBUG: Redirecting (302) to <GET http://jump.taobao.com/jump?target=http%3A%2F%2Flist.tmall.com%2Fsearch_shopitem.htm%3Ftbpm%3D1%26user_id%3D753523703%26n%3D60%26s%3D0> from <GET http://list.tmall.com/search_shopitem.htm?user_id=753523703&n=60&s=0>
2014-05-09 18:08:58+0800 [crawlitemfromshop] DEBUG: Redirecting (302) to <GET http://pass.tmall.com/add?_tb_token_=C78bVoU0RJwQ&cookie2=94d965b75c3fba2ce3164b5f0477a021&t=c72fdb50b3ed76e6f300eedc81959d7e&target=http%3A%2F%2Flist.tmall.com%2Fsearch_shopitem.htm%3Ftbpm%3D1%26user_id%3D753523703%26n%3D60%26s%3D0&pacc=u-98EKfnz4MXPDJOhj0Zfg==&opi=222.128.8.99&tmsc=1399630137646263> from <GET http://jump.taobao.com/jump?target=http%3A%2F%2Flist.tmall.com%2Fsearch_shopitem.htm%3Ftbpm%3D1%26user_id%3D753523703%26n%3D60%26s%3D0>
2014-05-09 18:09:11+0800 [crawlitemfromshop] DEBUG: Redirecting (302) to <GET http://list.tmall.com/search_shopitem.htm?tbpm=1&user_id=753523703&n=60&s=0> from <GET http://pass.tmall.com/add?_tb_token_=C78bVoU0RJwQ&cookie2=94d965b75c3fba2ce3164b5f0477a021&t=c72fdb50b3ed76e6f300eedc81959d7e&target=http%3A%2F%2Flist.tmall.com%2Fsearch_shopitem.htm%3Ftbpm%3D1%26user_id%3D753523703%26n%3D60%26s%3D0&pacc=u-98EKfnz4MXPDJOhj0Zfg==&opi=222.128.8.99&tmsc=1399630137646263>
2014-05-09 18:09:21+0800 [crawlitemfromshop] DEBUG: Redirecting (302) to <GET http://list.tmall.com/search_shopitem.htm?user_id=753523703&n=60&s=0> from <GET http://list.tmall.com/search_shopitem.htm?tbpm=1&user_id=753523703&n=60&s=0>
2014-05-09 18:09:21+0800 [crawlitemfromshop] DEBUG: Filtered duplicate request: <GET http://list.tmall.com/search_shopitem.htm?user_id=753523703&n=60&s=0> - no more duplicates will be shown (see DUPEFILTER_CLASS)
2014-05-09 18:09:21+0800 [crawlitemfromshop] INFO: Closing spider (finished)

Solution

Can you disable the duplicate filter (the DUPEFILTER_CLASS setting) in settings.py and test again? The scrapy shell command works fine on my end; I can get all the item and pricing information with:

scrapy shell 'http://list.tmall.com/search_shopitem.htm?user_id=753523703&n=60&s=0'
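
As a rough sketch of what disabling the filter could look like, the settings.py fragment below swaps in Scrapy's no-op BaseDupeFilter, so the last redirect back to the original URL is no longer dropped. The module path shown is an assumption about your Scrapy version: it is the path used by Scrapy 1.0 and later, while older releases exposed the class as scrapy.dupefilter.BaseDupeFilter.

```python
# settings.py (fragment)
# Assumption: Scrapy 1.0 or later. On older releases the class lived at
# "scrapy.dupefilter.BaseDupeFilter" instead.
# BaseDupeFilter does not filter anything, so a redirect that points back
# to an already requested URL is scheduled again instead of being dropped.
DUPEFILTER_CLASS = "scrapy.dupefilters.BaseDupeFilter"
```

Note that this turns off duplicate filtering for the whole crawl. If that is too broad, a narrower option that may work is to pass dont_filter=True to the Request for this one URL in the spider.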

Licensed under: CC-BY-SA with attribution
