스크레이프베이스 스파이더 : 어떻게 작동합니까?
-
05-07-2019 - |
문제
이것은 스크레이프 튜토리얼의 기본 사례입니다.
from scrapy.spider import BaseSpider
from scrapy.selector import HtmlXPathSelector
from dmoz.items import DmozItem
class DmozSpider(BaseSpider):
domain_name = "dmoz.org"
start_urls = [
"http://www.dmoz.org/Computers/Programming/Languages/Python/Books/",
"http://www.dmoz.org/Computers/Programming/Languages/Python/Resources/"
]
def parse(self, response):
hxs = HtmlXPathSelector(response)
sites = hxs.select('//ul[2]/li')
items = []
for site in sites:
item = DmozItem()
item['title'] = site.select('a/text()').extract()
item['link'] = site.select('a/@href').extract()
item['desc'] = site.select('text()').extract()
items.append(item)
return items
SPIDER = DmozSpider()
프로젝트의 변경 사항으로 복사했습니다.
from scrapy.contrib.spiders import CrawlSpider, Rule
from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor
from scrapy.selector import HtmlXPathSelector
from scrapy.item import Item
from firm.items import FirmItem
class Spider1(CrawlSpider):
domain_name = 'wc2'
start_urls = ['http://www.whitecase.com/Attorneys/List.aspx?LastName=A']
def parse(self, response):
hxs = HtmlXPathSelector(response)
sites = hxs.select('//td[@class="altRow"][1]/a/@href').re('/.a\w+')
items = []
for site in sites:
item = FirmItem
item['school'] = hxs.select('//td[@class="mainColumnTDa"]').re('(JD)(.*?)(\d+)')
items.append(item)
return items
SPIDER = Spider1()
그리고 나는 오류를 얻습니다
[wc2] ERROR: Spider exception caught while processing
<http://www.whitecase.com/Attorneys/List.aspx?LastName=A> (referer: <None>):
[Failure instance: Traceback: <type 'exceptions.TypeError'>:
'ItemMeta' object does not support item assignment
전문가들이 코드를 살펴보고 내가 어디에서 잘못 될지에 대한 단서를 주면 크게 감사드립니다.
고맙습니다
해결책
아마도 당신은 의미합니다 item = FirmItem()
대신에 item = FirmItem
?
제휴하지 않습니다 StackOverflow