Thread blocking with Scrapy on Windows Server

https://stackoverflow.com/questions/22158626

19-10-2022

Question

I get an error when running the following command on Windows Server:

scrapy shell "http://www.yahoo.com"

I don't have this problem with websites that do not redirect to HTTPS. I think the problem is with thread blocking. Could someone help me, please?

This is the error message:

C:\Documents and Settings\mahyar>scrapy shell "http://www.yahoo.com"
2014-03-03 15:49:38-0600 [scrapy] INFO: Scrapy 0.22.2 started (bot: scrapybot)
2014-03-03 15:49:38-0600 [scrapy] INFO: Optional features available: ssl, http11
2014-03-03 15:49:38-0600 [scrapy] INFO: Overridden settings: {'LOGSTATS_INTERVAL
': 0}
2014-03-03 15:49:38-0600 [scrapy] INFO: Enabled extensions: TelnetConsole, Close
Spider, WebService, CoreStats, SpiderState
2014-03-03 15:49:38-0600 [scrapy] INFO: Enabled downloader middlewares: HttpAuth
Middleware, DownloadTimeoutMiddleware, UserAgentMiddleware, RetryMiddleware, Def
aultHeadersMiddleware, MetaRefreshMiddleware, HttpCompressionMiddleware, Redirec
tMiddleware, CookiesMiddleware, ChunkedTransferMiddleware, DownloaderStats
2014-03-03 15:49:38-0600 [scrapy] INFO: Enabled spider middlewares: HttpErrorMid
dleware, OffsiteMiddleware, RefererMiddleware, UrlLengthMiddleware, DepthMiddlew
are
2014-03-03 15:49:38-0600 [scrapy] INFO: Enabled item pipelines:
2014-03-03 15:49:38-0600 [scrapy] DEBUG: Telnet console listening on 0.0.0.0:602
3
2014-03-03 15:49:38-0600 [scrapy] DEBUG: Web service listening on 0.0.0.0:6080
2014-03-03 15:49:38-0600 [default] INFO: Spider opened
2014-03-03 15:49:38-0600 [default] DEBUG: Redirecting (301) to <GET https://www.
yahoo.com/> from <GET http://www.yahoo.com>
Traceback (most recent call last):
  File "c:\Python27\lib\runpy.py", line 162, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "c:\Python27\lib\runpy.py", line 72, in _run_code
    exec code in run_globals
  File "c:\Python27\lib\site-packages\scrapy\cmdline.py", line 168, in <module>
    execute()
  File "c:\Python27\lib\site-packages\scrapy\cmdline.py", line 143, in execute
    _run_print_help(parser, _run_command, cmd, args, opts)
  File "c:\Python27\lib\site-packages\scrapy\cmdline.py", line 89, in _run_print
_help
    func(*a, **kw)
  File "c:\Python27\lib\site-packages\scrapy\cmdline.py", line 150, in _run_comm
and
    cmd.run(args, opts)
  File "c:\Python27\lib\site-packages\scrapy\commands\shell.py", line 50, in run

    shell.start(url=url, spider=spider)
  File "c:\Python27\lib\site-packages\scrapy\shell.py", line 45, in start
    self.fetch(url, spider)
  File "c:\Python27\lib\site-packages\scrapy\shell.py", line 90, in fetch
    reactor, self._schedule, request, spider)
  File "c:\Python27\lib\site-packages\twisted\internet\threads.py", line 122, in
 blockingCallFromThread
    result.raiseException()
  File "<string>", line 2, in raiseException
OverflowError: integer 2147486719 does not fit '32-bit int'

Solution

It looks like you are running a 32-bit version of Windows, and Scrapy requires a 64-bit operating system.
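
To check the premise of this answer on your own machine, here is a minimal sketch that reports whether the running Python interpreter is a 32-bit or 64-bit build. The function name interpreter_bits is made up for illustration. Note that it describes the interpreter, not the operating system: a 32-bit Python can be installed on 64-bit Windows.

```python
import platform
import struct
import sys


def interpreter_bits():
    """Return the pointer width of the running Python interpreter (32 or 64)."""
    # struct.calcsize("P") is the size in bytes of a C pointer for this build.
    return struct.calcsize("P") * 8


if __name__ == "__main__":
    print("Python build: %d-bit" % interpreter_bits())
    # Machine type as reported by the platform module.
    print("Machine type: %s" % platform.machine())
    # True on 64-bit builds, False on 32-bit builds.
    print("sys.maxsize > 2**32: %s" % (sys.maxsize > 2**32))
```

Run it with the same interpreter that runs Scrapy (c:\Python27 in the traceback above).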

Other tips

This is caused by a pyOpenSSL bug: https://github.com/pyca/cryptography/issues/773
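
As an illustration of the error message itself, here is a small sketch of the arithmetic behind "does not fit '32-bit int'". It only uses the number from the traceback and the standard struct module; the helper name fits is made up for this example, and nothing here reproduces the pyOpenSSL code path.

```python
import struct

VALUE = 2147486719          # the integer from the traceback
INT32_MAX = 2**31 - 1       # largest signed 32-bit integer
UINT32_MAX = 2**32 - 1      # largest unsigned 32-bit integer


def fits(fmt, n):
    """Return True if n can be packed with the given struct format."""
    try:
        struct.pack(fmt, n)
        return True
    except struct.error:
        return False


if __name__ == "__main__":
    print(hex(VALUE))                 # bit 31 is set
    print(VALUE > INT32_MAX)          # too large for a signed 32-bit int
    print(fits("<i", VALUE))          # signed 32-bit
    print(fits("<I", VALUE))          # unsigned 32-bit
```

The value is 2**31 + 3071, so it fits in an unsigned 32-bit integer but not in a signed one. That matches an error raised when a value like this is converted to a signed 32-bit C integer.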

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow