Question

I'm using the scrape.py library to scrape a website (the library and its documentation can be found at http://zesty.ca/scrape/).

There is a button on the page that I want the session to press, but I don't understand exactly how to use the submit function. As I understand it, I am supposed to pass it a Region object for a form. The button itself is an HTML input element. I tried passing both the form and the input, and I get the same error every time.

My code (on Google App Engine):

s.go(url)
form = s.doc.first(name="form1")
s.submit(region=form)

or

s.go(url)
input = s.doc.first(tagname="input", id="blabla")
s.submit(region=input)

and the error:

ERROR    2011-05-01 23:37:18,673 __init__.py:427] sequence item 0: expected string, NoneType found
Traceback (most recent call last):
  File "\appengine\ext\webapp\__init__.py", line 636, in __call__
    handler.post(*groups)
  File "main.py", line 135, in post
    s.submit(region=form)
  File "scrape.py", line 342, in submit
    return self.go(url, p, redirects)
  File "scrape.py", line 288, in go
    self.cookiejar)
  File "scrape.py", line 176, in fetch
    data = urlencode(data)
  File "scrape.py", line 409, in urlencode
    for key, value in params.items()]
  File "scrape.py", line 405, in urlquote
    return ''.join(map(urlquoted.get, text))
TypeError: sequence item 0: expected string, NoneType found

Solution 2

My assumption is that the button and the form are driven by JavaScript, which scrape.py probably can't handle. If so, you need a library that supports JavaScript, such as Selenium or Windmill.

OTHER TIPS

Yes, I know this question is a year old, but since I am currently using scrape.py and know the answer, I thought I should add it for those who come after.

The problem is in the submit call.

Instead of s.submit(region=form) it should be s.submit(form).

The reason is that the variable form already contains the region itself (it prints as something like <Region 1254:1250>), so you don't need to tell scrape.py where it is with a keyword; submit expects to be handed the region directly.

So it probably has nothing to do with JavaScript.
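
For anyone trying to read the traceback in the question: its last frame is ''.join(map(urlquoted.get, text)), and the message says item 0 of that sequence was None. That is what you get when a lookup with .get misses, so the first character being quoted was apparently not a key in the table. Below is a minimal sketch of that failure mode, not scrape.py's actual code: the table contents, the SAFE set and the try_quote helper are assumptions made up for illustration, and Python 3 words the TypeError slightly differently from the Python 2 message above.

```python
# Hypothetical stand-in for the lookup table seen in the traceback's last frame.
# The real table in scrape.py may be built differently; this one covers only
# the first 256 code points.
SAFE = set("abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789_.-")
urlquoted = {
    chr(i): (chr(i) if chr(i) in SAFE else "%%%02X" % i)
    for i in range(256)
}


def urlquote(text):
    # Same shape as the failing line: each character is looked up with .get,
    # so a character missing from the table turns into None, and join()
    # refuses to concatenate None.
    return "".join(map(urlquoted.get, text))


def try_quote(text):
    # Illustrative helper: report the error instead of crashing.
    try:
        return urlquote(text)
    except TypeError as exc:
        return "TypeError: %s" % exc


print(try_quote("a b"))      # every character is in the table
print(try_quote("\u05e9"))   # a character outside the table triggers the error
```

If that is what is happening, printing the form values that are about to be submitted may show which one contains the unexpected character.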

Licensed under: CC-BY-SA with attribution