Question

I'm trying to write a simple web crawler using the Requests module, and I would like to know how to disable its default keep-alive feature.

I tried using:

s = requests.session()
s.config['keep_alive'] = False

However, I get an error stating that the session object has no attribute 'config'. I think this was changed in a newer version, but I cannot seem to find how to do it in the official documentation.

The truth is, when I run the crawler on a specific website it fetches five pages at most and then keeps looping around indefinitely, so I thought it might have something to do with the keep-alive feature!

PS: Is Requests a good module for a web crawler, or is there something better suited?

Thank you !


Solution

This works:

s = requests.session()
s.keep_alive = False

Answered in the comments of a similar question.
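Since keep_alive is not a documented Session attribute in every version of Requests, a belt-and-braces sketch is to also set the Connection header on the session itself, so it is merged into every outgoing request (prepare_request is used here only to confirm the header locally, without any network access):

```python
import requests

# Sketch, not the library's documented keep-alive API: setting the
# Connection header on the session applies it to every request it sends.
s = requests.Session()
s.keep_alive = False               # attribute from the answer above
s.headers["Connection"] = "close"  # explicit HTTP/1.1 close signal

# Prepare a request locally (no network) to confirm the header is attached.
prepared = s.prepare_request(requests.Request("GET", "https://example.com"))
print(prepared.headers["Connection"])  # -> close
```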

OTHER TIPS

I am not sure, but can you try passing {"Connection": "close"} as an HTTP header when sending a GET request with Requests? This will close the connection as soon as the server returns a response.

>>> headers = {"Connection": "close"}
>>> r = requests.get('https://example.com', headers=headers)

As @praveen suggested, we are expected to use the HTTP/1.1 header Connection: close to notify the server that the connection should be closed after completion of the response.

Here is how it's described in RFC 2616:

HTTP/1.1 defines the "close" connection option for the sender to signal that the connection will be closed after completion of the response. For example,

Connection: close

in either the request or the response header fields indicates that the connection SHOULD NOT be considered `persistent' (section 8.1) after the current request/response is complete.

HTTP/1.1 applications that do not support persistent connections MUST include the "close" connection option in every message.
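On the looping problem from the question: keep-alive is unlikely to be the cause; a crawler that revisits already-seen URLs will loop regardless of connection handling. A minimal sketch of a crawl loop that tracks visited URLs (the fetch and extract_links callables are placeholders; with Requests, fetch could wrap requests.get with the Connection: close header above):

```python
from urllib.parse import urljoin

# Hypothetical sketch: a breadth-first crawl loop that tracks visited
# URLs so it cannot revisit pages indefinitely.  `fetch` and
# `extract_links` are placeholder callables; with Requests, fetch could be
#   lambda u: requests.get(u, headers={"Connection": "close"}, timeout=10).text
def crawl(seed, fetch, extract_links, max_pages=100):
    visited, queue, pages = set(), [seed], []
    while queue and len(pages) < max_pages:
        url = queue.pop(0)
        if url in visited:      # skip already-crawled URLs
            continue
        visited.add(url)
        pages.append(fetch(url))
        for link in extract_links(pages[-1]):
            queue.append(urljoin(url, link))  # resolve relative links
    return pages

# Tiny in-memory "site" to exercise the loop without a network.
site = {"http://x/a": "A", "http://x/b": "B"}
links = {"A": ["b", "a"], "B": ["a"]}
print(crawl("http://x/a", site.__getitem__, links.__getitem__))  # -> ['A', 'B']
```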

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow