Question

In Windows cURL I can post a web request similar to this:

curl  --dump-header cook.txt ^
  --data "RURL=http://www.example.com/r&user=bob&password=hello" ^
  --user-agent  "Mozilla/5.0"  ^
  http://www.example.com/login

Running type cook.txt then shows response headers similar to these:

HTTP/1.1 302 Found                                                 
Date: Thu, ******
Server: Microsoft-IIS/6.0                                          
SERVER: ******                                                  
X-Powered-By: ASP.NET                                              
X-AspNet-Version: 1.1.4322                                         
Location: ******
Set-Cookie: Cookie1=; domain=******; expires=****** ******
******
******
Cache-Control: private                                             
Content-Type: text/html; charset=iso-8859-1                        
Content-Length: 189

I can manually read cookie lines like: Set-Cookie: AuthCode=ABC... (I could script this of course). So I can use AuthCode for subsequent requests.
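As a rough sketch of how that scripting might look in base R: the function name parse_set_cookies and the sample header lines below are made up for illustration and are not taken from the real response.

```r
# Extract name/value pairs from the "Set-Cookie:" lines of a header dump
# such as the cook.txt written by curl --dump-header.
parse_set_cookies <- function(header_lines) {
  # Keep only Set-Cookie lines (header names are case-insensitive)
  hits <- grep("^set-cookie:", header_lines, ignore.case = TRUE, value = TRUE)
  # Drop the header name, then keep only the first "name=value" pair
  pairs <- sub("^[^:]*:[[:space:]]*", "", hits)
  pairs <- trimws(sub(";.*$", "", pairs))
  # Split at the first "=" only, because cookie values may contain "="
  nms  <- sub("=.*$", "", pairs)
  vals <- sub("^[^=]*=", "", pairs)
  setNames(vals, nms)
}

# Made-up sample lines; in practice use readLines("cook.txt")
sample_headers <- c(
  "HTTP/1.1 302 Found",
  "Set-Cookie: Cookie1=; domain=example.com; path=/",
  "Set-Cookie: AuthCode=ABC123; path=/",
  "Content-Length: 189"
)
cookies <- parse_set_cookies(sample_headers)
cookies[["AuthCode"]]
```

The result is a named character vector, so the AuthCode value can be looked up by name and passed on to later requests.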

I am trying to do the same in R with RCurl and/or httr (I still don't know which one is better suited to my task).

When I try:

library(httr)

POST("http://www.example.com/login",
     body= list(RURL="http://www.example.com/r",
                user="bob", password="hello"),
     user_agent("Mozilla/5.0"))  

I get a response similar to this:

Response [http://www.example.com/error]
  Status: 411
  Content-type: text/html
<h1>Length Required</h1> 

I know roughly what a 411 error means and could try to fix the request, but I do not get this error with cURL, so I must be doing something wrong in the POST call.

Can you help me translate my cURL command to RCurl and/or httr?


Solution 2

Based on Juba's suggestion, here is a working RCurl template.

The code emulates a browser behaviour, as it:

  1. retrieves cookies on a login screen and
  2. reuses them on the following page requests containing the actual data.


### RCurl login and browse private pages ###

library("RCurl")

loginurl ="http://www.*****"
mainurl  ="http://www.*****"
agent    ="Mozilla/5.0"

#User account data and other login parameters
pars=list(
     RURL="http://www.*****",
     Username="*****",
     Password="*****"
)

#RCurl pars     
curl = getCurlHandle()
curlSetOpt(cookiejar="cookiesk.txt",  useragent = agent, followlocation = TRUE, curl=curl)
#or simply
#curlSetOpt(cookiejar="", useragent = agent, followlocation = TRUE, curl=curl)

#post login form
web=postForm(loginurl, .params = pars, curl=curl)

#go to main url with real data
web=getURL(mainurl, curl=curl)

#parse/print content of web
#..... etc. etc.


#Releasing the handle has the side effect of saving cookie data to the cookiejar file
rm(curl)
gc()

Other tips

httr automatically preserves cookies across calls to the same site, as illustrated by these two calls to http://httpbin.org:

GET("http://httpbin.org/cookies/set?a=1")
# Response [http://httpbin.org/cookies]
#   Status: 200
#   Content-type: application/json
# {
#   "cookies": {
#     "a": "1"
#   }
# }

GET("http://httpbin.org/cookies")
# Response [http://httpbin.org/cookies]
#   Status: 200
#   Content-type: application/json
# {
#   "cookies": {
#     "a": "1"
#   }
# } 

Perhaps the problem is that your cURL command sends the data as application/x-www-form-urlencoded, whereas httr defaults to multipart/form-data. Try multipart = FALSE in your POST call.
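A minimal sketch of that suggestion, assuming a reasonably recent httr in which the multipart argument has been superseded by encode (encode = "form" asks for a URL-encoded body). The wrapper name login_form is hypothetical, and the URL and field names in the usage comment are the placeholders from the question, so this is untested against the real site.

```r
# Hypothetical helper: POST a login form as application/x-www-form-urlencoded.
# httr:: prefixes are used so the definition itself does not need library(httr).
login_form <- function(url, fields, agent = "Mozilla/5.0") {
  httr::POST(
    url,
    body   = fields,          # named list of form fields
    encode = "form",          # URL-encoded body instead of multipart/form-data
    httr::user_agent(agent)
  )
}

# Usage (not run here; placeholders from the question):
# resp <- login_form(
#   "http://www.example.com/login",
#   list(RURL = "http://www.example.com/r", user = "bob", password = "hello")
# )
# httr::status_code(resp)   # HTTP status of the final response
# httr::cookies(resp)       # cookies set by the server, as a data frame
```

Because httr keeps cookies per site, a later GET to the same host should reuse whatever the login response set.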

Here is a way to create a POST request and then keep and reuse the resulting cookies with RCurl, for example to fetch web pages that require authentication:

library(RCurl)
curl <- getCurlHandle()
curlSetOpt(cookiejar="/tmp/cookies.txt", curl=curl)
postForm("http://example.com/login", login="mylogin", passwd="mypasswd", curl=curl)
getURL("http://example.com/anotherpage", curl=curl)
Licensed under: CC-BY-SA with attribution