I want to use Tor with the getURL function in R. Tor is running (verified in Firefox) as a SOCKS5 proxy on port 9050. But when I set this in R, I get the following error:

html <- getURL("http://www.google.com", followlocation = TRUE, .encoding = "UTF-8", .opts = list(proxy = "127.0.0.1:9050", timeout = 15))

Error in curlPerform(curl = curl, .opts = opts, .encoding = .encoding) :
  Tor is not an HTTP Proxy
  It appears you have configured your web browser to use Tor as an HTTP proxy.
  This is not correct: Tor is a SOCKS proxy, not an HTTP proxy.
  Please configure your client accordingly.

I've tried replacing proxy with socks and socks5, but it didn't work.

Solution

R has curl bindings (for example RCurl), which you can use to call the Tor SOCKS5 proxy server.

The call from the shell (which you can translate to the R binding) is:

curl --socks5-hostname 127.0.0.1:9050 google.com

With --socks5-hostname, Tor also performs the DNS resolution (A records), so hostnames are not resolved locally.

Other tips

RCurl defaults to an HTTP proxy, but Tor provides a SOCKS proxy. Tor recognizes that the proxy client (RCurl) is trying to use it as an HTTP proxy, hence the HTML error message that Tor returns.

To get RCurl (and curl) to use a SOCKS proxy, add a protocol prefix to the proxy address. There are two prefixes for SOCKS5: "socks5" and "socks5h" (see the curl manual). The latter lets the SOCKS server handle DNS queries, which is the preferred method when using Tor (in fact, Tor will warn you if you let the proxy client resolve the hostname).

Here is a pure R solution that uses Tor for DNS queries as well.

library(RCurl)
options(RCurlOptions = list(proxy = "socks5h://127.0.0.1:9050"))
my.handle <- getCurlHandle()
html <- getURL(url='https://www.torproject.org', curl=my.handle)

If you want to specify additional parameters, see below on where to put them:

library(RCurl)
options(RCurlOptions = list(proxy = "socks5h://127.0.0.1:9050",
                            useragent = "Mozilla",
                            followlocation = TRUE,
                            referer = "",
                            cookiejar = "my.cookies.txt"
                            )
        )
my.handle <- getCurlHandle()
html <- getURL(url='https://www.torproject.org', curl=my.handle)
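
To make the choice between the "socks5" and "socks5h" prefixes explicit, the option list can be built by a small helper. This is only a minimal sketch: tor_proxy_opts and fetch_via_tor are hypothetical names (they are not part of RCurl), and the defaults assume Tor is listening on 127.0.0.1:9050 as in the question.

```r
# Hypothetical helper (not part of RCurl): builds an RCurl options list
# that points at a local Tor SOCKS5 proxy. Host and port are assumptions
# taken from the question (127.0.0.1:9050).
tor_proxy_opts <- function(host = "127.0.0.1", port = 9050,
                           remote_dns = TRUE, ...) {
  # "socks5h" asks the proxy to resolve hostnames; "socks5" resolves locally.
  scheme <- if (remote_dns) "socks5h" else "socks5"
  proxy <- sprintf("%s://%s:%d", scheme, host, as.integer(port))
  # Any extra curl options (e.g. followlocation) are appended unchanged.
  c(list(proxy = proxy), list(...))
}

# Hypothetical wrapper; calling it needs the RCurl package and a running Tor.
fetch_via_tor <- function(url, ...) {
  RCurl::getURL(url, .opts = tor_proxy_opts(...))
}
```

With Tor running, a call such as fetch_via_tor("https://www.torproject.org", followlocation = TRUE) should behave like the examples above; passing remote_dns = FALSE switches to the "socks5" prefix, which resolves hostnames locally.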

Hi Naparst, I would really appreciate a hint on how to implement the solution you propose. The option should be something like opts <- list(socks5.hostname = "127.0.0.1:9050"), but this doesn't work since socks5.hostname is not an option.

On Mac OS X, install the Tor Bundle for Mac and Privoxy, then update the proxy settings in System Preferences:

'System Preferences' --> 'Wi-Fi' --> 'Advanced' --> 'Proxies' --> set 'Web Proxy (HTTP)' Web Proxy Server 127.0.0.1:8118

'System Preferences' --> 'Wi-Fi' --> 'Advanced' --> 'Proxies' --> set 'Secure Web Proxy (HTTPS)' Secure Web Proxy Server 127.0.0.1:8118 --> 'OK' --> 'Apply'

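library(RCurl)
curl <- getCurlHandle()
# proxytype = 5 selects SOCKS5 (CURLPROXY_SOCKS5); 9150 is the Tor Browser Bundle port
curlSetOpt(proxy = '127.0.0.1:9150', proxytype = 5, curl = curl)
html <- getURL(url = 'check.torproject.org', curl = curl)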
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow