Problems with URLencode in R

Question 1

@Richie Cotton's solution also handles #, whereas URLencode() with its default arguments doesn't.

Here's a really simple example

# Useless: with the default arguments the $ is left unencoded
URLencode("hi$there")
[1] "hi$there"

# Escaping the $ doesn't help: the backslash gets encoded as %5C and the $ still isn't
URLencode("hi\\$there")
[1] "hi%5C$there"

# This works without escaping!
library(RCurl)  # curlEscape() comes from RCurl, not httr
curlEscape("hi$there")
[1] "hi%24there"
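Base R can also do this: URLencode() has a reserved argument (FALSE by default) that, when set to TRUE, also percent-encodes reserved characters such as $, +, commas, parentheses and #. A minimal sketch using only base R, assuming an R version whose URLencode() has the reserved argument:

```r
# Base R only. With the default reserved = FALSE, URLencode() leaves
# characters such as $ + , ( ) # untouched; reserved = TRUE encodes them too.
encoded <- URLencode("hi$there", reserved = TRUE)
encoded
## [1] "hi%24there"

# URLdecode() reverses the percent-encoding, so the round trip is lossless here.
decoded <- URLdecode(encoded)
decoded
## [1] "hi$there"
```

This gives the same result as curlEscape() for this string without needing RCurl.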

Question 2

URLencode follows the RFC1738 specification (see section 2.2, page 3), which states that:

only alphanumerics, the special characters "$-_.+!*'(),", and reserved characters used for their reserved purposes may be used unencoded within a URL.

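That is, it doesn't encode pluses, commas or parentheses. So the URL it generates is correct in theory but not in practice: most servers decode a + in a query string as a space, and an unencoded # marks the start of the fragment, so nothing after it (including &Units=SI) is sent to the server at all.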
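To see what goes wrong, here is a rough base-R simulation of what a server would receive for the compound name from the question when the URL is built with the default URLencode(). It assumes, as most form parsers do, that + in a query string is decoded as a space. The sub() and gsub() steps are a crude stand-in for a real HTTP client and server, not how either is actually implemented.

```r
query <- "Poligodial + 3-methoxy-4,5-methylenedioxyamphetamine (R,S) adduct, # 1"
url <- paste0("http://webbook.nist.gov/cgi/cbook.cgi?Name=",
              URLencode(query), "&Units=SI")

# An HTTP client does not send the fragment: everything from '#' onwards
# (including "&Units=SI") stays on the client side.
sent <- sub("#.*$", "", url)

# Crude stand-in for server-side form decoding: take the Name value,
# turn '+' into a space, then decode the %XX escapes.
name_value <- sub("^.*\\?Name=", "", sent)
server_sees <- URLdecode(gsub("+", " ", name_value, fixed = TRUE))
server_sees
## [1] "Poligodial   3-methoxy-4,5-methylenedioxyamphetamine (R,S) adduct, "
```

So under these assumptions the name arrives with the + turned into a space, the "# 1" is missing, and the Units parameter never reaches the server.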

The GET function in the httr package that Scott mentioned calls curlEscape from RCurl, which encodes these punctuation characters.

(GET calls handle_url which calls modify_url which calls build_url which calls curlEscape.)

The URL it generates is the same as this (newer libcurl versions may leave - unencoded rather than writing %2D):

library(RCurl)  # for curlEscape()
query <- "Poligodial + 3-methoxy-4,5-methylenedioxyamphetamine (R,S) adduct, # 1"
paste0('http://webbook.nist.gov/cgi/cbook.cgi?Name=', curlEscape(query), '&Units=SI')
## [1] "http://webbook.nist.gov/cgi/cbook.cgi?Name=Poligodial%20%2B%203%2Dmethoxy%2D4%2C5%2Dmethylenedioxyamphetamine%20%28R%2CS%29%20adduct%2C%20%23%201&Units=SI"

This seems to work OK.

httr has nice features (for instance, it builds and escapes the query string for you), and you may want to start using it. The minimal change to your code to get things working is simply to swap URLencode for curlEscape.
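If you want to check the escaped URL that httr builds without actually sending a request, one option is to call modify_url() (mentioned in the call chain above) directly. This is a sketch: the escaping itself comes from whatever function your httr version uses internally (RCurl's curlEscape() in older versions, the curl package in newer ones), so details such as whether - is encoded may differ.

```r
library(httr)

query <- "Poligodial + 3-methoxy-4,5-methylenedioxyamphetamine (R,S) adduct, # 1"

# modify_url() parses the base URL, swaps in the query list and rebuilds it,
# escaping each query value along the way; no HTTP request is made.
u <- modify_url("http://webbook.nist.gov/cgi/cbook.cgi",
                query = list(Name = query, Units = "SI"))
u
```

Printing u should show the +, commas, parentheses and # all percent-encoded, much like the curlEscape() URL above, with &Units=SI intact at the end.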

Question 3

Does this do what you want?

library(httr)
url <- 'http://webbook.nist.gov/cgi/cbook.cgi'
args <- list(Name = "Poligodial + 3-methoxy-4,5-methylenedioxyamphetamine (R,S) adduct, # 1",
         Units = 'SI')
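res <- GET(url, query = args)  # httr escapes each query value itself
# This worked with the XML-based HTML parsing of older httr versions;
# recent httr returns an xml2 document, so this line probably won't give the <html> node there.
content(res)$children$html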

Gives

<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
 <head>
  <meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/>
  <meta http-equiv="Window-target" content="_top"/>

...etc.