Question

An annoying problem many chemists are faced with is to convert CAS registry numbers of chemical compounds (stored in some commercial database that is not readily accessible) to Pubchem identifiers (openly available). Pubchem kind of supports conversion between the two, but only through their manual web interface, and not their official PUG REST programmatic interface.

A solution in Ruby is given here, based on the e-utilities interface: http://depth-first.com/articles/2007/09/13/hacking-pubchem-convert-cas-numbers-into-pubchem-cids-with-ruby/

Does anybody know how this would translate into R?

EDIT: based on the answerbelow, the most elegant solution is:

library(XML)
library(RCurl)

CAStocids=function(query) {
  xmlresponse = xmlParse( getURL(paste("http://www.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pccompound&retmax=100&term=",query,sep="") ) )
  cids = sapply(xpathSApply(xmlresponse, "//Id"), function(n){xmlValue(n)})
  return(cids)
}

> CAStocids("64318-79-2")
[1] "6434870" "5282237"

cheers, Tom

Was it helpful?

Solution

This how the Ruby code does it, translated to R, uses RCurl and XML:

> xmlresponse = xmlParse( getURL("http://www.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pccompound&retmax=100&term=64318-79-2") )

and here's how to extract the Id nodes:

> sapply(xpathSApply(xmlresponse, "//Id"), function(n){xmlValue(n)})
 [1] "6434870" "5282237"

wrap all that in a function....

 convertU = function(query){
    xmlresponse = xmlParse(getURL(
       paste0("http://www.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pccompound&retmax=100&term=",query))) 
    sapply(xpathSApply(xmlresponse, "//Id"), function(n){xmlValue(n)})
 }

> convertU("64318-79-2")
[1] "6434870" "5282237"
> convertU("64318-79-1")
list()
> convertU("64318-78-2")
list()
> convertU("64313-78-2")
[1] "313"

maybe needs a test if not found.

OTHER TIPS

I think you should still be able to convert CAS numbers to PubChem ID's using the PUG where instead of the name of the compound you enter the CAS number. Of course this might not be as specific if the CAS numbers overlap. I haven't tested it.

An example with aspirin https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/name/50-78-2/cids/JSON

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top