Question

An annoying problem many chemists face is converting CAS registry numbers of chemical compounds (stored in a commercial database that is not readily accessible) to PubChem identifiers (openly available). PubChem does support conversion between the two, but only through its manual web interface, not through its official PUG REST programmatic interface.

A solution in Ruby is given here, based on the e-utilities interface: http://depth-first.com/articles/2007/09/13/hacking-pubchem-convert-cas-numbers-into-pubchem-cids-with-ruby/

Does anybody know how this would translate into R?

EDIT: based on the answer below, the most elegant solution is:

library(XML)
library(RCurl)

CAStocids=function(query) {
  xmlresponse = xmlParse( getURL(paste("http://www.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pccompound&retmax=100&term=",query,sep="") ) )
  cids = sapply(xpathSApply(xmlresponse, "//Id"), function(n){xmlValue(n)})
  return(cids)
}

> CAStocids("64318-79-2")
[1] "6434870" "5282237"

cheers, Tom

Solution

This is how the Ruby code does it, translated to R using the RCurl and XML packages:

> xmlresponse = xmlParse( getURL("http://www.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pccompound&retmax=100&term=64318-79-2") )

And here's how to extract the Id nodes:

> sapply(xpathSApply(xmlresponse, "//Id"), function(n){xmlValue(n)})
 [1] "6434870" "5282237"

Wrap all that in a function:

 convertU = function(query){
    xmlresponse = xmlParse(getURL(
       paste0("http://www.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pccompound&retmax=100&term=",query))) 
    sapply(xpathSApply(xmlresponse, "//Id"), function(n){xmlValue(n)})
 }

> convertU("64318-79-2")
[1] "6434870" "5282237"
> convertU("64318-79-1")
list()
> convertU("64318-78-2")
list()
> convertU("64313-78-2")
[1] "313"

It probably needs a check for the not-found case, since the function then returns an empty list() rather than a character vector.
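One way such a not-found check could look is sketched below. The names extractCids and convertChecked are hypothetical, not part of the answer above. The idea is to split the XML parsing from the download, so the parsing step always returns a character vector (empty when nothing matches) and the wrapper can warn on an empty result.

```r
library(XML)

# Hypothetical helper: pull the <Id> values out of an esearch XML response
# given as a string. Always returns a character vector; character(0) means
# that no Id node was present.
extractCids <- function(xmltext) {
  doc <- xmlParse(xmltext, asText = TRUE)
  ids <- xpathSApply(doc, "//Id", xmlValue)
  as.character(unlist(ids))
}

# Hypothetical wrapper around the same esearch URL used above.
# Warns instead of silently returning an empty result.
convertChecked <- function(query) {
  url <- paste0("http://www.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi",
                "?db=pccompound&retmax=100&term=", query)
  cids <- extractCids(RCurl::getURL(url))
  if (length(cids) == 0) {
    warning("no PubChem CID found for ", query)
  }
  cids
}
```

Callers can then test length(convertChecked(x)) == 0 without worrying about whether they got a list or a vector.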

Other tips

I think you should still be able to convert CAS numbers to PubChem IDs using PUG REST by entering the CAS number where the compound name would normally go. Of course, this might not be as specific if the CAS numbers overlap. I haven't tested it.

An example with aspirin (CAS 50-78-2): https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/name/50-78-2/cids/JSON
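In R, the same name lookup could be sketched roughly as follows. This is untested against the live service. The helper names pugCidUrl and casToCidsPug are hypothetical, and the sketch assumes that PUG REST also offers a plain-text output (TXT, one CID per line) alongside the JSON shown above, and that an unknown name results in an HTTP error.

```r
# Hypothetical helper: build the PUG REST name-lookup URL for a CAS number.
# The default format "TXT" is an assumption; "JSON" gives the URL shown above.
pugCidUrl <- function(cas, format = "TXT") {
  paste0("https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/name/",
         URLencode(cas, reserved = TRUE), "/cids/", format)
}

# Hypothetical wrapper: fetch the CIDs as a character vector.
# Returns character(0) if the request fails (assumed to include "not found").
casToCidsPug <- function(cas) {
  tryCatch(
    suppressWarnings(readLines(pugCidUrl(cas), warn = FALSE)),
    error = function(e) character(0)
  )
}
```

A call such as casToCidsPug("50-78-2") would then be the PUG REST counterpart of convertU above.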

License: CC-BY-SA with attribution
Not affiliated with StackOverflow