Retrieving GWAS information with R

https://stackoverflow.com/questions/7723873

r
genetics

08-02-2021
|

문제

I am trying to get specific disease-related information from the GWAS catalog. This can be done directly from the website via a spreadsheet download. But I was wondering if I could possibly do it programmatically in R. Any suggestions will be greatly appreciated.

Thanks.

Avoks

해결책

Checkout the function download.file() and the package rcurl (http://cran.r-project.org/web/packages/RCurl/index.html) - this should do what you are looking for

다른 팁

You will have to download .tsv file(s) first and manually edit them. This is because GWAS Catalog files contain HTML symbols, like &#x000A7 in "Behçet's disease" (defining that special fourth letter). The # in these symbols will be interpreted by R as an end of line, thus you will get an error message, e.g.:

line 2028 did not have 34 elements

So you downlad it first, open in plain text editor, automatically replace every # with empty character, and only then load it into R with:

read.table("gwas_catalog_v1.0-associations_e91_r2018-02-21.tsv",sep="\t",h=T,stringsAsFactors = F,quote="")

라이센스 : CC-BY-SA ~와 함께 속성

제휴하지 않습니다 StackOverflow