Retrieving GWAS information with R

https://stackoverflow.com/questions/7723873

r
genetics

08-02-2021

Question

I am trying to get specific disease-related information from the GWAS catalog. This can be done directly from the website via a spreadsheet download. But I was wondering if I could possibly do it programmatically in R. Any suggestions will be greatly appreciated.

Thanks.

Avoks

Solution

Check out the function download.file() and the RCurl package (http://cran.r-project.org/web/packages/RCurl/index.html). Either should do what you are looking for.
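A minimal sketch of the download.file() route. The helper name fetch_gwas_tsv is hypothetical, and the URL in the commented-out call is an assumption about where the Catalog serves its full associations export; check it against the Catalog's downloads page before relying on it.

```r
# Hypothetical helper (not part of any package): download a file and
# return the local path it was saved to.
fetch_gwas_tsv <- function(url, destfile = tempfile(fileext = ".tsv")) {
  # mode = "wb" writes the bytes unchanged, which avoids line-ending
  # translation on Windows.
  status <- download.file(url, destfile = destfile, mode = "wb", quiet = TRUE)
  if (status != 0) {
    stop("download failed with status ", status)
  }
  destfile
}

# Assumed URL, shown for illustration only:
# gwas_path <- fetch_gwas_tsv("https://www.ebi.ac.uk/gwas/api/search/downloads/full")
```

The returned path can then be handed to read.table() or any other reader.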

OTHER TIPS

The simplest workaround is to download the .tsv file(s) first and edit them by hand. This is because GWAS Catalog files contain HTML character references, like &#x000A7 in "Behçet's disease" (encoding that special fourth letter). By default, read.table() treats # as the start of a comment (comment.char = "#") and ignores the rest of the line, so the row comes up short of fields and you get an error message, e.g.:

line 2028 did not have 34 elements

So you download it first, open it in a plain-text editor, use find-and-replace to delete every #, and only then load it into R with:

read.table("gwas_catalog_v1.0-associations_e91_r2018-02-21.tsv", sep = "\t", header = TRUE, stringsAsFactors = FALSE, quote = "")
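If hand-editing is inconvenient, the same problem can probably be avoided at read time. The following is a minimal sketch, assuming the error comes only from read.table()'s default comment.char = "#". The helper name read_gwas_tsv and the three-line sample file are made up for illustration and are not real Catalog data.

```r
# Hypothetical helper: read a tab-separated file without treating "#"
# as a comment character.
read_gwas_tsv <- function(path) {
  read.table(path, sep = "\t", header = TRUE,
             quote = "",          # apostrophes must not open a quoted string
             comment.char = "",   # "#" is ordinary text, not a comment start
             stringsAsFactors = FALSE)
}

# Made-up sample standing in for a Catalog file: one value contains an
# HTML character reference with a "#" in it.
demo_path <- tempfile(fileext = ".tsv")
writeLines(c("DISEASE\tPVALUE",
             "Beh&#x000A7;et's disease\t1e-8",
             "Asthma\t2e-9"), demo_path)

gwas <- read_gwas_tsv(demo_path)
str(gwas)
```

The character references stay in the text unchanged, so they would still need to be decoded or stripped (for example with gsub()) before matching on disease names.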

Licensed under: CC-BY-SA with attribution
