Question

I have several SNP IDs (e.g., rs16828074, rs17232800, etc.), and I want to get their coordinates in the hg19 genome from the UCSC Genome Browser website.

I would prefer to use R to accomplish this. How can I do that?


Solution

Here is a solution using the Bioconductor package biomaRt. It is a slightly corrected and reformatted version of the previously posted code.

library(biomaRt) # biomaRt_2.30.0

snp_mart = useMart("ENSEMBL_MART_SNP", dataset="hsapiens_snp")

snp_ids = c("rs16828074", "rs17232800")
snp_attributes = c("refsnp_id", "chr_name", "chrom_start")

snp_locations = getBM(attributes=snp_attributes, filters="snp_filter", 
                      values=snp_ids, mart=snp_mart)

snp_locations
#    refsnp_id chr_name chrom_start
# 1 rs16828074        2   232318754
# 2 rs17232800       18    66292259
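One caveat for the hg19 part of the question: the default Ensembl host (www.ensembl.org) serves the current human assembly, GRCh38, so the query above returns GRCh38 coordinates rather than hg19 ones. hg19 corresponds to GRCh37, which Ensembl keeps on a separate archive host. Below is a minimal sketch of the same query pointed at that archive. It assumes the archive is still reachable at https://grch37.ensembl.org and that its SNP mart uses the same "snp_filter" filter and attribute names; the function name get_hg19_snp_positions is hypothetical.

```r
# Hypothetical helper: the same getBM() query as above, but sent to the
# Ensembl GRCh37 (hg19) archive. Assumes the archive host below is still
# online and keeps the "snp_filter" filter and these attribute names.
get_hg19_snp_positions <- function(snp_ids) {
  if (!requireNamespace("biomaRt", quietly = TRUE)) {
    stop("The Bioconductor package 'biomaRt' is required")
  }
  # Connect to the SNP (variation) mart on the GRCh37 archive host
  snp_mart_37 <- biomaRt::useMart(biomart = "ENSEMBL_MART_SNP",
                                  dataset = "hsapiens_snp",
                                  host    = "https://grch37.ensembl.org")
  # Look up chromosome and 1-based start position for each rs number
  biomaRt::getBM(attributes = c("refsnp_id", "chr_name", "chrom_start"),
                 filters    = "snp_filter",
                 values     = snp_ids,
                 mart       = snp_mart_37)
}

# Usage (requires network access):
# get_hg19_snp_positions(c("rs16828074", "rs17232800"))
```

Wrapping the query in a function keeps the GRCh37 connection separate from the default GRCh38 one, so both can be used side by side to compare positions.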

Users are encouraged to read the comprehensive biomaRt vignette and experiment with the following biomaRt functions:

listFilters(snp_mart)
listAttributes(snp_mart)
attributePages(snp_mart)
listDatasets(snp_mart)
listMarts()
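listFilters() and listAttributes() return data frames with name and description columns, and for the SNP dataset these tables are long. A small hypothetical helper, match_mart_fields(), can narrow them down by keyword. It is plain base R over those data frames, so the only assumption is those two column names; the demo table below is made up for illustration and is not real mart metadata.

```r
# Hypothetical helper: keep the rows of a listFilters() or listAttributes()
# result whose name or description matches `pattern` (case-insensitive).
match_mart_fields <- function(fields, pattern) {
  hit <- grepl(pattern, fields$name, ignore.case = TRUE) |
         grepl(pattern, fields$description, ignore.case = TRUE)
  fields[hit, , drop = FALSE]
}

# Usage against the live mart (requires network access and biomaRt):
# match_mart_fields(listFilters(snp_mart), "snp")
# match_mart_fields(listAttributes(snp_mart), "chrom")

# Offline demonstration on a small, made-up table
demo_fields <- data.frame(
  name        = c("snp_filter", "chr_name", "example_field"),
  description = c("variant name (illustrative)",
                  "chromosome name (illustrative)",
                  "unrelated field"),
  stringsAsFactors = FALSE
)
chrom_fields <- match_mart_fields(demo_fields, "chrom")
print(chrom_fields)
```

Matching on the description as well as the name helps because filter names are terse (for example, "chrom" does not appear in "chr_name").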

OTHER TIPS

With Perl, it is also quite easy to build code that queries SNPs.

BioMart's web interface (MartView, linked in step 1 below) can generate Perl scripts that query whichever database and dataset you choose through the BioMart Perl API.

Instructions

  1. Go to http://www.ensembl.org/biomart/martview/ad23fb5685e6aecb59ab12ce73c89731 (for supported metazoans) or http://biomart.vectorbase.org/biomart/martview/6e274bc00b3c68a131a6947d02039ade (for up-to-date malaria vectors, e.g. A. gambiae).
  2. Select the database and dataset.

  3. Click the "Perl" button to generate Perl code that queries the BioMart API, copy-paste the code into your Perl editor, and run it with the SNP rs numbers of your choice.

# An example script demonstrating the use of the BioMart Perl API.
use strict;
use warnings;
use BioMart::Initializer;
use BioMart::Query;
use BioMart::QueryRunner;

# Placeholder: set this to the path of your registry file under biomart-perl/conf/
my $confFile = "PATH TO YOUR REGISTRY FILE UNDER biomart-perl/conf/";
my $action = 'cached';
my $initializer = BioMart::Initializer->new('registryFile' => $confFile, 'action' => $action);
my $registry = $initializer->getRegistry;

my $query = BioMart::Query->new('registry' => $registry, 'virtualSchemaName' => 'default');
    $query->setDataset("hsapiens_snp");
    # Restrict the query to the rs numbers of interest
    $query->addFilter("snp_filter", ["rs16828074", "rs17232800"]);
    # Columns to return for each SNP
    $query->addAttribute("refsnp_id");
    $query->addAttribute("refsnp_source");
    $query->addAttribute("chr_name");
    $query->addAttribute("chrom_start");
    # Output as tab-separated values
    $query->formatter("TSV");

my $query_runner = BioMart::QueryRunner->new();

############################## GET RESULTS ##########################
$query_runner->execute($query);
$query_runner->printHeader();
$query_runner->printResults();
$query_runner->printFooter();
#####################################################################

Using Bioconductor's biomaRt R package

This package provides an easy way to send queries to BioMart, which returns information about SNPs given an rs number (i.e. rsID).

For example, to import SNP data for rs16828074 (one of the rs numbers listed in the question), use this:

Code:

library(biomaRt)

snp.id <- 'rs16828074'   # a SNP rs number like those listed in the question

# Select the Ensembl SNP (variation) mart, as in the code above
snp.db <- useMart("ENSEMBL_MART_SNP", dataset = "hsapiens_snp")

# Query the human SNP dataset for the selected rs number
nt.biomart <- getBM(attributes = c("refsnp_id", "allele", "chr_name", "chrom_start",
                                   "chrom_strand", "associated_gene",
                                   "ensembl_gene_stable_id"),
                    filters = "snp_filter",
                    values = snp.id,
                    mart = snp.db)
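Since the question asks for coordinates as UCSC presents them, note that biomaRt's chrom_start is 1-based and its chromosome names lack the "chr" prefix, whereas UCSC names chromosomes chr1, chr2, ..., chrM and stores tables and BED files with 0-based starts. The hypothetical helper to_ucsc_coords() below converts a getBM() result that has refsnp_id, chr_name and chrom_start columns into both a browser position string and BED-style columns. It assumes each row is a single-nucleotide variant covering exactly one base (for indels you would also fetch chrom_end), and it does not change the assembly: hg19 positions still require querying GRCh37.

```r
# Hypothetical helper: convert a biomaRt result (1-based chrom_start,
# Ensembl chromosome names) into UCSC-style coordinates.
# Assumes every row is a single-nucleotide variant (one base long).
to_ucsc_coords <- function(snp_locations) {
  chr_name <- as.character(snp_locations$chr_name)
  # Ensembl calls the mitochondrial chromosome "MT"; UCSC uses "chrM"
  chrom <- ifelse(chr_name == "MT", "chrM", paste0("chr", chr_name))
  pos <- as.integer(snp_locations$chrom_start)
  data.frame(
    refsnp_id  = snp_locations$refsnp_id,
    chrom      = chrom,
    chromStart = pos - 1L,  # BED / UCSC tables: 0-based start
    chromEnd   = pos,       # BED end is exclusive, so it equals the 1-based position
    position   = paste0(chrom, ":", pos, "-", pos),  # browser search box: 1-based
    stringsAsFactors = FALSE
  )
}

# Example input copied from the output printed in the first answer
example_locations <- data.frame(
  refsnp_id   = c("rs16828074", "rs17232800"),
  chr_name    = c("2", "18"),
  chrom_start = c(232318754, 66292259),
  stringsAsFactors = FALSE
)
ucsc <- to_ucsc_coords(example_locations)
print(ucsc)
```

The same call works on nt.biomart above, since it contains those three columns; the position strings can be pasted into the UCSC browser's search box, and the chrom/chromStart/chromEnd columns can be written out as a BED file.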

Let me know in the comments how this works for you; this answer assumes some basic coding experience and familiarity with installing and loading packages.

Acknowledgements:

Credit goes to Jorge Amigo for his post on Biostars.

Licensed under: CC-BY-SA with attribution