Question

I am analyzing some ChIP-seq data and I was able to retrieve the sequence element associated with each chipped chromosomal region using the genome browser. After parsing and searching for specific motifs, I end up with an output like the following:

head (chr.reg)
 [,1]                      
 [1,] "chr1:181030981-181032670"
 [2,] "chr3:55709147-55709901"  
 [3,] "chr3:119813410-119814934"
 [4,] "chr4:185201060-185205420"
 [5,] "chr4:39610956-39611545"  
 [6,] "chr6:126253238-126253636"

Each of these chromosomal regions contains a transcription factor motif that I am interested in.

My question is the following: is there a method to retrieve the RefSeq gene name associated with each of these regions? I looked into Bioconductor packages but could not find one, or maybe I just overlooked it. Does anyone know of a specific package that can help me address this problem?

Thanks in advance :)


Solution

I believe the answer lies in the ChIPpeakAnno package. Here is some sample code:

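  library(ChIPpeakAnno)
  # One peak: chromosome, start, end
  peak <- RangedData(space = "chr4", IRanges(39610956, 39611545))
  # Load the GRCh37 TSS annotation shipped with the package
  data(TSS.human.GRCh37)
  # Annotate the peak with its nearest feature
  ap <- annotatePeakInBatch(peak, AnnotationData = TSS.human.GRCh37, PeakLocForDistance = "end")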

The output would look like this:

> ap

RangedData with 1 row and 9 value columns across 1 space
                 space               ranges |        peak      strand
              <factor>            <IRanges> | <character> <character>
1 ENSG00000163683        4 [39610956, 39611545] |           1           -
                      feature start_position end_position insideFeature
                  <character>      <numeric>    <numeric>   <character>
1 ENSG00000163683 ENSG00000163683       39552535     39640513        inside
              distancetoFeature shortestDistance fromOverlappingOrNearest
                      <numeric>        <numeric>              <character>
1 ENSG00000163683             28968            28968             NearestStart

To retrieve the RefSeq ID or gene symbol for the Ensembl IDs:

library(org.Hs.eg.db)
gene.anno <- select(org.Hs.eg.db, keys = ap$feature, keytype = "ENSEMBL",
                    columns = c("ENSEMBL", "ENTREZID", "SYMBOL"))

The retrieved gene:

> gene.anno
      ENSEMBL     ENTREZID SYMBOL       
1 ENSG00000163683   201895 SMIM14 
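
The example above hard-codes a single peak, whereas the question starts from strings of the form "chr4:39610956-39611545". As a minimal sketch in base R, the strings could be split into chromosome, start and end as follows. The `regions` data frame and its column names are illustrative and not part of the original answer, and `chr.reg` is assumed here to be a plain character vector (for the one-column matrix shown in the question, use `chr.reg[, 1]`).

```r
# Region strings as shown in the question (a character vector here for simplicity)
chr.reg <- c("chr1:181030981-181032670",
             "chr3:55709147-55709901",
             "chr4:39610956-39611545")

# Match "chrN:start-end" and capture the three parts
pattern <- "^(chr[^:]+):([0-9]+)-([0-9]+)$"
parts <- regmatches(chr.reg, regexec(pattern, chr.reg))

# Each list element is c(full match, chromosome, start, end)
regions <- data.frame(
  chr   = vapply(parts, `[`, character(1), 2),
  start = as.integer(vapply(parts, `[`, character(1), 3)),
  end   = as.integer(vapply(parts, `[`, character(1), 4)),
  stringsAsFactors = FALSE
)
```

The `start` and `end` columns can then be supplied to `IRanges()` in place of the hard-coded coordinates, with `chr` giving the chromosome for each peak.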
Licensed under: CC-BY-SA with attribution