Trying to return a specified number of characters from a gene sequence in R
Question
I have a DNA sequence like cgtcgctgtttgtcaaagtcg... that may be 1000+ letters long.
However, I only want to look at letters 5 to 200, for example, and to define this subset of the string as a new object.
I tried looking at the nchar function, but haven't found anything that would do this.
Solution
Use the substr function (base R's substring works the same way for a single start/stop pair):
> tmp.string <- paste(LETTERS, collapse="")
> tmp.string <- substr(tmp.string, 4, 10)
> tmp.string
[1] "DEFGHIJ"
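Applied to the case in the question, a minimal sketch looks like the following. The sequence is a made-up stand-in built by repeating a short motif; the object names dna and dna_5_200 are illustrative only.

```r
# Made-up stand-in for a long DNA sequence: 20 copies of a 16-letter motif (320 letters)
dna <- paste(rep("gtcgctgtttgtcaac", 20), collapse = "")

# Keep only letters 5 to 200 (both ends inclusive) as a new object
dna_5_200 <- substr(dna, 5, 200)

nchar(dna_5_200)         # 196 letters, i.e. 200 - 5 + 1
substr(dna_5_200, 1, 4)  # "ctgt", which are letters 5-8 of the original

# substring() gives the same result for a single start/stop pair
identical(substring(dna, 5, 200), dna_5_200)  # TRUE
```

If the stop position is larger than the string length, substr simply returns everything up to the end of the string rather than raising an error, so it is worth checking nchar() first when the bounds come from elsewhere.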
See also the Bioconductor package Biostrings, which is a good choice if you need to handle large biological sequences or sets of sequences.
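# Install once; biocLite() is deprecated, BiocManager is the current installer
# install.packages("BiocManager"); BiocManager::install("Biostrings")
library(Biostrings)
# Example sequence: 20 copies of a 16-letter motif (320 letters)
s <- paste(rep("gtcgctgtttgtcaac", 20), collapse = "")
d <- DNAString(s)
d[5:200]                # letters 5 to 200, still a DNAString
as.character(d[5:200])  # the same subsequence as a plain character string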
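For a small set of sequences held in an ordinary character vector, a rough base-R sketch is also possible, because substr is vectorized over its first argument and keeps the vector's names. The names seqA and seqB and their contents are made up for illustration; for genuinely large collections the Biostrings route above is likely the better fit.

```r
# Hypothetical set of sequences stored as a named character vector
seqs <- c(
  seqA = paste(rep("gtcgctgtttgtcaac", 20), collapse = ""),  # 320 letters
  seqB = paste(rep("acgt", 100), collapse = "")              # 400 letters
)

# One call extracts letters 5 to 200 from every sequence; names are kept
windows <- substr(seqs, 5, 200)

nchar(windows)  # 196 for each sequence
names(windows)  # "seqA" "seqB"
```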
Licensed under: CC-BY-SA with attribution