Question

I need to count how many times the letter A occurs in the chromosome.txt file: http://users.utu.fi/jjahol/chromosome.txt

So far I've managed to code this:

cromo <- read.table("http://users.utu.fi/jjahol/chromosome.txt", header=FALSE)
cromo2 <- as.character(unlist(cromo))

This code creates a vector of 1000 elements, each of which is a 60-character string. How can I convert it into a vector in which each element is a single character?


Solution 2

This should give you the desired result:

cromo <- read.table("http://users.utu.fi/jjahol/chromosome.txt", header=FALSE)
cromo2 <- unlist(strsplit(as.character(cromo$V1),""))
table(cromo2)

Which gives you:

    A     C     G     T 
15520 13843 14215 16422
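
Since the question asks only for the number of A's, the single count can be read off either the split vector or the table. Here is a minimal sketch on a small made-up vector (the `lines` data below is an illustration, not the real file), so it runs without a network connection:

```r
# Toy stand-in for the file contents: two short made-up sequence lines
lines <- c("ACGTA", "TTAGC")

# Split every line into single characters and flatten into one vector
chars <- unlist(strsplit(lines, ""))

# Count only the letter A by comparing each character to "A"
n_a <- sum(chars == "A")
n_a

# The same number, extracted by name from the full frequency table
counts <- table(chars)
counts[["A"]]
```

With the real data, `cromo2` from the answer above would take the place of `chars`.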

OTHER TIPS

This is a somewhat unorthodox approach (and unlist(strsplit(...)) would be very fast anyway), but you can use one of the string searching packages that offer vectorized search pattern options, like "stringi":

## Read the data in. Since it's not a data.frame, just use readLines
X <- readLines("http://users.utu.fi/jjahol/chromosome.txt")

## Paste the lines together into a single block of text
Y <- paste(X, collapse = "")

library(stringi)
Strings <- c("A", "C", "G", "T")
stri_count_fixed(Y, Strings)
# [1] 15520 13843 14215 16422

## Named output....
setNames(stri_count_fixed(Y, Strings), Strings)
#     A     C     G     T 
# 15520 13843 14215 16422 
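
If installing a package is not an option, a similar count can probably be had in base R by comparing the length of the text before and after deleting a letter. This is only a sketch under assumptions: the string `y` is made-up data standing in for the pasted text, and `count_letter` and `letters_dna` are hypothetical names introduced here.

```r
# Made-up sequence standing in for the pasted-together file contents
y <- "ACGTATTAGC"
letters_dna <- c("A", "C", "G", "T")

# Occurrences of a letter = original length minus length with that letter removed
count_letter <- function(ch, text) {
  nchar(text) - nchar(gsub(ch, "", text, fixed = TRUE))
}

# Apply to each letter; vapply keeps the letters as names of the result
counts <- vapply(letters_dna, count_letter, integer(1), text = y)
counts
```

The `fixed = TRUE` argument treats the pattern as a literal string rather than a regular expression, which is all that is needed for single letters.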

With an empty string as the split pattern, strsplit breaks a string into its individual characters:

> strsplit('text', '')
[[1]]
[1] "t" "e" "x" "t"
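
When the input is a vector of several strings, as in the question, strsplit returns a list with one element per string, which is why the answers wrap it in unlist. A small sketch with made-up input:

```r
# Two made-up strings of different lengths
x <- c("text", "ab")
parts <- strsplit(x, "")

# One list element per input string, each holding that string's characters
length(parts)
lengths(parts)

# Flatten the list into a single character vector
flat <- unlist(parts)
flat
```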
Licensed under: CC-BY-SA with attribution