Question

I would like to remove a prefix that is common to all values in a specific column of an R data frame. My input:

   chr

   chr1
   chr2
   chr3
   chr4

And my expected output:

  chr
   1
   2
   3
   4

Solution

If those are factors then you need to change the 'level' names rather than working on the item values:

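# stringsAsFactors=TRUE is needed on R >= 4.0.0, where read.table() no longer
# converts strings to factors by default
chrdf <- read.table(text="chr1
    chr2
    chr3
    chr4", col.names="chr", stringsAsFactors=TRUE)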
chrdf
#--------
   chr
1 chr1
2 chr2
3 chr3
4 chr4

levels(chrdf$chr) <- gsub("chr", "", levels(chrdf$chr) )
chrdf
  chr
1   1
2   2
3   3
4   4

The 'factor' variable-type is a source of difficulty for many people. I can testify that the difficulty persists for quite a while. I don't ask a lot of questions, but one that I did ask got a lot of upvotes from the rest of the similarly confused (or perhaps the smarter ones were only bemused and sympathetic) SO readership.
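As a rough sketch of the level-renaming idea above, the following wraps it in a small helper that accepts either a factor or a character column. The name strip_prefix and the chr_num column are made up for illustration and are not part of the original answer. The helper assumes the prefix contains no regex metacharacters, and it uses an anchored sub() so that only a leading match is removed.

```r
# Hypothetical helper (not from the original answer): strip a leading prefix
# from a factor or a character vector.
strip_prefix <- function(v, prefix = "chr") {
  pattern <- paste0("^", prefix)  # anchor: only a leading match is removed
  if (is.factor(v)) {
    # Rename the levels; the vector stays a factor
    levels(v) <- sub(pattern, "", levels(v))
    v
  } else {
    sub(pattern, "", as.character(v))
  }
}

chrdf <- data.frame(chr = factor(c("chr1", "chr2", "chr3", "chr4")))
chrdf$chr <- strip_prefix(chrdf$chr)

# To get real numbers, go through as.character() first; calling as.integer()
# directly on a factor returns the internal level codes instead.
chrdf$chr_num <- as.integer(as.character(chrdf$chr))
```

Going through as.character() matters whenever the level order does not happen to match the numeric values, which is one of the usual factor surprises.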

OTHER TIPS

This is very easy using gsub:

x <- c("chr1", "chr2", "chr3", "chr4")
gsub("chr", "", x)
# [1] "1" "2" "3" "4"

You may also consider the substring function.

substring(x, 4, nchar(x))
# [1] "1" "2" "3" "4"
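One caveat with both approaches, sketched below on made-up values: gsub("chr", "", x) removes every occurrence of "chr", not just a leading one, and substring(x, 4, ...) drops the first three characters whether or not they are the prefix. The data frame df and the value "chr_chr4" are invented purely to show the difference. The example assumes a character (non-factor) column.

```r
# Illustrative data only; "chr_chr4" is a contrived value to expose the difference
df <- data.frame(chr = c("chr1", "chr2", "chrX", "chr_chr4"),
                 stringsAsFactors = FALSE)
prefix <- "chr"

# Anchored pattern: removes the prefix only at the start of each string
anchored <- sub(paste0("^", prefix), "", df$chr)

# Position-based variant: cut by length, but only where the prefix is present
by_pos <- ifelse(startsWith(df$chr, prefix),
                 substring(df$chr, nchar(prefix) + 1),
                 df$chr)

# Unanchored gsub() also removes the second "chr" in the last value
unanchored <- gsub(prefix, "", df$chr)
```

For the values in the question all three give the same result; the anchored form is simply the safer default when the prefix could reappear later in a string.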
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow