Question

I am curious about the behaviour of transform. Here are two ways I might try to create a new column as character rather than as factor:

x <- data.frame(Letters = LETTERS[1:3], Numbers = 1:3)
y <- transform(x, Alphanumeric = as.character(paste(Letters, Numbers)))
x$Alphanumeric = with(x, as.character(paste(Letters, Numbers)))
x
y
str(x$Alphanumeric)
str(y$Alphanumeric)

The results "look" the same:

> x
  Letters Numbers Alphanumeric
1       A       1          A 1
2       B       2          B 2
3       C       3          C 3
> y
  Letters Numbers Alphanumeric
1       A       1          A 1
2       B       2          B 2
3       C       3          C 3

But look inside and only one has worked:

> str(x$Alphanumeric) # did convert to character
 chr [1:3] "A 1" "B 2" "C 3"
> str(y$Alphanumeric) # but transform didn't
 Factor w/ 3 levels "A 1","B 2","C 3": 1 2 3

I didn't find ?transform very helpful in explaining this behaviour (presumably Alphanumeric was coerced back to a factor), nor did I find a way to stop it (something like stringsAsFactors = FALSE for data.frame). What is the safest way to do this? Are there similar pitfalls to beware of, for instance with apply or the plyr functions?

Solution

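This is less an issue with transform than with data.frame, whose stringsAsFactors argument defaulted to TRUE in R versions before 4.0.0. transform builds new columns by calling data.frame(), so character vectors are converted to factors along the way. Pass stringsAsFactors = FALSE and you'll be on your way: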

y <- transform(x, Alphanumeric = paste(Letters, Numbers),
               stringsAsFactors = FALSE)
str(y)
# 'data.frame': 3 obs. of  3 variables:
#  $ Letters     : Factor w/ 3 levels "A","B","C": 1 2 3
#  $ Numbers     : int  1 2 3
#  $ Alphanumeric: chr  "A 1" "B 2" "C 3"

I generally use within instead of transform, and it does not seem to have this problem:

y <- within(x, {
  Alphanumeric = paste(Letters, Numbers)
})
str(y)
# 'data.frame':  3 obs. of  3 variables:
#  $ Letters     : Factor w/ 3 levels "A","B","C": 1 2 3
#  $ Numbers     : int  1 2 3
#  $ Alphanumeric: chr  "A 1" "B 2" "C 3"

This is because it works much like your with approach: it creates a character vector and adds it (via [<-) to the existing data.frame.

You can view the source of each of these by typing transform.data.frame and within.data.frame at the prompt.
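
To make the difference concrete, here is a minimal sketch that mimics the two code paths by hand. It is a simplification, not the actual source of either function, and the names new_col, via_data_frame and via_assign are made up for illustration. stringsAsFactors = TRUE is passed explicitly so that the result does not depend on the R version's default:

```r
x <- data.frame(Letters = LETTERS[1:3], Numbers = 1:3,
                stringsAsFactors = TRUE)
new_col <- paste(x$Letters, x$Numbers)  # a plain character vector

# Roughly what transform does for a new column: rebuild the result with
# data.frame(), which converts character vectors per stringsAsFactors.
via_data_frame <- do.call("data.frame",
                          c(list(x), list(Alphanumeric = new_col),
                            stringsAsFactors = TRUE))

# Roughly what within (and x$col <- ...) does: assign into the existing
# data.frame with `[<-`, which stores the vector unchanged.
via_assign <- x
via_assign["Alphanumeric"] <- list(new_col)

class(via_data_frame$Alphanumeric)  # factor
class(via_assign$Alphanumeric)      # character
```

Changing the stringsAsFactors value in the do.call line to FALSE should make both results character, which is what the fix above relies on.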


As for other pitfalls, that is too broad a question. One thing that comes to mind right away is that apply converts a data.frame to a matrix, so all the columns are coerced to a single type.
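
A small sketch of that apply pitfall, using a made-up data.frame df with one character and one integer column. Treat it as an illustration rather than a complete account of how apply behaves:

```r
df <- data.frame(Letters = c("A", "B", "C"), Numbers = 1:3,
                 stringsAsFactors = FALSE)

# apply() converts its input with as.matrix(), and a matrix holds a
# single type, so the integer column is turned into character.
m <- as.matrix(df)
is.character(m)                     # TRUE

# Each row handed to the function is therefore a character vector.
row_classes <- apply(df, 1, class)

# sapply() and lapply() go column by column and keep each column's type.
col_classes <- sapply(df, class)
col_classes
```

If you need to work row by row while keeping the column types, iterating over row indices and subsetting the data.frame is one way to avoid the matrix conversion.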

Licensed under: CC-BY-SA with attribution