Question

I'm trying to strip trailing spaces from the factor variables in a data frame. However, the levels assignment doesn't work inside my lapply function.

rm.space<-function(x){
    a<-gsub(" ","",x)
    return(a)}


lapply(names(barn),function(x){
    levels(barn[,x])<-rm.space(levels(barn[,x]))
    })

Any ideas how I can assign levels inside a lapply function?

//M

Solution

From your code I gather that lapply is being used to loop over the variables of the data frame, not over the levels of a factor. You do need some kind of looping structure for that, but lapply is a poor choice:

  • you loop over a character vector, names(barn), so sapply would be the more natural choice
  • the apply family returns the result of each iteration, which you don't want here, so you use memory for no purpose.
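
To see why the original attempt leaves the data untouched, here is a minimal sketch. The data frame `barn_demo` is a made-up stand-in for the asker's `barn`, not their actual data:

```r
# Hypothetical data frame with one factor column whose levels contain spaces
barn_demo <- data.frame(x = factor(c(" a", " b", " a")))

res <- lapply(names(barn_demo), function(nm) {
    # The assignment creates and modifies a copy of barn_demo
    # that is local to this function call
    levels(barn_demo[, nm]) <- gsub(" ", "", levels(barn_demo[, nm]))
})

levels(barn_demo$x)  # the original levels, spaces included
res                  # lapply still collected a return value per column
```

The cleaned levels only show up in `res`, the list that lapply builds; the data frame in the calling environment is unchanged.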

Anyway, if you need to assign to a variable in your global environment from within lapply, you need the <<- operator. Say you want to remove the spaces from a selected number of variables:

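# Five values, each with a leading space: " a", " b", ...
f <- paste("", letters[1:5])

Df <- data.frame(
    X1 = sample(f, 10, replace = TRUE),
    X2 = sample(f, 10, replace = TRUE),
    X3 = sample(f, 10, replace = TRUE),
    stringsAsFactors = TRUE  # needed since R 4.0.0, where the default became FALSE
    )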

# Bad example:
lapply(c("X1","X3"),function(x){
    levels(Df[,x])<<-gsub(" +","",levels(Df[,x]))
    })

gives

> str(Df)
'data.frame':   10 obs. of  3 variables:
 $ X1: Factor w/ 3 levels "a","b","c": 2 3 1 1 1 2 3 2 2 2
 $ X2: Factor w/ 5 levels " a"," b"," c",..: 4 5 4 2 5 5 1 2 5 3
 $ X3: Factor w/ 5 levels "a","b","c","d",..: 2 3 4 1 4 1 3 3 5 4

A for loop is the better option:

for( i in c("X1","X3")){
    levels(Df[,i])<-gsub(" +","",levels(Df[,i]))
}

This does what you need without the hassle of the <<- operator and without holding memory unnecessarily.

OTHER TIPS

R is vectorised, so you do not need apply() to clean the levels of a single factor:

> f <- as.factor(sample(c("  a", " b", "c", "  d"), 10, replace=TRUE))
> levels(f)
[1] "  a" " b"  "c"   "  d"
> levels(f) <- gsub(" +", "", levels(f), perl=TRUE)
> levels(f)
[1] "a" "b" "c" "d"
> f
 [1] d a c b c d d a a a
Levels: a b c d
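
One thing to keep in mind with this approach, sketched here with a made-up factor `g` that is not part of the original example: if two levels differ only by their spaces, they become identical after cleaning, and the levels assignment merges them into a single level.

```r
# Hypothetical factor where " a" and "a" are distinct levels
g <- factor(c(" a", "a", "b"))
nlevels(g)                              # three levels before cleaning

levels(g) <- gsub(" +", "", levels(g))  # " a" and "a" both become "a"
nlevels(g)                              # the duplicates were merged
as.character(g)
```

That merging is usually what you want when the spaces are just noise, but it is worth checking `nlevels()` before and after if the distinction might matter.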

As Joris states, lapply works on a local copy of the data.frame, so it won't modify your original data. But you can use its result to replace your data:

barn[] <- lapply(barn, function(x) {
    levels(x) <- rm.space(levels(x))
    x
    })

This is useful when your data contain different types and you want to modify only the factors, e.g.:

factors <- sapply(barn, is.factor)
barn[factors] <- lapply(barn[factors], function(x) {
    levels(x) <- rm.space(levels(x))
    x
    })
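
As a self-contained illustration of the factor-only variant, the sketch below uses a made-up data frame `barn_mixed` (its column names and values are assumptions, not the asker's data) together with the `rm.space` helper from the question:

```r
# Helper from the question: removes every space from a character vector
rm.space <- function(x) gsub(" ", "", x)

# Hypothetical data frame mixing integer, factor and numeric columns
barn_mixed <- data.frame(
    id   = 1:3,
    kind = factor(c(" cow", " pig", " cow")),
    size = c(1.5, 2.0, 1.7)
)

# Select the factor columns and replace them with cleaned versions
factors <- sapply(barn_mixed, is.factor)
barn_mixed[factors] <- lapply(barn_mixed[factors], function(x) {
    levels(x) <- rm.space(levels(x))
    x
})

str(barn_mixed)
```

Only `kind` is rewritten; `id` and `size` keep their original types because they are never passed to the function.
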
Licensed under: CC-BY-SA with attribution