Question

I'm working with a large dataset and want to get some basic information about my variables, first of all whether they are numeric or factor/ordinal.

I'm writing a function that checks, one variable at a time, whether each variable is numeric or a factor.

To make the for loop work, I'm using dataset[i] to access the variable I want.

object<-function(dataset){

    n=ncol(dataset)
    for(i in 1:n){
        variable_name<-names(dataset[i])
        factor<-is.factor(dataset[i])
        ordered<-is.ordered(dataset[i])
        numeric<-is.numeric(dataset[i])
        print(list(variable_name,factor,ordered,numeric))
    }
}

My problem is that is.numeric() does not seem to work with dataset[i]: every result comes back FALSE. It only works when I use dataset$variable.

Do you have any idea how to solve this?


Solution

Try str(dataset) to get summary information about an object. To solve your actual problem, though, you need to extract the column's contents with double square brackets. Single-bracket subsetting keeps the output as a sub-list (here, a one-column data.frame) rather than extracting the contents, so is.numeric() is asked about the data.frame, not the column:

str(iris)
'data.frame':   150 obs. of  5 variables:
 $ Sepal.Length: num  5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
 $ Sepal.Width : num  3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
 $ Petal.Length: num  1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
 $ Petal.Width : num  0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
 $ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
is.numeric(iris[1])
[1] FALSE
class(iris[1])
[1] "data.frame"
is.numeric(iris[[1]])
[1] TRUE
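Applied to the function from the question, the fix is to replace dataset[i] with dataset[[i]]. The sketch below assumes dataset is a data.frame; the name describe_columns is a hypothetical choice, and the loop uses seq_along(dataset) instead of 1:n so that a zero-column input does not loop over 1:0.

```r
# Minimal sketch: the question's loop, fixed to extract each column with [[ ]].
# Assumes `dataset` is a data.frame; `describe_columns` is a hypothetical name.
describe_columns <- function(dataset) {
  for (i in seq_along(dataset)) {
    column <- dataset[[i]]  # the column vector itself, not a one-column data.frame
    print(list(
      variable_name = names(dataset)[i],
      factor        = is.factor(column),
      ordered       = is.ordered(column),
      numeric       = is.numeric(column)
    ))
  }
  invisible(NULL)
}

describe_columns(iris)
```

With iris, the four measurement columns should now report numeric as TRUE and Species should report factor as TRUE.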

OTHER TIPS

Assuming that dataset is something like a data.frame, you can do the following (and avoid the loop):

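var_names = names(dataset)  # or colnames(dataset); sapply(dataset, names) would return each column's element names, usually NULL
types = sapply(dataset, function(x) class(x)[1])  # first class only: an ordered factor has class c("ordered", "factor")

Then types gives you the class of each column, e.g. "numeric", "integer", "factor" or "ordered". You can then simply do something like this:

is_factor = types %in% c('factor', 'ordered')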
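To get one row per variable without a loop, the same checks can be collected with vapply into a data.frame. This is a sketch assuming dataset is a data.frame; summarise_types is a hypothetical name. Calling is.factor, is.ordered and is.numeric directly avoids comparing class strings at all.

```r
# Minimal sketch: one row per column, no explicit loop.
# Assumes `dataset` is a data.frame; `summarise_types` is a hypothetical name.
summarise_types <- function(dataset) {
  data.frame(
    variable  = names(dataset),
    factor    = vapply(dataset, is.factor,  logical(1)),  # TRUE for ordered factors too
    ordered   = vapply(dataset, is.ordered, logical(1)),
    numeric   = vapply(dataset, is.numeric, logical(1)),  # TRUE for integer and double
    row.names = NULL
  )
}

summarise_types(iris)
```

For iris, this should report Species as the only factor column and the four measurement columns as numeric.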
Licensed under: CC-BY-SA with attribution