Question

I have a data set in which most of the columns are factors:

> str(gdp)
'data.frame':   64 obs. of  31 variables:
 $ 1 : Factor w/ 62 levels "","1,145.31",..: 1 1 1 53 16 20 22 24 30 32 ...
 $ 2 : Factor w/ 64 levels "1,121.93","1,264.63",..: 42 59 10 13 18 16 17 23 25 35 ...
 $ 3 : Factor w/ 62 levels "","1,072.07",..: 1 1 1 35 36 39 41 42 45 51 ...
 $ 4 : Factor w/ 62 levels "","1,076.03",..: 1 1 1 15 16 21 23 26 27 36 ...
 $ 5 : Factor w/ 62 levels "","1,023.09",..: 1 1 1 11 15 19 17 23 21 27 ...
 $ 6 : Factor w/ 62 levels "","1,003.81",..: 1 1 1 40 45 46 47 52 56 7 ...
 $ 7 : Factor w/ 62 levels "","1,137.23",..: 1 1 1 13 15 19 21 23 24 28 ...
 $ 8 : Factor w/ 62 levels "","1,198.30",..: 1 1 1 26 31 34 35 39 40 47 ...
 $ 9 : Factor w/ 64 levels "1,114.32","1,519.23",..: 27 30 36 41 49 51 50 54 56 64 ...
 $ 10: Factor w/ 62 levels "","1,208.85",..: 1 1 1 35 39 40 42 45 46 53 ...
 $ 11: Factor w/ 64 levels "","1,089.33",..: 1 11 17 20 23 24 26 29 31 37 ...
 $ 12: Factor w/ 62 levels "","1,037.14",..: 1 1 1 22 23 25 31 30 36 41 ...
 $ 13: Factor w/ 63 levels "","1,114.20",..: 1 63 1 8 11 12 14 20 22 27 ...
 $ 14: Factor w/ 64 levels "1,169.73","1,409.74",..: 63 12 14 16 17 22 24 25 28 30 ...
 $ 15: Factor w/ 62 levels "","1,117.66",..: 1 1 1 33 35 39 40 44 43 53 ...
 $ 16: Factor w/ 63 levels "","1,045.73",..: 21 1 1 30 35 38 41 42 47 50 ...
 $ 17: Factor w/ 62 levels "","1,088.39",..: 1 1 1 24 32 26 34 38 40 48 ...
 $ 18: Factor w/ 62 levels "","1,244.71",..: 1 1 1 24 30 31 33 34 38 44 ...
 $ 19: Factor w/ 62 levels "","1,155.37",..: 1 1 1 25 34 37 38 41 44 48 ...
 $ 20: Factor w/ 64 levels "","1,198.29",..: 1 63 8 11 15 17 18 20 26 30 ...
 $ 21: Factor w/ 36 levels "","1,065.67",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ 22: Factor w/ 64 levels "1,123.06","1,315.12",..: 12 14 15 17 22 23 24 26 27 40 ...
 $ 23: Factor w/ 62 levels "","1,016.31",..: 1 1 1 22 25 31 33 38 43 49 ...
 $ 24: Factor w/ 64 levels "1,029.92","1,133.27",..: 52 53 57 60 6 8 9 12 13 22 ...
 $ 25: Factor w/ 64 levels "1,222.15","1,517.69",..: 60 62 7 8 12 14 15 21 22 25 ...
 $ 26: num  NA NA 1.29 1.32 1.36 1.39 1.43 1.62 1.56 1.72 ...
 $ 27: Factor w/ 62 levels "","1,036.85",..: 1 1 1 12 16 21 22 27 25 33 ...
 $ 28: Factor w/ 61 levels "","1,052.88",..: 1 1 1 12 13 17 18 24 23 26 ...
 $ 29: Factor w/ 64 levels "1,018.62","1,081.27",..: 6 7 8 9 10 26 27 34 35 43 ...
 $ 30: Factor w/ 62 levels "","1,203.92",..: 1 1 1 6 5 21 22 23 24 32 ...
 $ 31: Factor w/ 62 levels "","1,039.85",..: 1 1 1 57 59 9 11 13 14 16 ...

I want to convert all of the columns to numeric while preserving the full values, including the decimal places. So far I have tried converting the columns to character and then to numeric, as suggested elsewhere on SO, but I get:

> gdp<-data.frame(lapply(gdp,as.character))
> gdp<-data.frame(lapply(gdp,as.numeric))
> str(gdp)
'data.frame':   64 obs. of  31 variables:
 $ X1 : num  1 1 1 53 16 20 22 24 30 32 ...
 $ X2 : num  42 59 10 13 18 16 17 23 25 35 ...
 $ X3 : num  1 1 1 35 36 39 41 42 45 51 ...
 $ X4 : num  1 1 1 15 16 21 23 26 27 36 ...
 $ X5 : num  1 1 1 11 15 19 17 23 21 27 ...
 $ X6 : num  1 1 1 40 45 46 47 52 56 7 ...
 $ X7 : num  1 1 1 13 15 19 21 23 24 28 ...
 $ X8 : num  1 1 1 26 31 34 35 39 40 47 ...
 $ X9 : num  27 30 36 41 49 51 50 54 56 64 ...
 $ X10: num  1 1 1 35 39 40 42 45 46 53 ...
 $ X11: num  1 11 17 20 23 24 26 29 31 37 ...
 $ X12: num  1 1 1 22 23 25 31 30 36 41 ...
 $ X13: num  1 63 1 8 11 12 14 20 22 27 ...
 $ X14: num  63 12 14 16 17 22 24 25 28 30 ...
 $ X15: num  1 1 1 33 35 39 40 44 43 53 ...
 $ X16: num  21 1 1 30 35 38 41 42 47 50 ...
 $ X17: num  1 1 1 24 32 26 34 38 40 48 ...
 $ X18: num  1 1 1 24 30 31 33 34 38 44 ...
 $ X19: num  1 1 1 25 34 37 38 41 44 48 ...
 $ X20: num  1 63 8 11 15 17 18 20 26 30 ...
 $ X21: num  1 1 1 1 1 1 1 1 1 1 ...
 $ X22: num  12 14 15 17 22 23 24 26 27 40 ...
 $ X23: num  1 1 1 22 25 31 33 38 43 49 ...
 $ X24: num  52 53 57 60 6 8 9 12 13 22 ...
 $ X25: num  60 62 7 8 12 14 15 21 22 25 ...
 $ X26: num  NA NA 1 2 3 4 5 7 6 8 ...
 $ X27: num  1 1 1 12 16 21 22 27 25 33 ...
 $ X28: num  1 1 1 12 13 17 18 24 23 26 ...
 $ X29: num  6 7 8 9 10 26 27 34 35 43 ...
 $ X30: num  1 1 1 6 5 21 22 23 24 32 ...
 $ X31: num  1 1 1 57 59 9 11 13 14 16 ...

This preserves neither the original values nor their decimal places, and the blanks are not converted to NA. I've also tried:

> gdp<-as.numeric(levels(gdp))[gdp]
Error in as.numeric(levels(gdp))[gdp] : invalid subscript type 'list'

Is there a way to convert these columns to numeric?

Solution

Let's break this down.

First, because gdp is a data frame rather than a factor, levels(gdp) returns NULL. You are probably looking for the levels of each column of gdp, in which case you'd want to use something like lapply.

levels(gdp)
# NULL
lapply(gdp, levels)
# a list holding the levels of each column (NULL for non-factor columns)
as.numeric(levels(gdp))[gdp]
# Error: invalid subscript type 'list'

The error is stating that you cannot use a list (gdp) to subscript a vector.
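For a single factor, as opposed to a whole data frame, the idiom does work. The sketch below uses a made-up factor f (the values are hypothetical, chosen to resemble one column of gdp) to show the difference between calling as.numeric on the factor itself and on its levels. It also hints at why the character round trip in the question produced level codes: in versions of R before 4.0.0, data.frame() converts character columns back to factors by default (stringsAsFactors = TRUE), so the second lapply was again applied to factors.

```r
# Hypothetical factor resembling one column of gdp
f <- factor(c("", "1,145.31", "987.5", "1,145.31"))

# as.numeric() on a factor returns the integer codes of its levels
codes <- as.numeric(f)

# Strip the thousands separators from the levels and convert them;
# the empty level "" becomes NA
clean <- as.numeric(gsub(",", "", levels(f), fixed = TRUE))

# Indexing by the factor uses its integer codes, mapping each
# observation to its converted level
vals <- clean[f]
vals
```

Here codes holds positions in the level set, while vals holds the actual numbers, with NA where the original entry was blank.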

To iterate through the columns of gdp, you will need something like lapply to work on each component.

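gdp <- data.frame(lapply(gdp, function(x) {
    # Non-factor columns (such as column 26) are returned unchanged.
    # For factors: strip the thousands separators from the levels, convert
    # the levels to numeric, then index them by the factor's integer codes.
    if (!is.factor(x)) x
    else as.numeric(gsub(",", "", levels(x), fixed = TRUE))[x]
}))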
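To see the conversion end to end, here is a minimal sketch on a small stand-in data frame. The names toy and to_num and all the values are hypothetical, not taken from the question's data. It assigns with toy[] <- lapply(...) instead of wrapping the result in data.frame(); that is a variation on the code above which keeps the existing column names as they are.

```r
# Hypothetical stand-in for gdp: two factor columns and one numeric column
toy <- data.frame(
  a = factor(c("", "1,145.31", "987.50")),
  b = factor(c("1,121.93", "2,000.00", "15.25")),
  c = c(NA, 1.29, 1.32)
)

# Convert one column: factors go through their levels, anything else is kept
to_num <- function(x) {
  if (!is.factor(x)) return(x)
  as.numeric(gsub(",", "", levels(x), fixed = TRUE))[x]
}

# toy[] replaces the columns in place and preserves the column names
toy[] <- lapply(toy, to_num)
str(toy)
```

After this, every column of toy is numeric, the blank entry in column a is NA, and the numeric column c is unchanged.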

Once every column has been converted, your data set might be better served as a matrix, since it is then entirely of type numeric. In which case:

gdp <- as.matrix(gdp)
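
The order of the steps matters here. As a small illustration with hypothetical data frames df_fac and df_num, as.matrix() yields a character matrix when any column is still a factor, and a numeric matrix only when all columns are numeric:

```r
# Hypothetical data frame that still contains a factor column
df_fac <- data.frame(a = factor(c("1,145.31", "987.50")), b = c(1.29, 1.32))
m_chr <- as.matrix(df_fac)
is.character(m_chr)  # TRUE: the factor column forces a character matrix

# The same shape of data after conversion to numeric
df_num <- data.frame(a = c(1145.31, 987.50), b = c(1.29, 1.32))
m_num <- as.matrix(df_num)
is.numeric(m_num)    # TRUE
```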
Licensed under: CC-BY-SA with attribution