Question

In an existing project I have taken over, my variables are automatically converted to the data type character when I save them to a table or data frame. This happens because some of the vectors contain the string "Error", whilst others hold a number. Unfortunately, the numeric ones are also converted to characters when I create a table.

I have figured out that when I create a data.frame instead of a table, only the columns that contain text become characters, and the rest stay numeric. However, some vectors contain more elements than others (a few hold only one element, others two or three).

What I want to do is create a data.frame out of all these vectors, with the values of all the vectors in a single row. For instance, this happens:

x <- 1
y <- c("Error","Error")
data.frame(x,y)

  x y
1 1 Error
2 1 Error

I do not want two rows. The result I am looking for is:

x <- 1
y <- t(c("Error","Error"))
data.frame(x,y)

  x    X1    X2
1 1 Error Error

The first thing I thought of was to do:

> x <- 1
> y <- c("Error", "Error")
> newframe <- data.frame(t(c(x,y)))
> class(newframe$X1)
[1] "factor"

But unfortunately, combining the values with c() before transposing causes the elements of x to be converted to characters, and data.frame() then turns them into factors.

The trouble is, I do not want to apply t() to the multi-element vectors by hand, but would much rather have a way to do this automatically. What I have done for now is write a function that takes a list of variable names as input and transposes each of them individually. As my list of vectors is quite long, and I have to do this multiple times throughout the code, I cannot help but feel that there must be a more elegant way to do this. Is there?


Solution 2

You could do this:

x <- 1
y <- c("Error","Error")
df <- data.frame(c(list(), x, y), stringsAsFactors = FALSE)
str(df)
'data.frame':   1 obs. of  3 variables:
 $ X1        : num 1
 $ X.Error.  : chr "Error"
 $ X.Error..1: chr "Error"

You just have to set proper column names.
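As a minimal sketch of that last step, the column names can be assigned right after building the data frame. The names "x", "y1" and "y2" are only an illustration, not something from the original project:

```r
x <- 1
y <- c("Error", "Error")

# as.list() splits each vector into single elements, so every element
# becomes its own column and keeps its own type
df <- data.frame(c(as.list(x), as.list(y)), stringsAsFactors = FALSE)

# Illustrative column names (an assumption; choose whatever fits your data)
names(df) <- c("x", "y1", "y2")

str(df)
```

With many vectors, you could build the names programmatically instead of typing them out.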

OTHER TIPS

The problem is not linked to data.frame. An atomic vector in R can hold only one type, so you cannot have numeric and character values in the same vector. It is NOT possible: R coerces everything to character.

The person who started the project before you should not have used the string "Error" to indicate missing data. Instead, you should use NA:

x <- c(1, 2)
y <- c("Error", "Error")
c(x, y) # Here the result is coerced to character automatically by R. There is no way to avoid that.

Instead you should use

c(x,NA) # NA is accepted in a vector of numeric
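If the existing values already arrive as text, one possible way to get from "Error" to NA is as.numeric(), which returns NA for anything it cannot parse. This is only a sketch; the vector raw is made up for illustration:

```r
# Made-up input: numbers stored as text, with "Error" marking a failed value
raw <- c("1", "Error", "3")

# as.numeric() yields NA for strings it cannot parse; it also emits a
# warning, which is suppressed here because the NA is intended
clean <- suppressWarnings(as.numeric(raw))

is.numeric(clean) # TRUE
is.na(clean)      # FALSE TRUE FALSE
```

Note that this also turns any other unparseable string into NA, so check that "Error" is the only non-numeric value you expect.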

Note: you should think of a data.frame as a list of vectors, which are the columns of the data.frame. Hence, if you have 2 columns, each column is an independent vector, and each column can have a different class:

x <- c(1, 2)
y <- c("Error", "Error")
df <- data.frame(x = x, y = y, stringsAsFactors = FALSE)
class(df$x) # "numeric"
class(df$y) # "character"

Now if you try to transpose the data.frame, the new columns would have to hold c(1, "Error") and c(2, "Error"), so t() returns a matrix in which everything is coerced to character, as we have seen before.

t(df)
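
To check this yourself, you can inspect the type of the transposed result. This is a small sketch that rebuilds the same two-column data frame under the name df2, a name chosen only for this illustration:

```r
df2 <- data.frame(x = c(1, 2), y = c("Error", "Error"),
                  stringsAsFactors = FALSE)

# t() converts the data frame to a matrix first, and a matrix can hold
# only one type, so the numeric column is turned into text
m <- t(df2)

is.matrix(m) # TRUE
typeof(m)    # "character"
```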
Licensed under: CC-BY-SA with attribution