Question

I have a data frame where one particular column has a set of specific values (let's say, 1, 2, ..., 23). What I would like to do is to convert from this layout to the one, where the frame would have extra 23 (in this case) columns, each one representing one of the factor values. The data in these columns would be booleans indicating whether a particular row had a given factor value... To show a specific example:

Source frame:

ID       DATE         SECTOR
123      2008-01-01   1
456      2008-01-01   3
789      2008-01-02   5
... <more records with SECTOR values from 1 to 5>

Desired format:

ID       DATE         SECTOR.1   SECTOR.2   SECTOR.3   SECTOR.4   SECTOR.5
123      2008-01-01      T          F          F          F          F
456      2008-01-01      F          F          T          F          F
789      2008-01-02      F          F          F          F          T

I have no problem doing it in a loop but I hoped there would be a better way. So far reshape() didn't yield the desired result. Help would be much appreciated.

Was it helpful?

Solution

I would try to bind another column called "value" and set value = TRUE.

df <- data.frame(cbind(1:10, 2:11, 1:3))
colnames(df) <- c("ID","DATE","SECTOR")
df <- data.frame(df, value=TRUE)

Then do a reshape:

reshape(df, idvar=c("ID","DATE"), timevar="SECTOR", direction="wide")

The problem with using the reshape function is that the default for missing values is NA (in which case you will have to iterate and replace them with FALSE).

Otherwise you can use cast out of the reshape package (see this question for an example), and set the default to FALSE.

df.wide <- cast(df, ID + DATE ~ SECTOR, fill=FALSE)
> df.wide 
   ID DATE     1     2     3
1   1    2  TRUE FALSE FALSE
2   2    3 FALSE  TRUE FALSE
3   3    4 FALSE FALSE  TRUE
4   4    5  TRUE FALSE FALSE
5   5    6 FALSE  TRUE FALSE
6   6    7 FALSE FALSE  TRUE
7   7    8  TRUE FALSE FALSE
8   8    9 FALSE  TRUE FALSE
9   9   10 FALSE FALSE  TRUE
10 10   11  TRUE FALSE FALSE

OTHER TIPS

Here's another approach using xtabs which may or may not be faster (if someone would try and let me know):

df <- data.frame(cbind(1:12, 2:13, 1:3))
colnames(df) <- c("ID","DATE","SECTOR")
foo <- xtabs(~ paste(ID, DATE) + SECTOR, df)
cbind(t(matrix(as.numeric(unlist(strsplit(rownames(foo), " "))), nrow=2)), foo)
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top