Question

I have a table like

+------+---------+---------+---------+----------+---------+
| Code | Display | Synonym | Synonym | Synonym  | Synonym |
+------+---------+---------+---------+----------+---------+
|    1 | A       | Cat     | Dog     | Lion     |         |
|    2 | B       | Horse   | Penguin |          |         |
|    3 | C       | Donkey  | Giraffe | Mongoose | Rabbit  |
+------+---------+---------+---------+----------+---------+

I want to output a table like

+------+---------+----------+
| Code | Display | Synonym  |
+------+---------+----------+
|    1 | A       | Cat      |
|    1 | A       | Dog      |
|    1 | A       | Lion     |
|    2 | B       | Horse    |
|    2 | B       | Penguin  |
|    3 | C       | Donkey   |
|    3 | C       | Giraffe  |
|    3 | C       | Mongoose |
|    3 | C       | Rabbit   |
+------+---------+----------+

In other words, I want to pair Code and Display with each Synonym that is present; each Code can have one to several synonyms. I've seen examples of reshape used in other contexts, but haven't been able to figure out how to apply it here.


Solution

You can use standard reshaping on a ragged array. With melt() from reshape2, passing na.rm = TRUE drops the NA values as you go; otherwise you can remove them afterward:

library(reshape2)
dat.m <- melt(dat, id.vars = c("Code", "Display"), value.name = "Synonym", na.rm = TRUE)
#   Code Display  variable  Synonym
#1     1       A   Synonym      Cat
#2     2       B   Synonym    Horse
#3     3       C   Synonym   Donkey
#4     1       A Synonym.1      Dog
#5     2       B Synonym.1  Penguin
#6     3       C Synonym.1  Giraffe
#7     1       A Synonym.2     Lion
#9     3       C Synonym.2 Mongoose
#12    3       C Synonym.3   Rabbit

You can drop the variable column if you like:

dat.m$variable <- NULL
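
One caveat worth sketching: na.rm = TRUE only removes true NA values, so if the blank cells were read in as empty strings they would survive the melt. The following is a minimal base R sketch, assuming a hypothetical reconstruction of the question's table in which the blanks are "" and the duplicate column names have been made unique as Synonym, Synonym.1, and so on. It converts the blanks to NA before reshaping.

```r
# Hypothetical sample data mirroring the question's table; blank cells are
# empty strings here (an assumption, the original data may already hold NA).
dat <- data.frame(
  Code      = 1:3,
  Display   = c("A", "B", "C"),
  Synonym   = c("Cat", "Horse", "Donkey"),
  Synonym.1 = c("Dog", "Penguin", "Giraffe"),
  Synonym.2 = c("Lion", "", "Mongoose"),
  Synonym.3 = c("", "", "Rabbit"),
  stringsAsFactors = FALSE
)

# Turn empty strings into NA in the synonym columns only.
syn_cols <- grep("^Synonym", names(dat))
dat[syn_cols] <- lapply(dat[syn_cols], function(x) {
  x <- as.character(x)
  x[!nzchar(x)] <- NA  # nzchar(NA) is TRUE, so existing NAs are left alone
  x
})

print(dat)
```

After this step, the melt() call above with na.rm = TRUE should drop the three blank cells.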

OTHER TIPS

Here are two base R approaches.

stack

cbind(mydf[1:2], stack(lapply(mydf[-c(1:2)], as.character)))
#    Code Display   values       ind
# 1     1       A      Cat   Synonym
# 2     2       B    Horse   Synonym
# 3     3       C   Donkey   Synonym
# 4     1       A      Dog Synonym.1
# 5     2       B  Penguin Synonym.1
# 6     3       C  Giraffe Synonym.1
# 7     1       A     Lion Synonym.2
# 8     2       B          Synonym.2
# 9     3       C Mongoose Synonym.2
# 10    1       A          Synonym.3
# 11    2       B          Synonym.3
# 12    3       C   Rabbit Synonym.3
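
The stacked result still contains the blank rows and uses the generic column names values and ind. A minimal sketch of the cleanup, assuming a hypothetical mydf rebuilt from the question's table with empty strings for the blank cells:

```r
# Hypothetical sample data mirroring the question's table (blanks as "").
mydf <- data.frame(
  Code      = 1:3,
  Display   = c("A", "B", "C"),
  Synonym   = c("Cat", "Horse", "Donkey"),
  Synonym.1 = c("Dog", "Penguin", "Giraffe"),
  Synonym.2 = c("Lion", "", "Mongoose"),
  Synonym.3 = c("", "", "Rabbit"),
  stringsAsFactors = FALSE
)

# Same stack() call as above: the id columns are recycled down the stack.
long <- cbind(mydf[1:2], stack(lapply(mydf[-c(1:2)], as.character)))

# Keep only non-empty synonyms and drop the "ind" column.
long <- long[nzchar(long$values), c("Code", "Display", "values")]
names(long)[3] <- "Synonym"

# Sort by Code; ties keep their original (column) order.
long <- long[order(long$Code), ]
rownames(long) <- NULL

print(long)
```

If the blanks are NA rather than "", the filter would be !is.na(long$values) instead of nzchar(long$values).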

reshape

Make life easier by first renaming your columns to a consistent pattern such as "Synonym_1", "Synonym_2", and so on. reshape() actually prefers "Synonym.1", "Synonym.2", and so on, because "." is its default separator; with "_" you have to pass sep = "_" yourself.

A <- grep("Synonym", names(mydf))
names(mydf)[A] <- paste0("Synonym_", seq_along(A))

Now, reshape...

reshape(mydf, direction = "long", varying = A, sep = "_")
#     Code Display time  Synonym id
# 1.1    1       A    1      Cat  1
# 2.1    2       B    1    Horse  2
# 3.1    3       C    1   Donkey  3
# 1.2    1       A    2      Dog  1
# 2.2    2       B    2  Penguin  2
# 3.2    3       C    2  Giraffe  3
# 1.3    1       A    3     Lion  1
# 2.3    2       B    3           2
# 3.3    3       C    3 Mongoose  3
# 1.4    1       A    4           1
# 2.4    2       B    4           2
# 3.4    3       C    4   Rabbit  3
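
To get from this output to the table requested in the question, the blank rows and the time and id helper columns still have to go. A minimal sketch, again assuming a hypothetical mydf rebuilt from the question's table with empty strings for the blank cells:

```r
# Hypothetical sample data mirroring the question's table (blanks as "").
mydf <- data.frame(
  Code      = 1:3,
  Display   = c("A", "B", "C"),
  Synonym   = c("Cat", "Horse", "Donkey"),
  Synonym.1 = c("Dog", "Penguin", "Giraffe"),
  Synonym.2 = c("Lion", "", "Mongoose"),
  Synonym.3 = c("", "", "Rabbit"),
  stringsAsFactors = FALSE
)

# Rename and reshape exactly as above.
A <- grep("Synonym", names(mydf))
names(mydf)[A] <- paste0("Synonym_", seq_along(A))
long <- reshape(mydf, direction = "long", varying = A, sep = "_")

# Drop blank synonyms and the helper columns "time" and "id".
long <- long[nzchar(long$Synonym), c("Code", "Display", "Synonym")]

# Sort by Code and reset the "1.1"-style row names.
long <- long[order(long$Code), ]
rownames(long) <- NULL

print(long)
```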

I figured out a somewhat indirect way to do this shortly after asking the question:

allergies_output <- reshape(allergies_input,
                            varying = list(grep('Synonym', names(allergies_input), value = TRUE)),
                            v.names = 'Synonym',
                            idvar = c('Code', 'Display'),
                            direction = 'long')

The output still includes an extra time column and rows for the missing synonyms, but nothing that can't be fixed by dropping them afterward.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow