Question

I have a table like

+------+---------+---------+---------+----------+---------+
| Code | Display | Synonym | Synonym | Synonym  | Synonym |
+------+---------+---------+---------+----------+---------+
|    1 | A       | Cat     | Dog     | Lion     |         |
|    2 | B       | Horse   | Penguin |          |         |
|    3 | C       | Donkey  | Giraffe | Mongoose | Rabbit  |
+------+---------+---------+---------+----------+---------+

I want to output a table like

+------+---------+----------+
| Code | Display | Synonym  |
+------+---------+----------+
|    1 | A       | Cat      |
|    1 | A       | Dog      |
|    1 | A       | Lion     |
|    2 | B       | Horse    |
|    2 | B       | Penguin  |
|    3 | C       | Donkey   |
|    3 | C       | Giraffe  |
|    3 | C       | Mongoose |
|    3 | C       | Rabbit   |
+------+---------+----------+

In other words, I want to pair Code and Display with each Synonym that is present; each Code can have one or more synonyms. I've seen examples of reshape used in other contexts, but I haven't been able to figure out how to apply it here.


Solution

Standard reshaping works on a ragged table like this one. With melt() from reshape2, the na.rm argument removes NAs as you go; otherwise you can remove them afterward:

library(reshape2)
dat.m <- melt(dat, id.vars = c("Code", "Display"), value.name = "Synonym", na.rm = TRUE)
#   Code Display  variable  Synonym
#1     1       A   Synonym      Cat
#2     2       B   Synonym    Horse
#3     3       C   Synonym   Donkey
#4     1       A Synonym.1      Dog
#5     2       B Synonym.1  Penguin
#6     3       C Synonym.1  Giraffe
#7     1       A Synonym.2     Lion
#9     3       C Synonym.2 Mongoose
#12    3       C Synonym.3   Rabbit

You can drop the variable column if you like:

dat.m$variable <- NULL
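
Note that na.rm = TRUE only drops true NA values. If the empty cells were read in as empty strings instead, they would need to be converted to NA first. Here is a minimal base R sketch of that step, assuming a hand-built data frame `dat` that mirrors the question's table; the sample data and the `syn_cols` helper name are assumptions for illustration, not part of the original answer:

```r
# Sample data mirroring the question; blanks are stored as "" here (an assumption).
dat <- data.frame(
  Code = 1:3,
  Display = c("A", "B", "C"),
  Synonym = c("Cat", "Horse", "Donkey"),
  Synonym.1 = c("Dog", "Penguin", "Giraffe"),
  Synonym.2 = c("Lion", "", "Mongoose"),
  Synonym.3 = c("", "", "Rabbit"),
  stringsAsFactors = FALSE
)

# Columns holding synonyms (everything except the two id columns).
syn_cols <- setdiff(names(dat), c("Code", "Display"))

# Replace empty strings with NA so that na.rm = TRUE can drop them later.
dat[syn_cols] <- lapply(dat[syn_cols], function(x) {
  x <- as.character(x)
  x[x %in% ""] <- NA
  x
})
```

After this step, the blank cells are NA, so a melt() call with na.rm = TRUE should be able to skip them.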

Other suggestions

Here are two base R approaches.

stack

cbind(mydf[1:2], stack(lapply(mydf[-c(1:2)], as.character)))
#    Code Display   values       ind
# 1     1       A      Cat   Synonym
# 2     2       B    Horse   Synonym
# 3     3       C   Donkey   Synonym
# 4     1       A      Dog Synonym.1
# 5     2       B  Penguin Synonym.1
# 6     3       C  Giraffe Synonym.1
# 7     1       A     Lion Synonym.2
# 8     2       B          Synonym.2
# 9     3       C Mongoose Synonym.2
# 10    1       A          Synonym.3
# 11    2       B          Synonym.3
# 12    3       C   Rabbit Synonym.3
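
The stacked result above still contains the blank entries and the `ind` column. One possible way to get from there to the table requested in the question is sketched below. It assumes the blanks are empty strings (as the output above suggests) and uses a hand-built `mydf`; the sample data is an assumption made for illustration:

```r
# Sample data mirroring the question; blanks are stored as "" (an assumption).
mydf <- data.frame(
  Code = 1:3,
  Display = c("A", "B", "C"),
  Synonym = c("Cat", "Horse", "Donkey"),
  Synonym.1 = c("Dog", "Penguin", "Giraffe"),
  Synonym.2 = c("Lion", "", "Mongoose"),
  Synonym.3 = c("", "", "Rabbit"),
  stringsAsFactors = FALSE
)

# Stack the synonym columns next to the (recycled) id columns.
long <- cbind(mydf[1:2], stack(lapply(mydf[-c(1:2)], as.character)))

# Keep only non-blank synonyms and drop the 'ind' column.
long <- long[long$values != "", c("Code", "Display", "values")]
names(long)[3] <- "Synonym"

# Sort by Code; ties keep their original (column) order.
long <- long[order(long$Code), ]
rownames(long) <- NULL
```

This leaves nine rows, one per Code/Display/Synonym combination.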

reshape

Make life easier by first renaming your columns to a pattern like "Synonym_1", "Synonym_2" and so on. Actually, R likes "Synonym.1", "Synonym.2" and so on better, because "." is the default sep in reshape(); with underscores you have to pass sep = "_" explicitly.

A <- grep("Synonym", names(mydf))
names(mydf)[A] <- paste0("Synonym_", seq_along(A))

Now, reshape...

reshape(mydf, direction = "long", varying = A, sep = "_")
#     Code Display time  Synonym id
# 1.1    1       A    1      Cat  1
# 2.1    2       B    1    Horse  2
# 3.1    3       C    1   Donkey  3
# 1.2    1       A    2      Dog  1
# 2.2    2       B    2  Penguin  2
# 3.2    3       C    2  Giraffe  3
# 1.3    1       A    3     Lion  1
# 2.3    2       B    3           2
# 3.3    3       C    3 Mongoose  3
# 1.4    1       A    4           1
# 2.4    2       B    4           2
# 3.4    3       C    4   Rabbit  3
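
For comparison, here is a sketch of the dot-separated variant mentioned above, which relies on reshape()'s default sep = "." and then trims the result down to the requested three columns. The hand-built `mydf` and the assumption that blanks are empty strings are for illustration only:

```r
# Sample data mirroring the question; blanks are stored as "" (an assumption).
mydf <- data.frame(
  Code = 1:3,
  Display = c("A", "B", "C"),
  Synonym = c("Cat", "Horse", "Donkey"),
  Synonym.1 = c("Dog", "Penguin", "Giraffe"),
  Synonym.2 = c("Lion", "", "Mongoose"),
  Synonym.3 = c("", "", "Rabbit"),
  stringsAsFactors = FALSE
)

# Rename every synonym column to Synonym.1 ... Synonym.4.
syn <- grep("Synonym", names(mydf))
names(mydf)[syn] <- paste0("Synonym.", seq_along(syn))

# With dot-separated names, no sep argument is needed.
long <- reshape(mydf, direction = "long", varying = syn)

# Drop blank synonyms and the helper columns 'time' and 'id'.
long <- long[long$Synonym != "", c("Code", "Display", "Synonym")]
long <- long[order(long$Code), ]
rownames(long) <- NULL
```

The cleanup at the end is the same idea as for any of the approaches here: filter out the blanks, keep the three columns of interest, and sort by Code.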

Shortly after asking the question, I figured out a somewhat indirect way to do this:

allergies_output <- reshape(allergies_input,
                            varying = list(grep('Synonym', names(allergies_input), value = TRUE)),
                            v.names = 'Synonym',
                            idvar = c('Code', 'Display'),
                            direction = 'long')

The result still has an extra time column and rows for the empty synonyms, but nothing that can't be fixed by dropping them afterward.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow