Question

I have a data frame that maps some ids to a list of versions:

id versions
 1  1, 2, 4
 2        1
 3     3, 4

It can be created with the following code:

df <- data.frame(id=c(1, 2, 3),
  versions=c("1 2 4", "1", "3 4"),
  stringsAsFactors=FALSE)
df$versions <- strsplit(df$versions, " ")

Notice that versions is a list column: each of its elements is a character vector.

How can I normalize this data frame? I need to get a data frame like this:

id version
 1       1
 1       2
 1       4
 2       1
 3       3
 3       4

Solution

stack would be perfect for this:

stack(setNames(df$versions, df$id))
#   values ind
# 1      1   1
# 2      2   1
# 3      4   1
# 4      1   2
# 5      3   3
# 6      4   3
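
The result of stack has its columns named values and ind, in the opposite order from the layout asked for, and ind is a factor. A minimal sketch of reshaping it into the requested form, assuming the df from the question (the names long and result are only illustrative):

```r
# Rebuild the example data frame from the question.
df <- data.frame(id = c(1, 2, 3),
                 versions = c("1 2 4", "1", "3 4"),
                 stringsAsFactors = FALSE)
df$versions <- strsplit(df$versions, " ")

# stack() yields a `values` column and an `ind` column (a factor of the names).
long <- stack(setNames(df$versions, df$id))

# Reorder, rename and convert both columns to numeric.
# Going through as.character() avoids converting factor codes by mistake.
result <- data.frame(id = as.numeric(as.character(long$ind)),
                     version = as.numeric(as.character(long$values)))
result
```

The numeric conversion is optional; skip it if the versions should stay as strings.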

Other suggestions

I adapted and simplified the solution from another SO question for future reference:

data.frame(id = rep(df$id, sapply(df$versions, length)),
      version = unlist(df$versions))

The new id column is computed by repeating each id once per version it has (i.e., the length of the corresponding element of the versions list). The new version column is computed with unlist, which concatenates all elements of the list into a single vector.
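
The same idea can be sketched with lengths, which has been in base R since version 3.2.0 and returns the length of each list element directly. This is a variation rather than the answer's original code, and the names n and result are illustrative:

```r
# Rebuild the example data frame from the question.
df <- data.frame(id = c(1, 2, 3),
                 versions = c("1 2 4", "1", "3 4"),
                 stringsAsFactors = FALSE)
df$versions <- strsplit(df$versions, " ")

# Number of versions per id.
n <- lengths(df$versions)

# Repeat each id n times and flatten the list column.
# strsplit() produces strings, so convert if numeric versions are wanted.
result <- data.frame(id = rep(df$id, times = n),
                     version = as.numeric(unlist(df$versions)))
result
```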

Licensed under: CC-BY-SA with attribution