Question

After scraping a page with RCurl, I used readHTMLTable from the XML package and now have a list of 100 data frames, each with 40 observations of two variables. I would like to combine them into a single data frame with 100 rows and 40 columns, where the values in the first column of each data frame become the column names. This is as close as I can get to a minimal working example (MWE); note that every data frame in my actual list is named NULL:

description <- c("name", "location", "age")
value <- c("mike", "florida", "25")
df1 <- data.frame(description, value)
description <- c("name", "location", "tenure")
value <- c("jim", "new york", "5")
df2 <- data.frame(description, value)
list <- list(df1, df2)

# list output
[[1]]
  description   value
1        name    mike
2    location florida
3         age      25

[[2]]
  description    value
1        name      jim
2    location new york
3      tenure        5

Here is the general output I'm hoping to achieve:

library(reshape2)
listm <- melt(list)
dcast(listm, L1 ~ description)
# dcast output
  L1  age location name tenure
1  1   25  florida mike   <NA>
2  2 <NA> new york  jim      5

My problem, which I don't know how to reproduce in the MWE, is that every data frame in my real list is named NULL, so there is no unique identifier by which to cast the data.
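One way to mimic this in the MWE is sketched below. It assumes that "named NULL" means every list element carries the literal name "NULL" (an assumption about the real readHTMLTable result that the MWE cannot confirm), and it calls the list ll rather than list so that base::list() is not masked. With duplicated names, any identifier derived from the names is the same for every row:

```r
# MWE data; stringsAsFactors = FALSE keeps the columns character on older R too
df1 <- data.frame(description = c("name", "location", "age"),
                  value = c("mike", "florida", "25"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(description = c("name", "location", "tenure"),
                  value = c("jim", "new york", "5"),
                  stringsAsFactors = FALSE)
ll <- list(df1, df2)

# Assumption: every element of the real list carries the name "NULL"
names(ll) <- rep("NULL", length(ll))

# An identifier taken from the names is identical for every row,
# so it cannot tell the two tables apart when casting
id_from_names <- rep(names(ll), sapply(ll, nrow))
print(table(id_from_names))
```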

How can I deal with this issue in reshape2 and/or plyr?


Solution

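You can build the L1 column yourself: use rep to repeat each data frame's position in the list once for every row of that data frame. Because this identifier comes from list position rather than from the names, the NULL names no longer matter. Then it's straightforward to cast: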

# ll is your list of data.frames
ll.df <- cbind(L1 = rep(seq_along(ll), sapply(ll, nrow)), do.call(rbind, ll))

library(reshape2)
dcast(ll.df, L1 ~ description, value.var = "value")
  L1  age location name tenure
1  1   25  florida mike   <NA>
2  2 <NA> new york  jim      5
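If reshape2 is not available, a base-R sketch can produce the same wide layout. This swaps in a different technique from the answer: the hypothetical helper list_to_wide() collects every description across the list, sorts them alphabetically (matching the column order in the dcast output above), and fills one row per data frame with NA where a field is absent. Like the answer, it identifies rows by position, so the NULL names are ignored. It assumes each description occurs at most once per data frame; a repeated one would simply overwrite the earlier value.

```r
# Hypothetical helper: one row per data frame, one column per description
list_to_wide <- function(ll) {
  ll <- unname(ll)  # drop the (possibly all "NULL") names; rows are identified by position
  # Every field name seen in any data frame, sorted alphabetically
  keys <- sort(unique(unlist(lapply(ll, function(d) as.character(d$description)))))
  # For each data frame, a named character vector with NA for absent fields
  rows <- lapply(ll, function(d) {
    row <- setNames(rep(NA_character_, length(keys)), keys)
    row[as.character(d$description)] <- as.character(d$value)
    row
  })
  # Stack the vectors into a matrix (data frames x fields), then a data frame
  out <- as.data.frame(do.call(rbind, rows), stringsAsFactors = FALSE)
  cbind(L1 = seq_along(ll), out)
}

# MWE data, with the list named as in the answer and every element named "NULL"
df1 <- data.frame(description = c("name", "location", "age"),
                  value = c("mike", "florida", "25"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(description = c("name", "location", "tenure"),
                  value = c("jim", "new york", "5"),
                  stringsAsFactors = FALSE)
ll <- list(df1, df2)
names(ll) <- rep("NULL", length(ll))

wide <- list_to_wide(ll)
print(wide)
```

With the MWE data this yields the columns L1, age, location, name and tenure, with NA for Mike's tenure and for Jim's age.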
Licensed under: CC-BY-SA with attribution