Question

I have a data frame with 10k rows and 500 columns. For each column, I want a count of each unique value in that column. E.g.

      Fruit    Vegetable  Meat 
1     Apple    Carrot     Steak
2     Apple    Potato     Chicken
3     Pear     Peas       Duck

Would produce:

Fruit;Apple;2;Pear;1
Vegetable;Carrot;1;Potato;1;Peas;1
Meat;Steak;1;Chicken;1;Duck;1

The Hmisc describe function produces this kind of analysis, but the output is so badly formatted as to be useless.

Thanks.


Solution

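# df is the example data frame (columns fruit, veg, meat). Each call appends one
# line to test.csv, so delete any old test.csv before re-running.
lapply(names(df), function(x) {
  tb <- table(df[[x]])   # counts of each unique value in column x
  write.table(x = paste(x, paste(names(tb), tb, collapse = ";", sep = ";"), sep = ";"),
              file = "test.csv", append = TRUE, quote = FALSE,
              row.names = FALSE, col.names = FALSE, sep = ";")
})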
#--------
fruit;Apple;2;Pear;1
veg;Carrot;1;Peas;1;Potato;1
meat;Chicken;1;Duck;1;Steak;1

You will also see a list of three NULLs printed at the console. That is lapply's return value (write.table returns NULL), and it is not written to the file. Writing tables and matrices to files is not a strong point of R. There is a write.matrix function in the MASS package. My initial effort with writeLines failed because it has no 'append' option, and I wasn't able to cobble together a connection call that would do the append.
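For what it's worth, writeLines can append if you open the connection yourself with file(..., open = "a"). A minimal sketch, assuming the three-column example data frame used in this answer; out_file is a hypothetical temporary path standing in for test.csv:

```r
# Example data frame from this answer (columns fruit, veg, meat)
df <- data.frame(fruit = c("Apple", "Apple", "Pear"),
                 veg   = c("Carrot", "Potato", "Peas"),
                 meat  = c("Steak", "Chicken", "Duck"))

# Hypothetical output path; a fresh temp file avoids appending to old results
out_file <- tempfile(fileext = ".csv")

# open = "a" opens the file connection in append mode
con <- file(out_file, open = "a")
for (nm in names(df)) {
  tb <- table(df[[nm]])  # counts of each unique value in the column
  # one "name;value;count;value;count..." line per column
  writeLines(paste(nm, paste(names(tb), tb, sep = ";", collapse = ";"),
                   sep = ";"), con)
}
close(con)

readLines(out_file)
```

Because the connection stays open for the whole loop, open = "w" would work just as well here; append mode only matters when adding to a file written by an earlier call.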

The other gotcha in R is that processing a list (and, by inheritance, a data frame) with apply/lapply/sapply does not pass the names of the list elements (the column names, for data frames) to the function, so "write" functions would not have the names internally for writing to a file. That is why I worked with names(df) rather than just df.
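Another way around the missing names is Map() (a wrapper around mapply()), which iterates over several arguments in parallel, so each column and its name arrive at the function together. A minimal sketch on the same example data; the variable name lines is only illustrative:

```r
df <- data.frame(fruit = c("Apple", "Apple", "Pear"),
                 veg   = c("Carrot", "Potato", "Peas"),
                 meat  = c("Steak", "Chicken", "Duck"))

# Map() walks df's columns and names(df) side by side,
# so the function gets both the column and its name
lines <- unlist(Map(function(col, nm) {
  tb <- table(col)
  paste(nm, paste(names(tb), tb, sep = ";", collapse = ";"), sep = ";")
}, df, names(df)), use.names = FALSE)

# Prints to the console; pass a file path as the second argument to write a file
writeLines(lines)
```

Building all lines first and writing them in one call also avoids the append question entirely.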

As a further note, there are probably JSON-writing functions out there and they might be more reliable. I'll take a look and report back.

There is the RJSONIO package:

> require(RJSONIO)
Loading required package: RJSONIO
> toJSON(df)
[1] "{\n \"fruit\": [ \"Apple\", \"Apple\", \"Pear\" ],\n\"veg\": [ \"Carrot\", \"Potato\", \"Peas\" ],\n\"meat\": [ \"Steak\", \"Chicken\", \"Duck\" ] \n}"

OTHER TIPS

You could run the table function over each column with apply.

For instance:

fruit <- c("Apple", "Apple", "Pear")
veg <- c("Carrot", "Potato", "Peas")
meat <- c("Steak", "Chicken", "Duck")
df <- data.frame(fruit, veg, meat)

apply(df, 2, table)

$fruit

Apple  Pear 
    2     1 

$veg

Carrot   Peas Potato 
     1      1      1 

$meat

Chicken    Duck   Steak 
      1       1       1
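One caveat with apply here: it first coerces the data frame to a matrix, and when every column happens to have the same number of distinct values, it simplifies the per-column tables into a matrix instead of returning a list. lapply(df, table) always returns a named list with one table per column. A small sketch with a made-up data frame (df2 is hypothetical) that triggers the difference:

```r
# Hypothetical data: both columns have exactly three distinct values
df2 <- data.frame(a = c("x", "y", "z"), b = c("p", "q", "r"))

by_apply  <- apply(df2, 2, table)  # equal-length results get simplified to a matrix
by_lapply <- lapply(df2, table)    # always a named list of tables

is.matrix(by_apply)
is.list(by_lapply)
by_lapply$b
```

In the fruit/veg/meat example the columns have 2, 3 and 3 distinct values, which is why apply happened to return a list there.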
Licensed under: CC-BY-SA with attribution