R: loop to add to table new columns, each populated with data aggregated from different csv file

https://stackoverflow.com/questions/11410584

20-06-2021
|

Question

I am struggling to work out how to do this in R. I have data like this from a set of ~50 csv files, each detailing an individual books sale transaction:

**week 1**
**author** **title** **customerID**
author1 title1 1
author1 title2 2
author2 title3 3
author3 title4 3

**week 2**
**author** **title** **customerID**
author1 title1 4
author3 title4 5
author4 title5 1
author5 title6 6

... ~ 50 weeks, each from a separate csv file

I want to get a new table, each row representing an author that appears in the complete data set, and with columns for each of the ~50 weeks that I have data for. Each cell should be the number of book sales of that author in that week. That can be calculated simply from summing the number of rows with that author in that week's sales file. So it should look something like this:

**author** **week1** **week2** ... **week50**
author1 2 1 ...
author2 1 0 ...
author3 1 1 ...
author4 0 1 ...
author5 0 1 ...
...

Any ideas? I know how to get the list of unique authors to make the first column from. And I know how to load each week's sale data into a data frame. But I need help automating this process: 1) iterating over the unique authors 2) iterating over each week's csv file or data frame 3) summing the sales for that author in that week 4) adding count as the value for that cell

Could anyone help? Thanks :-)

Solution

text1<-"**week 1**
**author** **title** **customerID**
author1 title1 1
author1 title2 2
author2 title3 3
author3 title4 3
"

df1<-read.table(header=T,skip=1,stringsAsFactors=F,text=text1)
week1<-read.table(header=F,nrows=1,stringsAsFactors=F,text=text1,sep=";")
week1<-substr(week1,3,nchar(week1)-2)
df1$week<-rep(week1,nrow(df1))

text2<-"**week 2**
**author** **title** **customerID**
author1 title1 4
author3 title4 5
author4 title5 1
author5 title6 6
"

df2<-read.table(header=T,skip=1,stringsAsFactors=F,text=text2)
week2<-read.table(header=F,nrows=1,stringsAsFactors=F,text=text2,sep=";")
week2<-substr(week2,3,nchar(week2)-2)
df2$week<-rep(week2,nrow(df2))

df<-rbind(df1,df2)
names(df)<-c("author","title","customerID","week")

require(plyr)
agg<-ddply(df,~author*week,function(df) length(df$title))


require(reshape)
res<-cast(agg,author~week,value="V1",fill=0)
res

   author week 1 week 2
1 author1      2      1
2 author2      1      0
3 author3      1      1
4 author4      0      1
5 author5      0      1

You only need a loop to read in your data. For that you can use something like

ff<-list.files(pattern="*.[Cc][Ss][Vv]")
for (i in 1:length(ff)){
  code for reading the data 
  and constructing the data.frame 
}

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow