Finding the most frequent item using bigmemory techniques and parallel computing? [closed]

StackOverflow https://stackoverflow.com/questions/23572524

  •  19-07-2023

Question

How can I find which months have the most frequent delays without using regression? The following CSV is a sample of a 100 MB file. I know I should use bigmemory techniques, but I wasn't sure how to approach this. Here the months are stored as integers, not as factors.

Year,Month,DayofMonth,DayOfWeek,DepTime,CRSDepTime,ArrTime,CRSArrTime,UniqueCarrier,FlightNum,TailNum,ActualElapsedTime,CRSElapsedTime,AirTime,ArrDelay,DepDelay,Origin,Dest,Distance,TaxiIn,TaxiOut,Cancelled,CancellationCode,Diverted,CarrierDelay,WeatherDelay,NASDelay,SecurityDelay,LateAircraftDelay
2006,1,11,3,743,745,1024,1018,US,343,N657AW,281,273,223,6,-2,ATL,PHX,1587,45,13,0,,0,0,0,0,0,0
2006,1,11,3,1053,1053,1313,1318,US,613,N834AW,260,265,214,-5,0,ATL,PHX,1587,27,19,0,,0,0,0,0,0,0
2006,1,11,3,1915,1915,2110,2133,US,617,N605AW,235,258,220,-23,0,ATL,PHX,1587,4,11,0,,0,0,0,0,0,0
2006,1,11,3,1753,1755,1925,1933,US,300,N312AW,152,158,126,-8,-2,AUS,PHX,872,16,10,0,,0,0,0,0,0,0
2006,1,11,3,824,832,1015,1015,US,765,N309AW,171,163,132,0,-8,AUS,PHX,872,27,12,0,,0,0,0,0,0,0
2006,1,11,3,627,630,834,832,US,295,N733UW,127,122,108,2,-3,BDL,CLT,644,6,13,0,,0,0,0,0,0,0
2006,1,11,3,825,820,1041,1021,US,349,N177UW,136,121,111,20,5,BDL,CLT,644,4,21,0,,0,0,0,20,0,0
2006,1,11,3,942,945,1155,1148,US,356,N404US,133,123,121,7,-3,BDL,CLT,644,4,8,0,,0,0,0,0,0,0
2006,1,11,3,1239,1245,1438,1445,US,775,N722UW,119,120,103,-7,-6,BDL,CLT,644,4,12,0,,0,0,0,0,0,0
2006,1,11,3,1642,1645,1841,1845,US,1002,N104UW,119,120,105,-4,-3,BDL,CLT,644,4,10,0,,0,0,0,0,0,0
2006,1,11,3,1836,1835,NA,2035,US,1103,N425US,NA,120,NA,NA,1,BDL,CLT,644,0,17,0,,1,0,0,0,0,0
2006,1,11,3,NA,1725,NA,1845,US,69,0,NA,80,NA,NA,NA,BDL,DCA,313,0,0,1,A,0,0,0,0,0,0

Solution

Let's say your data.frame is called dd. To see the total weather delay for each month across all years, you can do:

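# Sum WeatherDelay within each month (rows with NA in either column are dropped by default)
delay <- aggregate(WeatherDelay ~ Month, dd, sum)
# Sort the months from the largest total to the smallest
delay[order(-delay$WeatherDelay), ]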
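Note that the sum adds up the delay values, which is not quite the same as how often flights are delayed. Here is a minimal sketch of a frequency count instead, assuming that "delayed" means ArrDelay > 0 (that threshold is an assumption, not something stated in the question) and using a small made-up dd in place of the real data:

```r
# Toy stand-in for the real data; the values are illustrative only.
dd <- data.frame(
  Month    = c(1, 1, 2, 2, 3, 3, 3),
  ArrDelay = c(6, -5, -23, 12, 20, 7, NA)
)

# Assumption: a flight counts as delayed when ArrDelay > 0.
# Rows with a missing ArrDelay are treated as not delayed.
delayed <- !is.na(dd$ArrDelay) & dd$ArrDelay > 0

# Number of delayed flights per month, most frequent first.
counts <- sort(table(dd$Month[delayed]), decreasing = TRUE)
counts

# Month with the most delayed flights.
top_month <- as.integer(names(counts)[1])
top_month
```

Swapping ArrDelay for DepDelay, or raising the threshold (for example to 15), only changes the line that defines delayed.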

Other tips

Is this closer to what you want? I don't know R well enough to sum the rows, but this at least aggregates them. I am learning, too!

delays <- read.csv("tmp.csv", stringsAsFactors = FALSE)

delay <- aggregate(cbind(ArrDelay, DepDelay, WeatherDelay, NASDelay, SecurityDelay, LateAircraftDelay) ~ Month, delays, sum)
delay

It outputs:

  Month ArrDelay DepDelay WeatherDelay NASDelay SecurityDelay LateAircraftDelay
1     1       10      -16            0        0             0                 0
2     2      -31       -2            0        0             0                 0
3     3        9       -4            0       20             0                 0
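If a single total per month is wanted, the delay columns of the aggregated result can be added together row by row with rowSums. The sketch below rebuilds only the first three columns of the output above as a toy delay data frame; the Total column name is made up for illustration, and whether adding these columns together is meaningful depends on what you want to measure.

```r
# Toy stand-in for the aggregated result shown above (first three columns only).
delay <- data.frame(
  Month    = c(1, 2, 3),
  ArrDelay = c(10, -31, 9),
  DepDelay = c(-16, -2, -4)
)

# Add every delay column together, row by row, leaving Month out.
delay_cols <- setdiff(names(delay), "Month")
delay$Total <- rowSums(delay[, delay_cols])

# Months ordered from the largest combined total to the smallest.
delay[order(-delay$Total), ]
```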

Note: I changed your sample data a bit to add some variety to the Month column:

Year,Month,DayofMonth,DayOfWeek,DepTime,CRSDepTime,ArrTime,CRSArrTime,UniqueCarrier,FlightNum,TailNum,ActualElapsedTime,CRSElapsedTime,AirTime,ArrDelay,DepDelay,Origin,Dest,Distance,TaxiIn,TaxiOut,Cancelled,CancellationCode,Diverted,CarrierDelay,WeatherDelay,NASDelay,SecurityDelay,LateAircraftDelay
2006,1,11,3,743,745,1024,1018,US,343,N657AW,281,273,223,6,-2,ATL,PHX,1587,45,13,0,,0,0,0,0,0,0
2006,1,11,3,1053,1053,1313,1318,US,613,N834AW,260,265,214,-5,0,ATL,PHX,1587,27,19,0,,0,0,0,0,0,0
2006,2,11,3,1915,1915,2110,2133,US,617,N605AW,235,258,220,-23,0,ATL,PHX,1587,4,11,0,,0,0,0,0,0,0
2006,2,11,3,1753,1755,1925,1933,US,300,N312AW,152,158,126,-8,-2,AUS,PHX,872,16,10,0,,0,0,0,0,0,0
2006,1,11,3,824,832,1015,1015,US,765,N309AW,171,163,132,0,-8,AUS,PHX,872,27,12,0,,0,0,0,0,0,0
2006,1,11,3,627,630,834,832,US,295,N733UW,127,122,108,2,-3,BDL,CLT,644,6,13,0,,0,0,0,0,0,0
2006,3,11,3,825,820,1041,1021,US,349,N177UW,136,121,111,20,5,BDL,CLT,644,4,21,0,,0,0,0,20,0,0
2006,1,11,3,942,945,1155,1148,US,356,N404US,133,123,121,7,-3,BDL,CLT,644,4,8,0,,0,0,0,0,0,0
2006,3,11,3,1239,1245,1438,1445,US,775,N722UW,119,120,103,-7,-6,BDL,CLT,644,4,12,0,,0,0,0,0,0,0
2006,3,11,3,1642,1645,1841,1845,US,1002,N104UW,119,120,105,-4,-3,BDL,CLT,644,4,10,0,,0,0,0,0,0,0
2006,3,11,3,1836,1835,NA,2035,US,1103,N425US,NA,120,NA,NA,1,BDL,CLT,644,0,17,0,,1,0,0,0,0,0
2006,1,11,3,NA,1725,NA,1845,US,69,0,NA,80,NA,NA,NA,BDL,DCA,313,0,0,1,A,0,0,0,0,0,0
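Neither snippet above deals with the file-size concern in the question. A 100 MB CSV can often be read in one go with read.csv, but if memory is tight, one option is to read the file in chunks and keep only a running count per month. The sketch below uses base R rather than the bigmemory package. The function name count_delays_chunked, the chunk size, and the ArrDelay > 0 definition of "delayed" are all assumptions made for illustration.

```r
# Hypothetical helper: count delayed flights per month, reading the CSV in chunks
# so that only chunk_rows lines are held in memory at a time.
count_delays_chunked <- function(path, chunk_rows = 10000) {
  con <- file(path, open = "r")
  on.exit(close(con))

  # The first line holds the column names.
  header <- strsplit(readLines(con, n = 1), ",")[[1]]
  totals <- integer(12)  # one slot per month, 1 to 12

  repeat {
    lines <- readLines(con, n = chunk_rows)
    if (length(lines) == 0) break
    chunk <- read.csv(text = lines, header = FALSE, col.names = header,
                      stringsAsFactors = FALSE)
    # Assumption: a flight counts as delayed when ArrDelay > 0.
    late <- !is.na(chunk$ArrDelay) & chunk$ArrDelay > 0
    totals <- totals + tabulate(chunk$Month[late], nbins = 12)
  }
  totals
}

# Small made-up file to try it on; a tiny chunk size forces several passes.
tmp <- tempfile(fileext = ".csv")
writeLines(c("Month,ArrDelay", "1,6", "1,-5", "2,-23", "3,20", "3,7", "3,NA"), tmp)
totals <- count_delays_chunked(tmp, chunk_rows = 2)
which.max(totals)  # index of the month with the most delayed flights
```

Because each chunk only contributes a vector of 12 counts, the per-chunk step could also be handed to separate workers and the results added together, though that is not shown here.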
License: CC-BY-SA with attribution
Not affiliated with StackOverflow