How to find sum and average for some columns based on the numbers from another column in R

StackOverflow https://stackoverflow.com/questions/21272198

  •  01-10-2022

Question

GIVEN DATA

I have 6 columns of data of vehicle trajectory (observation of vehicles' change in position, velocity, etc over time) a part of which is shown below:

Vehicle ID Frame ID Global X Vehicle class Vehicle velocity Lane
          1      177  6451181             2            24.99    5
          1      178  6451182             2            24.95    5
          1      179  6451184             2            24.91    5
          1      180  6451186             2            24.90    5
          1      181  6451187             2            24.96    5
          1      182  6451189             2            25.08    5

Vehicle ID identifies individual vehicles, e.g. vehicle 1, vehicle 2, etc. It is repeated in the column for each frame in which the vehicle was observed. Please note that each frame is 0.1 seconds long, so 10 frames make 1 second. The frame IDs are in the Frame ID column. Vehicle class is the type of vehicle (1=motorcycle, 2=car, 3=truck). The Vehicle velocity column gives the instantaneous speed of the vehicle in that frame. Lane is the number or ID of the lane the vehicle occupies in a particular frame.

WHAT I NEED TO FIND

The data I have is for a 15-minute period. The minimum frame ID is 5 and the maximum frame ID is 9952. I need to find the total number of vehicles in every 30-second time period. This means that, starting from the first 30 seconds (frame ID 5 to frame ID 305), I need to know the unique vehicle IDs observed. Also, for each 30-second period, I need to find the average velocity of each vehicle class. For example, for cars I need the average of all velocities of those vehicles whose vehicle class is 2. I need to find this for all 30-second time periods, i.e. 5-305, 305-605, 605-905, ..., 9605-9905. The output should be tables for cars, trucks and motorcycles like this:

Time Slots   Total Cars   Average Velocity
5-305        xx           xx
305-605      xx           xx
.            .            .
.            .            .
9605-9905    xx           xx

WHAT I HAVE TRIED SO FAR

# Finding the minimum and maximum Frame ID for creating 30-seconds time slots
minfid <- min(data$'Frame ID') # this was 5
maxfid <- max(data$'Frame ID') # this was 9952

for (i in 'Frame ID'==5:Frame ID'==305) {
table ('Vehicle ID')
mean('Vehicle Velocity', 'Vehicle class'==2)
}   #For cars in first 30 seconds

I can't generate the required output and I don't know how I can do this for all 30-second periods. Please help.


Solution

It's a bit tough to make sure the code is completely correct for your data, since there is only one vehicle in the sample you show. That said, this is a typical split-apply-combine analysis, which you can run easily with the data.table package:

library(data.table)
dt <- data.table(df)  # df comes from a `read.table` on the text you posted
dt[, frame.group := cut(Frame_ID, seq(5, 9905, by=300), include.lowest=TRUE)]

Here, I converted your data into a data.table (df was a direct import of the data posted above) and then created 300-frame buckets using cut. After that, data.table does the work. The first expression calculates the total number of unique vehicles per frame.group:

dt[, list(tot.vehic=length(unique(Vehicle_ID))), by=frame.group]
#    frame.group tot.vehic
# 1:     [5,305]         1  
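
As a side check on the bucketing step, here is a small sketch of how cut assigns frame IDs to slots. The frame IDs are hand-picked for illustration and are not from the real data set; dig.lab=5 is an addition of mine so that labels for larger frame IDs print in full rather than in scientific notation.

```r
# Hand-picked frame IDs (illustrative values, not from the real data set)
frames <- c(5, 305, 306, 605, 9905, 9952)

# Breaks every 300 frames (30 s at 0.1 s per frame), as in the answer
breaks <- seq(5, 9905, by = 300)

# Intervals are right-closed; include.lowest = TRUE puts frame 5 in the first one.
# dig.lab = 5 keeps labels such as (9605,9905] readable.
slots <- cut(frames, breaks = breaks, include.lowest = TRUE, dig.lab = 5)

data.frame(frame = frames, slot = as.character(slots))
```

Frames 5 and 305 both land in [5,305], while 306 and 605 land in (305,605]. Frame 9952 comes back as NA because it lies past the last break at 9905, so rows from the final partial slot end up in an NA group unless you extend the breaks.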

Now we group by frame.group and Vehicle_class to get average speed and count for those combinations:

dt[, list(tot.vehic=length(unique(Vehicle_ID)), mean.speed=mean(Vehicle_velocity)), by=list(frame.group, Vehicle_class)]
#    frame.group Vehicle_class tot.vehic mean.speed
# 1:     [5,305]             2         1     24.965

Again, a bit silly when we only have one vehicle, but this should work for your data set.


EDIT: to show that it works:

library(data.table)
set.seed(101)
dt <- data.table(
  Frame_ID=sample(5:9905, 50000, replace=TRUE), 
  Vehicle_ID=sample(1:400, 50000, replace=TRUE),
  Vehicle_velocity=runif(50000, 25, 100)
)
dt[, frame.group := cut(Frame_ID, seq(5, 9905, by=300), include.lowest=TRUE)]
dt[, Vehicle_class:=Vehicle_ID %% 3]
head(
  dt[order(frame.group, Vehicle_class), list(tot.vehic=length(unique(Vehicle_ID)), mean.speed=mean(Vehicle_velocity)), by=list(frame.group, Vehicle_class)]
)
#    frame.group Vehicle_class tot.vehic mean.speed
# 1:     [5,305]             0       130   63.34589
# 2:     [5,305]             1       131   61.84366
# 3:     [5,305]             2       129   64.13968
# 4:   (305,605]             0       132   61.85548
# 5:   (305,605]             1       132   64.76820
# 6:   (305,605]             2       133   61.57129

Maybe it's your data?
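
The grouped summary above keeps all classes in one table, while the question asks for separate tables for cars, trucks and motorcycles. One possible way to get there, as a rough sketch: label the class codes and split the result. The tiny data set and the names class.names, class.name and tables are made up for illustration; the class codes (1=motorcycle, 2=car, 3=truck) come from the question.

```r
library(data.table)

# Tiny made-up data set: two cars and one truck, all in the first slot
dt <- data.table(
  Vehicle_ID       = c(1, 1, 2, 3, 3),
  Frame_ID         = c(10, 11, 10, 12, 13),
  Vehicle_class    = c(2, 2, 2, 3, 3),
  Vehicle_velocity = c(20, 30, 40, 10, 20)
)
dt[, frame.group := cut(Frame_ID, seq(5, 9905, by = 300),
                        include.lowest = TRUE, dig.lab = 5)]

# Same grouped summary as in the answer
res <- dt[, list(tot.vehic = length(unique(Vehicle_ID)),
                 mean.speed = mean(Vehicle_velocity)),
          by = list(frame.group, Vehicle_class)]

# Map the class codes from the question to readable names
class.names <- c("1" = "motorcycle", "2" = "car", "3" = "truck")
res[, class.name := unname(class.names[as.character(Vehicle_class)])]

# One table per class, returned as a named list
tables <- split(res, by = "class.name")
tables$car
```

Note that mean.speed averages over all observed frames of a class in the slot, so a vehicle seen in more frames carries more weight than one seen in fewer.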

OTHER TIPS

Here is a plyr version:

data$timeSlot <- cut(data$FrameID, 
                     breaks = seq(5, 9905, by=300), 
                     dig.lab=5, 
                     include.lowest=TRUE)

# split & combine
library(plyr)
data.sum1 <- ddply(.data = data, 
                   .variables = c("timeSlot"), 
                   .fun = summarise, 
                   totalCars = length(unique(VehicleID)),
                   AverageVelocity = mean(velocity)
)

# include VehicleClass
data.sum2 <- ddply(.data = data, 
                   .variables = c("timeSlot", "VehicleClass"), 
                   .fun = summarise, 
                   totalCars = length(unique(VehicleID)),
                   AverageVelocity = mean(velocity)
)

Column names such as FrameID would have to be edited to match the ones you use. This is the sample data I tested with, followed by the results:

data <- read.table(sep = "", header = TRUE, text = "
VehicleID FrameID GlobalX VehicleClass velocity Lane
 1 177 6451181 2 24.99 5
 1 178 6451182 2 24.95 5
 1 179 6451184 2 24.91 5
 1 180 6451186 2 24.90 5
 1 181 6451187 2 24.96 5
 1 182 6451189 2 25.08 5") 
data.sum1
#   timeSlot totalCars AverageVelocity
# 1  [5,305]         1          24.965

data.sum2
#   timeSlot VehicleClass totalCars AverageVelocity
# 1  [5,305]            2         1          24.965
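
If you would rather avoid extra packages, the same summary can probably be built in base R with aggregate and merge. This is a swapped-in alternative, not part of the plyr answer; it reuses the sample data and column names from above, and n.vehicles, avg.velocity and result are names I made up for the sketch.

```r
# Sample data from the question, with the column names used above
data <- read.table(header = TRUE, text = "
VehicleID FrameID GlobalX VehicleClass velocity Lane
 1 177 6451181 2 24.99 5
 1 178 6451182 2 24.95 5
 1 179 6451184 2 24.91 5
 1 180 6451186 2 24.90 5
 1 181 6451187 2 24.96 5
 1 182 6451189 2 25.08 5")

# 300-frame (30-second) slots
data$timeSlot <- cut(data$FrameID, breaks = seq(5, 9905, by = 300),
                     dig.lab = 5, include.lowest = TRUE)

# Unique vehicles per slot and class
n.vehicles <- aggregate(VehicleID ~ timeSlot + VehicleClass, data = data,
                        FUN = function(x) length(unique(x)))
names(n.vehicles)[3] <- "totalCars"

# Mean velocity per slot and class
avg.velocity <- aggregate(velocity ~ timeSlot + VehicleClass, data = data,
                          FUN = mean)
names(avg.velocity)[3] <- "AverageVelocity"

# Combine both summaries into one table
result <- merge(n.vehicles, avg.velocity, by = c("timeSlot", "VehicleClass"))
result
```

With the formula interface, aggregate only returns slot and class combinations that actually occur in the data, so empty slots do not appear in the result.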
Licensed under: CC-BY-SA with attribution