Question

I am new to R and to Stackoverflow and I need an assistant in sorting and extracting information from a data frame I created. I need to extract which IATA and NAME has received the most commission. The result should print: 3301, you pay, 12. I can subset each and every IATA but it is a long process. What will be the best function in R to sort all this information and print out this information.

  IATA          NAME TICKET_NUM            PAX FARE TAX COMM NET
3300      pay more        700     john cohen   10 1.1    2   8
3300      pay more        701     james levy   11 1.2    2   9
3300      pay more        702 jonathan arbel   12 1.2    3   9
3300      pay more        703      gil matan    9 1.0    2   7
3301      you pay         704      ron natan   19 2.0    6   9
3301      you pay         705    don horvitz   18 2.0    6   9
3302 pay by ticket        706  lutter kaplan    9 1.2    0   9
3303         enjoy        707   lutter omega   12 1.2    0  12
3303         enjoy        708   graig daniel   14 1.3    1  13
3303         enjoy        730 orly rotenberg   15 1.0    1  14
3303         enjoy        731     yohan bach   12 1.0    1  11
Was it helpful?

Solution

This seems to return what you requested (using Jeremy's code for the second part):

comm <- read.table(text = '

IATA          NAME TICKET_NUM            PAX FARE TAX COMM NET
3300      pay.more        700     john.cohen   10 1.1    2   8
3300      pay.more        701     james.levy   11 1.2    2   9
3300      pay.more        702 jonathan.arbel   12 1.2    3   9
3300      pay.more        703      gil.matan    9 1.0    2   7
3301       you.pay        704      ron.natan   19 2.0    6   9
3301       you.pay        705    don.horvitz   18 2.0    6   9
3302 pay.by.ticket        706  lutter.kaplan    9 1.2    0   9
3303         enjoy        707   lutter.omega   12 1.2    0  12
3303         enjoy        708   graig.daniel   14 1.3    1  13
3303         enjoy        730 orly.rotenberg   15 1.0    1  14
3303         enjoy        731     yohan.bach   12 1.0    1  11

', header=TRUE, stringsAsFactors = FALSE)

comm2 <- with(comm, aggregate(COMM ~ IATA + NAME, FUN = function(x) sum(x, na.rm = TRUE)))
comm2

max_comm <- comm2[comm2$COMM == max(comm2$COMM),]
max_comm

  IATA    NAME COMM
4 3301 you.pay   12

Here is an explanation of the first statement:

The with function identifies the data set to use (here comm). The function aggregate is a general function for performing operations on groups. You want to operate on COMM by IATA and NAME. You write that: COMM ~ IATA + NAME. Next you specify the desired function to perform on COMM (here sum). You do that with FUN = function(x) sum(x). In case there are any missing observations in COMM I added na.rm = TRUE within the sum(x) function.

OTHER TIPS

Call that table comm

max_comm <- comm[comm$COMM == max(comm$COMM),]

or just sort it and look at the head

head(comm[order(-comm$COMM),])

Edit: If you want to sum by IATA first then use data.table

library(data.table)
comm2 <- data.table(comm)
sum_comm <- comm2[, list(COMM_SUM=sum(COMM)), by = c("IATA","NAME")]

data.table has an unusual syntax, you could also try dplyr which is supposed to be roughly as good as data.table now

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top