Question

I have a data frame and I want to return a ranking for each Category based on PCC.

> head(newdf)
            ItemId    Category PCC
1       5063660193 Go to Gifts   2
2   24154563660193 Go to Gifts   1
2.1 24154563660193   All Gifts   1
3   26390063660193 Go to Gifts   3
3.1 26390063660193   All Gifts   3
4         18700100 Go to Gifts   1

I initially thought of doing it with the sqldf package, but unfortunately a dependency (tcltk) is not available for R version 3.0.2.

With sqldf a call similar to the following should do the job:

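# ranking by category
# (the result is named ranked so it does not mask base R's rank();
#  Ranks is a select alias, so it is referenced without the nf. prefix)
ranked <- sqldf("select
                 nf.ItemId,
                 nf.Category,
                 nf.PCC,
                 rank() over(partition by nf.Category order by nf.PCC, nf.ItemId) as Ranks

               from
                 newdf as nf

               order by
                 nf.Category,
                 Ranks")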

Do you know any alternative I can use?


Solution

These are only a handful of the different ways to do this:

dat <- read.table(text = "            ItemId    Category PCC
       5063660193 'Go to Gifts'   2
   24154563660193 'Go to Gifts'   1
 24154563660193   'All Gifts'   1
   26390063660193 'Go to Gifts'   3
 26390063660193   'All Gifts'   3
         18700100 'Go to Gifts'   1",header = TRUE,sep = "")

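# plyr: split by Category, add a rank column to each piece, recombine
library(plyr)
ddply(dat,.(Category),transform, val = rank(PCC))

# dplyr: group by Category, then rank PCC within each group
library(dplyr)
mutate(group_by(dat,Category),val = rank(PCC))

# data.table: add the rank column by reference, grouped by the key (Category)
library(data.table)
dat1 <- data.table(dat)
setkey(dat1,Category)
dat1[,val := rank(PCC),by = key(dat1)]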
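
If you would rather not load any package, base R's ave() can do the same grouping. This is only a sketch: it rebuilds the sample data with data.frame() so that it runs on its own, and the column name val is just carried over from the examples above.

```r
# Same sample data as above, built directly so the snippet is self-contained.
dat <- data.frame(
  ItemId   = c(5063660193, 24154563660193, 24154563660193,
               26390063660193, 26390063660193, 18700100),
  Category = c("Go to Gifts", "Go to Gifts", "All Gifts",
               "Go to Gifts", "All Gifts", "Go to Gifts"),
  PCC      = c(2, 1, 1, 3, 3, 1)
)

# ave() applies FUN to PCC within each Category and returns the results
# in the original row order, so they can be assigned straight to a column.
dat$val <- ave(dat$PCC, dat$Category, FUN = rank)

dat
```

The two "Go to Gifts" rows with PCC equal to 1 are tied, so with rank's defaults they each get 1.5.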

Also, sqldf loads just fine for me on R 3.0.2, so I'm not sure what the problem is on your end.

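Each of these examples uses the default behavior of rank, which gives tied values the average of their ranks (ties.method = "average"). See ?rank and the ties.method argument to customize it to your exact needs; for example, ties.method = "min" gives ties the same treatment as SQL's rank().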
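
To make the effect of ties.method concrete, here is a small sketch using base R's ave() on the same sample data. The column names rank_min, rank_first and rank_by_item are made up for illustration. The last column is my guess at what the SQL in the question is after (order by PCC, then ItemId), not something rank() does by itself.

```r
# Sample data from the question, built directly so the snippet is self-contained.
dat <- data.frame(
  ItemId   = c(5063660193, 24154563660193, 24154563660193,
               26390063660193, 26390063660193, 18700100),
  Category = c("Go to Gifts", "Go to Gifts", "All Gifts",
               "Go to Gifts", "All Gifts", "Go to Gifts"),
  PCC      = c(2, 1, 1, 3, 3, 1)
)

# "min": tied values all get the lowest rank of the tie, leaving a gap after.
dat$rank_min <- ave(dat$PCC, dat$Category,
                    FUN = function(x) rank(x, ties.method = "min"))

# "first": ties are broken by the order in which the rows appear.
dat$rank_first <- ave(dat$PCC, dat$Category,
                      FUN = function(x) rank(x, ties.method = "first"))

# Ties broken by ItemId instead of row order: order() sorts by PCC, then
# ItemId, and order(order(...)) turns that sort order into a rank per row.
dat$rank_by_item <- ave(seq_len(nrow(dat)), dat$Category,
                        FUN = function(i) order(order(dat$PCC[i], dat$ItemId[i])))

dat
```

With "min" both tied "Go to Gifts" rows get rank 1, with "first" the one that appears earlier gets 1 and the other 2, and with the ItemId tie-break the row with the smaller ItemId (18700100) gets 1.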

Licensed under: CC-BY-SA with attribution