Question

I have a dataset with multiple columns (one per site) and rows (one per day), and I am trying to rank the values within each day using R. For each day (i.e. each row), I would like to rank every site against all the other sites on that day. It would be possible to do this in Excel, but it would obviously take a long time. Below is a [much smaller] example of what I'm trying to achieve:

date - site1 - site2 - site3 - site4
1/1/00 - 24 - 33 - 10 - 13
2/1/00 - 13 - 25 - 6 - 2
which should give:
date - site1 - site2 - site3 - site4
1/1/00 - 2 - 1 - 4 - 3
2/1/00 - 2 - 1 - 3 - 4

Hopefully there's some simple command. Thanks a lot!


Solution

You can use `rank` inside `apply` to rank the values within each row (day).

# your data
mydf <- read.table(text="date - site1 - site2 - site3 - site4
1/1/00 - 24 - 33 - 10 - 13
2/1/00 - 13 - 25 - 6 - 2", sep="-", header=TRUE, strip.white=TRUE)  # strip.white drops the spaces around each "-"

# find ranks
t(apply(-mydf[-1], 1, rank))

# add to your dates
mydf.rank <- cbind(mydf[1], t(apply(-mydf[-1], 1, rank)))
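A self-contained sketch of the same `apply`/`rank` approach, building the example data directly with `data.frame()` instead of parsing the dash-separated text (the values are just the question's example, with dates kept as plain text). It also stores the ranks as integers, since `rank()` returns doubles; that conversion is only lossless when there are no ties.

```r
# Example data from the question, entered directly (dates kept as text)
mydf <- data.frame(
  date  = c("1/1/00", "2/1/00"),
  site1 = c(24, 13),
  site2 = c(33, 25),
  site3 = c(10, 6),
  site4 = c(13, 2)
)

# Negate so the largest value gets rank 1, rank within each row,
# then transpose: apply() with MARGIN = 1 returns one column per input row
ranks <- t(apply(-mydf[-1], 1, rank))

# rank() returns doubles; store as integers (safe here because there are no ties)
storage.mode(ranks) <- "integer"

# Put the dates back in front of the ranks
mydf.rank <- cbind(mydf[1], ranks)
print(mydf.rank)
```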

About the code

mydf[-1] # removes the first column

-mydf[-1] # the `-` negates the values, so the largest value gets rank 1 (decreasing order)

apply with MARGIN=1 calls rank on each row, i.e. ranks the sites within each day

t transposes the result: apply returns one column per input row, so transposing puts the days back in rows and the sites in columns
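One thing the code above leaves at its default is how `rank()` handles ties. With the default `ties.method = "average"`, tied values share the mean of their positions, so the result can contain non-integer ranks. A small sketch using a hypothetical day on which site1 and site3 tie (these readings are made up for illustration):

```r
# Hypothetical readings for one day in which site1 and site3 tie
x <- c(site1 = 24, site2 = 33, site3 = 24, site4 = 13)

rank(-x)                         # default "average": the tied sites share 2.5
rank(-x, ties.method = "min")    # both tied sites get 2, site4 still gets 4
rank(-x, ties.method = "first")  # ties broken by column order: site1 2, site3 3
```

Extra arguments to `apply` are passed on to `rank`, so a different tie rule can be used row-wise with, for example, `t(apply(-mydf[-1], 1, rank, ties.method = "min"))`.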

OTHER TIPS

Here is a tidyverse approach.

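Reshape to long format, sort (arrange), group, and spread back to wide format. The only tricky part is realizing that once the rows within each group are sorted (ascending or descending), each row's position is its rank, so row_number() returns the ranking directly. Note that row_number() gives every row a distinct number, so tied values are ranked by their position rather than sharing a rank.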

library(tidyverse)
library(lubridate)

# Data   
df <- tribble(
  ~date,    ~site1,   ~site2,    ~site3,    ~site4,
  mdy("1/1/2000"),   24,       33,        10,          13,
  mdy("2/1/2000"),   13,       25,         6,           2
) 

df %>% 
  gather(site, value, -date) %>%      #< Make tidy: one row per date/site
  arrange(date, desc(value)) %>%      #< Sort by date, then largest value first
  group_by(date) %>% 
  mutate(ranking = row_number()) %>%  #< Position within each date = rank
  select(-value) %>%                  #< Remove unneeded column (worth keeping if you stay in tidy format)
  spread(site, ranking)

#> # A tibble: 2 x 5
#> # Groups:   date [2]
#>   date       site1 site2 site3 site4
#>   <date>     <int> <int> <int> <int>
#> 1 2000-01-01     2     1     4     3
#> 2 2000-02-01     2     1     3     4

Created on 2018-03-06 by the reprex package (v0.2.0).
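`gather()` and `spread()` still work, but in tidyr 1.0 and later they are superseded by `pivot_longer()` and `pivot_wider()`. A sketch of the same pipeline with the newer verbs, assuming dplyr and tidyr (≥ 1.0) are installed. It swaps the `arrange()` plus `row_number()` step for `dplyr::min_rank(desc(value))`, which ranks without sorting and gives tied sites the same rank, and it calls `ungroup()` so the result is no longer grouped by date. Dates are built with base `as.Date()` here instead of lubridate, reading the question's dates as month/day as the answer above does.

```r
library(dplyr)
library(tidyr)

# Example data from the question (dates read as month/day, as in the answer above)
df <- data.frame(
  date  = as.Date(c("2000-01-01", "2000-02-01")),
  site1 = c(24, 13),
  site2 = c(33, 25),
  site3 = c(10, 6),
  site4 = c(13, 2)
)

ranked <- df %>%
  pivot_longer(-date, names_to = "site", values_to = "value") %>%  # one row per date/site
  group_by(date) %>%
  mutate(ranking = min_rank(desc(value))) %>%  # largest value on each date gets 1
  ungroup() %>%                                # drop the grouping before reshaping
  select(-value) %>%
  pivot_wider(names_from = site, values_from = ranking)

print(ranked)
```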
Licensed under: CC-BY-SA with attribution