R implementation of sorting by reversals

https://stackoverflow.com/questions/12883296

07-07-2021
|

Question

I was wondering if there is an implementation in R where it sorts a permutation of n numbers into the original 1...n sequence and provides the number of reversals needed. Eg an implementation of the "sorting by reversals" or "sorting by translocation" as outlined in this ppt.

Specifically, I have a permutation of a sequence of n elements, pi(n), and I want to figure out how close it is to the original sequence. The number of reversals seems a good metric.

Thanks!

Solution

This seems like a job for Kendall's distance (also known, sometimes, as the Bubble-sort distance). It is probably the most commonly used metric to measure distance in ranking space.

The Kendall distance counts the number of times that two sequences differ in their ordering of the items in two indices. In the case that one of the sequences is the trivial sequence (1, 2, ..., n), we can measure the distance simply by counting the number of times that i < j and pi(i) > pi(j).

If you like this metric (it is equivalent to the minimum number of pairwise transpositions of adjacent items you would have to complete to transform one sequence into 1:n), you can find it in my package, RMallow, available on CRAN. The function is called AllSeqDists. Here is an example:

library(RMallow)
# Create a matrix of sequences, each of length 5
datas <- matrix(c(1:5, 5:1, c(2, 1, 3, 4, 5), c(5, 1, 2, 3, 4), c(1, 2, 4, 5, 6), c(1, 5, 6, 2, 4)), nrow = 6, byrow = TRUE)
# Calculate all of their Kendall distances to the sequence (1, 2, 3, 4, 5)
datas <- SimplifySequences(datas)
dists <- AllSeqDists(datas)

You might also consider Spearman's metric.
Also, there are a class of models on ranking data that I must plug called "Mallows models", depending on what you want to do.

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow