Question

I recently gathered, using a questionnaire, a set of opinions on the importance of various software components. Figuring that some form of Condorcet voting method would be the best way to obtain an overall rank, I opted to use OpenSTV to analyze it.


My data is in tabular format, space delimited, and looks more or less like:

A B C D E F G    # Candidates
5 2 4 3 7 6 1    # First ballot. G is ranked first, and E is ranked 7th
4 2 6 5 1 7 3    # Second ballot
etc

In this format, the number indicates the rank and the sequence order indicates the candidate. Each "candidate" has a rank (required) from 1 to 7, where a 1 means most important and a 7 means least important. No duplicates are allowed.

This format struck me as the most natural way to represent the output, being a direct representation of the ballot format.


The OpenSTV/BLT format uses a different method of representing the same info, conceptually as follows:

G B D C A F E    # Again, G is ranked first and E is ranked 7th
E B G A D C F    # 
etc

The actual numeric file format uses the (1-based) index of the candidate, rather than the label, and so is more like:

7 2 4 3 1 6 5    # Same ballots as before.
5 2 7 1 4 3 6    # A -> 1, G -> 7

In this format, the number indicates the candidate, and the sequence order indicates the rank. The actual, real, BLT format also includes a leading weight and a following zero to indicate the end of each ballot, which I don't care too much about for this.


My question is, what is the most elegant way to convert from the first format to the (numeric) second?

Was it helpful?

Solution

Here's my solution in Python, and it works ok but feels a little clumsy. I'm sure there's a cleaner way(perhaps in another language?)

This took me longer than it should have to wrap my head around yesterday afternoon, so maybe somebody else can use this too.

Given:

ballot = '5 2 4 3 7 6 1'

Python one(ish)-liner to convert it:

rank = [i for r,i in sorted((int(r),i+1) for i,r in enumerate(ballot.split())]
rank = " ".join(rank)

Alternatively, in a slightly more understandable form:

# Split into a list and convert to integers
int_ballot = [int(x) for x in ballot.split()]

# This is the important bit.
# enumerate(int_ballot) yields pairs of (zero-based-candidate-index, rank)
# Use a list comprehension to swap to (rank, one-based-candidate-index)
ranked_ballot = [(rank,index+1) for index,rank in enumerate(int_ballot)]

# Sort by the ranking. Python sorts tuples in lexicographic order
# (ie sorts on first element)
# Use a comprehension to extract the candidate from each pair
rank = " ".join([candidate for rank,candidate in sorted(ranked_ballot)])
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top