Question

I'm trying to count the number of first, second, and third choice votes for each candidate of an election (specifically the Stack Overflow 2014 Moderator Election). I downloaded the data file and opened it up. Based on my rudimentary interpretation of the file format I wrote a PHP script to count the votes:

<?php

$lines = file("stackoverflow-com-2014-election-results.blt");
unset($lines[0]);

$ballots = 0;
$first = array();
$second = array();
$third = array();

for ($i = 1;; $i++) {
    $line = explode(" ", trim($lines[$i]));
    if ($line[0] != 1) break;
    $ballots++;
    @$first[$line[1]]++;
    @$second[$line[2]]++;
    @$third[$line[3]]++;
}

$names = array();

for ($i++; $i < count($lines); $i++) {
    $names[count($names) + 1] = trim(trim($lines[$i]), '"');
}

printf("%20s%8s%8s%8s%8s\n", "Name", "1st", "2nd", "3rd", "Total");
print(str_repeat("-", 52) . "\n");
foreach ($names as $id => $name) {
    printf("%20s%8s%8s%8s%8s\n", $name,
        $first[$id], $second[$id], $third[$id],
        $first[$id] + $second[$id] + $third[$id]);
}
print(str_repeat("-", 52) . "\n");
printf("Ballots: %d\n", $ballots);

When I run it at the command line it prints this table:

                Name     1st     2nd     3rd   Total
----------------------------------------------------
                Undo    1358    1425    1814    4597
            bluefeet    3352    3148    2287    8787
          0x7fffffff    1932    2147    2159    6238
            Bohemian    5678    2935    2415   11028
        Jon Clements    1531    1527    1618    4676
            Doorknob    1165    1720    1753    4638
         Raghav Sood    1358    1565    1571    4494
      Siddharth Rout    1732    1872    1866    5470
                Matt    1381    1988    2009    5378
              meagar    1903    2382    2881    7166
----------------------------------------------------
Ballots: 21571

My problem is that I can't get this to match what OpenSTV reports when I run it on the same file. The "count of first choices" values are all slightly different:

Ballot file contains 21571 non-empty ballots.

Counting votes for Stack Overflow Moderator Election 2014 using Meek STV.
10 candidates running for 3 seats.

 R|Undo          |bluefeet      |0x7fffffff    |Bohemian      |Jon Clements  
  |              |              |              |              |              
  |--------------+--------------+--------------+--------------+--------------
  |Doorknob      |Raghav Sood   |Siddharth Rout|Matt          |meagar        
  |              |              |              |              |              
  |--------------+--------------+--------------+--------------+--------------
  |Exhausted     |Surplus       |Threshold     
  |              |              |              
=============================================================================
 1|   1379.000000|   3372.000000|   1951.000000|   5707.000000|   1545.000000
  |   1181.000000|   1375.000000|   1749.000000|   1389.000000|   1923.000000
  |      0.000000|    314.249999|   5392.750001
  |--------------------------------------------------------------------------
  | Count of first choices. Candidate Bohemian has reached the threshold and
  | is elected.
=============================================================================
[...]

What am I doing wrong? Or what is OpenSTV doing differently?


Update: My script was broken because it didn't account for rows that contain a second or third choice with no earlier choice set. I'm guessing these rows come from voters deselecting an earlier choice: if a voter selects two candidates and then deselects the first, the one remaining candidate should be treated as that voter's first choice.

Fixed version:

<?php

$lines = file("stackoverflow-com-2014-election-results.blt");

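// Header row of the .blt file: number of candidates, then number of seats.
$line = explode(" ", trim($lines[0]));
$numCandidates = $line[0];
// The second header value is the seat count (3). It is reused here as the number
// of choices per ballot, which works only because both are 3 in this election.
$numChoices = $line[1];
$choiceVotes = array_fill(1, $numChoices, array_fill(1, $numCandidates, 0));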

$totalBallots = 0;

for ($i = 1;; $i++) {
    $line = explode(" ", trim($lines[$i]));
    if ($line[0] == 0) break;
    $totalBallots++;
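    // $j walks the choice slots in the row; $k is the next rank to fill.
    // Empty (0) slots are skipped, so later choices move up to fill the gap.
    for ($j = 1, $k = 1; $j <= $numChoices; $j++) {
        if ($line[$j] != 0) $choiceVotes[$k++][$line[$j]]++;
    }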
}

$names = array();
for ($j = 1; $j <= $numCandidates; $j++) {
    $names[$j] = trim(trim($lines[$j + $i]), '"');
}

$rowFormat = "%20s" . str_repeat("%8s", $numChoices) . "%8s\n";
$separator = str_repeat("-", 20 + (8 * $numChoices) + 8) . "\n";

$row = array("Name");
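// gmdate('S', ...) yields the English ordinal suffix of the day of the month,
// so day $i of January 1970 gives "st", "nd", "rd" for 1, 2, 3.
for ($i = 1; $i <= $numChoices; $i++) $row[] = $i . gmdate('S', $i * 86400 - 1);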
$row[] = "Total";
vprintf($rowFormat, $row);

print $separator;
foreach ($names as $id => $name) {
    $row = array($name);
    $candidateTotal = 0;
    for ($i = 1; $i <= $numChoices; $i++) {
        $votes = $choiceVotes[$i][$id];
        $row[] = $votes;
        $candidateTotal += $votes;
    }
    $row[] = $candidateTotal;
    vprintf($rowFormat, $row);
}
print $separator;
printf("Ballots: %d\n", $totalBallots);

Solution

I ran your script on the data file that contains 21563 ballot rows:

                Name     1st     2nd     3rd   Total
----------------------------------------------------
              meagar    1903    2382    2881    7166
         Raghav Sood    1357    1564    1570    4491
            Bohemian    5674    2935    2415   11024
          0x7fffffff    1932    2146    2159    6237
                Undo    1358    1424    1814    4596
            bluefeet    3352    3146    2284    8782
                Matt    1380    1988    2009    5377
        Jon Clements    1531    1526    1616    4673
      Siddharth Rout    1731    1871    1865    5467
            Doorknob    1165    1720    1752    4637
----------------------------------------------------
Ballots: 21563

and I also imported it into a Google Spreadsheet (link):

giving the same result as your script, using countif(). For example:

=countif( data!B:B;$A2)

to calculate the value of cell C2.

So I must conclude that you did not use the same data. The data in

https://stackoverflow.com/election/download-result/5

even changed after you posted the question.

Update:

So a ballot row like this:

1 0 6 0 0

with the columns:

Weight 1st 2nd 3rd end 

seems to be treated as:

1 6 0

when run through OpenSTV, i.e. 6 becomes the 1st choice and the 0 entries are ignored.
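To make that interpretation concrete, here is a minimal sketch of how such a row could be normalized before counting. The function name normalizeBallot() is made up for illustration, and the assumption that the first field is the weight and the last field is the end marker comes from the column layout shown above, not from OpenSTV's source. It needs PHP 7.4 or later for the arrow function.

```php
<?php

// Hypothetical helper: parse one .blt ballot row and drop the empty (0)
// choices, so that later choices move up to fill the gaps.
function normalizeBallot(string $row): array
{
    // Split on any run of whitespace and convert every field to an integer.
    $fields = array_map('intval', preg_split('/\s+/', trim($row)));

    $weight = array_shift($fields);   // first field: ballot weight
    array_pop($fields);               // last field: end-of-row marker (0)

    // Keep only real candidate numbers; array_values() reindexes from 0.
    $choices = array_values(array_filter($fields, fn($c) => $c !== 0));

    return ['weight' => $weight, 'choices' => $choices];
}

// The row from the example above: candidate 6 ends up as the only choice.
echo implode(" ", normalizeBallot("1 0 6 0 0")['choices']), "\n"; // prints "6"
```

Counting first choices from the normalized rows, rather than from the raw second column, should then line up with the way OpenSTV appears to read the file.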

As a test, I removed the 0 choices from the file (link to the modified .blt file).

The result from OpenSTV:

Ballot file contains 10 candidates and 21563 ballots.
No candidates have withdrawn.
Ballot file contains 21563 non-empty ballots.

Counting votes for Stack Overflow Moderator Election 2014 using Meek STV.
10 candidates running for 3 seats.

 R|meagar       |Raghav Sood  |Bohemian     |0x7fffffff   |Undo         |bluefeet     |Matt         
  |             |             |             |             |             |             |             
  |-------------+-------------+-------------+-------------+-------------+-------------+-------------
  |Jon Clements |Siddharth Rou|Doorknob     |Exhausted    |Surplus      |Threshold    
  |             |t            |             |             |             |             
====================================================================================================
 1|  1923.000000|  1374.000000|  5703.000000|  1951.000000|  1379.000000|  3372.000000|  1388.000000
  |  1544.000000|  1748.000000|  1181.000000|     0.000000|   312.249999|  5390.750001
  |-------------------------------------------------------------------------------------------------
  | Count of first choices. Candidate Bohemian has reached the threshold and is elected.
====================================================================================================

The result from your script:

                Name     1st     2nd     3rd   Total
----------------------------------------------------
              meagar    1923    2378    2865    7166
         Raghav Sood    1374    1558    1559    4491
            Bohemian    5703    2922    2399   11024
          0x7fffffff    1951    2143    2143    6237
                Undo    1379    1419    1798    4596
            bluefeet    3372    3138    2272    8782
                Matt    1388    1991    1998    5377
        Jon Clements    1544    1516    1613    4673
      Siddharth Rout    1748    1870    1849    5467
            Doorknob    1181    1724    1732    4637
----------------------------------------------------
Ballots: 21563

Here I modified your script to use isset().

So the count of 1st choices now seems to be the same.
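The modified script is not shown here, so the following is only a guess at what an isset()-based tally could look like in place of the @ operator. The helper name addVote() and the sample candidate numbers are assumptions made for illustration.

```php
<?php

// Hypothetical helper: increment a tally without relying on the @ operator.
function addVote(array &$tally, int $candidate): void
{
    if (!isset($tally[$candidate])) {
        $tally[$candidate] = 0;   // first vote seen for this candidate
    }
    $tally[$candidate]++;
}

// Made-up votes, not taken from the election file.
$first = [];
addVote($first, 6);
addVote($first, 6);
addVote($first, 10);

// When printing, fall back to 0 for candidates with no votes at this rank.
foreach ([6, 10, 3] as $id) {
    printf("%2d: %d\n", $id, isset($first[$id]) ? $first[$id] : 0);
}
```

This keeps the counts identical to the original script while avoiding undefined-index notices for candidates who received no votes at a given rank.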

OTHER TIPS

When I count the number of carriage returns followed by the string "1 10", I get a total of 1903. That matches your count of first-place votes for meagar, so I think your script is correct.
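The same check can be sketched in PHP rather than done by text search. The function name countFirstSlot() and the sample rows are made up for illustration; the sketch assumes every ballot row starts with a weight of 1 followed by single spaces, as in the rows quoted on this page.

```php
<?php

// Hypothetical helper: count ballot rows whose weight is 1 and whose first
// choice slot holds the given candidate number.
function countFirstSlot(array $lines, int $candidate): int
{
    $prefix = "1 " . $candidate . " ";
    $count = 0;
    foreach ($lines as $line) {
        // The trailing space in $prefix stops "1 1 " from matching "1 10 ...".
        if (strncmp(ltrim($line), $prefix, strlen($prefix)) === 0) {
            $count++;
        }
    }
    return $count;
}

// Made-up sample rows, not taken from the election file.
$sample = ["10 3", "1 10 4 2 0", "1 1 10 0 0", "1 0 10 0 0", "1 10 0 0 0", "0"];
echo countFirstSlot($sample, 10), "\n"; // prints 2
```

Note that a row such as "1 0 10 0 0" is not counted by this prefix match. That fits the figures above: the text search and the original script both give 1903 for meagar, while OpenSTV, which ignores the 0 entries, reports 1923.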

Licensed under: CC-BY-SA with attribution