Vote counting script is not adding up correctly
Question
I'm trying to count the number of first-, second-, and third-choice votes for each candidate in an election (specifically the Stack Overflow 2014 Moderator Election). I downloaded the data file and opened it. Based on my rudimentary interpretation of the file format, I wrote a PHP script to count the votes:
<?php
$lines = file("stackoverflow-com-2014-election-results.blt");
unset($lines[0]);
$ballots = 0;
$first = array();
$second = array();
$third = array();
for ($i = 1;; $i++) {
    $line = explode(" ", trim($lines[$i]));
    if ($line[0] != 1) break;
    $ballots++;
    @$first[$line[1]]++;
    @$second[$line[2]]++;
    @$third[$line[3]]++;
}
$names = array();
for ($i++; $i < count($lines); $i++) {
    $names[count($names) + 1] = trim(trim($lines[$i]), '"');
}
printf("%20s%8s%8s%8s%8s\n", "Name", "1st", "2nd", "3rd", "Total");
print(str_repeat("-", 52) . "\n");
foreach ($names as $id => $name) {
    printf("%20s%8s%8s%8s%8s\n", $name,
        $first[$id], $second[$id], $third[$id],
        $first[$id] + $second[$id] + $third[$id]);
}
print(str_repeat("-", 52) . "\n");
printf("Ballots: %d\n", $ballots);
When I run it at the command line, it prints this table:
                Name     1st     2nd     3rd   Total
----------------------------------------------------
                Undo    1358    1425    1814    4597
            bluefeet    3352    3148    2287    8787
          0x7fffffff    1932    2147    2159    6238
            Bohemian    5678    2935    2415   11028
        Jon Clements    1531    1527    1618    4676
            Doorknob    1165    1720    1753    4638
         Raghav Sood    1358    1565    1571    4494
      Siddharth Rout    1732    1872    1866    5470
                Matt    1381    1988    2009    5378
              meagar    1903    2382    2881    7166
----------------------------------------------------
Ballots: 21571
My problem is that I can't get this to match what OpenSTV reports when I run it on the same file. The counts of first choices are all slightly different:
Ballot file contains 21571 non-empty ballots.
Counting votes for Stack Overflow Moderator Election 2014 using Meek STV.
10 candidates running for 3 seats.
R|Undo          |bluefeet      |0x7fffffff    |Bohemian      |Jon Clements
 |              |              |              |              |
 |--------------+--------------+--------------+--------------+--------------
 |Doorknob      |Raghav Sood   |Siddharth Rout|Matt          |meagar
 |              |              |              |              |
 |--------------+--------------+--------------+--------------+--------------
 |Exhausted     |Surplus       |Threshold
 |              |              |
=============================================================================
1|   1379.000000|   3372.000000|   1951.000000|   5707.000000|   1545.000000
 |   1181.000000|   1375.000000|   1749.000000|   1389.000000|   1923.000000
 |      0.000000|    314.249999|   5392.750001
 |--------------------------------------------------------------------------
 | Count of first choices. Candidate Bohemian has reached the threshold and
 | is elected.
=============================================================================
[...]
What am I doing wrong? Or what is OpenSTV doing differently?
Update: My script was broken because it didn't account for rows that encode a second or third choice without the prior choice being set (the earlier position holds 0). I'm guessing this happened when voters deselected an earlier choice: for example, if a voter selected two candidates and then deselected the first-choice candidate, the only remaining selection should be treated as that voter's first choice.
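A minimal sketch of that compaction, assuming each ballot line has the form weight, up to three candidate ids (0 meaning no selection), and a terminating 0, as in the downloaded .blt file. The helper name rankedChoices is hypothetical; it returns the selections in order with the empty positions removed, so the first element is the effective first choice:

```php
<?php
// Hypothetical helper (sketch). Assumes a ballot line such as "1 0 6 0 0":
// a weight, up to three candidate ids where 0 means "no selection", and a
// terminating 0. Zero entries are dropped, so later selections move up.
function rankedChoices(string $ballotLine): array
{
    $fields = preg_split('/\s+/', trim($ballotLine));
    $ranking = array();
    // Skip the weight (index 0); the terminating 0 is removed by the zero check.
    foreach (array_slice($fields, 1) as $candidate) {
        if ((int)$candidate !== 0) {
            $ranking[] = (int)$candidate;
        }
    }
    return $ranking; // $ranking[0] is the effective 1st choice
}

echo implode(" ", rankedChoices("1 0 6 0 0")), "\n"; // prints "6"
echo implode(" ", rankedChoices("1 3 0 7 0")), "\n"; // prints "3 7"
```

Tallying element $k of the returned array as a vote for position $k + 1 mirrors what the $k counter does in the fixed version.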
Fixed version:
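<?php
$lines = file("stackoverflow-com-2014-election-results.blt");

// Header line: "<number of candidates> <number of seats>". Each ballot ranks at
// most 3 candidates, which happens to equal the 3 seats, so the seat count
// doubles as the number of choice columns here.
$line = explode(" ", trim($lines[0]));
$numCandidates = (int)$line[0];
$numChoices = (int)$line[1];

// $choiceVotes[position][candidate id] = number of votes
$choiceVotes = array_fill(1, $numChoices, array_fill(1, $numCandidates, 0));
$totalBallots = 0;

// Ballot lines: "weight choice1 choice2 choice3 0"; a line with just "0" ends them.
for ($i = 1;; $i++) {
    $line = explode(" ", trim($lines[$i]));
    if ($line[0] == 0) break;
    $totalBallots++;
    // Skip 0 (empty) choices so later choices move up; $k is the next free position.
    for ($j = 1, $k = 1; $j <= $numChoices; $j++) {
        if ($line[$j] != 0) $choiceVotes[$k++][$line[$j]]++;
    }
}

// Candidate names follow the end-of-ballots line, one quoted name per line.
$names = array();
for ($j = 1; $j <= $numCandidates; $j++) {
    $names[$j] = trim(trim($lines[$j + $i]), '"');
}

$rowFormat = "%20s" . str_repeat("%8s", $numChoices) . "%8s\n";
$separator = str_repeat("-", 20 + (8 * $numChoices) + 8) . "\n";

// Header: gmdate('S', ...) on the last second of day $i of January 1970 (UTC)
// yields the ordinal suffix for $i ("st", "nd", "rd", ...); valid for $i <= 31.
$row = array("Name");
for ($i = 1; $i <= $numChoices; $i++) $row[] = $i . gmdate('S', $i * 86400 - 1);
$row[] = "Total";
vprintf($rowFormat, $row);
print $separator;

foreach ($names as $id => $name) {
    $row = array($name);
    $candidateTotal = 0;
    for ($i = 1; $i <= $numChoices; $i++) {
        $votes = $choiceVotes[$i][$id];
        $row[] = $votes;
        $candidateTotal += $votes;
    }
    $row[] = $candidateTotal;
    vprintf($rowFormat, $row);
}
print $separator;
printf("Ballots: %d\n", $totalBallots);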
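The header row relies on gmdate('S', $i * 86400 - 1): that timestamp is the last second of day $i of January 1970 (UTC), so the 'S' format character gives the English ordinal suffix for $i. The trick only holds for 1 to 31. A sketch of an explicit helper that avoids the date functions (the name ordinalSuffix is hypothetical):

```php
<?php
// Hypothetical helper: English ordinal suffix for any positive integer.
function ordinalSuffix(int $n): string
{
    $lastTwo = $n % 100;
    if ($lastTwo >= 11 && $lastTwo <= 13) {
        return "th";            // 11th, 12th, 13th, 111th, ...
    }
    switch ($n % 10) {
        case 1:  return "st";
        case 2:  return "nd";
        case 3:  return "rd";
        default: return "th";
    }
}

// Header cells built the same way as in the fixed script.
$numChoices = 3;
$row = array("Name");
for ($i = 1; $i <= $numChoices; $i++) $row[] = $i . ordinalSuffix($i);
$row[] = "Total";
echo implode(" ", $row), "\n"; // prints "Name 1st 2nd 3rd Total"
```

Unlike the gmdate() version, this keeps working if a ballot ever allows more than 31 ranked choices (e.g. 32 gives "nd").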
Solution
I ran your script on the data file that contains 21563 ballot rows:
                Name     1st     2nd     3rd   Total
----------------------------------------------------
              meagar    1903    2382    2881    7166
         Raghav Sood    1357    1564    1570    4491
            Bohemian    5674    2935    2415   11024
          0x7fffffff    1932    2146    2159    6237
                Undo    1358    1424    1814    4596
            bluefeet    3352    3146    2284    8782
                Matt    1380    1988    2009    5377
        Jon Clements    1531    1526    1616    4673
      Siddharth Rout    1731    1871    1865    5467
            Doorknob    1165    1720    1752    4637
----------------------------------------------------
Ballots: 21563
I also imported it into a Google Spreadsheet (link), which gives the same result as your script using countif(). For example, the value of cell C2 is calculated with:
=countif( data!B:B;$A2)
So I must conclude that you did not use the same data. The data in https://stackoverflow.com/election/download-result/5 even changed after you posted the question.
Update:
So a ballot row like this, with the columns:
Weight 1st 2nd 3rd end
1      0   6   0   0
seems to be treated by OpenSTV as:
1 6 0
i.e. 6 is counted as the 1st choice and the 0 entries are ignored.
I ran a test in which I removed the 0 choices (link to the modified .blt file).
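A modified file like that can be produced mechanically. The sketch below assumes the layout seen in this file: a header line, one ballot per line (weight, choices, terminating 0), a line containing only 0, then the candidate names and title. The helper name removeZeroChoices is hypothetical, and it rewrites only the ballot lines:

```php
<?php
// Hypothetical helper (sketch): remove the 0 (empty) choices from every
// ballot line, e.g. "1 0 6 0 0" becomes "1 6 0". Assumes space-separated
// ballot lines that end at a line containing only "0"; the header and
// everything after the ballots (names, title) are copied unchanged.
function removeZeroChoices(array $lines): array
{
    $out = array();
    $inBallots = false;
    foreach ($lines as $index => $raw) {
        $line = rtrim($raw, "\r\n");
        if ($index === 0) {
            $inBallots = true;                 // line 0 is the header
        } elseif ($inBallots && trim($line) === "0") {
            $inBallots = false;                // end-of-ballots marker
        } elseif ($inBallots && trim($line) !== "") {
            $fields = preg_split('/\s+/', trim($line));
            $weight = array_shift($fields);
            $choices = array_filter($fields, function ($c) {
                return (int)$c !== 0;          // drops empty choices and the final 0
            });
            // Re-append the terminating 0 after the remaining choices.
            $line = implode(" ", array_merge(array($weight), $choices, array("0")));
        }
        $out[] = $line;
    }
    return $out;
}

$sample = array("10 3\n", "1 0 6 0 0\n", "1 10 2 5 0\n", "0\n", "\"Undo\"\n");
echo implode("\n", removeZeroChoices($sample)), "\n";
```

The input array has the same shape that file() returns, so the result can be joined with newlines and written back out with file_put_contents().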
The result from OpenSTV:
Ballot file contains 10 candidates and 21563 ballots.
No candidates have withdrawn.
Ballot file contains 21563 non-empty ballots.
Counting votes for Stack Overflow Moderator Election 2014 using Meek STV.
10 candidates running for 3 seats.
R|meagar       |Raghav Sood  |Bohemian     |0x7fffffff   |Undo         |bluefeet     |Matt
 |             |             |             |             |             |             |
 |-------------+-------------+-------------+-------------+-------------+-------------+-------------
 |Jon Clements |Siddharth Rou|Doorknob     |Exhausted    |Surplus      |Threshold
 |             |t            |             |             |             |
====================================================================================================
1|  1923.000000|  1374.000000|  5703.000000|  1951.000000|  1379.000000|  3372.000000|  1388.000000
 |  1544.000000|  1748.000000|  1181.000000|     0.000000|   312.249999|  5390.750001
 |-------------------------------------------------------------------------------------------------
 | Count of first choices. Candidate Bohemian has reached the threshold and is elected.
====================================================================================================
The result from your script:
                Name     1st     2nd     3rd   Total
----------------------------------------------------
              meagar    1923    2378    2865    7166
         Raghav Sood    1374    1558    1559    4491
            Bohemian    5703    2922    2399   11024
          0x7fffffff    1951    2143    2143    6237
                Undo    1379    1419    1798    4596
            bluefeet    3372    3138    2272    8782
                Matt    1388    1991    1998    5377
        Jon Clements    1544    1516    1613    4673
      Siddharth Rout    1748    1870    1849    5467
            Doorknob    1181    1724    1732    4637
----------------------------------------------------
Ballots: 21563
For this run I modified your script to use isset(), since the rows without the 0 choices are shorter.
So the counts of 1st choices now seem to be the same.
OTHER TIPS
When I count the number of carriage returns followed by the string "1 10", I get a total of 1903. That matches your count of first-place votes for meagar (candidate 10). I think your script is correct.
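The same check can be scripted. Below is a minimal sketch using preg_match_all() with the m (multiline) modifier, so that ^ matches at the start of every line; the helper name countFirstChoices is hypothetical, and it assumes every ballot has weight 1, as in this file. It only counts rows whose first position literally holds the candidate id, so a row such as "1 0 10 0 0" (first choice left empty) is not counted, which is the kind of row OpenSTV treats as a first choice:

```php
<?php
// Hypothetical helper: count ballot lines that start with "1 <id> ",
// i.e. weight 1 followed by the candidate id in the 1st position.
// The trailing space keeps "1 1 " from matching lines that start "1 10 ".
function countFirstChoices(string $blt, int $candidate): int
{
    $pattern = '/^1 ' . $candidate . ' /m'; // m: ^ matches at each line start
    return preg_match_all($pattern, $blt);
}

$sample = "10 3\n1 10 2 5 0\n1 0 10 0 0\n1 1 10 0 0\n0\n";
echo countFirstChoices($sample, 10), "\n"; // prints "1" (the "1 0 10 0 0" row is skipped)
echo countFirstChoices($sample, 1), "\n";  // prints "1"
```

On the real data, the input would be the whole file as one string, e.g. read with file_get_contents().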