On Twitter I’ve begun to post regular output from “the model”. Reaction has been positive so far, although many wonder exactly where all this data comes from.
As mentioned in previous posts, Nate Silver and FiveThirtyEight are a major influence on the model, although the Irish political system requires a custom approach. The effort to adapt his work to Ireland has spanned a number of months. In all, the R code (R being a language for statistical programming) runs to over 1,000 lines, with many spreadsheets of data feeding the program.
The model’s first step (as with Nate’s) is to take the polls and reduce them to a single figure for each party. The basic principle is that a more recent poll is weighted more heavily and a less recent poll less. Once aggregated, the polls reduce to a single figure for each party, also known as a “poll of polls”.
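As a rough illustration, the recency weighting can be sketched as follows. This is a hypothetical Python sketch, not the model’s R code: the exponential half-life, the function name and the poll figures are all my own assumptions.

```python
# Hypothetical recency-weighted "poll of polls": newer polls count for more.
# The 30-day half-life and the sample figures are illustrative assumptions.

def poll_of_polls(polls, half_life_days=30):
    """polls: list of (days_ago, {party: share}) tuples."""
    totals, weight_sum = {}, 0.0
    for days_ago, shares in polls:
        w = 0.5 ** (days_ago / half_life_days)  # weight halves every 30 days
        weight_sum += w
        for party, share in shares.items():
            totals[party] = totals.get(party, 0.0) + w * share
    return {p: s / weight_sum for p, s in totals.items()}

polls = [
    (2,  {"FG": 30, "FF": 20}),   # recent poll, weighted more
    (20, {"FG": 28, "FF": 22}),   # older poll, weighted less
]
print(poll_of_polls(polls))       # FG lands nearer 30 than 28
```

Any decaying weight scheme would do here; the point is only that the aggregate sits closer to the most recent poll.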
At the time of writing there are two approaches being taken. The first is my own simple polling aggregation. This has been the basis of the model up to the end of 2015, and the figures derived broadly mirror recent opinion polls.
As of the new year, I have been using poll averages from the Irish Polling Indicator which uses a much more advanced model. A special thanks to Tom Louwerse from Trinity College who has allowed use of his numbers. I encourage readers to check out his most recent paper on the topic.
Mapping polls to a constituency level
The model takes the “poll of polls” and attempts to spread the votes for each party across the 40 Irish constituencies. I use six “parties” – FG, FF, SF, LP, GP and NP. Force-categorising the other parties into “non-party” (NP) is a temporary measure; we separate out the smaller parties later.
All figures used below are sample data.
The program begins by filling the blue boxes. I generate a random turnout and use census/electoral register data to produce total vote figures for each constituency.
The total nationwide vote is split into total figures for the party groups using the “poll of polls” from above.
Every party in every constituency is given its national average vote share. We know intuitively that if, for example, Fine Gael polls 30% nationwide, it will poll above 30% in Mayo, and below in Dublin South Central. For now, each constituency sits at 30% but it will adjust later.
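The steps above – the blue totals plus the national-average seeding – can be sketched as follows. This is an illustrative Python sketch rather than the model’s R code; the electorate figures, shares and variable names are made-up sample data.

```python
import random

# Made-up electorate figures for two constituencies.
electorate = {"Mayo": 90_000, "Dublin South Central": 70_000}
national_share = {"FG": 0.30, "FF": 0.25, "NP": 0.45}   # "poll of polls"

# Blue boxes: random turnout times the register gives each constituency's
# total vote; the nationwide total is then split by the poll of polls.
turnout = random.uniform(0.50, 0.65)
const_votes = {c: e * turnout for c, e in electorate.items()}
national_total = sum(const_votes.values())
party_totals = {p: s * national_total for p, s in national_share.items()}

# Seeding: every party starts at its national share in every constituency.
seed = {c: {p: s * v for p, s in national_share.items()}
        for c, v in const_votes.items()}
```

By construction the seeded figures sum back to the blue totals; the adjustment toward local strength comes later.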
Next is a table of “deviates”. Keep in mind the above table, but one filled with completely different data. The program examines data from the last general election (2011) and the last local elections (2014). The extent to which each party (in a given constituency) deviates from its national average at both elections is calculated and averaged.
This is used to create another table (same size as before) containing a “should be” percentage. The past election data combined with today’s polling allows us to determine what vote share a party should receive in a given constituency. Fine Gael in Mayo “should be” c. 15% higher than its national vote and “should be” c. 10% lower in Dublin South Central etc.
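In code, the deviate and “should be” calculation might look like the following hypothetical sketch. The figures are sample data chosen to echo the Mayo and Dublin South Central examples; the names are my own.

```python
# Deviation of local share from national share at the last two elections
# (made-up figures, chosen to match the examples in the text).
dev_2011 = {("FG", "Mayo"): +0.16, ("FG", "Dublin South Central"): -0.11}
dev_2014 = {("FG", "Mayo"): +0.14, ("FG", "Dublin South Central"): -0.09}
national_now = {"FG": 0.30}   # today's poll of polls

# Average the two deviates, then add them to today's national share.
deviates = {k: (dev_2011[k] + dev_2014[k]) / 2 for k in dev_2011}
should_be = {k: national_now[k[0]] + d for k, d in deviates.items()}

print(round(should_be[("FG", "Mayo")], 2))                  # c. 15% above 30%
print(round(should_be[("FG", "Dublin South Central")], 2))  # c. 10% below 30%
```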
At this point you may ask why we can’t simply convert the “should be” percentages into numbers. The answer goes back to the blue totals above. If the green numbers don’t sum to match the blue numbers, the count is wrong.
To keep the totals intact, I have written a “vote swap” algorithm to move votes around the green space while keeping the blue totals constant. A “votes to move” table determines how many votes are needed to bring a party (in each constituency) to its “should be” point.
The “vote swap” algorithm searches the “votes to move” table for the largest move that needs to take place, and finds other parties in that constituency which need to move the other way. Returning to the Mayo example, Fine Gael’s vote would need to rise, and Labour’s to fall. The program then searches the remaining constituencies for one where Fine Gael’s vote needs to fall, and Labour’s to rise. For every action, there is an equal and opposite reaction.
The algorithm repeats to cover every party in every constituency. Once complete, the model obtains a number of votes for every party in every constituency.
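A minimal sketch of the swap step, with a toy two-constituency, two-party example (the author’s actual R implementation will be more involved): each pass finds the largest outstanding move and applies a 2x2 swap that leaves both the constituency totals and the party totals untouched.

```python
def vote_swap(votes, target, max_iter=100):
    """Move votes toward target while preserving row/column totals."""
    for _ in range(max_iter):
        need = {c: {p: target[c][p] - votes[c][p] for p in votes[c]}
                for c in votes}
        # Largest outstanding upward move.
        c1, p1 = max(((c, p) for c in need for p in need[c]),
                     key=lambda cp: need[cp[0]][cp[1]])
        if need[c1][p1] <= 0:
            break                                   # everything at target
        p2 = min(need[c1], key=need[c1].get)        # must fall in c1
        # A constituency where p1 must fall and p2 must rise.
        c2 = max((c for c in need if c != c1),
                 key=lambda c: min(-need[c][p1], need[c][p2]))
        d = min(need[c1][p1], -need[c1][p2], -need[c2][p1], need[c2][p2])
        if d <= 0:
            break
        votes[c1][p1] += d; votes[c1][p2] -= d      # equal and opposite
        votes[c2][p1] -= d; votes[c2][p2] += d
    return votes

votes = {"Mayo": {"FG": 3000, "LP": 3000},
         "Dublin South Central": {"FG": 3000, "LP": 3000}}
target = {"Mayo": {"FG": 4000, "LP": 2000},
          "Dublin South Central": {"FG": 2000, "LP": 4000}}
vote_swap(votes, target)   # one 2x2 swap of 1,000 votes reaches the target
```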
I must give a hat-tip to Adrian Kavanagh, Storyful’s IEOD, and the many other candidate lists in the field right now. I compile my own list from these sources and add some extra information (if applicable):
- 2011 vote – Number of votes at the 2011 general election.
- 2011 elected – A binary measure (0 or 1) if the candidate was elected at this election (also marked for Senators as an indicator of incumbency advantage).
- 2014 vote – Number of votes at the 2014 local elections.
- 2014 elected – A binary measure (0 or 1).
- Minister – 1 for a minister, 0.5 for a junior minister, 0 for a non-minister.
The above data combine through a number of formulas to create a “candidate weight”. This score only applies where two or more candidates from the same party are in contest. If only one candidate contests, their vote is simply the party’s vote in the constituency (see previous section).
Example: If two Fianna Fáil candidates are in contest, the party’s vote in the constituency will split by the candidate weights. Let’s assume 10,000 votes are available. If candidate A has a weight of 0.6, and candidate B has 0.4, A will win 6,000 first preference votes (FPVs) and B wins 4,000. This sounds rigid, but later on I discuss simulation and how party ticket underdogs are allowed to lead on occasion.
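The split in the worked example is just a weighted division, sketched here in Python (the function name is mine):

```python
def split_by_weight(party_votes, weights):
    """Divide a party's constituency vote among its candidates by weight."""
    total = sum(weights.values())
    return {cand: party_votes * w / total for cand, w in weights.items()}

# Two Fianna Fáil candidates sharing 10,000 votes, as in the example:
# A (weight 0.6) takes 6,000 FPVs, B (weight 0.4) takes 4,000.
shares = split_by_weight(10_000, {"A": 0.6, "B": 0.4})
```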
Where a candidate has not contested a prior election, subjective interpretation must be used to assign a candidate weight. This is particularly sensitive for independent/other candidates, who may possess some local or national “celebrity” status (or lack thereof). A qualitative reading of each constituency is used to determine a weight.
The 2014 local election results also undergo some subjective adjustment. A candidate may have received a small number of votes but contested a very strong area for their party. It is likely (although by no means certain) that the votes received by other candidates in that area would rally around this person in a general election setting. A local electoral area (LEA) is typically one-sixth to one-third of a constituency’s area and population. This adjustment is also made when a sitting TD is joined by a Councillor on a party ticket; the candidate weights would otherwise greatly favour the incumbent.
At the heart of the model is a full simulation of Ireland’s PR-STV election count system. R code has been written to incorporate all the rules of the count.
A unique feature of the Irish system is “non transferable” (NT) votes. These occur when voters fail to give preferences for every candidate on the ballot. As a count progresses, the number of NT votes rises. To model this I examined a number of past general/local election counts. I then derived a formula which takes the votes to be transferred at any count and determines the number of NT votes.
t - (t * (1 - (1 - c/n)^(1/0.67)) / 1.85)
t = The number of transfers available
c = The count number
n = The number of candidates in the constituency
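As a check, the formula can be transcribed directly (with the parentheses reconstructed, since the original appears to carry an unbalanced bracket). On my reading, the expression gives the votes that successfully transfer, so the non-transferable count is t minus this quantity, which duly rises as the count progresses:

```python
def transferable(t, c, n):
    # Direct transcription of the formula above (my reading of the brackets).
    return t - (t * (1 - (1 - c / n) ** (1 / 0.67)) / 1.85)

# With 1,000 votes to transfer and 10 candidates, the NT count
# grows as the count number increases.
t, n = 1000, 10
for c in (1, 4, 8):
    nt = t - transferable(t, c, n)
    print(c, round(nt))
```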
In any constituency, transfers contain a mix of party preferences (Sinn Féin to AAA/PBP for example) and local characteristics. The latter is beyond the scope of this model. We lack sufficient data to allocate transfers based on close proximity of candidates. Dual county constituencies (Cavan-Monaghan, Sligo-Leitrim etc.) have a tendency to transfer within county lines, in an effort to secure a TD for the county. A potential improvement for the model would be transfers based on geographic proximity, perhaps using the address provided for each candidate on the ballot paper (although not all candidates live in their constituencies).
To determine party to party transfers, the model takes an average of transfer patterns from the 2011 general election and 2014 local elections. Some adjustments are made including incorporation of the recent transfer pact between Fine Gael and Labour. This simple approach presents some issues. While Sinn Féin to AAA/PBP transfers are high nationally (and vice versa), they are likely to vary at a constituency level. We lack sufficient data to model transfers locally, so a national average must be taken.
Above is the “transfer matrix” for the 2011 general election. The figures are percentages, although you will notice that rows and columns don’t total 100%. I have highlighted internal party transfers, which are naturally high but which didn’t occur for the United Left Alliance and Green Party at the time (the opportunity never arose).
Example: Let’s assume we are at the final count. A Fianna Fáil candidate has been eliminated and only a Fine Gael candidate and a Labour candidate remain. Historically we see that Fianna Fáil gives 19% of available transfers to Fine Gael and 13% to Labour, a ratio of 19:13. In percentages, Fine Gael will receive 59% of the transfers, and Labour 41%. Non-transferable votes would be high in this case, but recall that these are modeled separately. If candidates of all parties are available to receive transfers, the ratio is 19:13:58:9:7:11:23, or 14%, 9%, 41%, 6%, 5%, 8% and 16%, and so on. Where multiple candidates of the same party can receive transfers, the model divides the available votes randomly, but with consideration to the strength of each candidate (like candidate weights earlier).
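The renormalisation in this example is straightforward; here is a hypothetical Python sketch (party labels beyond FG and LP are placeholders, since the source gives only the ratio):

```python
def transfer_shares(ratios, still_in_race):
    """Normalise historical transfer figures over the remaining parties."""
    pool = {p: ratios[p] for p in still_in_race}
    total = sum(pool.values())
    return {p: v / total for p, v in pool.items()}

# Fianna Fáil's outgoing transfer figures, 19:13:58:9:7:11:23.
# Only FG and LP are named in the example; the other labels are guesses.
ff_gives = {"FG": 19, "LP": 13, "FF": 58, "SF": 9, "GP": 7, "NP": 11, "ULA": 23}

print(transfer_shares(ff_gives, ["FG", "LP"]))  # FG c. 59%, LP c. 41%
```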
With the above processes complete, we obtain one set of constituency results. To assess all possible scenarios, we run the model 1,000 times. All variables will change throughout:
- Turnout – At the last general election, turnout reached 70%, a figure not seen since 1997. Turnout in 2016 is likely to be lower, so I model it at 50-65%. Any figure within this interval is equally likely to emerge (uniform random). Turnout does not change the seat results; it simply allows us to estimate a figure for FPVs if we wish.
- Party vote – Above I discussed the “poll of polls” which gives us a figure for party share. We also know the margin of error for these polls. Taking the example of a single poll, we have a margin of error of +/- 3%. If Sinn Féin is on 17%, its margin of error is a range of 14-20%. A figure within this interval is randomly chosen at each of the 1,000 simulations. We use the normal distribution. Those who remember the concept of the “bell curve” from the Leaving Cert (or elsewhere) will be able to picture its shape. Of the 1,000 simulations, the vast majority will place Sinn Féin at 16%, 17% and 18%. Every now and then, 15% and 19% will emerge, and on rare occasions, 14% and 20%. Note that this is an overly simple example rounded to the nearest whole number. Margins of error are much more complex (especially for smaller parties).
- Deviates – Earlier, I determined to what extent a party sits above or below its national average in a given constituency, based on the 2011 and 2014 elections. Some constituencies are more accurately captured by the 2011 data, others by the 2014 data, although we cannot be sure which. I work off the assumption that the 2014 data is more valuable and better captures the distribution of a party’s support nationwide. I also allow for deviates outside these two ranges. Note that this variable is perhaps the most important for determining the outcome of constituency contests.
- Candidate weights – Earlier I spoke about party ticket underdogs, those candidates who are expected to poll below their party colleagues but who occasionally outperform the odds. This is not just a feature of party tickets but of the Independent/Others category. I model such candidates using a normal distribution for each candidate. Visually, this can be seen as overlapping distributions. The area of overlap signifies the proportion of times an underdog candidate will surpass their party colleague on the ticket. The allowance given to underdog candidates is arbitrary in the model. We don’t have sufficient data to determine how often candidates surpassed expectations in previous elections. Anecdotally, we know that a sitting TD fears the Councillor who joins them on the party ticket, despite incumbency advantage.
- Transfers – Transfers are modelled with a 50-100% weight towards the 2014 local elections versus the 2011 general election, the assumption being that the 2014 election better captures the likely transfer patterns in 2016. Once again, a reminder that local transfer patterns are not factored in; national averages are used instead.
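Putting the first two bullet points together, each run of the simulation might draw its inputs like this. A Python sketch: the 1.5% standard deviation is my own assumption, chosen so that roughly 95% of draws land within +/-3% of the mean.

```python
import random

random.seed(1)                      # for a reproducible illustration
runs = []
for _ in range(1000):
    turnout = random.uniform(0.50, 0.65)    # uniform, as described above
    sf_share = random.gauss(0.17, 0.015)    # SF on 17%, MoE c. +/-3%
    runs.append((turnout, sf_share))

# The bulk of draws should fall inside the 14-20% margin of error.
within = sum(1 for _, s in runs if 0.14 <= s <= 0.20) / len(runs)
```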
As the model runs, countless data points are recorded. The most important is a binary measure of whether a candidate is elected or not, taken for each candidate at each of the 1,000 simulations. From this we can derive the probability of election for every candidate, the likely seats for each party, and even the likely number of female TDs. When run on different dates, the model allows us to see the prospects for each candidate unfold over time. As new polls and candidates emerge, the data changes.
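The aggregation itself is simple; a sketch with made-up names and run records:

```python
# Each candidate's record is a 0/1 "elected" flag per simulation run
# (four runs here instead of 1,000, purely for illustration).
elected = {("Candidate A", "FG"): [1, 1, 0, 1],
           ("Candidate B", "FG"): [0, 1, 0, 0],
           ("Candidate C", "FF"): [1, 0, 1, 1]}

# Probability of election per candidate...
prob = {cand: sum(runs) / len(runs) for cand, runs in elected.items()}

# ...and expected seats per party (sum of its candidates' probabilities).
expected_seats = {}
for (name, party), p in prob.items():
    expected_seats[party] = expected_seats.get(party, 0.0) + p
```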
I will be uploading all of this data to the site once the final list of candidates is known. Modelling would otherwise be inconsistent over time, a point to bear in mind for the party totals already being published on social media.
Your feedback on the methods above is most welcome. The various assumptions underlying the model have yet to be tested at an election and are worthy of improvement and fine-tuning before the election takes place.