Raw voting data comes in all shapes and forms, and not all voting data can be easily transformed into a format that is ready for ternary plots.
This vignette shows how to transform common voting data formats into
a ternable-friendly format that can be used for ternary
plots. These include PrefLib-formatted data, raw ballot data in long and
wide forms, and AEC distribution of preference data.
ternable-friendly dataas_ternable() creates a ternable object,
which is a S3 object that contains the data and metadata necessary for
ternary plots.
A ternable-friendly data frame must have the following
characteristics:
df <- tibble(
electorate = c("A", "B", "C"),
PartyA = c(0.5, 0.4, 0.6),
PartyB = c(0.3, 0.4, 0.2),
PartyC = c(0.2, 0.2, 0.2)
)
df
#> # A tibble: 3 × 4
#> electorate PartyA PartyB PartyC
#> <chr> <dbl> <dbl> <dbl>
#> 1 A 0.5 0.3 0.2
#> 2 B 0.4 0.4 0.2
#> 3 C 0.6 0.2 0.2The above data frame is a minimal example of a
ternable-friendly data frame. It contains 3 columns, each
representing the preferences for a party in a specific electorate. The
values in these columns are all non-negative and sum to 1.
PrefLib
format is a common format for storing preferential data, and can be
easily downloaded using prefio::read_preflib()
function.
In this example, we will use the NSW Legislative Assembly Election dataset.
nswla <- read_preflib("00058 - nswla/00058-00000171.soi", from_preflib = TRUE)
nswla
#> # A tibble: 1,104 × 2
#> # ℹ 1,094 more rows
#> # ℹ 2 more variables: preferences <prefrncs>, frequency <int>The dataset contains preferential data, with each row representing a
unique set of preferences and their respective frequencies. What is
missing to transform this data into a ternable-friendly
format is the breakdown of preferences for each candidate in each round
of voting.
The function dop_irv() in this package helps us simulate
the Instant Runoff Voting (IRV) process to get the round-by-round
preferences for each candidate, and then transform the data into a
ternable-friendly format. A caveat for this function is
that it does not handle ties in preferences given a ballot, which works
fine under Australian IRV rules, but may not be appropriate for other
electoral systems.
dop_irv(
nswla, value_type = "percentage",
preferences_col = preferences,
frequency_col = frequency)
#> # A tibble: 4 × 8
#> round `HAYLEN Jo` `WEI Leo` `RAUE Tom` `MAKRIS Andrea` `ROMANOVSKY Teresa`
#> <int> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 1 0.464 0.233 0.206 0.0572 0.0252
#> 2 2 0.469 0.236 0.209 0.0595 0.0270
#> 3 3 0.478 0.240 0.218 0.0646 0
#> 4 4 0.504 0.253 0.244 0 0
#> # ℹ 2 more variables: `SINDEN Dale` <dbl>, winner <chr>Raw ballot data mostly comes in long or wide format. Both formats can
be transformed into Preflib format, which can then be converted to
ternable-friendly format using dop_irv().
Processing long format
Consider the following example of simulated ballot data in long format for the Melbourne electorate, with 3 parties: ALP, LNP, and Other, and a typical column for preference rank.
| ballot_id | elect_division | party | preference_rank |
|---|---|---|---|
| 1 | Melbourne | ALP | 1 |
| 1 | Melbourne | LNP | 2 |
| 1 | Melbourne | Other | 3 |
| 2 | Melbourne | ALP | 2 |
| 2 | Melbourne | LNP | 1 |
| 2 | Melbourne | Other | 3 |
| 3 | Melbourne | ALP | 3 |
| 3 | Melbourne | LNP | 2 |
| 3 | Melbourne | Other | 1 |
| 4 | Melbourne | ALP | 1 |
| 4 | Melbourne | LNP | 3 |
| 4 | Melbourne | Other | 2 |
| 5 | Melbourne | ALP | 2 |
| 5 | Melbourne | LNP | 3 |
| 5 | Melbourne | Other | 1 |
# Convert to PrefLib format
preflib_long <- prefio::long_preferences(
ballot_long,
vote,
id_cols = c(ballot_id, elect_division),
item_col = party,
rank_col = preference_rank
)
# Convert to ternable-friendly format
dop_irv(preflib_long$vote, value_type = "percentage")
#> # A tibble: 2 × 5
#> round ALP LNP Other winner
#> <int> <dbl> <dbl> <dbl> <chr>
#> 1 1 0.4 0.2 0.4 ALP
#> 2 2 0.6 0 0.4 ALPProcessing wide format
Consider the same example, but in wide format, with each column representing an item/candidate and its value representing the preference rank.
| ballot_id | elect_division | ALP | LNP | Other |
|---|---|---|---|---|
| 1 | Melbourne | 1 | 2 | 3 |
| 2 | Melbourne | 2 | 1 | 3 |
| 3 | Melbourne | 3 | 2 | 1 |
| 4 | Melbourne | 1 | 3 | 2 |
| 5 | Melbourne | 2 | 3 | 1 |
# Convert to PrefLib format
preflib_wide <- prefio::wide_preferences(ballot_wide, vote, ALP:Other)
# Convert to ternable-friendly format
dop_irv(preflib_wide$vote, value_type = "percentage")
#> # A tibble: 2 × 5
#> round ALP LNP Other winner
#> <int> <dbl> <dbl> <dbl> <chr>
#> 1 1 0.4 0.2 0.4 ALP
#> 2 2 0.6 0 0.4 ALPThis section applies specifically to the AEC distribution of preferences data. However, it can also be adapted to datasets of similar format, where
In this example, we will work with 2025 Distribution of Preference
data, which is included in this package as aecdop_2025. We
are interested in preference percentages of 4 main parties: Labor, The
Coalition, Greens, and Independents. The rest is grouped into Other.
Before transforming the data, let’s do some data wrangling to filter for relevant information.
# only include preference percentage and parties of interest
aecdop_2025 <- aecdop_2025 |>
filter(CalculationType == "Preference Percent") |>
mutate(Party = case_when(
# all parties not in the main parties are grouped into "Other"
!(PartyAb %in% c("LP", "ALP", "NP", "LNP", "LNQ", "GRN", "IND")) ~ "Other",
# group all parties in the Coalition into "LNP"
PartyAb %in% c("LP", "NP", "LNP", "LNQ") ~ "LNP",
TRUE ~ PartyAb
))The data is now ready for transformation, with the following characteristics:
DivisionNm and CountNumber columns
together form the primary key that uniquely identifies each row.CalculationValue column contains the preference
percentage.Party column contains the item/party.Elected column contains the elected/electoral status of
the party.We will use the dop_transform() function to transform
the data into a ternable-friendly format.
dop_transform(
data = aecdop_2025,
key_cols = c(DivisionNm, CountNumber),
value_col = CalculationValue,
item_col = Party,
winner_col = Elected,
winner_identifier = "Y"
)
#> # A tibble: 976 × 8
#> DivisionNm CountNumber ALP GRN LNP Other IND Winner
#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>
#> 1 Adelaide 0 0.465 0.190 0.242 0.104 0 ALP
#> 2 Adelaide 1 0.467 0.194 0.242 0.0967 0 ALP
#> 3 Adelaide 2 0.476 0.205 0.244 0.0745 0 ALP
#> 4 Adelaide 3 0.483 0.210 0.249 0.0578 0 ALP
#> 5 Adelaide 4 0.493 0.222 0.285 0 0 ALP
#> 6 Adelaide 5 0.691 0 0.309 0 0 ALP
#> 7 Aston 0 0.373 0 0.377 0.209 0.0414 ALP
#> 8 Aston 1 0.373 0 0.378 0.206 0.0434 ALP
#> 9 Aston 2 0.376 0 0.380 0.210 0.0338 ALP
#> 10 Aston 3 0.378 0 0.384 0.202 0.0358 ALP
#> # ℹ 966 more rows