This week Cliff Thomas from the Stat Check team is dropping in to talk about their Elo system, how it works, and why it’s useful.
Warhammer 40k is a complex game. Playing it well demands mastery of the rules, list construction, risk management, and emotional control, all while navigating the stochastic nature of d6 dice rolls and thousands of micro-decisions over the course of a game – which makes it inherently difficult to engage with competitively. Tracking the outcome of a Warhammer 40k game can be similarly challenging: player skill, play style, matchups, and the occasionally vast differences in power between the lists in a given game can make predicting outcomes tough.
However, there is a way to track these variables using a data point that elegantly captures a player’s ability to navigate those challenges and summarizes it in a single score: the Elo player rating system. Elo not only provides a means of measuring player skill, it lets us validate that measurement against real game outcomes, and it’s also incredibly transparent. That elegance and transparency are why my colleagues and I at Stat Check (courtesy of the heroic work of Jeremy “Curie” Atkinson) have produced Elo scores for all of the players in our quarter-million game dataset. In this post, I’ll be talking about Elo, how we use it for 40k, its comparative strengths relative to other Warhammer 40k internet points, and the areas of improvement we’re looking to address in the future.
First, a quick rundown on Elo’s development. The system was created by the Hungarian-American physicist Arpad Elo, building on the foundation of the Harkness rating system. Since its debut in 1960, when it was adopted by the United States Chess Federation, Elo has spread throughout the rest of the competitive chess world, and has either been used directly or served as the foundation for player ratings in just about every other competitive card game, board game, and sport there is.
The core logic underlying the Elo system is quite straightforward. It assigns a numerical rating to each player. When two players compete, it recalculates each player’s rating based on the outcome of that game: win, and your rating goes up; lose, and it goes down. The magnitude of these changes, however, is not constant. It depends on the relative strength of the opponents, as determined by their current ratings. If a player wins against a higher-rated opponent, they gain more points than if they won against someone with a lower rating. Conversely, losing to a lower-rated player results in a larger rating decrease. This self-correcting mechanism ensures that the ratings are dynamic and strive to zero in on the true measure of a player’s skill level.
Before getting into how Elo is used in practice, here’s a breakdown of the key components of the Elo rating formula (a short code sketch tying them together follows the list):
- Player Ratings (R)
- Definition: Numerical representation of a player’s skill level
- Use: Base value for calculating rating changes
- Rating Difference
- Definition: The difference in ratings between two players
- Formula: R_opponent − R_player
- Purpose: Determines the expected outcome of a match based on the disparity in skill levels.
- Expected Score (E)
- Definition: The probability of a player winning a match
- Formula: E = 1 / [1 + 10 ^ (Rating Difference / 400)]
- Purpose: Calculates each player’s chances of winning prior to the match, based on the difference in player ratings
- Actual Score
- Definition: The actual outcome of the match
- Values: 1 for a win, 0.5 for a draw, and 0 for a loss
- Purpose: Represents the real result of the game
- K-Factor
- Definition: The constant which determines the maximum amount a rating can change after a single game
- Values: It varies – 30 or 32 are commonly used values here. We currently use 32 at Stat Check.
- Purpose: Controls the volatility of a rating change – if you want Elo ratings to adjust more quickly to game outcomes, you push this higher. If you want them to adjust more slowly, you make it lower.
- Rating Update
- Definition: Each player’s recalculated new player rating (R) after a match
- Formula: New Rating = Old Rating + K * (Actual Score – Expected Score)
- Purpose: Adjusts a player’s rating based on their performance compared to the expected outcome.
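To make those pieces concrete, here’s a minimal Python sketch of a single Elo update – an illustration of the standard formulas above, not Stat Check’s production code:

```python
def expected_score(player_rating: float, opponent_rating: float) -> float:
    """Expected score (E): the player's probability of winning the game."""
    rating_difference = opponent_rating - player_rating
    return 1 / (1 + 10 ** (rating_difference / 400))


def updated_rating(old_rating: float, actual_score: float, expected: float, k: float = 32) -> float:
    """Rating update: new rating = old rating + K * (actual score - expected score)."""
    return old_rating + k * (actual_score - expected)
```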
The actual calculation of the scores is relatively straightforward – we can use a scenario to illustrate the process.
Let’s say I (Cliff) make my way to Chicago for work and decide to catch up with my Stat Check colleague (Stats Daddy) Nathan over a 40k game. For the sake of this game, we’ll set aside the fact that Nathan is actually good at 40k and I’m a list-building meme lord.
- Pre-game, we both have Elo ratings of 1500. Since we both have the same rating, our expected chance of winning is equal – this would be true no matter how high or low each player’s rating is, as long as the two ratings are equal.
- Each of our expected chances of winning (E) is calculated using the following formula:
E = 1 / [1 + 10^((1500 − 1500) / 400)] = 1 / [1 + 10^(0 / 400)] = 1 / [1 + 10^0] = 1 / [1 + 1] = 0.5
This means we each have a 50% expected chance of winning the game.
- Nathan wins! Elo now produces a ratings update for each of us depending on the outcome of the game:
- Nathan’s new rating: 1500 + 32 × (1 − 0.5) = 1500 + 32 × 0.5 = 1500 + 16 = 1516
- Cliff’s new rating: 1500 + 32 × (0 − 0.5) = 1500 − 32 × 0.5 = 1500 − 16 = 1484
- Nathan has won, since he 1. is actually good and 2. was probably using Eldar or World Eaters while Cliff showed up with Imperial Knights. Given their identical starting ratings, Nathan’s win leads to a direct swap in points between the two players. Nathan gains 16 points, bringing his new rating to 1516, while Cliff loses the same amount, dropping his rating to 1484. Those new ratings will be used to inform the expected outcomes of their games in the future, and will adjust depending on the results of those games, which then inform the expected outcomes of future games, and so on.
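Running our scenario through the sketch above reproduces those numbers:

```python
e = expected_score(1500, 1500)     # 0.5 -- identical ratings, even odds

print(updated_rating(1500, 1, e))  # Nathan (winner, actual score 1): 1516.0
print(updated_rating(1500, 0, e))  # Cliff (loser, actual score 0):   1484.0
```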
For reference, here’s a table showing the expected win probability and potential points gained and lost for a given player with a 1500 Elo rating in matchups against players with lower and higher ratings.
Opponent's Elo | Expected Win Rate | Points Gained for Win | Points Lost for Loss |
---|---|---|---|
1300 | 0.76 | +8 | -24 |
1350 | 0.70 | +9 | -23 |
1400 | 0.64 | +12 | -20 |
1450 | 0.57 | +14 | -18 |
1500 | 0.50 | +16 | -16 |
1550 | 0.43 | +18 | -14 |
1600 | 0.36 | +20 | -12 |
1650 | 0.30 | +23 | -9 |
1700 | 0.24 | +24 | -8 |
1750 | 0.19 | +26 | -6 |
1800 | 0.15 | +27 | -5 |
1850 | 0.12 | +28 | -4 |
1900 | 0.09 | +29 | -3 |
1950 | 0.07 | +30 | -2 |
2000 | 0.05 | +30 | -2 |
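That table falls straight out of the formulas above – a short loop using the earlier sketch reproduces it, rounding to whole points as the table does:

```python
for opponent_elo in range(1300, 2001, 50):
    e = expected_score(1500, opponent_elo)
    win_gain = 32 * (1 - e)   # points gained for a win
    loss_cost = 32 * (0 - e)  # points lost for a loss (negative)
    print(f"{opponent_elo} | {e:.2f} | +{win_gain:.0f} | {loss_cost:.0f}")
```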
The dynamic adjustments that Elo calculations make are the most important part of the system, and its biggest strength. Unexpected outcomes provide the rating system with new, important information, which it attempts to incorporate into player ratings through larger rating adjustments. This does, of course, carry some risk – upsets can feed the “wrong info” to the system, producing rating adjustments that may not be the most accurate reflection of a given player’s skill. Even great players can get sick, be jetlagged, get a ruling wrong, or be on the receiving end of a statistically unlikely, game-changing dice roll.
As long as players continue to compete, Elo has the opportunity to correct itself, eventually balancing out any adjustments that were influenced by external factors. It’s also worth noting the viewpoint that great players are expected to adjust to those circumstances (Jordan’s iconic Flu Game, anyone?), or to avoid them in the first place, enduring hardship on their way to victory.
So how does Stat Check calculate its Elo Player Ratings for Warhammer 40k? First, we use our dataset of competitive events – over a quarter of a million 2,000-point games played in events with at least five rounds and at least 25 players, dating back to February 7th, 2022. Each player in the dataset is assigned an initial rating of 1500, and we use a K-Factor of 32, allowing for relatively quick rating updates. We then apply the same calculations described in the example scenario above to update the ratings.
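In sketch form, that pass over the dataset looks something like the following – the record field names here are hypothetical rather than our actual schema, and `expected_score` is from the earlier sketch:

```python
from collections import defaultdict

K = 32
ratings = defaultdict(lambda: 1500.0)  # every player starts at 1500

# `games` stands in for the filtered dataset of qualifying games, in
# chronological order; `score` is the result from player one's perspective.
for game in games:
    p1, p2 = game["player_one"], game["player_two"]
    e1 = expected_score(ratings[p1], ratings[p2])
    s1 = game["score"]  # 1 for a win, 0.5 for a draw, 0 for a loss
    ratings[p1] += K * (s1 - e1)
    ratings[p2] += K * ((1 - s1) - (1 - e1))  # the mirror-image update
```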
So far, Elo has been fairly accurate in its estimates of game outcomes. In a random sample of 1,000 games in our dataset, Elo correctly predicted an individual game’s outcome 80% of the time. That’s a pretty good number, especially given the intensely random nature of Warhammer 40k’s underlying mechanics. As gluttons for data-driven accuracy, we’re considering introducing ratings decay to fine-tune the scores – if players are inactive over time, it’s likely that they’re going to be a bit out of practice on their return to competitive play.
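We haven’t settled on a design, but to illustrate the idea, a decay rule could be as simple as this hypothetical sketch – the grace period and monthly pull here are made-up values for illustration, not anything we’ve committed to:

```python
from datetime import date


def decayed_rating(rating: float, last_game: date, today: date,
                   grace_days: int = 180, pull_per_month: float = 5.0) -> float:
    """Drift an inactive player's rating back toward the 1500 baseline."""
    idle_days = (today - last_game).days - grace_days
    if idle_days <= 0:
        return rating  # recently active players are untouched
    pull = pull_per_month * idle_days / 30
    # Pull toward 1500 without overshooting it
    if rating >= 1500:
        return max(1500.0, rating - pull)
    return min(1500.0, rating + pull)
```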
There are a few assumptions built into Elo tracking that are specific to 40k. We’re all aware that there is wide performance variation across factions – there is extremely rich data in our Meta Dashboard and our partner Goonhammer’s 40k Stats to back that up. We also know that terrain types can differ dramatically across events – Player-Placed Terrain, Canadian Player-Placed Terrain, GW Open, UKTC, WTC, etc. all have a substantial impact on the expected performance of lists and their parent factions. Events can run different FAQs, may or may not use clocks, may or may not include paint scores, and so on. We assume that player skill – as measured by Elo – includes the ability to succeed within and across these different contexts. If we’re interested in measuring individual player skill, a player’s ability to adapt to these contextual shifts is necessarily included in their performance.
That said, there are a few limitations inherent to our dataset. Many of the world’s top players focus on team play, which isn’t well represented in our data. The team dynamic doesn’t lend itself to individual Elo score tracking, as success for a given player can mean producing close losses where an average player would suffer a blowout defeat. We may also be missing some highly skilled players who don’t attend the kinds of events that we track – the “playground legends” of Warhammer 40k.
We think the pros outweigh the cons, especially given Elo’s transparency. Any changes or adjustments to our system are currently (and always will be) clear and easy to understand. Elo is also based purely on game outcomes, rather than a formulaic derivation based on the number of events attended or the number of players present at a given event. Compared to other Warhammer 40k player rating systems, we think Elo’s transparency, simplicity, and accuracy do the best job yet of determining player skill.
As a quick closeout, here’s a sample of the information you can find at https://www.stat-check.com/elo – these are the top 10 players in the world according to Elo ratings, as of December 6th, 2023. You’ll find global and regional Elo ratings there, and if you’ve participated in a five-round, 25+ player 40k event since February 2022, you’ll find yourself there too! We’ll be updating this continually, so be on the lookout for any changes and their accompanying announcements. If you have any questions, feedback, or general commentary, please reach out to me or any of the folks on the Stat Check team in the comments or elsewhere on the 40k internet. As always, if you’re looking to support Jeremy’s Elo work, find us on Patreon at https://www.patreon.com/statcheck. Keep gaming, and GW please save my golden boys.
Rank | Change | Player | Elo | Wins | Losses | Draws | Win Rate | Primary 10E Faction |
---|---|---|---|---|---|---|---|---|
1 | - | Mani Cheema | 2119.4 | 143 | 12 | 0 | 92.3% | Chaos Space Marines |
2 | - | John Lennon | 2038.8 | 87 | 11 | 0 | 88.8% | Tyranids |
3 | - | Thomas Ogden | 2012.5 | 92 | 6 | 0 | 93.9% | T'au Empire |
4 | - | Anthony Vanella | 2011.8 | 85 | 16 | 2 | 83.5% | World Eaters |
5 | - | Mike Porter | 2006.6 | 91 | 6 | 0 | 93.8% | Aeldari |
6 | - | Liam Vsl | 2005.9 | 50 | 4 | 3 | 90.4% | Chaos Space Marines |
7 | -1.1 | Vik Vijay | 1986.5 | 112 | 14 | 3 | 88.0% | Adepta Sororitas |
8 | - | Brad Chester | 1980.1 | 120 | 18 | 1 | 86.7% | Orks |
9 | - | Jack Harpster | 1977.9 | 89 | 10 | 2 | 89.1% | Black Templars |
10 | - | Arne Zerndt | 1961.1 | 79 | 19 | 3 | 79.7% | Aeldari |
Have any questions or feedback? Drop us a note in the comments below or email us at contact@goonhammer.com.