TrueSkill: How Halo 5 Matches Players

We often don't think about online multiplayer games as having difficulty in the same way that single-player games do, and we don't typically hold them to the same standard when it comes to tuning that difficulty. However, I'd argue that we should. An online match where we don't have the skill to compete can feel just as frustrating as a single-player level that's too punishing. The difference is that in a solo experience we can turn the difficulty down, and in a local multiplayer match we may even get to choose our opponents, but in matchmade multiplayer games we're at the mercy of the matchmaking system. For this reason, accurate player matching is an essential part of online multiplayer games, and matchmaking algorithms should be able to tune difficulty at least as well as we could ourselves. All the thoughtful map and gameplay design in the world doesn't matter if we never get matches where it feels like we could succeed if we tried hard enough.

Despite their role as the linchpins of online competition, matchmaking systems don't get much attention in mainstream games discussion. This is partly because the topic is dry and often strays into academic mathematical territory, but it's also because we don't get to see the code behind these mechanisms. We can talk about whether matching feels fair or unfair, but we can't probe into why it works that way or how it might be improved, except in the case of TrueSkill-based games, and particularly Halo 5. Microsoft has been only too happy to publish details of how much of its software works, because if it can teach up-and-coming programmers to develop with Microsoft technologies, those software engineers will help prop up the Microsoft platforms of the future. Then there's 343 Industries, which understands that Halo has a demanding community that longs for direct interaction with the developers, and so talks openly about how it matches players.

Between the contributions of Microsoft and 343, a lot of openly available material describing TrueSkill and TrueSkill2 exists online, just not in a form from which most players are likely to piece together how these systems work overall. You've probably heard of TrueSkill before: it's a patented algorithm for ranking players of video games, developed by Microsoft but used by plenty of other companies, from Blizzard to Epic. TrueSkill2 is its successor, designed to improve on the original, but so far it has only been rolled out for Gears of War 4 and Halo 5. We're going to peer into both systems to learn how Microsoft matches players, without needing any advanced maths or history of matchmaking algorithms.

We can get a complete overview of how these systems work from just two sources. The first is the posts of Josh Menke, the Lead Engagement Designer at 343 Industries. It's Menke's job to tune Halo 5's matchmaking, and he has posted nearly weekly updates on the player-matching apparatus to the Waypoint forums. He has also replied to almost every user who interacts with him, either on the boards or on Twitter. It's an extraordinary commitment for a designer at a titan of the industry to communicate with players and respond to criticism. Menke's posts are useful here because he clarifies how Halo 5 takes data from Microsoft's ranking system and uses it to match players. To find out how Microsoft generates those numbers in the first place, we can turn to the second source: a research paper the company published in March 2018 titled "TrueSkill 2: An improved Bayesian skill rating system".

Before we discuss how TrueSkill operates, let's use these resources to explain how it doesn't. Player profiles in Halo 5 display a rank called Competitive Skill Ranking, or CSR. In every playlist, every season, you must play ten placement matches, and based on those matches the game sorts you into a league and ranks you within it, awarding you a CSR like "Silver 6" or "Diamond 1". Every time you win a match you edge closer to promotion to the next rank, and every time you lose one you slip closer to demotion. When 343 first unveiled CSR, they made it look as though this would be the metric used to match players, and some outlets incorrectly reported it as such. I made the same mistake.

It's easy to get the impression that the game builds teams from CSR: it's the only rank Halo 5 shows you, it's displayed only in ranked playlists, and you generally face players of roughly the same CSR. If you did think the matchmaking system relied on CSR, you were likely pretty peeved, because CSR was far from an infallible measure of player performance. After your placement matches, it only ever responded to your team's results and was never a one-to-one reflection of your performance as an individual. You can still argue that it's flawed as an outward-facing signifier of player skill, but as Menke confirms, CSR is not part of TrueSkill and has never been used to match players, even though your TrueSkill rating is used to calculate your initial CSR after your placement matches.

So how does a title like Halo 5 match you? The simple answer is this: TrueSkill assigns you a number representing your skill at the game, one that's hidden from everyone but the engineers. The community refers to this number as MatchMaking Rank, or MMR. Halo's matchmaker then assembles two or more opposing teams such that the total MMR of each team roughly matches the total MMR of every other, and it also tries to ensure that each player on a team has an MMR similar to their teammates'. At the end of a match, TrueSkill adjusts your MMR based on whether you won or lost. It also considers how long you were in the game when deciding how far to shift your MMR: more time in a match counts for more and less time counts for less, as the former provides a larger sample of your skill than the latter. However, as we touched on, the wins and losses that TrueSkill studies are not clean measurements of ability, and the TrueSkill2 paper admits as much.
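The process above can be sketched in Python. This is a simplified, hypothetical illustration of the publicly described TrueSkill approach, not Halo 5's code: each player's hidden rating is a mean (`mu`) plus an uncertainty (`sigma`), `balanced_split` brute-forces the split of a lobby whose team totals are closest, and `update` applies a win/loss update in which each player's change is scaled by the fraction of the match they played. The constants `BETA` and `TAU` and all function names are assumptions, and draws are ignored.

```python
import math
from dataclasses import dataclass
from itertools import combinations

BETA = 25 / 6   # hypothetical per-match performance noise
TAU = 25 / 300  # hypothetical drift added to uncertainty before each match


@dataclass
class Rating:
    mu: float     # estimated skill (the hidden "MMR")
    sigma: float  # how uncertain that estimate is


def _pdf(x):
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)


def _cdf(x):
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))


def balanced_split(lobby):
    """Split an even-sized lobby into two teams with the closest total mu."""
    everyone = range(len(lobby))
    best = None
    for team_a in combinations(everyone, len(lobby) // 2):
        team_b = [i for i in everyone if i not in team_a]
        gap = abs(sum(lobby[i].mu for i in team_a) - sum(lobby[i].mu for i in team_b))
        if best is None or gap < best[0]:
            best = (gap, list(team_a), team_b)
    return best[1], best[2]


def update(winners, losers, win_play=None, lose_play=None):
    """Return updated (winners, losers). *_play holds the fraction of the
    match (0..1) each player was present for; it defaults to 1.0."""
    win_play = win_play or [1.0] * len(winners)
    lose_play = lose_play or [1.0] * len(losers)
    # Each entry: (rating, play fraction, +1 for winners / -1 for losers).
    entries = [(r, p, 1.0) for r, p in zip(winners, win_play)]
    entries += [(r, p, -1.0) for r, p in zip(losers, lose_play)]
    # Widen each sigma slightly first, since skill may have drifted.
    entries = [(Rating(r.mu, math.hypot(r.sigma, TAU)), p, s) for r, p, s in entries]
    # Weighted team-performance difference (winners minus losers) and its variance.
    diff = sum(s * p * r.mu for r, p, s in entries)
    c2 = sum(p * p * (r.sigma ** 2 + BETA ** 2) for r, p, s in entries)
    c = math.sqrt(c2)
    t = diff / c
    v = _pdf(t) / _cdf(t)  # large when the result was a surprise
    w = v * (v + t)        # how much the result shrinks uncertainty
    updated = []
    for r, p, s in entries:
        var = r.sigma ** 2
        mu = r.mu + s * p * var / c * v
        var *= 1 - p * p * var / c2 * w
        updated.append(Rating(mu, math.sqrt(var)))
    return updated[:len(winners)], updated[len(winners):]
```

With these placeholder constants, in a 4v4 between identical newcomers (`Rating(25, 25/3)`), each winner's `mu` rises by about 2.1 and each loser's falls by the same amount, while a loser who left after 10% of the match (`lose_play=[0.1, 1.0, 1.0, 1.0]`) loses only a tenth as much as their full-time teammates.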

In the paper, the researchers discuss several relevant factors that TrueSkill does not take into account when assessing players. For example, the algorithm doesn't consider kills, it doesn't reflect that players in squads might perform better than those playing solo, and it doesn't account for a player's skill naturally lapsing if they haven't logged in for a while. It also assumes that a person's ability is as likely to decrease over time as it is to increase, which is a poor assumption given that practice makes players better. Furthermore, the algorithm doesn't deal with quitters appropriately. When you drop out of a match, TrueSkill updates your MMR according to whether your team won or lost, but if you quit, you probably didn't contribute much to that win or loss and shouldn't be credited with it. Many users quit unfavourable matches in the first few seconds and can't be held responsible for the outcome. TrueSkill clearly needed an upgrade, and Microsoft's research showed as much: by comparing TrueSkill's predictions of match results with real match results, Microsoft objectively identified a number of areas in which the algorithm's predictions fell short.

For example, it predicted that players in four-person squads would win 5% fewer matches than they actually did. TrueSkill also slightly overestimated the skill of players new to the game, and its failure to model KPM (Kills Per Minute) was a severe blind spot. Microsoft found that differences in KPM accounted for up to a 13% difference in players' win rates, whereas TrueSkill assumed only a 2% difference in win frequency between a player with a high KPM and one with a low KPM. Very low-end and high-end players were the most misjudged by the algorithm: it predicted that those who scored 0.0-0.4 KPM would win 5% more than they did, while those who scored 2.8-4.0 KPM would win 5-9% less than they did. Remember, when the algorithm underestimates someone's likelihood of winning, that player gets grouped with others below their skill level, and when it overestimates how likely someone is to win, they get grouped with players above their skill level. Put another way, TrueSkill was wont to match you with players who were overpowered or underpowered, on both the opposing team and your own. This placed undue pressure on other players to perform beyond their means, either to hold their own against far more capable opponents or to compensate for the lagging skill of the Spartans on their team.
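The comparison Microsoft describes is essentially a calibration check: group matches by some attribute (here KPM), then compare the average predicted win probability in each group with the share of matches actually won. The sketch below assumes a hypothetical list of `(kpm, predicted_win_prob, won)` records; the bucket edges simply mirror the ranges quoted above, and the function name is illustrative rather than taken from the paper.

```python
from collections import defaultdict


def calibration_by_bucket(records, edges):
    """records: iterable of (kpm, predicted_win_prob, won).
    edges: sorted bucket boundaries, e.g. [0.0, 0.4, 2.8, 4.0].
    Returns {(lo, hi): (mean_predicted, actual_win_rate, count)}."""
    totals = defaultdict(lambda: [0.0, 0, 0])  # [sum of predictions, wins, matches]
    for kpm, predicted, won in records:
        for lo, hi in zip(edges, edges[1:]):
            if lo <= kpm < hi:
                bucket = totals[(lo, hi)]
                bucket[0] += predicted
                bucket[1] += int(won)
                bucket[2] += 1
                break  # each record belongs to at most one bucket
    return {key: (p / n, w / n, n) for key, (p, w, n) in totals.items()}
```

A bucket whose mean prediction sits above its actual win rate is one where the model overrates players, which is the pattern described above for the 0.0-0.4 KPM group; the reverse gap marks underrated players.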

There are other discrepancies between TrueSkill's predictions and reality, but their effect on MMR is subtle enough that we won't fret over them here. Of course, when TrueSkill mismatches two teams, the outcome of the match is not a fair reflection of their skill, so the update to each player's MMR based on the win or loss is inaccurate too. After a false MMR adjustment, the players involved are more likely to be mismatched against others, who may then suffer the same erroneous update because their match wasn't fair either, and the problem begins to pollute the whole player pool. Microsoft doesn't mention this in the paper; it's my own observation. Overall, the researchers found that TrueSkill predicted the outcome of a match correctly only 52% of the time, barely better than a coin flip.
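To put a figure like 52% in context, here is one way such an accuracy number could be computed. Under a TrueSkill-style model, the chance that one team beats another depends on the gap between their summed `mu` values relative to the combined uncertainty, and a prediction counts as correct when the favoured team wins. This is a sketch with a hypothetical `BETA` and made-up function names, not the paper's evaluation code.

```python
import math

BETA = 25 / 6  # hypothetical per-player performance noise


def win_probability(team_a, team_b):
    """team_a, team_b: lists of (mu, sigma). Probability that team_a wins."""
    delta = sum(mu for mu, _ in team_a) - sum(mu for mu, _ in team_b)
    n = len(team_a) + len(team_b)
    spread = math.sqrt(n * BETA ** 2 + sum(s * s for _, s in team_a + team_b))
    # Standard normal CDF of the skill gap measured in units of the spread.
    return 0.5 * (1 + math.erf(delta / (spread * math.sqrt(2))))


def accuracy(matches):
    """matches: list of (team_a, team_b, a_won). Share of matches where the
    favourite won; an exact 50/50 call counts as a prediction for team_b."""
    correct = sum((win_probability(a, b) > 0.5) == a_won for a, b, a_won in matches)
    return correct / len(matches)
```

Because a coin flip would score 50% on the same matches, the useful signal in a 52% result is only the two points above chance.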

When crunching the numbers, TrueSkill2 takes into account many of the factors TrueSkill missed, including quits, kills over time, and whether players are part of a squad. It also assumes that a player is likely to get better the more matches they play and worse the more days they spend away from the game. In the research, Microsoft unequivocally demonstrates that TrueSkill2 outperforms TrueSkill in every one of these areas. For what it's worth, though, they did find the new algorithm a little overzealous in de-ranking quitters. On the same dataset used to test the original TrueSkill, TrueSkill2 correctly predicted 68% of match outcomes. That 16-percentage-point increase may not sound like much, but Josh Menke describes it as one of the biggest leaps forward he's seen in the field. When the system was implemented in Halo 5's social and Warzone playlists in mid-April 2018, Menke reported that match predictability was close to ideal and that the bottom 10% of players, who had previously seen only a 37% win rate, were now winning 50% of the time.
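The contrast between the two assumptions can be sketched as a "between matches" step. Original TrueSkill leaves the mean alone and only widens the uncertainty by a fixed amount, so improvement and decline are treated as equally likely. A TrueSkill2-style step can also shift the mean: up for players early in their career, down after time away. The paper learns effects like these from data; every constant, function name, and the exact functional form below are hypothetical placeholders chosen only to show the shape of the idea.

```python
import math

TAU = 25 / 300          # hypothetical per-match drift (used by both sketches)
EXPERIENCE_GAIN = 0.5   # hypothetical mean boost for a brand-new player
DAILY_DECAY = 0.02      # hypothetical mean loss per day away
DAILY_VAR = 0.05        # hypothetical variance added per day away


def trueskill_predict(mu, sigma):
    """Original TrueSkill: a symmetric random walk, so the mean is unchanged."""
    return mu, math.sqrt(sigma ** 2 + TAU ** 2)


def trueskill2_style_predict(mu, sigma, games_played, days_away):
    """Hypothetical TrueSkill2-style step with experience and time-away effects."""
    # Expected improvement that shrinks as a player accumulates matches.
    mu += EXPERIENCE_GAIN / (1 + games_played)
    # Expected rustiness, plus extra uncertainty, after time away.
    mu -= DAILY_DECAY * days_away
    var = sigma ** 2 + TAU ** 2 + DAILY_VAR * days_away
    return mu, math.sqrt(var)
```

With these placeholder values, a newcomer's first match nudges the mean up by 0.5, while a veteran returning after 60 days starts about 1.2 points lower and with a wider spread.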

A 50% win rate is the ideal. That may sound like the designers artificially limiting your ability to win, but your win rate always settles at some percentage, and the game isn't forcing you to win or lose any individual match. It's simply that when teams are consistently evenly matched, everyone's win rate should regress toward a mean of 50% over time. While TrueSkill bound you to the success of your team, often ignoring the aptitude you personally displayed, TrueSkill2 accounts for your performance both as an individual and as part of a team. Under the watchful eye of the new algorithm, it's possible to lose a match and still see your MMR increase if you showed prowess in combat, and an underpowered team is now less of a liability to your MatchMaking Rank.
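One way to picture how an individual statistic can move your rating independently of the result is to treat KPM as a noisy measurement of skill and fold it in with a standard Gaussian (Kalman-style) update. This is not the TrueSkill2 paper's kill model; the linear link `kpm ≈ SLOPE * skill + OFFSET`, its constants, and the function name are all hypothetical.

```python
import math

SLOPE = 0.05     # hypothetical: extra KPM per point of skill
OFFSET = -0.25   # hypothetical: makes a skill of 25 predict 1.0 KPM
KPM_NOISE = 0.5  # hypothetical standard deviation of KPM around that prediction


def observe_kpm(mu, sigma, kpm):
    """Update a skill estimate N(mu, sigma^2) after observing one match's KPM."""
    var = sigma ** 2
    expected = SLOPE * mu + OFFSET
    # Gain: how far to trust the KPM reading relative to the current estimate.
    gain = var * SLOPE / (SLOPE ** 2 * var + KPM_NOISE ** 2)
    new_mu = mu + gain * (kpm - expected)
    new_var = var - gain * SLOPE * var
    return new_mu, math.sqrt(new_var)
```

For instance, `observe_kpm(25, 4, 2.0)` raises `mu` to roughly 27.8, which could outweigh a modest drop from a team loss, while a match at the expected 1.0 KPM leaves `mu` essentially unchanged and only trims `sigma`.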

After a successful test run, 343 implemented TrueSkill2 in Halo 5's ranked playlists at the start of May 2018, kicking off the Summer 2018 season. A few weeks later, they applied an update that tied CSR more closely to the MMR generated by TrueSkill2, making it a fairer measure of competence. I wish I could say "All's well that ends well", but there are a few loose threads. It was unpleasant enduring almost three years of lopsided matchmaking in Halo 5, and while Menke has been nothing but accommodating on social media, I wish 343 had made it easier to learn about the matching system without having to trawl through forums and technical documentation. Nonetheless, this is a much-needed shot in the arm for Halo 5, Gears, and hopefully, in the future, plenty of other online multiplayer games. And how often do you see all the maths and reasoning behind a cutting-edge games technology given to you for free? We're in a very fortunate position to have TrueSkill on our side. Thanks for reading.

