In game development, triage is one of those activities that happens continuously, is barely discussed with people coming into games, and is defined somewhat vaguely from studio to studio and from person to person. If you've made games, you've done triage, but you might never have had it defined or discussed. In general, it means ranking tasks by a (hopefully pre-defined) system of priority and feasibility.
Triaging is a soft science - part art and creativity, part spreadsheet-level management, part production (and production debt), and part being able to read a room. There's no right way of doing it, but there are plenty of wrong ways to do it.
When I say there's no right way to do it, consider this (borderline utopian) example: you're weeks from launching your next game, and things are looking pretty good. Wishlists are high, and the game is entirely stable. A final pass from QA suddenly reveals that the utopia has turned to dystopia: recent changes have introduced two bugs.
- One bug occurs in 99% of playthroughs, near the start, and creates a clear annoyance during the tutorial where a notification gets visually stuck. It is clear that some people might be annoyed enough that they declare the game "buggy" and quit - but the bug does not cause a crash or a situation the player cannot continue from.
- One bug occurs in 0.00001% of playthroughs, near the end of the game, and not only crashes the game, but has a chance of randomly corrupting files on the user's hard drive. The chance that those files are important or critical is extremely low, due to the massive number of files on a computer, but obviously the chance exists.
You only have time to fix one bug before launch, and the other one won't be fixed until a month after launch. So what is right?
One argument might say that the odds of something bad happening are so minuscule (1 in 10,000,000, only near completion, and even then critical files are unlikely to be damaged) that the very common annoyance at the start is a higher priority - especially since, with the buggy tutorial, the chances of someone making it to completion are extra low already.
The other argument is that anything known that damages user files could be considered a breach of trust, and opens the studio up to extreme criticism and distrust (and perhaps, legal liability). As such, even though the chances are slim, you could argue that fixing this is far more important.
So which is correct? I can't tell you: this is really up to you, your producer, or your studio to figure out. My personal preference has always been to protect the user, and as such, I always prioritize issues that can harm the user's device or data, then the user experience, and then anything that doesn't harm either. I also know studios that have rolled the dice on pretty rough bugs, and come out fine.
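To put rough numbers on the two arguments, here is a minimal Python sketch. The two frequencies come from the example above; the number of playthroughs is a made-up assumption, not a figure from any real game, and `expected_affected` is a hypothetical helper.

```python
# Assumed for illustration only: one million playthroughs in the launch month.
PLAYTHROUGHS = 1_000_000

def expected_affected(playthroughs: int, frequency: float) -> float:
    """Expected number of playthroughs that run into a bug."""
    if not 0.0 <= frequency <= 1.0:
        raise ValueError("frequency must be between 0 and 1")
    return playthroughs * frequency

# Stuck tutorial notification: 99% of playthroughs.
stuck_notification = expected_affected(PLAYTHROUGHS, 0.99)

# Crash with possible file corruption: 0.00001%, i.e. 1 in 10,000,000.
file_corruption = expected_affected(PLAYTHROUGHS, 1 / 10_000_000)

print(f"stuck notification: {stuck_notification:,.1f} playthroughs")
print(f"file corruption:    {file_corruption:,.1f} playthroughs")
```

Under that assumption, roughly 990,000 playthroughs see the stuck notification, while about 0.1 playthroughs are expected to hit the corrupting crash. The arithmetic only describes frequency; it says nothing about how much worse one corrupted hard drive is than one annoyed player, which is exactly the part triage has to decide.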
This is obviously an extreme example, but you can adjust it to be less extreme. What if the one bug was a 'fix by reload' that happened in 5% of players during the tutorial, and the low-probability bug near the end was a crash-to-desktop that happened in 5% of players? Or what if one feature was an additional spectacular sequence near the start of the game (where most players will see it), and the other was an additional spectacular sequence added near the end (where few will see it, but they'll be engaged)?
The reason the industry uses 'triage' is presumably because the word is taken from medical terminology, where it is defined as "a practice invoked when acute care cannot be provided for lack of resources. The process rations care towards those who are most in need of immediate care, and who benefit most from it". The games industry has a tendency to be over-dramatic in its jargon, but it can't be denied that 'lack of resources' is a chronic problem: triaging happens throughout all of development. It is also true that the solutions to complex triage are often... creative.
One of my favorite examples of triage was a designer challenged with fixing a physics bug that caused an entire level to drop frames and overload the audio engine into a screeching noise: every object in the level spawned slightly off the ground, and the engine somehow insisted on handling all of it at once. It didn't help that the sequence started with a spectacular helicopter crash - going from such a high to weird technical issues was a definite disappointment.
Ultimately, the designer fixed it by adding a sequence that cut to black, muted the audio engine, waited for the physics bug to play out behind the dark screen and with no sound, and then faded in the screen again with a loud ringing-in-the-ear sound. The fix ended up playing into the scene, making it better with literal duct tape.
This sort of shenanigan, too, is triage. Throughout development there will be times where you can properly fix things - and there will be times where you cannot. As long as the issue is resolved in a way that minimizes the amount or severity of issues you're introducing with the fix - that's a fix. It's (often, but not always) better to have a few rare and barely noticeable bugs than one show-stopping one. And it is triage, too, when you realize one of the rare and barely noticeable bugs has more severe effects down the line and you need to fix that, now.
Triage is also accepting that some things will not be fixed, because the risk of applying a fix outweighs the benefits of fixing it. This leads to the common misunderstanding that "Quality Assurance didn't find the bug" - often, QA did find the bug, reported it, and triage prioritized other issues or didn't want to risk stability issues in fixing it.
Just to emphasize a point I made earlier, but somewhat subtly: triage isn't only relevant for bugs - it is also relevant for features or content. As with anything added to or removed from a game, weighing the benefits and risks is critical to ensuring a production that is as smooth as possible.
My personal triage system is based on a throw-away remark someone made at Vlambeer back in the early 2010s, where I find my first reference to the phrase. I can't remember whether it was my co-founder, Jan Willem, or a freelancer working with us, or whether it was me that coined it - but I have used this phrase ever since: how much Metacritic will this get us for the effort?
Metacritic aggregates review scores and user scores from across the internet into a single score, and has been both a boon and a terrible burden for the industry. It helps people quickly estimate how good a game is, but shady practices have long attached bonuses and promotions to the score a game ends up with on Metacritic.
It's obviously (partially) in jest, but it has helped me communicate to non-producers what I mean by triage, over and over. You want to add this level? Well, it'll take us two months to make at least, we have four months left - how much Metacritic do you think we'll get out of it? If the answer is just 0.1 extra points on Metacritic, that might not be worth spending half the remaining development time on. You might argue a bit, back and forth, about how much Metacritic exactly is gained by a feature, or lost by a bug - but generally you should be able to come to some sort of consensus.
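As back-of-the-envelope arithmetic, the question can be sketched in a few lines of Python. The function name and the second proposal are hypothetical, and the point estimates are the kind of rough guess a team argues its way to, not measurements.

```python
def metacritic_per_week(estimated_points: float, effort_weeks: float) -> float:
    """Estimated Metacritic points gained per week of effort."""
    if effort_weeks <= 0:
        raise ValueError("effort_weeks must be positive")
    return estimated_points / effort_weeks

WEEKS_LEFT = 16  # roughly the four months left in the example

# The extra level: about two months (8 weeks) for an estimated 0.1 points.
extra_level = metacritic_per_week(0.1, 8)

# Hypothetical alternative: one week of polish for an estimated 0.5 points.
tutorial_polish = metacritic_per_week(0.5, 1)

# Share of the remaining schedule the extra level would consume.
share_of_schedule = 8 / WEEKS_LEFT

print(f"extra level:     {extra_level:.4f} points per week")
print(f"tutorial polish: {tutorial_polish:.4f} points per week")
print(f"extra level uses {share_of_schedule:.0%} of the remaining time")
```

The exact numbers matter far less than having a shared unit to argue in: once two proposals are expressed as points per week, the conversation shifts from "can we have this?" to "is this the best use of the time we have left?"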
As a producer, it is critically important that motivation stays up near the end of development, and involving your team & clearly communicating that you're trying to ship the best game possible instead of just being a nay-sayer can really help. What system you use is entirely up to you: I use the silly Metacritic one - you can use whatever helps you communicate.
Just keep in mind: triage is a function of priority, necessity, and feasibility. As long as your system keeps all of those in mind, communicates clearly, and is pre-defined, you'll be good. You can adjust on the fly if you have time, but often during stressful periods of development there will not be much time to decide: the last thing you want is to have to argue about how your team decides what gets fixed as time runs out on making either of the fixes.
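One way to make "a function of priority, necessity, and feasibility" concrete is to write it down as an actual function. The sketch below is only one possible shape, under assumptions of my own choosing: the 0-to-1 scales, the multiplication, and the task names and values are all illustrative, not an industry standard.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Task:
    name: str
    priority: float     # 0..1: how much it matters to the game
    necessity: float    # 0..1: how badly things go if it is skipped
    feasibility: float  # 0..1: how likely it can be done safely in time

def triage_score(task: Task) -> float:
    """Multiply the three factors, so a zero in any one of them sinks the task."""
    for value in (task.priority, task.necessity, task.feasibility):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{task.name}: factors must be between 0 and 1")
    return task.priority * task.necessity * task.feasibility

def triage(tasks: list[Task]) -> list[Task]:
    """Return the tasks ranked from most to least worth doing."""
    return sorted(tasks, key=triage_score, reverse=True)

# Hypothetical backlog with made-up estimates.
backlog = [
    Task("extra level", priority=0.6, necessity=0.2, feasibility=0.3),
    Task("crash on save", priority=0.9, necessity=1.0, feasibility=0.8),
    Task("stuck notification", priority=0.7, necessity=0.6, feasibility=0.9),
]
ranked = [task.name for task in triage(backlog)]
print(ranked)
```

Multiplying rather than adding is a deliberate choice in this sketch: a task that is not feasible scores zero no matter how important it is, which mirrors accepting that some things will not be fixed.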
- Do yourself a favour, and watch Mike Acton's incredible 2017 GDC talk on communication, ROI, and triage.
- Whether you're working alone or with a team, try to create a workable system of triage. Decide on your default prioritisation: is severity more important than frequency? Do you count issues near the start more than issues near the end? Discuss potential scenarios and try to create a workable system out of them. Document it, and save it somewhere where every stakeholder involved in triage can find it if needed.
- Carefully consider your communication surrounding triage: are you ensuring that your team understands why certain decisions are being made? Are you involving the right people in the decision-making? You can't always make fun choices, and sometimes triage is depressing because you know certain issues are going to cause problems and you can't fix those, but it is important to keep reflecting on your communication. If you work on a team, after a triage process, see if you can get feedback from key players. If you work alone, see if you can keep track of your motivation, mental health, and happiness throughout the process.
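A documented default prioritisation can be as small as a sort order. This is a minimal sketch assuming one possible set of answers to the questions above (severity before frequency, earlier issues before later ones); the severity levels follow my personal preference of device and data first, then experience, then everything else, and all names and sample issues are hypothetical.

```python
from enum import IntEnum
from typing import NamedTuple

class Severity(IntEnum):
    HARMLESS = 1
    HARMS_EXPERIENCE = 2
    HARMS_DEVICE_OR_DATA = 3

class Issue(NamedTuple):
    name: str
    severity: Severity
    frequency: float      # share of playthroughs affected, 0..1
    game_progress: float  # where it occurs: 0 = start, 1 = end

def default_order(issue: Issue) -> tuple:
    """Sort key: highest severity first, then most frequent, then earliest."""
    return (-issue.severity, -issue.frequency, issue.game_progress)

# Made-up issues for illustration.
issues = [
    Issue("stuck notification", Severity.HARMS_EXPERIENCE, 0.99, 0.05),
    Issue("file corruption crash", Severity.HARMS_DEVICE_OR_DATA, 0.0000001, 0.95),
    Issue("typo in credits", Severity.HARMLESS, 1.0, 1.0),
]
queue = [issue.name for issue in sorted(issues, key=default_order)]
print(queue)
```

A different team could reasonably swap the first two elements of the key and put frequency ahead of severity. What matters is that the order is written down before the stressful part of development starts.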
Consider subscribing to Levelling The Playing Field! Every article will always be free-to-read, but subscribing helps me gauge interest in the effort & ensure that I'm using my limited time to help the most developers possible. After subscribing, you'll get every new post of game development advice delivered to your inbox as soon as it goes live. If you can afford it & want to support LTPF, please consider supporting the newsletter with a fully-optional Paid membership to help make useful industry knowledge available for free.