Everyone’s busy, tl;dr
- Trust is hard to gain and easy to lose, and the same applies to software quality.
- Software quality is based on a user's perception, and each user's experience is unique.
- It's not single events (unless severe enough) that move the needle on software quality, but repeated issues (even if of different types).
- Build metrics and usage tracking around the user journey, and back them up with system/component metrics.
- Use the "golden path" to tackle this in a strategic, systematic way.
- UI/UX plays the biggest role in software quality.
Trust is a tricky thing
“Trust comes on foot but leaves on horseback” is a Dutch proverb often attributed to the 19th-century Dutch statesman Johan Rudolph Thorbecke. It means that credibility is slow to build but can be quickly destroyed.
When it comes to software quality this is very relatable. You could have years of stable, reliable software only for it to become unreliable in what feels like overnight. Trust destroyed instantly isn't the typical case with software, though, because in the modern era people have been conditioned to software problems in their daily lives. Some kinds of bugs are tolerated in small quantities. It is only prolonged or repeated exposure to them, or the severity of an issue, that changes someone's general perception.
Software quality from a user's perspective can be summed up in a simple statement: "Does this software work the way I expect it to?" This ends up being somewhat tricky. Software could be working as designed with no errors, and a user could still think the quality is poor due to UX or UI: a button not where you'd expect it to be, visual indicators popping up in unexpected ways. Or, in the more common case, bugs or incidents surface and the software doesn't work as intended.
Since software quality is based on the expectations and perception of an individual's experience, quality perspectives can differ wildly even when the same end result is delivered.
What have we seen?
Netflix had a clever approach to quality that often went unnoticed. You may remember Netflix only having partial outages, where only a subset of users would be down at any one time. For the affected Netflix user this was certainly a sign of poor software quality. But if they asked a friend, "is Netflix working for you?", and it was, that changed their perception: it wasn't a problem with Netflix, it was a problem with their setup. Facebook does the same thing. Rarely are there complete outages where all users are offline. Issues get fixed quickly and people forget about them over time (unless they repeat).
Twitter users will be familiar with the "Fail Whale" that showed a large whale being held up by birds during an outage. GitHub had the failing unicorn for a long time. These failure scenarios were lighthearted and brought some users joy even though things were not working as planned.
GitHub, Twitter, and a number of the large tech sites commonly had issues but weren't regarded as sites with poor software quality. A lot of issues would be transient, or affect a small portion of users or an edge case. In many cases it would not be a daily issue.
It is not a single instance of a bug that makes a user think software is poor, unless that single issue is catastrophic. GitHub has come under a lot of scrutiny lately for its major outages, and Claude Code, a popular AI tool, has also had significant repeated issues putting its platform under the availability microscope. It's the repetition of issues that causes the perception change. (If the bank lost all your money due to a software glitch, though, you might form a worse opinion of their software quality from a single incident.)
Railway had another recent major incident that eroded trust, but it doesn't seem like that one incident alone was enough; it piled onto repeated issues.
But what can we measure?
In a dream world you would have a simple metric: the number of defects a user has seen per day, or per hour in an extreme case. If you add criticality to that, you could set an average number of defects per day you'd expect a user to still be "satisfied" with. Everyone knows all software has bugs, so expecting it to always be 0.0 just isn't realistic. This metric is a challenge, though; oftentimes bugs or outages are unexpected. Being able to label or categorize all of the bugs you don't know exist would be near impossible. Incidents are also often surprises.
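As a rough sketch of what such a metric could look like, here is a hypothetical weighted "defects seen per user per day" calculation. The criticality weights, event shape, and budget threshold are all made-up assumptions for illustration, not a standard:

```python
from collections import defaultdict

# Hypothetical criticality weights -- illustrative, not a standard.
CRITICALITY_WEIGHT = {"low": 1, "medium": 3, "high": 10}

def weighted_defects_per_user(events, active_users):
    """events: list of (user_id, criticality) defect sightings for one day."""
    per_user = defaultdict(float)
    for user_id, criticality in events:
        per_user[user_id] += CRITICALITY_WEIGHT[criticality]
    # Average weighted defects across ALL active users, not just affected ones.
    return sum(per_user.values()) / max(active_users, 1)

events = [("u1", "low"), ("u1", "medium"), ("u2", "high")]
score = weighted_defects_per_user(events, active_users=100)
# Compare against a budget, e.g. "users should see < 0.2 weighted defects/day".
acceptable = score < 0.2
```

The hard part, as noted above, is that you can only count the defects you know how to detect and label.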
These types of metrics do already exist. If you think of CSAT or NPS scores, these are commonly used approaches to a similar problem space. In some ways you could think of these as software quality metrics, but they aren't exactly that.
Using a layered approach of symptom-based and cause-based alarming will help tie user expectations into systems, components, and services. This post is pretty good on why to focus on symptoms, not causes: https://cloud.google.com/blog/topics/developers-practitioners/why-focus-symptoms-not-causes
Think of it this way: do you alert on "logging in on mobile is slow", or on 10 different metrics that could mean logging in on mobile is slow (like database query performance)? Ideally you want both, for different reasons and different teams. But when it comes to software quality and user perception, you likely want symptom-based metrics and alerts.
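To make the two layers concrete, here is a minimal sketch of symptom vs cause alerts over a bag of metrics. The metric names and thresholds are invented for illustration:

```python
# Symptom alerts describe what the user feels; cause alerts describe the
# components that could explain it. All names/thresholds are hypothetical.
symptom_alerts = {
    "mobile_login_slow": lambda m: m["mobile_login_p95_ms"] > 3000,
}
cause_alerts = {
    "db_query_slow": lambda m: m["auth_db_query_p95_ms"] > 500,
    "auth_service_errors": lambda m: m["auth_5xx_rate"] > 0.01,
}

def evaluate(alerts, metrics):
    """Return the names of alerts whose condition is currently true."""
    return [name for name, check in alerts.items() if check(metrics)]

metrics = {
    "mobile_login_p95_ms": 4200,
    "auth_db_query_p95_ms": 650,
    "auth_5xx_rate": 0.002,
}
firing_symptoms = evaluate(symptom_alerts, metrics)  # user-facing pain
firing_causes = evaluate(cause_alerts, metrics)      # likely culprit(s)
```

The symptom alert pages the team that owns the user experience; the cause alerts point whoever responds at the failing component.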
Symptom-based monitoring
To get going on symptom-based monitoring you need to understand the user experience. Map out the features of your product, score them based on importance, and note each feature's category and whether it is on the golden path for the user or not. "Golden path" can mean many things, but in this context: does the user need this feature to complete the core user flow? In a video game, this might mean: can the user log in, play a match, and exit the game? Do they need to find their Facebook friends or see an activity feed? Probably not, so those would not be golden path signals.
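A feature inventory like this can be sketched as a simple data structure, borrowing the video game example above. The feature names, categories, and scores are hypothetical:

```python
from dataclasses import dataclass

@dataclass
class Feature:
    name: str
    category: str
    importance: int      # e.g. 1 (nice to have) .. 5 (critical)
    golden_path: bool    # required to complete the core user flow?

# Hypothetical inventory for a video game.
features = [
    Feature("login", "account", 5, True),
    Feature("play_match", "gameplay", 5, True),
    Feature("exit_game", "gameplay", 4, True),
    Feature("find_facebook_friends", "social", 2, False),
    Feature("activity_feed", "social", 2, False),
]

# Instrument golden-path features first, in order of importance.
monitoring_order = sorted(
    (f for f in features if f.golden_path),
    key=lambda f: f.importance,
    reverse=True,
)
```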
Score your features, and for each feature monitor whether it is working (up or down) and the performance of that feature (usage, latency, etc.). Monitor this from the user's perspective, which likely means from the client itself. Complete this for all features, golden path first. These become your symptom-based metrics and alerts. Decide what acceptable performance for each feature would be, and set an alarm on it.
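The up/down and performance checks could look something like this minimal sketch over client-reported outcomes; the SLO thresholds and data shapes are illustrative assumptions:

```python
def feature_health(outcomes, latencies_ms, success_slo=0.99, p95_slo_ms=2000):
    """outcomes: list of bools (did the feature work for a user attempt).
    latencies_ms: client-observed latency per attempt. Thresholds are made up."""
    success_rate = sum(outcomes) / len(outcomes)
    # Nearest-rank style p95 over the observed latencies.
    p95 = sorted(latencies_ms)[int(0.95 * (len(latencies_ms) - 1))]
    return {
        "up": success_rate >= success_slo,          # is the feature "working"?
        "success_rate": success_rate,
        "p95_ms": p95,
        "fast_enough": p95 <= p95_slo_ms,           # performance alarm input
    }

outcomes = [True] * 98 + [False] * 2       # 98% success -> below a 99% SLO
latencies = list(range(100, 2100, 20))     # 100..2080 ms, 100 samples
health = feature_health(outcomes, latencies)
```

When `up` or `fast_enough` goes false for a golden-path feature, that is your symptom-based alarm firing.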
But what happens when a user login alert starts firing? This is where cause-based alerts should kick in. If you have all the components/systems/services covered, it should be clear which component is failing and needs to be corrected. Where it gets interesting is when a symptom-based alert fires and no corresponding component-based alert fires. This usually tells you there is a gap somewhere.
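Detecting that gap can be as simple as cross-referencing firing symptom alerts against the cause alerts mapped to them. The mapping and alert names here are hypothetical:

```python
def find_alert_gaps(firing_symptoms, firing_causes, symptom_to_causes):
    """Flag symptom alerts with no corresponding cause alert firing --
    usually a sign of a monitoring gap at some component."""
    gaps = []
    for symptom in firing_symptoms:
        expected = symptom_to_causes.get(symptom, set())
        if not expected & set(firing_causes):
            gaps.append(symptom)
    return gaps

# Hypothetical mapping from each symptom to the components that could cause it.
symptom_to_causes = {
    "mobile_login_slow": {"db_query_slow", "auth_service_errors"},
}
# The user-facing alert fired, but no component-level alert did: a gap.
gaps = find_alert_gaps(["mobile_login_slow"], [], symptom_to_causes)
```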
UI/UX and software quality
The above already covers how much software quality is tied to user perception. The UI/UX of software is probably the most important part of software quality. Whether it's a web browser, API, CLI, or game client, this is what the user interfaces with when they have problems. If you have used any software or website, you have probably seen a loading screen. Whether it's Netflix, your bank's login site, or a video game, loading is a common operation. Imagine clicking a button and seeing nothing happen for several seconds or longer, versus a pinwheel spinning or a loading bar. The application with no loading indicator will be perceived as lower quality, as long as the loading takes a meaningful amount of time.
UI/UX can paper over complicated issues, or appease the user when issues do occur (like the Twitter Fail Whale or GitHub unicorn). Or UI/UX can create software quality problems when no real issues are present.

