Useless Machine Learning
TL;DR — Many ML applications are useful in theory but become useless in practice. Let’s look at a real case study, highlight two types of biases, and see how understanding Moments of Trust can help, with a concrete example. Finally, I leave you with three small actions to consider.
My inner voice: “Is she having an affair? She must be. F*ck me!” You can’t blame me for having such thoughts when my beautiful wife suddenly spent so many nights away from home — she never had to work this much before.
So, I asked: “What’s going on? How come you seem so busy lately? Is everything okay?”
Jess: “I don’t know why you guys keep making these useless AI things, again! They keep making us do more work.” Well, I totally didn’t see that coming. And ouch. As a loving husband, the right thing to do was to give her a hug and take on additional housework (yes, I did that).
More importantly, I needed to listen — well, for the sake of satisfying my selfish curiosity, the pursuit to better my own work, and the advancement of our ML community.
So, I probed: “Tell me more. What happened? Why do you say that?”
Two Types of Biases & A Different Question
Jess continued: “We got this new Next Best Action app [an ML application]. Every week, it gives us a list of customers to call and products to offer. It’s so damn useless. I look at the list; I work with most of the customers; I know they won’t buy the product … it’s a waste of time, but we still have to do it to make our KPI look good.”
Jess just described two unique problems that every ML application faces. Let’s unpack them:
1. Machine Bias. The machine is wrong; we, humans, are right. Machine Learning models aren’t and can’t be perfect — even if we hope and perceive them to be. Humans see lots of data as we go about our day-to-day work and internalize it as intuition and knowledge; but such data may not be collected or represented properly in databases. As a result, the machine can’t fully analyze it or make better predictions from it.
2. Human Bias. The machine is right; we are wrong. Humans are subject to Overconfidence Bias. In other words, we usually overestimate the accuracy of our predictions, especially in our domain of expertise. Perhaps the machine actually has more, and more up-to-date, information than we do, so it produces better predictions. But we trust ourselves more than the machine.
Many of us in the ML community may immediately ask: how can we fix the machine (e.g. the models)? Companies are hiring better data science talent, putting more rigorous MLOps practices in place, and upgrading to better tools. But doing these only solves the machine bias problem. A chain is only as strong as its weakest link: the users of ML applications.
We can’t eliminate innate human biases, but we can design for trust. With trust, users are more open to collaborating with machines. So, how might we foster trust between users and ML applications? That is the question we should ask instead.
Let’s walk through a specific example and discuss how we can design for trust with Moment of Trust analysis and simple UX techniques.
Moments of Trust
Caveat: the example below is inspired by Jess’s work as a frontline employee who serves clients in person. Although it focuses on one use case, the principles apply to many processes where humans need to take information from ML apps and decide how to act.
To get a clearer and more specific picture of the problems, first, we should take a look at the Moments of Trust — the split seconds when humans need to make a judgement call based on their trust in the machines. We can highlight these moments by using a User Journey analysis.
Imagine you go to a bank branch and meet with an advisor. Here is a typical journey through the lens of the advisor — the user of an ML application:
Understanding the Moments of Trust helps us find the specific pressure points where users might struggle with predictions from ML systems. Let’s unpack the problems each Moment of Trust reveals.
Deeper Problems & Solutions
Moment of Trust #1: Let’s look at Step 4. This is the moment when users see the predictions, understand what the predictions mean, and decide whether and how to act.
Problems: Users don’t act on the predictions. Typically, users see a simple description of what to offer or do (see illustration on the left). When users need to decide whether to follow, they either take it on blind faith or lean on their in-the-moment judgement. We tend to choose the latter, given our Overconfidence Bias.
In other words, many predictions — the ones Data Scientists pour their hearts and souls into creating — never really “go into the market”; they just flash on users’ screens, then disappear.
This leads to two additional problems. First, the user-machine interaction is captured as “failed” in the database for the wrong reason. The machine thinks the interaction failed because its prediction was wrong; in reality, it failed because users didn’t act, or didn’t act properly. Second, when the machine learns, it learns the wrong reality. This is known as the feedback loop problem (we will discuss it below). As a result, it creates a vicious loop of producing nonsensical predictions and ultimately ruins users’ trust.
These are not prediction problems. They are UX problems.
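To make the mislabeling concrete, here is a minimal sketch of how a training-data pipeline could separate “the user never acted” from “the customer genuinely declined.” The field names (`action_taken`, `customer_accepted`) are hypothetical, not from the actual app; the point is that ignored suggestions should be excluded rather than counted as model failures.

```python
def training_label(interaction: dict):
    """Return a label for retraining, or None if the interaction
    should be excluded (the user never acted on the prediction)."""
    if not interaction["action_taken"]:
        return None  # not a model error; keep it out of the training set
    return 1 if interaction["customer_accepted"] else 0

events = [
    {"action_taken": False, "customer_accepted": False},  # ignored suggestion
    {"action_taken": True,  "customer_accepted": True},   # offered and accepted
    {"action_taken": True,  "customer_accepted": False},  # offered and declined
]
labels = [training_label(e) for e in events]
# Only genuine outcomes (1 and 0) feed back into the model;
# the ignored suggestion is dropped instead of mislabeled as "failed".
```

Without the `None` branch, the first event would be logged as a failed prediction, and the model would “learn” from an interaction that never happened.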
Goals: The main goals for this step are twofold: 1) increase the percentage of actions taken and 2) improve the quality of those actions.
Solutions: With this in mind, I propose replacing out-of-context suggestions with a contextual prompt. The contextual prompt should highlight the “why”, “what”, and “how” of executing an ML prediction in human-readable language (illustration on the left). Depending on the type of algorithm, there are various ways to translate machine decisions into human-readable language.
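As one illustration, here is a hedged sketch of what such a translation might look like. `Prediction`, `top_drivers`, and `contextual_prompt` are hypothetical names, not part of the actual app; in practice the “why” could come from a model explainer (e.g. feature attributions) mapped to pre-written plain-language reasons.

```python
from dataclasses import dataclass

@dataclass
class Prediction:
    customer_name: str
    product: str
    score: float
    # (feature_name, plain-language reason) pairs, e.g. from an explainer
    top_drivers: list

def contextual_prompt(p: Prediction) -> str:
    """Render the why/what/how of a prediction in plain language."""
    why = "; ".join(reason for _, reason in p.top_drivers)
    return (
        f"WHY:  {p.customer_name} scored {p.score:.0%} for {p.product} "
        f"because of {why}.\n"
        f"WHAT: Offer {p.product} during the next conversation.\n"
        f"HOW:  Open with the reasons above, confirm fit, then offer."
    )

p = Prediction(
    customer_name="A. Client",
    product="Travel Rewards Card",
    score=0.72,
    top_drivers=[
        ("fx_txn_count", "frequent foreign-currency transactions"),
        ("tenure_years", "a long relationship with the bank"),
    ],
)
print(contextual_prompt(p))
```

The design choice here is that the advisor never sees raw feature names or scores alone; every prediction arrives with a reason they can repeat to the client in their own words.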
Problems: Users can’t provide good feedback. Typically, this interface lets users track their progress and coordinate with other team members; however, it’s often not designed to capture feedback for the ML apps (and users are not incentivized to provide it). As mentioned previously, a broken or poor feedback loop prevents ML models from learning from their mistakes. Just like humans, machines learn better and faster with good feedback.
Goals: At this step, the main goal is to encourage users to share better feedback, for themselves and the machines.
Solutions: The simplest solution is to use UI elements that are familiar to most people; these elements help standardize inputs and shorten the time needed to enter information.
In addition to offering a UX for capturing feedback, the ML app can also share how it uses the feedback to improve future predictions. This can create a sense of participation and reward for the users.
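Here is a sketch of what standardized feedback capture might look like under the hood. The schema (`Outcome`, `Feedback`, `prediction_id`) is illustrative, not from the actual app: an enumerated outcome field maps directly to dropdown choices in the UI, so the model receives consistent labels instead of free text.

```python
from dataclasses import dataclass, asdict
from enum import Enum

class Outcome(Enum):
    """Fixed set of outcomes, rendered as a dropdown in the UI."""
    ACCEPTED = "accepted"
    DECLINED_NOT_INTERESTED = "declined_not_interested"
    DECLINED_BAD_TIMING = "declined_bad_timing"
    NOT_OFFERED = "not_offered"

@dataclass
class Feedback:
    prediction_id: str
    outcome: Outcome
    note: str = ""  # optional short free text for human readers

fb = Feedback("pred-001", Outcome.DECLINED_BAD_TIMING, "client traveling")

# Serialize for logging; the enum becomes a stable string label
record = {**asdict(fb), "outcome": fb.outcome.value}
```

Because `Outcome` is a closed set, “declined because of bad timing” is always recorded the same way, which makes it usable as a training signal in the next model cycle.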
Note: all screenshots are cleansed for confidentiality. They represent common elements from organizations with similar applications.
One Step At A Time
In fact, the “useless ML experience” is a recurring theme. Jess and I ran into a similar issue on a different occasion. As I described in The Last Mile Problems of AI, it comes down to a disconnect in human-AI collaboration. Solving the problem requires an integrated approach that addresses issues on three fronts.
What happens if we don’t solve this problem? Everyone will be upset. Users like Jess will keep complaining about how useless “AI things” are and won’t benefit from amazing technologies because of their own biases; data scientists may lose their jobs because the real results of their models never match the wishful estimates; businesses will never realize the promised value of AI; and, most importantly, I will have to keep listening to Jess, my friends in data science, and my clients complaining about each other.
There are many steps we need to take to build useful ML applications. Designing for Trust is a great, and simple, first step. Take action now.
- identify a user-facing ML use case at your company
- incorporate the User Journey and Moment of Trust analysis into your ML design workflow
- host a collaborative design session around the Moments of Trust with representatives from data science, engineering, and the user group (e.g. frontline staff).
Many ML apps are deemed useless by users not because of the accuracy of their predictions, but because of human biases. As designers and developers of ML applications, it’s our responsibility to design for trust.
The solutions are often simple. By identifying the Moments of Trust, we can design effective UX that provides more contextual predictions and closes the feedback loop, enabling continuous improvement.
In reality, we, the humans, and machines are both imperfect. Designing for trust offers something beyond quick fixes — it creates a bridge for humans and machines to calibrate our trust and improve together.