Static Machine Learning Models in a Dynamic World
If you work in academia, or in industry, you work on real-life problems. The data we work on are not synthetic, they are from the readings from the real world. While working in our offices, abstracted from the real-life events happening related to our projects, businesses, or challenges; we tend to lose the connection with the problems we are solving. It is easy to see the projects as numbers. Static numeric values, strings of characters, and meaningless IDs. We store them in binary, disconnected from the real world. In this setting, it is easy to forget that the real world is dynamic, and so are the data. The problem starts to arise when we neglect, or naively forget about this reality, and focus on our beloved KPIs. This story is about how our machine learning models stay static, while the data change day by day with the dynamic world.
After I had an incident in one of the previous companies I worked for, I started to become a little bit obsessive about the continuity of my models’ performance. In short, one of our models that was live and directly responsible for the output of a critical engine started to predict absurd results. The worst thing was that we did not discover something was wrong until we got angry phone calls from the client and spent long days to debug the entire pipeline before finding the root cause. The model was outputting extremely high or extremely low predictions. In retrospect, we found out that there were no faulty components in our stack. No bugs, no errors. The problem caused by the upstream data. The data that were consumed from upstream, the real-world readings, had changed.
Models are Built by the Snapshot of the World
If you are not employing some kind of online learning where models update themselves at regular intervals, you work on historic data. You consume some data from your data store and train your models. You constantly do this. You fetch the historic data, move it through the pipeline, get your model. The keyword here is historic. You work on data that are already old. You are building your solution for the future while looking at the past.
With your data, you actually develop your solution on the snapshot of the world. The raw data, the real world readings represent the snapshot. This snapshot can be weeks, months, or years. The longer the snapshot the better. Still, this gives you no guarantee that the future will follow the same behavior that was in your snapshot.
Models Fail When the World Changes
When you work with historic data, you sign a contract with your model. You promise your model that the properties of the data will not change. In exchange, the model gives you an accurate description of history. You rely on your KPIs. Celebrate when you exceed a threshold. Push your solution into production. While the same model is still up there, your contract is still binding.
The real-world changes constantly, and so do our upstream data. If you are not taking an additional step, your models will be ignorant of the changes. Your contract is still there. Following a change, when a major event happens, or slowly over time, your upstream data start to change. It may change so drastically that the upstream data may not resemble the historic data your model was trained on anymore at some point. At this point, you violate your contract, and your model starts to fail silently.
Continuity of the Performance
As the world change, and so does our data, we need to be sure that our model is still doing a good job. Even though we may not do anything to put the model on the right path, we need to know the performance. To make sure my models’ performances are still satisfactory, I tend to put the following checks. Mind that these checks are in the service of the model/solution you are serving.
1. Upstream Data Checks
For this one, you define which stage is upstream depending on your project. The idea is to observe the data at any stage before entering the model to make a prediction. While this is the case, probably the last stage of the data before they are fed into the model is a good stage.
The idea in this check is to make sure the upstream data still resembles the distribution, or follow the same pattern, of the historic data used while training the model. A simple but powerful way could be calculating the distribution of the upstream data at every fixed time interval, and comparing it with the historic data’s distribution. If the similarity between the two distributions is close, you are green. Otherwise, you raise an alert, and let the owner of the solution know.
2. Downstream Output Checks
The idea behind the downstream check is very similar to the upstream data checks. Instead of checking the upstream, in this one, you check the output of your model or your service. Like the upstream check, you monitor the similarity between the distribution of the outputs of your model in production and the output of your model when fed with historic data. Monitor the similarity between two distributions, raise an alert if it starts to drift away.
These two checks are simple to implement. You can put them in a real-time monitoring dashboard, or just send a mail to the owner of the repository when a threshold is exceeded. Like with any checks, it is good to know that they are there.
If you are curious about the end of the incident I mentioned at the start of this story, let me continue. The change came from the client-side. Our readings were dependent on a complex and long list of configurations of frameworks that were running on the client-side. It was later apparent that the client changed some major configurations. This change was so impactful to our readings, it resulted in major shifts in the upstream data. As a result, the new data points fed into the model were completely different than the data we used for training. Only if we had some checks on the data, we could be aware of that change and could take some actions to put things in the right direction. Luckily this incident did not impact the business much and everything was back to normal in a couple of days. This valuable lesson made me a little more paranoid about the static nature of our models, and if my solutions still work as I expected in the production.
Denne artikel var orginalt bragt på Medium.