Machine Learning Model Testing: How to Automate It

Angel Llosa
4 min read · May 5, 2021


In the previous article we reviewed the strategy to follow for testing and monitoring the machine learning model development lifecycle. In this article we will see how to apply an MLOps approach to that process, automating as many of the checks and actions described there as possible.

Source: The ML Test Score

To do so, we will review the different types of checks and how to implement them:

Data Tests

Although this type of testing could be considered within the scope of DataOps, it is important to include it in the advanced analytics lifecycle.

  • Check that the training input data has sufficient quality: traditional schema-based data validation or machine learning techniques can be used to detect data quality issues automatically. For the latter, in addition to enterprise tools, there are open-source libraries with this type of functionality, such as TensorFlow Data Validation, Great Expectations or Amazon Deequ (see the first sketch after this list).
  • Training data must meet the defined privacy requirements: use libraries like piicatcher that automatically detect this type of data in your ETL processes, or use more advanced approaches based on NLP. There is an interesting open-source Python package from Microsoft called Presidio that combines several techniques to detect this kind of information (the second sketch below shows it in action).
  • The code used to process the input data must also be analyzed and tested, as it may contain bugs too. Current code analysis tools have plugins to detect errors, vulnerabilities and security issues in the languages most used in data analysis, such as Python, R, Scala or SQL.
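
To illustrate the first point, here is a minimal sketch of an automated data-quality gate using TensorFlow Data Validation; the DataFrames and column names are invented for the example, and the same pattern can be reproduced with Great Expectations or Deequ.

```python
# A minimal sketch of an automated data-quality check with TensorFlow Data Validation.
# The DataFrames and column names are illustrative; plug in your own datasets.
import pandas as pd
import tensorflow_data_validation as tfdv

# Reference data: for example, the dataset used in the last accepted training run.
reference_df = pd.DataFrame({"age": [34, 51, 28], "income": [42000.0, 58000.0, 39000.0]})
# New batch of training data to validate before (re)training.
new_batch_df = pd.DataFrame({"age": [45, None, 7000], "income": [50000.0, 61000.0, -1.0]})

# 1. Compute statistics on the reference data and infer a schema from them.
reference_stats = tfdv.generate_statistics_from_dataframe(reference_df)
schema = tfdv.infer_schema(statistics=reference_stats)

# 2. Validate the statistics of the new batch against that schema.
new_stats = tfdv.generate_statistics_from_dataframe(new_batch_df)
anomalies = tfdv.validate_statistics(statistics=new_stats, schema=schema)

# 3. Fail the pipeline if any anomaly (missing values, type changes, etc.) is detected.
if anomalies.anomaly_info:
    raise ValueError(f"Data quality check failed:\n{anomalies}")
```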

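A second sketch, for the privacy point: the snippet below uses Microsoft's Presidio to detect and mask PII in free text; the sample sentence is made up for the example.

```python
# A minimal sketch of PII detection and masking with Microsoft Presidio.
# Requires presidio-analyzer and presidio-anonymizer (plus a spaCy language model).
from presidio_analyzer import AnalyzerEngine
from presidio_anonymizer import AnonymizerEngine

text = "Contact John Smith at john.smith@example.com or +1 212 555 0199."

# 1. Detect PII entities in free text.
findings = AnalyzerEngine().analyze(text=text, language="en")
for finding in findings:
    print(finding.entity_type, round(finding.score, 2), text[finding.start:finding.end])

# 2. Optionally mask the detected entities before the data reaches the training set.
anonymized = AnonymizerEngine().anonymize(text=text, analyzer_results=findings)
print(anonymized.text)  # e.g. "Contact <PERSON> at <EMAIL_ADDRESS> or <PHONE_NUMBER>."
```

A check like this can run inside the ETL process so that unmasked PII never reaches the feature store.
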
Model Development Tests

As in the data processing phase, it is also necessary to adequately test the model development phase:

  • All code used to develop the model has to be analyzed, tested and versioned: there are software engineering tools that can analyze the training code to find vulnerabilities, poor-quality code, possible performance problems, insufficient test coverage, etc. For example, SonarQube has plugins available for R and Python.
  • To check the evolution of the model, you can use tools like MLflow to automatically register every training run and validate the improvement in performance or accuracy of the model before promoting it to production (see the first sketch after this list).
  • With such tools it is also possible to automate the comparison of model results against simpler models, to verify that more complex developments are actually more effective than simpler ones (the second sketch below shows this).
  • You can also verify the data distribution of your datasets, for example using statistical distribution tests, so that the performance of the model can be validated across the whole range of data.
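
As a sketch of the MLflow-based tracking mentioned in the list, the snippet below logs every training run and then compares it with previous ones; the experiment name, parameters and metric are invented for the example.

```python
# A minimal sketch of automatic experiment tracking and comparison with MLflow.
# Experiment name, parameters and metric are illustrative.
import mlflow
import mlflow.sklearn
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

mlflow.set_experiment("customer-churn")  # hypothetical experiment name
with mlflow.start_run():
    params = {"n_estimators": 200, "max_depth": 5}
    model = RandomForestClassifier(**params, random_state=42).fit(X_train, y_train)
    auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])

    mlflow.log_params(params)      # the run is registered automatically
    mlflow.log_metric("auc", auc)
    mlflow.sklearn.log_model(model, "model")

# Compare the new run with previous ones before promoting it to production.
experiment = mlflow.get_experiment_by_name("customer-churn")
runs = mlflow.search_runs(experiment_ids=[experiment.experiment_id],
                          order_by=["metrics.auc DESC"])
print(runs[["run_id", "metrics.auc"]].head())
```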

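The comparison against simpler models can be expressed as an automated test too. Here is a minimal sketch using scikit-learn's DummyClassifier as the baseline; the dataset and the improvement margin are illustrative.

```python
# A minimal sketch of an automated baseline comparison:
# the candidate model must clearly beat a trivial baseline, or the check fails.
from sklearn.datasets import load_breast_cancer
from sklearn.dummy import DummyClassifier
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

baseline = DummyClassifier(strategy="most_frequent").fit(X_train, y_train)
candidate = GradientBoostingClassifier(random_state=42).fit(X_train, y_train)

baseline_acc = accuracy_score(y_test, baseline.predict(X_test))
candidate_acc = accuracy_score(y_test, candidate.predict(X_test))

# Illustrative margin: require at least five accuracy points over the baseline.
assert candidate_acc >= baseline_acc + 0.05, (
    f"Candidate ({candidate_acc:.3f}) does not clearly beat the baseline ({baseline_acc:.3f})"
)
```

The same pattern works with the current production model or a simple linear model as the reference instead of a dummy baseline.
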
ML Infrastructure

If the entire training process can be reproduced automatically, almost all of the ML infrastructure tests can be automated as well:

  • Tools like MLflow make it possible to validate the quality of a model before deploying it, for example by setting minimum performance thresholds and comparing against previous models (see the first sketch after this list).
  • Use progressive deployment strategies (canary, A/B) to verify that the new version of the deployed model improves accuracy over the previous one and does not introduce performance problems, monitoring the rollout with tools like the ELK stack (a canary-split sketch follows below).
  • With a CI/CD approach, where the whole deployment cycle is automated, it is possible to have automated rollback strategies that restore previous versions in case of production problems.
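
As a sketch of the quality gate described in the first point, the snippet below compares a candidate run against the version currently in production before promoting it. The model name, metric, threshold and run ID are assumptions, and it presumes both runs logged the same metric and that the model is already registered in the MLflow model registry.

```python
# A minimal sketch of a pre-deployment quality gate on top of the MLflow model registry.
# Model name, metric, threshold and candidate run ID are illustrative.
from mlflow.tracking import MlflowClient

MODEL_NAME = "customer-churn"  # hypothetical registered model
METRIC = "auc"
MIN_THRESHOLD = 0.80           # illustrative minimum quality bar
candidate_run_id = "abc123"    # run produced by the latest training pipeline

client = MlflowClient()
candidate_score = client.get_run(candidate_run_id).data.metrics[METRIC]

# Metric of the model currently in production, if there is one.
production_versions = client.get_latest_versions(MODEL_NAME, stages=["Production"])
production_score = (
    client.get_run(production_versions[0].run_id).data.metrics[METRIC]
    if production_versions else 0.0
)

if candidate_score < MIN_THRESHOLD or candidate_score <= production_score:
    raise RuntimeError(
        f"Candidate {METRIC}={candidate_score:.3f} does not pass the gate "
        f"(threshold {MIN_THRESHOLD}, production {production_score:.3f})"
    )

# Otherwise register the candidate and promote it.
version = client.create_model_version(MODEL_NAME, f"runs:/{candidate_run_id}/model",
                                      run_id=candidate_run_id)
client.transition_model_version_stage(MODEL_NAME, version.version, stage="Production")
```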

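The canary strategy from the second point can be as simple as routing a small share of traffic to the new version and logging which version served each request, so that accuracy and latency can later be compared per version in ELK or a similar tool. A framework-agnostic sketch, with placeholder prediction functions:

```python
# A minimal, framework-agnostic sketch of a canary split between two model versions.
# predict_v1 / predict_v2 are placeholders for calls to the real model endpoints.
import logging
import random

logging.basicConfig(level=logging.INFO)
CANARY_FRACTION = 0.10  # illustrative share of traffic sent to the new version


def predict_v1(features):
    return 0.42  # placeholder for the current production model


def predict_v2(features):
    return 0.40  # placeholder for the candidate model


def serve(features):
    """Route a request to the canary or the stable model and log which one answered."""
    use_canary = random.random() < CANARY_FRACTION
    version = "v2-canary" if use_canary else "v1-stable"
    prediction = predict_v2(features) if use_canary else predict_v1(features)
    # Shipping these logs to ELK (or similar) lets you compare the two versions.
    logging.info("model_version=%s prediction=%s", version, prediction)
    return prediction


if __name__ == "__main__":
    for _ in range(5):
        serve({"feature": 1.0})
```
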
Monitoring Tests

Several monitoring tests can also be implemented in an automated way:

  • For every model you develop, train a simpler model on the same data that can detect changes in the data patterns (for example, a new variable or a significant change in a distribution), since such changes can have a big impact on the performance of the deployed model (see the first sketch after this list).
  • Automatic checks against the model catalog can be implemented to detect models that have not been retrained for a long time, retrain them, verify that they have not lost performance, confirm that the training flows are still available, etc. (see the second sketch below).
  • With monitoring tools like the ELK stack mentioned earlier, it is possible to monitor the model's performance in execution, training times, bandwidth, etc.
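
The first point suggests training a simpler companion model; a lighter-weight variant of the same idea is a statistical drift check that compares the live feature distribution with the training distribution, for example with a Kolmogorov-Smirnov test. The data below is simulated for the example.

```python
# A minimal sketch of a per-feature statistical drift check.
# Simulated data: the "live" feature has deliberately drifted from the training distribution.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
training_feature = rng.normal(loc=0.0, scale=1.0, size=5_000)
live_feature = rng.normal(loc=0.5, scale=1.2, size=5_000)  # drifted on purpose

statistic, p_value = stats.ks_2samp(training_feature, live_feature)

# Illustrative significance level; in practice tune it and run the check for every feature.
if p_value < 0.01:
    print(f"Drift detected (KS statistic={statistic:.3f}, p={p_value:.2e}); "
          "consider retraining or raising an alert")
```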

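And a sketch of the catalog check from the second point, assuming the models are registered in MLflow; the staleness threshold is illustrative.

```python
# A minimal sketch of an automatic staleness check against the MLflow model registry.
# The 90-day threshold is illustrative.
import time
from mlflow.tracking import MlflowClient

MAX_AGE_DAYS = 90
now_ms = time.time() * 1000

client = MlflowClient()
for model in client.search_registered_models():
    age_days = (now_ms - model.last_updated_timestamp) / (1000 * 60 * 60 * 24)
    if age_days > MAX_AGE_DAYS:
        # Hook this into your scheduler or alerting: trigger retraining or notify the owner.
        print(f"Model '{model.name}' has not been updated for {age_days:.0f} days")
```
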
AI Architecture

All these automations can be implemented with an AI Architecture, applying an MLOps approach. This type of architecture manages the entire analytics lifecycle, from training to deployment, and provides the tools to manage and govern the machine learning models.

The main characteristics that such an architecture with an MLOps approach must have are:

  • Implement software engineering best practices (code versioning, multiple environments, testing, etc.)
  • Provide traceability across data, code and models
  • Automate model training and deployment
  • Automate training logging
  • Enable model versioning
  • Provide AutoML capabilities
  • Implement functional monitoring

It is possible to implement this type of architecture using the standard DevOps toolset together with the newer tools provided by the main cloud vendors (like Amazon SageMaker, Databricks, Azure ML or Google AI Platform).

Conclusions

In this article we have reviewed the tasks of a machine learning model testing strategy from an automation perspective. As we have seen, there are tools on the market (both open source and cloud) to implement it.

In the next article, we will review in more detail how to implement this approach with an AI Architecture.
