MLOps is a newly emerging field that can be a key differentiator in how efficiently you move AI models from the Proof-of-Concept phase to Production, where they create value for your organization. In the end, that is always the goal, right? As the name suggests, MLOps overlaps considerably with DevOps. Today, we want to share our view on that overlap and show how the key principles of DevOps can be applied to Machine Learning. This should make MLOps best practices easier to understand.
The term DevOps refers to the combination of software development and IT operations. These used to be the responsibility of completely separate teams; DevOps strives to bring them together within the same team. The goals of DevOps boil down to two items:
Improve quality.
Reduce lead times.
DevOps principles
DevOps is not owned by anyone, nor is it clearly defined as a set of practices, but several important practices are often considered part of it:
Shared ownership
Development and operations are no longer separate but go hand in hand from the start. There is no need for the developed product to be “handed over” (or “thrown over the wall”) back and forth between dev and ops teams: this reduces the time necessary to deploy the product and allows defects to be caught and fixed earlier. Everyone in the team knows how to deploy and operate the software.
Automation
Manual processes have two disadvantages: an error can occur at every step, and they are slower than a computer. By automating the testing, deployment, monitoring, and other processes around the product, all of them can be sped up and executed more frequently, at a lower cost and with a lower error rate.
Continuous feedback & improvement
There should be continuous and immediate feedback at every step of the software lifecycle. For example, during the development phase, a CI/CD system should test every change and deploy it to the relevant environment; during the operations phase, an automated monitoring system should observe the behavior of the system in production and trigger warnings if quality issues show up. When it comes to serving the end user, the same principle should be respected: the software should be shipped frequently to allow users to get new valuable functionality and give feedback on it.
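As a rough illustration of the monitoring side of this feedback loop, here is a minimal Python sketch, assuming a service that records whether each request failed. The `ErrorRateMonitor` class, its sliding window of recent requests, and its 5% default threshold are hypothetical choices for illustration; a real setup would typically feed such signals into a dedicated monitoring and alerting tool.

```python
import logging
from collections import deque

logger = logging.getLogger("monitoring")


class ErrorRateMonitor:
    """Warn when the error rate over the last `window` requests exceeds `threshold`."""

    def __init__(self, window: int = 100, threshold: float = 0.05) -> None:
        self.outcomes = deque(maxlen=window)  # True = request failed; oldest entries drop off
        self.threshold = threshold

    def record(self, failed: bool) -> bool:
        """Record one request outcome and return True if a warning was raised."""
        self.outcomes.append(failed)
        # Wait until the window is full so a single early failure does not trigger a warning.
        if len(self.outcomes) < self.outcomes.maxlen:
            return False
        error_rate = sum(self.outcomes) / len(self.outcomes)
        if error_rate > self.threshold:
            logger.warning(
                "Error rate %.1f%% exceeds threshold %.1f%%",
                100 * error_rate,
                100 * self.threshold,
            )
            return True
        return False
```

For example, with `window=10` and `threshold=0.2`, ten successful requests followed by three failures raise a warning on the third failure, when 3 of the last 10 requests have failed.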
Versioning & reproducibility
Software should be strictly versioned via a version control system like Git, so that it is possible to quickly determine which version of the software users are running, which known defects that version contains, and so on. This also enables reproducibility: in “normal” software, you can check out the exact version that contains a defect from the version control system and be certain that you are running the same code as the user who reported it. This cuts down on time lost trying to reproduce and fix defects.
These practices help each other
By automating the development and operations process, it becomes possible to iterate and get feedback more quickly. By versioning your software and making it reproducible, you make feedback more precise, which in turn lets you improve more quickly.
Applying DevOps principles to ML: MLOps!
In “normal” software development, many of these practices have become relatively standard, but they are far less common when developing systems based on machine learning. MLOps tries to bridge that gap. A central difference is that in “normal” software, the code is the artifact. In machine learning systems, the artifacts are the code that trains the models, the models themselves, the data they are trained on, and the code that runs the models. This is where MLOps adds most of its value. Here is how each of the practices described above translates:
Shared ownership
Many teams take a strongly siloed approach: data scientists work only on machine learning models, while software engineers integrate the models they produce into the final software products and deploy them. In the MLOps version, data scientists are also capable of integrating their machine learning models into the final software and of delivering, deploying, and operating it. (Note that this doesn't preclude more specialized profiles; the important thing is that no one “lives on an island”.)
Automation
The training and release of models are often performed manually on development machines. The MLOps approach, in contrast, is to automate the training and release of new models, typically via a CI/CD system running on a powerful integration machine. The result is less time wasted on manual training, fewer opportunities for error, and a better-defined process for creating and releasing new models.
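To make this concrete, below is a minimal sketch of the kind of script a CI job could run on every change, assuming a toy dataset of `(x, y)` pairs. Ordinary least-squares line fitting stands in for real model training, and the function names, the 80/20 split, and the `max_mse` quality gate are hypothetical. The point is that training, evaluation, and release happen in one automated step, and a model that fails the gate is never released.

```python
import json
import random
from pathlib import Path


def train(xs, ys):
    """Fit y = a * x + b by ordinary least squares (a stand-in for real model training)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    a = cov / var
    return {"a": a, "b": mean_y - a * mean_x}


def evaluate(model, xs, ys):
    """Mean squared error of the model on held-out data."""
    return sum((model["a"] * x + model["b"] - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def pipeline(data, release_dir: Path, max_mse: float, seed: int = 0) -> Path:
    """Train, evaluate, and release the model only if it passes the quality gate."""
    rng = random.Random(seed)  # fixed seed so every CI run uses the same split
    shuffled = list(data)
    rng.shuffle(shuffled)
    split = int(0.8 * len(shuffled))
    train_set, test_set = shuffled[:split], shuffled[split:]

    model = train([x for x, _ in train_set], [y for _, y in train_set])
    mse = evaluate(model, [x for x, _ in test_set], [y for _, y in test_set])

    if mse > max_mse:
        # An uncaught exception ends the script with a non-zero exit status.
        raise RuntimeError(f"Model rejected: test MSE {mse:.3f} > {max_mse}")

    # Only a model that passed the gate reaches the release directory.
    release_dir.mkdir(parents=True, exist_ok=True)
    path = release_dir / "model.json"
    path.write_text(json.dumps({**model, "test_mse": mse}))
    return path
```

Because a rejected model ends the script with a non-zero exit status, most CI systems will mark the stage as failed, so nobody has to inspect the model by hand to stop a bad release.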
Continuous feedback and improvement
Shipping a “bare”, manually built model that a separate team then has to integrate and operate creates a lot of process waste. In contrast, integrating the model into the released software right away, combined with automation, gives much quicker feedback on new models. It also means less time lost on preventable integration issues, which frees up time for improving the model.
Versioning and reproducibility
Models and the data used to train them are often not versioned, and certainly not reproducible. The vision of MLOps is to apply versioning not only to the code that trains the models and the code that runs them, but also to the models themselves and the data on which they were trained. This goes hand in hand with reproducibility: if I can trace a model back to a specific version of the training code and the data, I can rebuild that model (modulo sources of non-determinism, such as the random seed of a randomized learning algorithm, which must also be recorded to get an identical result).
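The tracing described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the training data fits in memory and is identified by a SHA-256 hash of its JSON serialization, the training code is identified by a version string such as a Git commit hash (for example, the output of `git rev-parse HEAD`), and a small seeded SGD loop stands in for a randomized learning algorithm. All function names are hypothetical. Recording the seed next to the code and data versions is what turns the randomized training run into a reproducible one.

```python
import hashlib
import json
import random


def fingerprint(data) -> str:
    """Identify a dataset version by the SHA-256 hash of its canonical JSON form."""
    canonical = json.dumps(data, sort_keys=True).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()


def train_sgd(data, seed: int, epochs: int = 20, lr: float = 0.05) -> float:
    """Fit y = w * x with SGD; the result depends on the seed (initial weight, shuffle order)."""
    rng = random.Random(seed)  # private generator, so other code cannot disturb the sequence
    w = rng.uniform(-1.0, 1.0)  # random initialization
    samples = list(data)
    for _ in range(epochs):
        rng.shuffle(samples)
        for x, y in samples:
            w -= lr * 2 * (w * x - y) * x  # gradient step on the squared error
    return w


def build_model(data, code_version: str, seed: int):
    """Train a model and return it with the lineage needed to rebuild it later."""
    lineage = {
        "code_version": code_version,  # e.g. the Git commit hash of the training code
        "data_sha256": fingerprint(data),
        "seed": seed,
    }
    return train_sgd(data, seed), lineage


def rebuild(data, lineage) -> float:
    """Retrain from a lineage record; assumes lineage["code_version"] is checked out."""
    if fingerprint(data) != lineage["data_sha256"]:
        raise ValueError("training data does not match the recorded version")
    return train_sgd(data, lineage["seed"])
```

Rebuilding with the same data and seed yields exactly the same weight, while changing even one training example makes `rebuild` refuse to run, because the data fingerprint no longer matches the recorded one.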
To conclude, this high-level overview shows how readily DevOps principles carry over to MLOps, helping you bring out the value of your AI models as soon as possible.