In traditional development projects, companies increasingly rely on the DevOps approach (in which development and operations are closely integrated from the start of the project) or even on the DevSecOps approach (which aims to build the security of the application and of the infrastructure in from the start of the process).
ML and AI
However, with the emergence of machine learning (ML) and artificial intelligence (AI), companies are looking to better exploit the data they collect, preferably in real time. These data keep growing in volume and often arrive in unstructured form, which requires new analytical approaches based on algorithms designed by ‘data scientists’. The traditional methodologies, however, turn out to be rather unsuitable for such data projects.
Indeed, the data come from various sources such as databases, flat files, data lakes, data warehouses, messaging systems, and more. Before any processing, it is necessary to prepare these data (by means of techniques such as exploratory data analysis or feature engineering) so that they can subsequently be processed with AI or ML tools. While this preparation can be performed by data scientists, who understand the schema and features of the data, it is also possible to rely on algorithms that automate several of these manual tasks, before proceeding to the design of algorithms able to process larger data volumes.
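By way of illustration, here is a minimal sketch of such a preparation step in Python, using pandas and scikit-learn; the file name and the column names (‘amount’, ‘channel’) are hypothetical:

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Load the raw data (file name and column names are hypothetical)
df = pd.read_csv("transactions.csv")

# Exploratory data analysis: inspect the schema, distributions and missing values
df.info()
print(df.describe())
print(df.isna().sum())

# Basic cleaning: remove duplicates, fill missing numeric values with the median
df = df.drop_duplicates()
df["amount"] = df["amount"].fillna(df["amount"].median())

# Feature engineering: derive a new feature and encode a categorical column
df["log_amount"] = np.log1p(df["amount"])
df = pd.get_dummies(df, columns=["channel"])

# Scale the numeric features so the learning algorithm treats them comparably
df[["amount", "log_amount"]] = StandardScaler().fit_transform(
    df[["amount", "log_amount"]]
)
```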
Subsequently, automating the process makes it possible to generate models that are progressively evaluated, compared with other models and analyzed in terms of performance, so that they can be refined.
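A minimal sketch of such a model comparison, here with scikit-learn cross-validation on synthetic data (the candidate models and the scoring metric are illustrative assumptions):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic data standing in for the prepared dataset
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

# Candidate models to evaluate and compare
candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=42),
}

# Score each candidate with 5-fold cross-validation to pick the one to refine
for name, model in candidates.items():
    scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
    print(f"{name}: mean accuracy = {scores.mean():.3f} (+/- {scores.std():.3f})")
```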
Finally, these models can be put into production in the intended environment. This deployment can be performed through a REST API, by embedding the model on a terminal or smartphone, or by installing it on a ‘batch’ prediction system.
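As an illustration of the first option, here is a minimal sketch of a REST prediction service, assuming FastAPI as the web framework and a scikit-learn model serialized with joblib (the model path and the input fields are hypothetical):

```python
import joblib
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

# Load the trained model once at startup (path is hypothetical)
model = joblib.load("model.joblib")

class Features(BaseModel):
    # Input schema for a single prediction request; fields are hypothetical
    amount: float
    log_amount: float

@app.post("/predict")
def predict(features: Features):
    # Turn the request payload into the 2-D array scikit-learn expects
    X = [[features.amount, features.log_amount]]
    return {"prediction": int(model.predict(X)[0])}
```

Such a service would typically be run with a server like uvicorn and called over HTTP by the consuming applications.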
To ensure maximum model performance, the model must be continuously monitored, with an iterative process to improve it constantly.
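One possible form of such monitoring is a statistical drift check on the incoming features; the sketch below uses a Kolmogorov–Smirnov test from SciPy, with synthetic data standing in for real observations:

```python
import numpy as np
from scipy.stats import ks_2samp

def drift_detected(reference: np.ndarray, live: np.ndarray,
                   alpha: float = 0.05) -> bool:
    """Flag drift when a feature's live distribution no longer matches training."""
    result = ks_2samp(reference, live)
    return result.pvalue < alpha

# Hypothetical check: feature values seen at training time vs. a production window
reference = np.random.normal(0.0, 1.0, size=5000)
live = np.random.normal(0.4, 1.0, size=1000)

if drift_detected(reference, live):
    print("Drift detected: alert the team and consider retraining the model")
```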
Automation! Yes, but…
In an MLOps strategy, several maturity levels can be defined. Level 0 consists of entrusting the construction of the models to specialists, who subsequently train and improve them through successive iterations; the major disadvantages are the slowness of the process and the limited adaptability of the model when new data arrive. Maturity level 1 automates the training process, which makes it possible to experiment more rapidly and to deploy models that stay closer to reality as it evolves. Finally, at level 2, the entire process is automated without any human intervention, whether in development, testing, supervision or production.
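To make level 1 concrete, here is a minimal sketch of an automated training step with a quality gate; the threshold, the model choice and the output path are illustrative assumptions:

```python
import joblib
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

def run_training_pipeline(X, y, deploy_threshold: float = 0.9) -> float:
    # Re-runnable training step, triggered whenever fresh data arrives
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
    model = RandomForestClassifier(random_state=42).fit(X_train, y_train)
    accuracy = accuracy_score(y_test, model.predict(X_test))
    # Quality gate: only promote the new model if it clears the threshold
    if accuracy >= deploy_threshold:
        joblib.dump(model, "model.joblib")  # picked up by the serving layer
    return accuracy
```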
In other words: MLOps is meant to be a central point for the deployment, supervision, management and governance of all machine learning models, regardless of how they were created and where they are deployed.
Although such an MLOps process might seem idyllic, several challenges must be taken into account. For instance, it is necessary to bring together teams with very varied skills (data scientists and business experts in particular) and to encourage them to apply the defined methodology. Moreover, the use of machine learning imposes several constraints: the incoming data must be of high quality, while the generated predictions must be analyzed and fed back into the iteration process (for the constant improvement of the models).
Other aspects should not be neglected either: the diversity of the languages and frameworks used; the importance of the experimentation stage (with a history of the data used and of the models implemented); the need to include in the tests not only the data, but also the software components; the duration and complexity of the process stages (even if automation offers a solution in the long term); and the obligation to constantly monitor and adjust the models in order to improve their quality (and performance).
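For the experimentation history mentioned above, a tracking tool can record the parameters and results of each run. A minimal sketch, assuming MLflow as the tracking tool (the article does not prescribe one; all logged names and values are hypothetical):

```python
import mlflow

# Record each experiment so data versions, parameters and results stay traceable
with mlflow.start_run(run_name="random_forest_v1"):
    mlflow.log_param("n_estimators", 100)
    mlflow.log_param("dataset_version", "2024-01-snapshot")  # hypothetical tag
    mlflow.log_metric("accuracy", 0.93)  # illustrative value, not a real result
```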
Step by step
To reach the objectives linked to MLOps, there are several axes of intervention: company culture (which requires accepting new ways of working and collaboration between teams from different backgrounds), technique (automation, tests, monitoring tools), organization (pairing data scientists with field experts) and governance (especially if the data are considered sensitive, in particular in the framework of GDPR compliance). Finally, use cases are needed to demonstrate the limited but tangible short-term benefits of the technology.
Aprico Consultants is a leading consultancy company guiding your ICT strategy and transformation in order to stimulate the performance, productivity and competitiveness of your organization. We combine cutting-edge expertise with a thorough understanding of the context and of the customer experience, as well as an end-to-end approach in all sectors, from consultancy to solution deployment.