A wave of obsession with all things machine learning (ML) has washed over the technology and business communities, and society more broadly, in the last several years, and understandably so. Machine learning-enabled products and services can offer myriad benefits to an organization, not least the ability to harness large volumes of data to make previously tedious tasks easier and more efficient.

Having a solid foundation for real-world ML is a major determinant of success for new initiatives, and is an exciting area of research and engineering in its own right. Implementing ML can be challenging even for organizations with mature engineering strength, and there are pitfalls and misconceptions in making the jump from machine learning research to ML in production environments. A frequently overshadowed and under-appreciated aspect of getting it right is the infrastructure that enables robust, well-managed research and serves customers in production applications.

A key lever in setting the foundation for a successful ML program is building a culture and an atmosphere that allow you to trial these efforts at scale: accelerating the rate of scientific experimentation on the road to production and, ultimately, to business value. The cloud is an integral part of these efforts, and it can enable teams to develop and deploy well-governed, accurate ML models to high-volume production environments. Beyond production deployments, a solid infrastructure paves the way for large-scale testing of models and frameworks, allows for greater exploration of the interactions of deep learning tools, and enables teams to rapidly onboard new developers and ensure that future model changes do not have masked effects.

Here, I’ll outline some tactical and procedural guidelines for setting the foundation to bring effective machine learning to production across your enterprise through automated model integration/deployment (MI/MD).


High Level Challenges & Production ML Concerns

Machine learning is complex enough in production environments, and only becomes more so in domains that call for adversarial learning (a subfield of ML that studies its application under hostile conditions), such as cybersecurity and money laundering detection. Adversarial attacks, from causative to exploratory, encourage your model to change in response to carefully devised inputs, reducing its efficacy.

In cybersecurity and other complex domains, decision boundaries often require robust context for human interpretation, and modern enterprises of any size generate far more data than humans can analyze. Even absent such adversarial concerns, user activity, network deployments, and the simple advances of technology cause data drift over time.

With this in mind, production ML concerns are almost universal. Data and model governance affect all models, and retraining is a fact of life, so automating the production process is key for sustainable performance.

Common production concerns that must be addressed when building an ML foundation include:

  • Model problems in production. Models need to be trained, updated, and deployed seamlessly, but issues can arise with disparate data sources, multiple model types in production (supervised/unsupervised), and multiple languages of implementation.
  • Temporal drift. Data changes over time.
  • Context loss. Model developers forget their reasoning over time.
  • Technical debt. This is a known issue in production learning environments. ML models are difficult even for their creators to fully understand, and harder still for employees who are not ML experts. Automating the production process can minimize technical debt.
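
To make the temporal drift item concrete, here is a minimal sketch of one way to flag it: compare the mean of a recent window of a feature against a reference window, measured in reference standard deviations. The function names and the threshold of 2.0 are illustrative assumptions, not a prescribed method, and real systems often use richer statistical tests.

```python
from statistics import mean, stdev


def drift_score(reference, current):
    """Shift of the current mean, in units of the reference standard deviation."""
    ref_std = stdev(reference)
    if ref_std == 0:
        raise ValueError("reference window has no variance")
    return abs(mean(current) - mean(reference)) / ref_std


def has_drifted(reference, current, threshold=2.0):
    """Flag drift when the score exceeds a threshold (2.0 is an assumed default)."""
    return drift_score(reference, current) > threshold


# Hypothetical feature values: a training-time window and two later windows.
reference = [10, 12, 11, 13, 9, 10, 12, 11]
stable = [11, 12, 10, 11]
shifted = [18, 19, 20, 17]
print(has_drifted(reference, stable), has_drifted(reference, shifted))
```

A check like this could run on a schedule alongside retraining, so that drift is surfaced before accuracy visibly degrades.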

The ideal system can address these overarching ML production considerations while also serving common adversarial concerns, including:

  • Historical data and model training
  • Model monitoring and accuracy tracking over time
  • Ability to work with distributed training systems
  • Custom tests per model to validate accuracy
  • Deployment to production model servers


Model Management & Setting a Technical Foundation

While each organization differs, these are high-level considerations for effective model management:

  • Historical training data with fine-grained time controls
  • Distributed training functionality
  • Ability to support multiple languages
  • Robust testing and reporting support
  • Model accuracy must be understood easily
  • Model feature-set, methodology, and code tracking
  • Data provenance and clear definitions for internal data
  • Open Source tooling
  • Custom retrain and loss functions on a cron-like basis to refresh stale models
  • Minimal impact on model developers and dedicated ML engineers

On the technical side, several tools/processes will be critical in meeting these requirements:

  • A strong CI/CD server. For example, Jenkins has excellent support, build, reporting, and command plugins for virtually all use cases, and its distributed functionality can be a future benefit.
  • A flexible platform for cloud service deployment. AWS’s EC2, S3, and EMR are good examples.
  • Git integration. This is important so that the code that generates a model is tagged against a specific version for each production release artifact.
  • Model accuracy. Submit accuracy and test results to an external server, for example over gRPC.
  • Integration. Integrate the model serving layer into streaming applications.
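
As a rough sketch of the Git integration and accuracy reporting items, the snippet below builds a small release manifest that ties a model artifact to the commit that produced it and to its test metrics. The manifest fields, the function names, and the model name are hypothetical. How the manifest is sent to an external server is left out, since that depends on your setup.

```python
import hashlib
import json
import subprocess


def current_commit() -> str:
    """Return the hash of the checked-out commit (must run inside a Git repository)."""
    result = subprocess.run(
        ["git", "rev-parse", "HEAD"],
        check=True, capture_output=True, text=True,
    )
    return result.stdout.strip()


def build_manifest(model_name: str, artifact: bytes, commit: str, metrics: dict) -> str:
    """Serialize a record linking an artifact to its source commit and test metrics."""
    manifest = {
        "model": model_name,
        "commit": commit,
        # The checksum lets a consumer verify it received the exact artifact tested.
        "artifact_sha256": hashlib.sha256(artifact).hexdigest(),
        "metrics": metrics,
    }
    return json.dumps(manifest, sort_keys=True)


# Hypothetical usage with a placeholder commit; in a CI job, pass current_commit().
print(build_manifest("fraud-detector", b"serialized-weights", "abc123", {"accuracy": 0.93}))
```

Passing the commit in as an argument keeps the manifest builder easy to test outside a repository.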


The Benefits & Practice of a Solid ML Foundation

Once the technical components are in place, it’s critical to ensure the proper protocols and practices are established to continue reaping the benefits of a well-designed ML foundation.

One area is model governance. This covers everything from ethical concerns to regulatory requirements, and you should aim to make the governance process go as smoothly as possible. Historical tracking is another key component, and it helps mitigate temporal drift. Tracking models over time is difficult and requires fine-grained temporal data; a distributed model logging framework, such as an internal tool we built called Rubicon, can help keep track of your model training, testing, and deployment history.
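
To illustrate the kind of history such a logging framework keeps, here is a minimal single-process sketch that appends one JSON line per training, testing, or deployment event. This is not Rubicon's API, and it is not distributed. The record fields and function names are assumptions made for the example.

```python
import json
from datetime import datetime, timezone


def log_event(stream, model, stage, details):
    """Append one JSON line describing a training, testing, or deployment event."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "model": model,
        "stage": stage,
        "details": details,
    }
    stream.write(json.dumps(record) + "\n")
    return record


def model_history(lines, model):
    """Return the logged events for one model, in the order they were written."""
    records = (json.loads(line) for line in lines if line.strip())
    return [r for r in records if r["model"] == model]
```

Here `stream` can be any writable text stream, such as a file opened in append mode, and `model_history` accepts any iterable of lines, so the same functions work against a file or an in-memory buffer.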

With historical tracking, retrain and loss thresholds are user-provided, and are used to automatically refresh models over time. In turn, this leads to more seamless model reproducibility — the immediate ability to generate historical models for validation against current data conditions — and a strong understanding of where drift has occurred along with the areas it has affected. Furthermore, practicing journaled knowledge retention mitigates context loss (how many of us have returned to even simple projects and asked “what is going on here?!”), and ensures that even though models are being retrained and published automatically based on time, changes to the underlying code and simple updates are easily identified.
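
The user-provided retrain and loss thresholds described above could be expressed along these lines. This is a sketch under stated assumptions: the `RetrainPolicy` name, its fields, and the example values are hypothetical, and a scheduler (for instance a cron-like CI job) is assumed to call the check periodically.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class RetrainPolicy:
    max_age: timedelta  # retrain threshold: refresh models older than this
    max_loss: float     # loss threshold: refresh models whose loss exceeds this


def needs_retrain(policy, trained_at, current_loss, now):
    """Return True when the model is stale by age or by measured loss."""
    too_old = now - trained_at > policy.max_age
    too_lossy = current_loss > policy.max_loss
    return too_old or too_lossy


# Hypothetical policy: refresh monthly, or sooner if loss rises above 0.25.
policy = RetrainPolicy(max_age=timedelta(days=30), max_loss=0.25)
print(needs_retrain(policy, datetime(2024, 1, 1), 0.10, now=datetime(2024, 3, 1)))
```

Passing `now` explicitly, rather than reading the clock inside the function, keeps the check reproducible when validating historical models.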

The burden on model developers here is three-fold:

  • Explicit test creation is required. Each test needs configuration for variables such as the time period of the training data and hyper-parameter selection. This is not intended to eliminate human error, but the up-front cost is amortized over time.
  • Success/accuracy criteria must be defined in advance. This means agreeing on an acceptable range for model variance over time, as a compromise between business requirements and technical limitations.
  • Knowledge of the language of model implementation is required. While this is a technical hurdle, it allows for very permissive test definitions.
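
As one way to picture the first two items, here is a minimal sketch of an explicit test definition that records the training window, the hyper-parameters, and an accuracy range agreed in advance. The `ModelTestSpec` name and its fields are hypothetical. The upper accuracy bound is my own assumption, included to flag results that look too good, such as those caused by data leakage.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass(frozen=True)
class ModelTestSpec:
    train_start: date
    train_end: date
    hyperparameters: dict = field(default_factory=dict)
    min_accuracy: float = 0.0  # lower bound agreed with the business
    max_accuracy: float = 1.0  # assumed upper bound to catch suspicious results

    def __post_init__(self):
        # Reject specs that cannot describe a valid training window or range.
        if self.train_start >= self.train_end:
            raise ValueError("train_start must be before train_end")
        if not 0.0 <= self.min_accuracy <= self.max_accuracy <= 1.0:
            raise ValueError("accuracy bounds must satisfy 0 <= min <= max <= 1")

    def passes(self, accuracy: float) -> bool:
        """Return True when the measured accuracy falls inside the agreed range."""
        return self.min_accuracy <= accuracy <= self.max_accuracy


# Hypothetical spec for a model trained on the first half of 2023.
spec = ModelTestSpec(date(2023, 1, 1), date(2023, 6, 30), {"max_depth": 6}, 0.90, 0.99)
print(spec.passes(0.93))
```

Because the spec is plain code in the model's own language, developers can extend it with whatever model-specific checks they need.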

Ultimately, to democratize machine learning and fully harness its potential, organizations must be able to make experiments repeatable and automatically verifiable. Setting up this foundational environment is what allows us to develop brilliant algorithms that deliver value at scale at Capital One, and I hope it can be used as a guide for your organization, too.