Artificial intelligence is now discussed everywhere in companies, from CIO conversations down to IT resourcing meetings. Enterprises are involved in a range of topics, from research to novel applications and enterprise product usage. Over the last year, the theme at major cloud conferences like Google I/O and re:Invent has centered on MLOps, the process of taking models created by data scientists and applying them in production to drive business impact. MLOps is still new, so I wanted to spend time diving into what it is and why it’s different. I brought in Alex Chung, a former Senior Product Manager for SageMaker at Amazon Web Services who also worked on Computer Vision Data Operations at Facebook and previously managed a machine learning hedge fund.
Gary Drenik: Hi Alex, you have a unique background. Could you share with readers how you got your start in MLOps?
Alex Chung: Absolutely. I’ve worked at a number of large tech companies, but my beginning was back in 2012. I launched a start-up hedge fund (The Simple Group) out of my dorm room using machine learning algorithms to buy emerging market bonds. I took the data science and engineering techniques I was learning in school, along with the fund management I studied in the business school, and tied them all together. The work of building the hedge fund ended up split between 70% data and real-time engineering infrastructure, 20% business operations, and 10% algorithms. It was fascinating because that 70% of my startup’s work is what we call MLOps today. For the five years we invested, The Simple Group allocated tens of millions of dollars using MLOps systems that enabled low-cost investing at speed and scale.
Drenik: Wow, you were really young to start building a hedge fund. Walk me through the MLOps piece a bit more. What was that 70% of the work that you did?
Chung: Thanks. We were lucky. We had investors from the owners of the Boston Celtics to the founder at Spark Capital and professors at Cornell who believed in us.
Sure, so let’s dive deeper into the definition of MLOps. When people describe MLOps, they make it sound like a single process. It’s not, and it’s important to understand the nuance. MLOps is actually eight smaller pieces that come together: data collection, data processing, feature engineering, data labeling, model design, model training, model optimization, and model deployment and monitoring. I have seen people call this the “Big Eight”.
Regardless of the organization, use case, or team maturity, if you want machine learning models in your business, you go through those eight steps in some shape. I’ll put this in context for The Simple Group and explain how the Big Eight apply.
Stage 1 is data collection. For The Simple Group, we had to collect all the historical and real-time data on the bonds we wanted to invest in. We procured the market datasets with all the available historical loan data and repayment history. Stage 2 was to clean and process the data. There were loans with incorrect repayment information, fraudulent borrower profiles, and loan profiles that had been corrupted during the copying process. All of that had to be filtered out or fixed before we could feed the data into our model training process.
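As a rough illustration of the Stage 2 filtering described above, the sketch below drops records that fail basic sanity checks. The `LoanRecord` fields and the three checks are assumptions made for this example; they are not The Simple Group’s actual schema or rules.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class LoanRecord:
    # Hypothetical schema, invented for this example.
    loan_id: str
    principal: float
    amount_repaid: Optional[float]  # None stands in for a field corrupted during copying
    flagged_fraud: bool


def clean_loans(records: List[LoanRecord]) -> List[LoanRecord]:
    """Keep only the records that pass every sanity check."""
    cleaned = []
    for r in records:
        if r.flagged_fraud:
            continue  # fraudulent borrower profile
        if r.amount_repaid is None:
            continue  # corrupted record
        if r.principal <= 0 or r.amount_repaid < 0:
            continue  # impossible repayment information
        cleaned.append(r)
    return cleaned


raw = [
    LoanRecord("A1", 1000.0, 1000.0, False),
    LoanRecord("A2", 500.0, None, False),
    LoanRecord("A3", 750.0, 200.0, True),
    LoanRecord("A4", 300.0, -50.0, False),
]
kept_ids = [r.loan_id for r in clean_loans(raw)]
print(kept_ids)
```

Here only the first record survives. A real pipeline might repair some records instead of dropping them, which the interview also mentions as an option.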
Once we had all the data we needed, we had to extract insights from it. That brings us to Stage 3, feature engineering. In this step, a data scientist takes the raw data and creates metrics that are tied to the prediction. An example feature from the hundreds we had for our models at The Simple Group was the “number of previous loans a borrower had”. Intuitively, if a borrower has had multiple loans before, that history is a strong signal of the borrower’s ability to repay. That takes us to Stage 4, data labeling. A label is used to guide what the model should predict. In a hedge fund that buys bonds, the labeling process is straightforward. Did the borrower pay back in full, and were the payments on time?
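The feature and label mentioned above can be sketched in a few lines. This is a minimal example under assumed field names (`borrower_id`, `issued`, `amount_due`, `amount_repaid`, `late_payments`), none of which come from the interview.

```python
from datetime import date


def previous_loan_count(history, borrower_id, as_of):
    """Feature: number of loans the borrower took out strictly before `as_of`."""
    return sum(
        1
        for loan in history
        if loan["borrower_id"] == borrower_id and loan["issued"] < as_of
    )


def repaid_label(loan):
    """Label: 1 if the loan was repaid in full with no late payments, else 0."""
    paid_in_full = loan["amount_repaid"] >= loan["amount_due"]
    on_time = loan["late_payments"] == 0
    return 1 if paid_in_full and on_time else 0


# Made-up loan history for two borrowers.
history = [
    {"borrower_id": "b1", "issued": date(2012, 1, 5),
     "amount_due": 100.0, "amount_repaid": 100.0, "late_payments": 0},
    {"borrower_id": "b1", "issued": date(2012, 6, 1),
     "amount_due": 200.0, "amount_repaid": 200.0, "late_payments": 2},
    {"borrower_id": "b2", "issued": date(2012, 3, 1),
     "amount_due": 150.0, "amount_repaid": 90.0, "late_payments": 0},
]

feature_b1 = previous_loan_count(history, "b1", date(2013, 1, 1))
labels = [repaid_label(loan) for loan in history]
print(feature_b1, labels)
```

Counting only loans issued before the `as_of` date keeps the feature from leaking information about the loan being scored.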
Drenik: I find it interesting that the first half of MLOps is all related to data. This is similar at my business, Prosper Insights & Analytics, where the majority of our work is the collection of anonymous first-party data from consumers via our monthly survey. Since the data is clean and factual, businesses in Financial Services, Marketing, e-Commerce, Retail, and CPG are licensing it for their modeling and forecasting.
Chung: Data quality and transformation are more than half of the work needed to achieve high-quality machine learning models. There are techniques, such as semi-supervised learning, that make use of low-quality data that is cheaper to acquire. However, even Andrew Ng recently put emphasis on data as critical to MLOps. Better data techniques should emerge as the industry improves the usability of developer tools.
Drenik: What is that developer journey in the modeling stages of MLOps?
Chung: Right, so that’s Stages 5 through 8.
Stage 5 is model design. In this stage, the emphasis is on picking the mathematical architecture. There are two schools of thought here: classical vs. deep learning. Classical models are statistical in nature, such as decision trees, clustering, and regressions. Deep learning models instead rely on large matrix multiplications, which makes them more complicated. Most data scientists in enterprises still pick classical models for their use cases. That included us too. In our hedge fund, we used decision trees, and we could reach 90%+ accuracy without the bigger models found in deep learning.
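To give a feel for how small a classical model can be, here is a sketch of a decision stump, which is a decision tree with a single split. It is not the fund’s model: the interview does not say how its trees were built, and the data below (previous loan counts and repayment labels) is made up for the example.

```python
def fit_stump(xs, ys):
    """Find the threshold t where predicting 1 for x >= t gives the best accuracy."""
    best_t, best_acc = None, -1.0
    for t in sorted(set(xs)):
        preds = [1 if x >= t else 0 for x in xs]
        acc = sum(p == y for p, y in zip(preds, ys)) / len(ys)
        if acc > best_acc:  # strict comparison keeps the lowest threshold on ties
            best_t, best_acc = t, acc
    return best_t, best_acc


def predict(threshold, x):
    """Apply the learned single-split rule to one value."""
    return 1 if x >= threshold else 0


# Toy data: number of previous loans, and whether the loan was repaid (1) or not (0).
xs = [0, 0, 1, 1, 2, 3, 4, 5]
ys = [0, 0, 0, 1, 1, 1, 1, 1]

threshold, accuracy = fit_stump(xs, ys)
print(threshold, accuracy)
```

A full decision tree repeats this kind of split search recursively over many features, which is what libraries such as scikit-learn automate.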
Stage 6 is to train the model. This is its own step because creating the training infrastructure requires cloud-scale computation for large datasets. At The Simple Group, our goal in training was to find the model candidate that maximized our hedge fund’s returns. After each training run completed, we took the resulting model weights and evaluated them in a market simulator that ran historical backtests.
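The idea of scoring each candidate by a historical backtest can be sketched as follows. The profit rule (earn the interest on a repaid bond, lose the stake on a default), the bond fields, and the two candidate models are all simplifying assumptions for illustration, far simpler than a real market simulator.

```python
def backtest(model, historical_bonds, stake=100.0):
    """Replay past bonds and return the total profit from following `model`."""
    profit = 0.0
    for bond in historical_bonds:
        if not model(bond):
            continue  # the model chose not to buy this bond
        if bond["repaid"]:
            profit += stake * bond["interest_rate"]  # earn the interest
        else:
            profit -= stake  # lose the whole stake on a default
    return profit


# Made-up history of four bonds.
historical_bonds = [
    {"previous_loans": 3, "interest_rate": 0.10, "repaid": True},
    {"previous_loans": 0, "interest_rate": 0.20, "repaid": False},
    {"previous_loans": 2, "interest_rate": 0.08, "repaid": True},
    {"previous_loans": 1, "interest_rate": 0.15, "repaid": False},
]

# Two hypothetical model candidates, each mapping a bond to a buy/skip decision.
candidates = {
    "buy_everything": lambda bond: True,
    "experienced_only": lambda bond: bond["previous_loans"] >= 2,
}

results = {name: backtest(m, historical_bonds) for name, m in candidates.items()}
best_name = max(results, key=results.get)
print(results, best_name)
```

Ranking candidates by backtested profit rather than raw accuracy matches the stated goal of maximizing returns.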
Stage 7 is to optimize the model. This stage is a feedback loop with Stage 6 to improve the model’s performance. There are two dimensions: how accurate the model is and how fast it runs. For instance, even within one architecture, you can adjust how big the model is, and accuracy usually follows a curve as the size changes. Selecting the right balance between accuracy, speed, and cost of training drives the final model used.
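One simple way to express that trade-off is to keep every candidate whose accuracy is within a small tolerance of the best, then pick the fastest of those. The sketch below assumes that rule; the candidate numbers and the `tolerance` parameter are invented for the example.

```python
def pick_model(candidates, tolerance=0.01):
    """Return the fastest candidate whose accuracy is within `tolerance` of the best."""
    best_accuracy = max(c["accuracy"] for c in candidates)
    good_enough = [c for c in candidates if c["accuracy"] >= best_accuracy - tolerance]
    return min(good_enough, key=lambda c: c["latency_ms"])


# Hypothetical results for one architecture trained at four sizes.
candidates = [
    {"max_depth": 2, "accuracy": 0.840, "latency_ms": 1.0},
    {"max_depth": 4, "accuracy": 0.910, "latency_ms": 2.0},
    {"max_depth": 8, "accuracy": 0.915, "latency_ms": 6.0},
    {"max_depth": 16, "accuracy": 0.900, "latency_ms": 15.0},
]

chosen = pick_model(candidates)
print(chosen["max_depth"])
```

In this made-up data the largest model is both slower and less accurate than a mid-sized one, which is the kind of curve the passage describes.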
Stage 8 is deployment and monitoring. For many applications, it is sufficient to run the model offline on a schedule whenever there is new data. An example would be analyzing a company’s clients for churn prediction, where you run the analysis weekly or monthly. However, we needed to score bonds as they debuted on the markets. We had an application endpoint that would score the risk using the model and determine whether the fund should invest and for what amount.
Drenik: Thanks Alex, it was very informative to hear about your work in MLOps at other companies and the machine learning work you are doing at A.W. Chung (www.awchung.com). I am looking forward to talking with you in the future.
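The logic behind such a scoring endpoint might look like the sketch below: score the bond, decide whether to invest, and size the position. The risk threshold, the linear sizing rule, and the stand-in model are assumptions for illustration only, and the web-serving layer is left out.

```python
def decide_investment(default_probability, max_risk=0.10, max_ticket=10_000.0):
    """Turn a risk score into an invest/skip decision and a position size."""
    if not 0.0 <= default_probability <= 1.0:
        raise ValueError("default_probability must be between 0 and 1")
    if default_probability > max_risk:
        return {"invest": False, "amount": 0.0}
    # Assumed sizing rule: the lower the risk, the larger the ticket.
    amount = max_ticket * (1.0 - default_probability / max_risk)
    return {"invest": True, "amount": round(amount, 2)}


def score_bond(model, bond):
    """What the endpoint would do per request: score the bond, then decide."""
    return decide_investment(model(bond))


def toy_model(bond):
    """Stand-in for a trained model; returns a made-up default probability."""
    return 0.02 if bond["previous_loans"] >= 2 else 0.25


accepted = score_bond(toy_model, {"previous_loans": 3})
rejected = score_bond(toy_model, {"previous_loans": 0})
print(accepted, rejected)
```

A batch use case such as weekly churn scoring would call the same kind of function in a loop over stored records instead of once per incoming request.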