May 31, 2019
Here's how organizations can reduce the friction data scientists face in developing and deploying models into production – and realize value more quickly.
A bank turned to one of its data scientists to predict the risk of customers missing their next collections payment, with the goal of reducing bad debt. The data scientist was fortunate enough to have access to a comprehensive data mart containing data across many dimensions – including customers, product holdings, transactions and collections activity.
He started by building a snapshot of every customer at a certain stage in collections and an indicator of whether they made a payment in the next month. He added many features (curated variables used in machine learning algorithms) to the dataset, building code to combine, aggregate and transform the raw data to be ready for model development.
Once the dataset was complete, the data scientist tried multiple machine learning algorithms on the data, using a holdout set to check the predictive power of each resulting model. He decided on the best-performing model and created the necessary deployment code.
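The holdout comparison described above can be sketched in a few lines. This is a minimal illustration only, not the code used at the bank: the synthetic data, the single `days_past_due` feature and the two toy models (a majority-class baseline and a one-feature threshold rule) are all assumptions chosen so that the example runs with nothing but the Python standard library.

```python
import random

def make_synthetic_customers(n, seed=0):
    """Hypothetical data: (days_past_due, paid_next_month) pairs."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        days_past_due = rng.randint(0, 90)
        # Assumption for illustration: longer delinquency lowers payment odds.
        p_pay = 0.9 - 0.008 * days_past_due
        rows.append((days_past_due, rng.random() < p_pay))
    return rows

def fit_majority(train):
    """Baseline: always predict the most common outcome in the training data."""
    majority = sum(paid for _, paid in train) * 2 >= len(train)
    return lambda days: majority

def fit_threshold(train):
    """Predict 'pays' when days_past_due is below the cutoff that
    gives the best accuracy on the training data."""
    best_cutoff, best_acc = 0, -1.0
    for cutoff in range(0, 92):
        acc = sum((d < cutoff) == paid for d, paid in train) / len(train)
        if acc > best_acc:
            best_cutoff, best_acc = cutoff, acc
    return lambda days: days < best_cutoff

def accuracy(model, rows):
    """Share of rows where the model's prediction matches the outcome."""
    return sum(model(d) == paid for d, paid in rows) / len(rows)

def compare_on_holdout(rows, holdout_fraction=0.3, seed=1):
    """Fit each candidate on the training split, score it on the holdout split."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_holdout = int(len(shuffled) * holdout_fraction)
    holdout, train = shuffled[:n_holdout], shuffled[n_holdout:]
    candidates = {"majority": fit_majority, "threshold": fit_threshold}
    scores = {name: accuracy(fit(train), holdout)
              for name, fit in candidates.items()}
    best = max(scores, key=scores.get)
    return best, scores

if __name__ == "__main__":
    best, scores = compare_on_holdout(make_synthetic_customers(2000))
    print(best, scores)
```

The point of the holdout split is that each model is scored on customers it never saw during fitting, so the comparison reflects predictive power rather than how well a model memorized the training data.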
To prepare for deployment, he worked with a data engineer to rewrite his chosen features for the model into a production data pipeline. He had to drop a few features in the model and rebuild it when they realized some data wouldn’t be available in the batch production process.
Finally, with the model ready for production, he worked with his business stakeholders in collections on a suitable action plan to use the model outputs. His colleagues were impressed by how well the model could predict customers who would make or miss their next payment. One asked him, “What should we do with customers who are very likely to make their next payment? Do we need to call them at all? And what about customers with a low probability of making their next payment? Should we call them or are they a lost cause?”
The data scientist wasn’t sure how to answer these questions – so they agreed on some simple rules to apply on top of the model and proceeded to change the business process. Overall, the end-to-end process took a few months, and the data scientist then moved on to his next project.
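Simple rules layered on top of a model's output often look something like the sketch below. The story does not say what the actual rules were, so the function name `collections_action`, the cutoffs of 0.8 and 0.2 and the three action labels are all hypothetical.

```python
def collections_action(p_pay, high=0.8, low=0.2):
    """Map a predicted probability of payment to a hypothetical action.

    The cutoffs and action names are illustrative assumptions, not the
    rules the bank actually adopted.
    """
    if not 0.0 <= p_pay <= 1.0:
        raise ValueError("p_pay must be a probability between 0 and 1")
    if p_pay >= high:
        return "no_call"              # very likely to pay anyway
    if p_pay <= low:
        return "refer_to_specialist"  # unlikely to pay after a standard call
    return "call"                     # middle band: standard collections call

if __name__ == "__main__":
    for p in (0.95, 0.50, 0.05):
        print(p, collections_action(p))
```

Rules like these are easy to agree on and deploy, but the cutoffs are judgment calls: the model predicts who will pay, not whether a call changes the outcome, which is exactly the question the stakeholders were asking.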
Reducing Friction for Data Scientists
I’m sure the above story sounds like a typical model development and deployment project in many organizations today. What may surprise you is that the data scientist was me – almost two decades ago!
Despite the progress made in machine learning and AI since that time, data scientists still face the same challenges in developing and deploying models into production – even as the number of machine learning use cases across organizations grows rapidly and talent remains in short supply. Only by reducing the friction that data scientists face will organizations realize more business value sooner.
The story highlights three key areas of friction in a typical machine learning project:
We are lucky to live in an era when AI can truly change the world around us. But the only way to enable this exciting future is by reducing the friction in developing and deploying AI across the organization. Now is the time to help data scientists scale AI and transform businesses.