October 29, 2020
Two years ago we first announced the launch of Feast, an open source feature store for machine learning. Feast is an operational data system that solves some of the key challenges that ML teams encounter while productionizing machine learning systems.
Recognizing that ML and Feast have advanced since we launched, we take a moment today to discuss the past, present and future of Feast. We consider the most significant lessons we learned while building Feast, where we see the project heading, and why teams should consider adopting Feast as part of their operational ML stacks.
Feast was developed to address the challenges faced while productionizing data for machine learning. In our original Google Cloud article, we highlighted some of these challenges, namely:
Whereas an industry built around solving data-transformation and data-quality problems already existed, our focus in shaping Feast was to overcome the operational ML hurdles that exist between data science and ML engineering. Toward that end, our initial aim was to provide:
Guided by this design, we co-developed and shipped Feast with our friends over at Google. We then open sourced the project in early 2019, and have since been running Feast in production and at scale. In our follow-up blog post, Bridging ML Models and Data, we touched on the impact Feast has had at companies like Gojek.
Teams, large and small, are increasingly searching for ways to simplify the productionization and maintenance of their ML systems at scale. Since open sourcing Feast, we’ve seen both the demand for these tools and the activity around this project soar. Working alongside our open source community, we’ve released key pieces of our stack throughout the last year, and steadily expanded Feast into a robust feature store. Highlights include:
Feast has grown more rapidly than initially anticipated, with multiple large companies, including Agoda, Gojek, Farfetch, Postmates, and Zulily adopting and/or contributing to the project. We’ve also been working closely with other open source teams, and we are excited to share that Feast is now a component in Kubeflow. Over the coming months we will be enhancing this integration, making it easier for users to deploy Feast and Kubeflow together.
Through frequent engagement with our community and by way of running Feast in production ourselves, we’ve learned critical lessons:
Feast requires too much infrastructure: Requiring users to provision a large system is a big ask. A minimal Feast deployment requires Kafka, Zookeeper, Postgres, Redis, and multiple Feast services.
Feast lacks composability: Requiring all infrastructural components to be present in order to have a functional system removes all modularity.
Ingestion is too complex: Incorporating a Kafka-based, stream-first ingestion layer makes keeping data consistent across stores trivial, but the complete ingestion flow from source to sink can still mysteriously fail at multiple points.
Our technology choices hinder generalization: Leveraging technologies like BigQuery, Apache Beam on Dataflow, and Apache Kafka has allowed us to move faster in delivering functionality. However, these technologies now impede our ability to generalize to other clouds or deployment environments.
“Always in motion is the future.”
- Yoda, The Empire Strikes Back
While feature stores have already become essential systems at large technology companies, we believe their widespread adoption will begin in 2021. We also foresee the release of multiple managed feature stores over the next year, as vendors seek to enter the burgeoning operational ML market.
As we’ve discussed, feature stores serve both offline and production ML needs, and therefore are primarily built by engineers for engineers. What we need, however, is a feature store that's purpose-built for data-science workflows. Feast will move away from an infrastructure-centric approach toward a more localized experience that does just this: builds on teams’ existing data-science workflows.
The lessons we’ve learned during the preceding two years have crystallized a vision for what Feast should become: a lightweight, modular feature store. One that’s easy to pick up, adds value to teams large and small, and can be progressively applied to production use cases that span multiple teams, projects, and cloud environments. We aim to achieve this by applying the following design principles:
1. Python-first: First-class support for running a minimal version of Feast entirely from a notebook, with all infrastructural dependencies becoming optional enhancements.
2. Production-ready: A collection of battle-tested components built for production.
3. Composability: Modular components with clear extension, integration, and upgrade points that allow for high composability.
4. Cloud-agnostic: Removal of all hard coupling to cloud-specific services, and inclusion of portable technologies like Apache Spark for data processing and Parquet for offline storage.
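To make the Python-first and composability principles a little more concrete, here is a minimal sketch of what a notebook-only feature store with a swappable backend could look like. Every name in it (`OnlineStore`, `InMemoryOnlineStore`, `MiniFeatureStore`, `lookup`, the `driver_1001` entity) is hypothetical and is not Feast's API; the only point is that the default backend needs no infrastructure at all, while a production backend could be plugged in through the same small interface.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Protocol, Tuple


# Hypothetical extension point (NOT Feast's API): any backend, e.g. a
# Redis- or cloud-backed store, would only need to implement these two methods.
class OnlineStore(Protocol):
    def write(self, entity_key: str, features: Dict[str, Any]) -> None: ...
    def read(self, entity_key: str, feature_names: List[str]) -> Dict[str, Any]: ...


@dataclass
class InMemoryOnlineStore:
    """Default backend: a plain dict, so nothing has to be provisioned."""
    rows: Dict[str, Dict[str, Any]] = field(default_factory=dict)

    def write(self, entity_key: str, features: Dict[str, Any]) -> None:
        # Merge new values into whatever is already stored for this entity.
        self.rows.setdefault(entity_key, {}).update(features)

    def read(self, entity_key: str, feature_names: List[str]) -> Dict[str, Any]:
        row = self.rows.get(entity_key, {})
        # Unknown entities or features come back as None instead of raising.
        return {name: row.get(name) for name in feature_names}


class MiniFeatureStore:
    """Hypothetical facade; infrastructure is an optional constructor argument."""

    def __init__(self, online_store: Optional[OnlineStore] = None) -> None:
        # Fall back to the in-memory backend when no store is supplied.
        self.online_store = online_store if online_store is not None else InMemoryOnlineStore()

    def ingest(self, rows: List[Tuple[str, Dict[str, Any]]]) -> None:
        for entity_key, features in rows:
            self.online_store.write(entity_key, features)

    def lookup(self, entity_keys: List[str], feature_names: List[str]) -> Dict[str, Dict[str, Any]]:
        return {key: self.online_store.read(key, feature_names) for key in entity_keys}


if __name__ == "__main__":
    store = MiniFeatureStore()  # runs entirely in a notebook or script
    store.ingest([("driver_1001", {"trips_today": 12, "rating": 4.8})])
    print(store.lookup(["driver_1001"], ["trips_today", "rating"]))
    # {'driver_1001': {'trips_today': 12, 'rating': 4.8}}
```

The design choice illustrated here is that the facade depends only on the two-method interface, so swapping the in-memory dict for a networked store would be an upgrade rather than a prerequisite.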
Our vision for Feast is not only ambitious, but actionable. Our next release, Feast 0.8, is the product of collaborating with both our open source community and our friends over at Tecton.
We’ve been inspired by the soaring community interest in and contributions to Feast. If you’re curious to learn more about our mission to build a best-in-class feature store, or are looking to build your own: Check out our resources, say hello, and get involved!