11-30, 10:55–11:25 (Europe/Amsterdam), Bohr
This talk describes the development and deployment of a reinforcement learning model to reduce food waste in a supermarket chain. Markdowns on short shelf-life products are dynamically increased throughout the day according to a learned policy. The goal is to minimize food waste without significantly increasing the cost of the markdowns. The sequence of markdown-level choices is modelled as a Markov Decision Process, and offline Q-learning on historical data is used to learn a policy. The talk introduces the context of the problem, how the reinforcement learning model was applied, and the challenges faced with offline and online evaluation.
Food waste is a major global problem with a significant environmental impact: currently, about one-third of the world's food goes to waste. In this talk we focus on an initiative to reduce food waste in a chain of supermarkets. The initiative applies increasing markdowns to products nearing their expiry date. The objective is to minimize food waste without significantly increasing the cost of the markdowns. This means making a sequence of decisions over time about which markdown level to set, while maximizing a cumulative expected future reward based on markdown costs and products destroyed.
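As a rough sketch (not necessarily the speaker's exact formulation), such an objective can be written as maximizing the expected discounted return, with a reward that penalizes both markdown cost and waste; the discount factor \(\gamma\), the waste weight \(\lambda\), and the cost terms are illustrative assumptions:

```latex
% Illustrative objective; \gamma, \lambda and the cost terms are assumptions.
\[
  \pi^{*} = \arg\max_{\pi}\;
  \mathbb{E}_{\pi}\!\left[\sum_{t=0}^{T} \gamma^{t}\, r_t\right],
  \qquad
  r_t = -\bigl(c_{\text{markdown}}(s_t, a_t) + \lambda\, c_{\text{waste}}(s_t)\bigr)
\]
```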
In this talk we show how we tackle this problem with reinforcement learning. We model the problem using a discrete state space and action set, and we learn the policy from historical data using offline Q-learning.
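A minimal sketch of what tabular offline Q-learning on logged data could look like; the action set (markdown levels), hyperparameters, and transition format are illustrative assumptions rather than the production setup described in the talk:

```python
# Offline (batch) tabular Q-learning over logged transitions of the form
# (state, action, reward, next_state, done). All constants are assumptions.
from collections import defaultdict

GAMMA = 0.99            # discount factor (assumed)
ALPHA = 0.1             # learning rate (assumed)
ACTIONS = [0, 1, 2, 3]  # e.g. markdown levels 0%, 25%, 35%, 50% (assumed)

def offline_q_learning(transitions, n_epochs=10):
    """Fit a tabular Q-function from a fixed batch of logged transitions."""
    q = defaultdict(float)  # maps (state, action) -> estimated value
    for _ in range(n_epochs):
        for state, action, reward, next_state, done in transitions:
            target = reward
            if not done:
                target += GAMMA * max(q[(next_state, a)] for a in ACTIONS)
            # Move the estimate toward the one-step bootstrapped target.
            q[(state, action)] += ALPHA * (target - q[(state, action)])
    return q

def greedy_policy(q, state):
    """Pick the markdown level with the highest learned Q-value."""
    return max(ACTIONS, key=lambda a: q[(state, a)])
```

A state here could, for instance, bucket days until expiry, remaining stock, and time of day into discrete bins; these features are hypothetical stand-ins for the feature and state-space design covered in the talk.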
Additionally, we show how we evaluate new policies using off-policy evaluation techniques and describe the limitations of these evaluation methods.
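One common off-policy evaluation technique is importance sampling over logged episodes; the sketch below assumes the logging (behaviour) policy's action propensities were recorded, and is not necessarily the estimator used in the talk:

```python
# Trajectory-wise importance sampling OPE. Each episode is a list of
# (state, action, reward, behaviour_prob) tuples, where behaviour_prob is
# the probability the logging policy assigned to the taken action.
def importance_sampling_value(episodes, target_policy_prob, gamma=0.99):
    """Estimate the target policy's value from behaviour-policy episodes."""
    estimates = []
    for episode in episodes:
        weight, ret = 1.0, 0.0
        for t, (state, action, reward, behaviour_prob) in enumerate(episode):
            # Reweight by how much more (or less) likely the target policy
            # is to take the logged action than the behaviour policy was.
            weight *= target_policy_prob(state, action) / behaviour_prob
            ret += (gamma ** t) * reward
        estimates.append(weight * ret)
    return sum(estimates) / len(estimates)
```

This estimator is unbiased when the logging policy has full support over the target policy's actions, but its variance grows quickly with the horizon length, which illustrates the kind of evaluation limitations the talk discusses.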
Lastly, we address the experimental setup used to measure the policies’ true performance in production.
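As a hedged illustration of such an experiment (the actual design is not detailed in this abstract), one might randomize stores between the incumbent and the learned policy and compare mean waste cost per store:

```python
# Welch's t-statistic for the treatment/control difference in mean waste
# cost per store; the store-level randomization design is an assumption.
from math import sqrt
from statistics import mean, stdev

def welch_t(control, treatment):
    """Compare mean waste cost between control and treatment store groups."""
    se = sqrt(stdev(control) ** 2 / len(control)
              + stdev(treatment) ** 2 / len(treatment))
    return (mean(treatment) - mean(control)) / se
```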
Agenda:
- 0-5 min: Introduction to food waste and dynamic markdowns
- 5-25 min: Reinforcement learning
  - Implementation
  - Features & state space
  - Q-learning
  - Off-policy evaluation
  - Experiments
- 25-30 min: Conclusions and future work
- 30-35 min: Q&A
Target audience: anyone interested in the application of reinforcement learning, or technologies for food waste reduction. Familiarity with reinforcement learning will be useful.
Main take-away: practical information on training, evaluating, and deploying a simple reinforcement learning model
Patrick de Oude is a Senior Data Scientist at Albert Heijn, a leading Dutch supermarket chain, where he focuses on operational data science projects with a specific emphasis on reducing food waste. He has a strong background in artificial intelligence and uses these skills to drive data-driven initiatives at Albert Heijn.
Patrick de Oude received his Master's degree from the University of Amsterdam in 2006 and his PhD in 2010. His academic background and industry experience have equipped him with a strong foundation in machine learning, with specific interests in causal inference, experimentation, reinforcement learning, and probabilistic modelling and inference, which he applies to real-world problems at Albert Heijn.
Patrick's work at Albert Heijn aligns with the company's broader sustainability strategy: Albert Heijn aims to eliminate at least half of the waste across the food chain by 2030. His work is a critical part of this effort, using data science and artificial intelligence to drive food waste reduction initiatives.