Vincent D. Warmerdam
Vincent D. Warmerdam is a software developer and senior data person. He currently works at Explosion on data quality tools for developers. He’s also known for creating calmcode.io as well as a bunch of open source projects. You can check out his blog over at koaning.io to learn more about those.
Sessions
Want a dataset for ML? Internet says you should use ... active learning!
It's not a bad idea. When you're creating your own training data, you typically want to focus on the examples that can teach a machine learning algorithm the most. That's why active learning techniques typically fetch the examples with the lowest confidence scores to annotate first. The thinking is that the algorithm might learn more from low-confidence regions than from regions where it already seems sure of itself.
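To make the "lowest confidence first" idea concrete, here is a minimal sketch of least-confidence sampling, the standard active learning strategy described above (not the alternative the talk proposes). It assumes a classifier that returns per-class probabilities for each unlabeled example, such as scikit-learn's `predict_proba`, and treats the highest class probability as the model's confidence. The helper names `least_confident_first` and `pick_batch` are hypothetical.

```python
def least_confident_first(probas):
    """Return example indices ordered from least to most confident.

    `probas` holds one list of class probabilities per unlabeled example.
    An example's confidence is assumed to be its highest class probability.
    """
    confidence = [max(p) for p in probas]
    # Ascending sort: the examples the model is least sure about come first.
    return sorted(range(len(probas)), key=lambda i: confidence[i])


def pick_batch(probas, batch_size):
    """Select the `batch_size` least confident examples to annotate next."""
    return least_confident_first(probas)[:batch_size]


# Hypothetical model output for four unlabeled examples (two classes).
probas = [[0.9, 0.1], [0.55, 0.45], [0.7, 0.3], [0.5, 0.5]]
print(pick_batch(probas, 2))  # → [3, 1]
```

In a full loop you would annotate the selected examples, retrain, and score the remaining pool again. Because the selection looks only at the model's scores, the human never sees *why* the model is unsure.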
Again, it's not a bad idea. But it's an approach that can be improved by rethinking some parts. Maybe it would be better for the human to understand the mistakes that the model makes and use this information to actively teach the model how to improve.
This talk is all about exploring this idea.