Supervised classification is built on the backs of labeled datasets, but a good set of labels can be hard to find. Great data is everywhere, but the corresponding labels can sometimes be really tricky. Take a few examples we've already covered, like lie detection with an MRI machine (have to take pictures of someone's brain while they try to lie, not a trivial task) or automated image captioning (so many images! so many valid labels!)
In this epsiode, we'll dig into this topic in depth, talking about some of the standard ways to get a labeled dataset if your project requires labels and you don't already have them.
Link: Higgs Hunters