In the previous chapters, you learned about supervised learning. In those scenarios, we had datasets where each example came with a known answer or "label". For instance, we had house features and their prices (for regression) or emails and whether they were spam or not (for classification). The machine learning model's job was to learn a mapping from the input features to the correct output label.
Now, we step into a different part of machine learning: unsupervised learning. What happens when you have data, but no predefined labels or correct answers to go with it? Imagine being given a large collection of customer information but without any existing categories like "high-value" or "likely-to-churn". Or perhaps you have thousands of news articles but no predefined topics assigned to them. This is where unsupervised learning comes in.
Unsupervised learning algorithms work with unlabeled data. Their goal is not to predict a specific output based on past examples, but rather to find structures, patterns, relationships, or groupings within the input data itself. Think of it as letting the algorithm explore the data and report the structure it finds.
Instead of learning a mapping from features X to labels Y as in supervised learning, unsupervised learning algorithms try to learn something about the inherent structure of the data X directly. Common goals include clustering, which groups similar data points together and is the focus of this chapter.
Imagine you're given a large box containing many different types of buttons, all mixed together. You don't have labels telling you what type each button is. In an unsupervised approach, you might start sorting them based on observable characteristics such as color, size, or number of holes.
You are discovering the underlying structure (groups based on color, size, or holes) without any prior labels telling you how they should be grouped. This is the essence of unsupervised learning, particularly clustering.
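To make the button analogy concrete, here is a minimal Python sketch. The button records and the `group_by` helper are hypothetical, made up purely for illustration. Note one simplification: here we choose which feature to sort by, whereas a clustering algorithm decides the groupings itself based on how similar the data points are.

```python
from collections import defaultdict

# Hypothetical button data: each button is described only by observable
# features. There is no "type" label anywhere in the dataset.
buttons = [
    {"color": "red", "size_mm": 10, "holes": 2},
    {"color": "blue", "size_mm": 15, "holes": 4},
    {"color": "red", "size_mm": 15, "holes": 4},
    {"color": "blue", "size_mm": 10, "holes": 2},
    {"color": "red", "size_mm": 10, "holes": 4},
]


def group_by(items, feature):
    """Group items by the value of one observable feature."""
    groups = defaultdict(list)
    for item in items:
        groups[item[feature]].append(item)
    return dict(groups)


# The same unlabeled buttons can be grouped in more than one way.
by_color = group_by(buttons, "color")
by_holes = group_by(buttons, "holes")

for color, members in by_color.items():
    print(color, len(members))
```

Grouping by color and grouping by hole count give different partitions of the same buttons. Neither is "the correct answer", which is typical of unsupervised problems: the usefulness of a grouping depends on what you want to do with it.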
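Unsupervised learning is a valuable tool in several situations, most obviously when labels are unavailable, as with the unlabeled customer records and news articles described earlier.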
In this chapter, we'll concentrate on clustering, a fundamental unsupervised task. You'll learn about K-Means, a popular algorithm used to automatically group your data points into a specified number of clusters.
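As a rough preview of the idea, the sketch below alternates two steps: assign each point to its nearest cluster center, then move each center to the mean of its assigned points. It is a simplified illustration under stated assumptions, not a reference implementation: the `kmeans` function, the sample points, the fixed iteration count, and the choice of the first `k` points as starting centers are all made up for this example.

```python
import math


def kmeans(points, k, iterations=10):
    """Minimal K-Means sketch: returns (centroids, labels)."""
    # Assumption: use the first k points as initial centroids so the example
    # is deterministic. Practical implementations usually initialize randomly.
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its assigned points.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(
                    sum(axis) / len(members) for axis in zip(*members)
                )
    return centroids, labels


# Hypothetical 2-D data with two visually obvious groups and no labels.
points = [(1.0, 1.0), (8.0, 8.0), (1.5, 2.0), (9.0, 8.5), (1.0, 1.5), (8.5, 9.0)]
centroids, labels = kmeans(points, k=2)
print(labels)
print(centroids)
```

The number of clusters `k` is supplied by you, not discovered by the algorithm, which matches the "specified number of clusters" mentioned above.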
© 2025 ApX Machine Learning