A task-oriented approach to Machine Learning allows us to divide it into three categories: supervised learning (data are labelled according to some criteria), unsupervised learning (no labels exist for the data), and reinforcement learning (no data is directly available, only an interactive environment). In this article we will focus on unsupervised learning.
In unsupervised machine learning, the term ‘unsupervised’ comes from the fact that the machine must detect patterns within the data on its own (e.g.: figure out which data should be grouped together), without receiving any feedback. This is needed when the data is not labelled. Unlabelled data has been steadily growing in volume and importance, mainly due to the widespread use of social media, smart devices and IoT networks, which in turn leads to an increasing need to perform unsupervised learning in order to extract value from this data. Since the output of unsupervised learning models is often not directly useful on its own (no predictions are produced and therefore metrics such as accuracy cannot be computed), they are typically used as an intermediate step for generating data labels, which are then used in a semi-supervised learning setting.
There are three main tasks associated with unsupervised learning – clustering, discovering association rules and performing dimensionality reduction:
Clustering: Clustering can be interpreted as performing classification when the categories to predict are unknown (both which ones and how many of them). A clustering algorithm attempts to group together (in a cluster) objects that are more similar to each other than to objects in other groups. The meaning of ‘similar’ will depend on the data features used to represent the objects and the relative importance assigned to each of them (i.e. the distance function). For instance, splitting photos of animals by species or splitting photos of dogs by breed may require different feature setups. Besides detecting which objects are similar to each other, clustering may also be used to identify objects that are different from all others, thus detecting outliers.
Examples of real-life use-cases where clustering is beneficial include customer segmentation, data labelling and anomaly detection.
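To make the grouping idea concrete, below is a minimal sketch of k-means, one of the simplest clustering algorithms, written with NumPy only. The toy data, the choice of two clusters and the farthest-point initialisation are assumptions made for the example, not details from the article:

```python
import numpy as np

def kmeans(X, k, n_iters=100, seed=0):
    """Minimal k-means sketch: alternate point assignment and centroid update."""
    rng = np.random.default_rng(seed)
    # Farthest-point initialisation: start from a random point, then repeatedly
    # add the point farthest from all centroids chosen so far.
    centroids = [X[rng.integers(len(X))]]
    for _ in range(k - 1):
        dists = np.min(
            np.linalg.norm(X[:, None] - np.array(centroids)[None], axis=2), axis=1
        )
        centroids.append(X[dists.argmax()])
    centroids = np.array(centroids, dtype=float)
    for _ in range(n_iters):
        # Assign each point to its nearest centroid (Euclidean distance).
        labels = np.linalg.norm(X[:, None] - centroids[None], axis=2).argmin(axis=1)
        # Move each centroid to the mean of the points assigned to it.
        new_centroids = np.array(
            [X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
             for j in range(k)]
        )
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

# Two well-separated groups of 2-D points; no labels are given to the algorithm.
X = np.array([[0.0, 0.2], [0.1, 0.0], [0.2, 0.1],
              [10.0, 10.1], [10.2, 10.0], [9.9, 9.8]])
labels, centroids = kmeans(X, k=2)
```

The algorithm discovers on its own that the points form two groups; it never needs (or produces) the ‘true’ category names, which is exactly the clustering setting described above.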
Association Rules: Association rule learning attempts to discover relevant patterns or relationships in groups of objects where the order of these objects is not important. This is a common situation in retail, where there is a very large number of products and services available and subsets of them are often bought together. Some of the purchase relationships discovered may be intuitive (e.g.: milk + bread), but others not so much (e.g.: beer + diapers). Association rules are known for their good scalability, producing outputs that are useful for obtaining business insights and as support for decision-making processes.
Examples of real-life use-cases where discovering association rules is beneficial include product placement and shopping basket analysis.
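The two standard quantities behind association rules are support (how often an itemset appears) and confidence (how often the rule holds when its antecedent appears). A minimal sketch in plain Python, using a made-up set of transactions echoing the milk/bread and beer/diapers examples above:

```python
# Toy transaction data (an assumption for the example, not real retail data).
transactions = [
    {"milk", "bread", "butter"},
    {"beer", "diapers"},
    {"milk", "bread"},
    {"beer", "diapers", "bread"},
    {"milk", "bread", "diapers"},
]

def support(itemset):
    """Fraction of transactions that contain every item in the itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent):
    """Of the transactions containing the antecedent, the fraction that
    also contain the consequent, i.e. how reliable the rule is."""
    return support(antecedent | consequent) / support(antecedent)

milk_bread_support = support({"milk", "bread"})        # 3 of 5 transactions
milk_to_bread_conf = confidence({"milk"}, {"bread"})   # bread always follows milk here
beer_to_diapers_conf = confidence({"beer"}, {"diapers"})
```

Algorithms such as Apriori scale this idea up by only expanding itemsets whose support already exceeds a threshold, which is what makes the approach viable for very large product catalogues.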
Dimensionality Reduction: Dimensionality reduction can be interpreted as a form of compression because it transforms the input features into a smaller number of output features that, nevertheless, are intended to represent the same information. There are two approaches for performing this: 1) feature selection, which retrieves the subset of the most relevant input features (there is no feature transformation), and 2) feature projection, which maps the input features into a lower-dimensional space whose axes are better aligned with the intrinsic dimensions of the data (the original features no longer exist, being replaced by new ones). The choice between feature selection and feature projection usually comes down to a trade-off: feature selection yields output features that retain their original, business-wise interpretable meaning, whereas feature projection yields output features that often make it easier for Machine Learning algorithms to detect relevant patterns.
Examples of real-life use-cases where performing dimensionality reduction is beneficial include recommendation systems, topic modelling and data visualisation.
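Feature projection can be sketched with principal component analysis (PCA), implemented here via NumPy's SVD. The synthetic data is an assumption for the example: 3-D points that in fact lie close to a single line, so one projected feature captures almost all the information:

```python
import numpy as np

def pca(X, n_components):
    """Project data onto its top principal components (feature projection)."""
    Xc = X - X.mean(axis=0)                      # centre each feature
    # SVD of the centred data; the rows of Vt are the principal directions.
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T

# 50 noisy 3-D points spread along the direction (1, 2, 3): intrinsically 1-D.
rng = np.random.default_rng(0)
t = rng.normal(size=(50, 1))
X = t @ np.array([[1.0, 2.0, 3.0]]) + 0.01 * rng.normal(size=(50, 3))

Z = pca(X, n_components=1)   # 3 input features compressed into 1 output feature
```

The single projected axis is a new feature (a mix of the original three), which illustrates the interpretability trade-off mentioned above: the compression is excellent, but the output feature no longer corresponds to any one original measurement.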
José Portêlo
Lead Machine Learning Engineer