Clustering Models. Cluster analysis, or clustering, is the task of grouping a set of objects so that objects in the same group (cluster) are more similar to each other than to those in other groups. The literature on cluster analysis is vast and covers hierarchical clustering, centroid-based models, distribution-based models and density-based models, among many others. Centroid-based models are of particular interest to us because they are especially well suited to numerical, multi-dimensional objects such as those arising from chemical-physical data. The notion of a centroid is central to the best-known centroid-based clustering algorithm, k-means [80]: given a group of objects and a notion of distance, the centroid of the group is the point C (which may or may not coincide with an actual object of the group) that minimizes the sum of the squared distances between C and the elements of the group; under the Euclidean distance, C is simply the arithmetic mean of the objects. In k-means the groups are not known beforehand (this type of cluster analysis is called exploratory), although their number k must be fixed in advance. The algorithm starts from an initial random guess of the k centroids and then alternates two steps, assigning each object to its nearest centroid and recomputing each centroid from the objects assigned to it, until it converges to a local optimum.

KNN (k-nearest neighbors) [81] is a distance-based classification algorithm whose main idea is that objects close to each other are likely to belong to the same class: a new object is assigned the class that is most common among its k nearest objects of known class.
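For reference, the standard Euclidean formulation of the centroid and of the k-means objective can be written as follows. The symbols x_i, c, S_j, c_j, n and d are notation introduced here for illustration, not taken from the source. Because the within-group sum of squares is convex in c, setting its gradient to zero identifies the minimizer, which turns out to be the arithmetic mean:

```latex
% Centroid of a group {x_1, ..., x_n} in R^d under squared Euclidean distance
\mathbf{c}^{\ast} = \arg\min_{\mathbf{c} \in \mathbb{R}^{d}} \sum_{i=1}^{n} \lVert \mathbf{x}_i - \mathbf{c} \rVert^{2},
\qquad
\nabla_{\mathbf{c}} \sum_{i=1}^{n} \lVert \mathbf{x}_i - \mathbf{c} \rVert^{2}
  = -2 \sum_{i=1}^{n} (\mathbf{x}_i - \mathbf{c}) = \mathbf{0}
\;\Longrightarrow\;
\mathbf{c}^{\ast} = \frac{1}{n} \sum_{i=1}^{n} \mathbf{x}_i

% k-means: choose a partition S_1, ..., S_k minimizing the within-cluster sum of squares
\min_{S_1, \dots, S_k} \sum_{j=1}^{k} \sum_{\mathbf{x} \in S_j} \lVert \mathbf{x} - \mathbf{c}_j \rVert^{2},
\qquad
\mathbf{c}_j = \frac{1}{\lvert S_j \rvert} \sum_{\mathbf{x} \in S_j} \mathbf{x}
```

Each assignment step and each update step of k-means can only decrease this objective, which is why the iteration converges, although only to a local optimum.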
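The k-means iteration can be sketched in Python as Lloyd's algorithm, the usual way k-means is computed. This is a minimal, dependency-free sketch under stated assumptions, not the implementation used in this work: objects are tuples of floats, the distance is Euclidean (`math.dist`), the initial centroids are k input objects drawn at random (other seeding schemes, such as k-means++, are also common), and the names `kmeans` and `centroid` are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def centroid(points: Sequence[Point]) -> Point:
    """Arithmetic mean of the points, coordinate by coordinate."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def kmeans(points: Sequence[Point], k: int, max_iter: int = 100,
           seed: int = 0) -> Tuple[List[Point], List[int]]:
    """Hypothetical Lloyd-style k-means; returns (centroids, cluster label per point)."""
    rng = random.Random(seed)
    # Initial random guess: k distinct input points serve as the starting centroids.
    centroids = [tuple(p) for p in rng.sample(list(points), k)]
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters: List[List[Point]] = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: math.dist(p, centroids[j]))
            clusters[nearest].append(p)
        # Update step: each centroid becomes the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        new_centroids = [centroid(c) if c else centroids[j]
                         for j, c in enumerate(clusters)]
        if new_centroids == centroids:
            break  # no centroid moved: converged to a (local) optimum
        centroids = new_centroids
    labels = [min(range(k), key=lambda j: math.dist(p, centroids[j]))
              for p in points]
    return centroids, labels
```

For example, `kmeans([(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (8.5, 9.0), (9.0, 8.0)], k=2)` places the first three and the last three points in separate clusters. Because the result can depend on the random start, k-means is usually run from several seeds in practice, keeping the solution with the lowest within-cluster sum of squares.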
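The KNN classification rule can likewise be sketched in a few lines. Unlike k-means, KNN is supervised: it needs a set of objects whose class is already known. This minimal sketch assumes Euclidean distance, training data given as (point, label) pairs, a plain majority vote and a default of k = 3; the function name `knn_classify` is hypothetical.

```python
import math
from collections import Counter
from typing import Hashable, Sequence, Tuple

Point = Tuple[float, ...]


def knn_classify(train: Sequence[Tuple[Point, Hashable]], query: Point,
                 k: int = 3) -> Hashable:
    """Return the majority label among the k training points closest to query."""
    # Sort the labelled examples by Euclidean distance to the query; keep the k nearest.
    neighbors = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    # Majority vote. Counter.most_common orders equal counts by first occurrence,
    # so a tie goes to the label whose member appears nearest to the query.
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]
```

With training points `(1.0, 1.0)`, `(1.5, 2.0)`, `(1.0, 1.5)` labelled "A" and `(8.0, 8.0)`, `(8.5, 9.0)`, `(9.0, 8.0)` labelled "B", `knn_classify(train, (2.0, 2.0), k=3)` returns "A". Sorting the whole training set costs O(n log n) per query; for large datasets, spatial indexes such as k-d trees are typically used to find the neighbors faster.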
