Datatab

Hierarchical cluster analysis


A hierarchical cluster analysis is a clustering method that creates a hierarchical tree or dendrogram of the objects to be clustered.

Hierarchical cluster analysis dendrogram

The tree represents the relationships between objects and shows how objects are clustered at different levels.

Example Hierarchical Cluster Analysis

Example: We asked people about how many hours a week they spend on social media platforms and at the gym.

Hierarchical cluster analysis example data

We now want to know whether there are clusters in this dataset, so we perform a hierarchical cluster analysis.

How is a Hierarchical Cluster Analysis calculated?

First, we plot the points in a scatter plot.

Scatter plot Hierarchical Cluster Analysis

With this we can start to create the clusters. In the first step, each point is assigned its own cluster, so we start with as many clusters as there are persons.

each point a cluster

The goal now is to merge clusters step by step until all points are in one cluster.

Calculate clusters Hierarchical cluster analysis

In each step, the two clusters that are closest together are merged. What does "closest together" mean?

For this we need to determine two things:

  • How the distance between two points is measured.
  • How the points of two clusters are linked, i.e. how the distance between two clusters is measured.

Distance between two points

Let's start with the question of how to calculate the distance between two points. Here are the most common distance measures:

  • the Euclidean distance,
  • the Manhattan distance
  • and the Maximum Distance.

Let's take the distance between Max and Caro. The difference on the y-axis is 1 and the difference on the x-axis is 4.

Euclidean Distance

The Euclidean distance is the square root of the sum of the squared differences. Here that is the square root of 4² + 1² = 17, which is about 4.12.

Euclidean distance

Manhattan Distance

The Manhattan distance uses the sum of the absolute differences. So we simply calculate 4 plus 1 and get a distance of 5.

Manhattan Distance

Maximum Distance

The maximum distance is simply the maximum value of the absolute differences. In this case it is 4.

Maximum Distance
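The three distance measures can be sketched in a few lines of Python. This is a minimal illustration, not DATAtab's implementation. The coordinates for Max and Caro are hypothetical and were chosen only so that the differences are 4 and 1, as in the example above.

```python
import math

def euclidean(p, q):
    """Square root of the sum of the squared coordinate differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def manhattan(p, q):
    """Sum of the absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(p, q))

def maximum(p, q):
    """Largest absolute coordinate difference."""
    return max(abs(a - b) for a, b in zip(p, q))

# Hypothetical coordinates: only the differences (4 and 1) come from the text.
max_point = (9, 1)
caro_point = (5, 2)

print(round(euclidean(max_point, caro_point), 2))  # 4.12
print(manhattan(max_point, caro_point))            # 5
print(maximum(max_point, caro_point))              # 4
```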

Linking method

Now that we know how the distance between two points can be calculated, we need to determine how the points of two clusters are linked, that is, how the distance between two clusters is measured.

Linking method Hierarchical cluster analysis

Let's say we have a cluster with the points Joe and Lisa and a cluster with Max and Caro. Now how do we determine the distance between these two clusters? Here are the most popular methods:

  • Single-linkage,
  • Complete-linkage
  • and Average-linkage.

Single-linkage

Single-linkage uses the distance between the closest elements of the two clusters. This is the distance between Caro and Joe.

Single-linkage

Complete-linkage

Complete-linkage uses the distance between the farthest elements of the two clusters, so the distance between Max and Joe.

Complete-linkage

Average-linkage

Average-linkage uses the average of all pairwise distances. The distance is calculated for every combination of one point from each cluster, and these distances are then averaged.

Average-linkage
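The three linking methods differ only in how they summarize the pairwise distances between two clusters. The following sketch assumes the Euclidean distance and uses invented coordinates for Joe, Lisa, Max and Caro, so the pairs that turn out closest or farthest here need not match the figures.

```python
import math
from itertools import product

def pairwise_distances(cluster_a, cluster_b):
    """Euclidean distance for every combination of one point from each cluster."""
    return [math.dist(p, q) for p, q in product(cluster_a, cluster_b)]

def single_linkage(cluster_a, cluster_b):
    """Distance between the closest elements of the two clusters."""
    return min(pairwise_distances(cluster_a, cluster_b))

def complete_linkage(cluster_a, cluster_b):
    """Distance between the farthest elements of the two clusters."""
    return max(pairwise_distances(cluster_a, cluster_b))

def average_linkage(cluster_a, cluster_b):
    """Average of all pairwise distances."""
    distances = pairwise_distances(cluster_a, cluster_b)
    return sum(distances) / len(distances)

# Hypothetical coordinates, not taken from the article's dataset.
joe, lisa = (3, 4), (2, 5)
max_pt, caro = (9, 1), (5, 2)

cluster_1 = [joe, lisa]
cluster_2 = [max_pt, caro]

print(round(single_linkage(cluster_1, cluster_2), 2))    # 2.83
print(round(complete_linkage(cluster_1, cluster_2), 2))  # 8.06
print(round(average_linkage(cluster_1, cluster_2), 2))   # 5.46
```

Single-linkage always gives the smallest of the three values and complete-linkage the largest, with average-linkage in between.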

Example Hierarchical Cluster Analysis

For our example we use the Euclidean distance and the single-linkage method. So now we need the distance from each cluster to the other clusters.

Distances between clusters

For this we first need to calculate the distance matrix. In the distance matrix we list the clusters along both the rows and the columns and then calculate the distance from each cluster to every other cluster.

Distance matrix

The distance between Alan and Lisa is given by:

Calculate distance matrix

We can now do this for all other combinations until we have calculated the complete distance matrix. Now we can merge the first clusters. To do so, we look for the two clusters with the smallest distance between them. This is the case for Joe and Lisa.
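Building the distance matrix and finding the closest pair can be sketched as follows. The coordinates are hypothetical (the article's actual values are only shown in the figures) and were chosen so that Joe and Lisa end up closest, as in the example.

```python
import math
from itertools import combinations

# Hypothetical coordinates, not the article's real data.
points = {
    "Alan": (8, 3),
    "Lisa": (2, 5),
    "Joe": (3, 4),
    "Max": (9, 1),
    "Caro": (5, 2),
}
names = list(points)

# Euclidean distance from every point to every other point.
distance_matrix = {
    a: {b: math.dist(points[a], points[b]) for b in names} for a in names
}

# Print the matrix as a table.
print(" " * 6 + "".join(f"{name:>6}" for name in names))
for a in names:
    print(f"{a:<6}" + "".join(f"{distance_matrix[a][b]:6.2f}" for b in names))

# The first merge joins the pair with the smallest distance.
closest_pair = min(
    combinations(names, 2),
    key=lambda pair: distance_matrix[pair[0]][pair[1]],
)
print(closest_pair)  # ('Lisa', 'Joe')
```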

Example Hierarchical Cluster Analysis

With this, we now combine Joe and Lisa into one cluster. In our tree diagram or dendrogram we can draw the first connection.

First connection in the tree diagram

Now we need to update our distance matrix. We decided to use the single-linkage method, so the distance between two clusters is given by the elements that are closest to each other. For each of the clusters Alan, Max and Caro, Joe is the member of the Lisa and Joe cluster that lies closest.

Merge hierarchical clusters

So we calculate the distance from Alan to Joe, the distance from Max to Joe, and the distance from Caro to Joe.
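This update step can be sketched as follows, assuming single-linkage and the Euclidean distance. The coordinates are hypothetical and were chosen so that Joe is the closer member of the merged cluster in every case, as described above.

```python
import math

# Hypothetical coordinates, not the article's real data.
points = {
    "Alan": (8, 3),
    "Lisa": (2, 5),
    "Joe": (3, 4),
    "Max": (9, 1),
    "Caro": (5, 2),
}

merged_cluster = ("Lisa", "Joe")
other_clusters = ["Alan", "Max", "Caro"]

# Single-linkage: the distance to the merged cluster is the distance
# to its closest member.
updated = {}
for other in other_clusters:
    candidates = {
        member: math.dist(points[other], points[member])
        for member in merged_cluster
    }
    closest_member = min(candidates, key=candidates.get)
    updated[other] = (closest_member, candidates[closest_member])

for other, (member, distance) in updated.items():
    print(f"{other} -> {member}: {distance:.2f}")
```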

Now we again merge the clusters that are closest. These are Max and Alan.

Hierarchical Clustering Example

In our tree diagram or dendrogram, we can draw in the second connection.

Dendrogram Connection

Now we update the distance matrix again. With single-linkage, the relevant distances are those between Alan and Joe, between Caro and Joe, and between Caro and Alan. We get the smallest distance between the Caro cluster and the Lisa and Joe cluster.

Hierarchical clustering cluster merge

So we connect these two clusters and draw the third connection in the tree diagram.

Now there are only two clusters left, which we merge in the last step to get our finished dendrogram.

Calculating a large cluster
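The whole procedure can be sketched as a naive loop: start with one cluster per point and repeatedly merge the two closest clusters. This is an illustrative sketch, not DATAtab's implementation. It assumes single-linkage with the Euclidean distance, and the coordinates are hypothetical, chosen so that the merges happen in the same order as in the walkthrough above. The function name `agglomerate` is made up for this example.

```python
import math
from itertools import combinations

# Hypothetical coordinates, not the article's real data.
points = {
    "Alan": (8, 3),
    "Lisa": (2, 5),
    "Joe": (3, 4),
    "Max": (9, 1),
    "Caro": (5, 2),
}

def single_linkage(cluster_a, cluster_b):
    """Distance between the closest members of two clusters."""
    return min(
        math.dist(points[p], points[q]) for p in cluster_a for q in cluster_b
    )

def agglomerate(names):
    """Merge clusters step by step and record each merge with its distance."""
    clusters = [(name,) for name in names]  # every point starts as its own cluster
    merges = []
    while len(clusters) > 1:
        a, b = min(
            combinations(clusters, 2),
            key=lambda pair: single_linkage(pair[0], pair[1]),
        )
        merges.append((a, b, single_linkage(a, b)))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return merges

merges = agglomerate(points)
for a, b, distance in merges:
    print(f"merge {a} + {b} at distance {distance:.2f}")
```

The recorded distances are the heights at which the connections would be drawn in the dendrogram.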

Calculate hierarchical cluster analysis with DATAtab

Sample data

To calculate a hierarchical cluster analysis online, just visit the statistics calculator and copy your own data into the table or use the link to load the dataset. Now we click on cluster and select hierarchical cluster.

If we now click on Social Media and Gym a hierarchical cluster analysis will be calculated for us. Additionally we can specify the label, in our case the names of the persons.

Calculate hierarchical cluster analysis with DATAtab

Now we can specify which linking method should be used and how the distance should be calculated. We again choose single linkage and the Euclidean distance.

Calculate hierarchical cluster analysis online

The results are now displayed: the tree plot, a scatter plot and the elbow plot. The elbow plot helps us decide how many clusters to use. We can see a kink in the plot, so we take 4 as the number of clusters. After selecting this number, the 4 clusters are highlighted in different colors in the tree plot.
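How DATAtab assigns the highlighted clusters is not described here. One common way to obtain a fixed number of clusters is to stop merging as soon as that many clusters remain, which the following sketch illustrates. It assumes single-linkage with the Euclidean distance, the coordinates are hypothetical, and the function name `cluster_into` and its parameter `k` are made up for this example.

```python
import math
from itertools import combinations

# Hypothetical coordinates, not the article's real data.
points = {
    "Alan": (8, 3),
    "Lisa": (2, 5),
    "Joe": (3, 4),
    "Max": (9, 1),
    "Caro": (5, 2),
}

def single_linkage(cluster_a, cluster_b):
    """Distance between the closest members of two clusters."""
    return min(
        math.dist(points[p], points[q]) for p in cluster_a for q in cluster_b
    )

def cluster_into(names, k):
    """Merge the closest clusters until only k clusters remain."""
    clusters = [(name,) for name in names]
    while len(clusters) > k:
        a, b = min(
            combinations(clusters, 2),
            key=lambda pair: single_linkage(pair[0], pair[1]),
        )
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return clusters

two_clusters = cluster_into(points, k=2)
print(two_clusters)  # [('Alan', 'Max'), ('Caro', 'Lisa', 'Joe')]
```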



Cite DATAtab: DATAtab Team (2024). DATAtab: Online Statistics Calculator. DATAtab e.U. Graz, Austria. URL https://datatab.net
