From Array Suite Wiki

Hierarchical Clustering

Overview

The Hierarchical Clustering Table menu option performs hierarchical clustering on any “table” type of data and can calculate both the row tree and the column tree. For more information on hierarchical clustering, see the Wikipedia entry on hierarchical clustering.

Table HierarchicalClustering Menu.png

Input Data Requirements

This function works on all Table objects.

On a typical desktop computer, Array Studio can easily handle Hierarchical Clustering of up to 10,000 variables. More than 10,000 variables may require a computer with more memory, and Array Studio has an upper limit of 30,000 variables.
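
To give a feel for why memory grows quickly with the number of variables, the sketch below estimates the size of a pairwise distance table. It assumes one 8-byte value is stored per unordered pair of variables; that storage scheme is an assumption for illustration and is not documented behavior of Array Studio. The function name condensed_distance_bytes is hypothetical.

```python
def condensed_distance_bytes(n_items: int, bytes_per_value: int = 8) -> int:
    """Bytes needed to hold one distance per unordered pair of items.

    Assumption: distances are stored once per pair as fixed-size values.
    """
    n_pairs = n_items * (n_items - 1) // 2
    return n_pairs * bytes_per_value


if __name__ == "__main__":
    for n in (10_000, 30_000):
        gib = condensed_distance_bytes(n) / 2**30
        print(f"{n:>6} variables: {gib:.2f} GiB of pairwise distances")
```

Under this assumption, 10,000 variables need about 0.4 GB for the distances alone and 30,000 variables need about 3.6 GB, a ninefold increase for three times as many variables.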


Step 1: Select source table

The user is first asked to choose the table on which to run Hierarchical Clustering:

SelectData0.png

Step 2: Hierarchical Clustering

The Hierarchical Clustering window is then presented:

THC1.png

The columns on which to perform the analysis are listed and can be highlighted as desired. The rows on which to perform the analysis can also be chosen (all, visible, selected, or a pre-generated List).

Options

  • Compute Column tree: If selected, the columns (observations) are clustered to build the observation tree.
    • Link: Options include "Ward", "Single", "Complete", "Average", "Mcquitty", "Median", and "Centroid".
    • Distance: Options include "Euclidean", "Maximum", "Manhattan", "Canberra", "Binary", "Pearson", and "Correlation".
  • Compute Row tree: If selected, the rows (variables) are clustered to build the variable tree.
    • Link: Options include "Ward", "Single", "Complete", "Average", "Mcquitty", "Median", and "Centroid".
    • Distance: Options include "Euclidean", "Maximum", "Manhattan", "Canberra", "Binary", "Pearson", and "Correlation".
    • Normalize rows: For the variable tree, an optional checkbox normalizes the variables before clustering.
  • Replace missing values with: Missing values can be replaced with "RowMedian", "RowMean", "RowMin", or "Zero".
  • Output table name: The new table can be optionally named using the "Output table name" box.
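
The options above can be illustrated with a minimal, generic sketch. This is not Array Studio's implementation: the names MISSING, fill_missing, and cluster are hypothetical, only three of the listed Link options and three of the listed Distance options are covered, and the clustering loop is a naive textbook version meant for small tables.

```python
import math
from statistics import mean, median

MISSING = None  # hypothetical marker for a missing cell


def fill_missing(row, method="RowMedian"):
    """Replace missing cells in one row using that row's observed values."""
    observed = [v for v in row if v is not MISSING]
    if method == "Zero" or not observed:
        fill = 0.0
    elif method == "RowMedian":
        fill = median(observed)
    elif method == "RowMean":
        fill = mean(observed)
    elif method == "RowMin":
        fill = min(observed)
    else:
        raise ValueError(f"unknown method: {method}")
    return [fill if v is MISSING else v for v in row]


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def maximum(a, b):
    return max(abs(x - y) for x, y in zip(a, b))


# How the distance between two clusters is derived from member distances.
LINKS = {"Single": min, "Complete": max, "Average": mean}


def cluster(rows, distance=euclidean, link="Average"):
    """Naive agglomerative clustering.

    Returns the merges in order as (members_a, members_b, height),
    where members are tuples of row indices.
    """
    combine = LINKS[link]
    clusters = [(i,) for i in range(len(rows))]
    merges = []
    while len(clusters) > 1:
        best = None
        # Find the closest pair of current clusters.
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = combine([distance(rows[p], rows[q])
                             for p in clusters[i] for q in clusters[j]])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges


if __name__ == "__main__":
    table = [[1.0, 2.0, MISSING], [1.0, 2.0, 4.0], [9.0, 8.0, 9.0]]
    filled = [fill_missing(r, "RowMedian") for r in table]
    for a, b, height in cluster(filled, euclidean, "Average"):
        print(a, b, round(height, 3))
```

Clustering the rows of a table gives the row tree; applying the same function to the transposed table gives the column tree.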

Output Results

After running the analysis, the module generates a dendrogram that the user can further interrogate and customize, as demonstrated below.

Example Usage

Given the following starting table:

THC3.png

This function generates a HeatmapTableView with a dendrogram and adds it to the table object.

THC2.png
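
Clustered heatmaps are commonly drawn with rows and columns reordered to follow the leaves of their trees, so that similar items sit next to each other. The sketch below shows that reordering in general terms. It assumes a tree is given as nested pairs of row or column indices; the names leaf_order and reorder are hypothetical, and the sketch does not describe how HeatmapTableView is implemented.

```python
def leaf_order(tree):
    """Left-to-right leaf indices of a tree given as nested pairs.

    A leaf is an int index; an internal node is a (left, right) tuple.
    """
    if isinstance(tree, int):
        return [tree]
    left, right = tree
    return leaf_order(left) + leaf_order(right)


def reorder(table, row_tree=None, col_tree=None):
    """Return a copy of the table with rows/columns in tree leaf order."""
    rows = leaf_order(row_tree) if row_tree is not None else range(len(table))
    cols = leaf_order(col_tree) if col_tree is not None else range(len(table[0]))
    return [[table[r][c] for c in cols] for r in rows]


if __name__ == "__main__":
    table = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    # Rows 0 and 2 merged first, then row 1; columns 0 and 1 merged first.
    for row in reorder(table, row_tree=((0, 2), 1), col_tree=(2, (0, 1))):
        print(row)
```

Leaving either tree as None keeps the original order along that axis, mirroring the case where only one of the two trees is computed.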
