Hierarchical Clustering
Overview
The Hierarchical Clustering Table menu option performs hierarchical clustering on any “table” type of data and can calculate both the row tree and the column tree. For background on hierarchical clustering, see the Wikipedia entry on hierarchical clustering.
Input Data Requirements
This function works on all Table objects.
Array Studio can handle Hierarchical Clustering of up to 10,000 variables on a typical computer. Clustering more than 10,000 variables may require a computer with more memory, and Array Studio imposes an upper limit of 30,000 variables.
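To see why memory grows quickly with the number of variables, consider a rough estimate of the size of a pairwise distance matrix. This is only a sketch: the source does not describe how Array Studio stores distances, so the function name `condensed_distance_bytes` and the assumption of one 8-byte value per unordered pair of variables are illustrative, not a description of the product's internals.

```python
def condensed_distance_bytes(n_items: int, bytes_per_value: int = 8) -> int:
    """Bytes needed to hold one distance per unordered pair of items.

    Assumption: one value of `bytes_per_value` bytes per pair (i, j), i < j.
    """
    n_pairs = n_items * (n_items - 1) // 2
    return n_pairs * bytes_per_value


for n in (10_000, 30_000):
    gib = condensed_distance_bytes(n) / 1024**3
    print(f"{n:>6} variables: {gib:.2f} GiB")
```

Under these assumptions, tripling the number of variables increases the storage roughly ninefold, because the number of pairs grows quadratically.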
Step 1: Select source table
The user is first asked to choose the table on which to run the Hierarchical Clustering:
Step 2: Hierarchical Clustering
The Hierarchical Clustering window is then presented:
The columns on which to perform the analysis are listed and can be highlighted as desired. The rows on which to perform the analysis can also be selected (all, visible, selected, or a pre-generated List).
Options
- Compute Column tree: If selected, the clustering algorithm will compute the observation (column) tree.
  - Link: Options include "Ward", "Single", "Complete", "Average", "Mcquitty", "Median", and "Centroid".
  - Distance: Options include "Euclidean", "Maximum", "Manhattan", "Canberra", "Binary", "Pearson", and "Correlation".
- Compute Row tree: If selected, the clustering algorithm will compute the variable (row) tree.
  - Link: Options include "Ward", "Single", "Complete", "Average", "Mcquitty", "Median", and "Centroid".
  - Distance: Options include "Euclidean", "Maximum", "Manhattan", "Canberra", "Binary", "Pearson", and "Correlation".
  - Normalize rows: An optional checkbox, available for the variable tree, that normalizes the variables.
- Replace missing values with: Missing values can be replaced with "RowMedian", "RowMean", "RowMin", or "Zero".
- Output table name: The new table can be optionally named using the "Output table name" box.
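The missing-value and normalization options above can be illustrated with a minimal sketch. It rests on several assumptions not confirmed by the source: that "RowMedian", "RowMean", and "RowMin" mean the median, mean, and minimum of the non-missing values in the same row, and that normalizing a row means centering it to mean 0 and scaling it to unit standard deviation. The helper names `fill_missing` and `normalize_row` are hypothetical and are not part of Array Studio.

```python
from statistics import mean, median, pstdev

# Hypothetical fill strategies, keyed by the option names shown in the dialog.
FILLERS = {
    "RowMedian": median,
    "RowMean": mean,
    "RowMin": min,
    "Zero": lambda values: 0.0,
}


def fill_missing(row, method="RowMedian"):
    """Replace None entries with a statistic of the row's observed values."""
    observed = [v for v in row if v is not None]
    # A row with no observed values has no statistic; fall back to 0.0.
    fill = FILLERS[method](observed) if observed else 0.0
    return [fill if v is None else v for v in row]


def normalize_row(row):
    """Center a row to mean 0 and scale to unit (population) standard deviation.

    Assumption: this is one common meaning of "normalize"; the source does not
    define the method Array Studio uses.
    """
    mu = mean(row)
    sd = pstdev(row)
    if sd == 0:
        return [0.0 for _ in row]  # a constant row carries no shape information
    return [(v - mu) / sd for v in row]


row = [2.0, None, 4.0, 6.0]
filled = fill_missing(row, "RowMedian")  # the missing entry becomes 4.0
scaled = normalize_row(filled)
print(filled)
print([round(v, 3) for v in scaled])
```

Filling before normalizing keeps every row the same length, which distance measures such as "Euclidean" require.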
Output Results
After running the analysis, the module generates a dendrogram that the user can further interrogate and customize, as demonstrated below.
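A dendrogram is a drawing of the sequence of merges produced by agglomerative clustering, with each merge placed at the height (distance) at which it occurred. The following is a naive sketch of that process for three of the link options ("Single", "Complete", "Average") using Euclidean distance. It is not Array Studio's implementation, whose algorithm is not described in the source; the function name `agglomerate` and the sample points are illustrative only.

```python
from itertools import combinations
from math import dist  # Euclidean distance between two points (Python 3.8+)

# How the distance between two clusters is derived from member-to-member distances.
LINKS = {
    "Single": min,                            # closest pair of members
    "Complete": max,                          # farthest pair of members
    "Average": lambda ds: sum(ds) / len(ds),  # mean over all member pairs
}


def agglomerate(points, link="Average"):
    """Naive agglomerative clustering.

    Returns the merges in order as (members_a, members_b, height) tuples,
    where members are tuples of indices into `points`.
    """
    combine = LINKS[link]
    clusters = [(i,) for i in range(len(points))]
    merges = []
    while len(clusters) > 1:
        best = None
        # Find the pair of clusters with the smallest linkage distance.
        for a, b in combinations(clusters, 2):
            d = combine([dist(points[i], points[j]) for i in a for j in b])
            if best is None or d < best[0]:
                best = (d, a, b)
        d, a, b = best
        # Replace the two merged clusters with their union.
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
        merges.append((a, b, d))
    return merges


points = [(0.0, 0.0), (0.0, 1.0), (5.0, 0.0), (5.0, 2.0)]
for left, right, height in agglomerate(points, "Single"):
    print(left, right, round(height, 3))
```

In this sample, points 0 and 1 merge first, then points 2 and 3, and the two pairs merge last. The choice of link changes only the height of the final merge here, but on less clearly separated data it can change the merge order and therefore the shape of the tree.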
Example Usage
Given the following starting table:
This function will generate a HeatmapTableView with a dendrogram, which will be added to the table object.