Hierarchical Clustering
Overview
The Hierarchical Clustering module performs hierarchical clustering on the observations and/or variables of an -Omic data object. On a typical computer, Array Studio can easily handle Hierarchical Clustering of up to 20000 variables. More than 10000 variables require a computer with greater memory, with an upper limit in Array Studio of 30000 observations. The module generates a HeatmapTableView with a dendrogram, which is added to the Data object.
To run this module, select MicroArray | Pattern | Hierarchical Clustering from the menu.
Input Data Requirements
This module works on -Omic data types.
General Options
Input/Output
- Project & Data: The window includes a dropdown box to select the Project and Data object to be clustered.
- Variables: Select which variables to include in the clustering. Options include All variables, Selected variables, Visible variables, and Customized variables (select any pre-generated Lists).
- Observations: Select which observations to include in the clustering. Options include All observations, Selected observations, Visible observations, and Customized observations (select any pre-generated Lists).
- Output name: The user can choose to name the output data object.
Options
- Compute observation tree: If selected, the module clusters the observations and builds the observation tree (dendrogram).
- Link: The user can select a method to specify the dissimilarity of sets as a function of the pairwise distances of observations in the sets.
- Distance: The measure of dissimilarity between pairs of observations.
- Compute variable tree: If selected, the module clusters the variables and builds the variable tree (dendrogram).
- Link: The user can select a method to specify the dissimilarity of sets as a function of the pairwise distances of variables in the sets.
- Distance: The measure of dissimilarity between pairs of variables.
- Normalize variables: If selected, each variable is converted to Z-scores before clustering. This does not apply when the Distance method is Correlation.
- Note: The "Normalize variables" option here is used for dendrogram calculation only. The resulting heatmap will automatically use a "Robust Center Scale" normalization for display. This is necessary because most datasets require normalization for heatmap visualization.
- Generate classic dendrogram view: Checking this box generates the dendrogram view in the older Array Studio format. This is for users who are familiar with the older format and prefer it to the newer dendrogram view. Some users prefer the classic dendrogram because it offers different options for exporting pictures; however, it is less interactive.
- Replace missing values with: The method used to replace missing values. Available options are RowMedian (the median of the variable) and Zero.
- Heatmap normalization: The user can specify how the data is normalized for drawing the heatmap. The distance and linkage calculations are based on the normalized data.
- Observation Grouping: The user can specify a Grouping factor for the clustering. Once selected, the observation-level clustering is calculated only within each group of the specified grouping factor.
- Only the Classic Dendrogram is compatible with the Observation Grouping option.
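The options above can be illustrated outside Array Studio with a minimal Python sketch. It is not Array Studio's implementation. It assumes each row is one variable, that missing values are marked with None, that Z-scores use the sample standard deviation, and that the Link choices behave like the standard single, complete, and average rules. All function names are hypothetical, and only Correlation is named as a Distance method in this document; Euclidean is included for illustration.

```python
import math
from statistics import mean, median, stdev


def fill_row_median(row):
    """Replace missing entries (None) with the median of the row's observed values."""
    fill = median([v for v in row if v is not None])
    return [fill if v is None else v for v in row]


def z_score(row):
    """Center a row on its mean and scale by its sample standard deviation (assumed)."""
    mu, sd = mean(row), stdev(row)
    return [(v - mu) / sd for v in row]


def euclidean(a, b):
    """Straight-line distance between two equal-length rows."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def correlation_distance(a, b):
    """1 - Pearson correlation; unchanged by centering or scaling either row."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    norm = math.sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))
    return 1.0 - cov / norm


# Linkage rules: how the pairwise distances between two clusters are summarized.
LINKS = {"single": min, "complete": max, "average": mean}


def cluster(rows, distance, link):
    """Naive agglomerative clustering; returns merges as (id_a, id_b, height)."""
    clusters = {i: [i] for i in range(len(rows))}  # cluster id -> member row indices
    merges, next_id = [], len(rows)
    while len(clusters) > 1:
        ids = sorted(clusters)
        best = None
        for x, a in enumerate(ids):
            for b in ids[x + 1:]:
                pairwise = [distance(rows[i], rows[j])
                            for i in clusters[a] for j in clusters[b]]
                height = LINKS[link](pairwise)
                if best is None or height < best[0]:
                    best = (height, a, b)
        height, a, b = best
        # Merge the closest pair into a new cluster with a fresh id.
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        merges.append((a, b, height))
        next_id += 1
    return merges


# Made-up example data: three variables measured on four observations.
data = [
    [1.0, 2.0, None, 4.0],
    [1.1, 2.1, 3.1, 4.1],
    [9.0, 1.0, 8.0, 2.0],
]
rows = [fill_row_median(r) for r in data]
merges = cluster(rows, euclidean, "average")
for a, b, height in merges:
    print(f"merge {a} + {b} at height {height:.3f}")
```

In this toy data, rows 0 and 1 merge first because they are closest. The sketch also shows why Z-score normalization is skipped for a correlation distance: correlation_distance returns the same value for a pair of rows before and after z_score is applied to them.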
Output Results
The generated view operates similarly to HeatmapTableView, except that the dendrograms it creates can be selected in the left-hand window. Both the observation and variable dendrograms are selectable, and the right-hand window is automatically updated to reflect the user's selection. An example of a generated Dendrogram is shown below.
This view includes options to flip the x- or y-branches.
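As a rough illustration of what flipping a branch means, the Python sketch below assumes that a flip swaps the two children of a selected dendrogram node. This is the usual meaning of the term, but Array Studio's exact behavior is not described here. The nested-tuple tree representation and the function names are hypothetical.

```python
def leaves(tree):
    """Leaf labels in display order; a tree is a label or a (left, right) tuple."""
    if not isinstance(tree, tuple):
        return [tree]
    left, right = tree
    return leaves(left) + leaves(right)


def flip(tree):
    """Swap the two children of the top node of a (sub)tree."""
    left, right = tree
    return (right, left)


tree = (("A", "B"), ("C", ("D", "E")))
flipped = flip(tree)
print(leaves(tree))     # ['A', 'B', 'C', 'D', 'E']
print(leaves(flipped))  # ['C', 'D', 'E', 'A', 'B']
```

Under this assumption, a flip changes only the display order of the leaves. The clusters themselves and the heights at which they merge stay the same.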
Dendrogram Formatting
OmicScript