From Array Suite Wiki

Cluster Variables

Overview

The Cluster Variables module allows the user to cluster variables using one of several clustering methods: PAM, KMeans, CAST, or SOM. Unlike other clustering functions (e.g. hierarchical clustering), which usually generate a HeatMap View from the -Omic data, this command creates a Cluster object in the Cluster section of the Solution Explorer. Using the View Controller, this Cluster can then be used in a variety of views with Data objects.

Note: While there is no theoretical limit on the number of variables that can be clustered, some algorithms require more memory than others. An Out-of-Memory error may be generated when there is not enough available RAM in the computer to handle the number of selected variables.
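As a rough illustration of why memory can become the limiting factor, the sketch below estimates the size of a full pairwise dissimilarity matrix between variables, which classic formulations of methods such as PAM rely on. Whether Array Studio actually stores such a matrix is an assumption made here for illustration only; the function name and the 8 bytes per value (double precision) are likewise hypothetical.

```python
def pairwise_matrix_bytes(n_variables: int, bytes_per_value: int = 8) -> int:
    """Bytes needed to hold a full n x n dissimilarity matrix (assumed layout)."""
    return n_variables * n_variables * bytes_per_value


# Memory grows with the square of the number of selected variables.
for n in (1_000, 20_000, 50_000):
    gib = pairwise_matrix_bytes(n) / 2**30
    print(f"{n:>6} variables -> {gib:8.2f} GiB")
```

Under these assumptions, 20,000 variables already need about 3 GiB for the matrix alone, so reducing the variable selection before clustering is the simplest way to avoid an Out-of-Memory error.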

To run this module, select MicroArray | Pattern | Cluster variables from the menu.

Cluster variable menu.png

Input Data Requirements

This module works on -Omic data types.

General Options

ClusterVar0.png

Input/Output

  • Project & Data: The window includes a dropdown box to select the Project and Data object to be clustered.
  • Variables: Selections can be made on which variables should be included in the clustering (options include All variables, Selected variables, Visible variables, and Customized variables (select any pre-generated Lists)).
  • Observations: Selections can be made on which observations should be included in the clustering (options include All observations, Selected observations, Visible observations, and Customized observations (select any pre-generated Lists)).
  • Output name: The user can choose a name for the output data object.


Options

  • Clustering method: Select the clustering method. Available methods are PAM, KMeans, CAST, and SOM.
  • Search cluster#: For PAM and KMeans, the user can enter a fixed number of clusters or a range of cluster numbers. Array Studio will use the sum of the silhouette scores of all variables to determine the best number of clusters.
  • Search threshold: This option applies only to CAST. CAST works directly on the microarray data matrix and calculates the similarity matrix implicitly. The number of clusters is determined by the similarity threshold. The user can set up different thresholds to find the one that clusters the data best.
  • Search layout: This option applies only to SOM. SOM maps the high-dimensional data (when clustering variables, each data point is a variable of dimension n, the number of observations) onto a two-dimensional grid while keeping a topology similar to that of the high-dimensional data. The layout option specifies the structure of that grid; e.g., a 2*3 layout is a grid of 2 rows and 3 columns, giving 6 clusters in total.
  • Repeat number: This option applies only to KMeans. Because KMeans is initialized randomly, a single run of this hill-climbing algorithm may only reach a local optimum. Clustering is therefore repeated with random initial means, and the most commonly occurring output means are chosen.
  • Max iteration#: This option applies only to KMeans. It defines the maximum number of iterations for each KMeans run.
  • Normalize variable: Checking this option will normalize the variables before performing clustering.
  • Generate design column: Checking this box will append the clustering result to the design table.
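The interplay of the Repeat number and Max iteration# options can be sketched in a few lines of Python. This is a minimal illustration of repeated KMeans as described above, not Array Studio's implementation: the function names, the Euclidean distance, the default values, and the choice to compare runs by their resulting partition are all assumptions made for the example.

```python
import random
from collections import Counter


def kmeans_once(points, k, max_iter, rng):
    """One KMeans run from random initial means; returns one label per point."""
    means = rng.sample(points, k)  # random initialization
    labels = None
    for _ in range(max_iter):  # "Max iteration#" caps this loop (assumed >= 1)
        # Assignment step: nearest mean by squared Euclidean distance.
        new_labels = [
            min(range(k), key=lambda c: sum((x - m) ** 2 for x, m in zip(p, means[c])))
            for p in points
        ]
        if new_labels == labels:  # converged before reaching max_iter
            break
        labels = new_labels
        # Update step: each mean becomes the centroid of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old mean if a cluster became empty
                means[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


def canonical(labels):
    """Relabel clusters by order of first appearance so equal partitions compare equal."""
    mapping = {}
    return tuple(mapping.setdefault(lab, len(mapping)) for lab in labels)


def kmeans_repeated(points, k, repeats=10, max_iter=100, seed=0):
    """Run KMeans `repeats` times and return the most frequently occurring partition."""
    rng = random.Random(seed)
    outcomes = Counter(
        canonical(kmeans_once(points, k, max_iter, rng)) for _ in range(repeats)
    )
    return list(outcomes.most_common(1)[0][0])


# Six variables measured on three observations: each variable is one point.
points = [
    (1.0, 1.1, 0.9), (0.9, 1.0, 1.2), (1.1, 0.8, 1.0),
    (5.0, 5.2, 4.9), (5.1, 4.8, 5.0), (4.9, 5.1, 5.2),
]
labels = kmeans_repeated(points, k=2, repeats=20)
print(labels)
```

Comparing canonicalized partitions rather than raw means sidesteps the fact that two runs can find the same clusters in a different order; it is one possible reading of "most commonly occurring output".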


Output Results

This command will generate a table showing the silhouettes for all cluster numbers. This table is located in the "Pattern" folder under the "Table" section:

ClusterVar01.png

ClusterVar02.png

This gives the silhouettes information for each variable.
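For readers unfamiliar with silhouettes, the following sketch computes the standard silhouette value s = (b - a) / max(a, b) for each variable, where a is the mean distance to the other members of its own cluster and b is the smallest mean distance to the members of any other cluster. It then compares two candidate clusterings by the sum of their silhouettes, mirroring the Search cluster# description. The Euclidean distance, the function name, the toy data, and the candidate labelings are assumptions for illustration; the exact computation used by Array Studio is not documented here.

```python
import math


def silhouettes(points, labels):
    """Silhouette of each point; assumes at least two clusters and distinct points."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    scores = []
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:  # singleton cluster: silhouette is conventionally 0
            scores.append(0.0)
            continue
        # a: mean distance to the other members of the same cluster.
        a = sum(math.dist(points[i], points[j]) for j in own) / len(own)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(math.dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        scores.append((b - a) / max(a, b))
    return scores


# Six variables measured on three observations: each variable is one point.
points = [
    (1.0, 1.1, 0.9), (0.9, 1.0, 1.2), (1.1, 0.8, 1.0),
    (5.0, 5.2, 4.9), (5.1, 4.8, 5.0), (4.9, 5.1, 5.2),
]
# Hypothetical clustering results for two candidate cluster numbers.
candidates = {
    2: [0, 0, 0, 1, 1, 1],
    3: [0, 0, 1, 2, 2, 2],
}
totals = {k: sum(silhouettes(points, labs)) for k, labs in candidates.items()}
best_k = max(totals, key=totals.get)
print(totals, best_k)
```

Values close to 1 indicate a variable that sits well inside its cluster, values near 0 indicate a variable on the border between two clusters, and negative values suggest it may have been assigned to the wrong cluster.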

In the Cluster section of the Solution Explorer, the user can also find the cluster information:

Cluster1.png

This Cluster can be used to trellis variables in some data views, such as the Profile View:

Cluster2.png


OmicScript

Cluster Variables
