From Array Suite Wiki

Cluster Observations

Overview

The Cluster Observations module lets the user cluster observations using one of several clustering methods: PAM, KMeans, CAST, or SOM. Unlike other clustering functions, which usually generate a HeatMap view for the Omic data, this command creates a Cluster object in the Cluster section of the Solution Explorer. Using the View Controller, this Cluster can be used to trellis a variety of Views.

Note: While there is no theoretical limit on the number of variables that can be clustered, some algorithms require more memory than others. An Out-of-Memory error may be generated when there is not enough available RAM in the computer to handle the number of selected variables.

To run this module, select MicroArray | Pattern | Cluster Observations from the menu.

Cluster observation menu.png

Input Data Requirements

This module works on -Omic data types.

General Options

ClustObser0.png

Input/Output

  • Project & Data: The window includes a dropdown box to select the Project and Data object to be clustered.
  • Variables: Selections can be made on which variables should be included in the clustering (options include All variables, Selected variables, Visible variables, and Customized variables (select any pre-generated Lists)).
  • Observations: Selections can be made on which observations should be included in the clustering (options include All observations, Selected observations, Visible observations, and Customized observations (select any pre-generated Lists)).
  • Output name: The user can choose to name the output data object.


Options

  • Clustering method: Select the clustering method. Available clustering methods are PAM, KMeans, CAST, and SOM.
  • Search cluster#: For PAM and KMeans, the user can enter a fixed number of clusters or a range of cluster numbers. Array Studio uses the sum of the observations' silhouette scores to determine the best number of clusters.
  • Search threshold: This option is only for CAST. CAST works directly on the microarray matrix and calculates the similarity matrix implicitly. The number of clusters is determined by the similarity threshold. The user can try different thresholds to find the one that clusters the data best.
  • Search layout: This option is only for SOM. SOM maps high-dimensional data (when clustering observations, each data point is an observation of dimension p, the number of variables) onto a two-dimensional grid while keeping a topology similar to that of the high-dimensional data. The layout option sets the shape of that grid; e.g., a 2*3 layout is a grid of 2 rows and 3 columns, giving 6 clusters in total.
  • Repeat number: This option is only for KMeans. Because KMeans is initialized randomly, this hill-climbing algorithm may only reach a local optimum. The clustering is therefore repeated with random initial means, and the most commonly occurring output means are chosen.
  • Max iteration#: This option is only for KMeans. It defines the maximum number of iterations for each KMeans run.
  • Normalize observation: Checking this option will normalize each observation before clustering is performed.
  • Generate design column: Checking this box will append the clustering result to the design table.
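
To illustrate what the Repeat number and Max iteration# options control, the following is a minimal Python sketch of KMeans with random restarts. It is not Array Studio's implementation: the function names (kmeans_once, kmeans, canonical), the parameter names, and the toy data are all hypothetical. It also assumes that "most commonly occurring output" can be read as the partition returned most often across the repeated runs.

```python
import random
from collections import Counter

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans_once(points, k, max_iter, rng):
    """One KMeans run from random initial means; returns one label per point."""
    means = rng.sample(points, k)  # random initial means drawn from the data
    labels = None
    for _ in range(max_iter):  # "Max iteration#": cap on iterations per run
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, means[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous mean
                means[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels

def canonical(labels):
    """Relabel clusters by order of first appearance so equal partitions compare equal."""
    mapping = {}
    return tuple(mapping.setdefault(lab, len(mapping)) for lab in labels)

def kmeans(points, k, repeats=10, max_iter=100, seed=0):
    """Repeat KMeans ("Repeat number") and keep the most frequent partition."""
    rng = random.Random(seed)
    outcomes = Counter(
        canonical(kmeans_once(points, k, max_iter, rng)) for _ in range(repeats)
    )
    return list(outcomes.most_common(1)[0][0])

# Toy data: six observations with two variables each, forming two groups.
observations = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels = kmeans(observations, k=2, repeats=10, max_iter=100)
```

A larger repeat count makes it less likely that a single unlucky initialization decides the result, at the cost of proportionally more run time.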

Output Results

The command will generate a table showing the silhouettes for all cluster numbers. This table is located in the "Pattern" folder under the "Table" section:

ClustObser2.png

ClustObser3.png

This gives the silhouette information for each observation.
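
As a rough illustration of what a per-observation silhouette means, the Python sketch below uses the standard silhouette definition s = (b - a) / max(a, b), where a is the mean distance from an observation to the other members of its own cluster and b is the smallest mean distance to the members of any other cluster. The function name silhouettes, the use of Euclidean distance, the convention of scoring singleton clusters as 0, and the toy data are assumptions made for this example; Array Studio's exact computation may differ. The sketch requires at least two clusters.

```python
import math

def silhouettes(points, labels):
    """Return the silhouette s = (b - a) / max(a, b) of every observation."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    scores = []
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:
            scores.append(0.0)  # assumed convention: a singleton cluster scores 0
            continue
        # a: mean distance to the other members of the observation's own cluster
        a = sum(math.dist(points[i], points[j]) for j in own) / len(own)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(math.dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        scores.append((b - a) / max(a, b))
    return scores

# Toy data: six observations with two variables each, forming two groups.
observations = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
two_clusters = [0, 0, 0, 1, 1, 1]    # matches the two visible groups
three_clusters = [0, 0, 1, 1, 2, 2]  # splits the groups badly

# Comparing candidate clusterings by their summed silhouettes.
sum_two = sum(silhouettes(observations, two_clusters))
sum_three = sum(silhouettes(observations, three_clusters))
best = "two" if sum_two > sum_three else "three"
```

Values near 1 indicate an observation that sits well inside its cluster, while negative values suggest it would fit better in a neighboring cluster. Summing the values over all observations gives one score per candidate clustering, which is how the sketch compares the two labelings.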

In the Cluster section, the user can also find the cluster information:

Cluster1.png

which can be used to Trellis observations in some data Views, such as profile View:

Cluster2.png


OmicScript

Cluster Observations


Related Articles
