The Pairwise Correlation command calculates pairwise correlations for the observations in an experiment. A pairwise correlation matrix is generated in the Solution Explorer under the Summary folder in the –OMIC data section. This module is used for QC purposes: it shows how well each sample correlates with the other samples in the experiment and with the other samples in its experimental group.
To run this module, select MicroArray | Summarize | Pairwise Correlation.
Input Data Requirements
This module works on -Omic data types.
- For normalized data, users can apply the Pearson method to calculate the correlation and the corresponding p-value.
- For non-normalized data, users can first transform the data to normalize them; e.g., for gene expression FPKM data, users can apply upper-quantile normalization followed by a log2 transformation, and then apply the Pearson method to calculate the correlation. Alternatively, users can simply apply a non-parametric method such as Kendall or Spearman.
- Project & Data: The window includes a dropdown box to select the Project and Data object to be analyzed.
- Variables: Selections can be made on which variables should be included in the analysis (options include All variables, Selected variables, Visible variables, and Customized variables (select any pre-generated Lists)).
- Observations: Selections can be made on which observations should be included in the analysis (options include All observations, Selected observations, Visible observations, and Customized observations (select any pre-generated Lists)).
- Output name: The user can choose to name the output data object.
- By Observation/Variable: Users can choose to calculate the correlation between each pair of Observations (correlating them across the variables, e.g., genes) or between each pair of Variables (correlating them across all the observations).
- Group: There is an option to use the Group by drop-down box to choose a column (from the Design Table if users choose By Observation and from Annotation Table if users choose By Variable) to group the correlations. For instance, if the experiment contains a Time and Treatment column, the user may be interested in using this column as a group by which to see correlation data. Not selecting a group will show a correlation matrix for the entire dataset.
- Split by: There is an option to further split the data, using the drop-down box to choose a column (from the Annotation Table if users choose By Observation and from the Design Table if users choose By Variable). For example, users can split the genes into autosome and sex chromosome, so that each observation is split into two parts and the correlation is calculated between matching parts: autosome vs. autosome and sex chromosome vs. sex chromosome.
- Method: The methods available for calculating the correlation are "Pearson", "Kendall", and "Spearman".
- Calculate between-group correlations: This will calculate all of the correlations for a Data object (even if a Group has been selected).
- The output Heatmap will, by default, still only display within-group correlations. To see the full comparison matrix, click Remove Trellis in the View Controller.
- Generate Dendrogram: If Calculate between-group correlations is selected, or no Group was specified, the Generate dendrogram option becomes available. A dendrogram will be generated showing the correlations between all chips (see below).
- Output Correlation p-values: If this option is checked, a p-value is output for the correlation of each pair, based on the method the user has chosen.
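For readers who want to see what a By Observation calculation amounts to, the following is a minimal Python sketch that uses only the standard library. It is not the module's implementation: the function names, the sample names, and the FPKM-like values are all hypothetical. It applies a log2(x + 1) transformation before Pearson (upper-quantile scaling is omitted for brevity) and uses the untransformed values for Spearman, which depends only on ranks. Kendall and p-values are not covered here.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def pairwise_correlation(data, method=pearson):
    """Return {(name_a, name_b): r} for every pair of observations.

    `data` maps an observation name to its list of variable values.
    """
    names = sorted(data)
    matrix = {(n, n): 1.0 for n in names}
    for a, b in combinations(names, 2):
        r = method(data[a], data[b])
        matrix[(a, b)] = matrix[(b, a)] = r
    return matrix


# Hypothetical FPKM-like values for three observations over five genes.
fpkm = {
    "chip1": [1.0, 4.0, 16.0, 64.0, 256.0],
    "chip2": [2.0, 8.0, 32.0, 128.0, 512.0],
    "chip3": [300.0, 50.0, 20.0, 3.0, 1.0],
}

# log2(x + 1) transform before Pearson; Spearman is rank-based and needs none.
log_fpkm = {k: [math.log2(v + 1) for v in vals] for k, vals in fpkm.items()}

pearson_matrix = pairwise_correlation(log_fpkm, method=pearson)
spearman_matrix = pairwise_correlation(fpkm, method=spearman)
```

In this made-up data, chip1 and chip2 rise together while chip3 falls, so the chip1/chip3 entry is strongly negative under either method.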
This function will generate a pairwise correlation matrix in the Solution Explorer under the Summary folder.
In the example shown below, a number of groups are shown. The heatmap shows the correlations between observations in each group. It is clear in this example that chip 22A (group DBP.t18) does not correlate well with the other chips in its group; if desired, this chip could be excluded from downstream analysis.
If interested, the user can add a Table View to the Correlation Table in the Solution Explorer in order to see the exact numbers of the correlations.
To exclude sample 22A, users can select all of the other samples in the heatmap and then generate a new list containing only the remaining 23 samples.
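The same QC decision can be sketched in code. The example below is an illustration only and rests on several assumptions: the correlation values are made up, the chip names other than 22A are hypothetical, and the 0.9 cutoff is an arbitrary choice rather than a recommended setting. It scores each sample by its median correlation with the other samples in its group (the median is less affected by a single poor sample than the mean) and keeps the samples that meet the cutoff.

```python
from statistics import median

# Hypothetical within-group correlations; the values are made up for illustration.
pairs = {
    ("21A", "22A"): 0.62,
    ("21A", "23A"): 0.97,
    ("21A", "24A"): 0.96,
    ("22A", "23A"): 0.60,
    ("22A", "24A"): 0.65,
    ("23A", "24A"): 0.98,
}
names = ["21A", "22A", "23A", "24A"]

# Store both orientations so any pair can be looked up directly.
matrix = {}
for (a, b), r in pairs.items():
    matrix[(a, b)] = matrix[(b, a)] = r


def median_correlation_to_others(matrix, names):
    """Median correlation of each sample with every other sample in `names`."""
    return {a: median(matrix[(a, b)] for b in names if b != a) for a in names}


def samples_to_keep(matrix, names, threshold=0.9):
    """Names whose median correlation with the rest meets the (assumed) cutoff."""
    medians = median_correlation_to_others(matrix, names)
    return [n for n in names if medians[n] >= threshold]


kept = samples_to_keep(matrix, names)
print(kept)  # ['21A', '23A', '24A']
```

The resulting list plays the same role as the new list generated from the heatmap selection: it contains every sample except the one that correlates poorly with its group.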
The legend (shown below) shows the range of correlations (this can be set using the Task tab of the Project Explorer).
An example of the Dendrogram View generated by clicking the Generate dendrogram checkbox is shown below. It functions very similarly to the HeatmapTableView.