
From Array Suite Wiki

Combine

Overview

The Combine command combines technical replicates (based on a column from the Design Table) or variables (based on a column from the Annotation Table) using one of several summarization methods, and creates a new dataset.

To run this module, select MicroArray | Preprocess | Combine.

Combine menu.png

Input Data Requirements

This command works on -Omic data types.


General Options

Combine1.png

Input/Output


  • Project & Data: The window includes a dropdown box to select the Project and Data object to be combined.
  • Variables: Selections can be made on which variables should be included in the combining. Options include All variables, Selected variables, Visible variables, and Customized variables (select any pre-generated Lists).
  • Observations: Selections can be made on which observations should be included in the combining. Options include All observations, Selected observations, Visible observations, and Customized observations (select any pre-generated Lists).
  • Output name: The user can choose to name the output data object.


Options

  • Summarization method: The available summarization methods are defined in the table below.
  • Combine observations/Combine variables: The user can choose to combine data by looking for matching identifiers among either the Design metadata (observations) or the Annotation metadata (variables).
  • Change Case: When combining, the "Change case" option can be applied to the selected grouping identifier (e.g. Gene Symbol) so that identifier names are consistent if the data are to be merged at a later stage.
  • Group(s): The user can specify the column from the Design Table or Annotation Table to be used for combining the Observations or Variables. In cases where the user has technical replicates, the Design Table must include a column that indicates the sample name that each group of technical replicates represents.
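
The Group(s) option for technical replicates can be pictured with a minimal Python sketch. This is not Array Studio code: the `design` and `data` dictionaries, the function name `combine_observations`, and the rule that missing values are skipped are all assumptions made for illustration, and Mean stands in for whichever summarization method is chosen.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical Design Table: observation ID -> value of the grouping column
# (the sample name that each technical replicate represents).
design = {
    "S1_rep1": "S1",
    "S1_rep2": "S1",
    "S2_rep1": "S2",
    "S2_rep2": "S2",
}

# Hypothetical data: variable -> {observation ID: value}; None marks a missing value.
data = {
    "probe_A": {"S1_rep1": 2.0, "S1_rep2": 4.0, "S2_rep1": 1.0, "S2_rep2": None},
    "probe_B": {"S1_rep1": 5.0, "S1_rep2": 7.0, "S2_rep1": 3.0, "S2_rep2": 9.0},
}


def combine_observations(data, design, summarize=mean):
    """Collapse observations that share a group label into one value per group."""
    combined = {}
    for variable, values in data.items():
        groups = defaultdict(list)
        for obs, value in values.items():
            if value is not None:  # assumption: missing values are skipped
                groups[design[obs]].append(value)
        combined[variable] = {g: summarize(v) for g, v in groups.items()}
    return combined


result = combine_observations(data, design)
```

Here `result` holds one value per sample name for each variable, so four replicate columns become two. Passing a different function as `summarize` (for example `max` or `statistics.median`) mimics choosing a different summarization method.
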

Summarization Methods used in Array Studio

Option | Meaning | URL
N | number of data points |
Mean | average |
StdDev | standard deviation | http://en.wikipedia.org/wiki/Standard_deviation
Min | minimum value in that variable or observation |
Max | maximum value in that variable or observation |
MinAbs | minimum absolute value |
MaxAbs | maximum absolute value |
Range | range of values in that variable or observation |
NMissing | number of missing values in that variable or observation |
NMissingPercentage | percentage of missing values in that variable or observation |
NNotMissing | number of non-missing values in that variable or observation |
NNotMissingPercentage | percentage of non-missing values in that variable or observation |
Sum | sum of values for that variable or observation |
Variance | variance of values for that variable or observation | http://en.wikipedia.org/wiki/Variance
StdErr | standard error for that variable or observation | http://en.wikipedia.org/wiki/Standard_error_%28statistics%29
CV | coefficient of variation | http://en.wikipedia.org/wiki/Coefficient_of_variation
Median | median for that variable or observation |
IQR | interquartile range for that variable or observation | http://en.wikipedia.org/wiki/IQR
Skewness | skewness for that variable or observation | http://en.wikipedia.org/wiki/Skewness
Kurtosis | kurtosis for that variable or observation | http://en.wikipedia.org/wiki/Kurtosis
MAD | median absolute deviation for that variable or observation | http://en.wikipedia.org/wiki/Median_absolute_deviation
NPositive | number of positive data points for that variable or observation |
NNegative | number of negative data points for that variable or observation |
PositivePercentage | percentage of positive data points for that variable or observation |
NegativePercentage | percentage of negative data points for that variable or observation |
PositiveChangeSize | maximal positive value * percentage of positive values for that variable or observation |
NegativeChangeSize | minimal negative value * percentage of negative values for that variable or observation |
PositiveMean | average of positive values for that variable or observation |
NegativeMean | average of negative values for that variable or observation |
GeometricMean | geometric mean (the nth root of the product of n values) for that variable or observation | http://en.wikipedia.org/wiki/Geometric_mean
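
To make some of the less familiar options concrete, the following Python sketch computes a handful of them for one set of values. It is an illustration only, not the Array Studio implementation. The function name `summarize` is hypothetical, and several details are assumptions: `None` marks a missing value, the standard deviation uses the sample (n - 1) form, CV is StdDev divided by Mean, and the "percentage" in PositiveChangeSize and NegativeChangeSize is taken as a fraction of the non-missing values.

```python
import math
from statistics import mean, median, stdev


def summarize(values):
    """Compute a few of the listed statistics for one variable or observation."""
    present = [v for v in values if v is not None]  # drop missing values
    positives = [v for v in present if v > 0]
    negatives = [v for v in present if v < 0]
    n = len(present)
    med = median(present)
    sd = stdev(present)  # assumption: sample standard deviation (n - 1)
    return {
        "NNotMissing": n,
        "NMissing": len(values) - n,
        "Mean": mean(present),
        "Median": med,
        "Range": max(present) - min(present),
        "MinAbs": min(abs(v) for v in present),
        "MaxAbs": max(abs(v) for v in present),
        "StdDev": sd,
        "StdErr": sd / math.sqrt(n),
        "CV": sd / mean(present),  # assumption: plain ratio, not scaled by 100
        "MAD": median([abs(v - med) for v in present]),
        "PositiveMean": mean(positives) if positives else None,
        "NegativeMean": mean(negatives) if negatives else None,
        # assumption: share of positive/negative values expressed as a fraction
        "PositiveChangeSize": max(positives) * len(positives) / n if positives else None,
        "NegativeChangeSize": min(negatives) * len(negatives) / n if negatives else None,
    }


stats = summarize([2.0, -1.0, None, 4.0, 3.0])
```

For this input, three of the four non-missing values are positive and the largest is 4.0, so PositiveChangeSize is 4.0 * 0.75 under the fraction assumption stated above.
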

Output Results

A new Data object generated by this command will be found in the Solution Explorer under the "-Omic" data section.

Example Usage

This function can be used to group technical replicates, but can also be used to combine redundant gene expression data.

For example, the Ensembl gene model uses a unique Ensembl ID (e.g. ENSG00000112246) for each annotated gene, but several thousand entries in the GeneName column are redundant, so GeneName cannot be set as the ID column.

Microarray CombineByGeneName Before.png

If you would like to display GeneName instead of EnsemblID as the ID column, the redundant identifiers must first be made unique, for example by using Combine to calculate the mean value of the entries that share a GeneName:

Microarray CombineByGeneName Menu.png

After combining, the GeneID will be set to GeneName.

Microarray CombineByGeneName After.png
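
The GeneName example can be sketched in Python as follows. This is a rough illustration rather than Array Studio or OmicScript code: the identifiers, the values, and the function name `combine_variables` are hypothetical, and "Change case" is assumed here to mean converting the grouping identifier to upper case.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical Annotation Table: variable ID -> GeneName (note the inconsistent case).
annotation = {
    "ENSG_A": "Abc1",
    "ENSG_B": "ABC1",
    "ENSG_C": "Xyz2",
}

# Hypothetical expression data: variable ID -> one value per observation.
data = {
    "ENSG_A": [1.0, 2.0],
    "ENSG_B": [3.0, 4.0],
    "ENSG_C": [5.0, 6.0],
}


def combine_variables(data, annotation, change_case=str.upper):
    """Merge variables that share a GeneName, taking the mean per observation."""
    rows = defaultdict(list)
    for var_id, values in data.items():
        # assumption: "Change case" normalizes the identifier before grouping
        rows[change_case(annotation[var_id])].append(values)
    # zip(*group) walks the grouped rows column by column (one column per observation)
    return {name: [mean(col) for col in zip(*group)] for name, group in rows.items()}


combined = combine_variables(data, annotation)
```

After this step each GeneName appears once, so it could serve as a unique identifier. The two entries that differ only in case are merged because the case change is applied before grouping.
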


OmicScript

Combine
