Combine
Overview
The Combine command will combine technical replicates (based on a column from the Design Table), or combine variables (based on a column from the Annotation Table) using one of a number of summarization methods, and create a new dataset.
To run this module, select MicroArray | Preprocess | Combine from the menu.
Input Data Requirements
The Combine command works on -Omic data types.
General Options
Input/Output
- Project & Data: The window includes a dropdown box to select the Project and Data object to be combined.
- Variables: Select which variables to include (options are All variables, Selected variables, Visible variables, and Customized variables (select any pre-generated Lists)).
- Observations: Select which observations to include (options are All observations, Selected observations, Visible observations, and Customized observations (select any pre-generated Lists)).
- Output name: The user can choose to name the output data object.
Options
- Summarization method: Various options are available in specifying the Summarization method, and are defined in the table below.
- Combine observations/Combine variables: The user can choose to combine data by matching identifiers in either the Design metadata (observations) or the Annotation metadata (variables).
- Change Case: Upon combining, there is the option to "Change case" for the selected grouped identifier (e.g. Gene Symbol) so that there is consistency in the identifier names if one wants to merge data at a later stage.
- Group(s): The user can specify the column from the Design Table or Annotation Table to be used for combining the Observations or Variables. When the data contain technical replicates, the Design Table must include a column that indicates the sample name that each group of technical replicates represents.
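The grouping behaviour described above can be illustrated outside the application. The following is a minimal Python sketch, not the module's actual implementation: the `design` and `data` dictionaries, the sample names, and the `combine_observations` function are all hypothetical stand-ins for a Design Table group column, a data matrix, and the Combine observations option with the Mean summarization method.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical Design Table column: observation ID -> sample name
# shared by all technical replicates of that sample.
design = {
    "S1_rep1": "S1",
    "S1_rep2": "S1",
    "S2_rep1": "S2",
    "S2_rep2": "S2",
}

# Hypothetical data matrix: variable ID -> {observation ID: value}.
data = {
    "gene_a": {"S1_rep1": 2.0, "S1_rep2": 4.0, "S2_rep1": 1.0, "S2_rep2": 3.0},
    "gene_b": {"S1_rep1": 10.0, "S1_rep2": 20.0, "S2_rep1": 5.0, "S2_rep2": 7.0},
}


def combine_observations(data, group_of, summarize=mean):
    """Collapse observations sharing a group label into one value per variable."""
    combined = {}
    for variable, values in data.items():
        buckets = defaultdict(list)
        for observation, value in values.items():
            buckets[group_of[observation]].append(value)
        combined[variable] = {
            group: summarize(vals) for group, vals in buckets.items()
        }
    return combined


combined = combine_observations(data, design)
print(combined)
```

Passing a different function as `summarize` (for example `statistics.median`, `min`, or `max`) mimics choosing a different summarization method.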
| Summarization method | Definition |
|---|---|
| N | number of data points |
| Mean | average |
| StdDev | standard deviation |
| Min | minimum value in that variable or observation |
| Max | maximum value in that variable or observation |
| MinAbs | minimum absolute value |
| MaxAbs | maximum absolute value |
| Range | range of values in that variable or observation |
| NMissing | number of missing values in that variable or observation |
| NMissingPercentage | percentage of missing values in that variable or observation |
| NNotMissing | number of non-missing values in that variable or observation |
| NNotMissingPercentage | percentage of non-missing values in that variable or observation |
| Sum | sum of values for that variable or observation |
| Variance | variance of values for that variable or observation |
| StdErr | standard error for that variable or observation (see http://en.wikipedia.org/wiki/Standard_error_%28statistics%29) |
| CV | coefficient of variation |
| Median | median for that variable or observation |
| IQR | interquartile range for that variable or observation |
| Skewness | skewness for that variable or observation |
| Kurtosis | kurtosis for that variable or observation |
| MAD | median absolute deviation for that variable or observation |
| NPositive | number of positive data points for that variable or observation |
| NNegative | number of negative data points for that variable or observation |
| PositivePercentage | percentage of positive data points for that variable or observation |
| NegativePercentage | percentage of negative data points for that variable or observation |
| PositiveChangeSize | maximal positive value * percentage of positive values for that variable or observation |
| NegativeChangeSize | minimal negative value * percentage of negative values for that variable or observation |
| PositiveMean | average of positive values for that variable or observation |
| NegativeMean | average of negative values for that variable or observation |
| GeometricMean | geometric mean (the n-th root of the product of n values), a measure of the central tendency or typical value of a set of numbers |
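Several of the less common statistics in the table above can be written out explicitly. The Python sketch below uses textbook definitions and should be read as an illustration only: the function names are hypothetical, and the choices made here (sample standard deviation with an n-1 denominator, an unscaled MAD, and the "percentage" in PositiveChangeSize treated as a fraction between 0 and 1) are assumptions that the software may not share.

```python
import math
from statistics import mean, median, stdev


def std_err(xs):
    """Standard error of the mean: sample standard deviation / sqrt(n)."""
    return stdev(xs) / math.sqrt(len(xs))


def cv(xs):
    """Coefficient of variation: sample standard deviation / mean."""
    return stdev(xs) / mean(xs)


def mad(xs):
    """Median absolute deviation from the median (no scaling constant)."""
    centre = median(xs)
    return median([abs(x - centre) for x in xs])


def geometric_mean(xs):
    """n-th root of the product of n positive values, computed via logs."""
    return math.exp(sum(math.log(x) for x in xs) / len(xs))


def positive_change_size(xs):
    """Largest positive value times the fraction of values that are positive.

    Assumption: the percentage is expressed as a fraction in [0, 1].
    """
    positives = [x for x in xs if x > 0]
    if not positives:
        return 0.0
    return max(positives) * len(positives) / len(xs)


values = [1.0, 2.0, 4.0, 8.0]
print(std_err(values), cv(values), mad(values), geometric_mean(values))
print(positive_change_size([-2.0, 1.0, 3.0, 4.0]))
```

Note that the geometric mean is only defined for positive values, so it is typically applied to data on the original (non-log) scale.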
Output Results
A new Data object generated by this command will be found in the Solution Explorer under the "-Omic" data section.
Example Usage
This function can be used to group technical replicates, but can also be used to combine redundant gene expression data.
For example, the Ensembl gene model uses a unique EnsemblID (e.g. ENSG00000112246) for each annotated gene, but several thousand entries in the GeneName column are redundant, so GeneName cannot be set as the ID column directly.
If you would like to display GeneName instead of EnsemblID as the ID column, the redundant identifiers must first be made unique, for example by using Combine to calculate the mean value of the entries that share a GeneName.
After combining, the GeneID will be set to GeneName.
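To make the effect of this example concrete, here is a minimal Python sketch of combining variables by a redundant annotation column. It is an illustration under stated assumptions, not the application's code: the identifiers `ENSG_A`, `ENSG_B`, `ENSG_C`, the gene names, the values, and the `combine_variables` function are all hypothetical, and upper-casing is used as one possible "Change case" choice.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical Annotation Table: unique variable ID -> redundant GeneName.
# The IDs and names are placeholders, not real Ensembl identifiers.
annotation = {
    "ENSG_A": "GeneX",
    "ENSG_B": "genex",  # same gene, inconsistent case
    "ENSG_C": "GeneY",
}

# Hypothetical data matrix: variable ID -> values for two observations.
data = {
    "ENSG_A": [1.0, 3.0],
    "ENSG_B": [3.0, 5.0],
    "ENSG_C": [2.0, 2.0],
}


def combine_variables(data, annotation, change_case=str.upper):
    """Average rows that share a (case-normalized) annotation value."""
    groups = defaultdict(list)
    for variable_id, row in data.items():
        groups[change_case(annotation[variable_id])].append(row)
    # zip(*rows) walks the grouped rows column by column (per observation).
    return {
        name: [mean(column) for column in zip(*rows)]
        for name, rows in groups.items()
    }


by_gene_name = combine_variables(data, annotation)
print(by_gene_name)
```

After this step each gene name appears exactly once, which is the property an ID column requires.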
OmicScript