Single Variable Classification
The Single Variable Classification module allows the user to assess how well individual variables predict the class of each observation, as recorded in the design information. The analysis uses LDA (linear discriminant analysis). For example, in microarray data, we can test each probeset's ability to predict each observation's treatment level.
Array Studio outputs different metrics to evaluate each probeset's prediction ability, such as True Positive, False Positive, False Negative, True Negative, Sensitivity, Specificity, PPV (positive predictive value), and NPV (negative predictive value). The user can find more details and meaning of the metrics here.
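The four rate metrics follow from the four counts by their standard definitions. The sketch below illustrates those definitions only; the function name and the example counts are hypothetical and are not taken from Array Studio.

```python
def classification_metrics(tp, fp, fn, tn):
    """Return Sensitivity, Specificity, PPV and NPV from confusion-matrix counts."""
    def ratio(numerator, denominator):
        # Guard against empty denominators (e.g. no predicted positives).
        return numerator / denominator if denominator else float("nan")

    return {
        "Sensitivity": ratio(tp, tp + fn),  # fraction of true cases that were detected
        "Specificity": ratio(tn, tn + fp),  # fraction of non-cases correctly rejected
        "PPV": ratio(tp, tp + fp),          # fraction of predicted cases that are true cases
        "NPV": ratio(tn, tn + fn),          # fraction of predicted non-cases that are true non-cases
    }

# Hypothetical counts for one probeset.
metrics = classification_metrics(tp=8, fp=2, fn=1, tn=9)
```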
This module only works for binary classifications. To classify a factor with more than two levels, the user needs to treat one level as the case and group the remaining levels together as the other case, which reduces the problem to a binary classification.
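The grouping of a multi-level factor into two classes can be sketched as follows. The level names ("Control", "DrugA", "DrugB") and the helper function are hypothetical and serve only to illustrate the one-versus-rest grouping.

```python
def binarize_levels(levels, case):
    """Relabel a multi-level factor as 'case' versus 'other'."""
    return ["case" if level == case else "other" for level in levels]

# Hypothetical treatment column from a design table.
treatment = ["Control", "DrugA", "DrugB", "DrugA", "Control"]
labels = binarize_levels(treatment, case="DrugA")
```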
To run this module, select MicroArray | Pattern | Single Variable Classification.
Input Data Requirements
This module works on -Omic data types.
- Project & Data: The window includes a dropdown box to select the Project and Data object to be analyzed.
- Variables: Selections can be made on which variables should be included in the analysis (options include All variables, Selected variables, Visible variables, and Customized variables (select any pre-generated Lists)).
- Observations: Selections can be made on which observations should be included in the analysis (options include All observations, Selected observations, Visible observations, and Customized observations (select any pre-generated Lists)).
- Output name: The user can choose to name the output data object.
- Classify: Specify which column in the design table contains the factor on which to classify.
- This module only works on binary classifications.
- Case: Specify one level in the target factor column as a case; the remaining levels in the same factor column will be treated as the other case.
- Observation normalization: The user can choose to perform observation normalization before the classification (options include "CenterScale", "Center", or "None"). This is set to "CenterScale" by default and provides the best option when comparing datasets that may have differing values (due to normalization).
- Normalize against all variables: This option is only meaningful when not all variables are used for the classification. In that case, it normalizes each observation using all the variables in the dataset (instead of just the variables chosen in the Variables section).
- Cross validation fold: Set the number of cross-validation folds to perform.
- User defined prevalence: The prior information about the probability that a sample can be classified as case. This is useful for controlled experiments. For samples from a random population where the prevalence is unknown, this should be left unchecked.
- Predict new data: Check this box if the user wishes to also predict new data while running the classification.
- Project: The project that contains the data.
- New data: The new dataset that the user wants to predict.
- Random number seed: This value is used to initialize the randomizer in the module (e.g., the cross validation). By default this is set to 0, which uses the system clock.
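The observation normalization options can be illustrated with a minimal sketch. It assumes that "Center" subtracts the observation's mean across variables and that "CenterScale" additionally divides by the sample standard deviation; the exact formulas used by Array Studio are not stated here, so both are assumptions, and the function name and `reference` parameter are hypothetical.

```python
from statistics import mean, stdev

def normalize_observation(values, method="CenterScale", reference=None):
    """Normalize one observation's values across variables.

    reference: values used to estimate the center and scale. Passing the
    observation's values over all variables mimics "Normalize against all
    variables"; by default the values being normalized are used.
    """
    ref = values if reference is None else reference
    if method == "None":
        return list(values)
    center = mean(ref)
    if method == "Center":
        return [v - center for v in values]
    if method == "CenterScale":
        scale = stdev(ref)  # sample standard deviation (an assumption)
        return [(v - center) / scale for v in values]
    raise ValueError(f"unknown method: {method}")

# One observation measured on five variables, of which two are selected.
all_values = [1.0, 2.0, 3.0, 4.0, 5.0]
selected = all_values[:2]
centered = normalize_observation(selected, "Center", reference=all_values)
scaled = normalize_observation([1.0, 2.0, 3.0])
```

In the first call the center comes from all five variables, so the two selected values are shifted by the full-dataset mean rather than by their own mean.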
The results include a CVReport table in the Prediction folder of the Solution Explorer showing TruePositive, FalsePositive, FalseNegative, TrueNegative, Sensitivity, Specificity, PPV, and NPV calls.
If the user chose to predict new data, a Predicted table in the Prediction folder of the Solution Explorer is generated, showing the predicted classification group for each variable.
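For readers who want to see how counts like those in the CVReport table can arise, the following is a minimal sketch of a two-class LDA on a single variable, evaluated by k-fold cross-validation. It is not Array Studio's implementation: the fold assignment, the use of the prevalence as the LDA prior, and all function names and data values are assumptions made for illustration.

```python
import math
import random

def fit_lda_1d(x, y, prevalence=None):
    """Fit a two-class LDA on one variable; y holds True for case observations."""
    case = [v for v, is_case in zip(x, y) if is_case]
    other = [v for v, is_case in zip(x, y) if not is_case]
    m1, m0 = sum(case) / len(case), sum(other) / len(other)
    # Pooled within-class variance shared by both classes.
    ss = sum((v - m1) ** 2 for v in case) + sum((v - m0) ** 2 for v in other)
    var = ss / (len(x) - 2)
    # Prior for the case class: user-defined prevalence or the training proportion.
    p1 = len(case) / len(x) if prevalence is None else prevalence
    return m1, m0, var, p1

def predict_lda_1d(model, value):
    """Return True when the linear discriminant favors the case class."""
    m1, m0, var, p1 = model
    d1 = value * m1 / var - m1 ** 2 / (2 * var) + math.log(p1)
    d0 = value * m0 / var - m0 ** 2 / (2 * var) + math.log(1 - p1)
    return d1 > d0

def cross_validate(x, y, folds=3, seed=1, prevalence=None):
    """Accumulate confusion-matrix counts over the held-out folds."""
    idx = list(range(len(x)))
    random.Random(seed).shuffle(idx)
    counts = {"TruePositive": 0, "FalsePositive": 0,
              "FalseNegative": 0, "TrueNegative": 0}
    for f in range(folds):
        test = set(idx[f::folds])
        train = [i for i in idx if i not in test]
        model = fit_lda_1d([x[i] for i in train], [y[i] for i in train], prevalence)
        for i in test:
            pred = predict_lda_1d(model, x[i])
            if pred and y[i]:
                counts["TruePositive"] += 1
            elif pred:
                counts["FalsePositive"] += 1
            elif y[i]:
                counts["FalseNegative"] += 1
            else:
                counts["TrueNegative"] += 1
    return counts

# Hypothetical expression values for one probeset: six non-case, six case.
x = [1.0, 1.2, 0.8, 1.1, 0.9, 1.3, 3.0, 3.2, 2.8, 3.1, 2.9, 3.3]
y = [False] * 6 + [True] * 6
counts = cross_validate(x, y, folds=3, seed=1)

# Predicting new data: fit on all observations, then classify unseen values.
model = fit_lda_1d(x, y)
new_predictions = [predict_lda_1d(model, v) for v in [0.7, 3.4]]
```

Because the two groups in this toy example are well separated, every held-out observation is classified correctly; real probesets will usually show a mix of all four counts.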