Summarize Missing Pattern
Overview
The Summarize Missing Pattern command summarizes missing data in a Table object, generating a new table that lists each pattern of missing data and its frequency. It is most commonly used to check whether the missing data follow a particular pattern (for instance, whole observations or whole variables with missing data). The results are generated in the Solution Explorer under the Summarize folder of the Table section. The command can be accessed by going to: Table | Summarize Missing Pattern.
Input Data Requirements
This function works on all Table data, including design and annotation tables.
Step 1: Select source table
The user will first be asked to choose the table to summarize:
Step 2: Select columns for summarization
- The user can specify which columns to include in the summarization, or choose "Select all" to include every column.
Output Results
Selecting the Submit button will generate a new table with patterns of missing data in the Solution Explorer under the Summarize tab.
Each pattern is summarized in the Pattern column as a string with one character per selected column, either '0' (data present) or '1' (data missing). The remaining columns are:
- Count: The number of rows matching this pattern.
- N Cols Missing: The number of columns with missing data in this pattern.
- One column per selected source column, showing its Present or Missing status in this pattern.
See below for an example.
Example Usage
Given the following starting table and summarize missing pattern option:
The table has missing values for Samples D and F in both Gene1 and Gene2, and for Sample A only in Gene4. The resulting missing pattern table is:
Thus, there are 3 missing patterns across all of the columns. The first row shows that 3 rows have no missing values. The second row shows that 1 row has the pattern "000100", which means that the fourth column has a missing value. The third row shows that 2 rows have the pattern "101000", which means that the first and third columns have missing values.
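For readers who want to check the logic outside the application, the summarization can be approximated in a few lines of Python with pandas. This is only a minimal sketch of the idea, not the tool's actual implementation: the function name summarize_missing_pattern, the column names Col1 to Col6, and all data values are hypothetical, chosen so that the patterns match the ones described in the example.

```python
from collections import Counter

import numpy as np
import pandas as pd


def summarize_missing_pattern(table, columns=None):
    """Count rows per missing-data pattern (hypothetical helper, not the tool's API)."""
    cols = list(table.columns) if columns is None else list(columns)
    missing = table[cols].isna()
    # One character per selected column: '1' = missing, '0' = present.
    patterns = [
        "".join("1" if is_missing else "0" for is_missing in row)
        for row in missing.to_numpy()
    ]
    records = []
    for pattern, count in Counter(patterns).items():
        record = {
            "Pattern": pattern,
            "Count": count,
            "N Cols Missing": pattern.count("1"),
        }
        # Per-column Present/Missing status for this pattern.
        for col, flag in zip(cols, pattern):
            record[col] = "Missing" if flag == "1" else "Present"
        records.append(record)
    summary = pd.DataFrame(records)
    return summary.sort_values(["N Cols Missing", "Pattern"]).reset_index(drop=True)


# Hypothetical 6 x 6 table: Col1 and Col3 are missing for samples D and F,
# and Col4 is missing for sample A. Values are made up for illustration.
full = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
data = pd.DataFrame(
    {
        "Col1": [1.0, 2.0, 3.0, np.nan, 5.0, np.nan],
        "Col2": full,
        "Col3": [1.0, 2.0, 3.0, np.nan, 5.0, np.nan],
        "Col4": [np.nan, 2.0, 3.0, 4.0, 5.0, 6.0],
        "Col5": full,
        "Col6": full,
    },
    index=["SampleA", "SampleB", "SampleC", "SampleD", "SampleE", "SampleF"],
)

summary = summarize_missing_pattern(data)
print(summary[["Pattern", "Count", "N Cols Missing"]])
```

Passing a list to the columns argument plays the role of Step 2; leaving it unset corresponds to "Select all".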