Filter for significant genes from multiple tests

From Array Suite Wiki

Case Study: Filter for significant genes from multiple tests


Large-scale genomic experiments often include multiple timepoints, treatments, genotypes, and tissues. Array Studio has multiple modules, including ANOVA and General Linear Model, that can account for experimental design in identifying significantly up- or down-regulated genes in one or more tests. The output Inference Report contains fold-change and p-value for each test, and can be filtered by one or more columns to find significant genes from the specified test.

But what if you want to find the set of genes that were significantly different in any of the statistical tests, instead of one particular test? For example, what if you wanted to find a set of candidate gene expression markers for different cell types? Depending on how complex you want the filter logic to be, Array Studio has multiple methods to find interesting genes.

Example: Blueprint Gene Expression

The example dataset is RNA-seq count data from Blueprint. DESeq was used to look for differentially expressed genes across the Tissue column, with all ten pairwise comparisons between Cord Blood, Bone Marrow, Thymus, Tonsil, and Venous Blood.
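As a quick check on the number of tests, the pairwise comparisons between five tissues can be enumerated in a few lines of Python. This is only an illustrative sketch outside Array Studio; the direction of each comparison (which tissue is the reference) is arbitrary here and may differ from the labels in the actual Inference Report.

```python
from itertools import combinations

# Tissue groups named in the example dataset
tissues = ["Cord Blood", "Bone Marrow", "Thymus", "Tonsil", "Venous Blood"]

# Each unordered pair of tissues corresponds to one statistical test
tests = [f"{a} vs {b}" for a, b in combinations(tissues, 2)]

for name in tests:
    print(name)
print(len(tests))  # 10
```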

FilterSignificantGenes VolcanoPlot.png

Using Lists as Filters

Although -Omic data, Table data, and Inference Reports can be permanently subset by observation and variable, it is usually more convenient to "hide" data you aren't interested in, by filtering.

One convenient way to filter is by a List object, which is just a list of items (such as sampleIDs, gene/probeset names, etc).

List Example.png

Lists can be applied to filter columns (e.g. the GeneID column) to only show GeneIDs in your list.

List UseAsFilter.png
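Conceptually, applying a List as a filter is a membership test on the filtered column. The sketch below illustrates the idea in plain Python; it is not Array Studio code, and the gene IDs, column names, and values are made up for illustration.

```python
# Hypothetical Inference Report rows (invented gene IDs and p-values)
report = [
    {"GeneID": "GENE_A", "pvalue": 0.001},
    {"GeneID": "GENE_B", "pvalue": 0.300},
    {"GeneID": "GENE_C", "pvalue": 0.020},
]

# A List is just a collection of identifiers
gene_list = {"GENE_A", "GENE_C"}

# Filtering hides rows whose GeneID is not in the list; the report itself is unchanged
visible_rows = [row for row in report if row["GeneID"] in gene_list]

print([row["GeneID"] for row in visible_rows])  # ['GENE_A', 'GENE_C']
```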

The examples below demonstrate ways to generate lists of the genes passing one or more criteria, so that these lists can be used to look at these significant genes in various ways.

Filtering by matching any columns

By default, the Filter window in the View Controller uses AND logic, so that if filters are set for P-value < 0.05 in both Cord Blood vs Bone Marrow and Thymus vs Bone Marrow, only variables that have P-value < 0.05 for BOTH tests will be displayed:

FilterByPvalue AndLogic.png

However, this AND logic can be switched to OR logic by clicking on the Options button in the Row filter tab, and selecting Match Any Column:

Filter LogicChoices.png

Now, the Volcano plot shows all genes that are significantly different in Cord Blood vs Bone Marrow OR Thymus vs Bone Marrow.

FilterByPvalue OrLogic.png
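The difference between the default AND logic and Match Any Column can be sketched in plain Python. This is an illustration of the logic only, not how Array Studio implements it; the gene IDs, test keys, and p-values are hypothetical.

```python
P_CUTOFF = 0.05

# Hypothetical p-values per gene for two tests
pvalues = {
    "GENE_A": {"CordBlood_vs_BoneMarrow": 0.01, "Thymus_vs_BoneMarrow": 0.20},
    "GENE_B": {"CordBlood_vs_BoneMarrow": 0.03, "Thymus_vs_BoneMarrow": 0.002},
    "GENE_C": {"CordBlood_vs_BoneMarrow": 0.50, "Thymus_vs_BoneMarrow": 0.04},
    "GENE_D": {"CordBlood_vs_BoneMarrow": 0.60, "Thymus_vs_BoneMarrow": 0.70},
}

def passing_genes(table, match_any):
    """Return genes passing the cutoff in any test (OR) or in every test (AND)."""
    combine = any if match_any else all
    return sorted(
        gene
        for gene, tests in table.items()
        if combine(p < P_CUTOFF for p in tests.values())
    )

and_genes = passing_genes(pvalues, match_any=False)
or_genes = passing_genes(pvalues, match_any=True)

print(and_genes)  # ['GENE_B']
print(or_genes)   # ['GENE_A', 'GENE_B', 'GENE_C']
```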

A list of these genes can be generated by right-clicking List in the Solution Explorer, then selecting Add List from Visible Rows.


Combining Lists to build complex logical filters

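If you are interested in more complex logic, such as (P-value < 0.05 AND Fold-change > 2) in Cord Blood vs Bone Marrow OR in Thymus vs Bone Marrow, simply matching ANY filter column will not work, because a gene that passes only one of the four column filters (for example, the P-value cutoff but not the fold-change cutoff) would still be displayed.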

Instead, you can build a list of genes that pass both the fold-change and P-value criteria in the first test.

Then, clear the filters, and build a list of genes that pass both the fold-change and P-value criteria in the second test.

Finally, combine the two lists, by right-clicking List and selecting Manage Lists:

ManageLists Menu.png

Select the lists to combine, select Make List from Union, and click Perform operation, to create a new list with genes from either list:

ManageLists Window.png

The "Union" list contains all genes that passed the criteria P-value <.05 AND Fold-change > 2 in either test. In this way, multiple tests can be filtered for specific criteria, and genes that pass these tests can be grouped for further study.
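The same two-step logic (AND within each test, then a union across tests) can be sketched with Python sets. This is an illustration only, not Array Studio code; the gene IDs and values are invented. Note that GENE_C would slip through a simple match-any-column filter because its P-value passes in both tests, but it is correctly excluded here because its fold-change never exceeds 2.

```python
# Hypothetical results: (test, gene, fold-change, p-value)
results = [
    ("Cord Blood vs Bone Marrow", "GENE_A", 3.1, 0.010),
    ("Cord Blood vs Bone Marrow", "GENE_B", 2.5, 0.200),
    ("Cord Blood vs Bone Marrow", "GENE_C", 1.5, 0.010),
    ("Thymus vs Bone Marrow", "GENE_A", 1.2, 0.300),
    ("Thymus vs Bone Marrow", "GENE_B", 4.0, 0.001),
    ("Thymus vs Bone Marrow", "GENE_C", 1.1, 0.040),
]

def significant_genes(rows, test_name):
    """Genes passing BOTH criteria (P-value < 0.05 AND fold-change > 2) in one test."""
    return {
        gene
        for test, gene, fold_change, pvalue in rows
        if test == test_name and pvalue < 0.05 and fold_change > 2
    }

list_1 = significant_genes(results, "Cord Blood vs Bone Marrow")
list_2 = significant_genes(results, "Thymus vs Bone Marrow")

# Equivalent of Make List from Union: genes present in either list
union_list = list_1 | list_2

print(sorted(union_list))  # ['GENE_A', 'GENE_B']
```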

Summarize Inference Reports to generate significant lists

Another way to generate a set of lists for complex queries is the Summarize Inference Report module (see SummarizeInferenceReport.pdf).

For example, if you wanted to find the list of genes passing the criteria P-value <.05 AND Fold-change > 2 for any of the ten tests, you could either set up the filter logic for each test, and save a separate list each time, or simply click MicroArray | Inference | Summarize Inference Report:

Microarray SummarizeInferenceReport Menu.png

Select the Estimates (tests) of interest (e.g. All of them), specify the query (e.g. Fold change >2, Raw PValue <.05), and click Add, then Submit:

Microarray SummarizeInferenceReport Window.png

Clicking on any row of the resulting table will display the passing genes in the Details Window, and selecting all rows of the table will display the union of those individual results:

Microarray SummarizeInferenceReport Results.png

As before, create a list from selected rows to have a list of any gene that was significantly up-regulated in any of the tests.

Test Case

This test case rapidly generates a list of significantly up- and down-regulated genes, using the cutoff (log2 fold-change (Estimate) < -5 OR > 5) AND P-value < 0.0001 (highlighted below):

VolcanoPlot UpDownGenes.png

Summarize Inference Report can be set up like this:

Microarray SummarizeInferenceReport UpDownGenes Window.png

This generates a union list of ~2000 genes that are significantly up- or down-regulated in at least one test.
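The query used in this test case can be sketched in Python as one set of passing genes per test, followed by a union across all tests. This is a minimal illustration of the logic under assumed data, not the Summarize Inference Report implementation; the gene IDs, estimates, and p-values are invented, and only two of the tests are shown.

```python
from collections import defaultdict

LOG2FC_CUTOFF = 5.0   # Estimate < -5 or > 5
P_CUTOFF = 1e-4       # P-value < 0.0001

# Hypothetical results: (test, gene, log2 fold-change estimate, p-value)
results = [
    ("Cord Blood vs Bone Marrow", "GENE_A", 6.2, 1e-6),   # up, passes
    ("Cord Blood vs Bone Marrow", "GENE_B", -5.8, 2e-5),  # down, passes
    ("Cord Blood vs Bone Marrow", "GENE_C", 7.0, 1e-2),   # fails the p-value cutoff
    ("Thymus vs Bone Marrow", "GENE_A", 1.0, 0.5),        # fails both cutoffs
    ("Thymus vs Bone Marrow", "GENE_C", -4.9, 1e-8),      # fails the fold-change cutoff
    ("Thymus vs Bone Marrow", "GENE_D", 5.5, 5e-5),       # up, passes
]

# One set of passing genes per test, like one row per test in a summary table
passing_by_test = defaultdict(set)
for test, gene, estimate, pvalue in results:
    if abs(estimate) > LOG2FC_CUTOFF and pvalue < P_CUTOFF:
        passing_by_test[test].add(gene)

# Genes significantly up or down in at least one test
union_list = set().union(*passing_by_test.values())

for test, genes in passing_by_test.items():
    print(test, sorted(genes))
print(sorted(union_list))  # ['GENE_A', 'GENE_B', 'GENE_D']
```

Using the absolute value of the estimate covers both the up- and down-regulated directions with a single comparison.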

This list can be used, for example, to filter the input genes for Hierarchical Clustering:

Microarray HC UpDownGenes Window.png

The resulting clustering shows the significant gene expression patterns within each tissue:

Microarray HC UpDownGenes Results.png

Tonsil, bone marrow, and thymus all cluster, while some cord blood samples have gene signatures more closely matching venous blood.
