Molecular Signatures


The Molecular Signatures command lets the user run a type of “Gene Set Enrichment Analysis” using the Molecular Signatures Database from the Broad Institute. The command first takes any number of lists of significant probe sets and maps the probe sets in each list to a master symbol, for instance, Gene Symbol. The resulting lists of gene symbols are then compared to the gene sets in the Molecular Signatures Database, and a Fisher's exact test is run to generate a p-value for each gene set.
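
The mapping and comparison steps can be pictured with a minimal Python sketch. This is not Array Studio's implementation: the function name, the annotation dictionary, the probe identifiers, and the gene symbols are all hypothetical placeholders chosen for illustration.

```python
def map_probes_to_symbols(probe_sets, annotation):
    """Map probe set IDs to a set of unique gene symbols.

    Probes without an entry in the (hypothetical) annotation table are skipped.
    """
    symbols = set()
    for probe in probe_sets:
        symbol = annotation.get(probe)
        if symbol:
            symbols.add(symbol)
    return symbols


# Hypothetical annotation table: probe set ID -> Gene Symbol.
annotation = {
    "probe_1": "GENE_A",
    "probe_2": "GENE_B",
    "probe_3": "GENE_B",  # two probe sets can map to the same symbol
    "probe_4": "GENE_C",
}

# Hypothetical list of significant probe sets; "probe_9" has no annotation.
significant_probes = ["probe_1", "probe_2", "probe_3", "probe_9"]
gene_list = map_probes_to_symbols(significant_probes, annotation)

# Hypothetical gene set; the overlap is what the test is later based on.
gene_set = {"GENE_A", "GENE_B", "GENE_D"}
overlap = gene_list & gene_set
print(sorted(gene_list), sorted(overlap))
```

Collapsing to a set means that several probe sets for the same gene count once, which is an assumption of this sketch rather than documented behavior.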

There are several differences between this function and GSEA. The input to this function is a list (usually containing the variables the user is interested in). The main idea is that if a gene set is strongly associated with the list of variables, it should contain more of the variables in the list than expected by chance. In contrast, the input to GSEA is microarray data. For each known gene set, GSEA examines the expression of each gene in the set to see whether, overall, expression changes between phenotypes (such as treatment vs. control).
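
As a rough illustration of the list-based idea, the sketch below computes a one-sided Fisher's exact test p-value for the overlap between one list and one gene set. It assumes a one-sided (over-representation) test and a user-supplied background size; whether the module uses a one-sided or two-sided test, and how it defines the background, is not stated here. The function and parameter names are hypothetical.

```python
from math import comb


def fisher_enrichment_p(overlap, list_size, set_size, universe_size):
    """One-sided Fisher's exact test p-value for over-representation.

    Returns the hypergeometric probability of seeing at least `overlap`
    gene-set members in a list of `list_size` genes drawn from a background
    of `universe_size` genes, of which `set_size` belong to the gene set.
    """
    max_overlap = min(list_size, set_size)
    total = comb(universe_size, list_size)
    tail = sum(
        comb(set_size, k) * comb(universe_size - set_size, list_size - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / total


# Hypothetical numbers: 20 background genes, a gene set of 5, a list of 5,
# and 3 genes shared between the list and the gene set.
p_value = fisher_enrichment_p(overlap=3, list_size=5, set_size=5, universe_size=20)
print(p_value)
```

A smaller p-value means the overlap is larger than would be expected if the list were drawn at random from the background.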

To run this module, select MicroArray | Annotation | Molecular Signatures.

Input Data Requirements

This module works on -Omic data types with lists of variables.

General Options



  • Project & Data: The window includes a dropdown box to select the Project and -Omic Data object to be analyzed.
  • Lists of classification: Define which lists of variables in the specified project should be compared to the Molecular Signatures data set.
  • Output name: The user can choose to name the output data object.


  • Gene Set: The user can specify which GeneSet collection to use.
    • Build: Build a custom gene set by selecting a user-defined "buckets text file" containing 10 data columns. This file is described at the end of this document (Gsea.pdf#Example_Custom_Gene_Set_File).
      • Currently, only XML files downloaded from GSEA are supported; the table format in Custom Gene Set File will be updated soon.
      • If you build a custom gene set, the analysis must be run locally, unless an ArrayServer admin copies the ".buckets" file to the appropriate server location.
  • Organism: The user needs to choose the Organism of the study (some gene sets are organism-specific).
  • Map by annotation column: The user can specify which annotation column from the specified data's Annotation Table should be used for mapping the genes (usually Gene Symbol should be chosen).
  • Multiplicity: The user can specify the adjustment method to be used for the Fisher's exact test p-values.
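
To illustrate what a multiplicity adjustment does to the per-gene-set p-values, here is a minimal Python sketch of the Benjamini-Hochberg procedure. It is used only as a common example of an adjustment method; which methods the Multiplicity option actually offers is not stated here, and the function name and input values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values, one per gene set.
raw = [0.01, 0.04, 0.03, 0.5]
adjusted = benjamini_hochberg(raw)
print(adjusted)
```

Because one test is run per gene set, and a collection can contain many gene sets, adjusted values are usually the ones to rely on when ranking results.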

Output Results

An example result is shown below. Results include the Bucket (gene set name), [all] (the total number of genes in the gene set), the number of genes in the gene set that are included in each specified list with the corresponding p-value, the Alias of the gene set, Organism, URL, Chip, CategoryCode, Contributor, ContributorOrganization, Description, and Tags.



