TCGA Land Content: Comparisons by Mutation Status

From Array Suite Wiki


The 2018Q1 update to the TCGA_B37 land (and the 2018Q2 update to the TCGA_B38 land) adds 4,789 RNA-seq comparisons (see the ComparisonLand wiki page for more details). Beyond the previous "Tumor vs. Normal" comparisons within each tumor type, TCGA_B37 samples are now also stratified by (1) DNA-seq mutation status (across commonly mutated genes in cancer, such as those found in the QIAseq Targeted DNA Panels) and (2) clinical variables related to cancer diagnosis or prognosis (such as MSI Status and HPV Status).

TCGA Comparison Distribution.png

Processing Pipeline

1. Gene Selection

To facilitate gene selection, we used the gene lists profiled on seven QIAseq Targeted DNA Panels. We then assigned each TCGA tumor type to one or more relevant QIAseq Targeted DNA Panels:

Gene Panel                      Tumor Types Profiled        Number of Genes
Breast Cancer 96 Panel          BRCA                        93
Colorectal Cancer               COAD;READ                   73
Myeloid Neoplasms Panel         LAML                        142
Lung Cancer Panel               LUAD;LUSC                   72
Actionable Solid Tumor Panel    BRCA;CESC;CHOL;COAD;ESCA
Comprehensive Cancer Panel      ACC;BLCA;BRCA;CESC;CHOL

For each gene assayed in a QIAseq panel above (381 genes), mutation status in relevant tumors was used to set up statistical contrasts.
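
The pairing of tumor types with panel genes can be illustrated with a minimal Python sketch. The panel-to-tumor assignments are copied from the complete rows of the table above; the gene lists are hypothetical, heavily truncated placeholders (the real panels contain far more genes), and the function name candidate_contrasts is an assumption made for this example.

```python
# Panel-to-tumor assignments copied from the complete rows of the table above.
PANEL_TUMORS = {
    "Breast Cancer 96 Panel": ["BRCA"],
    "Colorectal Cancer": ["COAD", "READ"],
    "Myeloid Neoplasms Panel": ["LAML"],
    "Lung Cancer Panel": ["LUAD", "LUSC"],
}

# Hypothetical, truncated gene lists used only for illustration.
PANEL_GENES = {
    "Breast Cancer 96 Panel": ["TP53", "BRCA1"],
    "Colorectal Cancer": ["TP53", "APC"],
    "Myeloid Neoplasms Panel": ["FLT3"],
    "Lung Cancer Panel": ["TP53", "EGFR"],
}


def candidate_contrasts(panel_tumors, panel_genes):
    """Return the sorted, de-duplicated (tumor_type, gene) pairs to consider."""
    pairs = set()  # a set, because one tumor type may be assigned to several panels
    for panel, tumors in panel_tumors.items():
        for tumor in tumors:
            for gene in panel_genes.get(panel, []):
                pairs.add((tumor, gene))
    return sorted(pairs)


pairs = candidate_contrasts(PANEL_TUMORS, PANEL_GENES)
for tumor, gene in pairs:
    print(tumor, gene)
```

Each resulting pair is only a candidate: whether it becomes an actual comparison depends on the sample-level filters described in step 3.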

2. Generate Gene Mutation Status

Next, we used the Omic Data Query module to retrieve each sample's mutation status. The DnaSeq_SomaticMutation Land Query searches the WXS (whole-exome sequencing) DnaSeq_SomaticMutation data for every TCGA sample and reports the mutation status as WT or MUT for each gene specified in the query. In this way, we can report the mutation status in TCGA somatic mutation data for each gene assayed in QIAGEN's Targeted DNA panels.

Our Land Query utilized the default DnaSeq_SomaticMutation search parameters. Below is an example of the output of a DnaSeq_SomaticMutation Land Query on BRCA samples, searching genes included in the BRCA1 and BRCA2 QIAseq Targeted Gene Panel.

BRCA OmicQuery DnaSeqMutStatus.png
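
The shape of the query output (one WT/MUT call per sample and gene) can be mimicked with a short Python sketch. This is not the Omic Data Query module itself: the function name mutation_status, the sample IDs, and the rule "any reported somatic mutation counts as MUT" are assumptions for illustration, since the default search parameters are not spelled out here.

```python
def mutation_status(samples, genes, mutation_calls):
    """Return {sample: {gene: "MUT" or "WT"}}.

    samples        -- samples that have DNA-seq somatic mutation data
    genes          -- genes named in the query
    mutation_calls -- iterable of (sample, gene) pairs, one per reported mutation
    """
    mutated = set(mutation_calls)  # repeated calls for one gene collapse to one
    return {
        sample: {gene: "MUT" if (sample, gene) in mutated else "WT" for gene in genes}
        for sample in samples
    }


# Hypothetical sample IDs and calls.
samples = ["SAMPLE_1", "SAMPLE_2", "SAMPLE_3"]
genes = ["BRCA1", "BRCA2"]
calls = [("SAMPLE_1", "BRCA1"), ("SAMPLE_3", "BRCA2"), ("SAMPLE_3", "BRCA2")]

status = mutation_status(samples, genes, calls)
for sample in samples:
    print(sample, status[sample])
```

Note that a sample is labeled WT here only if it appears in the list of samples with DNA-seq data; samples without somatic mutation data are left out rather than treated as wild type.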

3. Generate Sample Groups for Each Tumor and Profiled Gene

To set up each statistical comparison, samples within a TumorType were assigned to Case or Control based on MUT/WT status. Potential comparisons were then filtered on the following criteria:

  1. A tumor type is considered only for the genes of a QIAseq panel to which it was assigned
  2. Samples must have RNA-seq data and DNA-seq somatic mutation status
  3. At least two samples must have Case status, and at least two samples must have Control status

Because of these criteria, not all tumors will have comparisons for all genes in a panel.
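
Criteria 2 and 3 can be sketched in Python as follows. The mapping MUT = Case and WT = Control is an assumption (the text only states that assignment is based on MUT/WT status), and the function name build_groups and the sample IDs are hypothetical.

```python
def build_groups(status_by_sample, rnaseq_samples, min_per_group=2):
    """Build Case/Control groups for one gene in one tumor type.

    status_by_sample -- {sample: "MUT" or "WT"}; only samples with DNA-seq
                        somatic mutation status appear as keys
    rnaseq_samples   -- set of samples that have RNA-seq data
    Returns None when either group has fewer than min_per_group samples.
    """
    # Criterion 2: keep samples that have both data types.
    eligible = {s: st for s, st in status_by_sample.items() if s in rnaseq_samples}
    # Assumed mapping: MUT -> Case, WT -> Control.
    case = sorted(s for s, st in eligible.items() if st == "MUT")
    control = sorted(s for s, st in eligible.items() if st == "WT")
    # Criterion 3: at least two samples per group.
    if len(case) < min_per_group or len(control) < min_per_group:
        return None
    return {"Case": case, "Control": control}


status = {"S1": "MUT", "S2": "MUT", "S3": "WT", "S4": "WT", "S5": "WT"}

groups = build_groups(status, rnaseq_samples={"S1", "S2", "S3", "S4"})
too_few = build_groups(status, rnaseq_samples={"S1", "S3", "S4"})
print(groups)
print(too_few)
```

In the second call only one mutated sample has RNA-seq data, so no comparison is produced, which is how a gene can end up without a comparison in a given tumor type.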

Generating Sample Groups for Clinical Covariates

Similar rules were used for establishing sample groups to compare on clinical covariate status:

  1. Samples must have RNA-seq expression and DNA-seq somatic mutation status data
  2. At least two samples must have Case status, and at least two samples must have Control status

In cases where a clinical covariate has more than two levels (e.g. Therapy Outcome [Primary] has "Complete Remission/Response", "Partial Remission/Response", "Progressive Disease", and "Stable Disease"), one level will be selected as Control (e.g. "Complete Remission/Response"), and contrasts will be set up against each of the remaining levels that pass the above criteria.
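
A minimal Python sketch of the multi-level case, assuming the Control level is supplied by the caller (how that level is chosen is not described here). The function name covariate_contrasts and the sample IDs are hypothetical; the level names come from the Therapy Outcome example above.

```python
def covariate_contrasts(level_by_sample, control_level, min_per_group=2):
    """Return (case_level, control_level) pairs that pass the size filter.

    level_by_sample -- {sample: covariate level}; assumed to contain only
                       samples that already have the required data types
    """
    groups = {}
    for sample, level in level_by_sample.items():
        groups.setdefault(level, []).append(sample)

    # Without enough Control samples, no contrast can be set up.
    if len(groups.get(control_level, [])) < min_per_group:
        return []

    contrasts = []
    for level in sorted(groups):
        if level != control_level and len(groups[level]) >= min_per_group:
            contrasts.append((level, control_level))
    return contrasts


CR = "Complete Remission/Response"
outcome = {
    "S1": CR, "S2": CR, "S3": CR,
    "S4": "Partial Remission/Response", "S5": "Partial Remission/Response",
    "S6": "Progressive Disease",
    "S7": "Stable Disease", "S8": "Stable Disease",
}

contrasts = covariate_contrasts(outcome, control_level=CR)
for case_level, ctrl_level in contrasts:
    print(case_level, "vs.", ctrl_level)
```

Here "Progressive Disease" has a single sample, so it is dropped while the other two levels are each contrasted against the Control level.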

4. Compare Global Gene Expression Between 'WT' and 'MUT' Sample Groups

Once samples are partitioned into WT and MUT for each profiled gene, global gene expression is compared between the two groups using Voom. In the example below, 1,002 BRCA samples are divided into two groups based on TP53 mutation status. Genes whose expression differs significantly between the two groups are associated with TP53 mutation status and are candidates for being affected by the mutation; the comparison alone does not establish causation.

BRCA TP53 Comparison Schematic.png
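
To give a rough feel for the comparison, the sketch below computes a voom-style log2 counts-per-million value for each gene and a simple MUT-minus-WT difference of group means. It is a deliberate simplification and not the pipeline used for the land: Voom as implemented in the limma R package additionally estimates precision weights and fits a linear model with moderated statistics, none of which is reproduced here. The gene names, sample IDs, counts, and function names are hypothetical.

```python
import math


def log_cpm(counts):
    """log2 counts-per-million for one sample, with the offsets used by the
    voom transformation: log2((count + 0.5) / (library_size + 1) * 1e6)."""
    lib_size = sum(counts.values())
    return {g: math.log2((c + 0.5) / (lib_size + 1.0) * 1e6) for g, c in counts.items()}


def log_fold_changes(counts_by_sample, case, control):
    """Mean log-CPM in Case minus mean log-CPM in Control, per gene."""
    logged = {s: log_cpm(c) for s, c in counts_by_sample.items()}
    genes = list(next(iter(logged.values())))

    def group_mean(samples, gene):
        return sum(logged[s][gene] for s in samples) / len(samples)

    return {g: group_mean(case, g) - group_mean(control, g) for g in genes}


# Hypothetical raw counts: two MUT (Case) and two WT (Control) samples.
counts_by_sample = {
    "MUT_1": {"GENE_A": 400, "GENE_B": 100},
    "MUT_2": {"GENE_A": 400, "GENE_B": 100},
    "WT_1": {"GENE_A": 100, "GENE_B": 100},
    "WT_2": {"GENE_A": 100, "GENE_B": 100},
}

lfc = log_fold_changes(counts_by_sample, case=["MUT_1", "MUT_2"], control=["WT_1", "WT_2"])
for gene, value in lfc.items():
    print(gene, round(value, 3))
```

In this toy data GENE_A comes out higher in the MUT group, and GENE_B comes out lower even though its raw counts are identical, because counts are scaled by each sample's library size. A real analysis would attach a test statistic and multiple-testing correction to each gene rather than rely on the fold change alone.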
