Introduction to BeatAML Land

From Array Suite Wiki

BeatAML_B37 and BeatAML_B38

The BeatAML project is an effort to investigate in depth the various genetic classes of AML that have recently been discovered. OmicSoft's BeatAML_B37/B38 Land release provides analysis and visualization of DNA somatic mutations, mRNA expression, and other data types for 672 tumor specimens collected from 562 acute myeloid leukemia patients.

These data can also link pharmacologic vulnerabilities to genomic and expression patterns through Land Measurement Queries. The drug response measurement data is located here:

Land Version    Genome Build    Gene Model
BeatAML_B37     Human.B37.3     OmicsoftGene20130723
BeatAML_B38     Human.B38       OmicsoftGenCode.V24

Data Source: Vizome

Data Types

   DNASeq Mutation
   DNASeq Somatic Mutation
   RNA-Seq, including:
       Single-end and Paired-end fusion calling
       RNA-Seq somatic mutation, from matched tumor/normal pairs
       Exon Junction and Exon Usage
       Expression (gene- and transcript-level quantification)

Laboratory Methods

   Illumina HiSeq RNA sequencing (HiSeq 2500)
   Illumina Nextera RapidCapture Exome capture probe sequencing

A note on Subject ID labeling

  • SubjectID in BeatAML Land corresponds to LLS_PatientID in the source data
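Because SubjectID and LLS_PatientID hold the same value, fields from the source data can be attached to Land samples by matching the two columns. The Python sketch below is a minimal illustration, assuming the source table has one row per patient (dicts keyed by "LLS_PatientID") and that Land sample metadata rows are dicts with a "SubjectID" key; the function names, the "Sex" field, and all IDs are hypothetical. Since the 672 specimens come from 562 patients, several samples can share one SubjectID, so the join is many-to-one.

```python
# Hypothetical sketch: join patient-level source data onto Land samples.
# Assumes SubjectID (Land) == LLS_PatientID (source data), as noted above.


def index_by_patient(source_rows):
    """Re-key patient-level source rows by their LLS_PatientID."""
    return {row["LLS_PatientID"]: row for row in source_rows}


def attach_source_fields(land_rows, source_rows, fields):
    """Copy the requested source fields onto each Land sample row.

    Several samples may share a SubjectID (one patient, multiple specimens),
    so one source row can be copied to more than one sample.
    Samples with no matching patient get None for every requested field.
    """
    by_patient = index_by_patient(source_rows)
    merged = []
    for sample in land_rows:
        src = by_patient.get(sample["SubjectID"], {})
        merged.append({**sample, **{f: src.get(f) for f in fields}})
    return merged


# Example with made-up IDs: two specimens from patient P1, one from P2.
land = [
    {"SampleID": "S1", "SubjectID": "P1"},
    {"SampleID": "S2", "SubjectID": "P1"},
    {"SampleID": "S3", "SubjectID": "P2"},
]
source = [{"LLS_PatientID": "P1", "Sex": "F"}]
print([row["Sex"] for row in attach_source_fields(land, source, ["Sex"])])
# prints ['F', 'F', None]
```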

Processing Methods

RNA-Seq data: OmicScript RNAseq Pipeline and Building Lands From RNA-Seq Data

OmicSoft does not reprocess other genomic data; these data are extracted directly from the original datasets.

Key Meta Data Columns

   DiseaseState: The type of leukemia the patient was diagnosed with.
   DiseaseStage: Indicates whether the specimen was obtained at the time of relapsed or de novo disease, or whether the patient transformed from another heme malignancy before or at the time of specimen collection. Three options are available: isRelapse|isDenovo|isTransformed; "NA" means the subject was false for all three.
   Histology: Histological types of cancer, such as carcinoma, glioma and sarcoma.
   Tissue: The tissue from which the sample was derived, using OmicSoft's curation Controlled Vocabulary
   Sample Type: A detailed description of the cell type from which the sample was derived, using OmicSoft's curation Controlled Vocabulary
   Tumor or Normal: Indicates whether the sample is a tumor or a normal sample.
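The DiseaseStage column merges three source flags into one value. The sketch below illustrates that merge under stated assumptions: the flags may arrive as booleans or as strings such as "TRUE"/"FALSE", and when more than one flag is true the names are joined with "|" in column order. The joining rule for multiple true flags is an assumption, not documented behavior; "NA" for all-false follows the description above.

```python
# Hypothetical sketch of the DiseaseStage merge described above.
# Assumptions: the source flags may be booleans or strings such as "TRUE"/"FALSE",
# and when more than one flag is true the names are joined with "|".

FLAGS = ("isRelapse", "isDenovo", "isTransformed")


def is_true(value):
    """Interpret a boolean or a true/yes-like string as True; anything else is False."""
    if isinstance(value, bool):
        return value
    return str(value).strip().lower() in {"true", "yes", "y", "1"}


def disease_stage(record):
    """Return the names of the true flags joined by "|", or "NA" if none is true."""
    true_flags = [name for name in FLAGS if is_true(record.get(name))]
    return "|".join(true_flags) if true_flags else "NA"


print(disease_stage({"isRelapse": "TRUE", "isDenovo": "FALSE", "isTransformed": "FALSE"}))
# prints isRelapse
print(disease_stage({"isRelapse": False, "isDenovo": False, "isTransformed": False}))
# prints NA
```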

Note: "Unknown" values have been defined by the study authors as "Unknown = not enough information to determine classification"

Primary Grouping: Disease State

Sample Distribution by DiseaseState


Key Views

Gene Expression

One of the most common ways to visualize gene expression data is a per-sample Scatter plot (e.g. Gene FPKM), with each sample grouped by DiseaseState on the Y-axis, and expression level plotted on the X-axis:

BeatAML B37 RNASeqView.png

Additional Views include transcript-level and exon-level views, pairwise comparison plots, and direct visualization of RNAseq coverage with the OmicSoft Genome Browser.

DNA Mutation

Whole exome DNA sequencing was performed on all samples. Multiple visualizations, including the Mutation Landscape View, display the frequency and location of gene mutations in BeatAML samples. Many individual genes carry no DNA mutations in AML; for these genes, the view displays the message "No data is available for charting" because there are no deviations from the wild-type sequence.

To filter down to samples containing only the data type of interest, use the Data filters in the Sample Metadata window.

BeatAML filters.png

The numeric "Data" filters let you filter samples based on characteristics of the searched gene. For example, the RNAseq expression filter hides any sample that does not pass the defined threshold (on a linear scale). Similarly, the DNAseq mutation filter removes any sample that does not have the requested number of mutations in a given gene. You can use these filters to clean up a plot with a few outliers, or to plot expression by the number of mutations in the gene of interest. In effect, they act as a quicker alternative to an omic data query for simple tasks.
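The logic of the two Data filters can be sketched as simple predicates over per-sample values for the searched gene. This is a minimal illustration, not OmicSoft's implementation; it assumes expression values are on a linear scale (e.g. FPKM rather than log2), that "passing" a threshold means greater than or equal to it, and that a sample with no mutation entry has zero mutations. All function names, sample IDs, and values are hypothetical.

```python
from collections import defaultdict

# Hypothetical sketch of the two Data filters described above.
# Assumptions: expression values are on a linear scale (e.g. FPKM, not log2),
# "passing" a threshold means >= threshold, and a sample with no mutation
# entry for the gene has 0 mutations.


def expression_filter(samples, expr, threshold):
    """Keep samples whose linear expression of the searched gene is >= threshold."""
    return [s for s in samples if s in expr and expr[s] >= threshold]


def mutation_filter(samples, mutation_counts, min_mutations):
    """Keep samples with at least min_mutations mutations in the searched gene."""
    return [s for s in samples if mutation_counts.get(s, 0) >= min_mutations]


def expression_by_mutation_count(samples, expr, mutation_counts):
    """Group expression values by the number of mutations in the gene."""
    groups = defaultdict(list)
    for s in samples:
        if s in expr:
            groups[mutation_counts.get(s, 0)].append(expr[s])
    return dict(groups)


samples = ["A", "B", "C"]
fpkm = {"A": 5.0, "B": 0.5, "C": 12.0}  # made-up values
mutations = {"A": 1, "C": 2}            # made-up counts; B has none
print(expression_filter(samples, fpkm, 1.0))  # prints ['A', 'C']
print(mutation_filter(samples, mutations, 2))  # prints ['C']
```

Because the expression threshold is linear, a cutoff you think of in log2 units must be converted first; for example, a log2 cutoff of 3 corresponds to a linear threshold of 2**3 = 8.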

DNAseq NPM1 DiseaseStage.png

Ex Vivo Drug Sensitivity Data

A key strength of the BeatAML dataset is the collection of drug sensitivity measurements performed on ex-vivo samples from the patients. This allows discovery of correlations between mutation status, expression patterns, and drug sensitivity.

The OmicSoft Server administrator will first need to add the drug measurement data as an orthogonal data type, which will make the data available for analysis.

BeatAML AddMeasurementData.png

After this, you can search for any drug ID in the same way you would search for a gene ID, and plot the sensitivity of each sample:

BeatAML IbrutinibSensitivity AUC.png

With the measurement data, you can identify gene mutations or expression patterns that correlate with drug sensitivity, using tools such as Measurement to Expression/Measurement to Mutation integration, or by defining cohorts with Omic data queries and using Sample Grouping to Expression/Sample Grouping to Mutation.
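As a rough illustration of what such an integration computes, the sketch below compares per-sample drug AUC values between mutant and wild-type samples with a Welch t statistic, and correlates expression with AUC using a Pearson coefficient. These are generic statistics chosen for illustration; they are not necessarily the methods used by the Measurement to Mutation or Measurement to Expression tools. All function names, sample IDs, and AUC values are hypothetical.

```python
import math
from statistics import mean, variance

# Hypothetical sketch: relate ex vivo drug AUC to mutation status or expression.
# This is a generic statistical comparison, not necessarily the method used by
# OmicSoft's Measurement to Mutation / Measurement to Expression tools.


def split_by_mutation(auc, mutated):
    """Split per-sample AUC values into mutant and wild-type groups for one gene."""
    mut = [v for s, v in auc.items() if s in mutated]
    wt = [v for s, v in auc.items() if s not in mutated]
    return mut, wt


def welch_t(a, b):
    """Welch's t statistic for mean(a) - mean(b); each group needs >= 2 values."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se


def pearson(xs, ys):
    """Pearson correlation between paired values, e.g. expression vs. AUC."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


auc = {"S1": 120.0, "S2": 110.0, "S3": 250.0, "S4": 240.0, "S5": 260.0}  # made-up
mutant, wild_type = split_by_mutation(auc, mutated={"S1", "S2"})
print(round(welch_t(mutant, wild_type), 2))
```

A negative t statistic here means the mutant group has lower AUC values; whether that reflects greater sensitivity depends on how the AUC is defined for the assay.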

BeatAML IbrutinibVsMutationsWindow.png

BeatAML IbrutinibVsMutations.png

Where do I find the data presented in the paper?

  • Disease_type (Figure 3) - a term used in the paper, but not in the official BeatAML metadata downloads. The most likely definition of this term, as referenced in BeatAML Land, is DiseaseStage, which was merged from a combination of three columns in the BeatAML metadata: isRelapse|isDenovo|isTransformed
  • Cytogenetics (Figure 3) - this information is contained in the WHO_Fusion column, located in the Clinical Data metadata table.

Additional Notes

Drug response measurement data for 119 samples whose IDs were not listed in the BeatAML metadata table are included in a separate table here:

Benchmark Paper

Other terms for searching: beat aml, beataml