Introduction to OncoGEO Land Content

From Array Suite Wiki


OncoGEO_B37 is a collection of oncology-related datasets from NCBI's GEO and SRA repositories, including cell line samples and normal controls. The collection also includes comparison data from projects that compare disease vs. normal, treatment vs. control, and similar contrasts, with statistical inference to identify differentially expressed genes. All data have been processed with the same genome build (Human.B37.3) and gene model (OmicsoftGene20130723).

Data Source


Data Types

  • RNA-Seq data
  • Microarray platforms (including Affymetrix and Illumina)
  • Copy number variation
  • Methylation450 BeadChip

Laboratory Methods

Affymetrix Expression Arrays

Illumina HiSeq RNA sequencing

Processing Methods

Expression Data: Omicsoft Affymetrix Microarray Preprocessing

RNA-Seq data: OmicScript Pipeline and Building Land From RNA-Seq Data

Key Meta Data Columns

OncoGEO is curated at the comparison, sample and project level, with hundreds of meta data columns available.

Comparison level:

  • Comparison Cutoffs: Sample size, fold change, p value and expression cutoffs for each comparison.
  • Comparison details: Comparison Category, Contrast, case and control sample IDs.
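As a rough illustration of how comparison-level cutoffs might be applied outside the Land, the sketch below filters a handful of made-up comparison records by sample size, fold change and p value. The `Comparison` class, its field names, the threshold values and the signed fold change convention are all assumptions for this example; they are not OncoGEO column names or OmicSoft defaults.

```python
from dataclasses import dataclass


@dataclass
class Comparison:
    # All field names are hypothetical, chosen only for this sketch.
    comparison_id: str
    gene: str
    fold_change: float  # assumed signed linear fold change (-2.0 = 2-fold down)
    p_value: float
    n_case: int
    n_control: int


def passes_cutoffs(c, min_abs_fc=2.0, max_p=0.05, min_samples=3):
    """Return True if the comparison meets every illustrative cutoff."""
    return (
        abs(c.fold_change) >= min_abs_fc
        and c.p_value <= max_p
        and min(c.n_case, c.n_control) >= min_samples
    )


# Invented records, not real OncoGEO results.
comparisons = [
    Comparison("cmp1", "IL8", 4.2, 0.001, 5, 5),
    Comparison("cmp2", "IL8", 1.3, 0.20, 6, 6),
    Comparison("cmp3", "IL8", -3.1, 0.01, 2, 4),
]

kept = [c.comparison_id for c in comparisons if passes_cutoffs(c)]
print(kept)
```

Here the third record is dropped only because one arm has fewer than three samples, which shows why a sample size cutoff is worth applying alongside fold change and p value.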

Sample level:

  • DiseaseCategory (controlled vocabulary) : Disease category of the sample, based on the detailed disease state.
  • TissueCategory (controlled vocabulary) : Tissue category such as skin, muscle, heart, kidney etc.
  • DiseaseState (controlled vocabulary) : Curated at sample level from each project.
  • SampleSource (controlled vocabulary) : Either cell type or tissue information. When a sample has cell type information, cell type is used. Otherwise, tissue category is used.
  • Land Tissue: The tissue from which the cell line was derived, using OmicSoft's curation Controlled Vocabulary
  • Land Sample Type: A detailed description of the cell type from which the cell line was derived, using OmicSoft's curation Controlled Vocabulary
  • Tumor or Normal: Indicates whether a sample is from a tumor or normal sample.
  • Survival days and event: When available, overall survival (OS) data are included, along with more specific survival measures such as:
    • DiseaseFreeSurvival
    • MetastatisFreeSurvival
    • ProgressionFreeSurvival
    • RecurrenceFreeSurvival
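The SampleSource rule described in the sample-level list (use cell type when it is available, otherwise fall back to tissue category) can be sketched as a small function. The function name and its arguments are hypothetical and only illustrate the fallback logic; they do not reflect how OmicSoft's curation is actually implemented.

```python
def sample_source(cell_type, tissue_category):
    """Prefer cell type when present; otherwise fall back to tissue category.

    Empty or whitespace-only cell type values are treated as missing,
    which is an assumption made for this sketch.
    """
    if cell_type is not None and cell_type.strip():
        return cell_type.strip()
    return tissue_category


print(sample_source("T cell", "blood"))
print(sample_source(None, "kidney"))
```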

Project level:

  • ProjectName: Name of the individual project from which the data originates.
  • TherapeuticArea: Specific clinical focus of individual project (can be multiple areas depending on project)

Key Views

Gene Expression

The most common way to explore the data is to examine gene expression levels across sample meta data or other genomic features.


Project View

OncoGEO Land is a collection of individual GEO projects. Experimental designs can differ between projects, and batch effects (in microarray projects, for example) are difficult to remove, so expression values are not always directly comparable across projects. OmicSoft therefore created project-specific views that display expression values according to the experimental design of each project.


IL8 gene expression grouped by subject treatment in project GSE3284.
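A project view of this kind amounts to grouping expression values by a design factor within a single project. The minimal sketch below does this for a list of (sample, group, value) tuples; the sample IDs, group labels and expression values are invented for illustration and are not taken from GSE3284 or any other project.

```python
from collections import defaultdict
from statistics import median


def summarize_by_group(records):
    """Return the median expression value for each design group."""
    groups = defaultdict(list)
    for _sample_id, group, value in records:
        groups[group].append(value)
    return {group: median(values) for group, values in groups.items()}


# Invented values for one hypothetical project and one gene.
records = [
    ("sample_1", "control", 5.0),
    ("sample_2", "control", 6.0),
    ("sample_3", "treated", 9.0),
    ("sample_4", "treated", 11.0),
]

print(summarize_by_group(records))
```

Keeping the summary inside one project avoids mixing values across projects, where batch effects would make the groups hard to compare.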

Comparison View

OncoGEO Land provides comparison views for projects with gene expression comparison results. By searching for a gene, users can visualize its association (fold change by p-value) with comparisons across projects, and interactively narrow down the results to find projects of interest. The comparison view is one of OmicSoft's highlight views, especially in OmicSoft DiseaseLand. For more details, please refer to: ComparisonLand


IL8 Treatment vs Control Comparison View.
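A fold change by p-value display is commonly drawn on log scales. The sketch below converts one comparison result into plotting coordinates, log2 fold change on one axis and -log10 p-value on the other. The function name is hypothetical, and the input convention (a signed linear fold change, where -2.0 means 2-fold down) is an assumption; the scaling used by the comparison view itself is not stated here.

```python
import math


def volcano_point(fold_change, p_value):
    """Map a signed linear fold change and p-value to (log2 FC, -log10 p)."""
    if fold_change == 0:
        raise ValueError("fold change must be non-zero")
    if not 0 < p_value <= 1:
        raise ValueError("p-value must be in (0, 1]")
    sign = 1.0 if fold_change > 0 else -1.0
    return sign * math.log2(abs(fold_change)), -math.log10(p_value)


x, y = volcano_point(4.0, 0.01)
print(x, y)
```

Points far from zero on the first axis and high on the second correspond to comparisons where the gene changes strongly and significantly.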
