OmicSoft Land Metadata Definitions

From Array Suite Wiki

Definitions for Common Land Metadata Columns


The OmicSoft Curation Team processes hundreds of disease-related projects every quarter. This work includes carefully categorizing every sample's metadata, so that you can find the data you are interested in.

Sample metadata are derived from metadata submitted to GEO, as well as from the source publication.

Every metadata column has a precise scope, but the logic can sometimes be unclear to a new user. This page provides definitions and examples for the most commonly used metadata columns.


  • Disease Category - Grouping of multiple related specific disease states (auto-generated from DiseaseState, Controlled Vocabulary)
  • Disease State - The specific disease for the defined subject (Controlled Vocabulary)
    • normal control - subject was healthy with no known disease (with or without treatment); wild-type animals or reporter animals, untreated or with short-term treatment, or long-term treatment of compound that does not induce disease; cells isolated from healthy subjects
    • disease control - tissue with normal pathology from a subject with unknown disease, or from deceased subject; biopsy from normal tissue that is not easily accessible (e.g. requiring surgery); non-wild-type animals with no known disease phenotype (e.g. transgene or targeted mutation); cell lines derived from the above
      • "easily" accessible biopsy tissues include skin, blood, hair, saliva, throat swabs, urine/fecal samples
    • general disease - human tissue from subject with a disease, but the specific disease is not specified; animal with a targeted mutation or treated with a set protocol to recapitulate a general disease pathology
    • genetic disease - subjects with a known genetic mutation but no specific disease
  • Sample Pathology - Within a tissue (such as skin from a patient with Psoriasis disease state), whether or not the sample was exhibiting the disease pathology (Controlled Vocabulary)

Controlled vocabulary terms from Human Disease Ontology, Monarch Disease Ontology, MeSH and Orphanet.


For oncology datasets only, the sample-level column "OncoSampleType" is curated to indicate the source of the sample. This column is useful for differentiating normal tissue from tumor samples from the same patient (identifiable by SubjectID).

Common normal samples include Bone Marrow Normal, Solid Tissue Normal, and Blood Derived Normal.

Common cancer samples include Primary Tumor, Metastatic, Cell Lines Tumor, Primary Blood Derived Cancer - Peripheral Blood, and Primary Blood Derived Cancer - Bone Marrow.
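As an illustration of how OncoSampleType and SubjectID can be combined, the following minimal sketch finds subjects that contributed both a normal and a tumor sample. It assumes sample metadata has been exported as a list of dictionaries keyed by column name. That representation and the function name `subjects_with_paired_samples` are hypothetical, and the value sets contain only the common values listed above, so they are not exhaustive.

```python
from collections import defaultdict

# Common values listed on this page; other OncoSampleType values may exist.
NORMAL_TYPES = {
    "Bone Marrow Normal",
    "Solid Tissue Normal",
    "Blood Derived Normal",
}
TUMOR_TYPES = {
    "Primary Tumor",
    "Metastatic",
    "Cell Lines Tumor",
    "Primary Blood Derived Cancer - Peripheral Blood",
    "Primary Blood Derived Cancer - Bone Marrow",
}


def subjects_with_paired_samples(samples):
    """Return sorted SubjectIDs that have both a normal and a tumor sample.

    `samples` is assumed to be an iterable of dicts with "SubjectID" and
    "OncoSampleType" keys (a hypothetical export format).
    """
    kinds_by_subject = defaultdict(set)
    for sample in samples:
        subject = sample.get("SubjectID")
        sample_type = sample.get("OncoSampleType")
        if subject is None:
            continue  # cannot pair samples without a subject identifier
        if sample_type in NORMAL_TYPES:
            kinds_by_subject[subject].add("normal")
        elif sample_type in TUMOR_TYPES:
            kinds_by_subject[subject].add("tumor")
    return sorted(
        subject
        for subject, kinds in kinds_by_subject.items()
        if kinds == {"normal", "tumor"}
    )
```

Subjects with only tumor samples, or only normal samples, are left out of the result.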


  • Tissue Category - General category that groups multiple tissues, automatically generated from Tissue curation (Controlled Vocabulary)
  • Tissue - The most specific tissue term describing where the sample was isolated (Controlled Vocabulary)
  • Cell Type - Identifies the cell type of the sample (Controlled Vocabulary)
  • Sample Source - The source of the biological material of the Sample, unmodified from GEO submission.

Controlled vocabulary terms from Uberon and BRENDA Tissue.


If a specific cell type was isolated and identified by the authors, it will be indicated here.

Controlled vocabulary terms from Cell Ontology, ImmGen, and source publications.


If a commercially-available cell line was used, the cell line will be defined here.

Controlled vocabulary terms from ATCC, Cell Line Ontology, and Cellosaurus.

Use DiseaseCategory and DiseaseState to filter cell lines derived from disease vs normal samples.


When a sample has cell type information, CellType is used. Otherwise, TissueCategory is used.


  • Treatment: For in vitro studies, describes the treatment on a sample (Controlled Vocabulary)
  • Subject Treatment: For in vivo studies, describes the treatment (Controlled Vocabulary).
    • If the same subject was sampled before and after treatment, Subject Treatment will be the same, but Treatment Status will indicate which sample is post-treatment
  • Treatment Status: Indicates an individual sample's treatment, if the sample came from a Subject (i.e. patient) that was sampled pre- and post-treatment (Controlled Vocabulary)
  • Pre Treatment: treatment given in vivo or in vitro before the main treatment. In mouse studies, the pre-treatment is often the disease induction model.
  • Maternal Treatment: in vivo treatment given to the mother prior to or during gestation. The sample is collected from the offspring.

Controlled vocabulary terms from PubChem, NCIT, DrugBank, ChemSpider, and the treatment source company's website.


  • Response: When a response to a particular treatment (in Treatment or SubjectTreatment) was observed as part of the experimental design, the response will be indicated in this field.
  • Response[Drug]: When a patient's response to a drug was noted in the clinical metadata, but was not relevant to the experimental design (i.e. not corresponding to Treatment or SubjectTreatment), the response will be indicated here.


A special column available in Land Comparisons metadata, "AllTreatment", summarizes the treatment columns in a single additional column. Because experimental models can be complicated, the logic for determining which information to include is as follows:

  a) If "TreatmentStatus" and "SubjectTreatment" appear in the same project, use "TreatmentStatus" in "AllTreatment".
  b) Use "SubjectTreatment" when there is no "TreatmentStatus".
  c) If "TreatmentStatus" and "Treatment" appear in the same project, use "TreatmentStatus" in "AllTreatment".
  d) Use "Treatment" when there is no "TreatmentStatus".
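The precedence rules for "AllTreatment" can be read as a simple ordered lookup. The sketch below is illustrative only and is not OmicSoft's implementation. The function name `all_treatment_source` and the idea of passing in the set of columns populated in a project are hypothetical. Two cases are not covered by the rules above and are treated here as assumptions: a project with only "TreatmentStatus" uses that column, and "SubjectTreatment" is checked before "Treatment" when both are present without "TreatmentStatus".

```python
# Order in which columns are considered for AllTreatment.
# TreatmentStatus wins whenever it is present (rules a and c);
# the relative order of the other two is an assumption.
PRECEDENCE = ("TreatmentStatus", "SubjectTreatment", "Treatment")


def all_treatment_source(populated_columns):
    """Return the name of the column that feeds AllTreatment for a project.

    `populated_columns` is an iterable of the treatment-related column
    names that are populated in the project. Returns None if none of
    the three treatment columns is present.
    """
    columns = set(populated_columns)
    for name in PRECEDENCE:
        if name in columns:
            return name
    return None


# Example: a project curated with both SubjectTreatment and TreatmentStatus
source = all_treatment_source({"SubjectTreatment", "TreatmentStatus"})
print(source)  # TreatmentStatus
```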

Disease and Relapse Status

RelapseStatus and DiseaseStatus both capture information about a sample's cancer relapse, but the former captures an event subsequent to the sample collection, whereas DiseaseStatus captures the status at the moment of sample collection.

  • DiseaseStatus – contains information about the cancer status of the patient at the moment of sample collection. Patients who had already relapsed at the moment of sample collection can be found by filtering for the ‘recurrent’, ‘local recurrent’, and ‘metastatic recurrent’ values.
  • RelapseStatus – contains information about whether the patient relapsed at follow-up, after the sample was collected. This contrasts with DiseaseStatus, which refers to a relapse that had already occurred before sample collection. A value of ‘yes’ means that relapse subsequently occurred; ‘no’ means that the patient did not show subsequent relapse.

Additional information is often captured in the RelapseFreeSurvival[RFS][event] column, which contains information about any recurrence (excluding deaths, which are censored). "with event" indicates that relapse occurred at the follow-up timepoint, and "no event" indicates no subsequent relapse.

  • OncoSampleType – contains information about the type of sample that was collected from the patient and analyzed. Relapsed samples can be found by filtering for ‘Recurrent Blood Derived Cancer - Bone Marrow’, ‘Recurrent Blood Derived Cancer - Peripheral Blood’, and ‘Recurrent Tumor’.
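The difference between relapse at collection and relapse after collection can be expressed as two separate filters. The following minimal sketch assumes each sample is a dictionary keyed by column name, which is a hypothetical export format. The function names are illustrative, and the value sets are limited to the values quoted on this page.

```python
# Values quoted on this page; they are assumed to match the curated
# vocabulary exactly, including capitalization.
RECURRENT_DISEASE_STATUS = {
    "recurrent",
    "local recurrent",
    "metastatic recurrent",
}
RECURRENT_ONCO_SAMPLE_TYPES = {
    "Recurrent Blood Derived Cancer - Bone Marrow",
    "Recurrent Blood Derived Cancer - Peripheral Blood",
    "Recurrent Tumor",
}


def relapsed_at_collection(sample):
    """True if the sample was taken from a patient who had already relapsed."""
    return (
        sample.get("DiseaseStatus") in RECURRENT_DISEASE_STATUS
        or sample.get("OncoSampleType") in RECURRENT_ONCO_SAMPLE_TYPES
    )


def relapsed_after_collection(sample):
    """True if relapse was recorded at follow-up, after sample collection."""
    return sample.get("RelapseStatus") == "yes"
```

Keeping the two checks separate mirrors the distinction between DiseaseStatus and RelapseStatus: a primary tumor sample can return False for the first check and True for the second.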


"Disease vs. Normal","Disease1 vs. Disease2","CellType1 vs. CellType2","Treatment vs. Control","Treatment1 vs. Treatment2","Responder vs. Non-Responder","Tissue1 vs. Tissue2","Healthy vs. Control","Other Comparisons" .

More details can be found here.