ArrayLand Configuration Options

From Array Suite Wiki

Configuration file

Multiple Lands can be hosted on ArrayServer. The land data is stored within the LandDirectory that is specified in the ArrayServer.cfg.

For example, suppose ArrayServer.cfg contains the line:

LandDirectory=/mnt/Scratch/ArrayServer/LandDir/Test-08_LandDir

This directory will contain a series of subdirectories, one for each Land, with a Land configuration file (LandName.cfg) within each subdirectory:

[Image: Server LandDirectories.png]
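As a rough illustration of this layout, the following Python sketch reads the LandDirectory line from the text of an ArrayServer.cfg file and lists each Land subdirectory with the .cfg files it contains. The helper names read_land_directory and list_lands are hypothetical and not part of ArrayServer; the sketch assumes plain KEY=VALUE lines.

```python
from pathlib import Path


def read_land_directory(server_cfg_text):
    """Return the LandDirectory value from ArrayServer.cfg text, or None."""
    for line in server_cfg_text.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip() == "LandDirectory":
            return value.strip()
    return None


def list_lands(land_directory):
    """Map each Land subdirectory name to the .cfg files found inside it."""
    lands = {}
    for sub in sorted(Path(land_directory).iterdir()):
        if sub.is_dir():
            # "*.cfg" matches LandName.cfg but not LandName.cfg2
            lands[sub.name] = sorted(p.name for p in sub.glob("*.cfg"))
    return lands
```

Calling list_lands on the directory returned by read_land_directory gives a quick check that every Land subdirectory has its configuration file in place.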

Customizing Land Configurations (LandName.cfg2)

Land options can be configured in ArrayServer.cfg. However, we strongly recommend using the LandName.cfg file inside the land sub-directory for default configuration of individual Lands.

[Image: Server CCLEcfg.png]

Additional user configurations can also be added in a LandName.cfg2 file. Any parameter set in LandName.cfg2 overrides the same parameter in LandName.cfg.

Tip: Cloud-based Lands automatically overwrite LandName.cfg in your Server's Land directory with every update. Be sure to place all customizations in LandName.cfg2.
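The precedence between the two files can be pictured with a minimal Python sketch. It assumes both files hold simple KEY=VALUE lines; parse_cfg and effective_settings are hypothetical helper names, and the sample values are illustrative rather than taken from a real Land.

```python
def parse_cfg(text):
    """Parse KEY=VALUE lines into a dict, skipping blanks and section headers."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("[") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


def effective_settings(cfg_text, cfg2_text=""):
    """Combine LandName.cfg with LandName.cfg2; cfg2 values take precedence."""
    merged = parse_cfg(cfg_text)
    merged.update(parse_cfg(cfg2_text))
    return merged


# Illustrative contents only (assumed, not from a real Land)
base = "Name=CompanyLand_B37\nMaxGeneCount=500\nEnableBam=False\n"
custom = "EnableBam=True\nBamFolder=/ArrayServer/CompanyLand/bam\n"
settings = effective_settings(base, custom)
print(settings["EnableBam"])     # value supplied by the .cfg2 text
print(settings["MaxGeneCount"])  # untouched value from the .cfg text
```

Because customizations live only in the .cfg2 text, replacing the .cfg text during an update leaves them intact.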


Configuration Options

Parameter                            Required  Default
Name                                 Y         .
Version                              N         3
Description                          N         No description
Folder                               Y         Only required if used in ArrayServer.cfg; not used in LandName.cfg
ReferenceLibraryID                   Y         .
GeneModelID                          Y         .
MutationGeneModelID                  Y         .
SubjectIDColumn                      N         SubjectID
ClinicalJoinSampling                 N         True
SampleTypeColumn                     N         Sample Type
ControlSampleLevels                  N         Blood Derived Normal, Solid Tissue Normal, Bone Marrow Normal
AutoFilterColumn                     N         Same as PrimaryGrouping
AutoFilterLabel                      N         AutoFilterColumn column name
AutoFilterOffGrouping                N         Tumor Type
AutoFilterOnGrouping                 N         Sample Type
PrimaryGrouping                      N         Same as AutoFilterOffGrouping
SecondaryGrouping                    N         Same as AutoFilterOnGrouping
PrimaryProjectGrouping               N         TherapeuticArea
ComparisonsPrimaryGrouping           N
ComparisonsSecondaryGrouping         N
PrimaryAssociationGrouping           N
SecondaryAssociationGrouping         N
AutomaticGrouping                    N         Same as PrimaryGrouping
EnableBam                            N         FALSE
EnableBas                            N         FALSE
BamFolder                            N         ""
BasFolder                            N         ""
DnaSeqTargetBamFolder                N         ""
DnaSeqTargetBasFolder                N         ""
DnaSeqBamFolder                      N         ""
DnaSeqBasFolder                      N         ""
DnaSeqExomeBamFolder                 N         ""
DnaSeqExomeBasFolder                 N         ""
SingleEndFusionBamFolder             N         ""
PairedEndFusionBamFolder             N         ""
SingleEndFusionBasFolder             N         ""
PairedEndFusionBasFolder             N         ""
VirusBamFolder                       N         ""
VirusBasFolder                       N         ""
MicrobesBamFolder                    N         ""
MicrobesBasFolder                    N         ""
FunctionalAnnotationFiles            N         ""
VariantClassifiers                   N         ""
CosmicFile                           N         ""
DbsnpVersion                         N         Dbsnp138 for Human and Dbsnp137 for Mouse before 2015 Q2
ClinicalHistogramViewColumns         N         PrimaryGrouping,SecondaryGrouping
CnvAmplificationCutoff               N         3
CnvDeletionCutoff                    N         1
ExpressionUpRegulationCutoff         N         1
ExpressionDownRegulationCutoff       N         -1
SurvivalTimeColumn                   N         Survival Days
SurvivalStatusColumn                 N         Survival Status
SurvivalDeathStatus                  N         DECEASED
MaxGeneCount                         N         500
MaxSampleCount                       N         2000
MaxMeasurementCount                  N         10000
HomePage                             N         ""
IsReadOnly                           N         FALSE
MinimalSeedFusionCutoff              N         ""
OrganizeBamByPrimaryGrouping         N         ""
OrganizeBasByPrimaryGrouping         N         ""
OrganizeFusionBamByPrimaryGrouping   N         ""
OrganizeFusionBasByPrimaryGrouping   N         ""
IntegrationLevel                     N         ""
SEFusionCutoff                       N         3
PEFusionCutoff                       N         4
MinimalExonJunctionCutoff            N         5
DefaultViewID.ContextType            N         Explicitly set the default view for the specified Land context, such as Default,Gene,ComparisonSet, etc.
DefaultViewID.Default                N         explicitly set the default view for each land context, such as "LandHomePage"
DefaultViewID.Gene                   N         explicitly set the default view to be a gene for each land context
EnableLandNumericCache               N         False
CollectionName                       N         A name you like
CNVCallingMethod                     N         Select the Default Copy Number Calling Method, when both are available
TopHitsN                             N         2000 (also the minimum if a lower value is configured)
MaxTopHitsN                          N         500000 (also the minimum if a lower value is configured)
DefaultProjectAccess                 N         [usergroup]administrators,[usergroup]standard users
DefaultProjectFrequencyAccess        N         [usergroup]administrators,[usergroup]standard users
AdminEmail                           N
MandatoryAssociationMetaDataColumns  N
MandatorySampleMetaDataColumns       N
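Several defaults in the table refer to other parameters ("Same as PrimaryGrouping" and so on). The Python sketch below shows one way such chained defaults and the required parameters could be checked before a Land is deployed. It covers only a handful of rows copied from the table, and the names REQUIRED, LITERAL_DEFAULTS, FALLBACKS, missing_required and resolve are hypothetical; ArrayServer's actual resolution logic may differ.

```python
# Parameters marked Y in the table (Folder applies to ArrayServer.cfg only).
REQUIRED = ("Name", "ReferenceLibraryID", "GeneModelID", "MutationGeneModelID")

# A few literal defaults copied from the table above.
LITERAL_DEFAULTS = {
    "Version": "3",
    "SubjectIDColumn": "SubjectID",
    "AutoFilterOffGrouping": "Tumor Type",
    "AutoFilterOnGrouping": "Sample Type",
    "MaxGeneCount": "500",
}

# Defaults that the table describes as "Same as <other parameter>".
FALLBACKS = {
    "PrimaryGrouping": "AutoFilterOffGrouping",
    "SecondaryGrouping": "AutoFilterOnGrouping",
    "AutoFilterColumn": "PrimaryGrouping",
    "AutomaticGrouping": "PrimaryGrouping",
}


def missing_required(settings):
    """List required parameters that are absent or empty."""
    return [name for name in REQUIRED if not settings.get(name)]


def resolve(settings, name):
    """Look up a parameter, following literal and "Same as" defaults."""
    seen = set()
    while name not in settings:
        if name in LITERAL_DEFAULTS:
            return LITERAL_DEFAULTS[name]
        if name not in FALLBACKS or name in seen:
            return None  # no default known to this sketch
        seen.add(name)
        name = FALLBACKS[name]
    return settings[name]
```

With an empty configuration, resolve follows AutoFilterColumn to PrimaryGrouping to AutoFilterOffGrouping and ends at the literal default "Tumor Type"; setting PrimaryGrouping explicitly short-circuits that chain.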

Enabling BAM/BAS Streaming

Land administrators can map locations of most BAM files corresponding to the NGS results in an internal Land, or can store only highly-compressed "BAM Summary" (BAS) files, which represent nucleotide-level coverage, junctions, and mutation frequency without storing the full read sequence.

Follow these steps to enable BAM/BAS streaming for RNA-seq data; admins can also map DNA-seq mutation, targeted mutation, and exome mutation data corresponding to variation data stored.

  1. Specify "EnableBam=True" and "EnableBas=True" in the Land.cfg file.
  2. Specify "BamFolder" and "BasFolder" (if in a separate location) to map to the proper OmicSoft Server Virtual Path containing the relevant RNA-seq files.
    1. Specify DnaSeqBamFolder, DnaSeqTargetBamFolder, and DnaSeqExomeBamFolder to map DNA-seq BAM files.
  3. In the Land sample metadata, add a column BamFileName that contains the corresponding BAM file name (e.g. "C000S5B1.bam") for the sample, within the folder specified by the "BamFolder" parameter.

[Image: BamFileName Metadata.png]

If configured properly, when you search for a gene in the Land, on a sample-level data view (e.g. Gene FPKM) the Action link "Browse Selected Samples" should be enabled.
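The steps above can be sketched as a small pre-flight check in Python. It assumes the Land settings and one sample metadata row are already available as dictionaries; bam_virtual_path is a hypothetical helper, and the way ArrayServer itself joins the folder and file name is an assumption here.

```python
import posixpath


def bam_virtual_path(settings, sample_row):
    """Return the expected BAM path for one sample, or None if not browsable."""
    if settings.get("EnableBam", "False").lower() != "true":
        return None  # BAM streaming is switched off
    folder = settings.get("BamFolder", "")
    file_name = sample_row.get("BamFileName", "")
    if not folder or not file_name:
        return None  # folder mapping or metadata column is missing
    # Assumption: the file sits directly inside the mapped folder
    return posixpath.join(folder, file_name)


settings = {"EnableBam": "True", "BamFolder": "/ArrayServer/CompanyLand/bam"}
row = {"BamFileName": "C000S5B1.bam"}
print(bam_virtual_path(settings, row))
```

Samples for which this returns None are the ones most likely to leave "Browse Selected Samples" disabled, so running it over the sample metadata can help locate missing BamFileName entries.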

Example of Land Configuration

Example in ArrayServer.cfg

[Folder]
BAMFile=/IData/BAM
BASFile=/IData/BAS
ArrayLand=/IData/ArrayLand
[Land]
Name=TCGA
Folder=/IData/ArrayLand/TCGA
ReferenceLibraryID=Human.B37.3
GeneModelID=UcscGene20120907
AutoFilterColumn=Tumor Type
AutoFilterLabel=Tumor Type
AutoFilterOffGrouping=Tumor Type
AutoFilterOnGrouping=Sample Type
BamFolder=/BAMFile/bamfile
BasFolder=/BASFile/Bas
EnableBas=True
EnableBam=True
SingleEndFusionBamFolder=/BAMFile/FusionBamSE
PairedEndFusionBamFolder=/BAMFile/FusionBamPE
SingleEndFusionBasFolder=/BASFile/FusionBasSE
PairedEndFusionBasFolder=/BASFile/FusionBasPE
FunctionalAnnotationFiles=Human.B37.3_FunctionalMutation20130408.gbt,Human.B37.3_CosmicMutation_V64.gbt,Human.B37.3_1000Genome_2011_0521.compact.gbt
MaxGeneCount=500
HomePage=https://resources.omicsoft.com/TCGA-Land.html
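One plausible reading of this example is that the names under [Folder] act as virtual roots, so that BamFolder=/BAMFile/bamfile points inside the physical folder mapped to BAMFile. The Python sketch below illustrates that reading only; it is an assumption about how virtual paths resolve, and parse_sections and resolve_virtual_path are hypothetical helpers rather than ArrayServer functions.

```python
def parse_sections(text):
    """Parse ArrayServer.cfg-style text into a list of (section, dict) pairs.

    A list is used in case a file holds more than one [Land] section.
    """
    sections = []
    current = None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("[") and line.endswith("]"):
            current = {}
            sections.append((line[1:-1], current))
        elif "=" in line and current is not None:
            key, _, value = line.partition("=")
            current[key.strip()] = value.strip()
    return sections


def resolve_virtual_path(virtual_path, folders):
    """Swap a leading /<FolderName> for its physical path, if one is mapped."""
    head, _, rest = virtual_path.lstrip("/").partition("/")
    if head not in folders:
        return virtual_path  # not a mapped virtual root; leave unchanged
    return folders[head].rstrip("/") + ("/" + rest if rest else "")


example = """[Folder]
BAMFile=/IData/BAM
BASFile=/IData/BAS
[Land]
Name=TCGA
BamFolder=/BAMFile/bamfile
BasFolder=/BASFile/Bas
"""
sections = parse_sections(example)
folders = sections[0][1]
land = sections[1][1]
print(resolve_virtual_path(land["BamFolder"], folders))
```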

Example in CompanyLand.cfg

Name=CompanyLand_B37
ReferenceLibraryID=Human.B37.3
GeneModelID=OmicsoftGene20130723
PrimaryGrouping=Tumor Type
SecondaryGrouping=Sample Type
FunctionalAnnotationFiles=Human.B37.3_FunctionalMutation20150406.gbt,Human.B37.3_CosmicMutation_V70.gbt,Human.B37.3_1000Genome_2013_0502.compact.gbt,OT:ucscGenePfam
VariantClassifiers=ClinVar_20160115,FunctionalMutation_20160115,1000GenomesSimple_20160115,ExAC_20160115,ESP6500_20160115,UK10K_20160115,GTExEqtl_20160115,RegulomeDB_20160115,Interpro_20160115
MutationGeneModelID=Uniprot.Ensembl75
Description=Company Internal Land Description
MaxGeneCount=500
MinimalSeedFusionCutoff=3
EnableLandNumericCache=True
EnableLandAnalysisCache=True
Version=3

EnableBam=True
BamFolder=/ArrayServer/CompanyLand/bam
EnableBas=True
BasFolder=/ArrayServer/CompanyLand/bam
SingleEndFusionBamFolder=/ArrayServer/CompanyLand/alv/SingleEndFusionBam
SingleEndFusionBasFolder=/ArrayServer/CompanyLand/alv/SingleEndFusionBas
PairedEndFusionBamFolder=/ArrayServer/CompanyLand/alv/PairedEndFusionBam
PairedEndFusionBasFolder=/ArrayServer/CompanyLand/alv/PairedEndFusionBas

Example GeneticsLand.cfg

Description=GeneticsLandTutorial
GeneModelID=OmicsoftGene20130723
IsGeneticsLand=True
MaxGeneCount=500
MutationGeneModelID=Uniprot.Ensembl75
Name=Tutorial
PrimaryAssociationGrouping=StudyID
PrimaryGrouping=ProjectName
ReferenceLibraryID=Human.B37.3
SecondaryAssociationGrouping=TraitCategory
SecondaryGrouping=Sex
SubjectIDColumn=USUBJID
VariantClassifiers=ClinVar_20170501,FunctionalMutation_20170501,1000GenomesSimple_20170501,ExAC_20170501,ESP6500_20170501,RegulomeDB_20170501,HaploregV4_20170501,Conservation_20170501,GWAVA_20170501,GRASP2_20170501,GTexEqtl_20170501,GWASCatalog_20170501,UK10K_20170501,Wellderly_20170501

Transparency

By default, a few core configurations are displayed to Studio users through the Show Land Statistics interface. This display can be customized for each Land:

VisibleParameters.txt

This additional tab-delimited table in the Land's directory (adjacent to the Land.cfg file) specifies which configurations to display to the Studio user and how to display them. Here is a template to demonstrate the expected format:

  • 3 tab-delimited columns
    • ID - the parameter ID as in the Parameter column of the table above (e.g. SubjectIDColumn)
    • Display name - the more-readable short name to display to the user (e.g. Subject ID Column)
    • Tooltip - the longer description to display when the user hovers over the Display name (e.g. Column in Sample Metadata used to join subject-level attributes from Clinical Data)
  • First row is header with column names
  • First column must be ID (remaining 2 columns can be in any order)
  • If a particular ID is not defined in the Land.cfg (or .cfg2) file:
    • If there is a default value (per the Default column of the table above), that value will be displayed with this Display name and Tooltip
    • Otherwise, this row of the VisibleParameters.txt file will be ignored (nothing will be displayed to the user)
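To make the format concrete, here is a small Python sketch that holds a sample VisibleParameters.txt as a string and applies the rules listed above. The header labels ("ID", "Display name", "Tooltip") follow the column descriptions but their exact spelling is an assumption, the AdminEmail row and its tooltip are invented for illustration, and visible_rows is a hypothetical helper.

```python
import csv
import io

# Sample table text: header row plus two rows, tab-delimited.
SAMPLE = (
    "ID\tDisplay name\tTooltip\n"
    "SubjectIDColumn\tSubject ID Column\tColumn in Sample Metadata used to join "
    "subject-level attributes from Clinical Data\n"
    "AdminEmail\tAdmin Email\tContact address for this Land\n"
)

# Two defaults taken from the Configuration Options table; AdminEmail has none.
DEFAULTS = {"SubjectIDColumn": "SubjectID", "MaxGeneCount": "500"}


def visible_rows(table_text, settings, defaults=DEFAULTS):
    """Return (display name, value, tooltip) tuples for rows that have a value."""
    reader = csv.DictReader(io.StringIO(table_text), delimiter="\t")
    rows = []
    for row in reader:
        value = settings.get(row["ID"], defaults.get(row["ID"]))
        if value is None:
            continue  # no configured value and no default: row is ignored
        rows.append((row["Display name"], value, row["Tooltip"]))
    return rows
```

With an empty settings dictionary only the SubjectIDColumn row survives, because it has a default; the AdminEmail row appears once AdminEmail is set in the Land configuration. Reading columns by header name also means the two non-ID columns can appear in either order.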

VisibleParameters.txt2

For OmicSoft-hosted ("cloud") Lands, the VisibleParameters.txt file will be part of the OmicSoft-maintained content and may be overwritten by OmicSoft updates just as the Land.cfg file may be overwritten. Therefore, if you wish to customize one of these Lands, create a VisibleParameters.txt2 file. The Show Land Statistics display will be composed of the union of VisibleParameters.txt and VisibleParameters.txt2 with the latter having precedence.

Tip: If OmicSoft has not included a VisibleParameters.txt file in one of these hosted Lands, please create a VisibleParameters.txt file with just the header row so that your VisibleParameters.txt2 file will be effective.
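The union with precedence can be sketched in Python as follows. The sketch assumes both files are tab-delimited with an ID column, and that precedence applies to whole rows (a row in the .txt2 file replaces the row with the same ID); whether ArrayServer merges at row or field level is not stated here. read_visible and combined_visible are hypothetical helpers.

```python
import csv
import io


def read_visible(table_text):
    """Index a VisibleParameters table by its ID column."""
    reader = csv.DictReader(io.StringIO(table_text), delimiter="\t")
    return {row["ID"]: row for row in reader}


def combined_visible(txt, txt2):
    """Union of both tables; rows from the .txt2 text replace same-ID rows."""
    merged = read_visible(txt)
    merged.update(read_visible(txt2))
    return merged


# Header-only .txt file plus a customized .txt2 file
header = "ID\tDisplay name\tTooltip\n"
custom = header + "MaxGeneCount\tMax genes\tLargest gene list accepted in a query\n"
result = combined_visible(header, custom)
print(sorted(result))
```

The display name and tooltip text in the example are illustrative only.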