OmicSoft Suite v12 - Oshell Installation

From Array Suite Wiki
This page describes the installation steps for Oshell with OmicSoft Suite v12.1, released in mid-May 2022. If you're using an OmicSoft Suite v11 release (v11.2 or newer), please visit Oshell for OmicSoft Suite v11.


Overview

Oshell.exe is a .NET application that can also be run on Linux using Mono. This article introduces Oshell, describes its installation, and links to wiki pages on typical usage.

Oshell/OmicSoft Project Environment

The Oshell environment is a project-oriented analysis environment that contains popular analysis modules for data generated from sequencing and microarray platforms. Each project in the environment is associated with its own data objects and analysis modules. Comprehensive data analysis pipelines can be constructed as projects in the environment. A pipeline is written and executed in OmicScript, a concise script format that specifies data objects and run parameters. Data objects can be passed directly to their corresponding downstream analysis modules.

An OmicSoft project is:

  • A collection of data objects (NGS data, Omics data, tables, and lists)
    • NGS data is a collection of BAM file links. BAM files are loaded into the software only when necessary. Multiple projects can share the same BAM file.
    • Omics data can be any result table combined with sample design and feature (e.g. gene) annotation, such as gene expression or CNV results.
    • A table is anything like an Excel table, such as a sequence alignment report.
    • A list is a list of IDs (e.g. genes). It can be used to filter results in Omics data and tables.
  • An environment for analysis
    • Analysis runs on one/multiple/subset of objects
    • Analysis steps/scripts are tracked
  • An entity sharable on the server

Installation

Because all of its analysis modules are implemented directly, the Oshell environment can be installed and run without depending on other bioinformatics software.
Please note that Oshell v12 is not compatible with OmicSoft Server 11.7 or earlier.

Install Oshell v12 on Windows

Oshell is written in C#, and .NET on Windows is its native platform. To install Oshell:

  • Create a folder with name "Oshell"
  • Download and save https://resources.omicsoft.com/software_update/OmicSoftServiceUpdater.exe to "Oshell" folder
  • In "Oshell" folder, create an empty file with name oshell.exe [note: the file extension is .exe]
  • Double-click OmicSoftServiceUpdater.exe; all software binaries will be downloaded automatically into the "Oshell" folder
  • Supported operating system: Windows 10

Install Oshell v12 on Ubuntu 20.04

Install Mono

Install Mono 6.12 from the official repository for Ubuntu (https://www.mono-project.com/download/stable/#download-lin-ubuntu). Installing Mono by compiling it from sources is no longer necessary.

Add the Mono repository to your system:

$ sudo apt-get install gnupg ca-certificates
$ sudo apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv-keys 3FA7E0328081BFF6A14DA29AA6A19B38D3D831EF
$ echo "deb https://download.mono-project.com/repo/ubuntu stable-focal main" | sudo tee /etc/apt/sources.list.d/mono-official-stable.list
$ sudo apt-get update

Install Mono 6 from repository:

$ sudo apt-get install mono-complete
  
$ which mono
/usr/bin/mono
   
$ mono --version
Mono JIT compiler version 6.12.0.107 (tarball Wed Dec 9 21:44:58 UTC 2020)
...
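
The version check can also be scripted, for example before running the updater unattended. The following is a minimal sketch that assumes the banner format shown above. The helper names mono_major_minor and mono_version_ok are illustrative and are not part of Mono or Oshell, and the 6.12 threshold simply mirrors the version this page installs.

```shell
#!/bin/sh
# Sketch: decide whether a `mono --version` banner reports at least Mono 6.12.
# Helper names and the threshold are illustrative assumptions.

# Print "major.minor" parsed from the first line of the banner (empty if not recognised).
mono_major_minor() {
    printf '%s\n' "$1" |
        sed -n '1s/^Mono JIT compiler version \([0-9][0-9]*\)\.\([0-9][0-9]*\).*/\1.\2/p'
}

# Return 0 when the banner reports version >= 6.12, non-zero otherwise.
mono_version_ok() {
    mm=$(mono_major_minor "$1")
    [ -n "$mm" ] || return 1
    major=${mm%%.*}
    minor=${mm##*.}
    if [ "$major" -gt 6 ]; then
        return 0
    fi
    [ "$major" -eq 6 ] && [ "$minor" -ge 12 ]
}
```

A possible call is: mono_version_ok "$(mono --version)" || echo "Mono 6.12 or newer is required"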

Create symlinks for the Mono 6 directories (this is needed for backward compatibility):

$ sudo mkdir /opt/mono-6.12.0
$ sudo mkdir /opt/mono-6.12.0/bin
$ sudo ln -s /usr/bin/mono /opt/mono-6.12.0/bin/mono
$ sudo ln -s /usr/bin/mono-sgen /opt/mono-6.12.0/bin/mono-sgen
$ sudo ln -s /usr/bin/cert-sync /opt/mono-6.12.0/bin/cert-sync
$ sudo ln -s /usr/bin/certmgr /opt/mono-6.12.0/bin/certmgr
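
When the setup is scripted, the same links can be created in a re-runnable form. This is a sketch only: link_mono_compat is a hypothetical helper, and it relies on ln -sfn (available in GNU coreutils) to replace links that already exist instead of failing.

```shell
#!/bin/sh
# Sketch: create the backward-compatibility links so that the step can be repeated safely.
# usage: link_mono_compat SRC_BIN DEST_ROOT
#   SRC_BIN    directory holding the packaged Mono binaries, e.g. /usr/bin
#   DEST_ROOT  compatibility directory, e.g. /opt/mono-6.12.0
# The function name and its parameters are illustrative assumptions.
link_mono_compat() {
    src_bin=$1
    dest_root=$2
    mkdir -p "$dest_root/bin" || return 1
    for tool in mono mono-sgen cert-sync certmgr; do
        # -f replaces an existing link; -n treats an existing link as a file, not a directory
        ln -sfn "$src_bin/$tool" "$dest_root/bin/$tool" || return 1
    done
}
```

Run it with root privileges, for example from a root shell: link_mono_compat /usr/bin /opt/mono-6.12.0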

Install zlib-dev

$ sudo apt-get install zlib1g-dev

Install Oshell

Create Oshell installation directory

$ sudo mkdir /opt/oshell
$ cd /opt/oshell
$ sudo wget -c https://resources.omicsoft.com/software_update/OmicSoftServiceUpdater.exe
$ sudo touch oshell.exe

To run Oshell as a non-privileged user (ubuntu, not root), make that user the owner of the Oshell installation folder:

$ sudo chown -R ubuntu:ubuntu /opt/oshell/

Run OmicSoft Service Updater

$ mono ./OmicSoftServiceUpdater.exe

Check that Oshell was installed successfully

$ cd /opt/oshell
$ mono ./oshell.exe --version
OShell version=12.1.X.X
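
Because oshell.exe starts out as an empty placeholder, its size gives a quick hint about whether the updater has run. The sketch below rests on the assumption that the updater overwrites the placeholder with the real binary. oshell_install_state is a hypothetical helper, not an Oshell command.

```shell
#!/bin/sh
# Sketch: classify the state of an Oshell installation directory.
# Assumption: the updater replaces the empty oshell.exe placeholder with the real binary.
# usage: oshell_install_state INSTALL_DIR
oshell_install_state() {
    install_dir=$1
    if [ ! -e "$install_dir/oshell.exe" ]; then
        echo missing        # placeholder was never created
    elif [ ! -s "$install_dir/oshell.exe" ]; then
        echo placeholder    # file exists but is still empty: updater has not run
    else
        echo installed      # non-empty file: updater has written the binary
    fi
}
```

For the layout used above, oshell_install_state /opt/oshell would be expected to print "placeholder" right after the touch step and "installed" once the updater has finished.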


Install Oshell on Amazon Linux 2022

Amazon Linux 2022 is the next generation of Amazon Linux from AWS. It is still in preview, see https://docs.aws.amazon.com/linux/al2022/ug/what-is-amazon-linux.html for more details.

Install Mono

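Install Mono 6.12 from the official repository for CentOS/RHEL {{{version}}} (https://www.mono-project.com/download/stable/#download-lin-centos), where {{{version}}} is a placeholder for the CentOS/RHEL major version whose Mono repository you are using. Installing Mono by compiling it from source is no longer necessary.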

Add the Mono repository to your system:

$ sudo rpmkeys --import "http://keyserver.ubuntu.com/pks/lookup?op=get&search=0x3FA7E0328081BFF6A14DA29AA6A19B38D3D831EF"
$ sudo su -c 'curl https://download.mono-project.com/repo/centos{{{version}}}-stable.repo | tee /etc/yum.repos.d/mono-centos{{{version}}}-stable.repo'

Install Mono 6 from repository:

$ sudo yum install mono-complete
  
$ which mono
/usr/bin/mono
   
$ mono --version
Mono JIT compiler version 6.12.0.107 (tarball Wed Dec 9 21:44:58 UTC 2020)
...

Create symlinks for the Mono 6 directories (this is needed for backward compatibility):

$ sudo mkdir /opt/mono-6.12.0
$ sudo mkdir /opt/mono-6.12.0/bin
$ sudo ln -s /usr/bin/mono /opt/mono-6.12.0/bin/mono
$ sudo ln -s /usr/bin/mono-sgen /opt/mono-6.12.0/bin/mono-sgen
$ sudo ln -s /usr/bin/cert-sync /opt/mono-6.12.0/bin/cert-sync
$ sudo ln -s /usr/bin/certmgr /opt/mono-6.12.0/bin/certmgr

Install Oshell

Create Oshell installation directory

$ sudo mkdir /opt/oshell
$ cd /opt/oshell
$ sudo wget -c https://resources.omicsoft.com/software_update/OmicSoftServiceUpdater.exe
$ sudo touch oshell.exe

To run Oshell as a non-privileged user (ec2-user, not root), make that user the owner of the Oshell installation folder:

$ sudo chown -R ec2-user:ec2-user /opt/oshell/

Run OmicSoft Service Updater

$ mono ./OmicSoftServiceUpdater.exe

Check that Oshell was installed successfully

$ cd /opt/oshell
$ mono ./oshell.exe --version
OShell version=12.1.X.X

To install Oshell on the older Amazon Linux 2, see the following topics in Amazon Linux 2 Kernel 5.10 Array Server AMI Setup Notes:

  • Install Mono 6.12.0.122
  • Add appropriate SSL certificates
  • Install Oshell


Install Oshell on macOS

Oshell is not officially supported on macOS.

Getting Started

Check Oshell Version

Get Oshell version

$ mono oshell.exe

You will get something like:

--------------------------------------------------------------------------------
Version: 12.1.0.10
Analysis mode not specified
--------------------------------------------------------------------------------
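
If the version needs to be checked from a script, the banner can be parsed. This is a minimal sketch that assumes the "Version: ..." line appears exactly as in the sample output above. Both helper names are hypothetical.

```shell
#!/bin/sh
# Sketch: pull the version number out of the Oshell banner text.
# Assumption: the banner contains a line of the form "Version: 12.1.0.10".

# Print the first version number found, or nothing if there is none.
oshell_banner_version() {
    printf '%s\n' "$1" |
        sed -n 's/^Version:[[:space:]]*\([0-9][0-9.]*\).*$/\1/p' |
        head -n 1
}

# Succeed when the banner reports a 12.x release.
is_oshell_v12() {
    case "$(oshell_banner_version "$1")" in
        12.*) return 0 ;;
        *)    return 1 ;;
    esac
}
```

A possible call is: is_oshell_v12 "$(mono oshell.exe)" && echo "Oshell v12 detected"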

Keep updated

Users can update Oshell to the latest version at any time using OmicSoftServiceUpdater.

$ mono OmicSoftServiceUpdater.exe

Run OmicScript in Oshell

If you have an OmicScript ready, it can be executed by

mono oshell.exe --runscript Base_Dir Script_path Temp_Dir Mono_Path > PathToRun.log

where

  • Base_Dir is the path to the Oshell base directory, where the ReferenceLibrary folder should be located, e.g. /opt/omicsoft. Note that this is equivalent to the OmicsoftDirectory in ArrayServer.cfg
  • Script_path is the path to the OmicScript file, e.g. /opt/omicsoft/test/run.oscript
  • Temp_Dir is the path to a directory for storing temporary files, e.g. /scratch
  • Mono_Path is the path to the mono executable, which Oshell remembers and reuses during the run, e.g. /opt/omicsoft/mono/mono
  • PathToRun.log is the path to the log file that records all log output, e.g. /opt/omicsoft/test/run.oscript.log

Note: The mono command is not required in Windows OS.
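
On Linux, the positional arguments are easy to mix up, so a small wrapper that checks the paths first can help. The following is a sketch, not an Oshell feature: run_oscript is a hypothetical helper, and writing the log to Script_path with ".log" appended simply mirrors the run.oscript.log example above.

```shell
#!/bin/sh
# Sketch: validate the inputs, then call Oshell with the documented argument order.
# run_oscript is a hypothetical wrapper, not an Oshell command.
# usage: run_oscript OSHELL_EXE BASE_DIR SCRIPT_PATH TEMP_DIR MONO_PATH
run_oscript() {
    if [ "$#" -ne 5 ]; then
        echo "usage: run_oscript OSHELL_EXE BASE_DIR SCRIPT_PATH TEMP_DIR MONO_PATH" >&2
        return 2
    fi
    oshell_exe=$1; base_dir=$2; script_path=$3; temp_dir=$4; mono_path=$5

    [ -d "$base_dir" ]    || { echo "Base_Dir not found: $base_dir" >&2; return 1; }
    [ -f "$script_path" ] || { echo "Script_path not found: $script_path" >&2; return 1; }
    [ -d "$temp_dir" ]    || { echo "Temp_Dir not found: $temp_dir" >&2; return 1; }

    # Order after --runscript: Base_Dir Script_path Temp_Dir Mono_Path
    "$mono_path" "$oshell_exe" --runscript "$base_dir" "$script_path" "$temp_dir" "$mono_path" \
        > "$script_path.log"
}
```

With the example paths from the list, a call could look like: run_oscript /opt/oshell/oshell.exe /opt/omicsoft /opt/omicsoft/test/run.oscript /scratch /usr/bin/mono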

If running on a machine with Array Studio or ArrayServer, Base_Dir and Temp_Dir can point to the existing directories (i.e. there is no need to set up a second base directory for Oshell to hold separate genome references, gene models, etc.).

The sections below provide more details about how to write OmicScript.

Build genome reference index and gene model

Most NGS functions in Oshell require a reference genome and a gene model to be built before the actual functions are run. The index needs to be generated only once for each reference. By default, the first time a job is run with a given reference and gene model, the program automatically downloads a compiled genome and gene model.

Users must specify the correct names for the reference genome and gene model. See A list of compiled genome and gene model from OmicSoft. For example, if we run an alignment with Human.B37.3 and the RefGene model using the OmicScript for Alignment, Oshell will download the Human.B37.3 reference and the RefGene model to your local folder. You will find the following folders under Base_Dir:

Base_Dir
--ReferenceLibrary
---- Human.B37.3.dreflib1
---- Human.B37.3.gindex1
---- Human.B37.3_GeneModels
---- ---- RefGene.gmodel2
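
To see whether a reference and gene model are already present before a run, the layout above can be checked from the shell. A minimal sketch: has_compiled_reference is a hypothetical helper, and it assumes the naming pattern shown above, with the trailing digit in names such as .dreflib1 allowed to vary.

```shell
#!/bin/sh
# Sketch: test whether Base_Dir already holds a compiled reference and gene model.
# Assumption: entries follow the <Reference>.dreflib*, <Reference>.gindex* and
# <Reference>_GeneModels/<GeneModel>.gmodel* pattern shown in the listing above.
# usage: has_compiled_reference BASE_DIR REFERENCE GENE_MODEL
has_compiled_reference() {
    lib="$1/ReferenceLibrary"
    ref=$2
    gm=$3
    for entry in "$lib/$ref".dreflib* "$lib/$ref".gindex* "$lib/${ref}_GeneModels/$gm".gmodel*; do
        # An unmatched glob stays literal, so the existence test fails as intended
        [ -e "$entry" ] || return 1
    done
}
```

For example: has_compiled_reference /opt/omicsoft Human.B37.3 RefGene || echo "reference will be downloaded or must be built"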

Users can also build their own reference library. We recommend using Oshell --runscript with the OmicScript functions BuildReferenceLibrary and BuildGeneModel; see the example below.

If users want to use the command line directly, please read Build Reference Library and Gene Model through Oshell subcommand.

OmicScript

If you have ArrayStudio software, please read Generate and run OmicScript in ArrayStudio GUI. Other users can write OmicScript based on our OmicScript Collection. We will provide some examples below.

OmicScript to build reference index and gene model

 Begin BuildReferenceLibrary /Namespace=NgsLib;
 Reference Reference_library_id;
 Files "/pathToFile/reference.fa";
 Options /cDNA=False /ReverseComplement=False /Build64BitIndex=True /Build32BitIndex=False /Species=Unspecified /NcbiBuild=1.0;
 End;

 Begin BuildGeneModel /Namespace=NgsLib;
 Reference Reference_library_id;
 GeneModel Gene_model_id;
 Files "/pathToFile/genemodel.gtf";
 Options /AppendChr=False /BuildGeneLevelAnnotation=True /BuildTranscriptLevelAnnotation=True;
 End;

Save the above script as buildIndex.oscript and run it using

mono oshell.exe --runscript Base_Dir Script_path/buildIndex.oscript Temp_Dir Mono_Path
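
When several references have to be built, the script can be generated from a template instead of edited by hand. The sketch below fills the two identifiers and the two input paths into the OmicScript shown above and leaves every option at the value used there. write_build_index_script is a hypothetical helper.

```shell
#!/bin/sh
# Sketch: write buildIndex.oscript for a given reference and gene model.
# Only the IDs and file paths are substituted; all options are copied from the example above.
# usage: write_build_index_script REFERENCE_ID FASTA GENE_MODEL_ID GTF OUTPUT_SCRIPT
write_build_index_script() {
    if [ "$#" -ne 5 ]; then
        echo "usage: write_build_index_script REFERENCE_ID FASTA GENE_MODEL_ID GTF OUTPUT_SCRIPT" >&2
        return 2
    fi
    ref_id=$1; fasta=$2; gm_id=$3; gtf=$4; out_script=$5
    cat > "$out_script" <<EOF
Begin BuildReferenceLibrary /Namespace=NgsLib;
Reference $ref_id;
Files "$fasta";
Options /cDNA=False /ReverseComplement=False /Build64BitIndex=True /Build32BitIndex=False /Species=Unspecified /NcbiBuild=1.0;
End;

Begin BuildGeneModel /Namespace=NgsLib;
Reference $ref_id;
GeneModel $gm_id;
Files "$gtf";
Options /AppendChr=False /BuildGeneLevelAnnotation=True /BuildTranscriptLevelAnnotation=True;
End;
EOF
}
```

For example: write_build_index_script MyRef /pathToFile/reference.fa MyGenes /pathToFile/genemodel.gtf buildIndex.oscript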

OmicScript for OmicSoft Alignment

Details about OmicSoft Aligner (OSA) are in the following publication:

Hu, Jun, et al. "OSA: a fast and accurate alignment tool for RNA-Seq." Bioinformatics 28.14 (2012): 1933-1934.

We have migrated OSA to the Oshell environment. Because Oshell is a project-based environment, as described at the top of this page, the RNA-Seq alignment function MapRnaSeqReadsToGenome has to be wrapped by NewProject (which creates the environment) and by SaveProject and CloseProject (which save and close the environment). This creates a project in which the alignment is performed and the output files are managed:

Begin NewProject;
File "/test/omicsoft/AlignmentProject.osprj";
Options /Distributed=True;
End;

Begin MapRnaSeqReadsToGenome /Namespace=NgsLib;
Files 
"/pathToFile/SampleA_1.fastq.gz
/pathToFile/SampleA_2.fastq.gz
/pathToFile/SampleB_1.fastq.gz
/pathToFile/SampleB_2.fastq.gz";
Reference Human.B37.3;
GeneModel RefGene;
Trimming /Mode=TrimByQuality /ReadTrimQuality=2;
Options /ParallelJobNumber=2 /PairedEnd=True /FileFormat=AUTO /AutoPenalty=True /FixedPenalty=2 /Greedy=false /IndelPenalty=2 
/DetectIndels=False /MaxMiddleInsertionSize=10 /MaxMiddleDeletionSize=10 /MaxEndInsertionSize=10 /MaxEndDeletionSize=10 /MinDistalEndSize=3 
/ExcludeNonUniqueMapping=False /ReportCutoff=10 /WriteReadsInSeparateFiles=True /OutputFolder="/test/omicsoft/AlignmentProject/BAMOutput" 
/GenerateSamFiles=False /ThreadNumber=6 /InsertSizeStandardDeviation=40 /ExpectedInsertSize=300 /InsertOnSameStrand=False 
/InsertOnDifferentStrand=True /QualityEncoding=Automatic /CompressionMethod=Gzip /Gzip=True /SearchNovelExonJunction=True /ExcludeUnmappedInBam=False;
Output Alignment;
End;

Begin SaveProject;
Project AlignmentProject;
File "/test/omicsoft/AlignmentProject.osprj";
End;

Begin CloseProject;
Project AlignmentProject;
End;

Save the above script as Alignment.oscript and run it using

mono oshell.exe --runscript Base_Dir Script_path/Alignment.oscript Temp_Dir Mono_Path

When Oshell is run in standalone mode on a single workstation, multiple alignment or summary jobs are spawned automatically so that each job occupies one process using multiple threads. Here, with /ParallelJobNumber=2 /ThreadNumber=6, two samples run simultaneously and each uses 6 threads.
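
The two settings multiply: with 2 parallel jobs of 6 threads each, up to 12 threads can be busy at once. The sketch below does this arithmetic and compares it with a core count. Both helper names are hypothetical, and whether to stay within the number of cores is a choice left to the user.

```shell
#!/bin/sh
# Sketch: estimate the peak thread count implied by /ParallelJobNumber and /ThreadNumber.

# Print jobs-at-once multiplied by threads-per-job.
# usage: peak_threads PARALLEL_JOB_NUMBER THREAD_NUMBER
peak_threads() {
    echo $(( $1 * $2 ))
}

# Succeed when the peak thread count does not exceed the available cores.
# usage: fits_cores PARALLEL_JOB_NUMBER THREAD_NUMBER AVAILABLE_CORES
fits_cores() {
    [ "$(peak_threads "$1" "$2")" -le "$3" ]
}
```

On Linux, a check against the local machine could be: fits_cores 2 6 "$(nproc)" || echo "settings exceed the available cores"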

For details about each parameter, please read the articles MapRnaSeqReadsToGenome, NewProject, SaveProject and CloseProject.
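
For projects with many samples, the Files statement can be generated from a directory listing. This is a sketch under the assumption that mates are named <sample>_1.fastq.gz and <sample>_2.fastq.gz, as in the alignment example, and that the paths are written one per line with the mates of a pair adjacent. fastq_files_statement is a hypothetical helper.

```shell
#!/bin/sh
# Sketch: print a Files statement listing paired FASTQ files found in a directory.
# Assumption: mates are named <sample>_1.fastq.gz and <sample>_2.fastq.gz.
# usage: fastq_files_statement FASTQ_DIR
fastq_files_statement() {
    fastq_dir=$1
    nl='
'
    list=""
    for r1 in "$fastq_dir"/*_1.fastq.gz; do
        [ -e "$r1" ] || continue                 # glob matched nothing
        r2="${r1%_1.fastq.gz}_2.fastq.gz"
        if [ ! -e "$r2" ]; then
            echo "missing mate for $r1" >&2
            return 1
        fi
        # Append both mates, one path per line, separated from earlier entries by a newline
        list="${list}${list:+$nl}${r1}${nl}${r2}"
    done
    if [ -z "$list" ]; then
        echo "no *_1.fastq.gz files in $fastq_dir" >&2
        return 1
    fi
    printf 'Files\n"%s";\n' "$list"
}
```

The output can be pasted into the MapRnaSeqReadsToGenome block in place of the hand-written Files statement.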

OmicScript for FusionMap

Details about FusionMap are in the following publication:

Ge, H, et al. "FusionMap: detecting fusion genes from next-generation sequencing data at base-pair resolution." Bioinformatics 27.14 (2011): 1922-1928.

We have migrated FusionMap to the Oshell environment as the MapFusionReads function. Because Oshell is a project-based environment, as described at the top of this page, the MapFusionReads function has to be wrapped by NewProject (which creates the environment) and by SaveProject and CloseProject (which save and close the environment). This creates a project in which the alignment is performed and the output reports are managed:

Begin NewProject;
File "/test/omicsoft/FusionDetection.osprj";
Options /Distributed=True;
End;

Begin MapFusionReads /Namespace=NgsLib;
Files 
"/pathToData/Illumina.Paired.1.fastq.gz
/pathToData/Illumina.Paired.2.fastq.gz";
Reference Human.B37.3;
GeneModel RefGene;
Trimming /Mode=TrimByQuality /ReadTrimQuality=2;
Options /FusionVersion=2 /ParallelJobNumber=4 /PairedEnd=False /RnaMode=True /FileFormat=BAM /AutoPenalty=True 
/FixedPenalty=2 /OutputFolder="/ouput/xxxx" /MaxMiddleInsertionSize= /ThreadNumber=2 
/QualityEncoding=Automatic /CompressionMethod=None /Gzip=False /FilterUnlikelyFusionReads=False 
/FullLengthPenaltyProportion=8 /OutputFusionReads=True /MinimalHit=4 /MinimalFusionAlignmentLength=0 
/MinimalFusionSpan=0 /FusionReportCutoff=1 /ReportUnannotatedFusion=False 
/NonCanonicalSpliceJunctionPenalty=2 /RealignToGenome=True;
Output FusionDetection;
End;

Begin ExportView;
Project FusionDetection;
OutputFolder "/test/omicsoft/FusionDetection/Results";
End;
 
Begin SaveProject;
Project FusionDetection;
File "/test/omicsoft/FusionDetection.osprj";
End;

Begin CloseProject;
Project FusionDetection;
End;

Also Read:

OmicScript pipeline for RNA-Seq data analysis

Please read OmicScript pipeline for RNA-Seq data analysis. The pipeline includes alignment, fusion detection, mutation detection and many other steps.

OmicScript pipeline for DNA-Seq data analysis

Please read OmicScript pipeline for DNA-Seq data analysis

Deploy Oshell in Cluster

Use built-in scheduler

When Oshell is run in cluster mode on a grid engine, each job occupies one or more slots, depending on the thread number setting and the cluster queue setting. The built-in scheduling system supports both SGE and PBS, which can accelerate the analysis of large amounts of RNA-Seq data.

Oshell uses the SetEnvironment function to set up the cluster for Oshell jobs. Here is an example OmicScript that schedules jobs to the cluster, monitors the progress of each job, handles the run logs from multiple jobs, and summarizes the job outputs into one Oshell project.

Example OmicScript running on SGE

#Enable cluster
Begin SetEnvironment;
Cluster /EnableCluster=True /ClusterAlignmentPath="/Oshell/ClusterAlignment.sh" /ClusterSummaryPath="/Oshell/ClusterSummary.sh" 
/ClusterParallelEnvironment=peomics /ClusterParallelRatioFactor=1 /ClusterQueueName=all.q /ClusterGridEngine=SGE 
/DefaultClusterJobNumber=12;
End;

#Create the Oshell project environment
Begin NewProject;
File "/test/AlignmentTest/OshellClusterTest.osprj";
Options /Distributed=true;
End;

#Alignment
Begin MapRnaSeqReadsToGenome /Namespace=NgsLib;
Files 
"
/TestDataSets/HumanRNASeqPaired/SRR327893.subset.1.fastq.gz
/TestDataSets/HumanRNASeqPaired/SRR327893.subset.2.fastq.gz
/TestDataSets/HumanRNASeqPaired/SRR065521.subset.1.fastq.gz
/TestDataSets/HumanRNASeqPaired/SRR065521.subset.2.fastq.gz
/TestDataSets/HumanRNASeqPaired/simulationread200PE_1.fastq.gz
/TestDataSets/HumanRNASeqPaired/simulationread200PE_2.fastq.gz
/TestDataSets/HumanRNASeqPaired/simulationread400PE_1.fastq.gz
/TestDataSets/HumanRNASeqPaired/simulationread400PE_2.fastq.gz
";
Reference Human.B37.3;
GeneModel RefGene;
Trimming  /Mode=TrimByQuality /ReadTrimQuality=2;
Options  /ParallelJobNumber=4 /PairedEnd=True /FileFormat=AUTO /AutoPenalty=True
/FixedPenalty=2 /Greedy=false /IndelPenalty=2 /DetectIndels=False /MaxMiddleInsertionSize=10 /MaxMiddleDeletionSize=10
/MaxEndInsertionSize=10 /MaxEndDeletionSize=10 /MinDistalEndSize=3 /ExcludeNonUniqueMapping=False /ReportCutoff=10 
/WriteReadsInSeparateFiles=True /OutputFolder="/test/AlignmentTest/OshellClusterTest/BAMFiles" /GenerateSamFiles=False 
/ThreadNumberPerJob=4 /InsertSizeStandardDeviation=40 /ExpectedInsertSize=300 /InsertOnSameStrand=False 
/InsertOnDifferentStrand=True /QualityEncoding=Automatic /CompressionMethod=Gzip /Gzip=True /SearchNovelExonJunction=True /ExcludeUnmappedInBam=False;
Output primary_alignment;
End;

# save OmicSoft project environment
Begin SaveProject;
Project OshellClusterTest;
File "/test/AlignmentTest/OshellClusterTest.osprj";
End;

# close Oshell project environment
Begin CloseProject;
Project OshellClusterTest;
End;

[Figure: SGE cluster jobs scheduled by Oshell.]

Also read: SetEnvironment, ClusterAlignmentPath and ClusterSummaryPath.

Wrap Oshell to cluster jobs

Users can also wrap Oshell jobs in a qsub script, such as the one below for SGE. This gives users greater control over job submission, since the default job scheduler configured through SetEnvironment has limited options. With this method, users do not need to call SetEnvironment in the OmicScript.

#!/bin/bash
#
# SGE submission options
#$ -q all.q                   # Select the queue
#$ -o /home/ge/job.o
#$ -e /home/ge/job.e
#$ -N test                    # A name for the job
#$ -pe smp 1                  # Select the parallel environment

# Run Oshell projects
MONO="/[path where mono was installed]/bin/mono"
OSHELL=/App/omicsoft/Oshell/oshell.exe
BASEDIR=/App/omicsoft
TMP=/scratch
OSCRIPT=/App/Oscript/runpipeline.oscript
LOG=/App/Oscript/runpipeline.log
"$MONO" "$OSHELL" --runscript "$BASEDIR" "$OSCRIPT" "$TMP" "$MONO" > "$LOG"
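
When many OmicScript files have to be submitted, the job script itself can be generated. The sketch below writes one SGE job file per script, reusing the queue (all.q) and parallel environment (smp 1) from the example above. write_sge_job is a hypothetical helper, and deriving the job name and log path from the script name is an assumption made for illustration.

```shell
#!/bin/sh
# Sketch: write an SGE job file that runs one OmicScript through Oshell.
# Queue and parallel environment are copied from the example above; adjust them for your cluster.
# usage: write_sge_job MONO OSHELL BASEDIR TMP OSCRIPT JOBFILE
write_sge_job() {
    if [ "$#" -ne 6 ]; then
        echo "usage: write_sge_job MONO OSHELL BASEDIR TMP OSCRIPT JOBFILE" >&2
        return 2
    fi
    mono=$1; oshell=$2; basedir=$3; tmp_dir=$4; oscript=$5; jobfile=$6
    # Job name: script file name without the .oscript extension (illustrative choice)
    job_name=$(basename "$oscript" .oscript)
    cat > "$jobfile" <<EOF
#!/bin/bash
#\$ -q all.q
#\$ -N $job_name
#\$ -pe smp 1
"$mono" "$oshell" --runscript "$basedir" "$oscript" "$tmp_dir" "$mono" > "${oscript%.oscript}.log"
EOF
}
```

The generated file can then be reviewed and submitted with qsub.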

Oshell subcommand

In previous versions, Oshell provided an individual subcommand to run each function, such as

  • oshell.exe --buildref to build reference
  • oshell.exe --buildgm to build gene model
  • oshell.exe --alignrna to do RNA-Seq alignment
  • oshell.exe --semap to do fusion alignment
  • For more, please read Oshell subcommand

We have completely migrated Oshell to work in the environment setting described in this article. Development of these subcommands has been discontinued, and they were supported only through the end of 2013.

Oshell Land R API

The Land R API functions are provided for users who want to query Land data using R. For more information, please see: Land_R_API_with_Omicsoft_v12.

License

Commercial users: please contact bioinformaticssales@qiagen.com to get a license.

Publication

RNA-Seq Analysis Pipeline Based on Oshell Environment

Citation

@article{6808521, 
author={Li, J. and Hu, J. and Newman, M. and Liu, K. and Ge, H.}, 
journal={Computational Biology and Bioinformatics, IEEE/ACM Transactions on}, 
title={RNA-Seq Analysis Pipeline Based on Oshell Environment}, 
year={2014}, 
month={}, 
volume={PP}, 
number={99}, 
pages={1-1}, 
doi={10.1109/TCBB.2014.2321156}, 
ISSN={1545-5963},}