Install OmicSoft v12 on Amazon Linux 2 for OmicSoft Cloud Computing

Build AWS AMI for OmicSoft Cloud Computing

Tip: OmicSoft generally recommends using our pre-built Compute AMIs.

OmicSoft Cloud uses customized Amazon Linux-based or Ubuntu-based Amazon Machine Images (AMIs) as the template for creating compute instances for running cloud-based analyses. These AMIs require a few key components, including the Oshell engine, packages for transferring data to and from AWS S3 storage, and Docker.

OmicSoft provides default AMIs for cloud computing. Alternatively, users can build their own AMI and then follow the configuration steps below. To use a specific AMI for cloud computing, specify it explicitly in the Ami and AmiSnapshot parameters of ArrayServer.cfg.

If you build your own AMI, the AMI should be built following the instructions below exactly, including the file paths to software tools. Please be aware that OmicSoft support can only provide limited advice on troubleshooting custom AMIs, because of the large number of configuration options that can impact performance.

AWS Prerequisites

This section only applies when setting up OmicSoft products on an AWS EC2 instance from scratch. Skip this section if installing OmicSoft products on-premises.

Start a new Amazon Linux 2 instance using an appropriate AWS AMI with the following options:

Connect to the instance using SSH and the ec2-user user.

Tip: The original 30 GiB size of the cloud computing AMI must be maintained for backwards compatibility.


Required Linux packages

Before installing any required packages, please ensure there are no broken packages/dependencies on your system. Install the following required packages:

# Apply existing package updates, including security patches:
$ sudo yum update

# Various utilities:
$ sudo yum install ca-certificates curl unzip wget

# Build tools:
$ sudo yum install git gcc gcc-c++ make cmake

# Dependencies for installing other packages (such as R packages) from source:
$ sudo yum install libcurl-devel libxml2-devel openssl-devel libpng-devel libjpeg-turbo-devel zlib-devel
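
Once the packages above are installed, it can be useful to confirm that the expected commands are actually on the PATH before continuing. The following is a minimal sketch; the function name missing_tools is hypothetical and not part of any OmicSoft tooling, and the list of commands to check is an assumption based on the packages named above.

```shell
# Print the name of every command in the argument list that is not on PATH.
# Returns 0 when nothing is missing, 1 otherwise.
# (missing_tools is a hypothetical helper, not an OmicSoft command.)
missing_tools() {
    status=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            printf '%s\n' "$tool"
            status=1
        fi
    done
    return "$status"
}

# Example, run after the yum installs:
#   missing_tools curl unzip wget git gcc g++ make cmake
```

An empty result means every listed command was found. Development libraries such as libcurl-devel do not install commands, so they are not covered by this check.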

Install the AWS CLI v1 as described in https://docs.aws.amazon.com/cli/v1/userguide/install-linux.html#install-linux-pip:

# Prerequisites: Python 3.6+, pip3.
$ sudo yum install python3 python3-pip
 
# Check that pip3 was installed successfully:
$ pip3 --version
pip 20.2.2 from /usr/lib/python3.7/site-packages/pip (python 3.7)

# Install AWS CLI v1
$ sudo pip3 install awscli --upgrade

# Check the AWS CLI was installed successfully:
$ aws --version
aws-cli/1.22.65 Python/3.9.6 Linux/4.18.0-348.12.2.el8_5.x86_64 botocore/1.24.10
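
Because these instructions call for AWS CLI v1, a build script may want to check the major version rather than rely on reading the output by eye. The sketch below parses a version line of the form shown above; aws_cli_major is a hypothetical helper name.

```shell
# Extract the major version from a line such as
# "aws-cli/1.22.65 Python/3.9.6 Linux/... botocore/1.24.10".
# (aws_cli_major is a hypothetical helper name.)
aws_cli_major() {
    version_line=$1
    version=${version_line#aws-cli/}   # drop the "aws-cli/" prefix
    printf '%s\n' "${version%%.*}"     # keep everything before the first dot
}

# Example:
#   [ "$(aws_cli_major "$(aws --version 2>&1)")" = "1" ] || echo "AWS CLI v1 not found"
```

The 2>&1 in the example is there because some AWS CLI v1 builds print the version line to standard error.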

Install Mono 6

Install Mono 6.12 from the official repository for CentOS/RHEL 7 (https://www.mono-project.com/download/stable/#download-lin-centos). Compiling Mono from source is no longer necessary.

Add the Mono repository to your system:

$ sudo rpmkeys --import "http://keyserver.ubuntu.com/pks/lookup?op=get&search=0x3FA7E0328081BFF6A14DA29AA6A19B38D3D831EF"
$ sudo su -c 'curl https://download.mono-project.com/repo/centos7-stable.repo | tee /etc/yum.repos.d/mono-centos7-stable.repo'

Install Mono 6 from repository:

$ sudo yum install mono-complete
  
$ which mono
/usr/bin/mono
   
$ mono --version
Mono JIT compiler version 6.12.0.107 (tarball Wed Dec 9 21:44:58 UTC 2020)
...

Create symlinks for the Mono 6 directories (this is needed for backward compatibility):

$ sudo mkdir /opt/mono-6.12.0
$ sudo mkdir /opt/mono-6.12.0/bin
$ sudo ln -s /usr/bin/mono /opt/mono-6.12.0/bin/mono
$ sudo ln -s /usr/bin/mono-sgen /opt/mono-6.12.0/bin/mono-sgen
$ sudo ln -s /usr/bin/cert-sync /opt/mono-6.12.0/bin/cert-sync
$ sudo ln -s /usr/bin/certmgr /opt/mono-6.12.0/bin/certmgr
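
The mkdir and ln commands above fail if they are run a second time, for example when an AMI build script is re-run. A re-runnable variant might look like the sketch below. The helper name link_tools is hypothetical; the paths in the example are the ones used above.

```shell
# Link each named tool from SRC_DIR into DEST_DIR, creating DEST_DIR if needed.
# Usage: link_tools SRC_DIR DEST_DIR TOOL...
# (link_tools is a hypothetical helper name.)
link_tools() {
    src_dir=$1
    dest_dir=$2
    shift 2
    mkdir -p "$dest_dir" || return 1
    for tool in "$@"; do
        # -f replaces an existing link; -n treats an existing link as a file
        ln -sfn "$src_dir/$tool" "$dest_dir/$tool" || return 1
    done
}

# Example, equivalent to the commands above (run as root):
#   link_tools /usr/bin /opt/mono-6.12.0/bin mono mono-sgen cert-sync certmgr
```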

Install libgdiplus

The libgdiplus package should already have been installed alongside Mono:

$ ldconfig -p | grep libgdiplus
   
libgdiplus.so.0 (libc6,x86-64) => /lib64/libgdiplus.so.0
libgdiplus.so (libc6,x86-64) => /lib64/libgdiplus.so

In the unlikely case that libgdiplus was not installed alongside Mono, follow these steps to install it manually.

Check the Mono configuration file, normally located at /etc/mono/config, and make sure the following entries are present at the end of the file, before the closing </configuration> tag:

<configuration>
...
    <dllmap dll="gdiplus" target="libgdiplus.so" os="!windows"/>
    <dllmap dll="gdiplus.dll" target="libgdiplus.so" os="!windows"/>
    <dllmap dll="gdi32" target="libgdiplus.so" os="!windows"/>
    <dllmap dll="gdi32.dll" target="libgdiplus.so" os="!windows"/>
    <dllmap dll="gdiplus.dll" target="/lib64/libgdiplus.so.0"/>
</configuration>

If not, add them using:

$ sudo vi /etc/mono/config
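
Rather than inspecting the file by eye, the presence of the dllmap entries can be checked with grep. This is a rough sketch: missing_dllmaps is a hypothetical helper, and it only checks that an entry exists for each dll name listed above, not that its target is correct.

```shell
# Print each expected dll name that has no <dllmap> entry in the given file.
# Returns 0 when all are present, 1 otherwise.
# (missing_dllmaps is a hypothetical helper name.)
missing_dllmaps() {
    config_file=$1
    status=0
    for dll in gdiplus gdiplus.dll gdi32 gdi32.dll; do
        # -F matches the string literally, so the dot in ".dll" is not a wildcard
        if ! grep -qF -e "<dllmap dll=\"$dll\"" "$config_file"; then
            printf '%s\n' "$dll"
            status=1
        fi
    done
    return "$status"
}

# Example:
#   missing_dllmaps /etc/mono/config
```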

Install SQLite

Install SQLite v3.7.14.1 (2012-10-04) by building it from source:

# Download SQLite source archive:
$ cd /opt
$ sudo wget -c http://www.sqlite.org/sqlite-autoconf-3071401.tar.gz
$ sudo tar zxvf sqlite-autoconf-3071401.tar.gz
$ sudo mv sqlite-autoconf-3071401 sqlite
$ cd sqlite

# Build and install:
$ sudo ./configure --prefix=/opt/sqlite
$ sudo make
$ sudo make install

# Check SQLite was installed successfully:
$ sudo ./sqlite3 --version

# Clean up:
$ sudo rm /opt/sqlite-autoconf-3071401.tar.gz
$ cd ~
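
The number in the archive name encodes the version: SQLite source archives on sqlite.org use a 3XXYYZZ tag, so 3071401 corresponds to 3.7.14.1. The sketch below decodes such a tag so that a script can compare it with the first field printed by sqlite3 --version. The helper name sqlite_version_from_tag is hypothetical.

```shell
# Decode a 7-digit SQLite archive tag (3XXYYZZ) into a dotted version string.
# (sqlite_version_from_tag is a hypothetical helper name.)
sqlite_version_from_tag() {
    tag=$1
    major=$(printf '%s' "$tag" | cut -c1)
    minor=$(printf '%s' "$tag" | cut -c2-3)
    patch=$(printf '%s' "$tag" | cut -c4-5)
    build=$(printf '%s' "$tag" | cut -c6-7)
    # Strip one leading zero from each two-digit field ("07" -> "7", "00" -> "0")
    printf '%s.%s.%s.%s\n' "$major" "${minor#0}" "${patch#0}" "${build#0}"
}

# Example:
#   sqlite_version_from_tag 3071401
#   /opt/sqlite/sqlite3 --version | cut -d' ' -f1
```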

Set ulimit

Increase ulimit as described in "Setting up ulimit for ArrayServer".

Increase Memory Map Counts

If your cloud jobs use extremely large input files (e.g. hundreds of millions of reads), memory map counts must be increased in order to avoid running out of memory. This is done by setting the vm.max_map_count configuration to a value higher than the 65530 default. The recommended value is 655350.

# Create a new configuration file:
$ sudo touch /etc/sysctl.d/12-omicsoft.conf

# Add vm.max_map_count=655350 by editing the file:
$ sudo vi /etc/sysctl.d/12-omicsoft.conf

# Your file should look similar to:
$ sudo cat /etc/sysctl.d/12-omicsoft.conf
...
# Increase memory map limits to prevent jobs using
# extremely large files from running out of memory
vm.max_map_count=655350
...

To check that the new value is persisted on reboot, restart your machine and execute:

$ cat /proc/sys/vm/max_map_count
655350
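
For scripted AMI builds, the configuration file can be written without an interactive editor. The following sketch writes the same content as shown above; write_map_count_conf is a hypothetical helper name, and sysctl --system is used on the assumption that you want to apply the value without waiting for a reboot.

```shell
# Write a sysctl drop-in file that sets vm.max_map_count.
# Usage: write_map_count_conf FILE VALUE
# (write_map_count_conf is a hypothetical helper name.)
write_map_count_conf() {
    conf_file=$1
    value=$2
    {
        printf '%s\n' '# Increase memory map limits to prevent jobs using'
        printf '%s\n' '# extremely large files from running out of memory'
        printf 'vm.max_map_count=%s\n' "$value"
    } > "$conf_file"
}

# Example (run as root):
#   write_map_count_conf /etc/sysctl.d/12-omicsoft.conf 655350
#   sysctl --system
#   cat /proc/sys/vm/max_map_count
```

The function overwrites the target file, so it is only suitable for a drop-in file dedicated to this setting.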

Install Docker

Install Docker as described in https://docs.aws.amazon.com/AmazonECS/latest/developerguide/create-container-image.html#create-container-image-prerequisites.

Configure Docker to start on boot: https://docs.docker.com/engine/install/linux-postinstall/#configure-docker-to-start-on-boot.

Allow non-root users to run Docker commands (required as part of certain EScripts, for example), as described at https://docs.docker.com/engine/install/linux-postinstall/#manage-docker-as-a-non-root-user.
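
For convenience, the three linked pages boil down to roughly the commands sketched below on Amazon Linux 2. Treat this as a summary under assumptions, not a replacement for the linked documentation: the helper names run and setup_docker are hypothetical, ec2-user is assumed to be the account that runs Docker, and the commands are only printed unless DRY_RUN=0 is set.

```shell
# Print commands instead of running them unless DRY_RUN=0 is set.
# (run and setup_docker are hypothetical helper names.)
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

# Install Docker, start it, enable it on boot, and add a user to the docker group.
# Usage: setup_docker [USER]   (USER defaults to ec2-user)
setup_docker() {
    docker_user=${1:-ec2-user}
    run sudo amazon-linux-extras install docker
    run sudo systemctl start docker.service
    run sudo systemctl enable docker.service
    run sudo usermod -a -G docker "$docker_user"
}

# Example:
#   setup_docker              # print the commands only
#   DRY_RUN=0; setup_docker   # actually run them
```

The group change takes effect only after the user logs out and back in; docker info run as that user is a quick way to confirm it.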

Install R 4.0.4

Enable Extra Packages for Enterprise Linux (EPEL):

$ sudo yum install https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm

Install R v4.0.4:

# Install R
$ cd ~
$ export R_VERSION=4.0.4
$ curl -O https://cdn.rstudio.com/r/centos-8/pkgs/R-${R_VERSION}-1-1.x86_64.rpm
$ sudo yum install R-${R_VERSION}-1-1.x86_64.rpm
   
# Create symlinks to ensure R is available on the default system PATH variable
$ sudo rm -f /usr/local/bin/R
$ sudo ln -s /opt/R/${R_VERSION}/bin/R /usr/local/bin/R
$ sudo rm -f /usr/local/bin/Rscript
$ sudo ln -s /opt/R/${R_VERSION}/bin/Rscript /usr/local/bin/Rscript
   
$ R --version
R version 4.0.4 (2021-02-15) -- "Lost Library Book"
Copyright (C) 2021 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)
   
R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under the terms of the
GNU General Public License versions 2 or 3.
For more information about these matters see
https://www.gnu.org/licenses/.
   
# Ensure R is available in PATH for the root user
$ sudo vi /etc/sudoers
# Find the 'Defaults secure_path=' configuration
# Append :/usr/local/bin to the secure_path value if not already present
# Save the file and exit
   
$ sudo R --version
R version 4.0.4 (2021-02-15) -- "Lost Library Book"
   
# Start the R REPL in preparation for the next step
$ sudo R

Install the required R packages:

// R REPL environment:
> if (!requireNamespace("BiocManager", quietly = TRUE))
         install.packages("BiocManager")
// This should always install the BiocManager version matching the R version.
> packageVersion("BiocManager")
[1] ‘1.30.16’
    
> BiocManager::install("impute")
> packageVersion("impute")
[1] ‘1.64.0’
    
// Required in order to install older versions of other packages (the version of the package itself is not that important):
> install.packages("versions")
> packageVersion("versions")
[1] ‘0.3’
   
> versions::install.versions("samr", '3.0')
> packageVersion("samr")
[1] ‘3.0’
   
> BiocManager::install("limma")
> packageVersion("limma")
[1] ‘3.46.0’
    
> versions::install.versions("Rtsne", "0.15")
> packageVersion("Rtsne")
[1] ‘0.15’
   
// Check if voom is installed:
> library(limma)
> ?voom

> versions::install.versions("askpass", "1.1")
> packageVersion("askpass")
[1] ‘1.1’

> versions::install.versions("openssl", "1.4.4")
// If the above fails with the message "The current version and publication date of openssl could not be detected", try running > install.packages("https://cran.r-project.org/src/contrib/Archive/openssl/openssl_1.4.4.tar.gz", repos=NULL, type="source")
> packageVersion("openssl")
[1] ‘1.4.4’
   
> versions::install.versions("curl", "4.3.2")
// If the above fails with the message "package ‘curl’ is not available for this version of R", try running > install.packages("https://cran.r-project.org/src/contrib/curl_4.3.2.tar.gz", repos=NULL, type="source")
> packageVersion("curl")
[1] ‘4.3.2’
   
> versions::install.versions("httr", "1.4.2")
> packageVersion("httr")
[1] ‘1.4.2’
   
> versions::install.versions("spatstat", "1.64-1")
> packageVersion("spatstat")
[1] ‘1.64.1’
   
> versions::install.versions("uwot", "0.1.11")
> packageVersion("uwot")
[1] ‘0.1.11’
   
> versions::install.versions("Seurat", "3.2.3")
> packageVersion("Seurat")
[1] ‘3.2.3’
   
// Latest locfit version that doesn't require R v4.1:
> versions::install.versions("locfit", "1.5-9.4")
> packageVersion("locfit")
[1] ‘1.5.9.4’
   
// Minimum version required by DESeq2
> versions::install.versions("matrixStats", "0.60.1")
> packageVersion("matrixStats")
[1] ‘0.60.1’
   
> BiocManager::install("DESeq2")
> packageVersion("DESeq2")
[1] ‘1.30.1’
    
> BiocManager::install("edgeR")
> packageVersion("edgeR")
[1] ‘3.32.1’

// Installing "lumi" v2.42.0 can fail on Amazon Linux 2 (see Known Issues on Amazon Linux 2), so feel free to skip it.
> BiocManager::install("lumi")
> packageVersion("lumi")
[1] ‘2.42.0’
    
> versions::install.versions("jpeg", "0.1-9")
> packageVersion("jpeg")
[1] ‘0.1.9’
   
> versions::install.versions("latticeExtra", "0.6-29")
> packageVersion("latticeExtra")
[1] ‘0.6.29’
   
> versions::install.versions("Hmisc", "4.5-0")
> packageVersion("Hmisc")
[1] ‘4.5.0’
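
After leaving the R session, the pinned versions above can be re-checked from the shell in one pass instead of calling packageVersion() package by package. The sketch below compares two plain-text lists of "package version" pairs. The helper name check_r_versions and the file names are hypothetical. Note that installed.packages() reports versions as written in each package's DESCRIPTION file (for example 1.64-1), so the expected list should use the same form that was passed to install.versions.

```shell
# Compare "package version" pairs in EXPECTED_FILE against INSTALLED_FILE.
# Prints one line per mismatch and returns 1 if any were found.
# (check_r_versions is a hypothetical helper name.)
check_r_versions() {
    expected_file=$1
    installed_file=$2
    status=0
    while read -r pkg want; do
        [ -n "$pkg" ] || continue   # skip blank lines
        have=$(awk -v p="$pkg" '$1 == p { print $2 }' "$installed_file")
        if [ "$have" != "$want" ]; then
            printf '%s: expected %s, found %s\n' "$pkg" "$want" "${have:-none}"
            status=1
        fi
    done < "$expected_file"
    return "$status"
}

# The installed list could be produced with something like:
#   sudo Rscript -e 'ip <- installed.packages(); cat(paste(ip[, "Package"], ip[, "Version"]), sep = "\n")' > installed.txt
#   check_r_versions expected.txt installed.txt
```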

Known Issues on Amazon Linux 2

  1. Installing version 2.42.0 of the lumi Bioconductor R package can fail with the following error message: "C++14 standard requested but CXX14 is not defined". If you need lumi on your Amazon Linux 2 instance, try the following workaround and then retry the installation:
# Create the following file, including the .R parent directory, if they do not already exist:
$ sudo mkdir -p /root/.R
$ sudo touch /root/.R/Makevars
# Add the following C++ flags: CXX14=g++ and CXX14PICFLAGS=-fPIC. Your file should look similar to:
$ sudo cat /root/.R/Makevars
...
# C++ compiler flags
CXX14=g++
CXX14PICFLAGS=-fPIC
...
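
The Makevars edit can also be scripted so that it is safe to repeat. The sketch below appends each flag only when it is not already present and creates the .R directory when needed; ensure_line is a hypothetical helper name.

```shell
# Append LINE to FILE unless an identical line is already present.
# Creates the parent directory and the file when they do not exist.
# (ensure_line is a hypothetical helper name.)
ensure_line() {
    file=$1
    line=$2
    mkdir -p "$(dirname "$file")" || return 1
    touch "$file" || return 1
    # -x matches whole lines only, -F treats the line as a literal string
    grep -qxF -e "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Example (run as root so that /root/.R is writable):
#   ensure_line /root/.R/Makevars 'CXX14=g++'
#   ensure_line /root/.R/Makevars 'CXX14PICFLAGS=-fPIC'
```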

Install OmicSoft packages

Install Oshell

Create Oshell installation directory

$ sudo mkdir /opt/oshell
$ cd /opt/oshell
$ sudo wget -c https://resources.omicsoft.com/software_update/OmicSoftServiceUpdater.exe
$ sudo touch oshell.exe

To run OmicSoft Server as a non-privileged user (ec2-user, not root), that user must be made owner of all OmicSoft-related folders:

$ sudo chown -R ec2-user:ec2-user /opt/oshell/

Run OmicSoft Service Updater

$ mono ./OmicSoftServiceUpdater.exe

Check Oshell was installed successfully

$ cd /opt/oshell
$ mono ./oshell.exe --version
OShell version=12.1.X.X

Install Certificates in Mono Store

Update certificate stores:

$ sudo update-ca-trust

Import certificates from the Linux store into the Mono root CA stores:

$ cert-sync --user /etc/pki/tls/certs/ca-bundle.crt
$ sudo cert-sync /etc/pki/tls/certs/ca-bundle.crt

Configure intermediate CA certificates for https://resources.omicsoft.com:

$ certmgr --ssl https://resources.omicsoft.com
$ sudo certmgr --ssl -m https://resources.omicsoft.com

Verify that the machine and user root CA stores have been populated:

$ certmgr -list -c -m Trust
$ sudo certmgr -list -c -m Trust