Dockerized Kallisto Cloud Command

From Array Suite Wiki

Kallisto Index

Begin RunEScript;
Files
"/Users/joseph/Kallisto/gencode.v24.transcriptsOnly.fa";
EScriptName KallistoIndex;
Command kallisto index -i "%OutputFolder%%FileNameNoExt%.KallistoIndex.idx" "%FilePath%";
Options /ParallelJobNumber=1 /ThreadNumber=8 /Mode=Single /RunOnDocker=True /ImageName="omicdocker/kallisto:testing" /OutputFolder="/Users/joseph/Kallisto";
End;

Notes on Kallisto Index: In transcript FASTA files such as those available at https://www.gencodegenes.org/human/, each header carries several fields in addition to the Transcript ID. OmicSoft transcript annotations use the Transcript ID alone, so these extra fields make it difficult to map Kallisto results to the OmicSoft annotations.

>ENST00000456328.2|ENSG00000223972.5|OTTHUMG00000000961.2|OTTHUMT00000362751.1|DDX11L1-202|DDX11L1|1657|processed_transcript|
GTTAACTTGCCGTCAGCCTTTTCTTTGACCTCTTCTTTCTGTTCATGTGTATTTGCTGTC

To make an OmicSoft-compatible Kallisto index, we recommend pre-processing the FASTA file to remove the additional information before building the index.

sed 's/|.*//g' gencode.v33.transcripts.fa > gencode.v33.transcriptsOnly.fa

This removes everything from the first "|" onward in each header, leaving only the Transcript ID:

>ENST00000456328.2
GTTAACTTGCCGTCAGCCTTTTCTTTGACCTCTTCTTTCTGTTCATGTGTATTTGCTGTC
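
Before building the index, it can be worth confirming that the clean-up worked. The following is a minimal sketch, not part of the OmicSoft tooling: the function names strip_fasta_headers and count_piped_headers are hypothetical, and the sed expression is a variant of the command above that is restricted to header lines (those starting with ">").

```shell
#!/usr/bin/env bash
# Sketch: strip GENCODE header annotations and sanity-check the result.
# Function names are hypothetical, chosen for this illustration only.

# Keep only the text before the first "|" on header lines (lines starting with ">").
# Sequence lines are passed through unchanged.
strip_fasta_headers() {
    sed '/^>/ s/|.*//'
}

# Count header lines that still contain a "|" (expected to be 0 after cleaning).
# grep -c exits non-zero when the count is 0, so "|| true" keeps the status clean.
count_piped_headers() {
    grep -c '^>.*|' || true
}

# In-memory demo using the example record from this page.
demo_input='>ENST00000456328.2|ENSG00000223972.5|OTTHUMG00000000961.2|OTTHUMT00000362751.1|DDX11L1-202|DDX11L1|1657|processed_transcript|
GTTAACTTGCCGTCAGCCTTTTCTTTGACCTCTTCTTTCTGTTCATGTGTATTTGCTGTC'

cleaned=$(printf '%s\n' "$demo_input" | strip_fasta_headers)
printf '%s\n' "$cleaned"
printf 'headers still containing "|": %s\n' "$(printf '%s\n' "$cleaned" | count_piped_headers)"
```

On a real file, the same functions could be applied as strip_fasta_headers < gencode.v33.transcripts.fa > gencode.v33.transcriptsOnly.fa, followed by count_piped_headers < gencode.v33.transcriptsOnly.fa, which should print 0.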


Kallisto Quant

Begin RunEScript /RunOnServer=True;
Resources
"/Path/To/Kallisto/Index.idx";
Files
"/Path/To/SampleA.1.fastq"
"/Path/To/SampleA.2.fastq"
"/Path/To/SampleB.1.fastq"
"/Path/To/SampleB.2.fastq";
EScriptName KallistoQuant;
Command kallisto quant -i "%Resource1%" -t 2 -o "%OutputFolder%" -b 10 %FilePath1% %FilePath2% 2>&1;
Options /ParallelJobNumber=2 /ThreadNumberPerJob=2 /Mode=Paired /InstanceType="m4.large" /ErrorOnStdErr=False /ErrorOnMissingOutput=True /RunOnDocker=True /ImageName="omicdocker/kallisto:testing" /UseCloud=True /OutputFolder="/Path/To/Output/%PairName%";
Output "/Path/To/Output/%PairName%/abundance.tsv => /Path/To/Output/%PairName%_abundance.tsv" /Type=tsv;
End;
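
Once quantification has finished, a quick check of the abundance table can confirm that the index was built from cleaned headers. Kallisto writes abundance.tsv as a tab-separated file whose first column is target_id. The sketch below counts rows whose target_id still contains a "|"; the function name count_piped_targets is hypothetical, and the demo rows are made-up values used only to exercise the function.

```shell
#!/usr/bin/env bash
# Sketch: check that target_id values in a kallisto abundance table are bare Transcript IDs.
# The function name is hypothetical, chosen for this illustration only.

# Read a kallisto abundance table on stdin, skip the header row, and print the
# number of rows whose first column (target_id) contains a "|".
count_piped_targets() {
    awk -F'\t' 'NR > 1 && index($1, "|") > 0 { n++ } END { print n + 0 }'
}

# Demo with two made-up rows: one clean ID and one ID that kept its GENCODE fields.
printf 'target_id\tlength\teff_length\test_counts\ttpm\n%s\n%s\n' \
    'ENST00000456328.2	1657	1478	0	0' \
    'ENST00000450305.2|ENSG00000223972.5|	632	453	0	0' \
    | count_piped_targets
```

A result of 0 on a real output table (passed as count_piped_targets < followed by the path of one of the renamed _abundance.tsv files) suggests the Transcript IDs should match the OmicSoft annotations; any other value suggests the index was built from the unprocessed FASTA file.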