Dockerized Kallisto Cloud Command
From Array Suite Wiki
Kallisto Index
Begin RunEScript;
Files "/Users/joseph/Kallisto/gencode.v24.transcriptsOnly.fa";
EScriptName KallistoIndex;
Command kallisto index -i "%OutputFolder%%FileNameNoExt%.KallistoIndex.idx" "%FilePath%";
Options /ParallelJobNumber=1 /ThreadNumber=8 /Mode=Single /RunOnDocker=True /ImageName="omicdocker/kallisto:testing" /OutputFolder="/Users/joseph/Kallisto";
End;
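Notes on Kallisto Index: If you build the index from transcript sequences such as those available at https://www.gencodegenes.org/human/, the FASTA headers contain several pipe-delimited fields in addition to the transcript ID. Because these headers contain no spaces, Kallisto keeps the entire header as the transcript name, which makes it difficult to map results to the OmicSoft transcript annotations, since those use the transcript ID alone. A typical GENCODE record looks like this: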
>ENST00000456328.2|ENSG00000223972.5|OTTHUMG00000000961.2|OTTHUMT00000362751.1|DDX11L1-202|DDX11L1|1657|processed_transcript|
GTTAACTTGCCGTCAGCCTTTTCTTTGACCTCTTCTTTCTGTTCATGTGTATTTGCTGTC
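Only the first field of the header is the transcript ID. The hypothetical bash helper below, describe_gencode_header, prints each field on its own line so you can see what the pre-processing step discards. The field labels are our reading of the example record above and should be treated as assumptions, not as labels taken from OmicSoft or GENCODE documentation.

```shell
# Hypothetical helper (not part of Array Studio): list the pipe-delimited
# fields of one GENCODE-style FASTA header. Labels are assumed from the
# example record above; extra fields are printed as fieldN.
describe_gencode_header() {
  printf '%s\n' "$1" | awk -F '|' '
    BEGIN {
      n = split("transcript_id gene_id havana_gene_id havana_transcript_id transcript_name gene_name length biotype", label, " ")
    }
    {
      sub(/^>/, "", $1)                  # drop the FASTA ">" marker
      for (i = 1; i <= NF; i++) {
        if ($i == "") continue           # the trailing "|" leaves an empty last field
        name = (i <= n) ? label[i] : "field" i
        printf "%s\t%s\n", name, $i
      }
    }'
}

# Example:
#   describe_gencode_header '>ENST00000456328.2|ENSG00000223972.5|OTTHUMG00000000961.2|OTTHUMT00000362751.1|DDX11L1-202|DDX11L1|1657|processed_transcript|'
```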
To make an OmicSoft-compatible Kallisto index, we recommend pre-processing the FASTA file to strip the additional fields from each header:
sed 's/|.*//g' gencode.v33.transcripts.fa > gencode.v33.transcriptsOnly.fa
This keeps only the transcript ID in each header and leaves the sequence lines unchanged:
>ENST00000456328.2
GTTAACTTGCCGTCAGCCTTTTCTTTGACCTCTTCTTTCTGTTCATGTGTATTTGCTGTC
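GENCODE distributes its FASTA files gzip-compressed (for example gencode.v33.transcripts.fa.gz), so the same sed expression can be applied while decompressing. The sketch below wraps that in a hypothetical bash function, strip_gencode_headers, and adds a second hypothetical helper, check_fasta_headers, that confirms no header still contains "|" and that no header appears twice before you build the index. Both function names are illustrative and are not OmicSoft tools.

```shell
# Hypothetical helpers (not part of Array Studio); paste into a bash session.

# Strip every FASTA header down to the text before the first "|".
# Accepts plain (.fa) or gzip-compressed (.fa.gz) input.
strip_gencode_headers() {
  local in="$1" out="$2"
  case "$in" in
    *.gz) gunzip -c -- "$in" ;;
    *)    cat -- "$in" ;;
  esac | sed 's/|.*//g' > "$out"
}

# Fail if any header still contains "|" or if any header appears twice;
# otherwise print the number of headers.
check_fasta_headers() {
  local fa="$1" dups
  if grep -q '^>.*|' "$fa"; then
    echo "some headers in $fa still contain '|'" >&2
    return 1
  fi
  dups=$(grep '^>' "$fa" | sort | uniq -d)
  if [ -n "$dups" ]; then
    printf 'duplicate headers in %s:\n%s\n' "$fa" "$dups" >&2
    return 1
  fi
  echo "$(grep -c '^>' "$fa") headers OK"
}

# Example:
#   strip_gencode_headers gencode.v33.transcripts.fa.gz gencode.v33.transcriptsOnly.fa
#   check_fasta_headers gencode.v33.transcriptsOnly.fa
```

Running the check before the index step is cheap compared with rebuilding an index after discovering mismatched transcript names in the quantification output.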
Kallisto Quant
Begin RunEScript /RunOnServer=True;
Resources "/Path/To/Kallisto/Index.idx";
Files "/Path/To/SampleA.1.fastq" "/Path/To/SampleA.2.fastq" "/Path/To/SampleB.1.fastq" "/Path/To/SampleB.2.fastq";
EScriptName KallistoQuant;
Command kallisto quant -i "%Resource1%" -t 2 -o "%OutputFolder%" -b 10 %FilePath1% %FilePath2% 2>&1;
Options /ParallelJobNumber=2 /ThreadNumberPerJob=2 /Mode=Paired /InstanceType="m4.large" /ErrorOnStdErr=False /ErrorOnMissingOutput=True /RunOnDocker=True /ImageName="omicdocker/kallisto:testing" /UseCloud=True /OutputFolder="/Path/To/Output/%PairName%";
Output "/Path/To/Output/%PairName%/abundance.tsv => /Path/To/Output/%PairName%_abundance.tsv" /Type=tsv;
End;
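Kallisto writes each sample's abundance.tsv as a tab-separated table with the columns target_id, length, eff_length, est_counts and tpm; the Output line above copies it to %PairName%_abundance.tsv. The hypothetical bash helper below, check_abundance, is a quick sanity check under that assumption: it reports the number of targets, the TPM total (which should be close to 1,000,000) and how many target IDs still contain "|", which would suggest the index was built from unprocessed GENCODE headers. It returns a non-zero status in that case or if the header row is not the expected one.

```shell
# Hypothetical helper (not part of Array Studio): sanity-check one
# kallisto abundance.tsv (columns assumed: target_id length eff_length est_counts tpm).
check_abundance() {
  local tsv="$1"
  awk -F '\t' '
    NR == 1 {
      if ($1 != "target_id" || $5 != "tpm") {
        print "unexpected header row" > "/dev/stderr"
        bad = 1
        exit                               # jumps to END
      }
      next
    }
    index($1, "|") { piped++ }             # ID still carries GENCODE extra fields
    { tpm += $5; n++ }
    END {
      if (bad) exit 1
      printf "%d targets, TPM sum %.1f, %d IDs containing |\n", n, tpm, piped
      exit (piped ? 1 : 0)
    }' "$tsv"
}

# Example:
#   check_abundance /Path/To/Output/<PairName>_abundance.tsv
```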