From Array Suite Wiki

Export

The Export module exports NGS read data to a variety of output formats, including SAM, BAM, FASTQ_GZ, FASTA, and BED.

Ngs ExportData Menu.png

Input Data Requirements

This function works on NgsData objects.

General

ExportNGS.png

Input/Output

  • The module can be run on all references, customized references, or a set of references filtered by region. In most cases, references will be chromosomes.
  • The module can also be run on all observations, visible observations, selected observations, or customized observations.

Options

  • Output format - The user can specify the desired output format.

ExportNGS option.png

For example, selecting "UNPAIRED+UNMAPPED_FASTQ_GZ" extracts unmapped and unpaired reads and saves them in fastq.gz format. Output options joined by "+" produce multiple output files, one for each descriptor.

  • SAM: Extracts all entries to Sequence Alignment/Map (SAM) alignment format.
  • BAM: Extracts all entries to Binary SAM (BAM) alignment format.
  • FASTQ_GZ: Extracts all entries to unaligned FASTQ format.
  • UNPAIRED+UNMAPPED_FASTQ_GZ: For paired-end data, extracts all unmapped reads, including the unmapped read from a "singleton" read pair.
  • FASTA: Extracts all entries to FASTA format.
  • MAPPED_PAIRED: Extracts all entries where both reads of a pair were properly mapped, as BAM format.
  • MAPPED+UNMAPPED: For single-end data, extracts all mapped and unmapped reads as separate files.
  • PAIRED+UNPAIRED+UNMAPPED: For paired-end data, extracts all mapped/paired (both reads mapped), the mapped read from a "singleton" read pair, and unmapped entries (including unmapped reads from a "singleton" read pair).
  • BED: Extracts coverage data for every entry, as BED format.
  • UNMAPPED: Extracts all unmapped reads (entries) to FASTQ format.
  • UNMAPPED_BAM: Extracts all unmapped reads (entries) to BAM format, using the following UNMAPPED_BAM logic.

Export Function                         | File Extension  | Both reads mapped | Read 1 mapped, Read 2 unmapped | Neither read mapped
----------------------------------------+-----------------+-------------------+--------------------------------+--------------------
SAM                                     | .SAM            | Read 1, Read 2    | Read 1, Read 2                 | Read 1, Read 2
BAM                                     | .BAM            | Read 1, Read 2    | Read 1, Read 2                 | Read 1, Read 2
FASTQ_GZ                                | .1.FASTQ        | Read 1            | Read 1                         | Read 1
FASTQ_GZ                                | .2.FASTQ        | Read 2            | Read 2                         | Read 2
UNPAIRED+UNMAPPED_FASTQ_GZ              | .1.FASTQ        | -                 | -                              | Read 1
UNPAIRED+UNMAPPED_FASTQ_GZ              | .2.FASTQ        | -                 | Read 1                         | Read 2
FASTA                                   | .FASTA          | Read 1, Read 2    | Read 1, Read 2                 | Read 1, Read 2
MAPPEDPAIRED                            | .1.FASTQ        | Read 1            | -                              | -
MAPPEDPAIRED                            | .2.FASTQ        | Read 2            | -                              | -
MAPPED+UNMAPPED UnmappedFastq           | .UNMAPPED.FASTQ | -                 | Read 2                         | Read 1, Read 2
MAPPED+UNMAPPED MappedSam               | .MAPPED.SAM     | Read 1, Read 2    | Read 1                         | -
PAIRED+UNPAIRED+UNMAPPED Paired.Sam     | .PAIRED.SAM     | Read 1, Read 2    | -                              | -
PAIRED+UNPAIRED+UNMAPPED Unpaired.Sam   | .UNPAIRED.SAM   | -                 | Read 1                         | -
PAIRED+UNPAIRED+UNMAPPED Unmapped.fastq | .UNMAPPED.FASTQ | -                 | Read 2                         | Read 1, Read 2
BED                                     | .BED            | Read 1, Read 2    | Read 1                         | -
UNMAPPED                                | .FASTQ          | -                 | Read 2                         | Read 1, Read 2
UNMAPPED_BAM                            | .BAM            | (*Special Logic)  | Special Logic                  | Special Logic
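
The three mapping cases in the table above (both reads mapped, one read mapped, neither read mapped) can be told apart from the FLAG field of a SAM record. The following is a minimal sketch, assuming the standard SAM flag bits 0x1 (paired), 0x4 (read unmapped), and 0x8 (mate unmapped). The function name classify_read and the category labels are hypothetical and are not taken from Array Studio; this is not how the Export module is implemented.

```python
# Illustrative only: classify one SAM record by its FLAG bits, mirroring the
# PAIRED / UNPAIRED / UNMAPPED split described above. The function name and
# the returned labels are hypothetical, not part of Array Studio.

FLAG_PAIRED = 0x1         # read comes from a paired-end template
FLAG_UNMAPPED = 0x4       # this read is unmapped
FLAG_MATE_UNMAPPED = 0x8  # the mate of this read is unmapped


def classify_read(flag: int) -> str:
    """Return 'unmapped', 'unpaired', 'paired', or 'mapped' for one record."""
    if flag & FLAG_UNMAPPED:
        return "unmapped"
    if flag & FLAG_PAIRED:
        if flag & FLAG_MATE_UNMAPPED:
            return "unpaired"  # the mapped read of a "singleton" read pair
        return "paired"        # both reads of the pair are mapped
    return "mapped"            # single-end read that is mapped


if __name__ == "__main__":
    for flag in (99, 73, 133, 0):
        print(flag, classify_read(flag))
```

Under these assumptions, a record with flag 73 (paired, mate unmapped, first in pair) falls in the "Read 1 mapped, Read 2 unmapped" column, and its mate with flag 133 is the unmapped read of that pair.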

Export unmapped reads

The option to "keep unmapped reads" must be checked during alignment in order to recover these reads using the Export NGS data option later on.

This option is useful when mapping xenograft sequencing data. After aligning the xenograft reads to one genome (say, mouse), this function outputs fastq.gz file(s) containing the unpaired and unmapped reads. These fastq.gz file(s) can then be aligned to another genome (say, human) to get a cleaner result.

ExportNGS unmappedreads.png
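
Before re-aligning the exported reads to a second genome, it can help to check how many reads were actually exported. The sketch below counts the records in a gzip-compressed FASTQ file using only the Python standard library. It assumes the usual four-line FASTQ record layout (no wrapped sequence lines); the function name count_fastq_reads and the file name in the usage example are hypothetical.

```python
import gzip


def count_fastq_reads(path: str) -> int:
    """Count records in a gzip-compressed FASTQ file.

    Assumes each record occupies exactly four lines
    (header, sequence, '+', qualities).
    """
    with gzip.open(path, "rt") as handle:
        n_lines = sum(1 for _ in handle)
    if n_lines % 4 != 0:
        raise ValueError(f"{path}: line count {n_lines} is not a multiple of 4")
    return n_lines // 4


if __name__ == "__main__":
    # Hypothetical file name; substitute the file written by the export.
    print(count_fastq_reads("sample.1.fastq.gz"))
```

For paired output, the counts of the two exported files can be compared in the same way before starting the second alignment.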

Other Options

  • Exclude singletons (for paired-end mode) - Does not export reads where only one read of the pair mapped to the same region.
  • Exclude multi-reads - Multi-reads are non-unique reads (i.e. reads that align to multiple genomic locations with equal or similar numbers of mismatches). Selecting this option exports unique reads only.
  • Output folder - The user can specify the output folder.
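
The two exclusion options can be pictured as a per-record filter. The sketch below is only an approximation under stated assumptions: a singleton is detected from the SAM flag bits 0x1 (paired) and 0x8 (mate unmapped), and a multi-read is detected from the optional SAM NH tag (number of reported alignments) being greater than 1. Array Studio may define both cases differently, for example by region or by mismatch counts, and the function name keep_record is hypothetical.

```python
from typing import Optional

FLAG_PAIRED = 0x1         # read comes from a paired-end template
FLAG_MATE_UNMAPPED = 0x8  # the mate of this read is unmapped


def keep_record(
    flag: int,
    nh: Optional[int],
    exclude_singletons: bool = False,
    exclude_multireads: bool = False,
) -> bool:
    """Decide whether a record would be exported under the two options.

    flag: SAM FLAG value of the record.
    nh:   value of the optional NH tag, or None when the tag is absent.
    Assumption: NH > 1 marks a multi-read; a missing tag is treated as unique.
    """
    is_singleton = bool(flag & FLAG_PAIRED) and bool(flag & FLAG_MATE_UNMAPPED)
    if exclude_singletons and is_singleton:
        return False
    if exclude_multireads and nh is not None and nh > 1:
        return False
    return True


if __name__ == "__main__":
    print(keep_record(73, 1, exclude_singletons=True))
    print(keep_record(99, 3, exclude_multireads=True))
    print(keep_record(99, 1, exclude_singletons=True, exclude_multireads=True))
```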

OmicScript

ExportNgsData