The Export module exports NGS read data to a variety of output formats, including SAM, BAM, FASTQ_GZ, FASTA, and BED.
Input Data Requirements
This function works on NgsData objects.
- The module can be run on all references, customized references, or a set of references filtered by region. In most cases, references are chromosomes.
- The module can also be run on all observations, visible observations, selected observations, or customized observations.
- Output format - The user can specify the desired output format.
For example, selecting "UNPAIRED+UNMAPPED_FASTQ_GZ" extracts unmapped and unpaired reads and saves them in fastq.gz format. Output options separated by + produce multiple output files, one for each descriptor.
- SAM: Extracts all entries to Sequence Alignment/Map (SAM) alignment format.
- BAM: Extracts all entries to Binary SAM (BAM) alignment format.
- FASTQ_GZ: Extracts all entries to unaligned FASTQ format.
- UNPAIRED+UNMAPPED_FASTQ_GZ: For paired-end data, will extract all unmapped reads, including the unmapped read from a "singleton" read-pair.
- FASTA: Extracts all entries to FASTA format.
- MAPPED_PAIRED: Extracts all entries where both reads of a pair were properly mapped, as BAM format.
- MAPPED+UNMAPPED: For single-end data, extracts all mapped and unmapped reads as separate files.
- PAIRED+UNPAIRED+UNMAPPED: For paired-end data, extracts all mapped/paired (both reads mapped), the mapped read from a "singleton" read pair, and unmapped entries (including unmapped reads from a "singleton" read pair).
- BED: Extracts coverage data for every entry, as BED format.
- UNMAPPED: Extracts all unmapped reads (entries) to FASTQ format.
- UNMAPPED_BAM: Extracts all unmapped reads (entries) to BAM format, using the following UNMAPPED_BAM logic.
|Export Function||File Extension||Both reads mapped||Read 1 mapped, Read 2 unmapped||Neither Read mapped|
|SAM||.SAM||Read 1, Read 2||Read 1, Read 2||Read 1, Read 2|
|BAM||.BAM||Read 1, Read 2||Read 1, Read 2||Read 1, Read 2|
|FASTQ_GZ||.1.FASTQ||Read 1||Read 1||Read 1|
|FASTQ_GZ||.2.FASTQ||Read 2||Read 2||Read 2|
|UNPAIRED+UNMAPPED_FASTQ_GZ||.2.FASTQ||Read 1||Read 2|
|FASTA||.FASTA||Read 1, Read 2||Read 1, Read 2||Read 1, Read 2|
|MAPPED+UNMAPPED UnmappedFastq||.UNMAPPED.FASTQ|| ||Read 2||Read 1, Read 2|
|MAPPED+UNMAPPED MappedSam||.MAPPED.SAM||Read 1, Read 2||Read 1|| |
|PAIRED+UNPAIRED+UNMAPPED Paired.Sam||.PAIRED.SAM||Read 1, Read 2|| || |
|PAIRED+UNPAIRED+UNMAPPED Unpaired.Sam||.UNPAIRED.SAM|| ||Read 1|| |
|PAIRED+UNPAIRED+UNMAPPED Unmapped.fastq||.UNMAPPED.FASTQ|| ||Read 2||Read 1, Read 2|
|BED||.BED||Read 1, Read 2||Read 1|| |
|UNMAPPED||.FASTQ|| ||Read 2||Read 1, Read 2|
|UNMAPPED_BAM||.BAM (*Special Logic)||Special Logic||Special Logic|
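The routing shown in the table for the PAIRED+UNPAIRED+UNMAPPED option can be sketched in a few lines of Python. This is only an illustration of the table's logic, not the module's implementation. The function name `route_pair` and the dictionary keys are hypothetical labels chosen to mirror the file extensions listed above.

```python
def route_pair(read1_mapped: bool, read2_mapped: bool) -> dict:
    """Return which reads of one pair would land in which output file.

    Hypothetical helper: it mirrors the PAIRED+UNPAIRED+UNMAPPED rows of the
    table, using the file extensions as dictionary keys.
    """
    out = {"PAIRED.SAM": [], "UNPAIRED.SAM": [], "UNMAPPED.FASTQ": []}

    # Both reads mapped: the whole pair goes to the paired file.
    if read1_mapped and read2_mapped:
        out["PAIRED.SAM"] = ["Read 1", "Read 2"]
        return out

    # Otherwise each read is routed on its own: a mapped read from a
    # "singleton" pair goes to the unpaired file, an unmapped read to FASTQ.
    for name, mapped in (("Read 1", read1_mapped), ("Read 2", read2_mapped)):
        if mapped:
            out["UNPAIRED.SAM"].append(name)
        else:
            out["UNMAPPED.FASTQ"].append(name)
    return out
```

For instance, `route_pair(True, False)` places Read 1 in the unpaired file and Read 2 in the unmapped FASTQ file, matching the "Read 1 mapped, Read 2 unmapped" column.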
Export unmapped reads
The option to "keep unmapped reads" must be checked during alignment in order to recover these reads using the Export NGS data option later on.
This option is useful when mapping xenograft sequencing data. After aligning the xenograft reads to one genome (for example, mouse), this function outputs fastq.gz file(s) containing the unpaired and unmapped reads. These fastq.gz file(s) can then be aligned to another genome (for example, human) to get a cleaner result.
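Before re-aligning the exported reads to the second genome, it can be useful to check how many reads were recovered. The following is a minimal Python sketch, assuming the exported file is a standard gzip-compressed FASTQ with four lines per record. The function name `count_fastq_records` and the example path are hypothetical and are not part of the Export module.

```python
import gzip


def count_fastq_records(path: str) -> int:
    """Count reads in a gzip-compressed FASTQ file.

    Assumes the common four-line record layout (header, sequence, '+',
    qualities); files with wrapped sequence lines are not handled.
    """
    with gzip.open(path, "rt") as handle:
        line_count = sum(1 for _ in handle)
    if line_count % 4 != 0:
        raise ValueError(f"{path}: line count {line_count} is not a multiple of 4")
    return line_count // 4


# Example call (hypothetical file name):
# n_reads = count_fastq_records("sample.UNMAPPED.1.fastq.gz")
```

For paired-end exports, running the same count on both files is a quick way to see whether they hold the same number of reads before starting the second alignment.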
- Exclude singletons (for paired-end mode) - Will not export reads where only one read of the pair mapped to the same region.
- Exclude multi-reads - Multi-reads are non-unique reads (i.e. reads that align to multiple genomic locations with equal or similar numbers of mismatches). Selecting this option exports unique reads only.
- Output folder - The user can specify the output folder.
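The effect of the two exclusion options can be illustrated with a small Python sketch. How the module identifies singletons and multi-reads internally is not described here, so this example rests on two assumptions. First, it treats a singleton as the mapped read of a pair whose mate is unmapped, using the standard SAM FLAG bits. Second, it takes a hypothetical `hit_count` value as the number of reported alignment locations for a read. All function and parameter names are illustrative.

```python
# Standard SAM FLAG bits (from the SAM specification).
FLAG_PAIRED = 0x1         # read is part of a pair
FLAG_UNMAPPED = 0x4       # this read is unmapped
FLAG_MATE_UNMAPPED = 0x8  # the mate of this read is unmapped


def is_singleton(flag: int) -> bool:
    """True for a mapped read from a pair whose mate is unmapped (assumed definition)."""
    return (
        bool(flag & FLAG_PAIRED)
        and not flag & FLAG_UNMAPPED
        and bool(flag & FLAG_MATE_UNMAPPED)
    )


def keep_read(flag: int, hit_count: int,
              exclude_singletons: bool, exclude_multi_reads: bool) -> bool:
    """Decide whether a read survives the two exclusion options.

    hit_count is a hypothetical input: the number of locations the read
    aligned to, however the aligner reports it.
    """
    if exclude_singletons and is_singleton(flag):
        return False
    if exclude_multi_reads and hit_count > 1:
        return False
    return True
```

With both options enabled, a read with FLAG 73 (paired, mate unmapped, first in pair) is dropped as a singleton, while a uniquely aligned read with FLAG 99 is kept.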