From Array Suite Wiki

BAM Tools

The BAM Tools command lets the user perform various actions on one or more BAM-formatted files.

[Image: NGSBamTools2.png]

General

The first step is to add the BAM files by selecting "Add", "Add Folder", or "Add List".

Options

  • Action
    • Sort BAM files by coordinate - Sorts the reads in the BAM file by the coordinates of the sequence.
    • Sort BAM files by read names - Sorts the reads in the BAM file by the name of the read.
    • Append alignment tie count (ZC) tag - Adds a ZC tag that records the number of ties for each alignment. Several downstream modules can use this tag.
    • Index BAM files for analysis (.bim) - Builds a binary .bim index for storage and processing. The .bim format is the standard index for OmicSoft. By default, indexing is skipped if a .bai or .bim file already exists. Users can force indexing by checking "Force indexing". When this option is set to True, new .bim and .bai indexes are created and replace the pre-existing index.
    • Index BAM file for external analysis (.bai) - Indexes the BAM file in .bai format so that it can be processed or viewed in other programs.
    • Index BAM files for name search (.bns) - Builds a binary-encoded .bns index for searching by name.
    • Build genome browser BAS file (.bas) - Generates a ".bas" file for importing into a genome browser.
    • Build Land BAS file (.bas) - Generates a ".bas" file to be used in Land.
    • Extract header and infer reference library ID - Determines which program was used to generate the file and attempts to determine the correct reference library, which saves the user from having to convert the files to extract this information. It returns a table containing "FilePath", "Program", "Version", "CompatibleReferenceLibraryID", "Reference Count", and "References" for each observation (BAM file).
    • Summarize alignment statistics - This will generate an alignment statistics report table in the project solution explorer. The table contains FilePath, TotalReads, MappedReads, UniquelyMappedReads, UniquedPairedReads, MappedRead%, UniquelyMappedRead%, UniquedPairedRead%, MaxTies, AverageTies, MaxEditDistance, AverageEditDistance.
    • Compress BAM tools (read indexing and quality collapsing) - BAM files compressed with this tool differ from compressed BAM files produced directly by alignment. In a compressed BAM file from BAM Tools, the read names are ordered alphabetically, whereas in a compressed BAM file from alignment the read names are based on the order of the reads in the fastq file. This also explains why the fastq files can be restored from a BAM file compressed during alignment, but not from one compressed with BAM Tools. Read more about getting compressed BAM from alignment.
    • Build GCF file - Converts the specified .bam file(s) into space-efficient genome coverage files for visualizing genome coverage in the Genome Browser.
  • Job number - Specifies the number of samples to run in parallel. One thread can read and process only one sample/file. Job number is equal to ThreadNumber. The more threads that are allocated, the faster the algorithm will run. By default, this is set to the number of CPUs on the user’s computer. It should not exceed the number of available CPUs, but can be reduced at the user’s discretion.
  • Output folder (optional) - If unspecified, the target files are generated in the same folder as the source files.
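
The "Job number" and "Output folder" behavior described above can be illustrated with a short Python sketch. This is not OmicSoft code and does no BAM processing. The function names (effective_job_number, target_path, run_action) are hypothetical, and both clamping the job number to the CPU count and naming a target by appending a suffix to the source file name are assumptions made for illustration.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def effective_job_number(requested=None, cpu_count=None):
    """Return the number of parallel jobs to use.

    Assumption: the default is the CPU count, and a requested value
    is clamped to the range 1..cpu_count.
    """
    cpus = cpu_count or os.cpu_count() or 1
    if requested is None:
        return cpus
    return max(1, min(requested, cpus))


def target_path(source, suffix, output_folder=None):
    """Return where a derived file (for example an index) would go.

    With no output folder, the target sits next to the source file.
    Appending the suffix to the full file name is an assumption.
    """
    source = Path(source)
    folder = Path(output_folder) if output_folder else source.parent
    return folder / (source.name + suffix)


def run_action(sources, action, job_number=None):
    """Apply `action` to each file, one file per worker thread."""
    workers = effective_job_number(job_number)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves the order of `sources` in its results.
        return list(pool.map(action, sources))


if __name__ == "__main__":
    bams = ["/data/run1/a.bam", "/data/run1/b.bam"]  # hypothetical paths
    # Placeholder action: only computes the target path for each file.
    for path in run_action(bams, lambda p: target_path(p, ".bai", "/data/index")):
        print(path)
```

Because each worker handles exactly one file at a time, raising the job number beyond the number of input files would not be expected to give any further speed-up in a scheme like this.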

OmicScript

BamTools and its Action