AdapterStripping Right

From Array Suite Wiki

Right adapter stripping is used when the DNA fragments being sequenced are likely to be shorter than the read length.

The function will trim read sequences starting at the matching position of the input adapter sequence.

Tip: This function is orientation-sensitive. The adapter sequence should match the sequence that would show up at the end of a read.
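The trimming behaviour and its orientation sensitivity can be illustrated with a minimal Python sketch. This is not Array Studio's implementation: it uses an exact string search only, and the function name `strip_right_adapter_exact` and the example read are hypothetical.

```python
def strip_right_adapter_exact(read: str, adapter: str) -> str:
    """Cut the read at the first exact occurrence of the adapter.

    Simplified illustration only: exact matching, no mismatches,
    no partial adapter at the 3' end.
    """
    pos = read.find(adapter)
    return read if pos == -1 else read[:pos]


# Hypothetical read: 14 bp of fragment, then adapter read-through, then 4 more bases.
read = "ACGTACGTTTGACC" + "AGATCGGAAGAGCG" + "TCGT"

# The adapter is given in the orientation in which it appears in the read.
print(strip_right_adapter_exact(read, "AGATCGGAAGAGCG"))  # ACGTACGTTTGACC

# The opposite orientation does not occur in the read, so nothing is trimmed.
print(strip_right_adapter_exact(read, "CGCTCTTCCGATCT") == read)  # True
```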

For example, consider two Illumina adapter sequences:

PE adapter 1
PE adapter 2

They share a common end sequence, CGCTCTTCCGATCT. In paired-end sequencing, if the read length is larger than the insert fragment size, part of adapter 2 will appear in read 1 and part of adapter 1 will appear in read 2.

[Figure: Adapter stripping]

Since the two adapters share a common end sequence, both can be trimmed by using

AdapterStripping Right /AdapterSequence=AGATCGGAAGAGCG

AGATCGGAAGAGCG is the starting sequence of the reverse complement of both adapters.
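To check this, the reverse complement of the common end sequence can be computed with a few lines of Python. This is a small standalone sketch; the helper name `reverse_complement` is an assumption and not part of Array Studio.

```python
# Translation table for the four DNA bases (N is kept as N).
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Complement each base, then reverse the sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


# Common end sequence shared by the two adapters.
print(reverse_complement("CGCTCTTCCGATCT"))  # AGATCGGAAGAGCG
```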

If there are no common sequences between the adapters, you can also trim two adapters simultaneously. Just be sure that your adapter sequence isn't expected to occur in your fragments!

Notes on Underlying Alignment Method

Right adapter stripping uses a full Smith-Waterman alignment and requires the score to be at least (adapterSequence.Length - 2). The function scores 1 for a match and -1 for a mismatch.
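The scoring rule can be sketched as a small Smith-Waterman local alignment in Python. This is an illustration under stated assumptions, not Array Studio's code: the gap penalty of -1 is an assumption (the page gives only the match and mismatch scores), and the function names are hypothetical.

```python
def best_local_alignment(read, adapter, match=1, mismatch=-1, gap=-1):
    """Return (best_score, read_start) of the best local alignment.

    read_start is the 0-based read position where that alignment begins.
    gap=-1 is an assumption; only match/mismatch scores are documented.
    """
    rows, cols = len(adapter) + 1, len(read) + 1
    score = [[0] * cols for _ in range(rows)]
    # start[i][j]: read index where the alignment ending at cell (i, j) begins.
    start = [list(range(cols)) for _ in range(rows)]
    best_score, best_start = 0, len(read)
    for i in range(1, rows):
        for j in range(1, cols):
            step = match if adapter[i - 1] == read[j - 1] else mismatch
            candidates = [
                (0, j),                                          # empty alignment
                (score[i - 1][j - 1] + step, start[i - 1][j - 1]),  # match/mismatch
                (score[i - 1][j] + gap, start[i - 1][j]),        # gap in read
                (score[i][j - 1] + gap, start[i][j - 1]),        # gap in adapter
            ]
            score[i][j], start[i][j] = max(candidates, key=lambda c: c[0])
            if score[i][j] > best_score:
                best_score, best_start = score[i][j], start[i][j]
    return best_score, best_start


def strip_if_adapter_found(read, adapter):
    """Trim at the alignment start when score >= len(adapter) - 2."""
    best_score, read_start = best_local_alignment(read, adapter)
    if best_score >= len(adapter) - 2:
        return read[:read_start]
    return read


adapter = "AGATCGGAAGAGCG"
read = "ACGTACGTTTGACC" + adapter + "TCGT"  # hypothetical read
print(best_local_alignment(read, adapter))    # (14, 14)
print(strip_if_adapter_found(read, adapter))  # ACGTACGTTTGACC
```

With these scores, a 14 bp adapter needs a score of at least 12, so a full-length hit with a single mismatch (13 matches, 1 mismatch, score 12) still passes, while two internal mismatches (score 10) do not.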

Array Studio aligns the adapter sequence against the read sequence to strip up to where the best alignment to the adapter ends.

  • If the alignment overlap length is < 7, Array Studio strips reads from the 3' end strictly.
  • If the alignment overlap length is > 7, mismatches are allowed if the quality at the mismatch position is low.
  • The read is stripped based on the starting location of the best alignment.

The stripping is aggressive because even 1-2 bp of remaining adapter sequence may result in mismatches being called during alignment.
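One possible reading of the strict 3' end rule for short overlaps is an exact match between the end of the read and the start of the adapter. The Python sketch below follows that interpretation only; the exact rule Array Studio applies is not spelled out here, and the function names and the `max_overlap=6` default (overlap length < 7) are assumptions.

```python
def strict_three_prime_overlap(read, adapter, max_overlap=6):
    """Longest k <= max_overlap such that the read ends with adapter[:k].

    Assumed interpretation of "strict": exact match, no mismatches.
    """
    for k in range(min(max_overlap, len(read), len(adapter)), 0, -1):
        if read.endswith(adapter[:k]):
            return k
    return 0


def strip_short_overlap(read, adapter):
    """Remove the partial adapter found at the 3' end, if any."""
    k = strict_three_prime_overlap(read, adapter)
    return read[:len(read) - k]


adapter = "AGATCGGAAGAGCG"
print(strip_short_overlap("ACGTACGTTTGACC" + "AGAT", adapter))  # ACGTACGTTTGACC
# Aggressive behaviour: a single trailing base that matches the adapter is removed.
print(strip_short_overlap("ACGTTTGA", adapter))  # ACGTTTG
```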

The minimum read length required after adapter stripping is 17 bp for the alignment step. Array Studio keeps both reads in a pair even if one of the reads is shorter than 17 bp after stripping. For example, suppose paired-end reads R1 and R2 have lengths of 75 bp and 7 bp, respectively, after trimming and adapter stripping. If R1 is kept in the filtered FASTQ file, R2 must be kept too, to maintain the line order (1-to-1 line matching between the paired-end files).
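The pairing rule can be sketched in Python as follows. This is a simplified illustration, not Array Studio's filter: reads are plain strings rather than FASTQ records, the name `filter_pairs` is hypothetical, and dropping a pair when both mates are shorter than 17 bp is an assumption that the text above does not state.

```python
MIN_LENGTH = 17  # minimum read length for the alignment step


def filter_pairs(pairs, min_length=MIN_LENGTH):
    """Keep both mates of a pair whenever at least one mate is long enough.

    Assumption: a pair is dropped only when both mates are too short.
    Mates are always kept or dropped together, so the R1 and R2 outputs
    stay in 1-to-1 order.
    """
    kept = []
    for r1, r2 in pairs:
        if len(r1) >= min_length or len(r2) >= min_length:
            kept.append((r1, r2))
    return kept


pairs = [
    ("A" * 75, "C" * 7),   # R2 is short, but the pair is kept because of R1
    ("A" * 10, "C" * 5),   # both mates are short
    ("A" * 20, "C" * 30),  # both mates are long enough
]
kept = filter_pairs(pairs)
print([(len(r1), len(r2)) for r1, r2 in kept])  # [(75, 7), (20, 30)]
```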