SeqAlto

Installation Instructions

Simply copy the binary to a folder of your choice.

Quick Start

Generate index

All the chromosomes for the organism are contained in a file called "genome.fa". This is a mammalian genome, hence we will use a k-mer size of 22 (20 is also ok). For the human genome this step uses about 12GB of memory.

seqalto index genome.fa 22 genome_22.sidx

Perform alignment

The paired sequencer reads are located in two files, "reads_1.fq" and "reads_2.fq". We want to store the output in a SAM file named "output.sam" and align with 2 threads. This step uses about 7GB of memory.

seqalto align genome_22.sidx -1 reads_1.fq -2 reads_2.fq -p 2 > output.sam

Details

Index Generation (index) Command Line Options

k-mer_size

Default: must set
k-mer size of the index.
For 100-bp reads, set this to 20 or 22; 20 gives slightly better accuracy. In general, a larger k-mer size makes alignment faster but less accurate. For 200-bp reads, 25 or 28 is acceptable.
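
The guidance above can be captured in a small helper. This is only a sketch: the function name suggest_kmer and the 200-bp cutoff between the two recommendations are assumptions made for illustration and are not part of SeqAlto.

```shell
# Hypothetical helper (not part of SeqAlto): suggest a k-mer size from read length.
# Assumption: reads shorter than 200 bp follow the 100-bp advice (22, or 20 for
# slightly better accuracy); reads of 200 bp or more follow the 200-bp advice (25).
suggest_kmer() {
    if [ "$1" -ge 200 ]; then
        echo 25
    else
        echo 22
    fi
}

k=$(suggest_kmer 100)
# Print the index command instead of running it, so it can be reviewed first.
echo "seqalto index genome.fa $k genome_${k}.sidx"
```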

Index mode (-I)

Default: 1
Indexing mode.
Mode 1 is the default sub-sampled mode, which uses about 7GB of memory. Mode 0 indexes all k-mers; it uses about 36GB for index generation and 22GB for alignment. Indexing all k-mers improves accuracy and sensitivity. The other modes are experimental and not supported.
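
Before choosing a mode it can help to compare these figures with the memory available on the machine. The sketch below is an illustration only: required_gb is a hypothetical helper, the mode 1 figures are taken from the Quick Start (which uses the default mode on a human genome), and the /proc/meminfo check assumes Linux.

```shell
# Hypothetical helper: approximate memory (GB) quoted in this manual.
# Usage: required_gb MODE STAGE   (STAGE is "index" or "align")
required_gb() {
    case "$1:$2" in
        1:index) echo 12 ;;   # Quick Start figure, human genome
        1:align) echo 7 ;;
        0:index) echo 36 ;;
        0:align) echo 22 ;;
        *) echo "unknown mode/stage: $1 $2" >&2; return 1 ;;
    esac
}

need=$(required_gb 0 index)
# Compare against total system memory where /proc/meminfo exists (Linux).
if [ -r /proc/meminfo ]; then
    total_gb=$(awk '/^MemTotal:/ {print int($2 / 1024 / 1024)}' /proc/meminfo)
    if [ "${total_gb:-0}" -gt 0 ] && [ "$total_gb" -lt "$need" ]; then
        echo "warning: ${total_gb}GB RAM, mode 0 indexing needs about ${need}GB"
    fi
fi
```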

Alignment (align) Command Line Options

Standard Options
Ungapped Alignment (-u)

Default: off
Disables gapped alignment.
Ungapped alignment is faster because it does not attempt to align reads containing insertions or deletions. This mode is recommended if the primary goal of the alignment is to determine SNPs and indels are of no concern. Note that it may also miss SNPs near indels.

Fast Alignment (-f)

Default: off
Activates fast alignment mode.
Fast alignment considers fewer possible alignments for each read than the default alignment settings. If accurate alignment is selected, fast alignment will be disabled. When this mode is used, it is advisable to use only reads with high MAPQ.

Accurate Alignment (-a)

Default: off
Activates accurate alignment mode.
Accurate alignment considers as many alignments for each read as possible, resulting in an exhaustive alignment search for each read. This mode is generally very slow. It is recommended only for small numbers of reads that require maximum alignment accuracy. Activating this mode will automatically disable fast alignment, even if the -f flag is included in the run command. For improved accuracy, consider using a smaller k-mer size.
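
The precedence between -f and -a described above can be summarised as follows. This is a sketch of the documented behaviour, not SeqAlto's own option parser; effective_mode is a hypothetical helper.

```shell
# Hypothetical helper: report which search mode is in effect for a set of flags.
# Accurate mode wins over fast mode regardless of the order of the flags.
effective_mode() {
    mode=default
    for arg in "$@"; do
        case "$arg" in
            -a) mode=accurate ;;
            -f) [ "$mode" = accurate ] || mode=fast ;;
        esac
    done
    echo "$mode"
}

effective_mode -f -a    # prints: accurate
effective_mode -f       # prints: fast
effective_mode          # prints: default
```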

Maximum Gap (-o)

Default value: gap open rate of 0.005
Type: integer or float
Values: any non-negative integer, or any float between 0 and 1
If the provided value is an integer, sets the maximum number of gap opens. If the provided value is a float, sets the rate of gap opens on a scale from 0 to 1.
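
The two interpretations of the -o value can be illustrated with a small classifier. This is a sketch under assumptions: describe_gap_open is a hypothetical helper, and it assumes that the presence of a decimal point is what distinguishes a rate from a count and that the range 0 to 1 is inclusive.

```shell
# Hypothetical helper: show how a value passed to -o would be interpreted.
describe_gap_open() {
    case "$1" in
        ''|.|*[!0-9.]*|*.*.*)
            echo "invalid"; return 1 ;;
        *.*)
            # Float: must lie between 0 and 1 (assumed inclusive).
            if awk -v v="$1" 'BEGIN { exit !(v + 0 >= 0 && v + 0 <= 1) }'; then
                echo "rate $1"
            else
                echo "invalid"; return 1
            fi ;;
        *)
            # Integer: maximum number of gap opens.
            echo "count $1" ;;
    esac
}

describe_gap_open 0.005   # prints: rate 0.005
describe_gap_open 3       # prints: count 3
```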

Maximum Gap (-e)

Default value: 50
Type: integer
Values: any non-negative integer
Sets the maximum length of gap extension to be found reliably. Larger gaps may still be identified.

Indel Penalization (-h)

Default value: 5
Type: integer
Values: any non-negative integer
Sets the number of bases at the end of the reads to force align.

Read Trimming (-t)

Default value: 30
Type: integer
Values: 0-100
Sets the minimum quality threshold for read trimming. The algorithm will use BWA-like trimming on the ends of reads until the Sanger-format quality is greater than the provided value.
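
For readers unfamiliar with BWA-style trimming, the sketch below follows the procedure used by "bwa aln -q": walk in from the 3' end, accumulate (threshold - quality), and cut where that running sum is largest. Whether SeqAlto follows this procedure exactly is an assumption; trimmed_length is a hypothetical helper, and BWA's minimum read length check is omitted.

```shell
# Hypothetical helper: number of bases kept after BWA-style 3' quality trimming.
# Usage: trimmed_length QUALITY_STRING THRESHOLD
trimmed_length() {
    printf '%s' "$1" | od -An -v -tu1 | awk -v t="$2" '
        { for (i = 1; i <= NF; i++) q[++n] = $i - 33 }   # Sanger: ASCII code - 33
        END {
            keep = n + 0; sum = 0; best = 0
            for (i = n; i >= 1; i--) {
                sum += t - q[i]
                if (sum < 0) break                         # qualities recovered: stop
                if (sum > best) { best = sum; keep = i - 1 }
            }
            print keep
        }'
}

# Five Q40 bases followed by three Q2 bases: the three poor bases are trimmed.
trimmed_length 'IIIII###' 30   # prints: 5
```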

Read Group ID (--rg)

Default value: "none"
Type: string
Sets the read group ID for alignment.

Platform Unit (--pu)

Default value: "none"
Type: string
Sets the platform unit for alignment.

Minimum Length Threshold (-l)

Default value: 50
Type: integer
Values: any non-negative integer
Sets the minimum threshold for read length. Reads shorter than this value will be rejected.
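
To estimate in advance how many reads fall below the threshold, the input can be scanned with awk. This is an illustration only: count_short_reads is a hypothetical helper, it assumes plain 4-line FASTQ records, and it looks at raw read lengths without any trimming.

```shell
# Hypothetical helper: count reads on stdin that are shorter than MIN_LEN bases.
# Usage: count_short_reads MIN_LEN < reads.fq
count_short_reads() {
    awk -v min_len="$1" '
        NR % 4 == 2 && length($0) < min_len { n++ }   # line 2 of each record is the sequence
        END { print n + 0 }'
}
```

For example, "count_short_reads 50 < reads_1.fq" reports the number of reads in the Quick Start file that are shorter than the default threshold.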

Paired-End Options
Maximum Template Size (-i)

Default value: 550
Type: integer
Values: any positive integer
Sets the maximum template size used for paired-end alignment. Minimum template size is the read length.

Average Template Size (-m)

Default value: 250
Type: integer
Values: any positive integer
Sets the average template size used for paired-end alignment.
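
As an illustration, the sketch below assembles a paired-end command line with non-default template sizes and prints it for review rather than running it. The file names follow the Quick Start; the values 600 and 300 are arbitrary examples, and placing the options after the index file simply mirrors the Quick Start command.

```shell
max_template=600   # example value for -i
avg_template=300   # example value for -m

# Sanity check: the average template size should not exceed the maximum.
if [ "$avg_template" -gt "$max_template" ]; then
    echo "average template size exceeds maximum" >&2
fi

cmd="seqalto align genome_22.sidx -1 reads_1.fq -2 reads_2.fq -i $max_template -m $avg_template -p 2"
echo "$cmd > output.sam"
```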

Disable Smith-Waterman Pairing (-s)

Default value: off (Smith-Waterman pairing enabled)
Disables Smith-Waterman pairing of unmapped reads.

Phred Score Pairing Prior (-d)

Default value: 80
Type: integer
Values: any non-negative integer
Sets the prior for Phred score pairing. The higher this value, the more likely SeqAlto is to select properly paired reads over discordant reads.

Minimum Unclipped Read Percentage (-c)

Default value: 50
Type: integer
Values: any integer between 0 and 100
Sets the minimum allowable percentage of unclipped reads. Increasing this value will cause SeqAlto to perform more like BWA.

Advanced Options

These options should only be adjusted with extreme care.

Needleman-Wunsch Match Score (--nw_mat)

Default value: 5
Type: integer
Values: any integer greater than 0
Sets the Needleman-Wunsch match score. The recommended range is between 2 and 10.

Needleman-Wunsch Mismatch Penalty (--nw_sub)

Default value: 15
Type: integer
Values: any integer greater than 0
Sets the Needleman-Wunsch mismatch penalty. The recommended range is between 5 and 20.

Needleman-Wunsch Gap Penalty (--nw_gap)

Default value: 40
Type: integer
Values: any integer greater than 0
Sets the Needleman-Wunsch gap penalty. The recommended range is between 20 and 60.

Needleman-Wunsch Gap Extension Penalty (--nw_ext)

Default value: 2
Type: integer
Values: any integer greater than 0
Sets the Needleman-Wunsch gap extension penalty. The recommended range is between 1 and 10.

Smith-Waterman Match Score (--sw_mat)

Default value: 5
Type: integer
Values: any integer greater than 0
Sets the Smith-Waterman match score. The recommended range is between 2 and 10.

Smith-Waterman Mismatch Penalty (--sw_sub)

Default value: 15
Type: integer
Values: any integer greater than 0
Sets the Smith-Waterman mismatch penalty. The recommended range is between 5 and 20.

Smith-Waterman Gap Penalty (--sw_gap)

Default value: 40
Type: integer
Values: any integer greater than 0
Sets the Smith-Waterman gap penalty. The recommended range is between 20 and 60.

Smith-Waterman Gap Extension Penalty (--sw_ext)

Default value: 2
Type: integer
Values: any integer greater than 0
Sets the Smith-Waterman gap extension penalty. The recommended range is between 1 and 10.
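
To get a feel for how the default scoring parameters trade off against each other, the sketch below scores an alignment from simple counts. It is a worked arithmetic example, not SeqAlto's scoring code: alignment_score is a hypothetical helper, and it assumes that a gap of length L costs the gap penalty plus L times the extension penalty, a convention this manual does not specify.

```shell
# Hypothetical helper: score = matches*mat - mismatches*sub - gap_opens*gap - gap_bases*ext
# using the default values (match 5, mismatch 15, gap 40, gap extension 2).
# Usage: alignment_score MATCHES MISMATCHES GAP_OPENS GAP_BASES
alignment_score() {
    mat=5; sub=15; gap=40; ext=2
    echo $(( $1 * mat - $2 * sub - $3 * gap - $4 * ext ))
}

alignment_score 100 0 0 0   # perfect 100-bp read: 500
alignment_score 98 2 0 0    # two mismatches: 460
alignment_score 97 1 1 2    # one mismatch and one 2-base insertion: 426
```

With these defaults a single gap open costs more than two mismatches, so alignments are only gapped when the gap avoids at least three mismatches.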

k-mer Maximum Occurrence Threshold (--max_occ)

Default value: 10000
Type: integer
Values: any integer greater than 0
Sets the maximum threshold for k-mer occurrences. If a given k-mer appears more often than this number, it will be ignored.
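
The effect of an occurrence threshold can be illustrated outside SeqAlto with a few lines of awk. This is a conceptual sketch only: kept_kmers is a hypothetical helper that reads one k-mer per line and prints those that would not be ignored. It says nothing about how SeqAlto stores or counts k-mers internally.

```shell
# Hypothetical helper: print k-mers on stdin that occur at most MAX_OCC times.
# Usage: kept_kmers MAX_OCC < kmers.txt
kept_kmers() {
    awk -v max_occ="$1" '
        { count[$1]++ }
        END { for (k in count) if (count[k] <= max_occ) print k }' | sort
}

# ACGT occurs three times and is dropped with a threshold of 2; TTTT is kept.
printf 'ACGT\nACGT\nACGT\nTTTT\n' | kept_kmers 2   # prints: TTTT
```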

k-mer Maximum Occurrence Threshold in Needleman-Wunsch Stage (--max_occ_nw)

Default value: 1000
Type: integer
Values: any integer greater than 0
Sets the maximum threshold for k-mer occurrences in the Needleman-Wunsch stage. If a given k-mer appears more often than this number during Needleman-Wunsch alignment, it will be ignored.

k-mer Look-Ahead (--look_ahead)

Default value: 2
Type: integer
Values: any integer greater than 0
Sets the k-mer look-ahead for ungapped alignment.

Additional k-mer Look-Ahead for High Mismatch (--kmer_pen)

Default value: 1
Type: integer
Values: any integer greater than 0
Sets the additional k-mer look-ahead for ungapped alignment for reads with high mismatches.

Needleman-Wunsch k-mer Look-Ahead (--look_ahead_nw)

Default value: 2
Type: integer
Values: any integer greater than 0
Sets the k-mer look-ahead for Needleman-Wunsch gapped alignment.

Needleman-Wunsch Additional k-mer Look-Ahead for High Mismatch (--kmer_pen_nw)

Default value: 1
Type: integer
Values: any integer greater than 0
Sets the additional k-mer look-ahead for Needleman-Wunsch alignment for reads with high mismatches.
