Command-line options of Isaac Variant Caller

Isaac Variant Caller implements a fast variant-calling algorithm and can be considered an alternative to the GATK or samtools variant callers. Unfortunately, it seems to have no manual describing its command-line options.

Here we list the Isaac Variant Caller command-line options, obtained from its source code, which is publicly available on GitHub.

Genotyping options

There are two options, --snp-theta and --indel-theta, that define the values of the theta parameters used for genotyping; their default values are 0.001 and 0.0001, respectively. These parameters are explained in the paper on the Isaac suite.

gVCF options

The following options specify producing the output file in the gVCF format.

  • --gvcf-file – the output gVCF file name; '-' corresponds to the standard output;
  • --chrom-depth-file – use the mean read depth values for each chromosome from the specified file for high-depth filtration; the chromosome depth file should contain one line per chromosome in the form chrom_name<TAB>depth;
  • --gvcf-max-depth-factor – if the chromosome depth file is specified (see above), then loci whose depth exceeds the mean chromosome depth by more than the specified factor are filtered. The default value is 3;
  • --gvcf-min-gqx – the minimum locus GQX value; a negative value disables the filter. The default value is 30;
  • --gvcf-max-snv-strand-bias – the maximum SNV strand-bias value. The default value is 10;
  • --gvcf-no-snv-strand-bias-filter – disables the SNV strand-bias filter;
  • --gvcf-max-snv-hpol – SNVs are filtered if they are located within a homopolymer context longer than the specified length; a negative value disables the filter. By default, the filter is disabled;
  • --gvcf-max-indel-ref-repeat – indels are filtered if they lengthen or contract a homopolymer or dinucleotide with a reference repeat length greater than the specified value; a negative value disables the filter. By default, the filter is disabled;
  • --gvcf-min-blockable-nonref – prevents a site from being joined into a non-variant block if the site contains more than the specified fraction of non-reference alleles. The default value is 0.2;
  • --gvcf-include-hapscore – include haplotype scores at SNV positions in the gVCF output;
  • --gvcf-no-block-compression – turn off block compression in the gVCF output;
  • --gvcf-compute-VQSRmetrics – report metrics used for Variant Quality Score Recalibration (VQSR), namely, BaseQRankSum, ReadPosRankSum, MQRankSum and MQ;
  • --gvcf-skip-header – skip writing a header to the output gVCF file.
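The chromosome depth file format and the high-depth cutoff described above can be illustrated with a short Python sketch. The function names and sample depths are hypothetical, and the comparison used here (locus depth greater than the factor times the chromosome mean) is an assumption based on the option descriptions, not on the caller's source code.

```python
import io

def parse_chrom_depth(handle):
    """Read 'chrom_name<TAB>depth' lines into a dict of mean depths."""
    depths = {}
    for line in handle:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        name, depth = line.split("\t")
        depths[name] = float(depth)
    return depths

def is_high_depth(chrom, locus_depth, chrom_depths, max_depth_factor=3.0):
    """Assumed rule: a locus is filtered when its depth exceeds
    max_depth_factor times the mean depth of its chromosome."""
    return locus_depth > max_depth_factor * chrom_depths[chrom]

# Hypothetical two-chromosome depth file held in memory
example = io.StringIO("chr1\t30.5\nchr2\t28.0\n")
chrom_depths = parse_chrom_depth(example)

print(is_high_depth("chr1", 100, chrom_depths))  # cutoff is 3 * 30.5 = 91.5
print(is_high_depth("chr2", 80, chrom_depths))   # cutoff is 3 * 28.0 = 84.0
```

With the default factor of 3, only the first locus would be flagged under this reading.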

Haplotype options

The --hap-model option, if specified, activates the haplotype-based variant calling procedure.

Non-reference model options

The following options specify how non-reference alleles are processed.

  • --nonref-test-file – test for non-reference alleles at any frequency and write the results to the specified file;
  • --nonref-sites-file – print the results of the non-reference allele test at every site to the specified file;
  • --nonref-variant-rate – the expected non-reference variant frequency used for the non-reference test. The default value is 9.99e-07;
  • --min-nonref-freq – the minimum non-reference allele frequency considered in the non-reference test. The default value is 0;
  • --nonref-site-error-rate – the expected rate of erroneous non-reference allele sites applied to the non-reference model. At error sites, a non-reference allele is expected in the frequency range from 0 to the value specified by the --nonref-site-error-decay-freq option (see below), with a probability that decays linearly to 0 at that value. The default value is 0.0001;
  • --nonref-site-error-decay-freq – the parameter used to estimate the error site probability as described above. The default value is 0.01.
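One possible reading of the linear decay described above is a triangular probability density over the non-reference allele frequency. The sketch below is an interpretation only: the function name is hypothetical, and the normalization (so that the density integrates to 1 over the range) is an assumption that the option description does not state.

```python
def error_site_density(freq, decay_freq=0.01):
    """Assumed density of the non-reference allele frequency at an error site:
    highest at frequency 0 and decaying linearly to 0 at decay_freq."""
    if freq < 0 or freq > decay_freq:
        return 0.0
    return (2.0 / decay_freq) * (1.0 - freq / decay_freq)

# Midpoint-rule check that the assumed density integrates to about 1
steps = 1000
width = 0.01 / steps
total = sum(error_site_density((i + 0.5) * width) * width for i in range(steps))
print(round(total, 6))
```

Under this assumption, frequencies above the decay value contribute nothing to the error-site model.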

Contig options

This group contains three options that specify how contiguous sequences (contigs) of aligned reads are processed.

  • --min-contig-open-end-support (default: 0);
  • --min-contig-edge-alignment (default: 7);
  • --min-contig-contiguous-match (default: 14).

The --min-contig-open-end-support option filters out any open-ended contig with an unaligned breakpoint sequence length of less than its argument value. The --min-contig-edge-alignment option filters out any contig with an edge match segment shorter than its argument value. The --min-contig-contiguous-match option filters out any contig without a match segment of length at least its argument value.
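The three contig filters can be summarized as a single predicate. The following Python sketch assumes a hypothetical Contig record whose fields (open_ended, unaligned_breakpoint_len, edge_match_lens, longest_match_len) are invented for illustration; it mirrors the descriptions above rather than the caller's actual data structures.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass
class Contig:
    """Hypothetical summary of one contig alignment."""
    open_ended: bool
    unaligned_breakpoint_len: int
    edge_match_lens: Tuple[int, ...]  # lengths of the edge match segments
    longest_match_len: int            # longest contiguous match segment

def passes_contig_filters(contig,
                          min_open_end_support=0,
                          min_edge_alignment=7,
                          min_contiguous_match=14):
    """Return True if the contig survives all three filters."""
    # Open-ended contig with too short an unaligned breakpoint sequence
    if contig.open_ended and contig.unaligned_breakpoint_len < min_open_end_support:
        return False
    # Any edge match segment shorter than the threshold
    if any(length < min_edge_alignment for length in contig.edge_match_lens):
        return False
    # No match segment of at least the required length
    if contig.longest_match_len < min_contiguous_match:
        return False
    return True

good = Contig(False, 0, (10, 12), 20)
short_edge = Contig(False, 0, (5, 12), 20)
print(passes_contig_filters(good), passes_contig_filters(short_edge))
```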

Realignment options

There are two options in this group: --max-indel-toggle-depth and --skip-realignment. The --max-indel-toggle-depth option controls the realignment stringency; lowering its value increases the realignment speed and decreases the quality of the called indels. The default value of --max-indel-toggle-depth is 5.

The --skip-realignment option disables read realignment; it is accepted only if no indel-calling or contig options are specified.

Indel-calling options

This group contains options related to indel calling.

  • --max-candidate-indel-depth – the maximum estimated read depth for an indel to be considered; if this value is exceeded for any sample, then the indel is filtered. The default value is 10000;
  • --min-candidate-open-length – the minimum length of an open-ended breakpoint sequence required to become a breakpoint candidate. The default value is 20;
  • --candidate-indel-input-vcf – add candidate indels from the specified VCF file. The option can be provided multiple times to combine evidence from multiple VCF files;
  • --force-output-vcf – write to output a record for each site or indel in the provided VCF file even if no variant is found. The option can be provided multiple times to combine multiple VCF files;
  • --upstream-oligo-size – process reads as if they have an upstream oligo anchor for purposes of meeting minimum breakpoint overlap in support of an indel.

Variant-calling window options

The --variant-window-flank-file option outputs regional average basecall statistics at variant sites within a window of the variant call of the specified size. The option is provided with a pair of arguments: the window flank size and the output file name, for example, --variant-window-flank-file 10 window10.txt. This option can be specified several times for various window sizes.
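Because the option takes two arguments and may be repeated, assembling it programmatically is a little error-prone. The Python sketch below builds the argument list for several window sizes; the helper name and the second file name (window50.txt) are hypothetical, and it assumes that the long option is spelled with two ASCII hyphens.

```python
def window_flank_args(windows):
    """Build repeated '--variant-window-flank-file SIZE FILE' arguments
    from (flank_size, file_name) pairs."""
    args = []
    for flank_size, file_name in windows:
        args += ["--variant-window-flank-file", str(flank_size), file_name]
    return args

flank_args = window_flank_args([(10, "window10.txt"), (50, "window50.txt")])
print(" ".join(flank_args))
```

The resulting list can be appended to the rest of the command line, for example when launching the caller through Python's subprocess module.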

Compatibility options

The --eland-compatibility option enables checking of input reads for an optional AS field corresponding to the ELAND PE map score.

Input options

This group contains two options: --max-input-depth and --ignore-conflicting-read-names. The --max-input-depth option specifies the maximum allowed read depth per sample (prior to the realignment procedure); by default, there is no limit. The --ignore-conflicting-read-names option disables reporting an error if two input reads share the same QNAME and read number.

Other options

There is a pair of options in this group: --report-file and --remap-input-softclip. The first option, --report-file, reports non-error run information and statistics to the file specified as its argument. The second option, --remap-input-softclip, attempts to realign all soft-clipped segments in input reads.

Legacy options

This group contains options that are marked legacy.

  • -bam-file – analyze reads from the specified sorted BAM file;
  • -bam-seq-name – analyze reads from a BAM file that are aligned to the chromosome with the specified name;
  • -samtools-reference – get the reference sequence from the specified multisequence FASTA file that follows the samtools reference conventions;
  • -bsnp-diploid-het-bias – specify the bias term for the heterozygous state in the bsnp model, so that heterozygotes are expected at allele ratios in the range 0.5±x, where x is the parameter value. The default value is 0;
  • -bsnp-diploid-file – run the Bayesian diploid genotype model and write its results to the specified file;
  • -bsnp-diploid-allele-file – write the most probable genotype at every position to the specified file;
  • -min-qscore – do not use a base if its qscore is less than the specified value. The default value is 17;
  • -max-window-mismatch n m – do not use a base if the mismatch count within the window of m flanking bases is greater than n;
  • -min-single-align-score – mark reads whose single alignment score is less than the specified value as single-end failed. By default, such reads are excluded from consideration unless a paired score is present; this behavior can be modified by the options -single-align-score-exclude-mode and -single-align-score-rescue-mode (see below). The default value is 10;
  • -single-align-score-exclude-mode – exclude single-end failed reads even when a paired score is present and the read is not paired-end failed;
  • -single-align-score-rescue-mode – include reads that are not single-end failed even when a paired score is present and the read is paired-end failed;
  • -min-paired-align-score – mark reads whose paired alignment score is less than the specified value as paired-end failed if a paired score is present. By default, such reads are excluded from consideration, but they may still be used if the single-score rescue mode is enabled. The default value is 6;
  • -filter-unanchored – prevent using unanchored read pairs (that is, read pairs in which both reads have a single-read mapping score of zero) during variant calling;
  • -include-singleton – include paired-end reads with unmapped mates;
  • -include-anomalous – include paired-end reads that are not part of a proper pair (anomalous orientation or incorrect insert size);
  • -counts – write observation counts for every position to the specified file;
  • -clobber – overwrite pre-existing output files;
  • -print-evidence – print the observed data at single-site events (indels not included);
  • -print-all-site-evidence – print the observed data for all sites (indels not included);
  • -bindel-diploid-het-bias – set the bias term for the heterozygous state in the bindel model, so that heterozygotes are expected at allele ratios in the range 0.5±x, where x is the parameter value. The default value is 0;
  • -bindel-diploid-file – run the Bayesian diploid genotype caller and write the results to the specified file;
  • -indel-contigs – the contig file produced by the GROUPER indel-finding tool. This option must be specified together with -indel-contig-reads;
  • -indel-contig-reads – the contig reads file produced by the GROUPER indel-finding tool. This option must be specified together with -indel-contigs;
  • -indel-error-rate – for indel calling, set the indel error rate to a constant value equal to the one specified in the option. The default indel error rate is estimated from an empirical function accounting for the homopolymer length and the indel type (insertion or deletion). This option overrides the default behavior;
  • -indel-nonsite-match-prob – the probability of a base matching the reference in an average mismapped read; this value is used only by the indel caller. The default value is 0.25;
  • -report-range-begin – event reports and coverage begin at the specified base. The default value is 0;
  • -report-range-end – event reports and coverage end after the specified base or the reference size, if specified. The default value is the reference size;
  • -report-range-reference – event reports and coverage span the entire reference sequence; a reference sequence is required to use this option. This option cannot be combined with -report-range-begin and -report-range-end;
  • -genome-size – the total number of non-ambiguous bases in the genome to which the input reads have been aligned. This option is used in indel calling;
  • -min-candidate-indel-reads – the minimal number of indel-supporting reads to consider the indel a candidate for realignment and indel calling. A read counts if it is either a contig read or a genomic read that passes the mapping score threshold. The default value is 3;
  • -min-candidate-indel-read-frac – the minimal fraction of intersecting reads containing an indel to consider it a candidate for realignment and indel calling. The criteria for counting a read are the same as for -min-candidate-indel-reads. Only genomic reads that pass the mapping score threshold are used for the denominator of this metric. The default value is 0.02;
  • -max-small-candidate-indel-read-frac – an additional indel candidacy filter for small indels (no more than 4 bases): the minimal fraction of intersecting reads containing the small indel to consider it a candidate for realignment and indel calling. The criteria for counting a read are the same as for -min-candidate-indel-reads. Only genomic reads that pass the mapping score threshold are used for the denominator of this metric. The default value is 0.1;
  • -max-candidate-indel-density – if there are more than the specified number of candidate indels per base intersecting a read, then realignment is truncated to only allow individual indel toggles of the starting alignments for that read. The default value is 0.15;
  • -candidate-indel-file – write to the specified file all candidate indels before realignment and genotyping;
  • -write-candidate-indels-only – skip all analysis steps besides writing candidate indels (valid only with -candidate-indel-file);
  • -realigned-read-file – write to the specified BAM file the reads that have had their alignments altered during realignment;
  • -realign-submapped-reads – if specified, then even reads that failed the variant calling mapping thresholds are realigned using the same procedure as for the variant calling reads;
  • -snp-max-basecall-filter-fraction – do not call SNPs at the sites where the fraction of filtered basecalls exceeds the parameter value. The default value is 1;
  • -no-ambiguous-path-clip – disable trimming of ambiguous reads after realignment;
  • -max-indel-size – the maximum size for indels processed for calling and realignment. Note that increasing this value should lead to an approximately linear increase in memory consumption. The default value is 150;
  • -print-all-poly-gt – print all polymorphic-site genotype probabilities in the diploid sites and SNP files;
  • -print-used-allele-counts – print used base counts for each allele in the diploid sites and SNP files;
  • -used-allele-count-min-qscore – if allele counts are printed, then filter them for the qscore greater than the specified parameter value;
  • -all-warnings – print all warnings (by default, only errors and low-frequency warnings are shown);
  • -skip-variable-metadata – do not print command-line or time stamps in the output file metadata.
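The interaction between -min-single-align-score, -min-paired-align-score and the two single-score modes is easier to follow as a decision function. The Python sketch below is one interpretation of the descriptions in the list above, not code taken from the caller; the function and parameter names are hypothetical.

```python
def read_is_used(single_score, paired_score=None,
                 min_single=10, min_paired=6,
                 exclude_mode=False, rescue_mode=False):
    """Decide whether a read is kept for variant calling (assumed logic)."""
    single_failed = single_score < min_single
    has_paired = paired_score is not None
    paired_failed = has_paired and paired_score < min_paired

    if not has_paired:
        # Without a paired score, only the single-end score matters
        return not single_failed
    if paired_failed:
        # Paired-end failed reads are dropped unless rescue mode is on
        # and the read is not single-end failed
        return rescue_mode and not single_failed
    # Paired score is present and passes: single-end failed reads are
    # kept by default and dropped only in exclude mode
    return not (single_failed and exclude_mode)

print(read_is_used(5))                         # single-end failed, no paired score
print(read_is_used(5, 20))                     # kept thanks to the paired score
print(read_is_used(5, 20, exclude_mode=True))  # exclude mode drops it
print(read_is_used(12, 3, rescue_mode=True))   # rescued by the single score
```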
