Obtaining probability of all variant calls being correct

The VCF format specifies quality scores (QUAL) for each variable position (variant) in a genome. The QUAL value is the Phred quality score for the assertion that alternative bases of a variant are correct, that is, $\mathrm{QUAL} = -10 \log_{10} p$, where $p$ is the probability that the alternative base calls are wrong. Using the QUAL scores, one may easily calculate the probability that all variant calls in a VCF file are correct.

Here we give an equation for that probability, a Python script that implements it and an example of its usage.
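
As a rough illustration (not the script from the post itself), if the variant calls are assumed to be independent, the probability that all $n$ calls are correct is $P = \prod_{i=1}^{n} \left(1 - 10^{-\mathrm{QUAL}_i/10}\right)$. The sketch below computes it from the sixth (QUAL) column of VCF lines. The function names are hypothetical, and skipping records whose QUAL is missing (`.`) is an assumption made here for simplicity.

```python
import math


def read_quals(lines):
    """Yield QUAL values from VCF text lines, skipping headers.

    Assumption: records with a missing QUAL ('.') are ignored.
    """
    for line in lines:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6 or fields[5] == ".":
            continue
        yield float(fields[5])


def prob_all_correct(quals):
    """Probability that every call is correct, assuming independent calls."""
    log_p = 0.0
    for q in quals:
        p_wrong = 10.0 ** (-q / 10.0)
        if p_wrong >= 1.0:
            # QUAL <= 0 means the call is certainly wrong.
            return 0.0
        # Sum logarithms to avoid underflow for very long files.
        log_p += math.log1p(-p_wrong)
    return math.exp(log_p)
```

For a real file, something like `prob_all_correct(read_quals(open("calls.vcf")))` would apply, where `calls.vcf` is a placeholder name.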

Converting an AGP file to the BED format

The AGP format is used to describe the assembly structure in the NCBI Genome database. Since AGP is a plain-text tabular format that specifies the positions of smaller sequence objects on larger ones (e.g., contigs on scaffolds), AGP files can be converted to the BED format for further processing.
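
A minimal sketch of such a conversion follows; it is an illustration under stated assumptions, not the converter described in the post. AGP coordinates are 1-based and inclusive, while BED coordinates are 0-based and half-open, so only the start position is shifted. The function name `agp_to_bed`, the `skip_gaps` parameter, the use of `gap` as the name of gap features, and the constant score of `0` are choices made here for illustration.

```python
def agp_to_bed(lines, skip_gaps=True):
    """Yield six-column BED lines for the components listed in AGP lines."""
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        f = line.rstrip("\n").split("\t")
        obj, beg, end, comp_type = f[0], int(f[1]), int(f[2]), f[4]
        if comp_type in ("N", "U"):
            # Gap lines carry no component ID or orientation.
            if skip_gaps:
                continue
            name, strand = "gap", "."
        else:
            name = f[5]
            # AGP also allows '?', '0' and 'na'; BED uses '.' for unknown.
            strand = f[8] if f[8] in ("+", "-") else "."
        # 1-based inclusive -> 0-based half-open: shift the start only.
        yield "\t".join([obj, str(beg - 1), str(end), name, "0", strand])
```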

Combining a large number of VCF files

The bcftools and vcftools packages provide routines for merging or concatenating multiple VCF files. However, specifying a large number of input VCF files may cause processing to fail because the operating system limits the number of files a process can keep open at once. This problem can be overcome by combining the files iteratively: first, pairs of the original VCF files are processed, then pairs of the resulting files are processed, and so on until a single VCF file remains.

Here we describe an iterative scheme for merging or concatenating VCF files using bcftools and GNU parallel and present a Python script that implements it.
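
The pairwise scheme can be sketched as a planner that only builds the command lines for each round without running them. This is an illustration, not the script from the post: the function name, the `prefix` parameter, and the intermediate file naming are hypothetical. Commands within one round do not depend on each other, so they could be run concurrently (for example with GNU parallel), whereas rounds must run in order. Note that `bcftools merge` expects bgzip-compressed, indexed inputs, so intermediate files would presumably need indexing between rounds.

```python
def plan_pairwise_rounds(files, prefix="tmp", subcommand="merge"):
    """Return (rounds, final_file) for pairwise combining of VCF files.

    Each round is a list of bcftools command lines (as argument lists);
    the names of intermediate files are hypothetical.
    """
    if not files:
        raise ValueError("at least one input file is required")
    rounds = []
    current = list(files)
    round_no = 0
    while len(current) > 1:
        round_no += 1
        commands, nxt = [], []
        for i in range(0, len(current) - 1, 2):
            out = f"{prefix}.r{round_no}.{i // 2}.vcf.gz"
            commands.append(["bcftools", subcommand, "-O", "z", "-o", out,
                             current[i], current[i + 1]])
            nxt.append(out)
        if len(current) % 2:
            # An unpaired file is carried over to the next round unchanged.
            nxt.append(current[-1])
        rounds.append(commands)
        current = nxt
    return rounds, current[0]
```

With $n$ input files, the plan contains $n - 1$ commands spread over roughly $\log_2 n$ rounds, and no single command opens more than two input files.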

Restoring models in protein structure files by Swiss PDB Viewer

Despite its name, Swiss PDB Viewer implements a number of features besides visualization of protein molecules. One such feature is side chain reconstruction for protein structures that contain only backbone atoms. However, Swiss PDB Viewer does not write model records to its output PDB files, which may cause problems with other PDB-processing programs.

In this post, we present a Python script that adds proper model records to a PDB file produced by Swiss PDB Viewer.

Command-line options of Isaac Variant Caller

Isaac Variant Caller implements a fast variant-calling algorithm and can be considered an alternative to the GATK or samtools variant callers. Unfortunately, it seems to have no manual describing its command-line options.

Here we give the list of the Isaac Variant Caller command-line options obtained from its source code, which is publicly available on GitHub.

R’s matplot function in MATLAB

By default, MATLAB’s plot function draws no markers in the figure that it produces. One may explicitly specify a marker and a line style following the line specification string syntax; however, only one marker type and line style may be applied to a single data set.

Here we present a simple MATLAB function that implements the same functionality as R’s matplot function and allows setting the style of each data line shown.