R’s matplot function in MATLAB

By default, MATLAB’s plot function draws lines without markers. A marker and a line style can be specified explicitly using the line specification string syntax; however, a single line specification applies the same marker type and line style to every line of a data set (e.g., to every column of a plotted matrix).

Below we present a simple MATLAB function that reproduces the behavior of R’s matplot function and allows the style of each data line to be set individually.
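The MATLAB function from the full post is not reproduced here. As a rough illustration of the same idea, the Python sketch below assumes matplotlib is available; `column_styles` and `matplot` are hypothetical names. Each data line gets its own marker and line style, and both style lists are recycled independently when there are more lines than styles, in the spirit of how R’s matplot recycles its pch and lty arguments.

```python
from itertools import cycle, islice

# Hypothetical default style lists; any valid matplotlib markers/line styles work.
DEFAULT_MARKERS = ("o", "^", "s", "+", "x")
DEFAULT_LINESTYLES = ("-", "--", ":", "-.")


def column_styles(n_columns, markers=DEFAULT_MARKERS, linestyles=DEFAULT_LINESTYLES):
    """Return one (marker, linestyle) pair per data line, recycling each
    style list independently when there are more lines than styles."""
    return list(zip(islice(cycle(markers), n_columns),
                    islice(cycle(linestyles), n_columns)))


def matplot(x, columns, markers=DEFAULT_MARKERS, linestyles=DEFAULT_LINESTYLES):
    """Plot every sequence in `columns` against `x`, each with its own style."""
    # Imported here so that column_styles can be used without matplotlib installed.
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    styles = column_styles(len(columns), markers, linestyles)
    for ys, (marker, linestyle) in zip(columns, styles):
        ax.plot(x, ys, marker=marker, linestyle=linestyle)
    return fig, ax
```

For example, `fig, ax = matplot(range(5), [[0, 1, 2, 3, 4], [4, 3, 2, 1, 0]])` followed by `fig.savefig("matplot.png")` draws two lines, one with circles and solid segments and one with triangles and dashed segments.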


Obtaining scaffold positions on assembled chromosomes from NCBI Genome

NCBI Genome stores genomic assemblies of numerous species. Besides assembly sequences, it also contains related auxiliary information, including AGP files that describe how large sequence objects (e.g., chromosomes) were assembled from smaller ones (e.g., scaffolds or contigs).

For some assemblies, the chromosome-from-scaffolds AGP files may be missing even though the chromosomes were assembled from scaffolds. In that case, the AGP file of scaffolds on chromosomes can be reconstructed from the chromosome-from-components and scaffold-from-components AGP files.

Below we describe how to perform such a reconstruction and present a Python script that implements it.
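The script from the full post is not reproduced here. The following is only a minimal sketch of the underlying idea, under simplifying assumptions: components are shared by name between the two AGP files, every scaffold lies contiguously on a single chromosome, and the scaffold’s orientation can be inferred by comparing the orientation of any shared component in both files. The function names are hypothetical, and the sketch returns scaffold spans rather than a complete AGP file with part numbers and gap lines.

```python
def parse_agp(lines):
    """Yield sequence-component records from AGP lines as dicts.

    Comment lines (starting with '#') and gap lines (type N or U) are skipped."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if fields[4] in ("N", "U"):  # gap lines carry no component
            continue
        yield {
            "object": fields[0],
            "obj_beg": int(fields[1]),
            "obj_end": int(fields[2]),
            "component": fields[5],
            "orientation": fields[8],
        }


def scaffolds_on_chromosomes(chr_agp_lines, scaf_agp_lines):
    """Return sorted (chromosome, start, end, scaffold, strand) tuples."""
    # Map each component to the scaffold containing it and its orientation there.
    comp_to_scaf = {rec["component"]: (rec["object"], rec["orientation"])
                    for rec in parse_agp(scaf_agp_lines)}

    # Extend each scaffold's chromosome span with the positions of its components.
    spans = {}
    for rec in parse_agp(chr_agp_lines):
        if rec["component"] not in comp_to_scaf:
            continue
        scaffold, scaf_orient = comp_to_scaf[rec["component"]]
        key = (rec["object"], scaffold)
        # The scaffold is reversed on the chromosome if orientations disagree.
        strand = "+" if rec["orientation"] == scaf_orient else "-"
        if key not in spans:
            spans[key] = [rec["obj_beg"], rec["obj_end"], strand]
        else:
            spans[key][0] = min(spans[key][0], rec["obj_beg"])
            spans[key][1] = max(spans[key][1], rec["obj_end"])
    return sorted((chrom, beg, end, scaf, strand)
                  for (chrom, scaf), (beg, end, strand) in spans.items())
```

Both arguments are lists of lines, e.g. as returned by `open(path).readlines()`.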


Filtering noise in LASTZ dot plots

LASTZ, a whole-genome alignment tool, can write a dot plot file of the pairwise alignments it finds. Such a file can be visualized in R with its plot function or from the command line using this R script. However, LASTZ dot plots often contain noise that originates from repetitive elements, even when the aligned genomes have been masked.

For example, the dot plot below shows the pairwise alignments between chromosome 1 sequences of the human genome (the GRCh38.p2 assembly) and the chimpanzee genome (the Pan_troglodytes-2.1.4 assembly). Both sequences were masked with RepeatMasker before alignment; LASTZ was launched with the following parameters.

lastz hs_ref_GRCh38.p2_chr1.mfa \
    ptr_ref_Pan_troglodytes-2.1.4_chr1.mfa \
    --nogapped --notransition --step=20 --ambiguous=iupac \
    --format=rdotplot --output=human_chimp_chr1.rdotplot

LASTZ alignments between chromosome 1 sequences of the human and chimpanzee genomes.
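The filtering method from the full post is not reproduced here. As a naive illustration only, the Python sketch below drops short alignment segments from an rdotplot file. It assumes the rdotplot layout is a header line with the two sequence names followed by blocks of coordinate lines, each block separated by an `NA NA` line; the function names and the length-threshold approach are hypothetical and are not necessarily what the post does.

```python
def read_rdotplot(lines):
    """Parse rdotplot lines into (header, segments), where each segment is
    ((x_start, y_start), (x_end, y_end))."""
    it = iter(lines)
    header = next(it).rstrip("\n")  # names of the two aligned sequences
    segments, points = [], []
    for line in it:
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "NA":  # end of one alignment segment
            if len(points) >= 2:
                segments.append((points[0], points[-1]))
            points = []
        else:
            points.append((int(fields[0]), int(fields[1])))
    if len(points) >= 2:  # the last block may lack a trailing NA line
        segments.append((points[0], points[-1]))
    return header, segments


def filter_short(segments, min_length):
    """Keep segments spanning at least min_length bases on the first sequence."""
    return [s for s in segments if abs(s[1][0] - s[0][0]) + 1 >= min_length]


def write_rdotplot(header, segments):
    """Serialize segments back into rdotplot lines."""
    out = [header]
    for (x1, y1), (x2, y2) in segments:
        out += [f"{x1}\t{y1}", f"{x2}\t{y2}", "NA\tNA"]
    return out
```

A threshold set too high will also remove genuine short alignments, so it would need to be tuned per genome pair.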


Compiling PHAST under OS X Yosemite or higher

The PHAST package (PHylogenetic Analysis with Space/Time models) implements a number of methods for comparative and evolutionary genomics. PHAST depends on the LAPACK library and, when compiled under OS X, uses the version built into the system. However, compiling PHAST under OS X Yosemite or higher fails with the following error:

fatal error: 'vecLib/clapack.h' file not found

The compilation fails because the vecLib framework, which had been deprecated in earlier OS X versions, was removed as of OS X Yosemite. Instead of vecLib, one should use the Accelerate framework built into OS X. To do so, modify the files include/external_libs.h and src/make-include.mk as follows.

include/external_libs.h:

-#include <vecLib/clapack.h>
+#include <Accelerate/Accelerate.h>

src/make-include.mk:

-LIBS = -lphast -framework vecLib -lc -lm
+LIBS = -lphast -framework Accelerate -lc -lm

In addition, the FSHIFT macro in src/util/clean_genes.c should be renamed (e.g., to FRAMESHIFT) because the Accelerate framework defines a macro with the same name but a different meaning.
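The three modifications above can also be applied automatically. The Python sketch below is a hypothetical helper, not part of PHAST or of the fork mentioned below; it assumes `root` points at an unmodified PHAST source tree and rewrites the files in place with plain regular-expression substitutions.

```python
import re
from pathlib import Path

# Substitutions described above, keyed by path relative to the PHAST source root.
EDITS = {
    "include/external_libs.h": [
        (r"#include <vecLib/clapack\.h>", "#include <Accelerate/Accelerate.h>"),
    ],
    "src/make-include.mk": [
        (r"-framework vecLib", "-framework Accelerate"),
    ],
    "src/util/clean_genes.c": [
        # Word boundaries keep identifiers that merely contain FSHIFT intact.
        (r"\bFSHIFT\b", "FRAMESHIFT"),
    ],
}


def patch_text(text, substitutions):
    """Apply (pattern, replacement) regex substitutions to text in order."""
    for pattern, replacement in substitutions:
        text = re.sub(pattern, replacement, text)
    return text


def patch_tree(root="."):
    """Rewrite the three PHAST files in place under root."""
    for rel_path, substitutions in EDITS.items():
        path = Path(root) / rel_path
        path.write_text(patch_text(path.read_text(), substitutions))
```

For example, `patch_tree("phast")` would patch a source tree cloned into the `phast` directory; running it twice is harmless because the patterns no longer match after the first run.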

The changes described above are included in my fork of the original PHAST repository on GitHub: https://github.com/gtamazian/phast.

Creating GIF animations of protein molecules with PyMOL

PyMOL is an open-source molecular visualization system for producing high-quality figures of protein structures. Besides static figures, PyMOL can also generate animations with the mpng command, which writes movie frames to separate files. However, mpng provides no options to customize the produced images. Here we describe an approach to creating a customized looping animation in PyMOL and present a Python script implementing it. The script is based on Maximilian Ebert’s solution from the PyMOL mailing list.
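The script from the full post (based on Maximilian Ebert’s solution) is not reproduced here. As a rough sketch of the general idea, the Python function below generates PyMOL commands that save one ray-traced PNG per frame with explicit width, height, and DPI, rotating the molecule by 360/n degrees between frames. The function name, file prefix, and default values are hypothetical. Because the frame is saved before each turn, the last saved frame is one step short of a full rotation, so the animation loops without a duplicated frame.

```python
def looped_movie_script(n_frames=72, prefix="frame", width=800, height=600,
                        dpi=150, axis="y"):
    """Return PyMOL commands that ray-trace n_frames PNG files, rotating the
    scene by 360/n_frames degrees about `axis` after each saved frame."""
    step = 360.0 / n_frames
    commands = []
    for i in range(n_frames):
        # Save the current view first, then rotate for the next frame.
        commands.append(f"png {prefix}_{i:04d}.png, width={width}, "
                        f"height={height}, dpi={dpi}, ray=1")
        commands.append(f"turn {axis}, {step}")
    return commands
```

The commands could be written to a file with `open("movie.pml", "w").write("\n".join(looped_movie_script()))` and run in PyMOL with `@movie.pml`; the frames might then be combined into a looping GIF with ImageMagick, e.g. `convert -delay 5 -loop 0 frame_*.png movie.gif`.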
