Filtering noise in LASTZ dot plots

LASTZ, a whole-genome alignment tool, provides an option to write the pairwise alignments it finds to a dot plot file. Such a file can be visualized in R using its plot function or from the command line using this R script. However, LASTZ dot plots often contain noise that originates from repetitive elements, even when the genomes being aligned have been repeat-masked.

For example, the dot plot below shows the pairwise alignments between the chromosome 1 sequences of the human genome (the GRCh38.p2 assembly) and the chimpanzee genome (the Pan_troglodytes-2.1.4 assembly). Both sequences were masked with RepeatMasker before alignment, and LASTZ was run with the following parameters:

lastz hs_ref_GRCh38.p2_chr1.mfa \
    ptr_ref_Pan_troglodytes-2.1.4_chr1.mfa \
    --nogapped --notransition --step=20 --ambiguous=iupac \
    --format=rdotplot --output=human_chimp_chr1.rdotplot

LASTZ alignments between chromosome 1 sequences of the human and chimpanzee genomes.

The noise can be removed from the dot plot by filtering the alignments by length, that is, by discarding the short ones. The following Python script implements this kind of filtering for LASTZ dot plot files; note that the script requires NumPy.
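As a rough illustration of what such a length filter does (this is not the filter_dotplot.py script itself), here is a minimal sketch. It assumes the usual rdotplot layout: a header line with the two sequence names, then for each gap-free segment one line with the start coordinates, one line with the end coordinates, and a separator line `NA NA`. The function names are hypothetical, segment length is taken as the coordinate difference along the first sequence, and only the standard library is used rather than NumPy.

```python
#!/usr/bin/env python3
"""Sketch of a length filter for LASTZ rdotplot files (assumed layout)."""
import sys


def read_segments(lines):
    """Parse rdotplot lines into (header, [(x1, y1, x2, y2), ...])."""
    it = iter(lines)
    header = next(it).rstrip("\n")
    segments, block = [], []
    for line in it:
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "NA":
            # An 'NA NA' line closes the current segment.
            if len(block) == 2:
                (x1, y1), (x2, y2) = block
                segments.append((x1, y1, x2, y2))
            block = []
        else:
            block.append((int(fields[0]), int(fields[1])))
    return header, segments


def filter_segments(segments, min_length):
    """Keep segments strictly longer than min_length on the first sequence."""
    return [s for s in segments if abs(s[2] - s[0]) > min_length]


def write_segments(header, segments, out):
    """Write segments back in the same three-lines-per-segment layout."""
    out.write(header + "\n")
    for x1, y1, x2, y2 in segments:
        out.write(f"{x1}\t{y1}\n{x2}\t{y2}\nNA\tNA\n")


if __name__ == "__main__":
    if len(sys.argv) != 4:
        print("usage: filter.py INPUT MIN_LENGTH OUTPUT", file=sys.stderr)
    else:
        with open(sys.argv[1]) as src:
            header, segments = read_segments(src)
        kept = filter_segments(segments, int(sys.argv[2]))
        with open(sys.argv[3], "w") as dst:
            write_segments(header, kept, dst)
```

The argument order (input file, minimum length, output file) mirrors the way filter_dotplot.py is invoked in this post. Because the alignments here were produced with --nogapped, each segment spans the same number of bases on both sequences, so measuring along the first sequence is enough under that assumption.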

So, let’s use the filter_dotplot.py script to keep only the alignments longer than 1 kbp.

./filter_dotplot.py human_chimp_chr1.rdotplot 1000 \
    human_chimp_chr1_filtered.rdotplot

After filtering by alignment length, we get the following dot plot; it contains much less noise than the dot plot above.


Filtered LASTZ alignments between chromosome 1 sequences of the human and chimpanzee genomes. Only the alignments longer than 1 kbp were kept.
