Sample-based format for predicted variant effects

VCF is a variant-based format: each record (line) represents a single genomic variant, including its location, reference and alternative alleles, variant-calling characteristics, and sample genotypes. However, sample-based datasets are more convenient for some applications, especially when each allele of a variant carries its own meaning. For example, one may predict variant effects with snpEff or Ensembl VEP and then consider only the samples that have specific effects for both of their alleles.
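To make the variant-based versus sample-based distinction concrete, here is a minimal sketch of turning one VCF data line into a per-sample view of alleles. It assumes the standard tab-separated VCF column order (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO, FORMAT, then one column per sample) and a GT key in the FORMAT column. The function name `sample_alleles`, the sample names, and the example line are hypothetical and are not part of the format introduced in the post.

```python
# Made-up VCF data line with two samples; real files carry more INFO/FORMAT data.
VCF_LINE = "chr1\t12345\t.\tA\tG,T\t50\tPASS\t.\tGT\t0/1\t1|2"


def sample_alleles(vcf_line, sample_names):
    """Return (chrom, pos, {sample: [allele, ...]}) for one VCF data line."""
    fields = vcf_line.rstrip("\n").split("\t")
    chrom, pos, _variant_id, ref, alt = fields[:5]
    # Genotype indices refer to this list: 0 is REF, 1 and above are ALT alleles.
    alleles = [ref] + alt.split(",")
    gt_index = fields[8].split(":").index("GT")

    per_sample = {}
    for name, column in zip(sample_names, fields[9:]):
        genotype = column.split(":")[gt_index]
        # Alleles are separated by "/" (unphased) or "|" (phased).
        indices = genotype.replace("|", "/").split("/")
        # A "." index marks a missing call and is kept as None.
        per_sample[name] = [None if i == "." else alleles[int(i)] for i in indices]
    return chrom, int(pos), per_sample


if __name__ == "__main__":
    print(sample_alleles(VCF_LINE, ["S1", "S2"]))
```

Grouping alleles by sample in this way is the step that lets one ask a sample-level question, such as whether both alleles of a sample carry a particular predicted effect.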

Here we introduce a BED-based format for sample-centered storage of predicted variant effects. Before describing the format, we show a few example records in it.

Blastn equivalents of MegaBLAST options

MegaBLAST is a legacy sequence alignment tool optimized for rapid processing of long nucleotide sequences that differ only slightly. It is part of the NCBI C Toolkit, whose last version was included in the NCBI BLAST+ 2.2.26 package released in March 2012. In subsequent NCBI BLAST+ releases, MegaBLAST was replaced with the blastn tool.

Although blastn and MegaBLAST implement nearly the same alignment algorithm, their command-line options differ. Here we describe the blastn equivalents of MegaBLAST options.
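As a rough illustration of what such a translation looks like, the sketch below rewrites a few legacy MegaBLAST flag/value pairs into a blastn argument list. The mapping is a small illustrative subset chosen for this example (query, database, output file, E-value, word size), not the full list from the post, and the helper name `to_blastn` is hypothetical. Check `blastn -help` for your BLAST+ version before relying on any flag.

```python
# Illustrative subset of legacy MegaBLAST flags and their blastn counterparts.
MEGABLAST_TO_BLASTN = {
    "-i": "-query",      # input query file
    "-d": "-db",         # database name
    "-o": "-out",        # output file
    "-e": "-evalue",     # expectation value threshold
    "-W": "-word_size",  # word size
}


def to_blastn(megablast_args):
    """Translate ['-i', 'q.fa', '-d', 'nt', ...] into a blastn argument list."""
    if len(megablast_args) % 2 != 0:
        raise ValueError("expected flag/value pairs")
    # Request the megablast task explicitly so the intent is visible.
    command = ["blastn", "-task", "megablast"]
    for flag, value in zip(megablast_args[::2], megablast_args[1::2]):
        if flag not in MEGABLAST_TO_BLASTN:
            raise KeyError(f"no mapping known for {flag}")
        command += [MEGABLAST_TO_BLASTN[flag], value]
    return command


if __name__ == "__main__":
    print(" ".join(to_blastn(["-i", "query.fa", "-d", "nt", "-e", "1e-5"])))
```

Unknown flags raise an error instead of being passed through, since a silently dropped or mistranslated option could change the alignment results.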

Obtaining high-quality figures from UCSC Genome Browser

The UCSC Genome Browser is widely used for genomic data visualization. Being a web-based application, it runs in a web browser such as Firefox or Chrome and shows a schematic representation of a genome at a low resolution that is sufficient for a screen. However, journals often require high-resolution figures in submitted manuscripts. The UCSC Genome Browser provides a way to export such figures.
