# Fast vector product in MATLAB

The MATLAB function cross implements the vector product for two kinds of input data: a pair of vectors and a pair of N-D arrays. Surprisingly, it works slower for successive computations of cross products for 3-D vector pairs than the following naive implementation.


function C = cross3d(A, B)

C = [A(2)*B(3) - A(3)*B(2), ...
A(3)*B(1) - A(1)*B(3), ...
A(1)*B(2) - A(2)*B(1)];

end



Further, we estimate the median running time of cross and cross3d in a reproducible way using MATLAB scripts.

# Chart of data format conversions with Kent source utilites

Kent source utilities are the part of the UCSC Genome Browser and implement a number of tasks related to bioinformatic data analysis. In particular, they can be used to convert genomic data files from one format to another.

# BED-based format for genotype counts

Being a comprehensive format for storing variant calling data, VCF is superfluous for some kinds of analysis. Although it is a plain-text tabular format, it may take significant time and memory to load a large VCF file into an R or Python session. For that reason, it is usually effective to preprocess a VCF file using a stand-alone tool in order to extract the information required for further analysis.

An example of such a tool is the vcftools package that implements a number of routines for processing VCF files. In particular, vcftools can be used to obtain allele frequencies from a VCF file:

CHROM POS   N_ALLELES N_CHR {ALLELE:COUNT}
7     45524 2         414   C:414 T:0
7     45569 3         414   G:413 A:1 T:0


Note that the number of columns differs because the second variant has two alternative alleles while the first variant has only one. It may cause problems during further processing of the data, e.g., the R function read.table will fail to read such a file. Also it would be more informative to store genotype counts instead of allele counts. Here we describe the normalized table format for genotype counts and the tool to obtain it from a VCF file.

# Vectorized atomic masses in MATLAB

It is often convenient to obtain atomic masses for a given array of element names in a vectorized way. For example, one may use it to calculate the weighed RMSD (wRMSD) between two conformations of a protein specified by their atom coordinates $\{\, x_i \,\}_{i=1}^n$ and $\{\, y_i \,\}_{i=1}^n$:

$\mathrm{wRMSD} = \sqrt{\frac{1}{n} \sum_{i=1}^n w_i (x_i - y_i)^2}$,

where $x_i, y_i \in \mathbb{R}^3$ and $w_i$ denotes the atomic mass of the ith atom.