Converting an AGP file to the BED format

The AGP format describes assembly structure in the NCBI Genome database. Because AGP is a plain-text tabular format that specifies the positions of smaller sequence objects on larger ones (e.g., contigs on scaffolds), AGP files can be converted to the BED format for further processing.
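As a rough illustration, the conversion could be sketched in Python as follows. This is a minimal sketch, not a definitive implementation: the function name `agp_to_bed` is hypothetical, gap rows (component types `N` and `U`) are simply skipped, and any orientation other than `+` or `-` is written as `.` in the BED strand column, which is an assumption of this sketch. AGP coordinates are 1-based and inclusive, whereas BED coordinates are 0-based and half-open, so only the start coordinate is shifted.

```python
def agp_to_bed(agp_lines):
    """Yield BED6 lines for the non-gap components listed in AGP lines."""
    for line in agp_lines:
        line = line.rstrip("\n")
        # Skip blank lines and AGP header/comment lines.
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        obj, obj_beg, obj_end, _part, comp_type = fields[:5]
        # Gap rows (N: known size, U: unknown size) carry no component ID.
        if comp_type in ("N", "U"):
            continue
        comp_id, orientation = fields[5], fields[8]
        # Assumption of this sketch: unknown orientations become ".".
        strand = orientation if orientation in ("+", "-") else "."
        # AGP is 1-based inclusive; BED is 0-based half-open.
        start = int(obj_beg) - 1
        yield "\t".join([obj, str(start), obj_end, comp_id, "0", strand])
```

Each yielded line places a component (e.g., a contig) on its parent object (e.g., a scaffold), so the result can be passed to tools that accept BED input.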


Obtaining scaffold positions on assembled chromosomes from NCBI Genome

NCBI Genome stores genomic assemblies of numerous species. Besides the assembly sequences, it also provides related auxiliary information, including AGP files that describe how large sequence objects (e.g., chromosomes) were assembled from smaller ones (e.g., scaffolds or contigs).

For some assemblies, the chromosome-from-scaffold AGP file may be missing even though the chromosomes were assembled from scaffolds. In that case, the AGP file of scaffolds on chromosomes can be reconstructed from the chromosome-from-components and scaffold-from-components AGP files.

This post describes how to perform such a reconstruction and presents a Python script that implements it.
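The following is a minimal sketch of one way such a reconstruction might work; it is not the script referred to above. It assumes that every component appears exactly once and in full in both AGP files, and that each scaffold occupies one contiguous region of a single chromosome. The function names `read_components` and `scaffolds_on_chromosomes` are hypothetical. Under these assumptions, a scaffold's span on a chromosome runs from the smallest start to the largest end of its components, and the scaffold is reversed whenever a component's orientation on the chromosome differs from its orientation on the scaffold.

```python
def read_components(agp_lines):
    """Map component ID -> (object, beg, end, orientation) for non-gap rows."""
    placements = {}
    for line in agp_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        f = line.split("\t")
        if f[4] in ("N", "U"):  # gap rows have no component ID
            continue
        placements[f[5]] = (f[0], int(f[1]), int(f[2]), f[8])
    return placements


def scaffolds_on_chromosomes(chr_agp_lines, scaf_agp_lines):
    """Return sorted (chrom, beg, end, scaffold, strand) tuples, 1-based inclusive."""
    on_chr = read_components(chr_agp_lines)
    on_scaf = read_components(scaf_agp_lines)
    spans = {}  # (chrom, scaffold) -> [beg, end, strand]
    for comp_id, (chrom, beg, end, chr_strand) in on_chr.items():
        if comp_id not in on_scaf:
            continue  # component not assigned to any scaffold
        scaffold, _, _, scaf_strand = on_scaf[comp_id]
        # Same orientation in both files means the scaffold lies forward.
        strand = "+" if chr_strand == scaf_strand else "-"
        key = (chrom, scaffold)
        if key in spans:
            spans[key][0] = min(spans[key][0], beg)
            spans[key][1] = max(spans[key][1], end)
        else:
            spans[key] = [beg, end, strand]
    return sorted(
        (chrom, beg, end, scaf, strand)
        for (chrom, scaf), (beg, end, strand) in spans.items()
    )
```

The returned tuples hold the information needed for the object, coordinate, component, and orientation columns of a chromosome-from-scaffold AGP row; gap rows between scaffolds would still have to be derived from the uncovered intervals.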
