Obtaining scaffold positions on assembled chromosomes from NCBI Genome

NCBI Genome stores genomic assemblies of numerous species. Besides assembly sequences, it also contains the related auxiliary information, including AGP files that describe how large sequence objects (e.g., chromosomes) were assembled from smaller ones (e.g., scaffolds or contigs).

For some assemblies, their chromosome-from-scaffold AGP files may be missing although the chromosomes were assembled from the scaffolds. In that case, one may reconstruct the AGP file of scaffolds on chromosomes using chromosome-from-components and scaffold-from-components AGP files.

Further we describe how to perform such a reconstruction and present a Python script implementing it.

Continue reading