The AGP format describes assembly structure in the NCBI Genome database. Because AGP is a plain-text tabular format that specifies the positions of smaller sequence objects on larger ones (e.g., contigs on scaffolds), AGP files can be converted to the BED format for further processing.
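As a rough illustration of such a conversion, here is a minimal sketch in Python. It assumes the standard nine-column, tab-separated AGP layout with 1-based inclusive coordinates and emits BED6 rows with 0-based half-open coordinates. The function name `agp_to_bed` is hypothetical, and skipping gap lines and mapping unknown orientations to `.` are choices made for this example, not part of either format's definition.

```python
def agp_to_bed(agp_lines):
    """Yield BED6 tuples for the non-gap lines of an AGP file.

    Hypothetical helper for illustration; assumes tab-separated AGP
    lines with nine columns and 1-based inclusive coordinates.
    """
    for line in agp_lines:
        line = line.rstrip("\n")
        # Skip blank lines and header/comment lines.
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        obj, obj_beg, obj_end, _part, comp_type = fields[:5]
        # Component types N and U denote gaps; they carry no component ID.
        if comp_type in ("N", "U"):
            continue
        comp_id, orientation = fields[5], fields[8]
        # BED strand is "+", "-" or "."; other AGP orientations map to ".".
        strand = orientation if orientation in ("+", "-") else "."
        # AGP is 1-based inclusive, BED is 0-based half-open:
        # subtract 1 from the start and keep the end unchanged.
        yield (obj, int(obj_beg) - 1, int(obj_end), comp_id, 0, strand)


if __name__ == "__main__":
    example = [
        "##agp-version 2.0",
        "chr1\t1\t100\t1\tW\tctg1\t1\t100\t+",
        "chr1\t101\t150\t2\tN\t50\tscaffold\tyes\tpaired-ends",
        "chr1\t151\t300\t3\tW\tctg2\t1\t150\t-",
    ]
    for row in agp_to_bed(example):
        print("\t".join(map(str, row)))
```

Each resulting BED row names the larger object in the first column and the placed component in the fourth, so standard interval tools can work with the component placements directly.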
NCBI Genome stores genomic assemblies of numerous species. Besides the assembly sequences, it also provides related auxiliary information, including AGP files that describe how large sequence objects (e.g., chromosomes) were assembled from smaller ones (e.g., scaffolds or contigs).
For some assemblies, the chromosome-from-scaffold AGP file may be missing even though the chromosomes were assembled from scaffolds. In that case, the AGP file of scaffolds on chromosomes can be reconstructed from the chromosome-from-components and scaffold-from-components AGP files.
Below, we describe how to perform such a reconstruction and present a Python script that implements it.