Bio Dasturchi

Bio Dasturchi GitHub Page.

1.7 NCBI Genomes & NCBI Assembly

Prerequisite Terminology:

To better understand the main topic, you should familiarize yourself with the following terms:


Genome is a sub-database of NCBI that lets us retrieve the entire genome of an organism or species, in the form of sequences, genomic maps, genome assemblies, or annotations. It is a comprehensive database holding a large number of genome assemblies that can be browsed or downloaded. It is also an open platform for research submissions: you can submit your findings on a particular genome, and they will become available to other researchers as well.

Assembly is also a sub-database of NCBI, and it contains information about assembled genomes. The genomes in this database have been sequenced and assembled, and each record comes with metadata and statistical reports, including the assembly name and other information of this sort.


Note: The BLAST available in this database works only with human and microbial nucleotide genomes, and it is quite different from the BLAST available in the public domain.

➢ Retrieving a genome of interest from the Genome database:

➢ FTP format analysis:

➢ Retrieval of an assembled genome from NCBI:

Note: You should always use the RefSeq database to retrieve the genomic information.
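To make the FTP layout and the RefSeq/GenBank distinction more concrete, here is a minimal Python sketch. It assumes the commonly seen directory layout on the NCBI genomes FTP site, where an assembly accession such as GCF_000001405.40 is split into three-digit chunks, and it assumes that the `GCF_` prefix marks a RefSeq assembly while `GCA_` marks a GenBank one. The function names `assembly_dir_url` and `source_database` are hypothetical helpers written for this illustration, and the assembly name must be passed exactly as it appears in the directory name. Please check the resulting URL in a browser before relying on it.

```python
import re

# Root of the "all assemblies" tree on the NCBI FTP site (served over HTTPS).
FTP_ROOT = "https://ftp.ncbi.nlm.nih.gov/genomes/all"


def source_database(accession: str) -> str:
    """Return 'RefSeq' for GCF_ accessions and 'GenBank' for GCA_ accessions."""
    if accession.startswith("GCF_"):
        return "RefSeq"
    if accession.startswith("GCA_"):
        return "GenBank"
    raise ValueError(f"Not an assembly accession: {accession!r}")


def assembly_dir_url(accession: str, assembly_name: str) -> str:
    """Build the directory URL for one assembly (hypothetical helper).

    Assumed layout: <root>/<GCA|GCF>/<3 digits>/<3 digits>/<3 digits>/<accession>_<name>
    """
    match = re.fullmatch(r"(GC[AF])_(\d{9})\.(\d+)", accession)
    if match is None:
        raise ValueError(f"Unexpected accession format: {accession!r}")
    prefix, digits, _version = match.groups()
    # Split the nine digits into three groups of three: 000001405 -> 000/001/405
    chunks = [digits[i:i + 3] for i in range(0, 9, 3)]
    return "/".join([FTP_ROOT, prefix, *chunks, f"{accession}_{assembly_name}"])


if __name__ == "__main__":
    acc = "GCF_000001405.40"  # human reference assembly, used only as an example
    print(source_database(acc))
    print(assembly_dir_url(acc, "GRCh38.p14"))
```

The sketch only builds a URL string and does not download anything, so it is safe to run while exploring the directory structure.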

Note: If you have downloaded the GFF files or other genome annotation files for a particular genome, you should not open them directly on Windows: these files can be very large, and loading them fully into RAM can make your PC slow or unresponsive. If you are working on Linux or macOS, follow the procedure specific to that operating system to open the files.
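One way to inspect a large annotation file without loading it all into memory is to read it line by line. The sketch below is an illustration under a few assumptions: the file follows the GFF3 layout of nine tab-separated columns (seqid, source, type, start, end, score, strand, phase, attributes), comment lines start with `#`, and the file may be gzip-compressed with a `.gz` extension. The names `Feature`, `iter_gff_features`, and `count_feature_types` are hypothetical and not part of any NCBI tool.

```python
import gzip
from collections import Counter
from typing import Iterator, NamedTuple


class Feature(NamedTuple):
    seqid: str
    source: str
    type: str
    start: int
    end: int
    strand: str
    attributes: str


def iter_gff_features(path: str) -> Iterator[Feature]:
    """Yield one Feature per annotation line, keeping only one line in memory."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            # An optional ##FASTA directive marks the start of embedded sequences.
            if line.startswith("##FASTA"):
                break
            # Skip blank lines, comments, and other directives.
            if not line.strip() or line.startswith("#"):
                continue
            cols = line.rstrip("\n").split("\t")
            if len(cols) != 9:
                continue  # ignore malformed lines rather than failing
            yield Feature(cols[0], cols[1], cols[2],
                          int(cols[3]), int(cols[4]), cols[6], cols[8])


def count_feature_types(path: str) -> Counter:
    """Count how many features of each type (gene, mRNA, exon, ...) the file has."""
    return Counter(feature.type for feature in iter_gff_features(path))
```

For example, `count_feature_types("genomic.gff.gz")` (a placeholder file name) would return a counter of feature types while reading the file as a stream, which should work the same way on Windows, Linux, and macOS.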


In this NCBI video tutorial, we learned about two NCBI sub-databases for retrieving and analyzing the genomes of an organism. We saw how to retrieve and analyze an entire genome using the “Genomes” database, and how the FTP format is used to download genomic information from the RefSeq or GenBank databases. From the “Assembly” database, we learned the procedure for retrieving and downloading the fully sequenced genomes provided by NCBI and other integrated databases.
