Tool created to assemble genomes in record time

The new tool is freely accessible to researchers around the world

Agostinho Antunes, a professor in the Department of Biology at FCUP and a researcher at the Interdisciplinary Center for Marine and Environmental Research at the University of Porto (CIIMAR-UP), is part of the international team that has just unveiled a new computational tool capable of assembling thousands of genomes quickly and simply.

The new tool, freely accessible to researchers all over the world, will help meet the ambition of the international Earth BioGenome Project to produce reference genomes for around 1.8 million species over the next decade, with a major impact on conservation.

This development comes at a time marked by environmental change driven by global warming and by the ongoing sixth mass extinction. Together they have put biodiversity in crisis, with devastating consequences for the functioning and health of ecosystems and for the evolutionary heritage and adaptive potential of species, and they pose a great threat to humanity.

It is at this moment in history that genomics becomes highly relevant to biodiversity assessments and conservation efforts. This calls for a global effort, carried out against the clock, to understand the genomes of all the biodiversity known so far.


Earth BioGenome: a decade to produce all genomes

According to Agostinho Antunes, “the extinction of species is an irreparable loss of thousands or millions of years of evolution, and understanding the genomes of living beings brings together fundamental information”. In this context, knowing and studying genomes at a global level is a task that “is at the center of the science and cutting-edge biotechnologies that drive growth, productivity, commercialization and global competitiveness”.

One of the objectives of the Earth BioGenome Project is to genetically identify the species at greatest risk of extinction in order to preserve the genetic information of life. To this end, it launched the global goal of producing reference genomes for the approximately 1.8 million species known on Earth over the next decade. To fulfill this mission, however, the current rate of production of reference genomes will have to increase by at least two orders of magnitude.
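The scale of that goal can be checked with simple arithmetic. The short Python sketch below uses only the figures quoted above and assumes, purely for illustration, that production is spread evenly over the ten years. The variable names are hypothetical, and the "implied current rate" is only what the two-orders-of-magnitude statement suggests, not a measured figure.

```python
# Back-of-the-envelope check of the figures quoted in the article.
# Assumption (not from the article): genomes are produced at a constant
# rate across the whole decade.
KNOWN_SPECIES = 1_800_000   # approximate number of known species
TARGET_YEARS = 10           # "over the next decade"
SPEEDUP_FACTOR = 100        # "at least two orders of magnitude"

# Average number of reference genomes needed per year to hit the target.
required_per_year = KNOWN_SPECIES // TARGET_YEARS

# If a 100-fold increase is needed to reach that rate, the current rate
# can be at most this many genomes per year.
implied_current_ceiling = required_per_year // SPEEDUP_FACTOR

print(f"Required: {required_per_year:,} reference genomes per year")
print(f"Implied current rate: at most about {implied_current_ceiling:,} per year")
```

Under these assumptions, the goal works out to roughly 180,000 reference genomes per year, against an implied current rate of no more than about 1,800 per year.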

In the view of Agostinho Antunes, although thousands of researchers are contributing to this objective, “this technological advance will not be achieved without the automation of the genome assembly process and a procedure that is widely accessible to any research group”. This implies efforts not only to optimize genome assembly and develop good practices, but also to disseminate existing tools and provide research teams with infrastructure and training.


The origin of the new genome production pipeline

The optimization of the assembly process and the development of best practices were achieved by combining the experience of two projects: the Vertebrate Genomes Project (VGP) and the European Reference Genome Atlas (ERGA).

Building on this work, the international team, which includes the leader of the Evolutionary Genomics and Bioinformatics team at CIIMAR, developed an automated computational workflow, or “pipeline”, capable of generating almost complete genome assemblies simply and quickly. This robust tool is now described in the article “Scalable, accessible and reproducible reference genome assembly and evaluation in Galaxy”, published in the prestigious journal Nature Biotechnology.

This pipeline was developed within the Galaxy ecosystem, a global scientific workflow platform aimed at making computational biology accessible to researchers who do not have experience in computer programming or systems administration.

“Galaxy allows users to execute complex workflows on thousands of data sets through a graphical interface and computer applications accessible to all types of users”, explains Agostinho Antunes.

The new pipeline is no exception within Galaxy: it is designed to be useful across the full range of user experience levels and analysis scenarios. Alongside this powerful tool, tutorials were created to train users and make the platform easier and faster to use.


A new ally in species conservation

In more detail, the new pipeline combines a series of bioinformatics methods arranged in an ordered, automated sequence, which produces practically complete genome assemblies and leaves it to the researcher to finalize the process. It also includes extensive quality control and decontamination functions to ensure reliable results.

For validation, the researchers used 51 available vertebrate datasets. They first tested the workflow by assembling a reference genome of the zebra finch (Taeniopygia guttata), a species for which a wide variety of genomic data types are available, which made it possible to confirm the competence of the tool.

Although the genomes studied all belong to vertebrates, the principles of the new pipeline can be applied to the genomes of other animals, plants or fungi by modifying just a few parameters. The task of describing the genomes of the almost 1.8 million eukaryotic species known on Earth in a short space of time thus becomes more achievable, increasing the impact of these genomes on conservation actions.


The importance of knowing genomes

Genomics is highly relevant in biodiversity assessments and conservation efforts, as genetic diversity is fundamental to all levels of biological organization (individuals, populations, species, communities and ecosystems).

But it does not stop there: “genomics has the promise of improving our quality of life”, says Agostinho Antunes, highlighting examples such as the monitoring of genetic diseases and precision medicine based on the genetic information of each individual, among others.

At the same time, genetic information opens up great opportunities for economic development and environmental improvement, for the benefit of society at large.

“Every living being has a genome that contains the secret code of life. From the food we eat, to the medicines and cures we seek, to the environmental sustainability of the natural resources we depend on, genomics is currently revolutionizing life sciences”, adds Agostinho Antunes.


The future of the new pipeline

The future of this powerful tool remains in the hands of the research team, which plans to keep improving its efficiency. Planned updates address scalability, the automation of data curation, the incorporation of different types of data, and complementary procedures that include sequencing.

These improvements aim to make the pipeline capable of generating virtually complete and highly accurate reference genomes in record time.