Beyond The Reference Genome: How pangenomics is changing the game
/Jack Royle
Reference genomes have long been the backbone of genomic research. These 'one-size-fits-all' representations are typically constructed from sequencing projects that use just a single representative individual. While there is no doubt that they have been invaluable, they fall short in capturing the full genetic diversity within a species - who’s to say that the one individual chosen for sequencing carries every piece of genetic information found in that species?
Thanks to the increased throughput of the new PacBio Revio®, we can now sequence the genomes of multiple individuals of a species quickly and at a fraction of the previous cost. Unlike short-read sequencing, the HiFi reads from the Revio provide significantly longer stretches of DNA sequence at Q30+ accuracy, making it easier to resolve complex genomic regions, including repetitive elements and large structural variations. These advancements mean we can move beyond the era of a single reference genome, and into the ‘pangenomic era’.
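For readers unfamiliar with the "Q" notation, quality scores of this kind are Phred-scaled: Q = -10 × log10(p), where p is the probability that a base call is wrong. The short sketch below simply applies that standard formula; the function names are illustrative and not part of any PacBio tool.

```python
import math


def phred_to_error_probability(q: float) -> float:
    """Convert a Phred quality score to the probability of a wrong base call."""
    return 10 ** (-q / 10)


def error_probability_to_phred(p: float) -> float:
    """Convert a base-call error probability to a Phred quality score."""
    return -10 * math.log10(p)


# Q30 corresponds to roughly one expected error per 1,000 bases (99.9% accuracy).
q30_error = phred_to_error_probability(30)
print(f"Q30 error probability: {q30_error:.4f}")
print(f"Q30 accuracy: {(1 - q30_error) * 100:.1f}%")
```

On this scale every extra 10 points means a tenfold drop in error rate, so Q20 is about one error in 100 bases and Q40 about one in 10,000.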
The first step in building a reference pangenome is quite simple: by sequencing multiple individuals of a species, we can assemble both the ‘core’ genome (shared by all individuals) and the accessory, or satellite, genome that accounts for the unique variations found in individuals. With HiFi reads, we can also include epigenetic information, as DNA methylation is natively read during the sequencing run. With enough coverage, it is even possible to generate haplotype-phased pangenomes! These genomes are then mapped into a ‘pangenome graph’, providing co-ordinates for each variation. This approach is also iterative, allowing information to be added in the future, whether that be full-length isoforms for gene information, or even more genomes as new variants of a species are discovered.
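The core/accessory split can be illustrated with a toy example. The sketch below assumes each individual has already been reduced to a set of gene labels, which is a deliberate simplification: real pangenome construction works on assembled sequences and graphs, not pre-labelled genes. The function name and sample data are hypothetical.

```python
from typing import Dict, Set, Tuple


def split_core_accessory(gene_sets: Dict[str, Set[str]]) -> Tuple[Set[str], Set[str]]:
    """Split a toy pangenome into core and accessory gene sets.

    core      = genes present in every individual
    accessory = genes present in at least one, but not all, individuals
    """
    if not gene_sets:
        return set(), set()
    per_individual = list(gene_sets.values())
    core = set.intersection(*per_individual)
    pan = set.union(*per_individual)  # everything seen in any individual
    return core, pan - core


# Hypothetical presence data for three sequenced individuals.
samples = {
    "individual_A": {"gene1", "gene2", "gene3"},
    "individual_B": {"gene1", "gene2", "gene4"},
    "individual_C": {"gene1", "gene2", "gene3", "gene5"},
}

core, accessory = split_core_accessory(samples)
print("core:", sorted(core))            # ['gene1', 'gene2']
print("accessory:", sorted(accessory))  # ['gene3', 'gene4', 'gene5']
```

Adding a fourth individual only requires adding another entry to `samples`, which mirrors the iterative nature of the approach: the core can shrink and the accessory set can grow as more genomes are included.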
Using a pangenome as a reference is already having notable effects. The Chinese Pangenome Consortium has identified variations showing considerable differentiation among ethnic groups, including novel deletions that cause anemia in Southern Chinese and Southeast Asian populations; these were missed previously because earlier reference datasets lacked diverse genomic information.1 The impact is also seen in agriculture, where previously missed structural variants are now being identified as disease resistance markers and linked to phenotypic outcomes.2
The pangenome approach ultimately provides a more comprehensive view of the genetic landscape, enabling us to uncover secrets hidden in the genomes of organisms, provide more personalised medical treatments, and better understand the complex relationships between genetics and traits. The game is changing, and the possibilities are endless as we journey into the era of pangenomics.