Is the human genome fully understood and annotated?

The DNA sequence is finally complete, but what does it all mean?

Think of the human genome as a massive library. For about two decades, we only had access to about 92% of the books on the shelves. In 2022, the Telomere-to-Telomere (T2T) Consortium finally delivered the complete set, filling in the missing 8% and adding nearly 200 million base pairs of new sequence [4]. This was a monumental technical achievement, resolving notoriously tricky regions like the centromeres (the 'waist' of chromosomes) and the short arms of the five acrocentric chromosomes [4][6]. However, having all the books on the shelf is not the same as having read and understood every one of them.

The process of 'annotation' (figuring out which stretches of DNA are actual genes, what they do, and how they are regulated) is far from finished. A major community experiment from 2006 (EGASP) found that even the best computer programs at the time correctly predicted at least one transcript for only about 70% of annotated genes, and their accuracy in capturing the multiple transcripts that a single gene can produce through alternative splicing was only about 40-50% [7]. Tools have improved since then, but the result illustrates why automated annotation remains imperfect and still depends on extensive manual curation and experimental validation.
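
To make the difference between those two EGASP-style numbers concrete, here is a minimal Python sketch of how a gene-level score ("at least one transcript right") and a transcript-level score ("every splice variant counted") can diverge. The gene names, coordinates, and function names are invented for illustration; this is not the evaluation code used by EGASP.

```python
# Toy illustration only: all genes and coordinates below are invented.
# A transcript is modeled as a tuple of (exon_start, exon_end) pairs.

def gene_level_sensitivity(annotated, predicted):
    """Fraction of annotated genes with at least one exactly matching transcript."""
    hits = sum(
        1
        for gene, transcripts in annotated.items()
        if transcripts & predicted.get(gene, set())
    )
    return hits / len(annotated)


def transcript_level_sensitivity(annotated, predicted):
    """Fraction of all annotated transcripts that were predicted exactly."""
    total = sum(len(transcripts) for transcripts in annotated.values())
    found = sum(
        len(transcripts & predicted.get(gene, set()))
        for gene, transcripts in annotated.items()
    )
    return found / total


annotated = {
    "geneA": {((100, 200), (300, 400)), ((100, 200), (500, 600))},
    "geneB": {((1000, 1100), (1200, 1300))},
    "geneC": {((50, 80),), ((50, 80), (90, 120))},
}
predicted = {
    "geneA": {((100, 200), (300, 400))},      # one of two splice forms found
    "geneB": {((1000, 1100), (1200, 1300))},  # the only transcript found
    "geneC": {((50, 85),)},                   # exon boundary off by 5 bp: no match
}

gene_score = gene_level_sensitivity(annotated, predicted)
transcript_score = transcript_level_sensitivity(annotated, predicted)
print(f"gene level: {gene_score:.2f}, transcript level: {transcript_score:.2f}")
```

In this toy case two of three genes count as found, yet only two of five transcripts are correct, which is the same kind of gap the EGASP figures describe.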

The newly completed regions are the most mysterious and complex

The very regions that were hardest to sequence are also the most difficult to understand. These include vast stretches of repetitive DNA, like the centromeres and segmental duplications (large, nearly identical copies of DNA blocks). The T2T assembly revealed that segmental duplications make up 7.0% of the genome (218 Mbp), not the 5.4% previously estimated [2]. These regions are hotbeds of structural variation: large-scale rearrangements, such as inversions and deletions, that differ between individuals [5][8].
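
As a rough sanity check on the scale of these regions, the short Python sketch below converts base-pair counts quoted in the sources listed at the end of this answer into percentages of the 3.055-billion-base-pair assembly. The variable and function names are illustrative, and the single shared denominator is an assumption: the individual papers may have used slightly different genome sizes, so the rounded values need not match the published percentages exactly.

```python
# Base-pair figures as quoted in the source summaries for this answer.
GENOME_BP = 3_055_000_000  # complete T2T assembly [4]

REGIONS_BP = {
    "newly added sequence": 200_000_000,    # "nearly 200 million base pairs" [4]
    "segmental duplications": 218_000_000,  # 218 Mbp [2]
    "centromeric regions": 189_900_000,     # 189.9 megabases [6]
}


def percent_of_genome(region_bp, genome_bp=GENOME_BP):
    """Express a region size as a percentage of the whole assembly."""
    return 100.0 * region_bp / genome_bp


for name, bp in REGIONS_BP.items():
    print(f"{name}: {percent_of_genome(bp):.1f}% of the assembly")
```

These categories may overlap (newly added sequence can itself be centromeric or duplicated), so the percentages should not be added together.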

This complexity matters for medicine. For instance, nearly complete, haplotype-resolved assemblies of 65 diverse genomes fully resolved complex loci such as the SMN1/SMN2 gene region, which is critical for spinal muscular atrophy, and the AMY1/AMY2 region, which is linked to starch digestion and has been associated with obesity [1]. A 2023 comparison of the T2T-CHM13 and GRCh38 references identified a deletion in the KLRC gene cluster, associated with natural killer cell differentiation, in about 20% of humans [8]. These examples suggest that the 'dark matter' of the genome is not simply junk; it contains medically relevant genes that we are only beginning to explore.

A complete reference genome is a game-changer for genetic testing, but it's not a complete understanding

Having a complete, accurate reference genome (like T2T-CHM13) dramatically improves our ability to find genetic variants in individuals. A 2022 study showed that using the T2T reference instead of the older GRCh38 reference eliminated tens of thousands of spurious variants per sample and reduced false positives in 269 medically relevant genes by up to a factor of 12 [10]. This means fewer false alarms and more accurate diagnoses when sequencing a patient's genome.
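
To show what a phrase like "reduced by a factor of N" means in practice, here is a minimal Python sketch that counts false-positive calls against a truth set and computes the fold reduction between two call sets. Every variant is invented, the function names are hypothetical, and both call sets are assumed to share one coordinate system (in real comparisons across references, coordinates differ and must be reconciled first). It is not the method used in the cited study.

```python
# Toy illustration: all variants below are invented, and both call sets are
# assumed to be expressed in the same coordinate system.

def false_positives(calls, truth):
    """Variants that were called but are absent from the truth set."""
    return calls - truth


def fold_reduction(fp_before, fp_after):
    """How many times fewer false positives the second call set contains."""
    if fp_after == 0:
        raise ValueError("no false positives remain; fold reduction is undefined")
    return fp_before / fp_after


truth = {
    ("chr1", 1000, "A", "G"),
    ("chr1", 2500, "C", "T"),
    ("chr2", 400, "G", "A"),
}
# Calls made against a hypothetical older reference: 4 spurious variants.
calls_old_ref = truth | {
    ("chr1", 3000, "T", "C"),
    ("chr1", 3100, "G", "T"),
    ("chr2", 900, "A", "C"),
    ("chr2", 950, "C", "G"),
}
# Calls made against a hypothetical improved reference: 1 spurious variant.
calls_new_ref = truth | {("chr2", 900, "A", "C")}

fp_old = false_positives(calls_old_ref, truth)
fp_new = false_positives(calls_new_ref, truth)
reduction = fold_reduction(len(fp_old), len(fp_new))
print(f"false positives: {len(fp_old)} -> {len(fp_new)} ({reduction:.0f}-fold fewer)")
```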

However, even with a complete reference, we still cannot interpret most of the variants we find. A 2023 study of 76,156 human genomes created a 'constraint map' of the genome, showing which regions are so important that mutations are rarely tolerated [9]. The map helps pinpoint functional regions: constrained non-coding regions are enriched for known regulatory elements and for variants implicated in complex diseases [9]. For most of the non-coding genome, though, function (if any) remains unknown. Furthermore, a 2025 preprint benchmarking the complete diploid HG002 genome showed that even state-of-the-art methods still struggle, with de novo assemblies outperforming traditional variant calling by an order of magnitude, yet still making about one error per 100,000 base pairs in the most complex regions [3]. We have the complete map, but we are still learning to read it.
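
An error rate such as "one error per 100,000 base pairs" is often expressed on the Phred quality scale, Q = -10 * log10(error rate), a common convention in sequencing. The Python sketch below does that conversion and estimates the expected number of errors in a region of a given size. The function names and the 1 Mbp example region are assumptions made for illustration, and the sources above do not state that the benchmark reported this particular figure as a Q score.

```python
import math


def phred_quality(errors, bases):
    """Phred-scaled quality for a rate of `errors` per `bases` base pairs."""
    if errors <= 0 or bases <= 0:
        raise ValueError("errors and bases must both be positive")
    return -10.0 * math.log10(errors / bases)


def expected_errors(region_bp, errors, bases):
    """Expected error count in a region, assuming a uniform error rate."""
    return region_bp * errors / bases


q_complex = phred_quality(1, 100_000)
errors_in_region = expected_errors(1_000_000, 1, 100_000)  # hypothetical 1 Mbp region
print(f"about Q{q_complex:.0f}; roughly {errors_in_region:.0f} errors per 1 Mbp")
```

At this rate, a hypothetical 1 Mbp stretch of complex sequence would still be expected to contain around ten errors, which is why these regions remain hard to interpret with confidence.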

Sources used in this answer

[1] Complex genetic variation in nearly complete human genomes

Sequencing 65 diverse genomes and building 130 haplotype-resolved assemblies closed 92% of previous assembly gaps and fully resolved complex loci like MHC and centromeres, revealing up to 30-fold variation in centromere array length.

2025 · Glennis A Logsdon, Peter Ebert, Peter A Audano, Mark Loftus, David Porubsky, Jana Ebler, Feyza Yilmaz, Pille Hallast, Timofey Prodanov, DongAhn Yoo, Carolyn A Paisie, William T Harvey, Xuefang Zhao, Gianni V Martino, Mir Henglin, Katherine M Munson, Keon Rabbani, Chen-Shan Chin, Bida Gu, Hufsah Ashraf, Stephan Scholz, Olanrewaju Austine-Orimoloye, Parithi Balachandran, Marc Jan Bonder, Haoyu Cheng, Zechen Chong, Jonathan Crabtree, Mark Gerstein, Lisbeth A Guethlein, Patrick Hasenfeld, Glenn Hickey, Kendra Hoekzema, Sarah E Hunt, Matthew Jensen, Yunzhe Jiang, Sergey Koren, Youngjun Kwon, Chong Li, Heng Li, Jiaqi Li, Paul J Norman, Keisuke K Oshima, Benedict Paten, Adam M Phillippy, Nicholas R Pollock, Tobias Rausch, Mikko Rautiainen, Yuwei Song, Arda Söylev, Arvis Sulovari, Likhitha Surapaneni, Vasiliki Tsapalou, Weichen Zhou, Ying Zhou, Qihui Zhu, Michael C Zody, Ryan E Mills, Scott E Devine, Xinghua Shi, Michael E Talkowski, Mark J P Chaisson, Alexander T Dilthey, Miriam K Konkel, Jan O Korbel, Charles Lee, Christine R Beck, Evan E Eichler, Tobias Marschall · Nature

[2] Segmental duplications and their variation in a complete human genome

The complete T2T genome showed segmental duplications account for 7.0% of the genome (218 Mbp), up from the previous estimate of 5.4%, and that 91% of the newly resolved duplication sequence better represents human copy number variation.

2022 · Mitchell R Vollger, Xavi Guitart, Philip C Dishuck, Ludovica Mercuri, William T Harvey, Ariel Gershman, Mark Diekhans, Arvis Sulovari, Katherine M Munson, Alexandra P Lewis, Kendra Hoekzema, David Porubsky, Ruiyang Li, Sergey Nurk, Sergey Koren, Karen H Miga, Adam M Phillippy, Winston Timp, Mario Ventura, Evan E Eichler · Science (New York, N.Y.)

[3] A complete diploid human genome benchmark for personalized genomics

A telomere-to-telomere benchmark for the diploid HG002 genome achieved near-perfect accuracy across 99.4% of the genome, adding 15.3% of sequence absent from prior benchmarks and showing de novo assembly outperforms variant calling by an order of magnitude.

2025 · Nancy F Hansen, Nathan Dwarshuis, Hyun Joo Ji, Arang Rhie, Hailey Loucks, Glennis A Logsdon, Mitchell R Vollger, Jessica M Storer, Juhyun Kim, Eleni Adam, Nicolas Altemose, Dmitry Antipov, Mobin Asri, Sofia Barreira, Stephanie C Bohaczuk, Andrey V Bzikadze, Sara A Carioscia, Andrew Carroll, Kuan-Hao Chao, Yanan Chu, Arun Das, Peter Ebert, Adam English, Mark Fleharty, Laura E Fleming, Giulio Formenti, Andrea Guarracino, Gabrielle A Hartley, Katharine Jenike, Jenna Kalleberg, Yu Kang, Robert King, Josipa Lipovac, Mira Mastoras, Matthew W Mitchell, Shloka Negi, Nathan D Olson, Keisuke K Oshima, Luis F Paulin, Brandon D Pickett, David Porubsky, Jane Ranchalis, Desh Ranjan, Mikko Rautiainen, Harold Riethman, Robert D Schnabel, Fritz J Sedlazeck, Kishwar Shafin, Mile Sikic, Steven J Solar, Alexander P Sweeten, Winston Timp, Justin Wagner, DongAhn Yoo, Ying Zhou, Erik Garrison, Evan E Eichler, Michael C Schatz, Andrew B Stergachis, Rachel J O'Neill, Karen H Miga, Steven L Salzberg, Sergey Koren, Justin M Zook, Adam M Phillippy · bioRxiv : the preprint server for biology

[4] The complete sequence of a human genome

The T2T Consortium produced a complete 3.055 billion-base pair human genome sequence, adding nearly 200 million base pairs of sequence containing 1,956 gene predictions, 99 of which are predicted to be protein-coding.

2022 · Sergey Nurk, Sergey Koren, Arang Rhie, Mikko Rautiainen, Andrey V Bzikadze, Alla Mikheenko, Mitchell R Vollger, Nicolas Altemose, Lev Uralsky, Ariel Gershman, Sergey Aganezov, Savannah J Hoyt, Mark Diekhans, Glennis A Logsdon, Michael Alonge, Stylianos E Antonarakis, Matthew Borchers, Gerard G Bouffard, Shelise Y Brooks, Gina V Caldas, Nae-Chyun Chen, Haoyu Cheng, Chen-Shan Chin, William Chow, Leonardo G de Lima, Philip C Dishuck, Richard Durbin, Tatiana Dvorkina, Ian T Fiddes, Giulio Formenti, Robert S Fulton, Arkarachai Fungtammasan, Erik Garrison, Patrick G S Grady, Tina A Graves-Lindsay, Ira M Hall, Nancy F Hansen, Gabrielle A Hartley, Marina Haukness, Kerstin Howe, Michael W Hunkapiller, Chirag Jain, Miten Jain, Erich D Jarvis, Peter Kerpedjiev, Melanie Kirsche, Mikhail Kolmogorov, Jonas Korlach, Milinn Kremitzki, Heng Li, Valerie V Maduro, Tobias Marschall, Ann M McCartney, Jennifer McDaniel, Danny E Miller, James C Mullikin, Eugene W Myers, Nathan D Olson, Benedict Paten, Paul Peluso, Pavel A Pevzner, David Porubsky, Tamara Potapova, Evgeny I Rogaev, Jeffrey A Rosenfeld, Steven L Salzberg, Valerie A Schneider, Fritz J Sedlazeck, Kishwar Shafin, Colin J Shew, Alaina Shumate, Ying Sims, Arian F A Smit, Daniela C Soto, Ivan Sović, Jessica M Storer, Aaron Streets, Beth A Sullivan, Françoise Thibaud-Nissen, James Torrance, Justin Wagner, Brian P Walenz, Aaron Wenger, Jonathan M D Wood, Chunlin Xiao, Stephanie M Yan, Alice C Young, Samantha Zarate, Urvashi Surti, Rajiv C McCoy, Megan Y Dennis, Ivan A Alexandrov, Jennifer L Gerton, Rachel J O'Neill, Winston Timp, Justin M Zook, Michael C Schatz, Evan E Eichler, Karen H Miga, Adam M Phillippy · Science (New York, N.Y.)

[5] Inversion polymorphism in a complete human genome assembly

Remapping data from 41 genomes against the T2T reference found a ~21% increase in sensitivity for detecting inversions, identifying 26 misorientations in the older GRCh38 reference.

2023 · David Porubsky, William T. Harvey, Allison N. Rozanski, Jana Ebler, Wolfram Höps, Hufsah Ashraf, Patrick Hasenfeld, Benedict Paten, Ashley D. Sanders, Tobias Marschall, Jan O. Korbel, Evan E. Eichler · Genome biology

[6] Complete genomic and epigenetic maps of human centromeres

Complete maps of human centromeres revealed they constitute 6.2% of the genome (189.9 megabases) and uncovered multimegabase structural rearrangements and high degrees of structural, epigenetic, and sequence variation across individuals.

2022 · Nicolas Altemose, Glennis A Logsdon, Andrey V Bzikadze, Pragya Sidhwani, Sasha A Langley, Gina V Caldas, Savannah J Hoyt, Lev Uralsky, Fedor D Ryabov, Colin J Shew, Michael E G Sauria, Matthew Borchers, Ariel Gershman, Alla Mikheenko, Valery A Shepelev, Tatiana Dvorkina, Olga Kunyavskaya, Mitchell R Vollger, Arang Rhie, Ann M McCartney, Mobin Asri, Ryan Lorig-Roach, Kishwar Shafin, Julian K Lucas, Sergey Aganezov, Daniel Olson, Leonardo Gomes de Lima, Tamara Potapova, Gabrielle A Hartley, Marina Haukness, Peter Kerpedjiev, Fedor Gusev, Kristof Tigyi, Shelise Brooks, Alice Young, Sergey Nurk, Sergey Koren, Sofie R Salama, Benedict Paten, Evgeny I Rogaev, Aaron Streets, Gary H Karpen, Abby F Dernburg, Beth A Sullivan, Aaron F Straight, Travis J Wheeler, Jennifer L Gerton, Evan E Eichler, Adam M Phillippy, Winston Timp, Megan Y Dennis, Rachel J O'Neill, Justin M Zook, Michael C Schatz, Pavel A Pevzner, Mark Diekhans, Charles H Langley, Ivan A Alexandrov, Karen H Miga · Science (New York, N.Y.)

[7] EGASP: the human ENCODE Genome Annotation Assessment Project

The EGASP experiment found the best computational methods correctly predicted at least one transcript for ~70% of annotated genes, but multiple-transcript accuracy (accounting for alternative splicing) reached only ~40-50%.

2006 · Roderic Guigó, Paul Flicek, Josep F Abril, Alexandre Reymond, Julien Lagarde, France Denoeud, Stylianos Antonarakis, Michael Ashburner, Vladimir B Bajic, Ewan Birney, Robert Castelo, Eduardo Eyras, Catherine Ucla, Thomas R Gingeras, Jennifer Harrow, Tim Hubbard, Suzanna E Lewis, Martin G Reese · Genome biology

[8] Characterization of large-scale genomic differences in the first complete human genome

Analysis of large-scale differences between T2T-CHM13 and GRCh38 found 67 additional discrepant regions (~21.6 Mbp) and identified a deletion in the KLRC gene cluster associated with natural killer cell differentiation in ~20% of humans.

2023 · Xiangyu Yang, Xuankai Wang, Yawen Zou, Shilong Zhang, Manying Xia, Lianting Fu, Mitchell R Vollger, Nae-Chyun Chen, Dylan J Taylor, William T Harvey, Glennis A Logsdon, Dan Meng, Junfeng Shi, Rajiv C McCoy, Michael C Schatz, Weidong Li, Evan E Eichler, Qing Lu, Yafei Mao · Genome biology

[9] A genomic mutational constraint map using variation in 76,156 human genomes

Aggregating 76,156 human genomes from gnomAD built a genome-wide constraint map, showing that constrained non-coding regions are enriched for known regulatory elements and variants implicated in complex diseases.

2023 · Siwei Chen, Laurent C. Francioli, Julia K. Goodrich, Ryan L. Collins, Masahiro Kanai, Qingbo Wang, Jessica Alföldi, Nicholas A. Watts, Christopher Vittal, Laura D. Gauthier, Timothy Poterba, Michael W. Wilson, Yekaterina Tarasova, William Phu, Riley Grant, Mary T. Yohannes, Zan Koenig, Yossi Farjoun, Eric Banks, Stacey Donnelly, Stacey Gabriel, Namrata Gupta, Steven Ferriera, Charlotte Tolonen, Sam Novod, Louis Bergelson, David Roazen, Valentin Ruano-Rubio, Miguel Covarrubias, Christopher Llanwarne, Nikelle Petrillo, Gordon Wade, Thibault Jeandet, Ruchi Munshi, Kathleen Tibbetts, Maria Abreu, Carlos A. Aguilar Salinas, Tariq Ahmad, Christine M. Albert, Diego Ardissino, Irina M. Armean, Elizabeth G. Atkinson, Gil Atzmon, John Barnard, Samantha M. Baxter, Laurent Beaugerie, Emelia J. Benjamin, David Benjamin, Michael Boehnke, Lori L. Bonnycastle, Erwin P. Bottinger, Donald W. Bowden, Matthew J. Bown, Harrison Brand, Steven Brant, Ted Brookings, Sam Bryant, Sarah E. Calvo, Hannia Campos, John C. Chambers, Juliana C. Chan, Katherine R. Chao, Sinéad Chapman, Daniel I. Chasman, Rex Chisholm, Judy Cho, Rajiv Chowdhury, Mina K. Chung, Wendy K. Chung, Kristian Cibulskis, Bruce Cohen, Kristen M. Connolly, Adolfo Correa, Beryl B. Cummings, Dana Dabelea, John Danesh, Dawood Darbar, Phil Darnowsky, Joshua Denny, Ravindranath Duggirala, Josée Dupuis, Patrick T. Ellinor, Roberto Elosua, James Emery, Eleina England, Jeanette Erdmann, Tõnu Esko, Emily Evangelista, Diane Fatkin, Jose Florez, Andre Franke, Jack Fu, Martti Färkkilä, Kiran Garimella, Jeff Gentry, Gad Getz, David C. Glahn, Benjamin Glaser, Stephen J. Glatt, David Goldstein, Clicerio Gonzalez, Leif Groop, Sanna Gudmundsson, Andrea Haessly, Christopher Haiman, Ira Hall, Craig L. Hanis, Matthew Harms, Mikko Hiltunen, Matti M. Holi, Christina M. Hultman, Chaim Jalas, Mikko Kallela, Diane Kaplan, Jaakko Kaprio, Sekar Kathiresan, Eimear E. 
Kenny, Bong-Jo Kim, Young Jin Kim, Daniel King, George Kirov, Jaspal Kooner, Seppo Koskinen, Harlan M. Krumholz, Subra Kugathasan, Soo Heon Kwak, Markku Laakso, Nicole Lake, Trevyn Langsford, Kristen M. Laricchia, Terho Lehtimäki, Monkol Lek, Emily Lipscomb, Ruth J. F. Loos, Wenhan Lu, Steven A. Lubitz, Teresa Tusie Luna, Ronald C. W. Ma, Gregory M. Marcus, Jaume Marrugat, Kari M. Mattila, Steven McCarroll, Mark I. McCarthy, Jacob L. McCauley, Dermot McGovern, Ruth McPherson, James B. Meigs, Olle Melander, Andres Metspalu, Deborah Meyers, Eric V. Minikel, Braxton D. Mitchell, Vamsi K. Mootha, Aliya Naheed, Saman Nazarian, Peter M. Nilsson, Michael C. O’Donovan, Yukinori Okada, Dost Ongur, Lorena Orozco, Michael J. Owen, Colin Palmer, Nicholette D. Palmer, Aarno Palotie, Kyong Soo Park, Carlos Pato, Ann E. Pulver, Dan Rader, Nazneen Rahman, Alex Reiner, Anne M. Remes, Dan Rhodes, Stephen Rich, John D. Rioux, Samuli Ripatti, Dan M. Roden, Jerome I. Rotter, Nareh Sahakian, Danish Saleheen, Veikko Salomaa, Andrea Saltzman, Nilesh J. Samani, Kaitlin E. Samocha, Alba Sanchis-Juan, Jeremiah Scharf, Molly Schleicher, Heribert Schunkert, Sebastian Schönherr, Eleanor G. Seaby, Svati H. Shah, Megan Shand, Ted Sharpe, Moore B. Shoemaker, Tai Shyong, Edwin K. Silverman, Moriel Singer-Berk, Pamela Sklar, Jonathan T. Smith, J. Gustav Smith, Hilkka Soininen, Harry Sokol, Rachel G. Son, Jose Soto, Tim Spector, Christine Stevens, Nathan O. Stitziel, Patrick F. Sullivan, Jaana Suvisaari, E. Shyong Tai, Kent D. Taylor, Yik Ying Teo, Ming Tsuang, Tiinamaija Tuomi, Dan Turner, Erkki Vartiainen, Marquis Vawter, Lily Wang, Arcturus Wang, James S. Ware, Hugh Watkins, Rinse K. Weersma, Ben Weisburd, Maija Wessman, Nicola Whiffin, James G. Wilson, Ramnik J. Xavier, Anne O’Donnell-Luria, Matthew Solomonson, Cotton Seed, Alicia R. Martin, Michael E. Talkowski, Heidi L. Rehm, Mark J. Daly, Grace Tiao, Benjamin M. Neale, Daniel G. MacArthur, Konrad J. Karczewski · Nature

[10] A complete reference genome improves analysis of human genetic variation

Using the T2T-CHM13 reference universally improved read mapping and variant calling for thousands of globally diverse samples, eliminating tens of thousands of spurious variants per sample and reducing false positives in 269 medically relevant genes by up to a factor of 12.

2022 · Sergey Aganezov, Stephanie M Yan, Daniela C Soto, Melanie Kirsche, Samantha Zarate, Pavel Avdeyev, Dylan J Taylor, Kishwar Shafin, Alaina Shumate, Chunlin Xiao, Justin Wagner, Jennifer McDaniel, Nathan D Olson, Michael E G Sauria, Mitchell R Vollger, Arang Rhie, Melissa Meredith, Skylar Martin, Joyce Lee, Sergey Koren, Jeffrey A Rosenfeld, Benedict Paten, Ryan Layer, Chen-Shan Chin, Fritz J Sedlazeck, Nancy F Hansen, Danny E Miller, Adam M Phillippy, Karen H Miga, Rajiv C McCoy, Megan Y Dennis, Justin M Zook, Michael C Schatz · Science (New York, N.Y.)
