Last year, scientists unveiled the most complete gapless sequence of the human genome ever produced – but it was missing one small piece: the Y chromosome.
Now, the smallest member of the human chromosome family has been fully sequenced, completing a puzzle that's taken three decades to solve.
The result is an all-encompassing human reference genome, one that could reveal secrets about male fertility.
"Now that we have this 100 percent complete sequence of the Y chromosome, we can identify and explore numerous genetic variations that could be impacting human traits and disease in a way that we weren't able to do before," says Dylan Taylor, a geneticist at Johns Hopkins University and one of the study authors.
The Y chromosome contains lots of repetitive sequences – including a few long palindromes – that have made it largely 'unreadable' until now.
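To make the palindrome idea concrete: in DNA, a palindrome is a stretch that reads the same on one strand as it does on the opposite strand read backwards. In other words, it equals its own reverse complement. The short Python sketch below illustrates that definition only. The function names and the toy sequences are made up for illustration and are not taken from the consortium's software, and the Y chromosome's real palindromes are vastly longer.

```python
# Illustrative sketch only: hypothetical helper names, toy sequences.
# Maps each base to its pairing partner on the opposite strand.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the opposite strand, read in the conventional direction."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def is_dna_palindrome(seq: str) -> bool:
    """True if the sequence equals its own reverse complement."""
    return seq.upper() == reverse_complement(seq)


print(is_dna_palindrome("GAATTC"))   # True
print(is_dna_palindrome("GATTACA"))  # False
```

Repeats like this help explain the difficulty: a short fragment read from one arm of a palindrome can look just like a fragment from the other arm, so it is hard to tell where it belongs.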
Led by genomicist Arang Rhie from the US National Human Genome Research Institute, the aptly named Telomere-to-Telomere consortium used advanced sequencing techniques and newly developed bioinformatic algorithms to stitch long stretches of DNA together, finally mapping the Y chromosome in full.
"We knew we had an incomplete picture up until now," says John Hopkins University computational biologist Rajiv McCoy in what could be considered a slight understatement, given the previous draft of the Y chromosome was missing more than half of its bases.
Those gaps, many of which spanned genes relating to sperm production, led other studies to make all sorts of incorrect assumptions. Some previously unknown human Y sequences were, for example, mistaken for traces of bacterial DNA contaminating samples.
"But we can now see the entire genome from end to end for the first time," McCoy says.
The team filled in more than 30 million 'letters' in the DNA sequence to assemble the Y chromosome in its entirety: all 62,460,029 base pairs. They also corrected multiple errors in previously sequenced sections and discovered 41 new protein-coding genes.
"The biggest surprise was how organized the repeats are," says Adam Phillippy, a computer scientist at the US National Human Genome Research Institute.
"Nearly half of the chromosome is made of alternating blocks of two specific repeating sequences known as satellite DNA. It makes a beautiful, quilt-like pattern."
In a second study led by geneticist Pille Hallast from the Jackson Laboratory, researchers went one step further, using the reference sequence to assemble human Y chromosomes from 43 male individuals, about half of whom represented African lineages.
Together, the assemblies spanned 183,000 years of human evolution and revealed some surprising variations in the Y chromosome.
For one, the Y chromosomes were vastly different sizes, ranging from 45.2 million to 84.9 million base pairs in length.
There were also striking structural differences: the precise sequences of genes were conserved (so they still encoded the right proteins) but sometimes larger sections of DNA were flipped, oriented in the opposite direction along the Y chromosome.
"When you find variation that you haven't seen before, the hope is always that those genomic variants will be important for understanding human health," says Phillippy.
Recently, genes on the Y chromosome have been implicated in aggressive forms of common cancers in men, while Y chromosome loss has been found to drive the growth of bladder cancers. But we don't know what else we've overlooked.
A new era of personalized medicine beckons if sequencing technologies keep advancing, allowing whole genomes – not just select sections – to be sequenced cheaply.
But genome sequencing could exacerbate healthcare disparities if historical injustices and the lack of diversity in research studies aren't resolved.
"Ultimately, as the complete, accurate and gapless assembly of diploid human genomes becomes routine, we expect that 'reference genomes' will become known simply as 'genomes'," the researchers conclude.