Researchers have uncovered a mysterious gene in the genetic code of the coronavirus SARS-CoV-2 – a segment virtually hidden from view in the virus's genome, and largely overlooked until now.

The newly identified gene – called ORF3d – is an example of what's called an overlapping gene: a kind of 'gene within a gene' that's effectively concealed in a string of nucleotides, because of the way it overlaps the coded sequences of other genes.

"In terms of genome size, SARS-CoV-2 and its relatives are among the longest RNA viruses that exist," explains bioinformatician Chase Nelson from the American Museum of Natural History.

"They are thus perhaps more prone to 'genomic trickery' than other RNA viruses."

Viruses are actually quite prone to hosting overlapping genes, so it's not exactly a shocking discovery. Whether ORF3d truly represents genomic trickery remains to be seen, but in any case, it's certainly tricky to see.

Overlapping genes are difficult to identify in genetic sequences, because genome-scanning software can miss them when running through strings of genetic code: such programs are built to pick up individual genes, and do not necessarily detect a second set of instructions encoded in the nucleotides those genes share with their neighbours.
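To make that blind spot concrete, here is a minimal Python sketch of how one stretch of nucleotides can hold two genes at once, depending on where reading begins. It is a toy illustration, not the method used in the study: the sequence is invented, the function names `find_orfs` and `overlapping_pairs` are hypothetical, and it reads only the forward strand using the standard start codon (ATG) and stop codons (TAA, TAG, TGA).

```python
# Toy illustration only: the sequence below is made up, not real viral data.
START = "ATG"
STOPS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=3):
    """Return (frame, start, end) for each ATG...stop stretch on the forward strand.

    start is the index of the A in ATG; end is one past the stop codon.
    """
    orfs = []
    for frame in range(3):  # three possible reading frames
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == START:
                start = i
            elif start is not None and codon in STOPS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((frame, start, i + 3))
                start = None
    return orfs


def overlapping_pairs(orfs):
    """Return pairs of ORFs that sit in different frames but share nucleotides."""
    pairs = []
    for a in range(len(orfs)):
        for b in range(a + 1, len(orfs)):
            frame_a, start_a, end_a = orfs[a]
            frame_b, start_b, end_b = orfs[b]
            if frame_a != frame_b and start_a < end_b and start_b < end_a:
                pairs.append((orfs[a], orfs[b]))
    return pairs


toy = "ATGCATGGCTTAAGCTGA"
print(find_orfs(toy))
# [(0, 0, 18), (1, 4, 13)]
print(overlapping_pairs(find_orfs(toy)))
# [((0, 0, 18), (1, 4, 13))]
```

In this toy sequence, a scanner that reads only frame 0 reports a single gene spanning all 18 letters. Shift the starting point by one letter and a second, shorter gene appears inside it, built from the very same nucleotides.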

In the context of viruses like SARS-CoV-2, that could make for a serious blind spot. Scientists have been racing to understand as much as possible about this devastating virus since early 2020, and while some aspects of its genetic make-up have been elucidated (including the firm consensus that it was not 'made in a lab'), much remains that we just don't know yet.

"Missing overlapping genes puts us in peril of overlooking important aspects of viral biology," Nelson says.

"Overlapping genes may be one of an arsenal of ways in which coronaviruses have evolved to replicate efficiently, thwart host immunity, or get themselves transmitted."

As for ORF3d, there's much yet to learn about why it's there, lurking in the genome and straddling other genes.

Scanning through genomic databases, the researchers found the gene has been identified before, but only in one variant of coronavirus that affects pangolins (found in Guangxi, China).

It has also previously been misclassified as ORF3b, a gene present in other coronaviruses, including SARS-CoV, but the two are not actually the same thing.

"The two genes are unrelated and encode entirely different proteins," Nelson says. "This means that knowledge about SARS-CoV ORF3b should not be applied to SARS-CoV-2 ORF3d."

One thing we do know about the mysterious gene, based on previous blood work with human COVID-19 patients, is that ORF3d does elicit a strong antibody response.

As for whether T-cells would also be triggered – or what other viral purposes the overlapping ORF3d might have – we're still in the dark. It might be relatively benign. It might not be.

"We don't yet know its function or if there's clinical significance," Nelson says.

"But we predict this gene is relatively unlikely to be detected by a T-cell response, in contrast to the antibody response. And maybe that has something to do with how the gene was able to arise."

One thing's for sure. In a virus that only has about 15 known genes, the discovery of another one – let alone an overlapping gene – is a significant development. Just how significant, scientists will now try to find out.

The findings are reported in eLife.