The most populous country in the world, India, has long been left out of genetic studies, leaving a massive gap in our understanding of human origins and appreciation of the genetic diversity of our species.

A new study that reaches 50,000 years into the past by analyzing thousands of genomes seeks to change that, revealing several surprising threads in the rich tapestry of modern India's ancestry.

University of California, Berkeley population geneticist Elise Kerdoncuff and colleagues sequenced the DNA of 2,762 people from India to capture the extraordinary diversity of the country. The sample included individuals from most geographic regions, from both rural and urban areas, speakers of all major languages, and members of tribal and caste groups.

They wanted to answer key questions such as: When did modern humans first arrive in India from Africa? Were they part of the major mainland migration out of Africa, or did they migrate earlier, perhaps along a coastal route? And what traces remain of now-extinct archaic humans, the Neanderthals and Denisovans, in Indian populations?

"These analyses provide a detailed view of the population history of India and underscore the value of expanding genomic surveys to diverse groups outside Europe," Kerdoncuff and colleagues write in their preprint, which has been posted to bioRxiv ahead of peer review.

Previous research has shown most Indians derive ancestry from three ancestral groups: descendants of ancient Iranian farmers, who arrived on the Indian subcontinent sometime between 4700 and 3000 BCE; herders from the Eurasian steppe region who moved into India between 1900 and 1500 BCE; and indigenous South Asian hunter-gatherers who had been there much longer.

Kerdoncuff and colleagues found that the share of ancestry from those three groups varied widely among modern Indian individuals, yet clear patterns emerged.

The proportion of people's ancestry associated with Andamanese hunter-gatherers, for example, was highest in the south and lowest in the north of India, and higher in certain language and caste groups.

"This highlights that the ancient admixture events are related to the spread of languages and the history of the traditional caste system in India," Kerdoncuff and colleagues write.

Modeling how ancient DNA previously extracted from Iranian-related groups could have given rise to the genomic sequences of present-day Indians sampled in the study, the researchers also found that the most likely scenario was an influx of farmers from Sarazm, an ancient agricultural hub in what is today Tajikistan.

Archaeological studies have previously pointed to trade connections between Sarazm and South Asia, but it wasn't a one-way link: Kerdoncuff and colleagues found one individual among the ancient Iranian ancestors who carried traces of Indian genetics too.

"Societies were far more connected in deep time than most have given them credit for," Michael Frachetti, an archaeologist at Washington University in St. Louis who was not involved in the work, told journalist Michael Price at Science.

Going further back in time, Kerdoncuff and colleagues found that modern Indians derive 1 to 2 percent of their ancestry from archaic hominins, namely Neanderthals and Denisovans, which is similar to the proportion seen in Europeans and Americans.

But in a surprising discovery, the researchers found that the analyzed genomes contain a far wider diversity of Neanderthal and Denisovan genes than other sampled populations.

"Strikingly, [about] 90.7 percent of worldwide Neanderthal sequences are seen in India" and "around 51 percent of Denisovan sequences are unique to India," Kerdoncuff and colleagues write.

Lastly, on the central question of where Indians originated, the researchers found that most of the genetic variation in Indians stems from a single major migration out of Africa that occurred around 50,000 years ago, with earlier migration waves contributing little genetic material to modern Indian populations.

The study, which has not yet been peer reviewed, is available on the preprint server bioRxiv.