10 Genetic landscape of India

Dr. Siuli Mitra

epgp books

 

CONTENTS:

 

1. Learning outcomes: At the end of the module the reader will know
2. A brief on ethnohistory of populations of the Indian subcontinent
3. Lessons from Mitochondrial diversity
4. Lessons from Y chromosomal diversity
5. Lessons from Autosomal DNA diversity
6. Genetic outliers among populations of India
7. The Indian Genome Variation Consortium: A brief

 

1.  Learning outcomes: At the end of the module the reader will know

  • Why the Indian subcontinent is considered a natural laboratory for geneticists
  • The genetic markers most commonly used to study genetic diversity in India
  • The populations that have been studied most and the reasons they are genetically interesting
  • The outcomes of genetic diversity studies on populations in India

 

The Indian subcontinent is home to immense biological, cultural, demographic, ethnic and linguistic diversity, nurtured by varied geographical features. This diversity has stratified the populations of the country, chiefly into tribal and non-tribal groups, linguistic groups and religious groups. Recent attempts to study the resulting differentiation among populations have turned to population genetic approaches to make the investigations more accurate. These studies have drawn a genetic landscape of India that, although incomplete, gives a clear picture of the genetic structure of its populations. Some of the many aspects studied are covered in the following sections. First, however, the ethnic and linguistic elements that form the backdrop of most genetic diversity studies on Indian populations are summarized.

 

2.    The Indian subcontinent: A natural genetic laboratory

 

The genetic landscape of India is the result of the peopling of the subcontinent by several waves of migration, of numerous indigenous groups, and of their interactions. The geographic position of the Indian subcontinent makes it a likely corridor for the early dispersal of modern humans out of Africa, which began about 100,000 years ago. The exact date of the arrival and settlement of modern humans in India, however, is still disputed. The earliest traces of modern humans in Eurasia date to around 30,000-50,000 BP (Before Present), and Middle and Upper Palaeolithic tools of about the same age have also been found in India.

 

The social stratification created by the caste system, by the dialects spoken and by the religions followed has produced divergent, endogamous groups. A large majority of the population follows Hinduism, which divides society into four castes, each with further sub-divisions. Tribal groups make up about 8% of the population. On the basis of language, Indian populations are grouped into Indo-European, Austro-Asiatic, Dravidian and Tibeto-Burman speakers. Tribal populations are found in all four groups, whereas non-tribal groups speak only Indo-European or Dravidian languages. Most populations of India speak languages of the Indo-European family, which also includes languages spoken by western Eurasians. According to one widely held view, the caste system was introduced following conquests by Indo-European speaking invaders from Central Asia. The resulting social structures brought in rules governing mate exchange within and between groups, and these marriage rules produced endogamous populations, which in turn became culturally and genetically differentiated. A related belief is that Dravidian-speaking populations were spread throughout the subcontinent before the arrival of Indo-European speakers, who then drove them towards the south; to this day, northern India has more speakers of Indo-European languages. Tibeto-Burman speakers, by contrast, are largely confined to the north-eastern part of the country. Speakers of the Austro-Asiatic family belong to two separate divisions: speakers of languages of the Mundari sub-division live in the eastern and central parts of the country, while speakers of the Mon-Khmer sub-division primarily inhabit the north-eastern parts of India.

 

Population stratification is also shaped by religious groups such as Muslims, Christians, Buddhists and Jains, and by migrant populations such as the Siddis (African migrants), Parsis and Iranis.

 

Several theories have been proposed on the origins of social hierarchies among Indian populations. The first documented evidence of the caste system is found in the Rig Veda (1700-1100 BC), which describes four groups, Brahman, Kshatriya, Vaishya and Sudra, each assigned a specific occupation in the functioning of society. Each group has sub-divisions. The caste system has been hypothesized to result from an Indo-Aryan invasion in which the migrants replaced the indigenous Dravidian groups and pushed them towards the southern part of the country (Chaubey, 2010); the indigenous groups were later recruited into the caste groups. Tribal groups are believed to be the original inhabitants of India and are therefore important for understanding both the earliest settlement of the subcontinent and the cultural transitions that Indian populations have undergone. Their geographic and cultural isolation relative to non-tribal groups makes them one of the connecting links with the past.

 

All these factors, languages and ethnic affiliations, sometimes demarcated by geographical barriers, have made India a natural laboratory for geneticists. Having identified these divisions, early genetic studies set out to measure the extent of genetic differentiation by studying the genetic structure of Indian populations, and in the process the tools evolved from classical to molecular markers. Inferences drawn from the empirical evidence have produced a genetic landscape of India that is becoming clearer with time, although a few conflicting scenarios have arisen, some questions remain unanswered and many populations are still unexplored. The following sections consider a few lessons from the molecular markers now in wide use.

 

3.   Lessons from Mitochondrial markers:

 

Mitochondrial DNA and human evolutionary genetics:

 

The following properties of mitochondrial DNA (mtDNA) make it a suitable marker for studying human evolution and population history:

 

a) Multiple copies per cell
b) Maternal inheritance
c) Lack of recombination
d) High mutation rate

 

Studying mtDNA variation:

 

Earlier studies were based on variation in mitochondrial restriction fragment length polymorphisms (RFLPs). With the advent of sequencing methods, PCR products of the first hypervariable region (HVR I) of the mtDNA control region were sequenced, and with the development of high-throughput DNA sequencing, whole mitochondrial genomes have been sequenced in population studies. The data obtained are analyzed using lineage-based or population-based approaches. Lineage-based approaches study population history by examining the diversity of haplogroups; a haplogroup is defined by a shared set of mutations, and haplogroup frequencies tend to differ among regions. Population-based approaches apply population genetic methods to the data to answer questions on human prehistory, population origins and migrations. Among the most important findings of mitochondrial studies on human population history are those on the peopling of the New World, the settlement of the Pacific, the colonization of New Guinea and Australia and the settlement of Europe. The main drawback of mtDNA is that it is a single locus reflecting only maternal history, so the observed variation could be due to chance effects or to selection acting on it. Nevertheless, mtDNA markers have provided genetic insights into population histories, often supporting ethnohistorical accounts.
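To make the lineage-based approach concrete, the short Python sketch below tallies haplogroup frequencies in a sample and computes Nei's unbiased haplogroup (gene) diversity, h = n/(n-1) * (1 - sum of p_i squared), where p_i is the frequency of haplogroup i among n individuals. The sample is hypothetical and assumes each individual has already been assigned to a haplogroup; the function names are illustrative only.

```python
from collections import Counter

def haplogroup_frequencies(samples):
    """Relative frequency of each haplogroup label in a sample."""
    counts = Counter(samples)
    n = len(samples)
    return {hg: c / n for hg, c in counts.items()}

def haplogroup_diversity(samples):
    """Nei's unbiased diversity: h = n/(n-1) * (1 - sum(p_i ** 2))."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two individuals")
    freqs = haplogroup_frequencies(samples)
    return n / (n - 1) * (1 - sum(p * p for p in freqs.values()))

# Hypothetical sample of eight individuals already assigned to haplogroups
sample = ["M", "M", "M", "R", "R", "U", "N", "M"]
print(haplogroup_frequencies(sample))
print(round(haplogroup_diversity(sample), 3))  # 0.75
```

A value near 1 means almost every individual carries a different haplogroup, while a value near 0 means one haplogroup dominates the sample.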

 

mtDNA landscape of India:

 

Mitochondrial markers lie in the maternally inherited mtDNA and are therefore used to infer the dynamics of maternal lineages. Initial studies used mitochondrial RFLPs, and lineages were subsequently resolved more finely by sequencing hypervariable regions I and II of the mtDNA. From the mitochondrial perspective, India's role in the genetic history of modern humans can be understood by examining the dispersal of the two major haplogroups M and N among different populations of India. The mtDNA diversity in India is second only to that observed in Africa. mtDNA haplotypes are distributed uniformly among different groups in India (Basu et al., 2003), and the limited extent of variation within mtDNA lineages indicates a small founding population of females. Macrohaplogroups M and R, together with haplogroup U (of Central Asian origin), are observed in the majority of individuals across groups in India. Haplogroup U has two deep-rooted lineages, of which U2i occurs at high frequency among tribal groups and may be indigenous to India. Numerous sub-haplogroups of M, N and R occur at low frequencies within Indian populations. The presence of these three haplogroups supports the southern route theory of dispersal out of Africa. The entry of the U2 sub-haplogroups has been dated to after the origin of M, N and R, which is consistent with the migration of modern humans through South Asia.

 

4.    Lessons from Y chromosomal markers:

 

Y chromosomal DNA and evolutionary genetics:

 

The Y chromosome is carried only by males and is therefore important in evolutionary studies as the carrier of male-specific markers. The chromosome is gene-poor, carrying only 27 protein-coding genes; its other genetic elements include short interspersed nuclear elements (SINEs), endogenous retroviruses and segmental duplications. The non-recombining portion makes up 95% of the chromosome, and markers located in it are therefore called NRY (non-recombining Y) markers. NRY markers provide clues to the origin, divergence and movement of paternal lineages. All existing Y chromosomes have evolved from a single paternal ancestor.

 

Compared with other evolutionarily important markers, the Y chromosome has some unusual features:

 

a. A high mutation rate
b. Higher sequence divergence between species
c. Lower sequence diversity within species

 

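The low within-species diversity is attributed to the small effective population size of the Y chromosome: each male carries a single copy and only males transmit it, so under an equal sex ratio a population contains about one quarter as many Y chromosomes as copies of any autosome. Genetic drift therefore acts more strongly on the Y chromosome, which also accounts for its high between-population diversity in humans. This makes it an important marker for studying geographical variation between human populations.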
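The effect of a smaller number of gene copies on drift can be sketched with the standard idealised Wright-Fisher expectation H_t = H_0 * (1 - 1/M)^t, where M is the number of copies of a locus in the population and t the number of generations. In the Python sketch below the population size, starting diversity and number of generations are hypothetical, and mutation, selection and migration are ignored, so it illustrates the drift argument only and does not model real Indian data.

```python
def expected_diversity(h0, gene_copies, generations):
    """Expected gene diversity after pure drift in an idealised Wright-Fisher
    population: H_t = H_0 * (1 - 1/M) ** t, where M is the number of copies of
    the locus in the population. Mutation, selection and migration are ignored."""
    return h0 * (1 - 1 / gene_copies) ** generations

N = 1000   # hypothetical number of breeding individuals, equal sex ratio
T = 100    # hypothetical number of generations
H0 = 0.8   # hypothetical starting diversity

autosomal = expected_diversity(H0, 2 * N, T)  # two copies per individual
y_linked = expected_diversity(H0, N // 2, T)  # one copy per male, N/2 males

print(f"autosomal: {autosomal:.3f}, Y-linked: {y_linked:.3f}")
```

Because the Y-linked locus has a quarter as many copies, it loses four times as large a fraction of its diversity per generation as the autosomal locus in this model.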

 

Studying Y chromosomal variation:

 

The main objective of Y chromosome research has been to identify the Y chromosome haplogroups present in different populations and to compare their diversity. The classes of variation studied are base substitutions, duplications or deletions, insertions, and tandem repeats such as minisatellites and microsatellites. The findings of Y chromosome diversity studies have implications for evolutionary genetics, genealogical investigations, forensic work and medical research. The migratory history and admixture of a population can be deduced by studying the distribution of its haplogroups.

 

Y chromosomal landscape of India:

 

Most societies in India are patrilocal, so women moving to their husbands' homes on marriage is the norm. This is reflected in Y-chromosomal evidence, which shows a lack of male-mediated gene flow between groups (Bamshad et al., 1998; Basu et al., 2003). Comparisons of Y chromosome markers between castes and tribes show the distinctness of these two groups (Ramana et al., 2001). The presence of some deep-rooted Y lineages in lower castes and tribes was interpreted as evidence of a possible tribal origin of the lower castes (Thanseem et al., 2006). Y-chromosomal evidence also showed that groups inhabiting northern India have close genetic affinities with populations of West Asia and Central Asia (Mukherjee et al., 2001), whereas caste groups in South India were found to be more similar to East Europeans than to Asians (Bamshad et al., 2001). The Y haplogroups H, L, R2 and F* (the asterisk indicates a haplogroup still to be resolved into sub-haplogroups) occur at high frequencies in both castes and tribes, indicating an underlying genomic unity. A study of Y SNPs and STRs in diverse groups across South Asia found that the genetic influx from Central Asia into the pre-existing gene pool was low, ruling out substantial recent gene flow from Central Asia. Y chromosome studies on Austro-Asiatic speaking groups, believed to be the earliest settlers of the subcontinent, revealed paternal affinities among these groups and with populations of South East Asia (Kumar et al., 2007). Studies on Muslim populations showed them to be genetically closer to nearby non-Muslim groups than to other Muslim groups, implying that the spread of Islam was mainly a cultural transition (Basu et al., 2003; Terreros et al., 2007).

 

5.   Lessons from Autosomal DNA markers:

 

Autosomal DNA and evolutionary genetics:

 

Autosomal DNA markers have a more complex, biparental mode of inheritance than mtDNA and NRY markers. Most genes of the human genome are located on the autosomes, and because the autosomes contain most protein-coding genes, autosomal regions are subject to functional constraints. Autosomal genetic elements examined in evolutionary genetic studies include retroelements, RFLPs, SNPs and STRs. Autosomal markers have wide applicability in studies of human population structure and history. Many autosomal genes also play important roles in metabolism, so identifying variants in these genes helps to establish the prevalence of disease conditions in different populations. For instance, the Δ32 mutation in the CCR5 gene increases resistance to HIV infection, so populations with a higher frequency of this mutation are less susceptible to infection by HIV.

 

Autosomal DNA studies in India:

 

Some authors have used autosomal STRs to examine genetic affinities among Indian groups (Watkins et al., 2008). Because the autosomes carry genes with important roles in metabolism, studying variation in these genes is also important for drawing the health landscape of India. Genetic variation in autosomal genes has been studied with the following objectives:

 

a. To identify clusters of population sub-groups that share a similar pattern of variation
b. To examine the extent of genetic differentiation between clusters at different loci
c. To assess the effect of ethnic, linguistic and geographic demarcations at individual loci
d. To estimate haplotype diversity and linkage disequilibrium in these genes across populations

 

Population stratification has left genomic imprints, sometimes uniformly across the entire genome and sometimes localized to particular genes or genomic regions. Assessing these variations has helped in understanding the pronounced effect of genetic heterogeneity when studying complex diseases among Indian populations. Close examination revealed large allele frequency differences between populations at these loci, reflecting strong founder effects preserved by strict endogamy. Several population-specific diseases have been reported for which the causal gene has also been identified, such as Madras motor neuron disease, Handigodu disease and pseudocholinesterase deficiency (Tamang and Thangaraj, 2012). Characteristic genic patterns, such as population-specific haplotypes, have also been identified.
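Allele frequency differences between populations at a locus are commonly summarised with F_ST. The Python sketch below uses Nei's G_ST formulation, F_ST = (H_T - H_S) / H_T, for a single biallelic locus, where H_S is the mean expected heterozygosity within populations and H_T the expected heterozygosity of the pooled population. It assumes equally weighted populations and applies no sample-size correction; the allele frequencies are invented for illustration and are not taken from any Indian dataset.

```python
def fst_biallelic(freqs):
    """F_ST (Nei's G_ST form) for one biallelic locus.

    freqs: frequency of one allele in each population (equal weights).
    """
    k = len(freqs)
    h_s = sum(2 * p * (1 - p) for p in freqs) / k  # mean within-population heterozygosity
    p_bar = sum(freqs) / k                         # allele frequency in the pooled population
    h_t = 2 * p_bar * (1 - p_bar)                  # expected heterozygosity if pooled
    if h_t == 0:
        return 0.0  # allele fixed or absent everywhere: no differentiation
    return (h_t - h_s) / h_t

# Hypothetical loci: similar frequencies vs. strongly diverged frequencies
similar = fst_biallelic([0.30, 0.35, 0.32])
diverged = fst_biallelic([0.05, 0.60])
print(round(similar, 4), round(diverged, 4))
```

A locus where founder effects and endogamy have pushed allele frequencies apart, like the second one here, yields a much larger F_ST than a locus with similar frequencies across populations.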

 

Example 1. Certain mutations originated in India, for example the 25 base pair deletion in the MYBPC3 gene.

 

Example 2. A haplotype defined by restriction sites in the beta-globin gene cluster is linked with the sickle cell gene and is called the Arab-Indian haplotype because it is also found in and around Saudi Arabia. This haplotype is found in more than 90% of sickle cell anaemia patients among tribal groups across different regions of India.

 

6.    Genetically interesting populations in India:

 

a.   Populations of Andaman and Nicobar Islands

 

Two groups of tribes, which are linguistic sub-divisions, inhabit these islands: the Great Andaman and the Little Andaman groups. Genetic studies on these groups have provided clues to the dispersal and evolution of modern humans after they emerged out of Africa. Ancient DNA, mtDNA and Y-chromosomal investigations showed affinities of these groups with Asian populations. Two novel and unique mtDNA lineages, M31 and M32, with an estimated age of 65,000 years, were discovered. Another study suggested the genetic uniqueness and long-term isolation of these groups from the populations of South Asia (Reich et al., 2009).

 

b.   Populations of north-eastern states

 

The Tibeto-Burman speaking tribal groups inhabiting the north-eastern states of India are among the last tribal groups to have immigrated into the subcontinent. Genetic evidence shows that these populations entered through the north-eastern corridor and settled there. The absence of the YAP (Y Alu polymorphism) insertion element in Tibeto-Burman groups, in contrast to other groups on the mainland, supports their genetic separation. Phylogenetic analysis of the affinities among the tribal Tibeto-Burman, Austro-Asiatic and Dravidian groups showed that the Austro-Asiatic and Dravidian groups were more closely related to each other than either was to the Tibeto-Burman group. The Tibeto-Burman groups nevertheless share a considerable part of their genetic constitution with Austro-Asiatic groups and differ from them only in Y chromosome variants. Genome-wide scans showed their genetic proximity to Chinese populations.

 

c.    Siddis

 

Siddis (also known as Habshis) are a migrant population from East Africa who were brought to India as slaves by the Portuguese between the seventeenth and mid-nineteenth centuries. They live mainly in Gujarat, coastal Karnataka and Andhra Pradesh. Researchers in India have attempted to reconstruct their population history using different sets of markers. The Siddis were found to have inherited a mixed genetic heritage from Africans, Indians and Portuguese. Y-chromosomal and mtDNA markers traced their ancestry to Bantu-speaking populations of sub-Saharan Africa. Admixture studies estimate the extent and timing of genetic admixture in populations known to have undergone genetic exchange. Admixture analysis of the Siddis showed admixture with neighboring South Asian groups nearly 200 years ago, which matches the period when they were brought to the subcontinent.
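The idea of an admixture proportion can be illustrated with a deliberately simple two-source model in which the allele frequency of the admixed group at each locus is p_h = m * p1 + (1 - m) * p2, with m the share of ancestry from source 1. The Python sketch below estimates m by least squares across loci. It is not the method used in the Siddi studies, it does not estimate the timing of admixture (which needs other information, for example the decay of admixture linkage disequilibrium), and all allele frequencies and names are hypothetical.

```python
def admixture_proportion(p_admixed, p_source1, p_source2):
    """Least-squares estimate of m in p_admixed = m*p1 + (1 - m)*p2 over all loci.

    Minimising sum((ph - p2 - m*(p1 - p2))**2) over m gives
    m = sum((ph - p2)*(p1 - p2)) / sum((p1 - p2)**2).
    """
    num = sum((ph - p2) * (p1 - p2)
              for ph, p1, p2 in zip(p_admixed, p_source1, p_source2))
    den = sum((p1 - p2) ** 2 for p1, p2 in zip(p_source1, p_source2))
    if den == 0:
        raise ValueError("source populations have identical allele frequencies")
    return num / den

# Hypothetical allele frequencies at four loci in two source populations
african_like = [0.90, 0.20, 0.60, 0.10]
south_asian_like = [0.30, 0.70, 0.10, 0.50]
# Simulated admixed group with 70% ancestry from the first source (no sampling noise)
admixed = [0.7 * a + 0.3 * b for a, b in zip(african_like, south_asian_like)]

m_hat = admixture_proportion(admixed, african_like, south_asian_like)
print(round(m_hat, 3))
```

With noise-free simulated frequencies the estimator recovers the mixing proportion used to build the data; with real samples, drift and sampling error would scatter the loci around the fitted line.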

 

7.   The Indian Genome Variation Consortium: A brief

 

To uncover the genetic underpinnings of the ethnic and linguistic diversity of India's geographically dispersed populations, six laboratories of the Council of Scientific and Industrial Research (CSIR), with financial support from the Government of India, launched an initiative and formed the Indian Genome Variation Consortium (IGVC). The IGVC (Indian Genome Variation Consortium, 2005) was a joint initiative to address several questions about the distribution of genetic variation in Indian populations and its association with clinical phenotypes and drug response.

 

The important questions the IGVC set out to investigate pertain to the following:

 

a) Distribution of clinically important SNPs among populations
b) Correlation of the distribution of these SNPs with ethnic, linguistic and geographic affiliations
c) Identification of ancestry informative markers
d) Relationship with HapMap project populations
e) Differences in disease susceptibility, drug responsiveness and predisposition to infectious disease among populations

 

A total of 55 populations were recruited to represent the ethnic, linguistic and geographic diversity of India. The subjects were screened for 405 SNPs spread across 75 genes, and a 5.2 Mb segment of chromosome 22 covering 49 genes was sequenced. Genes implicated in monogenic and complex diseases were included in the study. The chromosome 22 genes studied have been implicated in susceptibility to schizophrenia and bipolar disorder; the other genes have been implicated in cancer, aging, cardiovascular diseases, neurological disorders, susceptibility to infection and drug response.

 

The results on the genetic structuring of Indian populations obtained from the project are as follows:

 

a) Population differentiation was low at most loci and high at a few. Differentiation was greater among tribal populations belonging to different linguistic families. Populations were more differentiated by ethnic affiliation than by geographic or linguistic factors, which was attributed to the isolation of tribal groups compared with non-tribal groups.
b) The populations of India group into five genetic clusters (Figure 1).
c) In a global context, the genetic affinities of Indian populations with other world populations were non-uniform and complex.
d) Shared haplotypes were found in genes implicated in complex diseases.

Figure 1. Five genetic clusters underlying Indian populations (Adapted from IGVC, 2008)

 

The project had several merits. The IGVC study was more inclusive of India's ethnolinguistic diversity and had wider genomic coverage than other studies, and its broad sampling strategy was a further strength. A final and very important outcome of the study was the grouping of Indian populations into five genetic clusters.

 

In a nutshell, the genetic landscape of India shows the following pattern:

 

a. A common and deep genetic heritage of the caste and tribal populations
b. Initial colonization of the subcontinent, dated to about 60,000 years ago, coinciding with the first wave of migration out of Africa
c. Higher genetic affinities of the upper caste populations with Central Asians
d. Genetic similarity of geographically proximate populations
e. Greater genetic differentiation among tribal groups than among caste groups
f. Austro-Asiatic speaking tribal groups as possibly the earliest inhabitants of India

 

Summary

  • The Indian subcontinent is a genetic reservoir created by immense cultural, demographic and linguistic variation, with populations stratified into tribal and non-tribal groups speaking languages of four linguistic families.
  • Its geographical location and biological diversity facilitated multiple waves of migration and subsequent gene flow into the subcontinent at different points in history.
  • The genetic landscape of India has thus been shaped by culturally, linguistically, biologically and ethnically diverse invaders together with pre-existing indigenous groups.
  • The mitochondrial diversity of India is defined by the dispersal of the two major haplogroups M and N, and their numerous sub-haplogroups, among different populations of India at varying frequencies.
  • Y-chromosomal evidence shows a lack of male-mediated gene flow between groups, consistent with marriage patterns restricted by social stratification.
  • Strict endogamy has left genomic imprints, including many reported population-specific diseases.
  • Populations such as the tribes of the Andaman and Nicobar Islands and the tribal groups of the north-east are genetically divergent from the rest of India's population groups.
  • The Indian Genome Variation Consortium is an inter-organizational initiative to understand the genetic structure of India's populations.

 


References

  • Bamshad et al. 1998. Female gene flow stratifies Hindu castes. Nature 395: 651-652.
  • Bamshad et al. 2001. Population genetic structure in Indian Austro-Asiatic speakers: the role of landscape barriers and sex-specific admixture. Mol Biol Evol 28: 1013-1024.
  • Basu et al. 2003. Ethnic India: a genomic view, with special reference to peopling and structure. Genome Res 13: 2277-2290.
  • Kumar V et al. 2007. Y-chromosome evidence suggests a common paternal heritage of Austro-Asiatic populations. BMC Evol Biol 7: 47.
  • Mukherjee et al. 2000. Congruence of genomic and ethnolinguistic affinities among five tribal populations of Madhya Pradesh, India. J Genet 79: 41-46.
  • Tamang and Thangaraj. 2012. Genomic view on the peopling of India. Investigative Genetics 3: 20 (Review).
  • Thanseem et al. 2006. Genetic affinities among the lower castes and tribal groups of India: inference from Y-chromosome and mitochondrial DNA. BMC Genet 7: 42.
  • Watkins WS et al. 2008. Genetic variation in South Indian castes: evidence from Y-chromosome, mitochondrial, and autosomal polymorphisms. BMC Genet 9: 86.