11 Genetic Markers in Anthropological Research

Dr. Siuli Mitra

epgp books

 

CONTENTS:

 

1.      Learning outcomes

 

2.      Genetic markers

 

3.      Genetic markers and the field of anthropological genetics

 

4.      Allozymes: The first genetic markers

 

5.      Introduction to molecular markers

 

6.      Detection methods

 

a)      DNA sequencing

 

b)      PCR RFLP

 

c)      Denaturing gradient gel electrophoresis

 

d)     Temperature gradient gel electrophoresis

 

e)      Single strand conformation polymorphism (SSCP)

 

f)       Heteroduplex analysis

 

7.      Types of polymorphisms

 

a)      Single-nucleotide polymorphisms (SNPs)

 

b)      Minisatellites

 

c)      Short-tandem repeat polymorphisms (STRPs or microsatellites)

 

d)     Copy Number Variations (CNVs)

 

8.      Uni-parental markers

 

a)      Mitochondrial markers

 

b)      Y chromosomal markers

 

9.      Autosomal markers

 

1.    Learning Outcomes

 

At the end of this section the reader will know

  • How the types of genetic markers used in anthropological genetics have evolved
  • Different categories of molecular genetic markers in use
  • The methods of detection of genetic markers
  • The application of genetic markers in answering anthropological queries

 

2.   Genetic markers

 

Genotyping is central to the genetic study of any attribute of interest. It is used to distinguish between the genotypes that underlie the variant forms of a trait. These variant forms are also called allelic forms. Genetic markers carry information on allelic variation at a locus, and their properties depend on the mutation process that created them. The three classes of genetic markers available today are allozymes, DNA polymorphisms and DNA repeats. With the advent of cost-effective DNA sequencing methods, an immense amount of human genetic variation has been catalogued.

 

3. Genetic markers and the field of anthropological genetics

 

Genetic markers are genetic entities that segregate independently and are used to classify populations by their presence, absence or differences in frequency among populations (Crawford, 1973). They are used to quantify the genetic diversity in populations that has resulted from the interplay of evolutionary processes. Two discoveries, that of the blood group system (Landsteiner, 1900) and that of protein electrophoresis (Smithies, 1955), gave the impetus for research on genetic markers. Subsequent methodological advances ranged from isolation of the genetic material to amplification of selected genomic regions and reading the composition and sequence of the genome. Together these brought an unparalleled flow of information on the genetic diversity among human populations described by genetic markers. Genetic markers have found application in studying population structure and history, selection and admixture mapping. This module discusses the categories of genetic markers that have been widely used to answer questions in anthropological research and their applications in evolutionary genetic studies. Before delving into the molecular markers, however, a background on the classical markers is important.
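
The idea of classifying populations by differences in marker frequency can be illustrated with a minimal Python sketch. The genotype lists below are hypothetical and serve only to show how allele frequencies at one marker are obtained by counting alleles; they are not data from any study.

```python
from collections import Counter

def allele_frequencies(genotypes):
    """Return allele frequencies from a list of diploid genotypes, e.g. ("A", "B")."""
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}

# Hypothetical genotypes at a single marker in two populations (assumed data)
population_1 = [("A", "A"), ("A", "B"), ("A", "A"), ("B", "B")]
population_2 = [("B", "B"), ("A", "B"), ("B", "B"), ("B", "B")]

freq_1 = allele_frequencies(population_1)
freq_2 = allele_frequencies(population_2)
print(freq_1)  # allele A: 5 of 8 copies, allele B: 3 of 8 copies
print(freq_2)  # allele A: 1 of 8 copies, allele B: 7 of 8 copies
```

In this toy case the two populations share both alleles and differ only in their frequencies, which is the most common situation for human genetic markers.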

 

4.    Allozymes: The first genetic markers

 

The term “allozymes” is derived from the phrase “allelic variants of enzymes”. Protein variants of enzymes can be distinguished by separating them by gel electrophoresis according to the size and charge differences caused by amino acid substitutions. This is the working principle of allozyme markers. The bands showing allozyme variants were visualized by treating the gels with enzyme-specific stains comprising ligands that acted as substrate for the enzyme, enzyme co-factors and an oxidized salt. The importance of classical markers for studying genetic variation should not be underestimated: observations from this group of markers revealed immense polymorphism within human populations, which led to the formulation of the neutral theory of evolution. The drawback of this system of genetic markers is the small number of informative loci that could be studied. Its merit was cost-effectiveness, and hence it was used for genetic mapping of traits and in association studies until DNA sequencing became a reality.

 

5.    Introduction to molecular markers

 

Allozyme markers had the demerit of being an indirect way of studying DNA variation and were replaced by direct molecular markers as a consequence of the Human Genome Project, which generated a human genome reference sequence. This was followed by the complete genome sequencing of individuals from different ethnic backgrounds (Frazer, 2009). The individual sequences differed from the reference sequence at many locations in the genome, revealing different classes of genetic variation. Human genetic variants broadly fall into two categories: single nucleotide variants and structural variants. The two classes are distinguished by their nucleotide composition, though there is no standard demarcation between them.

 

6.    Detection methods:

 

Before the types of markers are described, the methods available for detecting them are listed in this section:

 

a.   DNA sequencing

 

It is the most precise method available for variant determination. It allows the detection of sequence variants in multiple sequences for multiple individuals.

 

b.    PCR RFLP

 

PCR fragments are digested with a restriction enzyme and the fragment sizes determined by gel electrophoresis. Base pair substitutions at the restriction site lead to changes in the pattern of restriction fragments on the gel.
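
The principle can be sketched in Python as a simulated digest. The two amplicon sequences are hypothetical; the enzyme is EcoRI, which recognises GAATTC and cuts after the G. This is an illustration of the logic only, not a model of a laboratory protocol.

```python
def digest(amplicon, site, cut_offset):
    """Cut `amplicon` at every occurrence of `site` and return the fragment lengths.

    `cut_offset` is the number of bases into the site after which the enzyme cuts.
    """
    fragments = []
    start = 0
    pos = amplicon.find(site)
    while pos != -1:
        cut = pos + cut_offset
        fragments.append(cut - start)
        start = cut
        pos = amplicon.find(site, pos + 1)
    fragments.append(len(amplicon) - start)
    return fragments

# EcoRI recognises GAATTC and cuts between G and AATTC
SITE, OFFSET = "GAATTC", 1

# Hypothetical 25 bp amplicons that differ by one base inside the site
allele_1 = "ATGCCGTAGAATTCGGATCCTTAGC"  # recognition site present
allele_2 = "ATGCCGTAGAGTTCGGATCCTTAGC"  # A to G substitution abolishes the site

print(digest(allele_1, SITE, OFFSET))  # two fragments: 9 bp and 16 bp
print(digest(allele_2, SITE, OFFSET))  # one uncut fragment: 25 bp
```

On a gel, a homozygote for the first allele would show the two smaller bands, a homozygote for the second allele the single uncut band, and a heterozygote all three.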

 

c.   Denaturing gradient gel electrophoresis

 

A double-stranded DNA fragment obtained from PCR is made to migrate through a gel containing a gradient of denaturing solvents. As the fragment migrates it partially denatures, and the resulting conformational change reduces its mobility so that it comes to rest at a sequence-specific position in the gel. PCR fragments that differ in sequence denature under different conditions and therefore stop at different positions.

 

d.   Temperature gradient gel electrophoresis

 

The technique is the same as denaturing gradient gel electrophoresis, except that a temperature gradient is used for denaturation instead of denaturing solvents.

 

e.   Single strand conformation polymorphism (SSCP)

 

The working principle of SSCP is the difference in electrophoretic mobility of secondary structures formed by single stranded DNA fragments.

f.   Heteroduplex analysis

 

Heteroduplex DNA molecules are formed by the annealing of complementary DNA strands that differ at a single base (or a few bases). Heteroduplex analysis involves assaying differences in electrophoretic mobility between heteroduplexes and homoduplexes.

 

7.    Types of polymorphisms

 

a.   Single-nucleotide polymorphisms (SNPs)

 

The most prevalent class of polymorphism arises by a single base substitution and is called a single nucleotide polymorphism (SNP). In most cases, a SNP (pronounced snip) has two alternative forms (alleles) and results from either a transition (purine to purine or pyrimidine to pyrimidine) or a transversion (purine to pyrimidine or vice versa). Insertions/deletions (indels) of one or two bases are also common. When a SNP lies in the recognition site of a restriction enzyme, it is called a restriction fragment length polymorphism (RFLP) or a restriction site polymorphism (RSP), because one allele creates or abolishes the recognition site and so changes the pattern of restriction fragments produced by the enzyme. The first genetic maps based on DNA polymorphisms used RFLP markers. SNPs are catalogued in the dbSNP database of NCBI and designated with a reference SNP (rs) ID. The database is regularly updated as novel polymorphisms, including population-specific ones, are added. About 10 million SNPs have been catalogued in dbSNP for use in genotyping platforms. A minority of the dbSNP entries are indels (about 2 of 12 million entries) (Strachan & Read, 2003). A SNP in a coding sequence can either alter the amino acid encoded (non-synonymous substitution) or leave it unchanged (synonymous substitution).
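
The distinction between transitions and transversions can be made concrete with a short Python sketch. The function name is an illustrative choice, not part of any standard library.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

def classify_substitution(ref, alt):
    """Classify a single-base substitution as a transition or a transversion."""
    ref, alt = ref.upper(), alt.upper()
    if ref == alt or {ref, alt} - (PURINES | PYRIMIDINES):
        raise ValueError("expected two different bases from A, C, G, T")
    # A transition stays within one chemical class; a transversion crosses classes
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"

print(classify_substitution("A", "G"))  # purine to purine
print(classify_substitution("C", "T"))  # pyrimidine to pyrimidine
print(classify_substitution("A", "C"))  # purine to pyrimidine
```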

 

b.   Minisatellites

 

Minisatellites are markers composed of tandem repeats of DNA sequences in which each repeat unit is 6-60 base pairs long. Polymorphisms in minisatellites result from unequal crossing over or gene conversion events. To detect them, genomic DNA is first digested with restriction enzymes, after which DNA probes containing a complementary minisatellite sequence are allowed to hybridize with the fragmented DNA. Minisatellites are highly polymorphic because the number of repeats, and hence their length, varies greatly among individuals, and this characteristic has found them application in DNA fingerprinting. However, their use is largely limited to forensic investigations and paternity analysis, and this group of markers is not widely used in population genetic analysis.

 

c.   Short-tandem repeat polymorphisms (STRPs or microsatellites)

 

About 50% of the human genome consists of repeat sequences, both interspersed and in tandem. Satellites containing tandem repeats whose units are 1-6 nucleotides long are called microsatellites. This group formed the first PCR-based markers. They are highly polymorphic and widely distributed in the euchromatic part of the genome. These markers are popular in investigations directed at mapping, paternity analysis and population genetics.

 

Microsatellites arise by replication slippage, in which the DNA polymerase slips during replication and copies a previously replicated repeat again, changing the number of repeats. These markers, however, have complex mutation patterns and produce PCR artifacts that complicate band scoring after gel electrophoresis.
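
Since microsatellite alleles differ in the number of repeat units, an allele can be described by counting consecutive copies of the motif. The sketch below does this in Python; the GATA motif and the flanking sequences are hypothetical.

```python
import re

def longest_repeat_run(sequence, motif):
    """Return the largest number of consecutive copies of `motif` in `sequence`."""
    runs = re.findall(f"(?:{re.escape(motif)})+", sequence)
    return max((len(run) // len(motif) for run in runs), default=0)

# Hypothetical alleles of a tetranucleotide microsatellite with assumed flanks
allele_a = "CCTG" + "GATA" * 9 + "TTCA"
allele_b = "CCTG" + "GATA" * 12 + "TTCA"

print(longest_repeat_run(allele_a, "GATA"))  # 9 repeat units
print(longest_repeat_run(allele_b, "GATA"))  # 12 repeat units
```

The two alleles differ in length by three repeat units (12 bp), which is the difference that gel electrophoresis of the PCR products would resolve.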

 

While SNPs are molecular events that have remained stable over evolutionary time, tandem repeats are relatively recent (Strachan and Read, 2003). The interspersed repeat sequences are either short interspersed nuclear elements (SINEs), which are 100-300 base pairs long, or long interspersed nuclear elements (LINEs), which are 6-8 kilobases long. While the satellite markers have been at the forefront of family and forensic studies, SINEs have been widely used in studying human genome diversity. The Alu family of the SINE class, estimated at 500,000 copies in humans, is found only in primates.

 

d. Copy Number Variations (CNVs)

 

These are sub-microscopic structural variations of chromosomes, spanning more than 1 kb, that vary in copy number among individuals. Although large in size, CNVs are not always pathogenic. Structural variation comprises intermediate-sized insertions, deletions and inversions as well as large (≥50 kb) copy number variations (Tuzun et al., 2005). CNVs arise from the occurrence of identical or nearly identical sequences of length 1 kb or larger (Feuk & Scherer, 2006) in some chromosomes (Frazer, 2009). Though SNPs are more common, CNVs account for the greatest number of nucleotides that differ between two genomes (more than 70% of variant bases; Frazer, 2009). CNVs constitute 0.5-1% of the genome of an individual and so have a profound effect on genome evolution and health. A CNV that occurs in more than 1% of the population is called a copy number polymorphism. An inversion forms when a segment of DNA is reversed in orientation with respect to the rest of the genome. A translocation is a change in the position of a chromosomal segment, within the same chromosome or to a different one, that leaves the DNA content unchanged. The application of CNVs to the study of population history has been explored less than that of SNPs and microsatellites.

 

8.    Uni-parental markers

 

Based on their mode of inheritance, DNA markers can be classified as uniparental or biparental. Mitochondrial DNA and Y chromosomal markers are uniparentally transmitted: the former is maternally inherited while the latter is transmitted from fathers to sons. Autosomal markers are transmitted by both parents. Before discussing mtDNA and Y chromosomal markers, one should be acquainted with the terms haplotype and haplogroup. The combination of allelic states of a set of polymorphisms lying on the same DNA molecule (a chromosomal region) is called a haplotype. A haplogroup is a set of mtDNA or Y chromosomal haplotypes defined by slowly mutating polymorphisms.
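
The definition of a haplotype as a combination of allelic states on one DNA molecule can be sketched in Python. The site names and allelic states below are hypothetical placeholders, not real mtDNA or Y chromosomal markers.

```python
from collections import Counter

# Hypothetical linked polymorphic sites, listed in their order along the molecule
SITES = ("site_1", "site_2", "site_3")

# Assumed allelic states observed on six chromosomes
chromosomes = [
    {"site_1": "A", "site_2": "C", "site_3": "T"},
    {"site_1": "A", "site_2": "C", "site_3": "T"},
    {"site_1": "G", "site_2": "C", "site_3": "T"},
    {"site_1": "G", "site_2": "T", "site_3": "T"},
    {"site_1": "A", "site_2": "C", "site_3": "T"},
    {"site_1": "G", "site_2": "C", "site_3": "T"},
]

def haplotype(chromosome, sites=SITES):
    """Join the allelic states at the given sites into one haplotype label."""
    return "".join(chromosome[site] for site in sites)

haplotype_counts = Counter(haplotype(c) for c in chromosomes)
print(haplotype_counts.most_common())  # three distinct haplotypes: ACT, GCT, GTT
```

Because the sites do not recombine in uniparental markers, each haplotype is inherited as a unit, and haplotype counts of this kind are the raw material for assigning chromosomes to haplogroups.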

 

i.    Mitochondrial markers

 

The human mitochondrial genome is a circular, double-stranded molecule 16,569 base pairs long, present in hundreds to thousands of copies in each cell (Anderson et al. 1981). The mitochondrial DNA (mtDNA) codes for 13 subunits of the oxidative phosphorylation system of mitochondria, 2 ribosomal RNAs (rRNAs), and 22 transfer RNAs (tRNAs). The location of mitochondria in the cytoplasm underlies maternal inheritance, as the egg cell provides the cytoplasm of the zygote at fertilization. mtDNA has an effective population size a quarter of that of autosomes. Features unique to mtDNA that have made it the marker of choice for studying human genomic diversity in general, and diversity in maternal lineages in particular, are

 

(i)   High copy number

 

(ii)   Lack of recombination

 

(iii)   High mutation rate and

 

(iv)     Maternal inheritance.

 

 

The mtDNA has several SNPs, a subset of which are also RFLPs. Many of the SNPs lie in the control region (CR), which has a higher mutation rate than the rest of the mitochondrial genome (Stoneking, 2000). The control region carries the genetic signals (promoters) needed for replication and transcription, but much of this DNA segment is not vital to the survival of the mitochondrion or of the host cell, so base changes accumulate in it. By studying the number and variety of base changes within the control region, geneticists can determine the relatedness between individuals, and by using its mutation rate as a “molecular clock” they can trace the course of evolution. The CR has two segments, hypervariable regions I (positions 16024-16383) and II (positions 57-372), which have been widely studied to characterize the genetic structure of populations. The DNA sequence of the control region is termed hypervariable because it accumulates point mutations at approximately 10 times the rate of nuclear DNA. Most studies of control region sequences have focused on intraspecific patterns of variability and the phylogenetic relationships of closely related species, a prominent example being the study of human population history. Certain mitochondrial mutations have also been linked to the occurrence of complex diseases (Taylor and Turnbull, 2005).

 

Studies of mtDNA variation have relied on rapid sequencing and RFLP analysis of mtDNA PCR products. More recently, sequence analysis of the first hypervariable segment (HVR I) of the control region has taken over. There are two basic approaches to using mtDNA in studies of human evolution:

 

The lineage-based approach attempts to unravel the history of mtDNA lineages, called haplogroups. Haplogroups are related groups of sequences that are defined by shared mutations and tend to be specific to a region. Because haplogroups reflect shared ancestry of mtDNA, they can be helpful for estimating admixture proportions in populations inhabiting known routes of migration or originating from diverse geographical regions.

 

The population-based approach attempts to study the prehistory of individual populations, of geographical regions, or of population migrations by using human population groups as the unit of study and applying population genetic methods to the data.

 

Although HVRI sequence data alone do not have the resolving power to reveal all haplogroups, the high mutation rate of this segment ensures a sufficient number of polymorphic sites for population genetics.
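
One simple population genetic summary of such sequence data is the mean number of pairwise differences among aligned sequences. The Python sketch below computes it for four short hypothetical fragments; real HVR I alignments are several hundred bases long, and the sequences here are invented for illustration.

```python
from itertools import combinations

def pairwise_differences(seq_a, seq_b):
    """Count the sites at which two aligned sequences of equal length differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(seq_a, seq_b))

def mean_pairwise_differences(sequences):
    """Average the number of differences over all pairs of sequences."""
    pairs = list(combinations(sequences, 2))
    return sum(pairwise_differences(a, b) for a, b in pairs) / len(pairs)

# Hypothetical aligned fragments from four individuals (assumed data)
hvr1_fragments = ["ACCTGATC", "ACCTGGTC", "ATCTGGTC", "ACCTGATC"]

print(mean_pairwise_differences(hvr1_fragments))  # 7 differences over 6 pairs
```

A higher mean value indicates greater sequence diversity within the sample, which is why a fast-mutating segment supplies enough polymorphic sites for comparisons between populations.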

 

ii. Y chromosomal markers

 

The availability of the near-complete chromosome sequence, together with many new polymorphisms, a highly resolved phylogeny and insights into its mutation processes, now provides new avenues for investigating human evolution, and Y-chromosome research is expanding. The properties that make the Y chromosome an outlier in the genome are that one-half of it consists of tandemly repeated satellite DNA, the rest carries few genes, and most of it does not recombine. It is because of these properties that the Y chromosome is used for investigating recent human evolution from a male perspective and has specialized, but important, roles in medical and forensic genetics. While a small portion of the Y chromosome, the pseudoautosomal region, recombines with the X chromosome, the remainder undergoes no recombination. This non-recombining portion of the Y chromosome (NRY) therefore provides a second DNA region, besides mtDNA, that can be assumed to undergo no recombination. However, in comparison with the mitochondrial genome, the level of polymorphism in the Y chromosome is very low (Dorit, Akashi, Gilbert, 1995).

 

By convention, Y-chromosome lineages defined by biallelic polymorphisms are called haplogroups or clades, those defined only by short tandem repeats (STRs) are called haplotypes, and descriptions of data combining both Y-STRs and biallelic markers are referred to as lineages. The Y chromosome carries a diverse spectrum of mutations, i.e. changes that occur from generation to generation, which can be used as site-specific or sequence-specific markers. As all markers are linked along the entire length of the Y chromosome (except for the pseudoautosomal region), a haplotype constructed from a number of different markers documents the history of that Y chromosome. Though the mutation rate of the Y chromosome is low and variations are hard to find, rare mutations do occur.

 

 

Y chromosomal haplogroups among world populations:

 

In the Y chromosome haplogroup tree, the two primary splits lead to haplogroups A and B, whose spread is restricted to Africa. The remaining haplogroups form three sub-clusters that coalesce at the root of the CR-M168 node, representing the majority of African varieties and the non-African haplogroups. This level of structuring of continental pools of Y chromosomes includes the shared presence of haplogroup DE chromosomes in Africa and Asia, the non-African haplogroup C widely distributed in East Asia, Oceania and North America, and the global distribution of another non-African cluster, haplogroup F-M89, with its most prolific daughter group, haplogroup K. Considerable regionalization of haplogroups is present in the sub-clades of F and K. Haplogroups F* and H are largely restricted to the Asian subcontinent, whereas the center of gravity for haplogroups I and J is in Europe and the Middle East respectively. In East Asia, haplogroups N and O, which arise from the haplogroup K branch, are the most frequent. Other important K-affiliated haplogroups include Q in North East Asia and the Americas, as well as haplogroup R, whose phylogeography spans North Africa and West Asia and which attains high frequencies in Europe.

 

Example 1. Thangaraj et al. (2007) studied all Austro-Asiatic populations of India, including the north-eastern Khasis. They analyzed Y chromosome data for 1222 individuals from 25 Indian populations covering all sub-divisions of the Austro-Asiatic family. Nine haplogroups were prominent in these populations, with haplogroup O-M95 attaining the highest frequency. None of the samples except the Khasi showed O-M122.

 

Example 2. Qamar et al. (1999) surveyed 9 Pakistani subpopulations for variation in the non-recombining region of the Y chromosome. The polymorphic sites examined were YAP, five SNPs and the tetranucleotide microsatellite DYS19. Y chromosomes carrying the YAP element (YAP+) were found in populations from southwestern Pakistan at frequencies ranging from 2% to 8%, whereas northeastern populations appear to lack YAP+ chromosomes.

 

9.    Autosomal markers

 

Autosomal markers are inherited from both parents and can hence give more information on the diversity present in the gene pool under investigation. This group of markers gives more representative population genetic information. Unlike uniparental markers, autosomal markers undergo recombination and independent assortment when passed on to the offspring generation. Their effective population size is greater than that of mtDNA and non-recombining Y (NRY) markers. Genetic markers in autosomal genes are under great functional constraint owing to the gene richness of autosomes (which carry more genes than the Y chromosome, mtDNA or the X chromosome) and the involvement of these genes in metabolism. Mutations in autosomal genes can have deleterious effects on metabolism and so are under selective constraint. However, some mutations have a neutral effect on phenotype while others confer an advantage in a particular environment. The frequency of a marker is hence altered in accordance with its effect on the phenotype.
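
The difference in effective population size between marker systems can be sketched with the standard textbook expressions for a population with separate numbers of breeding males and females. These expressions assume random mating and non-overlapping generations, which are idealizations; the population of 500 males and 500 females is hypothetical.

```python
def effective_sizes(n_males, n_females):
    """Return effective population sizes for four marker systems.

    Standard idealized expressions for a population with `n_males` breeding
    males and `n_females` breeding females.
    """
    nm, nf = n_males, n_females
    return {
        "autosome": 4 * nm * nf / (nm + nf),
        "X": 9 * nm * nf / (4 * nm + 2 * nf),
        "Y": nm / 2,       # transmitted by males only, one copy each
        "mtDNA": nf / 2,   # transmitted by females only, effectively haploid
    }

# Hypothetical population with an equal sex ratio
sizes = effective_sizes(500, 500)
for marker, size in sizes.items():
    print(marker, size, size / sizes["autosome"])
```

With an equal sex ratio the Y chromosome and mtDNA each have one quarter of the autosomal effective size, which makes uniparental markers more sensitive to genetic drift than autosomal ones.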

 

The exodus of modern humans out of Africa and their dispersal across the continents exposed them to varied environments, involving both physical and chemical changes. The metabolic machinery of Homo sapiens adjusted to these alterations in the environment, leaving genomic imprints in the underlying genes. These genetic traces are now being unraveled to draw inferences about population history. Demographic changes are inferred from modifications that have occurred across the genome, while genetic adaptations are traced by studying signatures of selection, which are localized in the genome. Some autosomal genes have been widely studied across human populations to answer questions on the history of human dispersal. Table 1 lists some of these genes and their roles in metabolism.

Table 1. Examples of autosomal genes studied in anthropological research, their role in metabolism and significant polymorphisms that have been studied

 

SUMMARY

 

  • Genetic markers are genetic entities that segregate independently and are used to classify populations by their presence, absence or differences in frequency among populations.
  • Genetic markers are used to quantify the genetic diversity in populations that has resulted from the interplay of evolutionary processes.
  • Single nucleotide polymorphisms (SNPs) are the most prevalent class of polymorphism and arise by a single base substitution.
  • SNPs can alter the amino acid encoded (non-synonymous substitution) or leave it unchanged (synonymous substitution).
  • While SNPs are molecular events that have remained stable over evolutionary time, tandem repeats are relatively recent and are hence used to study recent events in population history.
  • mtDNA is characterized by features such as maternal inheritance, lack of recombination and high copy number, which have made it the marker of choice for studying human genomic diversity in general and diversity in maternal lineages in particular.
  • Hypervariable regions I (positions 16024-16383) and II (positions 57-372) of mtDNA have been widely studied to characterize the genetic structure of populations.
  • The Y chromosome is used for investigating recent human evolution from a male perspective and has specialized, but important, roles in medical and forensic genetics.
  • Autosomal markers are inherited from both parents and give more representative population genetic information. APOE, LCT and G6PD are examples of genes whose diversity has been studied to delineate human population history.

 


References:

  • Anderson et al. (1981). Sequence and organization of the human mitochondrial genome. Nature 290(5806): 457-65.
  • Crawford, M. H. (1973). The use of genetic markers of the blood in the study of the evolution of human populations. In Methods and Theories of Anthropological Genetics, eds. M. H. Crawford & P. L. Workman. Albuquerque: University of New Mexico Press, pp. 19-38.
  • Feuk et al. (2006). Structural variation in the human genome. Nature Reviews Genetics 7: 85-97.
  • Frazer, 2009.
  • Landsteiner, K. and Levine, P. (1927). Further observations on individual differences of human blood. Proceedings of the Society for Experimental Biology and Medicine, 24, 941-2.
  • Smithies, O. (1955). Zone electrophoresis in starch gels: Group variations in serum proteins of normal human adults. Biochemical Journal, 61, 629.
  • Taylor and Turnbull. (2005). Mitochondrial DNA mutations in human disease. Nature Reviews Genetics 6(5): 389-402.
  • Thangaraj K et al. (2007). Y-chromosome STR haplotypes in two endogamous tribal populations of Karnataka, India. J. Forensic Sci. 52: 751-753.