29 Genetic Distance
Dr. Siuli Mitra
CONTENTS:
1. Learning outcomes
2. The concept of genetic distance
3. Definitions of genetic distance
4. Measures of genetic distance
5. Choosing a measure of genetic distance
6. Tree inference using genetic distances
7. How reliable is your tree?
8. Multivariate analysis and genetic distance
9. Calculation of genetic distance
10. Example of genetic distance calculation (using MEGA)
1. Learning Outcomes:
The reader will be able to
a. Define genetic distance
b. Know the origin of the concept
c. Choose a suitable measure of genetic distance given a data type and genetic model
d. Calculate the distance between DNA sequences using MEGA software
One of the important facets of studying human genetic diversity under the aegis of anthropology has been to decipher the underlying information on the origin and dispersal of modern humans, the size and migration of past populations, and human adaptation to environmental change. Answering these questions requires compiling sufficient evidence on genetic diversity and drawing meaningful inferences from it. Research in this area is focused on two basic objectives:
a. Describing the patterns of diversity present across human populations, comparing populations with one another, and describing how diversity is distributed between sub-populations
b. Inferring the historical events that led to the existing patterns of diversity, which also involves understanding the evolutionary history of portions of the human genome, genes or chromosomes, besides the aims listed above
The first step in understanding genetic diversity is acquiring genetic information that suits the research question. In earlier studies genetic information was compiled solely in the form of protein polymorphisms (classical markers); in the post-genome sequencing era it has shifted to the more informative DNA polymorphisms. Understanding genetic diversity follows a two-fold approach: observing the allele frequencies and delineating the evolutionary distance between alleles. The discovery of the DNA polymorphisms responsible for allelic differences provided the foundation for the concept of evolutionary or genetic distance between alleles.
The genetic distance in essence has three functions:
(i) It provides a similarity index of two sequences or two populations
(ii) It provides an idea of the time elapsed since divergence of the two sequences, assuming a molecular clock (divergence between sequences occurs at an evolutionary rate such that it is related to the time since divergence)
(iii) If an evolutionary tree is obtained, branches of the tree represent the distance between nodes that in turn represent sequences (or populations)
2. The concept of genetic distance
The evaluation of similarities and differences between populations is the basis of any population genetic study and is necessary to draw inferences on the evolutionary history of humans. After identifying differences, the next step is to quantify them and generate an estimate. Genetic distance is used as an estimate to quantify differences both between individuals and between groups of populations, and so to judge the degree of similarity between them. It was first used by Sanghvi (1953). The genetic distance quantifies the genetic divergence between two molecular sequences (DNA or protein), two individuals (to study microevolution) or taxa (while studying macroevolution). A diagrammatic representation of sequence divergence and a hypothetical phylogenetic tree depicting the relationship between the populations carrying the sequences is given in Figure 1. Genetic distance estimates also reflect the time when the divergence occurred. If the estimates are used to draw a phylogenetic tree, the lengths of the branches represent the distances between the nodes (for example molecular sequences) being compared.
Figure 1.
For example, consider three populations X, Y and Z having the short genomic sequences 1, 2 and 3 respectively (Figure 1). The following observations and deductions can be made based on the figure.
a. In Figure 1(i), sequence 1 and sequence 2 differ at the 4th and 6th, i.e. two, nucleotide positions, while sequence 3 differs from sequence 1 at the 1st, 4th and 6th, i.e. three, nucleotide positions.
b. X population carries the ancestral genome sequence (Sequence 1) while Y and Z populations carry derived genomic sequences (2 and 3).
c. The longer branch length for Z population shows that the genetic distance between X and Z is more than that between X and Y.
Finding genetic distances between sequences or populations to answer questions about their genetic relatedness is, however, not this straightforward. Molecular data are far more complex than in these examples, and the evolutionary relationships between sequences are complicated by the presence of different classes of sequence variation. The proportion of homologous sites at which two sequences differ gives the p-distance. However, the p-distance underestimates the actual genetic distance, because repeated substitutions at the same site go uncounted and because different classes of variation have different nucleotide substitution rates. This problem is circumvented by using nucleotide substitution models that make assumptions about the evolutionary rates of mutation. These models will not be discussed here, and the module will be limited to the applications of genetic distance estimators and their use in making phylogenetic inferences.
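The p-distance described above can be sketched in a few lines of Python. This is a minimal illustration, not the procedure of any particular program: the function name `p_distance`, the choice to skip sites with a gap in either sequence, and the three toy sequences (made up to mirror the two-difference and three-difference pattern of the Figure 1 example) are all assumptions.

```python
def p_distance(seq_a, seq_b, gap="-"):
    """Proportion of compared sites at which two aligned sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    different = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == gap or b == gap:
            continue  # assumption: ignore sites with a gap in either sequence
        compared += 1
        if a != b:
            different += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return different / compared


# Hypothetical 10-site sequences, not taken from the figure itself.
seq1 = "ATGCATGCAT"
seq2 = "ATGTAGGCAT"  # differs from seq1 at the 4th and 6th sites
seq3 = "CTGTAGGCAT"  # differs from seq1 at the 1st, 4th and 6th sites

print(p_distance(seq1, seq2))
print(p_distance(seq1, seq3))
```

Because the count ignores repeated changes at one site, the value is best read as a lower bound on the divergence between the two sequences.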
3. Definitions of genetic distance:
Nei (1987) expounded the concept by stating that genetic distance is “the extent of gene differences between populations or species that is measured by some numerical quantity.” A second definition was given by Beaumont et al. (1998), who defined genetic distance as a “quantitative measure of genetic difference, be it at the sequence level or the allele frequency level that is calculated between individuals, populations or species.”
4. Measures of genetic distance
The genetic distance should ideally take into consideration allele frequencies across all loci while studying genetic divergence between populations of the same species. However, only a subset of all loci is sampled in practice to estimate the genetic distance.
Several measures of genetic distance have been proposed by different researchers. The genetic differences between two populations can also be quantified by calculating Wright’s FST. But FST is limited by its ability to make only pairwise comparisons. That is, only X and Y can be compared at one go. The introduction of genetic distance estimates circumvented this problem by enabling comparison of more than one pair at the same time.
The choice of a genetic distance measure depends on its application. The measures can be used for classifying populations or for making evolutionary comparisons (Nei, 1978). Three of the measures have been very widely used: Nei’s genetic distance, D (Nei, 1972), the Cavalli-Sforza chord measure (Cavalli-Sforza and Edwards, 1967) and Reynolds, Weir and Cockerham’s genetic distance (1983).
The estimates of genetic distance are calculated based on certain common assumptions, listed below:
a. Gene substitutions occur independent of those in other lineages
b. Substitutions are independent of those occurring in other sites within the same lineage
c. Evolutionary comparisons are made based on the sequence as it is now and the previous changes are not accounted for (Markovian)
d. The rate of substitution is uniform across the sites
5. Choosing a measure of genetic distance:
a. Euclidean distance
Euclidean distances are the most common measures of genetic distance. They are straight line measures of distance used to approximate genetic distance between individuals. Individuals or populations are represented as points in space and the genetic distance is calculated as the geometric distance between the points. But this measure does not take into account the evolutionary change in populations. The relative values and not the absolute values of the distance have a biological meaning.
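As a small illustration of the geometric idea, the sketch below treats each population as a point whose coordinates are its allele frequencies and computes the straight-line distance between two such points. The function name and the frequency vectors are hypothetical, and using allele frequencies as the coordinates is an assumption made for the example.

```python
import math


def euclidean_distance(freqs_x, freqs_y):
    """Straight-line distance between two points given as coordinate lists."""
    if len(freqs_x) != len(freqs_y):
        raise ValueError("both populations need the same number of coordinates")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(freqs_x, freqs_y)))


# Hypothetical frequencies of three alleles at one locus.
pop_x = [0.6, 0.3, 0.1]
pop_y = [0.2, 0.5, 0.3]
print(euclidean_distance(pop_x, pop_y))
```

As the text notes, only the relative sizes of such values across pairs of populations carry biological meaning.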
b. Mahalanobis $D^2$ distance:
Mahalanobis proposed the $D^2$ measure in 1930 during his studies on racial comparisons. It has since played a fundamental role in the statistical analysis of multiple measurements, and has found application in numerical taxonomy, archaeology, medical diagnostics and remote sensing. The first fields in which the distance measure was applied were craniometry and anthropology.
The Mahalanobis $D^2$ distance is a descriptive statistic that gives a relative measure of a data point’s distance (residual) from a common point. The distance is zero when the data point lies at the mean of the distribution of the data points. The Mahalanobis distance of an observation $x = (x_1, x_2, x_3, \ldots, x_N)$ from a set of observations having mean $\mu = (\mu_1, \mu_2, \mu_3, \ldots, \mu_N)$ and covariance matrix $S$ is given as
$$D^2 = (x - \mu)^T S^{-1} (x - \mu)$$
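The quadratic form above can be evaluated without forming the inverse of S explicitly: solve S z = (x – μ) for z and then take the dot product of (x – μ) with z. The sketch below does this in plain Python with Gaussian elimination. The helper names and the numeric values are hypothetical, and the covariance matrix is assumed to be supplied by the user and to be non-singular.

```python
def solve_linear(matrix, rhs):
    """Solve matrix * z = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    z = [0.0] * n
    for row in range(n - 1, -1, -1):  # back substitution
        known = sum(aug[row][k] * z[k] for k in range(row + 1, n))
        z[row] = (aug[row][n] - known) / aug[row][row]
    return z


def mahalanobis_d2(x, mu, cov):
    """D^2 = (x - mu)^T S^-1 (x - mu), computed as diff . solve(S, diff)."""
    diff = [xi - mi for xi, mi in zip(x, mu)]
    z = solve_linear(cov, diff)
    return sum(d * zi for d, zi in zip(diff, z))


# Hypothetical two-variable example.
mean = [0.0, 0.0]
cov = [[4.0, 0.0], [0.0, 1.0]]
print(mahalanobis_d2([2.0, 1.0], mean, cov))
```

With an identity covariance matrix the result reduces to the squared Euclidean distance, which is one way to see how the two measures are related.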
c. Nei’s standard genetic distance
Masatoshi Nei proposed the minimum genetic distance, denoted $D_m$, to measure the minimum number of codon differences per locus.
If $x_i$ and $y_i$ are the frequencies of the $i$-th allele at a locus in populations X and Y respectively, the probabilities that two randomly drawn alleles are identical within X, within Y, and between X and Y are $j_X = \sum x_i^2$, $j_Y = \sum y_i^2$ and $j_{XY} = \sum x_i y_i$. The net minimum codon difference between the two populations is calculated as
$$D_m = D_{XY(m)} - \frac{D_{X(m)} + D_{Y(m)}}{2}$$
where $D_{XY(m)} = 1 - J_{XY}$, $D_{X(m)} = 1 - J_X$ and $D_{Y(m)} = 1 - J_Y$. $J_{XY}$, $J_X$ and $J_Y$ are the respective averages of $j_{XY}$, $j_X$ and $j_Y$ across all genomic loci.
Nei also gave the standard genetic distance $D$ in 1972 to quantify genetic divergence. It has been used widely and is given as $D = -\log_e I$, where $I = J_{XY}/(J_X J_Y)^{1/2}$. The value of $I$ is 1 when the two populations have equal gene frequencies across all the loci and 0 when none of the alleles are shared. In other words, the estimate $I$ denotes how similar the two populations are.
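The quantities defined above translate directly into code. The sketch below computes the averages of jX, jY and jXY over loci and from them the minimum distance, the identity I and the standard distance D. The function name, the input layout (one list of allele frequencies per locus, with alleles in the same order in both populations) and the example frequencies are assumptions for illustration.

```python
import math


def nei_distances(pop_x, pop_y):
    """Return (D_m, I, D) from per-locus allele frequencies of two populations."""
    n_loci = len(pop_x)
    if n_loci == 0 or n_loci != len(pop_y):
        raise ValueError("both populations need the same, non-zero number of loci")
    # Averages over loci of the per-locus identity probabilities.
    j_x = sum(sum(x * x for x in locus) for locus in pop_x) / n_loci
    j_y = sum(sum(y * y for y in locus) for locus in pop_y) / n_loci
    j_xy = sum(
        sum(x * y for x, y in zip(lx, ly)) for lx, ly in zip(pop_x, pop_y)
    ) / n_loci
    d_min = (1 - j_xy) - ((1 - j_x) + (1 - j_y)) / 2
    identity = j_xy / math.sqrt(j_x * j_y)
    d_std = -math.log(identity) if identity > 0 else math.inf
    return d_min, identity, d_std


# Hypothetical frequencies at two loci (two alleles, then three alleles).
pop_x = [[0.7, 0.3], [0.5, 0.4, 0.1]]
pop_y = [[0.4, 0.6], [0.2, 0.3, 0.5]]
d_min, identity, d_std = nei_distances(pop_x, pop_y)
print(d_min, identity, d_std)
```

When no alleles are shared, I is 0 and the logarithm is undefined, so the sketch returns infinity for D in that case.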
d. Cavalli-Sforza chord measure
Given by Cavalli-Sforza and Edwards in 1967, the chord distance in a hypersphere indicates the distance between two populations. Each unit of the hypersphere is equivalent to one gene substitution. The chord distance is given by
The chord measure is one of the earliest distance measures and is still used for reconstructing phylogenies of human populations from microsatellite data. However, the estimate $D_{CE}$ does not consider the role of mutation and assumes that the divergence is brought about by drift alone, and so it is also called a purely geometric distance. Nei (1983) gave a modification of the original chord measure of Cavalli-Sforza, which is measured as
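For orientation, the sketch below computes both measures in the forms summarised by Takezaki and Nei (1996), which may differ in notation from the expressions intended here: the chord distance is taken as (2/π) times the average over loci of sqrt(2(1 – Σ sqrt(x_i y_i))), and Nei's modified distance as 1 minus the average over loci of Σ sqrt(x_i y_i). The function name, input layout and example frequencies are hypothetical.

```python
import math


def chord_and_modified_distance(pop_x, pop_y):
    """Return (chord distance, Nei's modified distance) over all loci."""
    n_loci = len(pop_x)
    if n_loci == 0 or n_loci != len(pop_y):
        raise ValueError("both populations need the same, non-zero number of loci")
    chord_sum = 0.0
    cos_sum = 0.0
    for locus_x, locus_y in zip(pop_x, pop_y):
        # Cosine of the angle between the two populations on the hypersphere.
        cos_theta = sum(math.sqrt(x * y) for x, y in zip(locus_x, locus_y))
        cos_theta = min(cos_theta, 1.0)  # guard against rounding above 1
        chord_sum += math.sqrt(2 * (1 - cos_theta))
        cos_sum += cos_theta
    d_chord = (2 / math.pi) * chord_sum / n_loci
    d_modified = 1 - cos_sum / n_loci
    return d_chord, d_modified


# Hypothetical allele frequencies at two loci.
pop_x = [[0.7, 0.3], [0.5, 0.4, 0.1]]
pop_y = [[0.4, 0.6], [0.2, 0.3, 0.5]]
print(chord_and_modified_distance(pop_x, pop_y))
```

Both values are 0 for identical frequencies and reach their maximum when the two populations share no alleles.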
e. Reynolds, Weir and Cockerham’s genetic distance
This measure was given by Reynolds, Weir and Cockerham in 1983 to measure genetic divergence by using a coancestry coefficient $\theta_W$. The estimate is similar to $D_{CE}$ and the assumptions are the same as those made for $D_{CE}$.
6. Tree inference using genetic distances:
Genetic distance estimation done by pairwise comparisons is followed by generation of phylogenetic trees. While a genetic distance is an approximation of the mean number of changes that have occurred after two sequences diverged from their ancestor, a phylogenetic tree reflects the relationship between the genetic distances when multiple pairwise comparisons are made.
a. Cluster analysis
Cluster analysis or classification of individuals or populations based on certain defined characteristics like morphological, biochemical and genetic traits is an important approach used for graphical representation of relationship between individuals or populations.
The pairwise genetic distance matrix acts as the input for the clustering algorithm, which in turn produces a phylogenetic tree. The tree is built in a stepwise fashion wherein the sequences, or groups of sequences, separated by the least genetic distance are grouped together. When two units are grouped, they are treated as a single unit. From the units left, the pair with the highest similarity is identified. The process (or algorithm) continues till two units are left. There are hierarchical and non-hierarchical clustering methods. Hierarchical methods group the most similar individuals first and then keep adding individuals in order of decreasing similarity. The unweighted pair group method with arithmetic means (UPGMA) and the weighted pair group method with arithmetic means (WPGMA) are popular examples. The non-hierarchical methods require the number of clusters to be specified at the beginning of the program; k-means clustering is an example.
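The stepwise grouping can be illustrated with a compact UPGMA sketch. It repeatedly merges the closest pair of units and averages distances to the merged unit, weighting by the number of members in each unit. The function name, the nested-tuple tree representation and the three-population distance matrix are hypothetical choices made for the example, and ties between equally close pairs are broken arbitrarily.

```python
def upgma(labels, dist):
    """Cluster by UPGMA; return (nested-tuple tree, list of merges with heights)."""
    n = len(labels)
    clusters = {i: (labels[i], 1) for i in range(n)}  # id -> (subtree, size)
    d = {(i, j): dist[i][j] for i in range(n) for j in range(i + 1, n)}
    next_id = n
    merges = []
    while len(clusters) > 1:
        # Find the closest pair of current units.
        (a, b), closest = min(d.items(), key=lambda item: item[1])
        tree_a, size_a = clusters.pop(a)
        tree_b, size_b = clusters.pop(b)
        # Distance from the merged unit to every other unit (size-weighted mean).
        new_d = {}
        for c in clusters:
            d_ac = d[(min(a, c), max(a, c))]
            d_bc = d[(min(b, c), max(b, c))]
            new_d[(c, next_id)] = (size_a * d_ac + size_b * d_bc) / (size_a + size_b)
        d = {pair: v for pair, v in d.items() if a not in pair and b not in pair}
        d.update(new_d)
        clusters[next_id] = ((tree_a, tree_b), size_a + size_b)
        merges.append((tree_a, tree_b, closest / 2))  # node height = half the distance
        next_id += 1
    tree, _ = next(iter(clusters.values()))
    return tree, merges


# Hypothetical pairwise distances between populations X, Y and Z.
labels = ["X", "Y", "Z"]
dist = [
    [0.0, 0.2, 0.3],
    [0.2, 0.0, 0.3],
    [0.3, 0.3, 0.0],
]
tree, merges = upgma(labels, dist)
print(tree)
print(merges)
```

In this toy matrix X and Y are joined first because they are the closest pair, and Z then joins the X–Y group.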
b. Minimum evolution:
This is used for the reconstruction of phylogenetic history through additive distances. The genetic distance between a pair of sequences equals the sum of the lengths of the branches connecting them. For this to be accomplished, the following condition should be met for four sequences A, B, C and D
dAB + dCD ≤ max(dAC + dBD, dAD + dBC)
where dAB is the distance between A and B and so on.
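The condition above has to hold however the four sequences are labelled, which is equivalent to saying that, of the three pairwise sums, the two largest are equal. The sketch below checks this for every set of four sequences in a distance matrix. The function names, the tolerance and both example matrices are hypothetical.

```python
from itertools import combinations


def four_point_holds(d, a, b, c, e, tol=1e-9):
    """True if the two largest of the three pairwise sums are equal (within tol)."""
    sums = sorted([
        d[a][b] + d[c][e],
        d[a][c] + d[b][e],
        d[a][e] + d[b][c],
    ])
    return sums[2] - sums[1] <= tol


def is_additive(d, tol=1e-9):
    """Check the four-point condition for every quartet of sequences."""
    return all(
        four_point_holds(d, *quartet, tol=tol)
        for quartet in combinations(range(len(d)), 4)
    )


# Distances read off a hypothetical tree ((A:1, B:2):1, (C:1, D:3)).
tree_like = [
    [0, 3, 3, 5],
    [3, 0, 4, 6],
    [3, 4, 0, 4],
    [5, 6, 4, 0],
]
# Same matrix with the A-D distance inflated, so no tree fits it exactly.
not_tree_like = [
    [0, 3, 3, 7],
    [3, 0, 4, 6],
    [3, 4, 0, 4],
    [7, 6, 4, 0],
]
print(is_additive(tree_like), is_additive(not_tree_like))
```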
Minimum evolution is used for the construction of additive trees. According to minimum evolution, the tree with the minimum length (the sum of all the branches in the tree) is considered to be the best estimate of the phylogeny. A popular method for estimating a minimum evolution tree is neighbor joining, which is among the most widely used methods of tree construction at present.
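To give a feel for how neighbor joining proceeds, the sketch below performs only the first joining step: it evaluates the usual Q criterion, Q(i, j) = (n – 2) d(i, j) – Σ d(i, k) – Σ d(j, k), picks the pair with the smallest Q, and computes the two branch lengths to the new node. It is not a full tree builder, and the function name and the five-taxon distance matrix are hypothetical.

```python
def nj_first_join(labels, dist):
    """Return the first pair joined by neighbor joining and its branch lengths."""
    n = len(labels)
    if n < 3:
        raise ValueError("neighbor joining needs at least three taxa")
    totals = [sum(row) for row in dist]  # sum of distances from each taxon
    best_pair, best_q = None, None
    for i in range(n):
        for j in range(i + 1, n):
            q = (n - 2) * dist[i][j] - totals[i] - totals[j]
            if best_q is None or q < best_q:
                best_pair, best_q = (i, j), q
    i, j = best_pair
    # Branch lengths from the joined taxa to their new common node.
    len_i = dist[i][j] / 2 + (totals[i] - totals[j]) / (2 * (n - 2))
    len_j = dist[i][j] - len_i
    return (labels[i], labels[j]), (len_i, len_j)


# Hypothetical additive distances among five taxa.
labels = ["a", "b", "c", "d", "e"]
dist = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
pair, lengths = nj_first_join(labels, dist)
print(pair, lengths)
```

Note that the pair with the smallest raw distance here is d and e, yet the Q criterion joins a and b first; correcting for each taxon's average distance to all others is what distinguishes neighbor joining from simple clustering.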
Figure 2. An example neighbor joining tree modified from Aggarwal et al., 2010.
The neighbor joining tree in Figure 2 shows the genetic relatedness between five Indian tribes (Siddis, Gonds, Varli, Dangi Konkana and Kolgha) at the DRD2 gene (Aggarwal et al., 2010). In this tree Siddis and Gonds are more closely spaced in comparison to Siddis and Varli. This is suggestive of genetic closeness of Siddis and Gonds. A node depicts a point of divergence and can be considered to be a hypothetical ancestor. The node shown in Figure 2 is a hypothetical ancestor of the Varli, Siddi and Gond tribes from which these groups have diverged.
7. How reliable is your tree?
The confidence that the tree obtained is of high accuracy (closest to the actual tree) and precise, can be tested. The reliability of an inferred phylogenetic tree can be estimated statistically in two ways: bootstrap analysis and jackknifing.
Calculation of genetic distance involves consideration of the sampling variance and confidence interval. The bootstrap is applied to approximate the sampling variance. The idea behind a bootstrap is that if a given dataset supports a statistical result, then datasets resampled at random from it should mostly give the same result. In phylogenetics, the bootstrap is conducted by resampling the columns of an alignment with replacement. In other words, columns are drawn at random to build a new alignment of the same length, so that some columns appear more than once and others are left out, and the tree is rebuilt to check whether the branches remain unchanged. The bootstrap values generated by the software used to construct a tree are obtained as percentages. A value of 80 means the same node was recovered in 80% of all the randomly resampled datasets. A value of more than 70 is often considered reliable.
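The column resampling can be sketched as follows. To keep the example short, the "tree" is reduced to a single question, namely which two sequences are closest by p-distance, and the support value is the percentage of resampled alignments that give the expected pair. All function names, the fixed random seed and the toy alignment are assumptions; a real analysis would rebuild a full tree for each replicate.

```python
import random
from itertools import combinations


def bootstrap_alignments(alignment, n_replicates, seed=0):
    """Resample alignment columns with replacement, keeping the original length."""
    rng = random.Random(seed)
    n_sites = len(next(iter(alignment.values())))
    replicates = []
    for _ in range(n_replicates):
        cols = [rng.randrange(n_sites) for _ in range(n_sites)]
        replicates.append(
            {name: "".join(seq[c] for c in cols) for name, seq in alignment.items()}
        )
    return replicates


def closest_pair(alignment):
    """Pair of sequence names with the smallest p-distance."""
    def p(a, b):
        return sum(x != y for x, y in zip(a, b)) / len(a)
    return min(
        combinations(sorted(alignment), 2),
        key=lambda pair: p(alignment[pair[0]], alignment[pair[1]]),
    )


def bootstrap_support(alignment, pair, n_replicates=100, seed=0):
    """Percentage of bootstrap replicates in which `pair` is the closest pair."""
    reps = bootstrap_alignments(alignment, n_replicates, seed)
    hits = sum(1 for rep in reps if closest_pair(rep) == pair)
    return 100 * hits / n_replicates


# Hypothetical toy alignment of three sequences.
alignment = {
    "X": "ATGCATGCATGC",
    "Y": "ATGTATGCATGC",
    "Z": "CTGTAGGCTTGA",
}
print(bootstrap_support(alignment, ("X", "Y")))
```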
An alternative method is jackknifing or delete-half jackknifing. This process eliminates half of the sites from the original sequences. This resampling technique is repeated to obtain numerous sub-datasets. Each new sample is used to reconstruct a sub-tree. The frequency of each sub-tree is calculated. A 100% value of jackknifing implies that the sub-tree was obtained in all of the trees reconstructed.
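The delete-half jackknife differs from the bootstrap only in how the sites are drawn: half of the columns are kept, each at most once. A minimal sketch, with a hypothetical function name, seed and toy alignment, is given below; each sub-dataset it returns would then be passed to the tree-building step.

```python
import random


def delete_half_jackknife(alignment, n_replicates, seed=0):
    """Build sub-alignments that each keep a random half of the sites."""
    rng = random.Random(seed)
    n_sites = len(next(iter(alignment.values())))
    keep = n_sites // 2
    replicates = []
    for _ in range(n_replicates):
        # Sample without replacement and keep the original column order.
        cols = sorted(rng.sample(range(n_sites), keep))
        replicates.append(
            {name: "".join(seq[c] for c in cols) for name, seq in alignment.items()}
        )
    return replicates


# Hypothetical toy alignment of two sequences.
alignment = {"X": "ATGCATGCATGC", "Y": "ATGTATGCATGC"}
for rep in delete_half_jackknife(alignment, 3):
    print(rep)
```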
8. Multivariate analysis and genetic distance:
A discussion of genetic distance as a means of explaining phylogenetic relationships is incomplete without mention of multivariate analysis, which takes multiple variables (here loci) into consideration to study these inter-population relationships. The most widely used methods are principal component plots, owing to the ease of understanding them graphically.
Principal component plots reflect the genetic distance between individuals or populations using genetic data from multiple loci. One widely used type of plot is drawn by multidimensional scaling (MDS). MDS in general creates a pictorial representation of distance matrices, and the plot can be one-, two- or three-dimensional. Genetic distance matrices are converted into genetic ‘maps’ by using MDS analysis.
Principal Co-ordinate Analysis (PCoA) and Principal Component Analysis (PCA) are also aimed at graphical representation of the genetic relatedness between a definite number of individuals or populations. The output of these plots allows the clustering of populations to be visualized, which is difficult to infer from a matrix of numbers. A PCoA represents a similarity matrix of p elements (individuals, gene sequences etc.). The similarity matrix is computed from the distance matrix by using the formula $e_{ij} = -\tfrac{1}{2} d_{ij}^2$, where $e_{ij}$ is the similarity between the $i$-th and $j$-th samples and $d_{ij}^2$ is the square of the distance between the $i$-th and $j$-th samples. The populations cluster and are viewed by using scatter plots.
In a PCA, the first principal component (PC) is an eigenvector fitted to the correlation matrix obtained from molecular data on a population. This component explains most of the observed variation. The PCs that are subsequently extracted are perpendicular to the preceding PC, and the eigenvalues account for the variation explained by each PC. An MDS differs from the other two plots in that it takes a dissimilarity (distance) matrix as the input file.
Figure 3 shows an example output from Kshatriya et al., 2011. The different groups of populations in the figure are represented by four different symbols (+, x, -, ᴼ) and the clustering shows their genetic relationship.
Figure 3. Principal co-ordinate graph showing the genetic relationship between four groups studied (Adapted from Kshatriya et al., 2011).
9. Calculation of genetic distance
Many phylogenetic software packages are available for convenient calculation of genetic distance for different types of polymorphism data (e.g. RFLP or DNA sequences) and a list is given in Table 1. All these programs follow a common sequence of steps and use similar models. The genetic distance can be calculated by using data of one of the following categories:
a. Protein coding (exonic) and non-coding (intronic, intergenic, UTRs) nucleotide sequences
b. Amino acid sequences
c. Allele frequency
Table 1. List of software used to calculate different genetic distance estimates
10. Example of genetic distance calculation (using MEGA)
Molecular Evolutionary Genetics Analysis (MEGA) is a popular and user-friendly software package for evolutionary genetic analysis in general, and genetic distance estimation in particular. It is used for evaluating genetic distance between DNA or protein sequences. Given below is a stepwise protocol for calculating pairwise distance between DNA sequences. The presentation in Quadrant 2 describes the following steps with an example of the promoter (non-protein coding) region of X (unknown) gene in three individuals using MEGA 4 (which is freely downloadable at www.megasoftware.net).
Step1. Click the above link to go to the MEGA homepage for downloading the software. Open the file with .exe extension to install the software on your computer.
Step2. Open the installed program. Click File and select a file with aligned sequences with a .fas extension (for FASTA sequences).
Example input file (copy and paste the following aligned sequences in a notepad and save it as formega.fas):
>Ind A
GCCTTTCATGTGAATGCTCCAGTGGAGTGGTCAGGTTTTTTACATAGTAGCTCAAGGCTTA
AGAGCAAGTGTTCAGAAGGAGCAGAGAGAGAGGGCAGTAGTTACAATGTGAGGCCAAAG
AAGCTTCCCCCCAGAAAACTAAAGGTGATAAGTAAAGCATGTTGGTATTGGCTGGCAATA
TTCCACAAGAGATGAAAGGACAGATATTGCAGAAGAGAGAAGGTATAACTGGGACCAAA
AGCCTTGAGAAGGAAAGAGACATGGAGCAAATCATTCACAGTAACAGCAGACAGCAGAG
AAGAGACACATGGTTGTACAGAGGCACCTCCTTTGGGTCTTTACTCAAATGCCCCATTATC
AGTGAGAACTTCTCTGACTGCTGTTCTTCAGCAGAGGGTATTCCTTATCCCCTTTCTTGCTT
TATGTGTTTTCTCCATAACATATGTGCATATCCATAACACACACATGCATCACCTAGAGCA
TTATATATGCCACAGTGACATGTTTTGCTGATTTCTCAATTGACTCCCCCCATTGGAATGA
ACGTAAGCTTGAGGAAGACGTTTTGTCCTGTT-
CTGTAGCATCTAGAACAGCGCCTGGCACATAGTAGGTACTCAATAAATGCCAGCTGCATG
AGGAAATGAATGAGCTGTGTGGGGGATGTACTTGAGTGAACTCTAAAGTCAGAGTGGTGT
TGAGAGAAAAATGCTTGAAATCCAGATGTTGGAAGGTGACACAGAGTAGTAGCCTGGTG
AGAACAGTTAGATCTTAGGGGTTCCTACTACAGCCCTCCCTTCCGCACCTTTTTGGCTGTC
ACCATGATCAAGCTACTGAATCTCTCTGAGACGCAAGGACCGGGATGGCACAAAGTGAGT
GCTCACCAAAGCTTGACTGTCCTTTCCCATGGCAATTTACTTCAGCTTGTTTGATTTCCCCT
CCCCGACTGGACTAGGCACCTATTCTCTGTCTTCTCTCTTTACAGTTGGAAGGAGCAAAAT
GGGACTTTTGGCTGAAAGTGCTGAGCTCCTGCGGTGGGGGCTGACCGCAAGCCGCGCCTT
CTGTGCACCTGGTCGGCCCAGCTA
>Ind B
GCCTTTCATGTGAATGCTCCAGTGGAGTGGTCAGGTTTTTTACATAGTAGCTCAAGGCTTA
AGAGCAAGTGTTCAGAAGGAGCAGAGAGAGAGGGCAGTAGTTACAATGTGAGGCCAAAG
AAGCTTCCCCCCAGAAAACTAAAGGTGATAAGTAAAGCATGTTGGTATTGGCTGGCAATA
TTCCACAAGAGATGAAAGGACAGATATTGCAGAAGAGAGAAGGTATAACTGGGACCAAA
AGCCTTGAGAAGGAAAGAGACATGGAGCAAATCATTCACAGTAACAGCAGACAGCAGAG
AAGAGAGACATGGTTGTACAGAGGCACCTCCTTTGGGTCTTTACTCAAATGCCCCATTATC
AGTGAGAACTTCTCTGACTGCTGTTCTTCAGCAGAGGGTATTCCTTATCCCCTTTCTTGCTT
TATGTGTTTTCTCCATAACATATGTGCATATCCATAACACACACATGCATCACCTAGAGCA
TTATATATGCCACAGTGACATGTTTTGCTGATTTCTCAATTGACTCCCCCCATTGGAATGA
ACGTAAGCTTGAGGAAGACGTTTTGTCCTGTT-
CTGTAGCATCTAGAACAGCGCCTGGCACATAGTAGGTACTCAATAAATGCCAGCTGCATG
AGGAAATGAATGAGCTGTGTGGGGGATGTACTTGAGTGAACTCTAAAGTCAGAGTGGTGT
TGAGAGAAAAATGCTTGAAATCCAGATGTTGGAAGGTGACACAGAGTAGTAGCCTGGTG
AGAACAGTTAGATCTTAGGGGTTCCTACTACAGCCCTCCCTTCCGCACCTTTTTGGCTGTC
ACCATGATCAAGCTACTGAATCTCTCTGAGACGCAAGGACCGGGATGGCACAAAGTGAGT
GCTCACCAAAGCTTGACTGTCCTTTCCCATGGCAATTTACTTCAGCTTGTTTGATTTCCCCT
CCCCGACTGGACTAGGCACCTATTCTCTGTCTTCTCTCTTTACAGTTGGAAGGAGCAAAAT
GGGACTTTTGGCTGAAAGTGCTGAGCTCCTGCGGTGGGGGCTGACCGCAAGCCGCGCCTT
CTGTGCACCTGGTCGGCCCAGCTA
>Ind C
AGCTTTCATGTGAATGCTCCAGTGGAGTGGTCAGGTTTTTTACATAGTAGCTCAAGGCTTA
AGAGCAAGTGTTCAGAAGGAGCAGAGAGAGAGGGCAGTAGTTACAATGTGAGGCCAAAG
AAGCTTCCCCCCAGAAAACTAAAGGTGATAAGTAAAGCATGTTGGTATTGGCTGGCAATA
TTCCACAAGAGATGAAAGGACAGATATTGCAAAAGAGAGAAGGTATAACTGGGACCAAA
AGCCTTGAGAAGGAAAGAGACATGGAGCAAATCATTCACAGTAACAGCAGACAGCAGAG
AAGAGACACATGGTTGTACAGAGGCACCTCCTTTGGGTCTTTACTCAAATGCCCCATTATC
AGTGAGAACTTCTCTGACTGCTGTTCTTCAGCAGAGGGTATTCCTTATCCCCTTTCTTGCTT
TATGTGTTTTCTCCATAACATATGTGCATATCCATAACACACACATGCATCACCTAGAGCA
TTATATATGCCACAGTGACATGTTTTGCTGATTTCTCAATTGACTCCCCCCATTGGAA--------
-GCTTGAGGAAGACGTTTTGTCCTGTT-
CTGTAGCATCTAGAACAGCGCCTGGCACATAGTAGGTACTCAATAAATGCCAGCTGCATG
AGGAAATGAATGAGCTGTGTGGGGGATGTACTTGAGTGAACTCTAAAGTCAGAGTGGTGT
TGAGAGAAAAATGCTTGAAATCCAGATGTTGGAAGGTGACACAGAGTAGTAGCCTGGTG
AGAACAGTTAGATCTTAGGGGTTCCTACTACAGCCCTCCCTTCCGCACCTTTTTGGCTGTC
ACCATGATCAAGCTACTGAATCTCTCTGAGACGCAAGGACCGGGATGGCACAAAGTGAGT
GCTCACCAAAGCTTGACTGTCCTTTCCCATGGCAATTTACTTCAGCTTGTTTGATTTCCCCT
CCCCGACTGGACTAGGCACCTATTCTCTGTCTTCTCTCTTTACAGTTGGAAGGAGCAAAAT
GGGACTTTTGGCTGAAAGTGCTGAGCTCCTGCGGTGGGGGCTGACCGCAAGCCGCGCCTT
CTGTGCACCTGGTCGGCCCAGCTA
Step3. Scroll down the editor window to ensure the absence of any characters other than the nucleotides A, T, G and C. In the above input file ‘-’ indicates gaps in the alignment, which can be left as they are.
Step5. Return to the MEGA window in Step2. Click “Distances” > “Compute Pairwise Distances”
Step6. A window titled “Analysis Preferences” appears. Select Model>p-distance>Compute. The pairwise distances between all the possible pairs of sequences are obtained as a distance matrix.
Step7. To obtain a phylogenetic tree to visualize the clustering of sequences, click “Phylogeny” in the MEGA main window. Choose “Construct/Test UPGMA tree”. Click “Compute” in the “Analysis Preferences” window to obtain a UPGMA tree.
Step8. To export the results, go to the respective result windows and choose among the available options to save results in .xl, .csv, .meg or .txt formats.
SUMMARY
- Genetic distance is used as an estimate to quantify differences both between individuals and groups of populations to adjudge the degree of their relatedness.
- A genetic distance estimate quantifies the genetic divergence between two molecular sequences (DNA or protein), two individuals (to study microevolution) or taxa (while studying macroevolution).
- They can be used for classifying populations or in making evolutionary comparisons.
- Three of the measures have been very widely used, Nei’s genetic distance, D, Cavalli-Sforza chord measure and Reynolds, Weir and Cockerham’s genetic distance.
- Plots like multidimensional scaling, principal co-ordinate and principal component plots are used to visualize clustering in populations.
- Different software programs are available for estimating genetic distance and drawing phylogenetic trees.
REFERENCES
- Cavalli-Sforza and Edwards. 1967. Phylogenetic analysis: Models and estimation procedures. American Journal of Human Genetics 19(3): 233-257.
- Excoffier L, Laval G, Schneider S. 2005. Arlequin: An integrated software for population genetics data analysis. Evol Bioinform Online 1:47-50.
- Felsenstein J, 2001. PHYLIP (phylogeny inference package), version 3.6 for Linux. Seattle: University of Washington.
- Kshatriya GK, Aggarwal A, Khurana P, Italia YM. 2011. Genomic congruence of Indo-European speaking tribes of Western India with Dravidian-speaking populations of southern India: A study of 20 autosomal DNA markers. Ann Hum Biol 38: 583-591.
- Nei M. 1972. Genetic distances between populations. Am Nat 106: 283-292.
- Nei M. 1987. Molecular evolutionary genetics. New York: Columbia University press.
- Ota T, 1993. Program DISPAN: genetic distance and phylogenetic analysis. University Park: Pennsylvania State University.
- Sanghvi LD. 1953. Comparison of genetical and morphological methods for a study of biological differences. Am J Phys Anthropol 11:385-404.
- Takezaki N, Nei M, 1996. Genetic distances and reconstruction of phylogenetic trees from microsatellite DNA. Genetics. 144:389-399.
- Tamura K, Dudley J, Nei M, Kumar S. 2007. MEGA 4: Molecular Evolutionary Genetics Analysis (MEGA) software version 4.0. Mol Biol Evol 24: 1596-1599.
- Weir and Cockerham. 1984. Estimating F-statistics for the analysis of population structure. Evolution 38(6): 1358-1370.
- Yeh et al. 1997. POPGENE, the user-friendly shareware for population genetic analysis. Molecular Biology and Biotechnology Centre, University of Alberta, Canada