6 Biodiversity Characterization and Inventorying: Taxonomic Approach

Dr Sunil Mittal

 

1. Learning outcomes

  • To gain an overview of biodiversity documentation and its importance
  • To learn about various tools for species inventorying
  • To learn the basics of species characterization
  • To learn how DNA taxonomy helps in biodiversity characterization

2. Concept map

3. Description

3.1. Introduction

Species is the basic and finest unit of both taxonomy and biodiversity. Characterization of biodiversity is, de facto, characterization of its species composition, that is, its alpha taxonomy. The discipline of taxonomy encompasses the characterization, classification and naming of taxa, especially species. At present, approximately 2 million species have been named, but that is only the tip of a giant iceberg; it is now estimated that more than 90% of the species on this planet remain to be discovered and described. At the same time, there is a severe shortage of taxonomists throughout the world, along with declining funds for taxonomic research, a phenomenon known among taxonomists as the 'taxonomic impediment'. Taxonomy has traditionally been regarded as a dry subject. As more and more scientists came to believe that scientific research should be utility-driven (judged by its usefulness for the betterment of humanity and by its commercial value), taxonomy lost its glory in contrast with 'high impact' research fields such as molecular biology. The implications were profound, and the field most affected by this negative trend is arguably biodiversity characterization. There is an imminent risk that much of the undescribed biodiversity might go extinct before we even get a chance to describe it.

In addition, several biases are at play in biodiversity characterization. Most taxonomists are based in the richer countries of the temperate northern hemisphere, a comparatively species-poor biogeographical zone. Most of the biodiversity on planet earth, however, is in the tropics, and tropical countries have neither the resources to support extensive taxonomic surveys nor enough trained taxonomists. As a result, most of the described taxa in the world are from the richer countries, which is obviously a skewed representation of the real biodiversity.
Yet another bias is that most of the described biodiversity is from terrestrial biomes; although oceans cover more than 70% of the earth's surface and are rich in biodiversity, only a tiny fraction of marine biodiversity has been described to date. Another disturbing pattern in the described biodiversity is that while certain taxa such as angiosperms are very well represented, taxa such as prasinophytes or terrestrial algae are extremely underrepresented. This is because there are many experts in angiosperm taxonomy, while very few taxonomists work on prasinophytes or terrestrial algae, although the biodiversity of all three lineages is expected to be similar. These underrepresented lineages and habitats need to be prioritized if biodiversity characterization is to be effective.

The terms biodiversity characterization and biodiversity inventorying are often used interchangeably to refer to the documentation of biological diversity at the species level; however, there is a subtle difference between the two. Characterization refers to the documentation of taxonomic information, for example species discovery (the report of a new species, along with its naming and nomenclature), the discovery of cryptic species (the finding that what was considered one species is actually a complex of two or more species) or taxonomic synonymy (the finding that two taxa, for example two species, are actually one). Biodiversity characterization is essentially part of the discipline of taxonomy. Biodiversity inventorying, on the other hand, refers to documenting the overall species diversity of a particular area, for example a one-metre-square quadrat in a tropical rain forest or in a seaweed bed. Inventorying generates a list of the known species present at a location; no new taxonomically relevant information, such as a species discovery, a cryptic species or a synonymy, is generated. Inventorying can, however, generate new records (the identification of a known species at a location where it had never been reported), for example the documentation of an invasive species at a new location. Biodiversity inventories generate the so-called 'checklists', simple lists of all species at a given location. While biodiversity characterization is far more complicated in terms of taxonomy, the generation of checklists requires more rigorous fieldwork, and both are equally valuable as far as biodiversity documentation is concerned.
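
The difference between an existing checklist entry and a new record can be made concrete with a minimal Python sketch. It assumes three hypothetical inputs: the names identified in a survey, the names already reported from the site, and the names of all described species. The function name `find_new_records` and the placeholder binomials are illustrative only and carry no real distribution data.

```python
def find_new_records(survey, reported_at_site, described):
    """Sort a survey list into existing site records, new records and unresolved names.

    survey: names identified during the field survey
    reported_at_site: names already reported from this location (the checklist)
    described: names of all formally described species, from any location
    """
    survey, reported, described = set(survey), set(reported_at_site), set(described)
    existing = survey & reported
    # A described species never reported from this site before is a "new record".
    new_records = (survey - reported) & described
    # Anything else cannot be settled by inventorying; it needs taxonomic characterization.
    unresolved = survey - reported - described
    return sorted(existing), sorted(new_records), sorted(unresolved)


if __name__ == "__main__":
    # Hypothetical placeholder names, not real records.
    site_checklist = ["Exemplum alpha", "Exemplum beta"]
    described_species = ["Exemplum alpha", "Exemplum beta", "Exemplum gamma"]
    survey_list = ["Exemplum alpha", "Exemplum gamma", "Exemplum sp. 1"]
    print(find_new_records(survey_list, site_checklist, described_species))
```

Here "Exemplum gamma" comes out as a new record for the site, while "Exemplum sp. 1" is left unresolved because no described species matches it.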

3.2. Species inventorying

The cornerstone of species inventorying is the generation of species checklists for a particular location. Perhaps the most valuable form of checklist is the All Taxa Biodiversity Inventory (ATBI), the documentation of every species at a given location, including animals, plants, eukaryotic microbes, prokaryotes, viruses, prions, etc. Generating an ATBI is therefore a daunting task, and it is always incomplete, because an area as small as a square centimetre can harbour billions of microbial cells and viral particles. Most microorganisms cannot be cultured in artificial media, so documenting this hidden diversity requires newer culture-independent methods such as environmental DNA metagenomics. A far simpler approach to species inventorying is to target a specific taxonomic group, for example a 'checklist of mosses in India', the typical kind of taxonomic checklist generated by the Botanical Survey of India (BSI) and the Zoological Survey of India (ZSI). As discussed previously, many potential biases interplay in the generation of these checklists, and therefore they are not as valuable as an ATBI. Also note that no biological species exists in isolation in nature; each species occupies its own position in a complex network of other species within an intricate ecological niche, and therefore the complete picture of overall diversity is far more informative, richer and more valuable. For example, it is now well established that the normal microflora (the bacteria and viruses that normally inhabit an organism's body) contributes to a number of physiological traits, for example protection against diabetes and heart disease in humans, life history switches and morphology in algae, and disease resistance and growth promotion in plants.

To generate any form of species inventory, the basic requirement is accurate species identification literature or field guides. The most popular taxonomic field guides are the so-called 'dichotomous identification keys', in which contrasting and easily identifiable traits guide the field researcher from higher taxonomic levels (class, order) to lower taxonomic levels (family, genus) and ultimately to the species-level identity of a given specimen. These keys are dichotomous, meaning that only two contrasting alternatives are presented at each step (for example, "leaf has parallel venation" vs "leaf has reticulate venation"). To use such a key correctly, the field taxonomist must know the technical terms ("jargon"), which differ substantially across disciplines. In our earlier example, the person identifying the specimen should be competent enough to distinguish the various leaf venation systems, and developing such technical competency usually requires many years of experience. Owing to this complexity, taxonomic specialists tend to focus on a specific taxonomic lineage, for example the marine green algal genera Monostroma or Ulva. Unfortunately, such species identification guides are mostly non-existent for the most speciose habitats on earth, including the tropics and aquatic hotspots. This is the main reason for poorer biodiversity documentation in these areas.
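
The way a dichotomous key narrows an identification couplet by couplet can be sketched as a small data structure that is walked from the first couplet onwards. This is a minimal Python sketch under stated assumptions: the key, its traits and the taxon labels are hypothetical (the first couplet simply reuses the venation example above), and a real key has many more couplets and depends on expert judgement of each trait.

```python
# Hypothetical key: each couplet holds two leads of the form
# (trait statement, next couplet number or final taxon label).
KEY = {
    1: (("leaf has parallel venation", 2),
        ("leaf has reticulate venation", "Taxon C")),
    2: (("flower parts in multiples of three", "Taxon A"),
        ("flower parts not in multiples of three", "Taxon B")),
}


def run_key(key, observed_traits, start=1):
    """Walk a dichotomous key and return (identification, traits used along the way)."""
    path = []
    node = start
    while isinstance(node, int):  # integers point to further couplets
        lead_a, lead_b = key[node]
        if lead_a[0] in observed_traits:
            chosen = lead_a
        elif lead_b[0] in observed_traits:
            chosen = lead_b
        else:
            raise ValueError(f"couplet {node}: specimen matches neither lead")
        path.append(chosen[0])
        node = chosen[1]
    return node, path  # a string: the taxon reached at the end of the key


if __name__ == "__main__":
    specimen = {"leaf has parallel venation", "flower parts in multiples of three"}
    print(run_key(KEY, specimen))
```

The `ValueError` branch reflects a real limitation of keys: a specimen that fits neither lead, for instance an unusual morphological variant, stops the identification.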

Newer developments in species identification include algorithmic identification from field photographs, and DNA barcoding. These tools require far less expertise, but they have their own shortcomings. Species identification from photographs works much like Google's 'search by image' tool: it compares the photograph with a database of correctly identified photographs to find the closest match. One popular application is "Frog Finder", developed by taxonomists from India; it enables anyone with an Android smartphone to identify described frog species by taking a photo within the app. However, many variables affect whether a photograph matches the database correctly, for example brightness, colour temperature and contrast, and it is almost impossible to normalize field photos for an accurate comparison with the database. Another problem is morphovariants: morphological variability within a species that depends on biotic (for example, bacterial flora) or abiotic (for example, seawater salinity) factors. This is an issue with any morphology-based species identification method. Consider Ulva paschima, a recently discovered species from the Indian coast. Intraspecific variants growing in low-salinity estuaries have branched thalli (the algal body), while those growing on the shore have unbranched thalli, so a dichotomous key that relies on the contrast between branched and unbranched thalli would lead to an incorrect identification. Perhaps the only way to alleviate these problems of traditional taxonomic identification is DNA barcoding: amplifying certain genomic regions (genes, introns, etc.) and comparing the resulting DNA sequence with a standard database (such as GenBank). DNA barcoding is the current gold standard for species identification; however, it has several shortcomings as well.
The biggest issue is the cost of DNA sequencing. Although next-generation sequencing approaches have drastically reduced the expense involved, the cost is still substantial for taxonomists from developing countries. Another hurdle is that DNA barcoding is impractical for field researchers, since DNA cannot be sequenced in the field with conventional sequencers. Newer sequencing platforms such as the Oxford Nanopore systems aim to address this issue: the Oxford Nanopore sequencer is a very small device, about the size of a typical smartphone, and enables rapid DNA sequencing in the field. The technology is evolving rapidly as of this writing, and further developments are expected. Yet another issue with DNA barcoding is that, because it relies on matching accessions in a standard repository, its accuracy depends on the accuracy of the matched accessions. A number of accessions in the GenBank database are incorrectly identified and are therefore unreliable sources. There is no way to find out which GenBank entries are incorrectly identified, and therefore neither manual nor automatic curation is possible. One workaround is to rely only on published sequences (that is, accessions generated as part of a published study), and to rely on multiple database accessions rather than one. The Barcode of Life Data System (BOLD) is another public sequence repository popular among taxonomists. The current standard for species identification relies on DNA barcoding, often combined with at least one traditional method (for example, morphology, life history or ecology).
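
The workaround suggested above, trusting only published accessions and several matches rather than a single best hit, can be sketched in Python. This is an illustrative sketch, not how GenBank or BOLD searches actually work: it assumes the query and reference sequences are already aligned to the same length, uses simple percent identity instead of a BLAST score, and the record fields, accession labels, the 98.7% cutoff (borrowed from the eukaryote threshold discussed in the DNA taxonomy section) and the vote size `top_n` are all hypothetical choices.

```python
from collections import Counter


def percent_identity(a, b):
    """Percent identity of two pre-aligned sequences; columns gapped in both are skipped."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" and y == "-":
            continue  # column empty in both sequences
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def identify_by_consensus(query, references, min_identity=98.7,
                          published_only=True, top_n=5):
    """Return (species, votes) by majority vote among the closest reference hits.

    references: dicts with hypothetical keys 'accession', 'species',
    'published' (bool) and 'seq' (aligned sequence).
    """
    pool = [r for r in references if r["published"] or not published_only]
    ranked = sorted(pool, key=lambda r: percent_identity(query, r["seq"]),
                    reverse=True)
    hits = [r["species"] for r in ranked[:top_n]
            if percent_identity(query, r["seq"]) >= min_identity]
    if not hits:
        return None, 0  # no trustworthy match: identification stays open
    return Counter(hits).most_common(1)[0]


if __name__ == "__main__":
    # Toy data: three unpublished entries are assumed to be misidentified.
    refs = [
        {"accession": "REF1", "species": "Species A", "published": True, "seq": "ACGTACGTAC"},
        {"accession": "REF2", "species": "Species A", "published": True, "seq": "ACGTACGTAC"},
        {"accession": "REF3", "species": "Species B", "published": False, "seq": "ACGTACGTAC"},
        {"accession": "REF4", "species": "Species B", "published": False, "seq": "ACGTACGTAC"},
        {"accession": "REF5", "species": "Species B", "published": False, "seq": "ACGTACGTAC"},
        {"accession": "REF6", "species": "Species B", "published": True, "seq": "TTTTACGTAC"},
    ]
    print(identify_by_consensus("ACGTACGTAC", refs))
    print(identify_by_consensus("ACGTACGTAC", refs, published_only=False))
```

Restricting the pool to published accessions returns "Species A", while admitting the unpublished entries lets three misidentified accessions outvote the published ones, which is exactly the risk described above.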

Specimen vouchers are an indispensable part of any taxonomic study and of wider biodiversity documentation. Vouchers are permanent records of collected specimens deposited in public repositories, for example museums, herbaria, freezer storage and so on. In plant taxonomy, vouchers are mostly pressed herbarium sheets showing pertinent morphological features. In earlier days, the preferred method of voucher preparation was preservation in formalin, as it prevents microbial degradation and thereby preserves morphological features. However, formalin preservation invariably degrades DNA, hampering its subsequent extraction. Dried herbarium specimens, by contrast, can retain usable DNA: many recent DNA barcoding studies have been able to extract and sequence DNA barcodes from herbarium holotypes (the voucher specimens directly linked to a taxonomic discovery and its new name) that are more than three centuries old. Formalin preservation is still popular among zoologists, who, in addition, preserve tissue in liquid nitrogen for long-term storage and for DNA extraction should it be needed in future. Extracted DNA can also be stored for a long time in liquid nitrogen, in the so-called DNA banks.

Species distribution maps are essential components of species inventories. Distribution maps overlay species ranges on a geographical map. Printed distribution maps, popular earlier, have largely gone out of favour, and internet-based high-resolution map overlays are now universally preferred. These overlays are downloadable files (mostly with the .kml extension, compatible with Google Earth), which can be added to Google Earth as a layer to display the distribution on top of the chosen base map. The overlays contain the known localities of a species, with accurate coordinates (latitude and longitude from a GPS device). Such overlays are published independently as floristic and faunal maps of particular regions, and many can be downloaded from the Global Biodiversity Information Facility (GBIF, www.gbif.org). These maps are invaluable tools for planning sampling expeditions by field taxonomists and population geneticists, as well as for finding out whether a taxonomic record is new to a particular area.
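
An overlay of the kind described above can be produced from a list of GPS localities. The Python sketch below writes a minimal KML document using only the standard library; it assumes decimal-degree coordinates and the KML 2.2 namespace, and the function name and example locality are hypothetical. Note that KML lists longitude before latitude.

```python
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"


def _tag(name):
    """Qualify an element name with the KML namespace."""
    return f"{{{KML_NS}}}{name}"


def occurrences_to_kml(species, localities):
    """Build a KML string with one Placemark per (site, latitude, longitude) record."""
    ET.register_namespace("", KML_NS)  # serialize KML as the default namespace
    root = ET.Element(_tag("kml"))
    document = ET.SubElement(root, _tag("Document"))
    ET.SubElement(document, _tag("name")).text = f"Distribution of {species}"
    for site, lat, lon in localities:
        if not (-90 <= lat <= 90 and -180 <= lon <= 180):
            raise ValueError(f"invalid coordinates for {site}: {lat}, {lon}")
        placemark = ET.SubElement(document, _tag("Placemark"))
        ET.SubElement(placemark, _tag("name")).text = site
        point = ET.SubElement(placemark, _tag("Point"))
        # KML order is longitude,latitude (altitude optional).
        ET.SubElement(point, _tag("coordinates")).text = f"{lon},{lat}"
    body = ET.tostring(root, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    # Hypothetical locality; replace with verified GPS records.
    print(occurrences_to_kml("Exemplum alpha", [("Site 1", 10.5, 76.25)]))
```

Saved with a .kml extension, the output can be opened in Google Earth as a layer; each record becomes one placemark.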

 

3.3. Species characterization

Species characterization encompasses the discovery of new species and other major taxonomic revisions, and it therefore occupies a central part of any biodiversity programme. The current consensus among taxonomists is that described biodiversity comprises only 10% or less of total biodiversity; the vast majority of global biodiversity remains to be discovered. This is especially the case for underrepresented taxonomic lineages, underrepresented habitats and unculturable microorganisms, where almost the entire biodiversity remains undescribed. To discover and name a new species, a taxonomist needs to be a specialist competent enough to comply with the various codes and articles of the nomenclatural systems. These codes differ substantially across disciplines; the three most widely used are the International Code of Nomenclature for algae, fungi, and plants (ICN), the International Code of Zoological Nomenclature (ICZN) and the International Code of Nomenclature of Prokaryotes (ICNP). For viruses, two systems are widely used: the International Committee on Taxonomy of Viruses (ICTV) system and the Baltimore classification system. These codes contain articles detailing how a new species is described, along with rules for various taxonomic revisions (for example, one species being split into two, or two known species being merged into one). In a way, these codes resemble legal texts, and a command of all the articles, together with familiarity with many case studies, is essential for the taxonomic competency that any taxonomic revision requires. Earlier, the ICBN (International Code of Botanical Nomenclature) required new species descriptions to be in Latin to preserve uniformity; under the current ICN, however, Latin is optional, and descriptions may be either in English or in Latin.
An article common to these codes defines what constitutes an 'effective publication', that is, a publication acceptable for taxonomic revisions and new species descriptions. Generally, publications available in public repositories (for example, a downloadable e-book in PDF, a published book, or an article in a journal with an ISSN) are all effective publications, while a university dissertation or thesis is not. These codes are very exhaustive and attempt to cover all possible controversies. One example is the principle of priority: "The first formal scientific name given to a plant or animal taxon shall be the name that is to be used, called the valid name in zoology and correct name in Botany. Once a name has been used, no subsequent publication of that name for another taxon shall be valid (zoology) or validly published (botany)."
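
The principle of priority quoted above amounts to choosing the earliest eligible name among competing names for one taxon. The Python sketch below shows only that core rule under simplifying assumptions: it compares publication years alone, treats effective publication as a yes/no flag, and ignores the many exceptions the codes provide for (such as conserved names and starting-point dates). The class, its fields and the example names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CandidateName:
    name: str
    year: int                     # year of publication
    effectively_published: bool   # False, for example, for a thesis


def name_with_priority(candidates):
    """Return the earliest effectively published name among names for one taxon."""
    eligible = [c for c in candidates if c.effectively_published]
    if not eligible:
        raise ValueError("no effectively published name is available")
    # min() keeps the first candidate listed when two share the same year.
    return min(eligible, key=lambda c: c.year).name


if __name__ == "__main__":
    synonyms = [
        CandidateName("Exemplum beta", 1901, True),
        CandidateName("Exemplum gamma", 1899, False),  # appeared only in a thesis
        CandidateName("Exemplum delta", 1905, True),
    ]
    print(name_with_priority(synonyms))
```

The 1899 name is older but, having appeared only in a thesis, is not effectively published, so the 1901 name takes priority in this toy case.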

To describe a new species, a taxonomist should compare the specimen with the descriptions of all other species of the same genus to make sure that the species in question has not already been described. However, there is no consensus on which morphological features should be prioritized, or on whether a single unique feature is sufficient to recognize a distinct species, for example two life history forms (sexual vs asexual) or two morphological forms (branched vs unbranched) among marine algae. Because traditional species descriptions are almost entirely based on morphology, intraspecific variants and ecotypes are often described as different species under this system. The only solution to these problems is DNA taxonomy, as described below.

 

3.4. DNA taxonomy

The term DNA taxonomy refers to the use of DNA sequence data for taxonomic characterization, especially for species discovery and delineation. In many disciplines the term is also used as a synonym for DNA barcoding; however, sensu stricto, DNA barcoding refers to the identification of previously described species using public databases, while DNA taxonomy refers to the delineation and description of new species. Both methods use the same barcode loci, for example the gene coding for the large subunit of RuBisCO (rbcL) in plants or the gene coding for cytochrome c oxidase subunit I (COI, also written CO1) in animals. DNA taxonomy first generates the barcode data and performs a database search, as in DNA barcoding, to see whether the newly generated sequences match any existing sequences in the database (in which case a species discovery would be invalid). If no statistically significant match is found, there are two possible explanations: either the species in question was described previously but no one has yet generated DNA barcode data for it at the selected locus (gene or intron), or the species is truly new and previously undescribed. To rule out the first possibility, the taxonomist lists all the species of the genus in question and generates the missing DNA barcodes (the 'gaps') absent from the database. If no significant match is found even after this, it is cogent to conclude that the species in question is truly novel. However, there is no consensus among taxonomists on what constitutes a statistically significant match. For eukaryotes, a pairwise similarity threshold of 98.7% is generally accepted (that is, if the nearest match of the generated DNA sequence has a similarity below 98.7%, the specimen might represent a new species). For prokaryotes, this cutoff is lower, typically 95%.
In other words, under bacterial taxonomy, all primates, which typically share more than 98% sequence similarity at commonly used barcode loci, would be called one species! Bacterial species are therefore defined far more broadly, encompassing many intraspecific variants that would be recognized as separate species under eukaryotic taxonomy. In the case of the fruit fly Drosophila, a single mutation in the genome has been considered enough to erect a new species.
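
The decision sequence described above (database search, filling the barcode gaps for congeners, and only then concluding novelty) can be written as a short function. This Python sketch is illustrative only: the cutoffs are the values given in this module and are passed as a parameter because, as noted, no consensus exists; the function name, status labels and example species are hypothetical, and `best_identity` is assumed to be the percent similarity of the nearest database match.

```python
# Cutoffs as stated in this module; treat them as adjustable assumptions.
THRESHOLDS = {"eukaryote": 98.7, "prokaryote": 95.0}


def assess_novelty(best_identity, domain, congeners, congeners_with_barcodes,
                   thresholds=THRESHOLDS):
    """Return (status, missing_barcodes) for a newly sequenced specimen."""
    if best_identity >= thresholds[domain]:
        # Close match to an existing accession: not a new species.
        return "known_species", []
    gaps = sorted(set(congeners) - set(congeners_with_barcodes))
    if gaps:
        # The specimen may belong to a described congener that lacks a barcode.
        return "fill_gaps", gaps
    # No match even with complete congener coverage: a candidate new species,
    # still to be supported by phylogeny and at least one conventional method.
    return "candidate_new_species", []


if __name__ == "__main__":
    genus = ["Exemplum alpha", "Exemplum beta"]
    print(assess_novelty(97.0, "eukaryote", genus, ["Exemplum alpha"]))
    print(assess_novelty(97.0, "prokaryote", genus, ["Exemplum alpha"]))
```

The same 97% similarity sends a eukaryote towards gap filling but counts as a match for a prokaryote, which is the asymmetry discussed above.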

DNA taxonomy relies on phylogenetic inference, in which a phylogenetic tree (also called a phylogram) is generated using one of a number of methods; popular methods include Maximum Likelihood and Bayesian Inference. Building the tree starts with a multiple sequence alignment. To build the alignment, sequences of the suspected new species (from isolates collected at multiple locations) are compared with GenBank using BLASTN, and several of the top hits are downloaded. All these sequences are then subjected to a multiple sequence alignment algorithm followed by a phylogenetic inference programme; a detailed description of these steps is beyond the scope of this module. Finally, in the resulting tree, a distinct clade (cluster) encompassing all the generated accessions of the species in question supports the hypothesis of 'monophyly', and hence of a new species under the 'phylogenetic species concept'. In the strict sense, the phylogenetic species concept defines OTUs (Operational Taxonomic Units) encompassing all descendants of a common ancestor (no descendants in other clades, and no non-descendants within the clade in question); when this holds for each of the groups being compared, they are said to be 'reciprocally monophyletic'. These OTUs might represent a species or an intraspecific variant; they are defined this way to accommodate the ambiguities of the term 'species'.
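
Checking whether the accessions of the species in question form a clade can be illustrated on a small tree. The Python sketch below assumes the tree has already been inferred and rooted (for example with an outgroup) and is written as nested tuples; it does not perform phylogenetic inference itself, and the tree and accession labels are hypothetical. A group is monophyletic when its accessions are exactly the leaves of some clade, and two disjoint groups are reciprocally monophyletic when each is monophyletic.

```python
def collect_clades(tree, clades):
    """Append the leaf set of every clade in `tree` to `clades`; return the root's leaf set."""
    if isinstance(tree, str):  # a leaf, that is, one accession
        leaves = frozenset([tree])
    else:
        leaves = frozenset().union(*(collect_clades(child, clades) for child in tree))
    clades.append(leaves)
    return leaves


def is_monophyletic(tree, group):
    """True if `group` is exactly the set of leaves descending from one node."""
    clades = []
    collect_clades(tree, clades)
    return frozenset(group) in clades


def reciprocally_monophyletic(tree, group_a, group_b):
    """True if two disjoint groups each form their own clade in the tree."""
    return (not set(group_a) & set(group_b)
            and is_monophyletic(tree, group_a)
            and is_monophyletic(tree, group_b))


if __name__ == "__main__":
    # Hypothetical rooted tree: three new accessions and two reference species.
    tree = (("new_1", "new_2", "new_3"), ("sp_B_1", ("sp_C_1", "sp_C_2")))
    new_accessions = {"new_1", "new_2", "new_3"}
    print(is_monophyletic(tree, new_accessions))
    print(reciprocally_monophyletic(tree, new_accessions, {"sp_C_1", "sp_C_2"}))
```

A mixed group such as {"new_1", "sp_B_1"} fails the test, which is the situation in which the new accessions could not be erected as a separate OTU.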

4. Summary

4.1. It is now estimated that more than 90% of the species on planet earth remain to be discovered, and therefore taxonomy-based biodiversity characterization is extremely important.

4.2. Several biases are at play in the documentation of known biodiversity: a habitat bias, in which the tropics and aquatic habitats remain comparatively poorly described despite very high levels of biodiversity, and a lineage bias, in which some species-rich lineages such as prasinophytes remain poorly described.

4.3. For species inventorying, checklists, All Taxa Biodiversity Inventories, distribution maps, dichotomous identification keys and species identification field guides are important.

4.4. Newer and faster methods of species identification include algorithm-based reverse image lookup against curated databases, as well as DNA barcoding.

4.5. For species characterization, a specialist taxonomist needs to follow and adhere closely to the standard codes of nomenclature.

4.6. For species discovery and description, a biphasic approach combining one conventional method (often morphology) with DNA taxonomy is the current gold standard.

4.7. In the final phylogram, reciprocal monophyly of the generated accessions of the species in question is strong evidence for erecting the OTU as a new species.

 

Further e-resources and reading

  1. YouTube videos: https://www.youtube.com/watch?v=S3HXj-lDmR0, https://www.youtube.com/watch?v=JghEcX4sc8k, https://www.youtube.com/watch?v=IRW6yVOHCQc, https://www.youtube.com/watch?v=OH8TBkExqHs
  2. Tree of Life portal: www.tolweb.org/
  3. GBIF portal: https://www.gbif.org
  4. Hammond, P. (1992). Species inventory. In Global Biodiversity (pp. 17-39). Springer, Dordrecht.
  5. Nichols, B. J., & Langdon, K. R. (2007). The Smokies all taxa biodiversity inventory: history and progress. Southeastern Naturalist, 6(sp2), 27-34.
  6. Tautz, D., Arctander, P., Minelli, A., Thomas, R. H., & Vogler, A. P. (2003). A plea for DNA taxonomy. Trends in Ecology & Evolution, 18(2), 70-74.
  7. Blaxter, M. L. (2004). The promise of a DNA taxonomy. Philosophical Transactions of the Royal Society B: Biological Sciences, 359(1444), 669-679.
  8. Bast, F. (2015). Tutorial on phylogenetic inference—2. Resonance, 20(5), 445-457.