Research Article
Proposal for an index to evaluate dichotomous keys
Nguyen Van Sinh§, Martin Wiemers|, Josef Settele|
‡ Institute of Ecology and Biological Resources, VAST, Ha Noi, Vietnam
§ Graduate School of Science and Technology, VAST, Ha Noi, Vietnam
| Helmholtz Centre for Environmental Research – UFZ, Halle, Germany
¶ University of the Philippines Los Baños, Los Baños, Philippines
Open Access

Abstract

Dichotomous keys are the most popular type of identification key, and studies have evaluated them from many perspectives. In this paper we propose an index for the quantitative evaluation of dichotomous keys (EDicho). The index is based on Pielou’s evenness index and allows identification keys of different sizes to be compared.

Keywords

index, dichotomous key, evaluation

Introduction

A taxonomic key is a tool for identifying organisms. Dichotomous keys, the most popular type of identification key, are single-entry keys. They consist of nested questions, or couplets, each offering two choices, or leads (thesis and antithesis). Each lead describes key characters of an organism, and the paired leads express the differences between the items to be separated. After choosing the lead that best matches the specimen, the user proceeds to the next couplet, and so on until the name of the taxon is reached.

Several keys may exist for the same group of taxa. This raises the question of which key performs better, provided that all the characters used are good ones that allow unambiguous identification. How can the performance of keys be evaluated quantitatively? Because a key is intended to identify each of the taxa in the group, it achieves its highest performance when the mean number of steps to identification is minimal. As the numbers of steps needed to identify the individual taxa become more even, the mean number of steps decreases, and it reaches its minimum when the numbers of steps are most even (Fig. 1). These considerations lead us to the evenness index of Pielou (1966). This paper proposes an index based on Pielou’s evenness index for the quantitative evaluation of dichotomous keys.

Figure 1. 

Schematic presentation of 5 dichotomous keys for a group of 8 taxa.

Methods

We use Pielou’s evenness index as a prototype for our index. Pielou’s evenness index (J) can be calculated using the following formula (Heip et al. 1998):

\[ J = \frac{H'}{H_{\max}} \]

where:

- H’ is the Shannon diversity index. This measure was originally proposed by Shannon (1948) to quantify the entropy (uncertainty or information content) in strings of text. The idea is that the more different letters there are, and the more equal their proportional abundances in the string of interest, the more difficult it is to predict correctly which letter will come next in the string. The index can be calculated using the following formula:

\[ H' = -\sum_{i=1}^{S} p_i \ln p_i \]

where p_i is the proportion of characters belonging to the i-th type of letter in the string of interest and S is the number of types of letters.

- Hmax is the maximum value of H’ and is equal to:

\[ H_{\max} = \ln S \]

As a result, Pielou’s evenness index can be calculated according to the following formula:

\[ J = \frac{-\sum_{i=1}^{S} p_i \ln p_i}{\ln S} \]
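
The two definitions above can be sketched in a few lines of Python. This is only an illustration: the function names `shannon_index` and `pielou_evenness` and the two example strings are assumptions made here, not part of the original method description.

```python
import math
from collections import Counter


def shannon_index(counts):
    """Shannon diversity H' = -sum(p_i * ln p_i) over the non-zero counts."""
    counts = [c for c in counts if c > 0]
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts)


def pielou_evenness(counts):
    """Pielou's J = H' / ln(S), with S the number of non-zero types."""
    counts = [c for c in counts if c > 0]
    if len(counts) < 2:
        raise ValueError("evenness is undefined for fewer than two types")
    return shannon_index(counts) / math.log(len(counts))


# Letter counts in two short strings (hypothetical examples).
even = Counter("abcdabcd")    # four letters, two occurrences each
uneven = Counter("aaaaabcd")  # the same four letters, unequal abundances

print(round(pielou_evenness(even.values()), 3))    # 1.0
print(round(pielou_evenness(uneven.values()), 3))  # about 0.774
```

Both strings contain the same four letter types, so only the equality of their abundances differs between the two results.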

Results and discussion

If the number of steps we have to pass to reach a decision (a taxon) is N_i, and the total number of steps needed to identify all the taxa is N, then the proportion of steps needed to identify the i-th taxon is:

\[ p_i = \frac{N_i}{N}, \qquad N = \sum_{i=1}^{S} N_i \]

As can be inferred from the scheme of a dichotomous key (Fig. 2), the number of taxa in a dichotomous key corresponds to S, the number of types of letters in the formula of Pielou’s evenness index.
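
To make the step counts concrete, the following sketch derives N_i, N and p_i from a small key. The nested-tuple encoding of a key, the function name `steps_per_taxon` and the four-taxon example are all assumptions chosen for illustration; the paper itself does not prescribe a data structure.

```python
def steps_per_taxon(key, depth=0):
    """Return {taxon: number of couplets passed before the taxon is reached}.

    Assumed encoding: a couplet is a 2-tuple of leads, and a lead is either
    a taxon name (str) or another couplet.
    """
    if isinstance(key, str):
        return {key: depth}
    first, second = key
    steps = {}
    steps.update(steps_per_taxon(first, depth + 1))
    steps.update(steps_per_taxon(second, depth + 1))
    return steps


# Hypothetical key for four taxa: taxon A keys out at the first couplet.
key = ("A", ("B", ("C", "D")))

steps = steps_per_taxon(key)                      # N_i for every taxon
total = sum(steps.values())                       # N
proportions = {t: n / total for t, n in steps.items()}  # p_i = N_i / N

print(steps)  # {'A': 1, 'B': 2, 'C': 3, 'D': 3}
print(total)  # 9
```

The number of entries in `steps` is S, the number of taxa in the key.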

We call the index for dichotomous keys EDicho (because it originates from the evenness index). Accordingly, EDicho is equal to:

\[ E_{Dicho} = \frac{H'_{Dicho}}{\ln S} = \frac{-\sum_{i=1}^{S} p_i \ln p_i}{\ln S} \]

where S is the number of taxa in the key and p_i is the proportion of steps needed to identify the i-th taxon.
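
A minimal sketch of this index in Python, assuming the step counts N_i are already known. The function name `edicho` and the two four-taxon step lists are hypothetical.

```python
import math


def edicho(steps):
    """EDicho = -sum(p_i * ln p_i) / ln(S), with p_i = N_i / N."""
    s = len(steps)
    if s < 2 or min(steps) < 1:
        raise ValueError("need at least two taxa, each with at least one step")
    total = sum(steps)
    h = -sum((n / total) * math.log(n / total) for n in steps)
    return h / math.log(s)


print(round(edicho([2, 2, 2, 2]), 3))  # every taxon after two couplets: 1.0
print(round(edicho([1, 2, 3, 3]), 3))  # comb-like key: about 0.946
```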

Many attempts have been made to evaluate identification keys (e.g. Lobanov 1975, 1983, 2015, Pankhurst 1978, Leuschner and Sviridov 1986, Leuschner 1991). These methods generally rest on the same concept: the average length of taxon definition in a key is compared with its theoretical minimum. However, they do not consider how even the lengths of the taxon definitions are.

Several studies have evaluated dichotomous keys in practical use (Morse et al. 1996) or improved keys using the user-tracking method (Schmidt et al. 2010). According to Osborne (1963), a simple dichotomous key used by an accurate observer must in principle always lead to a correct identification, provided that the specimen in hand actually belongs to one of the taxa covered by the key and is not missing any crucial characters. Sandvik (1976) concluded that keys in which all taxa are gathered on the last two levels (so that the numbers of steps to their identification are relatively equal) have the maximum probability of correct determination. Our proposed index (EDicho) can therefore evaluate both the speed and the quality of determination with a dichotomous key, provided that all else (e.g. choice of characters) is equal.

EDicho is by nature an evenness index, so it has all the properties of a standard evenness index and is constrained between 0 and 1. The higher the variation in the number of steps needed to determine the taxa, the lower the EDicho index; its asymptotic lowest value is 0. The highest value, 1, is reached when all taxa have the same number of identification steps (Fig. 1.V). As Figure 1 shows, versions 1.I and 1.V of the dichotomous key have the same number of taxa (8) and the same number of paired statements (7), but EDicho of version 1.I is smaller than that of version 1.V because the lengths of the identification paths vary more in version 1.I. Thus, the higher the EDicho index, the “better” the dichotomous key in terms of both identification speed and correct determination.
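
The upper bound can be checked directly from the definitions. The short derivation below is a sketch added for illustration; the symbol n (the common number of steps) is introduced here, and the second identity is an algebraically equivalent form that works with the raw step counts N_i instead of the proportions p_i.

```latex
% Upper bound: every taxon needs the same number of steps n.
\[
N_i = n \;\Rightarrow\; N = S n, \quad p_i = \frac{1}{S}, \quad
H'_{Dicho} = -\sum_{i=1}^{S} \frac{1}{S} \ln \frac{1}{S} = \ln S, \quad
E_{Dicho} = \frac{\ln S}{\ln S} = 1 .
\]

% Equivalent form in terms of the step counts N_i, using p_i = N_i / N.
\[
H'_{Dicho} = -\sum_{i=1}^{S} \frac{N_i}{N} \left( \ln N_i - \ln N \right)
           = \ln N - \frac{1}{N} \sum_{i=1}^{S} N_i \ln N_i .
\]
```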

An example of calculation of EDicho - the index for dichotomous keys

Let us consider the five dichotomous keys shown in Figure 1.

Here, the number of taxa (S) equals 8. Table 1 presents, for the five versions of the dichotomous key, the data needed to calculate H’Dicho: the number of steps or paired statements (thesis + antithesis) for identification of each taxon, the total number of steps for identification of all the taxa, and the proportion of steps to identify each taxon.

The calculation of H’Dicho and EDicho for the five versions of the dichotomous key is presented in Table 2.

Figure 2. 

Schematic presentation of a dichotomous key.

Table 1.

The data for calculation of H’Dicho for the keys in Figure 1.

| Key version | Number of steps for identification of each taxon (N_i) | Total number of steps for identification of all the taxa (N) | Proportion of steps to identify each taxon (p_i) |
| --- | --- | --- | --- |
| 1.I | 1, 2, 3, 4, 5, 6, 7, 7 | 35 | 1/35, 2/35, 3/35, 4/35, 5/35, 6/35, 7/35, 7/35 |
| 1.II | 1, 2, 3, 4, 6, 6, 6, 6 | 34 | 1/34, 2/34, 3/34, 4/34, 6/34, 6/34, 6/34, 6/34 |
| 1.III | 1, 2, 4, 4, 5, 5, 5, 5 | 31 | 1/31, 2/31, 4/31, 4/31, 5/31, 5/31, 5/31, 5/31 |
| 1.IV | 2, 2, 3, 3, 4, 4, 4, 4 | 26 | 2/26, 2/26, 3/26, 3/26, 4/26, 4/26, 4/26, 4/26 |
| 1.V | 3, 3, 3, 3, 3, 3, 3, 3 | 24 | 3/24, 3/24, 3/24, 3/24, 3/24, 3/24, 3/24, 3/24 |

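
The totals in Table 1, and the mean number of steps per taxon that follows from them, can be recomputed with a short script. This is a sketch for checking the arithmetic; the dictionary name `table1` is an assumption, and the step counts are copied from the table.

```python
# Step counts per taxon, copied from Table 1.
table1 = {
    "1.I":   [1, 2, 3, 4, 5, 6, 7, 7],
    "1.II":  [1, 2, 3, 4, 6, 6, 6, 6],
    "1.III": [1, 2, 4, 4, 5, 5, 5, 5],
    "1.IV":  [2, 2, 3, 3, 4, 4, 4, 4],
    "1.V":   [3, 3, 3, 3, 3, 3, 3, 3],
}

# Total number of steps N and mean number of steps N / S for each key version.
totals = {version: sum(steps) for version, steps in table1.items()}
means = {version: totals[version] / len(steps) for version, steps in table1.items()}

for version in table1:
    print(f"{version:<6} N = {totals[version]:2d}  mean steps = {means[version]:.3f}")
```

The mean falls from 4.375 steps for version 1.I to 3.000 steps for version 1.V, in line with the argument that more even keys need fewer steps on average.
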
Table 2.

Calculation of H’Dicho and EDicho.

| Key version | H’Dicho | EDicho = H’Dicho/ln(8) |
| --- | --- | --- |
| 1.I | -{(1/35)·ln(1/35) + (2/35)·ln(2/35) + (3/35)·ln(3/35) + (4/35)·ln(4/35) + (5/35)·ln(5/35) + (6/35)·ln(6/35) + (7/35)·ln(7/35) + (7/35)·ln(7/35)} | 0.937 |
| 1.II | -{(1/34)·ln(1/34) + (2/34)·ln(2/34) + (3/34)·ln(3/34) + (4/34)·ln(4/34) + (6/34)·ln(6/34) + (6/34)·ln(6/34) + (6/34)·ln(6/34) + (6/34)·ln(6/34)} | 0.943 |
| 1.III | -{(1/31)·ln(1/31) + (2/31)·ln(2/31) + (4/31)·ln(4/31) + (4/31)·ln(4/31) + (5/31)·ln(5/31) + (5/31)·ln(5/31) + (5/31)·ln(5/31) + (5/31)·ln(5/31)} | 0.959 |
| 1.IV | -{(2/26)·ln(2/26) + (2/26)·ln(2/26) + (3/26)·ln(3/26) + (3/26)·ln(3/26) + (4/26)·ln(4/26) + (4/26)·ln(4/26) + (4/26)·ln(4/26) + (4/26)·ln(4/26)} | 0.983 |
| 1.V | -{(3/24)·ln(3/24) + (3/24)·ln(3/24) + (3/24)·ln(3/24) + (3/24)·ln(3/24) + (3/24)·ln(3/24) + (3/24)·ln(3/24) + (3/24)·ln(3/24) + (3/24)·ln(3/24)} | 1.000 |
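
The values in Table 2 can be reproduced from the step counts in Table 1 with the sketch below, which also picks the key version with the highest index. The names `table1`, `edicho`, `results` and `best` are assumptions made for this example.

```python
import math

# Step counts per taxon, copied from Table 1.
table1 = {
    "1.I":   [1, 2, 3, 4, 5, 6, 7, 7],
    "1.II":  [1, 2, 3, 4, 6, 6, 6, 6],
    "1.III": [1, 2, 4, 4, 5, 5, 5, 5],
    "1.IV":  [2, 2, 3, 3, 4, 4, 4, 4],
    "1.V":   [3, 3, 3, 3, 3, 3, 3, 3],
}


def edicho(steps):
    """EDicho = H'Dicho / ln(S), with H'Dicho = -sum(p_i * ln p_i)."""
    total = sum(steps)
    h_dicho = -sum((n / total) * math.log(n / total) for n in steps)
    return h_dicho / math.log(len(steps))


results = {version: edicho(steps) for version, steps in table1.items()}
for version, value in results.items():
    print(f"{version:<6} EDicho = {value:.3f}")

# The version with the highest index is the most even key.
best = max(results, key=results.get)
print("highest EDicho:", best)
```

Rounded to three decimals, the script gives 0.937, 0.943, 0.959, 0.983 and 1.000, matching Table 2, and it selects version 1.V.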

Conclusions

Computer software makes it possible to create many dichotomous keys for a group of taxa from the same set of pairs of dichotomous characters, so a sound basis for choosing among the key versions is desirable. The EDicho index developed here is suitable for the quantitative evaluation of dichotomous keys and can serve as the mathematical basis for choosing the dichotomous key with the best performance. Because the index is based on evenness, it can be used to compare identification keys of different sizes.

Acknowledgement

This work has been supported by the VAST04.06/16-17 project and the IEBR-UFZ joint research LEGATO project.

References

  • Heip CHR, Herman PMJ, Soetaert K (1998) Indices of diversity and evenness. Oceanis 24(4): 61–87.
  • Lobanov AL (1975) A mathematical apparatus for calculation, an assessment and comparison design data of identification keys. Zoologicheskiy Zhurnal 54(4): 485–497. [In Russian]
  • Lobanov AL (1983) The principles of creation of insects keys with use electronic computers. The abstract of the thesis on the scientist’s competition degrees of Doct. Biol. Sci. Leningrad: ZIN of Sci. Acad. USSR, 19 pp. [In Russian]
  • Morse DR, Tardivel GM, Spicer J (1996) A Comparison of the Effectiveness of a Dichotomous Key and a Multi-Access Key to Woodlice. Technical report. UKC, University of Kent, Canterbury, UK.
  • Pankhurst RJ (1978) Biological Identification. The Principles and Practice of Identification Methods in Biology. Edward Arnold, London, 104 pp.
  • Schmidt G, Giurgiu M, Hetzner S, Neumann F (2010) Improvement of identification keys by user-tracking. In: Nimis PL, Vignes Lebbe R (Eds) Tools for Identifying Biodiversity: Progress and Problems, 137–143.