Proposal for an index to evaluate dichotomous keys

Abstract
Dichotomous keys are the most popular type of identification key. Many studies have evaluated dichotomous keys from various aspects. In this paper we propose an index, $E_{Dicho}$, for the quantitative evaluation of dichotomous keys. The index is based on evenness and allows identification keys of different sizes to be compared.


Introduction
A taxonomic key is a method used to identify organisms. Dichotomous keys are the most popular type of identification key. They are single-entry identification keys that consist of nested questions, or couplets, each of which offers two choices, or leads (Thesis and Antithesis). These choices describe key characteristics of an organism, and the paired statements address the differences between items. After choosing the statement that best matches the object, the user proceeds to another pair of statements, and so on until the name of the taxon is reached.
There may be several keys for a group of taxa. This prompts the question: which key performs better, provided that all the characters used are good ones that allow an unambiguous identification? How can we evaluate the performance of keys quantitatively? As a key is intended to identify each of the taxa in the group, it achieves the highest performance when the mean number of steps to identification is minimal. As the numbers of steps needed to identify the individual taxa become more even, the mean number of steps decreases, and it is minimal when the numbers of steps are as even as possible (Fig. 1). These considerations lead us to the evenness index of Pielou (1966). This paper proposes an index, based on Pielou's evenness index, for the quantitative evaluation of dichotomous keys.

Methods
We use Pielou's evenness index as the prototype for our index. Pielou's evenness index (J) can be calculated using the following formula (Heip et al. 1998):

$$J = \frac{H'}{H_{max}}$$

where:
- $H'$ is the Shannon diversity index. This measure was originally proposed by Shannon (1948) to quantify the entropy (uncertainty or information content) in strings of text. The idea is that the more different letters there are, and the more equal their proportional abundances in the string of interest, the more difficult it is to predict correctly which letter will come next in the string. The index can be calculated using the following formula:

$$H' = -\sum_{i=1}^{S} p_i \ln p_i$$

in which $p_i$ is the proportion of characters belonging to the ith type of letter in the string of interest and $S$ is the number of types of letter.
- $H_{max}$ is the maximum value of $H'$ and is equal to:

$$H_{max} = \ln S$$

As a result, Pielou's evenness index can be calculated according to the following formula:

$$J = \frac{-\sum_{i=1}^{S} p_i \ln p_i}{\ln S}$$
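The formulas above can be sketched in a few lines of Python. This is only an illustration, not code from the original study: the function names `shannon_index` and `pielou_evenness` and the example strings are assumptions, and the natural logarithm is used (the ratio J itself does not depend on the base of the logarithm).

```python
import math
from collections import Counter


def shannon_index(counts):
    """Shannon diversity H' (natural log) from a list of non-negative counts."""
    total = sum(counts)
    # Types with a zero count contribute nothing to the sum, so they are skipped.
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def pielou_evenness(counts):
    """Pielou's J = H' / ln(S), where S is the number of types actually present."""
    s = sum(1 for c in counts if c > 0)
    if s < 2:
        raise ValueError("evenness is undefined for fewer than two types")
    return shannon_index(counts) / math.log(s)


# Hypothetical strings of text: letter frequencies are the 'abundances'.
even_counts = list(Counter("abcdabcd").values())    # four letters, equally frequent
skewed_counts = list(Counter("aaaaaabc").values())  # one dominant letter

print(pielou_evenness(even_counts))    # close to 1: next letter is hard to predict
print(pielou_evenness(skewed_counts))  # lower: the dominant letter is a good guess
```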

Results and discussion
If the number of steps we must pass to reach a decision (a taxon) is $N_i$, and the total number of steps needed to identify all the taxa is $N$, then the proportion of steps needed to identify the ith taxon is:

$$p_i = \frac{N_i}{N}, \qquad N = \sum_{i=1}^{S} N_i$$

As can be inferred from the scheme of a dichotomous key (Fig. 2), the number of taxa in a dichotomous key corresponds to $S$, the number of types of letter in the formula of Pielou's evenness index.
We call the index for dichotomous keys $E_{Dicho}$ (because it originates from the evenness index). $E_{Dicho}$ is therefore equal to:

$$E_{Dicho} = \frac{-\sum_{i=1}^{S} p_i \ln p_i}{\ln S}$$

where $S$ is the number of taxa in the key and $p_i$ is the proportion of steps needed to identify the ith taxon.
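As a minimal sketch of this definition, the index can be computed directly from the list of step counts, one per taxon. The function name `e_dicho` and the three-taxon example are illustrative assumptions, not part of the original study.

```python
import math


def e_dicho(steps):
    """Evenness of the numbers of steps N_i needed to reach each taxon.

    `steps` holds one positive integer per taxon: the number of couplets
    passed before that taxon is identified.
    """
    s = len(steps)
    if s < 2:
        raise ValueError("a key needs at least two taxa")
    if any(n <= 0 for n in steps):
        raise ValueError("every taxon needs at least one step")
    total = sum(steps)                      # N, total steps over all taxa
    proportions = [n / total for n in steps]  # p_i = N_i / N
    h = -sum(p * math.log(p) for p in proportions)
    return h / math.log(s)                  # divide by H_max = ln S


# Hypothetical key with three taxa: the first couplet names one taxon
# directly, the second couplet separates the remaining two.
print(e_dicho([1, 2, 2]))
```

Taking the step counts as input keeps the calculation independent of how the key itself is stored.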
Many attempts have been made to evaluate identification keys (e.g. Lobanov 1975, 1983, 2015, Pankhurst 1978, Leuschner and Sviridov 1986, Leuschner 1991). Generally, these methods are based on the same concept: the average length of a taxon definition in a key, compared with the theoretical minimum. However, they do not consider how even the lengths of the taxon definitions are. Several studies have evaluated dichotomous keys in practical use (Morse et al. 1996) or have improved a key using the user-tracking method (Schmidt et al. 2010). According to Osborne (1963), in principle, a simple dichotomous key used by an accurate observer must always lead to a correct identification, provided that the specimen in hand actually belongs to one of the taxa covered by the key and is not missing any crucial characters. Sandvik (1976) concluded that keys in which all taxa are gathered on the last two levels (so that the numbers of steps to their identification are relatively equal) have the maximum probability of a correct determination. Our proposed index ($E_{Dicho}$) can therefore evaluate both the speed and the quality of determination of a dichotomous key, provided that all else (e.g. the choice of characters) is equal.
Figure 2. Schematic presentation of a dichotomous key.
The $E_{Dicho}$ index is by nature an evenness index; it therefore has all the properties of a normal evenness index and is constrained between 0 and 1. The higher the variation in the number of steps we must pass to reach the determination of the taxa, the lower the $E_{Dicho}$ index, with an asymptotic lowest value of 0. The highest value of 1 is achieved when all the taxa have the same number of identification steps (Fig. 1.V). As can be seen in Figure 1, the two versions of the dichotomous key (1.I and 1.V) have the same number of taxa (8) and the same number of paired statements (7), but $E_{Dicho}$ of version 1.I is smaller than that of version 1.V, because the variation in the length of the identification paths in version 1.I is higher. Thus, the higher the $E_{Dicho}$ index, the "better" the dichotomous key in terms of both identification speed and correct determination.
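As a worked check of the upper bound, consider a fully balanced key with $S = 8$ taxa and 7 couplets, as described for version 1.V. Every taxon is then reached after three couplets, and the definitions of $p_i$ and $E_{Dicho}$ give:

```latex
\begin{align*}
N_i &= 3 \quad (i = 1, \dots, 8), \qquad N = \sum_{i=1}^{8} N_i = 24, \qquad
p_i = \frac{N_i}{N} = \frac{1}{8} \\[4pt]
H'_{Dicho} &= -\sum_{i=1}^{8} \frac{1}{8} \ln \frac{1}{8} = \ln 8 \\[4pt]
E_{Dicho} &= \frac{H'_{Dicho}}{\ln S} = \frac{\ln 8}{\ln 8} = 1
\end{align*}
```

Any departure from equal $N_i$ makes $H'_{Dicho}$ smaller than $\ln S$, so the index falls below 1.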

An example of calculation of $E_{Dicho}$, the index for dichotomous keys
Let us consider the five dichotomous keys shown in Figure 1.
Here, the number of taxa ($S$) equals 8. The number of steps or paired statements (Thesis + Antithesis) needed to identify each taxon, the total number of steps needed to identify all the taxa, and the proportion of steps needed to identify each taxon are the data from which $H'_{Dicho}$ of the dichotomous key is calculated. They are presented in Table 1 for the five versions of the dichotomous key.
The calculation of $H'_{Dicho}$ and $E_{Dicho}$ for the five versions of the dichotomous key is presented in Table 2.
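The same kind of calculation can be sketched in Python for keys stored as nested pairs. The two keys below are hypothetical eight-taxon examples (one fully balanced, one comb-like) and are not claimed to reproduce the versions in Figure 1 or the values in Tables 1 and 2. The representation of a key as nested tuples and the function names are assumptions made for illustration.

```python
import math


def steps_per_taxon(key, depth=0):
    """Map each taxon to the number of couplets passed to reach it.

    A key is either a taxon name (str) or a couplet, written as a
    2-tuple of leads, each of which is again a key.
    """
    if isinstance(key, str):
        return {key: depth}
    first, second = key
    result = steps_per_taxon(first, depth + 1)
    result.update(steps_per_taxon(second, depth + 1))
    return result


def e_dicho(steps):
    """Evenness of the step counts: -(sum of p_i ln p_i) / ln S."""
    total = sum(steps)
    h = -sum((n / total) * math.log(n / total) for n in steps)
    return h / math.log(len(steps))


# Hypothetical keys for taxa A..H.
balanced = ((("A", "B"), ("C", "D")), (("E", "F"), ("G", "H")))
comb = ("A", ("B", ("C", ("D", ("E", ("F", ("G", "H")))))))

for name, key in [("balanced", balanced), ("comb", comb)]:
    steps = steps_per_taxon(key)
    counts = list(steps.values())
    total = sum(counts)
    print(name)
    print("  steps per taxon:", steps)
    print("  total steps N:  ", total)
    print("  mean steps:     ", total / len(counts))
    print("  E_Dicho:        ", round(e_dicho(counts), 3))
```

In this sketch the balanced key needs 24 steps in total (a mean of 3 per taxon), whereas the comb-like key needs 35 (a mean of 4.375), and its index is correspondingly lower.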

Conclusions
Using computer software, it is possible to create many dichotomous keys for a group of taxa from the same set of pairs of dichotomous characters. It is therefore desirable to have a sound basis for choosing one key version over another. The $E_{Dicho}$ index developed here is suitable for the quantitative evaluation of dichotomous keys and can serve as the mathematical basis for choosing the dichotomous key with the best performance. Because the index is based on evenness, it can be used to compare identification keys of different sizes.