Research Article 
Corresponding author: Nguyen Van Sinh ( vansinh.nguyen@iebr.ac.vn ) Academic editor: Pavel Stoev
© 2017 Nguyen Van Sinh, Martin Wiemers, Josef Settele.
This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Citation:
Van Sinh N, Wiemers M, Settele J (2017) Proposal for an index to evaluate dichotomous keys. ZooKeys 685: 83-89. https://doi.org/10.3897/zookeys.685.13625

Dichotomous keys are the most popular type of identification key. Studies have evaluated dichotomous keys in many respects. In this paper we propose an index for the quantitative evaluation of dichotomous keys (E_{Dicho}). The index is based on evenness and allows identification keys of different sizes to be compared.
index, dichotomous key, evaluation
A taxonomic key is a method used to identify organisms. Dichotomous keys are the most popular type of identification key. They are single-entry identification keys consisting of nested questions, or couplets, each of which offers two choices or leads (Thesis and Antithesis). These choices describe key characteristics of an organism, and each pair of statements contrasts the differences between the items being separated. After choosing the statement that best matches the specimen, the user proceeds to the pair of statements indicated by that choice, and so on until the name of the taxon is reached. Several keys may exist for the same group of taxa. This raises two questions: which key performs better, assuming that all the characters used are good ones that allow an unambiguous identification, and how can the performance of the keys be evaluated quantitatively? Because a key is intended to identify each of the taxa in the group, it achieves its highest performance when the mean number of steps needed to identify them is minimal. As the numbers of steps needed to identify the taxa in a key become more even, the mean number of steps decreases, and it is minimal when these numbers are most even (Fig.
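The claim that the mean number of steps is smallest when the steps are most evenly distributed can be checked by brute force. The following is a minimal Python sketch, assuming that a dichotomous key is a full binary tree in which each couplet passed counts as one step; `depth_profiles` is a hypothetical helper written for this illustration, not part of the authors' method. It enumerates every possible step profile for a key with S taxa and reports the profiles with the lowest and highest mean.

```python
from functools import lru_cache
from statistics import mean


@lru_cache(maxsize=None)
def depth_profiles(s):
    """All possible step profiles for a dichotomous key with s taxa.

    A profile is the sorted tuple of N_i, the number of couplets passed to
    reach each taxon. Every couplet splits the remaining taxa into two
    non-empty groups, so a key with s taxa is a full binary tree with s leaves."""
    if s == 1:
        return frozenset({(0,)})  # a lone taxon needs no further couplet
    profiles = set()
    for k in range(1, s // 2 + 1):  # k = taxa on the smaller side of this couplet
        for left in depth_profiles(k):
            for right in depth_profiles(s - k):
                # reaching any taxon below this couplet costs one extra step
                profiles.add(tuple(sorted(d + 1 for d in left + right)))
    return frozenset(profiles)


S = 8
profiles = depth_profiles(S)
best = min(profiles, key=sum)   # smallest total, hence smallest mean
worst = max(profiles, key=sum)  # largest total, hence largest mean
print(f"{len(profiles)} possible step profiles for {S} taxa")
print("lowest mean:", best, mean(best))
print("highest mean:", worst, mean(worst))
```

For S = 8, the lowest mean (3 steps) comes only from the profile in which every taxon needs 3 steps, and the highest (4.375 steps) from the fully unbalanced profile 1, 2, ..., 7, 7; these correspond to key versions 1.V and 1.I in the worked example.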
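We use Pielou’s evenness index as a prototype for our index. Pielou’s evenness index (J) can be calculated using the following formula:
J = \frac{H'}{H_{max}}
where:
 H’ is the Shannon diversity index. This measure was originally proposed by Shannon in information theory, where it quantifies the entropy of strings of text:
H' = -\sum_{i=1}^{S} p_{i} \ln p_{i}
in which p_{i} is the proportion of characters belonging to the ith type of letter in the string of interest and S is the number of types of letter.
 H_{max} is the maximum value of H’, reached when all p_{i} are equal (p_{i} = 1/S), and is equal to:
H_{max} = \ln S
As a result, Pielou’s evenness index can be calculated according to the following formula:
J = \frac{-\sum_{i=1}^{S} p_{i} \ln p_{i}}{\ln S}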
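To make Pielou’s index concrete in its original setting (letters in a string of text), here is a minimal Python sketch. The function names `shannon_h` and `pielou_j` are illustrative assumptions, and the example string is arbitrary.

```python
import math
from collections import Counter


def shannon_h(counts):
    """Shannon index H' = -sum(p_i * ln p_i) computed from category counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def pielou_j(counts):
    """Pielou's evenness J = H' / H_max, with H_max = ln S."""
    s = len([c for c in counts if c > 0])  # S: number of categories present
    if s < 2:
        raise ValueError("J is undefined for fewer than two categories (ln 1 = 0)")
    return shannon_h(counts) / math.log(s)


# The 'types' here are letters, and the counts are their frequencies in a string.
letter_counts = list(Counter("abracadabra").values())
print(round(shannon_h(letter_counts), 3), round(pielou_j(letter_counts), 3))
print(pielou_j([5, 5, 5, 5]))  # perfectly even counts give J = 1 (up to rounding)
```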
If N_{i} is the number of steps needed to reach a decision (the ith taxon) and N is the total number of steps needed to identify all the taxa (that is, the sum of all N_{i}), the proportion of steps needed to identify the ith taxon is:
p_{i} = N_{i}/N
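A minimal Python sketch of this proportion, using exact fractions so the values can be compared with those written out in the worked example. The helper name `step_proportions` is hypothetical; the input is the list of N_{i} values for key version 1.I.

```python
from fractions import Fraction


def step_proportions(steps):
    """Return p_i = N_i / N for each taxon, as exact fractions.

    `steps` lists N_i, the number of steps (couplets passed) needed to
    identify each taxon; N is their sum."""
    total = sum(steps)  # N: total steps needed to identify all the taxa
    return [Fraction(n_i, total) for n_i in steps]


# N_i for key version 1.I of the worked example (S = 8 taxa).
proportions = step_proportions([1, 2, 3, 4, 5, 6, 7, 7])
print([str(p) for p in proportions])
print(sum(proportions))  # prints 1: the proportions always sum to 1
```

Note that `Fraction` reduces automatically, so 5/35 appears as 1/7 and 7/35 as 1/5.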
As can be inferred from the scheme of a dichotomous key (Fig.
We call this index for dichotomous keys E_{Dicho}, reflecting its origin as an evenness index. As a result, E_{Dicho} is equal to:
E_{Dicho} = \frac{H'_{Dicho}}{\ln S} = \frac{-\sum_{i=1}^{S} p_{i} \ln p_{i}}{\ln S}
where S is the number of taxa in the key and p_{i} is the proportion of steps needed to identify the ith taxon.
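Since p_{i} = N_{i}/N, the index can also be written directly in terms of the step counts, which is convenient when working through a table by hand. The rearrangement below uses only p_{i} = N_{i}/N and the fact that the N_{i} sum to N:

```latex
H'_{Dicho} = -\sum_{i=1}^{S} \frac{N_{i}}{N} \ln \frac{N_{i}}{N}
           = -\frac{1}{N} \sum_{i=1}^{S} N_{i} \left( \ln N_{i} - \ln N \right)
           = \ln N - \frac{1}{N} \sum_{i=1}^{S} N_{i} \ln N_{i}

E_{Dicho} = \frac{\ln N - \frac{1}{N} \sum_{i=1}^{S} N_{i} \ln N_{i}}{\ln S}
```

For example, if all S = 8 taxa need N_{i} = 3 steps (N = 24), the numerator is ln 24 - ln 3 = ln 8, so E_{Dicho} = 1.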
Many attempts have been made to evaluate identification keys (e.g.
Several studies have evaluated dichotomous keys in practical key use (
By its nature, the E_{Dicho} index is an evenness index; it therefore has all the properties of an ordinary evenness index and is bounded between 0 and 1. The greater the variation in the number of steps needed to identify the taxa, the lower the E_{Dicho} index, with a theoretical lower bound of 0. The maximum value of 1 is reached when all taxa require the same number of identification steps (Fig.
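These properties can be spot-checked numerically. The following Python sketch generates random dichotomous keys by recursively splitting the taxa at random (an arbitrary way to produce test keys, not a model of real keys; `random_key_steps` and `e_dicho` are hypothetical helper names). It then checks that E_{Dicho} stays within (0, 1] and equals 1 exactly when all N_{i} are equal.

```python
import math
import random


def random_key_steps(taxa, rng):
    """Build a random dichotomous key for `taxa` taxa and return the number
    of steps (couplets passed) needed to reach each taxon."""
    if taxa == 1:
        return [0]
    k = rng.randint(1, taxa - 1)  # taxa keyed out by the first lead
    # every taxon below this couplet needs one extra step
    return [d + 1 for d in random_key_steps(k, rng) + random_key_steps(taxa - k, rng)]


def e_dicho(steps):
    """E_Dicho = -sum(p_i ln p_i) / ln S with p_i = N_i / N."""
    n = sum(steps)
    h = -sum((n_i / n) * math.log(n_i / n) for n_i in steps)
    return h / math.log(len(steps))


rng = random.Random(2017)  # fixed seed so the run is reproducible
values = []
for _ in range(1000):
    steps = random_key_steps(rng.randint(2, 40), rng)
    e = e_dicho(steps)
    assert 0 < e <= 1 + 1e-12  # bounded above by 1 (allowing float rounding)
    # E_Dicho is 1 exactly when all N_i are equal, and below 1 otherwise
    assert math.isclose(e, 1.0) == (len(set(steps)) == 1)
    values.append(e)
print(f"min E = {min(values):.3f}, max E = {max(values):.3f}")
```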
Let us consider the five dichotomous keys shown in Figure
Here, the number of taxa (S) is 8. For each key version, the number of steps, or paired statements (Thesis + Antithesis), needed to identify each taxon, the total number of steps needed to identify all the taxa, and the proportion of steps needed to identify each taxon provide the data for calculating H’_{Dicho} and are presented in Table
The calculation of H’_{Dicho} and E_{Dicho} for the five versions of the dichotomous key is presented in Table
Key version  Number of steps to identify each taxon (N_{i})  Total number of steps to identify all taxa (N)  Proportion of steps to identify each taxon (p_{i} = N_{i}/N)

1.I  1,2,3,4,5,6,7,7  35  1/35,2/35,3/35,4/35,5/35,6/35,7/35,7/35 
1.II  1,2,3,4,6,6,6,6  34  1/34,2/34,3/34,4/34,6/34,6/34,6/34,6/34 
1.III  1,2,4,4,5,5,5,5  31  1/31,2/31,4/31,4/31,5/31,5/31,5/31,5/31 
1.IV  2,2,3,3,4,4,4,4  26  2/26,2/26,3/26,3/26,4/26,4/26,4/26,4/26 
1.V  3,3,3,3,3,3,3,3  24  3/24,3/24,3/24,3/24,3/24,3/24,3/24,3/24 
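As an additional consistency check on the table, which is not part of the original method, the step counts can be tested against a property of full binary trees. In any key in which every couplet has exactly two leads, the values 2^{-N_{i}} sum to exactly 1 (the Kraft equality). The Python sketch below applies this check and recomputes the totals N; `fits_dichotomous_key` is a hypothetical helper name.

```python
from fractions import Fraction

# Step counts N_i for the five key versions, copied from the table above.
KEY_VERSIONS = {
    "1.I": [1, 2, 3, 4, 5, 6, 7, 7],
    "1.II": [1, 2, 3, 4, 6, 6, 6, 6],
    "1.III": [1, 2, 4, 4, 5, 5, 5, 5],
    "1.IV": [2, 2, 3, 3, 4, 4, 4, 4],
    "1.V": [3, 3, 3, 3, 3, 3, 3, 3],
}


def fits_dichotomous_key(steps):
    """True if these step counts can come from a key in which every couplet
    has exactly two leads (Kraft equality: the sum of 2**-N_i is exactly 1)."""
    return sum(Fraction(1, 2 ** n) for n in steps) == 1


for version, steps in KEY_VERSIONS.items():
    total = sum(steps)  # N, the total number of steps for all taxa
    print(f"{version}: N = {total}, valid dichotomous key: {fits_dichotomous_key(steps)}")
```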
Key version  H’_{Dicho}  E_{Dicho} = H’_{Dicho}/ln(8)

1.I  -{(1/35).ln(1/35)+(2/35).ln(2/35)+(3/35).ln(3/35)+(4/35).ln(4/35)+(5/35).ln(5/35)+(6/35).ln(6/35)+(7/35).ln(7/35)+(7/35).ln(7/35)}  0.937
1.II  -{(1/34).ln(1/34)+(2/34).ln(2/34)+(3/34).ln(3/34)+(4/34).ln(4/34)+(6/34).ln(6/34)+(6/34).ln(6/34)+(6/34).ln(6/34)+(6/34).ln(6/34)}  0.943
1.III  -{(1/31).ln(1/31)+(2/31).ln(2/31)+(4/31).ln(4/31)+(4/31).ln(4/31)+(5/31).ln(5/31)+(5/31).ln(5/31)+(5/31).ln(5/31)+(5/31).ln(5/31)}  0.959
1.IV  -{(2/26).ln(2/26)+(2/26).ln(2/26)+(3/26).ln(3/26)+(3/26).ln(3/26)+(4/26).ln(4/26)+(4/26).ln(4/26)+(4/26).ln(4/26)+(4/26).ln(4/26)}  0.983
1.V  -{(3/24).ln(3/24)+(3/24).ln(3/24)+(3/24).ln(3/24)+(3/24).ln(3/24)+(3/24).ln(3/24)+(3/24).ln(3/24)+(3/24).ln(3/24)+(3/24).ln(3/24)}  1.000
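The E_{Dicho} column can be reproduced from the step counts alone. The following Python sketch uses the equivalent form (ln N - (1/N) Σ N_{i} ln N_{i}) / ln S, which follows from p_{i} = N_{i}/N; the dictionary name and helper `e_dicho` are illustrative.

```python
import math

# Step counts N_i for the five key versions (S = 8 taxa each).
KEY_VERSIONS = {
    "1.I": [1, 2, 3, 4, 5, 6, 7, 7],
    "1.II": [1, 2, 3, 4, 6, 6, 6, 6],
    "1.III": [1, 2, 4, 4, 5, 5, 5, 5],
    "1.IV": [2, 2, 3, 3, 4, 4, 4, 4],
    "1.V": [3, 3, 3, 3, 3, 3, 3, 3],
}


def e_dicho(steps):
    """E_Dicho = H'_Dicho / ln S, with H'_Dicho = ln N - sum(N_i ln N_i) / N."""
    n = sum(steps)
    h = math.log(n) - sum(n_i * math.log(n_i) for n_i in steps) / n
    return h / math.log(len(steps))


for version, steps in KEY_VERSIONS.items():
    print(f"{version}: E_Dicho = {e_dicho(steps):.3f}")
# prints 0.937, 0.943, 0.959, 0.983 and 1.000 for 1.I to 1.V
```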
Using computer software, many different dichotomous keys can be generated for a group of taxa from the same set of paired dichotomous characters. A sound basis for choosing among these key versions is therefore desirable. The E_{Dicho} index developed here is suitable for the quantitative evaluation of dichotomous keys and can serve as a mathematical basis for choosing the key with the best performance. Because the index is based on evenness, it can also be used to compare identification keys of different sizes.
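Choosing among generated key versions then amounts to picking the version with the highest E_{Dicho}. The Python sketch below assumes a key is represented as nested pairs, where each tuple is a couplet (thesis, antithesis) and each string is a taxon name; this representation, the helper names and the taxa A to E are all hypothetical. Because the index is divided by ln S, keys with different numbers of taxa land on the same 0 to 1 scale.

```python
import math


def steps_per_taxon(key, depth=0):
    """Map each taxon to the number of couplets passed to reach it.
    Assumes taxon names are unique strings and every couplet is a 2-tuple."""
    if isinstance(key, str):
        return {key: depth}
    thesis, antithesis = key
    steps = steps_per_taxon(thesis, depth + 1)
    steps.update(steps_per_taxon(antithesis, depth + 1))
    return steps


def e_dicho(key):
    """E_Dicho = -sum(p_i ln p_i) / ln S for a key with at least two taxa."""
    counts = list(steps_per_taxon(key).values())
    n = sum(counts)
    h = -sum((c / n) * math.log(c / n) for c in counts)
    return h / math.log(len(counts))


# Two hypothetical versions of a key to the same four taxa.
candidates = {
    "chain": ("A", ("B", ("C", "D"))),
    "balanced": (("A", "B"), ("C", "D")),
}
best = max(candidates, key=lambda name: e_dicho(candidates[name]))
print(best)  # balanced

# A hypothetical five-taxon key is scored on the same 0 to 1 scale.
print(round(e_dicho((("A", "B"), ("C", ("D", "E")))), 3))
```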
This work has been supported by the VAST04.06/16-17 project and the IEBR-UFZ joint research project LEGATO.