A multi-access identification key based on colour patterns in ladybirds (Coleoptera, Coccinellidae)

Abstract
An identification key based on the colouration of French ladybirds is proposed for the tribes Chilocorini, Coccinellini, and Epilachnini. These tribes were chosen for their relatively limited species diversity and for their large size and high colour diversity, which make them easy to observe and collect. The identification key runs on the Xper3 software, which allows structured knowledge bases and free-access online keys to be built. The online interactive Xper key is available at http://french-ladybird.identificationkey.fr.


Introduction
The identification of species is central to ecology, conservation biology, systematics, and related disciplines (species inventories and community studies, ecosystem management, establishment and improvement of environmental public policies, taxonomic reviews, and management of natural history collections) (Oliver 1988, Hebert et al. 2003, Smith et al. 2008, Vander Zanden et al. 2010). Europe is one of the best-known parts of the world in terms of biodiversity (Fontaine et al. 2012), especially concerning distribution patterns at the country scale. This has been highlighted by the Fauna Europaea database, available since 2004 (de Jong et al. 2014), which gathers the scientific names and distributions of all living European animal species and is assembled by a large network of specialists. However, most new species are described by non-professional taxonomists (Fontaine et al. 2012), and the distribution of most organisms remains poorly known. Citizen science programs aim to fill that gap (Silvertown 2009) through the participation of amateurs and the general public in the inventory and description of life (e.g., National Biodiversity Network in the UK, Swedish Species Gateway in Sweden, Chicago Wilderness Project in the USA, Vigie-Nature in France; see Silvertown 2009). From this perspective, visual and interactive identification of species offers tremendous potential for the general public.
While large and charismatic animals may be easy to identify, most organisms require expert skills for accurate identification, and the inability to identify species represents a major challenge known as the Taxonomic Impediment (SCBD 2010). The most basic requirement for people studying and working on biodiversity is the availability of species identification guides. However, easy-to-use identification guides for non-taxonomists and the general public are scarce and exist for relatively few taxonomic groups (SCBD 2010). Consequently, other features of organisms (such as distribution, ecology, and biology) remain poorly known (Costello et al. 2006, SCBD 2010).
Coccinellidae is a family of beetles popular with naturalists and the general public. Because ladybirds have ecological and economic value as predators of pest insects (e.g., aphids, scale insects), their identification is of interest to naturalists, amateurs and professionals (Hemptinne et al. 2005, Hodek and Honěk 2009, Ali et al. 2014). Several citizen science programs aim to describe the distribution patterns of this group, for instance the Harlequin Ladybird Survey (http://www.harlequin-survey.org) and the Ladybird Survey (http://www.ladybird-survey.org) in the UK, the Lost Ladybug Project (http://www.lostladybug.org) and the Buckeye Lady Beetle Blitz (https://entomology.osu.edu/about-us/multimedia/buckeye-lady-beetle-blitz) in the US, and the Coccinula Recording Scheme in Belgium (Baugnée et al. 2011). The data collected have led to a significant number of published scientific works (e.g., Brown et al. 2008, Comont et al. 2012, 2014, Gardiner et al. 2012, Purse et al. 2015).
Single-access identification keys consist of a series of identification steps that form a single, unique identification path for a given taxon. Although such keys are powerful tools for identifying species, the user cannot choose which character to observe (the answer to every step must be known), and identification is impossible if some characters are missing (e.g., if the specimen is poorly preserved). Moreover, this type of key cannot be modulated or adapted to different audiences, environmental conditions, seasons, or geographical locations.
Most North American and European ladybird identification keys are single-access and difficult to use for non-specialists (Dauguet 1949, Iablokoff-Khnzorian 1982, Gordon 1985, Chapin and Brou 1991). Others are mainly based on shape and colour, but most of their characters require specific vocabulary, which still makes them too complicated for the general public in the context of citizen science programs (Belgium: Baugnée and Branquart 2000; West of France: Le Monnier and Livory 2003; British Isles: Roy et al. 2013; North of France: Declercq et al. 2014).
Modern tools developed along with digital technologies and data processing make identification easier for the user. Several interactive identification keys (IIK) are available online (e.g., http://www.ladybird-survey.org/bbc/spotter.php, http://www.discoverlife.org/20/q?guide=Ladybug), but most of them are only digital versions of single-access keys and retain the same difficulties for the user.
A multi-access interactive key is a computer-aided identification tool with which the user finds the correct name of a species by entering attributes (character-state values) of the specimen (Dallwitz et al. 2013). Its advantages over conventional keys are as follows: characters can be used in any order; characters are ordered so that the one that best separates the remaining taxa comes first; keys can be supplemented with illustrations (pictures, drawings) and texts explaining the terminology used; and correct identifications can be obtained despite errors made by the user (FloraBase, https://florabase.dpaw.wa.gov.au/keys; Dallwitz et al. 2013). The software also makes it possible to print a single-access key for field identification if needed, and to weight characters according to the skills and abilities of the users (students, general public, naturalists…). Despite these advantages, no multi-access interactive key has been produced for ladybirds so far.
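The filtering and ranking behaviour of a multi-access key can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the Xper 3 implementation: the taxa (`taxon_A`, `taxon_B`, `taxon_C`), characters and states are hypothetical; each taxon stores a set of possible states per character so that intraspecific variability can be encoded; characters are ranked by the number of pairs of remaining taxa they separate; and tolerance to user errors is modelled, as an assumption, by a maximum number of contradicted observations.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List

# A description maps each character to the set of states the taxon can show.
Description = Dict[str, FrozenSet[str]]

# Hypothetical toy knowledge base (not the published ladybird descriptions).
KB: Dict[str, Description] = {
    "taxon_A": {"ground_colour": frozenset({"red"}), "spot_count": frozenset({"7"})},
    "taxon_B": {"ground_colour": frozenset({"red", "black"}), "spot_count": frozenset({"2", "4"})},
    "taxon_C": {"ground_colour": frozenset({"yellow"}), "spot_count": frozenset({"7", "14"})},
}


def remaining_taxa(kb: Dict[str, Description], observed: Dict[str, str],
                   tolerance: int = 0) -> List[str]:
    """Taxa contradicted by at most `tolerance` observations, entered in any order."""
    result = []
    for taxon, desc in kb.items():
        mismatches = sum(
            1 for char, state in observed.items()
            if char in desc and state not in desc[char]
        )
        if mismatches <= tolerance:
            result.append(taxon)
    return sorted(result)


def rank_characters(kb: Dict[str, Description], taxa: List[str],
                    observed: Dict[str, str]) -> List[str]:
    """Unused characters, best separator of the remaining taxa first."""
    unused = {c for t in taxa for c in kb[t]} - set(observed)

    def separated_pairs(char: str) -> int:
        # A pair is separated when both taxa are described and share no state.
        return sum(
            1 for a, b in combinations(taxa, 2)
            if char in kb[a] and char in kb[b] and not (kb[a][char] & kb[b][char])
        )

    # Ties are broken alphabetically so the order is deterministic.
    return sorted(unused, key=lambda c: (-separated_pairs(c), c))


if __name__ == "__main__":
    obs = {"spot_count": "7"}
    left = remaining_taxa(KB, obs)
    print(left, rank_characters(KB, left, obs))
    # Tolerating one contradicted observation keeps taxa matching all but one entry.
    print(remaining_taxa(KB, {"spot_count": "4", "ground_colour": "yellow"}, tolerance=1))
```

Here the observation "7 spots" leaves `taxon_A` and `taxon_C`, and the only unused character, ground colour, is proposed next; with two observations that each contradict one candidate, a strict filter returns nothing, whereas `tolerance=1` keeps `taxon_B` and `taxon_C`.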
This study aims to i) release the first multi-access digital interactive identification key for French ladybirds based on colour that takes into account intraspecific variability; and ii) study and discuss the discriminating power of the characters: can we identify species by colour pattern only? What are the most discriminating characters?

Taxonomic coverage
As the key is intended as an identification tool for the general public in the context of citizen science programs, we restricted the taxonomic coverage to the tribes Chilocorini, Coccinellini and Epilachnini (Table 1). Members of these tribes are relatively large (3-9 mm) and display a great diversity of colours, which makes them easy to detect in their environment and to identify for non-specialists. We also included the most common colour forms, in order to cover as much as possible of the intraspecific variability of these species.
The taxonomy follows the current classification (Seago et al. 2011), and the species list follows Tronquet (2014); it includes native, introduced, and acclimated species. Sixty-six taxa are included (47 species and 19 colour forms) […]. Platynaspis luteorubra Goeze, 1777 (Chilocorinae, Platynaspini) was also removed due to its small size (2.5-3.5 mm). Because only a few discriminating characters are known, and these cannot be assessed reliably from colour patterns, Henosepilachna angusticollis Reiche, 1862 is not separated from its congener H. argus Geoffroy in Fourcroy, 1785 in the key.
Specimens were examined in the collection of the Muséum national d'Histoire naturelle, Paris, France (MNHN), and their data are available at https://science.mnhn.fr/institution/mnhn/collection/ec/search.

Characters used in the key
A list of 21 morphological characters based on colour and shape was defined, mainly from existing identification keys (Iablokoff-Khnzorian 1982, Baugnée and Branquart 2000, Roy et al. 2013, Declercq et al. 2014) (Table 2). Only characters that are visible to the naked eye or with a ×10 hand lens are included. The character nomenclature follows Roy et al. (2013), except for characters #10, 11, 15, 16, 17, and 18. All characters were treated as discrete.

Interactive identification key construction and statistics
The 47 species were digitized using Xper 2 v.2.3.2 (Ung et al. 2010) and transferred to Xper 3 (Vignes-Lebbe et al. 2016). These programs are dedicated to managing structured taxonomic descriptions, analysing these descriptions and producing keys (Kerner et al. 2011, Corvez and Grand 2014, Martin et al. 2015). A wiki and documentation for Xper 3 are available at http://wiki.xper3.fr/index.php/UserManualXper3. An Xper knowledge base is a set of items described using the same model and terminology, and documented by texts and images. This key contains 66 items covering 47 species and 19 intraspecific colour forms. The descriptive model consists of a hierarchy of descriptors and a chosen terminology for expressing their possible values (states). The descriptors are the 21 morphological characters described above. Some descriptors are meaningful only if a condition holds for another descriptor, and these dependencies define a hierarchical structure of descriptors (Table 2). The complete terminology (descriptors and states) is documented by images and texts in order to avoid misinterpretation, a crucial point for reliable identifications with the key. Figure 1 presents the description of the species Coccinella quinquepunctata following this model and terminology.
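The hierarchy of dependent descriptors can be illustrated with a small data structure. This is a hedged sketch, not the Xper data model: the descriptor names `pronotum_marked` and `pronotum_mark_shape` and their states are hypothetical, and a child descriptor is assumed to be applicable only when its parent may take one of a listed set of "enabling" states (checked recursively up the hierarchy).

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Optional

# A description maps each descriptor name to the set of states an item can show.
Description = Dict[str, FrozenSet[str]]


@dataclass(frozen=True)
class Descriptor:
    name: str
    states: FrozenSet[str]
    parent: Optional[str] = None
    # States of the parent under which this (child) descriptor makes sense.
    applicable_if: FrozenSet[str] = frozenset()


# Hypothetical descriptors, loosely modelled on a pronotum character.
DESCRIPTORS: Dict[str, Descriptor] = {
    d.name: d for d in (
        Descriptor("pronotum_marked", frozenset({"yes", "no"})),
        Descriptor("pronotum_mark_shape", frozenset({"spots", "M-shape", "trapezium"}),
                   parent="pronotum_marked", applicable_if=frozenset({"yes"})),
    )
}


def is_applicable(name: str, desc: Description, descriptors: Dict[str, Descriptor]) -> bool:
    """A child applies if its parent may take an enabling state and is itself applicable."""
    d = descriptors[name]
    if d.parent is None:
        return True
    parent_states = desc.get(d.parent, frozenset())
    return bool(parent_states & d.applicable_if) and is_applicable(d.parent, desc, descriptors)


def inconsistencies(desc: Description, descriptors: Dict[str, Descriptor]) -> List[str]:
    """List unknown states and values given for inapplicable descriptors."""
    problems = []
    for name, states in desc.items():
        d = descriptors[name]
        if not states <= d.states:
            problems.append(f"{name}: unknown state(s) {sorted(states - d.states)}")
        elif not is_applicable(name, desc, descriptors):
            problems.append(f"{name}: not applicable given {d.parent}")
    return problems


if __name__ == "__main__":
    good = {"pronotum_marked": frozenset({"yes"}),
            "pronotum_mark_shape": frozenset({"M-shape"})}
    bad = {"pronotum_marked": frozenset({"no"}),
           "pronotum_mark_shape": frozenset({"spots"})}
    print(inconsistencies(good, DESCRIPTORS))
    print(inconsistencies(bad, DESCRIPTORS))
```

Storing states as sets lets a parent be variable (e.g. both "yes" and "no" for a polymorphic species), in which case the child remains applicable.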
Xper 3 was also used to compare species and genera. For each descriptor, the comparison tests whether it distinguishes a pair of items. Three different measures are available (Burguière et al. 2013). The result is displayed as a table in which colours separate three cases: (1) items have the same values for a given descriptor (= no discrimination); (2) a pair of items is completely distinct for a given descriptor (= total discrimination); (3) at least one pair of items has unequal but overlapping values for the descriptor (= partial discrimination). For a given descriptor, the sum of the comparisons over all pairs of items measures its ability to distinguish taxa (discriminatory power). The discriminatory power, a quantitative assessment of the ability of a descriptor to distinguish taxa, is measured with the original Xper index (Lebbe 1991) implemented in the Xper 2 software. This index is based on the incompatibility between descriptions: two taxa are incompatible (or dissimilar, or discriminated) if, for a given descriptor, they share no state. For each descriptor, the index ranges from 0 (the descriptor has no discriminatory power) to 1 (the descriptor distinguishes all taxa).
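The three comparison cases and the incompatibility idea can be sketched as follows. This is a simplified proxy, not the exact Xper index of Lebbe (1991), whose formula is not reproduced here: for one descriptor, the sketch returns the share of item pairs that are incompatible (no state in common), which also ranges from 0 to 1. The taxa and the `spot_count` descriptor are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet

Items = Dict[str, Dict[str, FrozenSet[str]]]


def compare_pair(a: FrozenSet[str], b: FrozenSet[str]) -> str:
    """Classify two state sets of one descriptor into the three cases."""
    if a == b:
        return "none"     # same values: no discrimination
    if not (a & b):
        return "total"    # no state in common: incompatible, total discrimination
    return "partial"      # different but overlapping values


def incompatibility_share(items: Items, descriptor: str) -> float:
    """Proxy index: share of item pairs made incompatible by `descriptor` (0 to 1)."""
    pairs = list(combinations(sorted(items), 2))
    if not pairs:
        return 0.0
    incompatible = sum(
        1 for x, y in pairs
        if compare_pair(items[x][descriptor], items[y][descriptor]) == "total"
    )
    return incompatible / len(pairs)


# Hypothetical example: four taxa described for one descriptor.
EXAMPLE: Items = {
    "taxon_A": {"spot_count": frozenset({"2"})},
    "taxon_B": {"spot_count": frozenset({"2", "4"})},
    "taxon_C": {"spot_count": frozenset({"7"})},
    "taxon_D": {"spot_count": frozenset({"7"})},
}

if __name__ == "__main__":
    print(incompatibility_share(EXAMPLE, "spot_count"))
```

In this toy example, four of the six pairs are totally discriminated (A/C, A/D, B/C, B/D), A/B is only partially discriminated and C/D not at all, so the proxy is 4/6, about 0.67.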
Comparisons within and between genera were made with the "compare groups" and "compare items" options of Xper 3. For the comparison between genera, we estimated the number of discriminating characters, weighted or not by the number of colour forms. A subset of descriptors that discriminates the taxa as efficiently as the full set of descriptors was also calculated with the "minset" tool (Lebbe and Vignes 1992, Ziani et al. 1994).
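The idea behind a minimal discriminating set can be sketched as a greedy set cover over pairs of items. This is an illustrative substitute, not the minset algorithm of Lebbe and Vignes (1992): greedy selection always yields a subset that separates every pair the full descriptor set separates, but it is not guaranteed to be of minimum size. The taxa and descriptors in `EXAMPLE` are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set, Tuple

Items = Dict[str, Dict[str, FrozenSet[str]]]


def separates(items: Items, d: str, a: str, b: str) -> bool:
    """True when descriptor d is described for both items and they share no state."""
    sa, sb = items[a].get(d), items[b].get(d)
    return sa is not None and sb is not None and not (sa & sb)


def greedy_minset(items: Items) -> List[str]:
    """Greedily pick descriptors until every separable pair of items is separated."""
    descriptors = sorted({d for desc in items.values() for d in desc})
    to_cover: Set[Tuple[str, str]] = {
        (a, b) for a, b in combinations(sorted(items), 2)
        if any(separates(items, d, a, b) for d in descriptors)
    }
    chosen: List[str] = []
    while to_cover:
        best, best_pairs = "", set()
        for d in descriptors:
            if d in chosen:
                continue
            covered = {p for p in to_cover if separates(items, d, *p)}
            if len(covered) > len(best_pairs):  # first descriptor wins ties
                best, best_pairs = d, covered
        chosen.append(best)
        to_cover -= best_pairs
    return chosen


# Hypothetical example: "hairs" duplicates the information carried by "colour".
EXAMPLE: Items = {
    "taxon_A": {"colour": frozenset({"red"}), "hairs": frozenset({"no"}), "spots": frozenset({"2"})},
    "taxon_B": {"colour": frozenset({"red"}), "hairs": frozenset({"no"}), "spots": frozenset({"7"})},
    "taxon_C": {"colour": frozenset({"black"}), "hairs": frozenset({"yes"}), "spots": frozenset({"2"})},
    "taxon_D": {"colour": frozenset({"black"}), "hairs": frozenset({"yes"}), "spots": frozenset({"7"})},
}

if __name__ == "__main__":
    print(greedy_minset(EXAMPLE))
```

On this example the sketch keeps `colour` and `spots` and drops the redundant `hairs`, which separates exactly the same pairs as `colour`.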

Comparison with standard keys
Two types of keys are available: free-access keys and single-access keys. A free-access key is a very flexible identification key that allows users to choose the characters they want to describe. Another web service, IKey+ (Burguière et al. 2013), builds single-access keys. A single-access key is a classical key in which descriptors are ordered in steps. The topology of the key is a tree, on which several indices can be computed: maximal number of steps, path lengths, etc.
A single-access identification key was generated by IKey+ under Xper 3 with the default options and the Xper score method. For each taxon, four statistics are given: the number of steps, the lengths of the shortest and the longest paths, and the average path length. This key was then compared with five single-access keys for European ladybirds (Dauguet 1949, Baugnée and Branquart 2000, Le Monnier and Livory 2003, Roy et al. 2013, Declercq et al. 2014).
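How a single-access key can be derived from a knowledge base, and how per-taxon path statistics follow from the tree, can be sketched as below. This is a hedged sketch, not IKey+: the splitting score used here (number of separated pairs) is a hypothetical stand-in for the Xper score method, every taxon is assumed to be described for every descriptor, and the taxa in `EXAMPLE` are hypothetical. A taxon showing several states of the chosen descriptor is sent down every matching branch, which is one way a taxon can be reached by several paths.

```python
from itertools import combinations
from statistics import mean
from typing import Dict, FrozenSet, List, Optional

Items = Dict[str, Dict[str, FrozenSet[str]]]


def separated_pairs(items: Items, taxa: List[str], d: str) -> int:
    """Number of pairs of taxa sharing no state for descriptor d."""
    return sum(1 for a, b in combinations(taxa, 2) if not (items[a][d] & items[b][d]))


def build_key(items: Items, taxa: List[str], used: FrozenSet[str] = frozenset()) -> dict:
    """Recursively split taxa on the descriptor separating the most pairs."""
    taxa = sorted(taxa)
    if len(taxa) <= 1:
        return {"taxa": taxa}
    # Assumes every taxon is described for every descriptor.
    scored = [(separated_pairs(items, taxa, d), d) for d in sorted(set(items[taxa[0]]) - used)]
    scored = [s for s in scored if s[0] > 0]
    if not scored:
        return {"taxa": taxa}  # leaf: remaining descriptors cannot separate these taxa
    best = min(scored, key=lambda s: (-s[0], s[1]))[1]  # highest score, then alphabetical
    states = sorted(set().union(*(items[t][best] for t in taxa)))
    return {
        "descriptor": best,
        "branches": {
            s: build_key(items, [t for t in taxa if s in items[t][best]], used | {best})
            for s in states
        },
    }


def path_lengths(node: dict, depth: int = 0,
                 out: Optional[Dict[str, List[int]]] = None) -> Dict[str, List[int]]:
    """Collect, for each taxon, the number of steps of every path reaching it."""
    out = {} if out is None else out
    if "taxa" in node:
        for t in node["taxa"]:
            out.setdefault(t, []).append(depth)
    else:
        for child in node["branches"].values():
            path_lengths(child, depth + 1, out)
    return out


def key_statistics(tree: dict) -> Dict[str, dict]:
    """Number of paths, shortest, longest and mean path length per taxon."""
    return {
        t: {"paths": len(ls), "shortest": min(ls), "longest": max(ls), "mean": mean(ls)}
        for t, ls in path_lengths(tree).items()
    }


# Hypothetical example: taxon_B is variable in colour (red or yellow).
EXAMPLE: Items = {
    "taxon_A": {"colour": frozenset({"red"}), "hairs": frozenset({"no"}), "spots": frozenset({"2"})},
    "taxon_B": {"colour": frozenset({"red", "yellow"}), "hairs": frozenset({"yes"}), "spots": frozenset({"2"})},
    "taxon_C": {"colour": frozenset({"black"}), "hairs": frozenset({"no"}), "spots": frozenset({"2"})},
    "taxon_D": {"colour": frozenset({"yellow"}), "hairs": frozenset({"no"}), "spots": frozenset({"7"})},
}

if __name__ == "__main__":
    tree = build_key(EXAMPLE, list(EXAMPLE))
    for taxon, stats in sorted(key_statistics(tree).items()):
        print(taxon, stats)
```

In this toy key, colour is asked first; `taxon_C` is reached in one step, while `taxon_B`, being red or yellow, appears at the end of two different two-step paths.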

Structure and analysis of the key
The consistency of the knowledge base was tested with the "Checkbase" functionality of Xper 3: no two items share the same description, and all items are described. The base is 100% complete. Twenty-one descriptors are used: five have no dependency (neither parent nor child), four are parent descriptors (two of which are also child descriptors), and 14 are child descriptors (two of which are also parent descriptors). Ninety-eight states are described (minimal/maximal/average number of states per descriptor: 2/12/4.67).
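The two consistency checks (no duplicated descriptions, no missing descriptions) and the state-count summary can be sketched as follows. This is not the Checkbase implementation: as a simplification, the sketch treats every descriptor as required for every item and ignores descriptor dependencies, and all names in it are hypothetical.

```python
from statistics import mean
from typing import Dict, FrozenSet, List, Tuple

Items = Dict[str, Dict[str, FrozenSet[str]]]


def check_base(items: Items, descriptors: Dict[str, FrozenSet[str]]
               ) -> Tuple[List[Tuple[str, str]], List[str]]:
    """Return pairs of items with identical descriptions and items missing descriptors."""
    seen: Dict[tuple, str] = {}
    duplicates: List[Tuple[str, str]] = []
    for name in sorted(items):
        # Order-independent signature of the whole description.
        signature = tuple(sorted((d, tuple(sorted(s))) for d, s in items[name].items()))
        if signature in seen:
            duplicates.append((seen[signature], name))
        else:
            seen[signature] = name
    # Simplification: every descriptor is required (dependencies are ignored).
    incomplete = sorted(n for n, desc in items.items() if set(descriptors) - set(desc))
    return duplicates, incomplete


def state_count_summary(descriptors: Dict[str, FrozenSet[str]]) -> Tuple[int, int, float]:
    """Minimal, maximal and average number of states per descriptor."""
    counts = [len(states) for states in descriptors.values()]
    return min(counts), max(counts), round(mean(counts), 2)


if __name__ == "__main__":
    descs = {"colour": frozenset({"red", "black", "yellow"}), "spots": frozenset({"2", "7"})}
    items: Items = {
        "taxon_A": {"colour": frozenset({"red"}), "spots": frozenset({"2"})},
        "taxon_B": {"spots": frozenset({"2"}), "colour": frozenset({"red"})},
        "taxon_C": {"colour": frozenset({"black"})},
    }
    print(check_base(items, descs), state_count_summary(descs))
```

As a cross-check of the figures above, 98 states spread over 21 descriptors give an average of 98/21 ≈ 4.67 states per descriptor.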

Discriminatory power of descriptors (Table 3)
The four most discriminating characters (XPER index >0.8) are the type of pronotum pattern (#5), the number of elytral markings (#8), and the number of lateral (#10) and longitudinal (#11) lines of elytral markings. These characters separate the taxa into 7 to 13 groups. For example, the two most discriminating characters (#5 and #8) split the taxa into 10-13 different groups, with 2-13 taxa per group.
Characters #14 and #21 are the least discriminating, both with an XPER index below 0.8. These characters are binary and split the taxa into two unequal groups (60 vs 2 for character #14, 65 vs 1 for character #21). Despite its weak discriminating power, character #21 is the only one that distinguishes Coccinella septempunctata from C. magnifica. Eleven descriptors are sufficient to separate all taxa (Table 3, in bold).

Comparison within and between genera
Comparison within a genus: Coccinella (Table 4)
Among the 21 characters, 12 are informative (in blue), whereas the other nine are constant and cannot discriminate within this genus (in red). The intersection column shows what is constant in Coccinella and therefore helps to describe the genus: black and white pronotum with two patterns (central structure: solid; trapezium with two anterolateral white or orange marks), and elytra with various markings but always devoid of a rim around the edge, short downy hairs, cream rings around dots, or a dark sutural band.
Comparison between genera (Table 5)
Among the genera with at least two species studied, the most constant are Coccinula (5% of discriminating characters), Henosepilachna (14%) and Chilocorus (19%); the most variable are Adalia (76%) and Harmonia (67%). When weighted by the number of described colour forms per genus, Coccinula and Henosepilachna remain the most constant, whereas the most variable genera are Ceratomegilla and Exochomus.
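The within-genus comparison (constant versus informative characters, and the percentage of discriminating characters) can be sketched as below. This is a simplified illustration, not the Xper 3 "compare groups" computation: a descriptor is counted as informative as soon as the members' state sets differ (partial or total discrimination), the weighting by colour forms is not reproduced, and the species and descriptors are hypothetical.

```python
from typing import Dict, FrozenSet, List, Tuple

Items = Dict[str, Dict[str, FrozenSet[str]]]


def genus_profile(items: Items, members: List[str], descriptors: List[str]
                  ) -> Tuple[Dict[str, FrozenSet[str]], List[str], float]:
    """Constant descriptors (with their shared value), informative ones, % informative."""
    constant: Dict[str, FrozenSet[str]] = {}
    informative: List[str] = []
    for d in descriptors:
        values = {items[m][d] for m in members}
        if len(values) == 1:
            constant[d] = next(iter(values))  # value shared by all members
        else:
            informative.append(d)
    share = 100 * len(informative) / len(descriptors)
    return constant, informative, share


if __name__ == "__main__":
    items: Items = {
        "sp1": {"colour": frozenset({"red"}), "hairs": frozenset({"no"}), "spots": frozenset({"2"})},
        "sp2": {"colour": frozenset({"red"}), "hairs": frozenset({"no"}), "spots": frozenset({"7"})},
        "sp3": {"colour": frozenset({"red"}), "hairs": frozenset({"no"}), "spots": frozenset({"2", "7"})},
    }
    print(genus_profile(items, ["sp1", "sp2", "sp3"], ["colour", "hairs", "spots"]))
```

If percentages are taken over all 21 descriptors, the 12 informative characters found in Coccinella correspond to 12/21, about 57%.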

Single-access identification key and comparison with standard keys
For each identification, the generated key (Appendix 1) requires on average 4.2 steps (range 2-7), and 1-4 paths lead to each taxon (mean 1.5). Unlike many other single-access keys, many of the steps for identifying a taxon do not follow the taxonomy. This is the case in all three tribes: for instance, the user can follow five different paths to identify an Epilachnini species (in green). Similarly, Coccinella species (in red) can be reached by six different paths, as can the colour forms of A. decempunctata (marked with a yellow star) (Figure 2). Compared with other standard keys (Table 6), the generated key reaches the taxon in fewer steps despite including the highest number of species, except for Coccinula quatuordecimpustulata and the key of Dauguet (1949). For example, only five steps are needed in the generated key to identify Coccinella septempunctata, whereas 8-14 steps are needed in the other keys.

Discussion
The work presented in this study led to the release of the first multi-access interactive digital identification key for French ladybirds. The flexibility and many options provided by this new-generation tool are unparalleled for this group, and the key is abundantly illustrated and documented with images and texts. Since it is available online and open to modification by experts, the identification key can easily be improved. It will be possible to add ladybird taxa and to extend the geographic area (e.g., a key to all European ladybirds).
Most classical, single-access keys rely on characters that are quite difficult for students, naturalists and the general public to observe (e.g., for ladybirds in Dauguet 1949 or Roy et al. 2013: mandibles, tooth on tibia, tarsal claws, mesosternal epimera, abdominal post-coxal lines). Here, all taxa are distinguishable with only 11 characters focusing on markings (number and shape). All characters used in this new key are visible to the naked eye or with a ×10 hand lens; this tool is therefore designed for non-specialists. Using this key, most species can be identified from pictures alone, as is already the case with the identification key for the photographic survey of flower visitors (Spipoll citizen science program, www.spipoll.org), also built with Xper 3.
Figure 2. Representation of a part of the single-access identification key generated by IKey+ under Xper 3 and the Xper score method (statistics detailed in Appendix 1). The taxonomy is highlighted (the three tribes included in this study, the genus Coccinella and the colour forms of Adalia decempunctata). Numbers in the circles represent the number of steps in the generated key.