Corresponding author: Mary E. Barkworth (
Academic editor: V. Blagoderov
The goal of the US Virtual Herbarium (USVH) project is to digitize (database, image, georeference)
The US Virtual Herbarium project was started in 2008 at a meeting held in conjunction with the annual meeting of the Botanical Society of America. Those present were asked whether they were in favor of attempting to develop integrated access to specimen information residing in all US herbaria, creating, in essence, a US Virtual Herbarium (USVH). The meeting followed 20+ years of digitizing efforts (primarily databasing) within US herbaria. It had been called because, despite these efforts, there was no evidence of a program to build a national resource that would include all herbaria. Some of those voting had been involved in digitization efforts. Others came looking for help, both financial and technical, in starting the process. At the end of the meeting, all those present endorsed the concept. Thus the project started not in direct response to a national initiative or program but as a statement of interest by those directly involved with herbaria.
The meeting was held under the auspices of the Western Association of Agricultural Experiment Station Directors (WAAESD). Each state has an Agricultural Experiment Station (AES) and their directors work together, regionally and nationally, in areas of joint interest. Although it was AES directors in the western states who sponsored the meeting, the USVH project has always been national in scope. Formally speaking, the purpose of the meeting was to determine whether there was sufficient support to justify WAAESD sponsorship of a 5-year committee to coordinate work towards a single access point to information from all US herbaria. Given the support expressed, formation of the committee was approved.
WAAESD sponsorship provides a formal but flexible structure within which to operate. It does not provide funding; it does provide freedom in determining how best to pursue a group’s objectives. It also provides a mechanism for disseminating information through the National Information Management and Support System (NIMSS). Reports and announcements posted to NIMSS are sent to AES directors in each state as well as to registered participants. Because most herbaria are not connected with AES, the sponsorship by WAAESD immediately increased awareness of herbaria.
The executive committee’s first task was to develop explicit goals for the project. After considerable debate, it agreed that the overall goal of the US Virtual Herbarium project should be digitizing all specimens in all US herbaria. The result will be a major new scientific resource but the greatest benefits will result from working towards this overall goal, a process that will require helping collectors and curators record information in a manner that maximizes the value of a specimen, use the tools being developed for capturing and sharing collection information, and make use of the resulting information in their research, education, and outreach activities. It will also require increasing interaction among those who work in herbaria and educating users in diverse disciplines about the value and use of collection data. Much of the value of the project lies in ensuring that these benefits are experienced by all those involved with herbaria and in teaching students about algal, fungal, and plant diversity.
Herbarium specimens provide a particularly rich information layer to the world’s biodiversity resources because they represent sessile organisms. They show the ability of a taxon to complete its life cycle at a particular location and time and, in some instances, provide information about the prevailing growing condition (see, e.g.,
There are 729 registered herbaria in the US (
About 78% of US herbaria are owned by an academic institution. Academic herbaria, particularly those in smaller institutions, offer excellent opportunities for involving students. Countering this potential is the fact that small herbaria often receive little or no formal support from their institution and may not be actively curated. Of the remaining herbaria, about 13% are owned by a government entity, usually federal but in some cases state, county, or municipal. About 9% are associated with botanical gardens or independent museums; among these are eight of the herbaria with a million or more specimens.
In 2009, Thiers provided Barkworth with a list of US herbaria registered with
In addition to there being many herbaria in the US, there are many different taxonomic opinions, particularly with respect to vascular plants. These are reflected in state and regional floras. There are resources to help interpret the resulting complexity, e.g.,
Regional networks and herbaria in the U.S.A. Network boundaries are guides; herbaria are free to join the network of their choice. Some herbaria contribute records to more than one network. No network has been established as yet for the Great Plains, Great Lakes, and Southern Rocky Mountain Regions. Data obtained June, 2011.
Overview of US regional and taxonomic herbarium networks. The Southwest and Intermountain Regions share a database but have different portals. “Herbaria” indicates the number of herbaria currently providing information to the network; numbers in parentheses are for extra-regional herbaria. “Records” are text-based records. “Geo” is the percentage of georeferenced records. Most data were obtained from web sites or node managers, March 31, 2012.
| Network | URL | Taxonomic scope; location of source herbaria | Herbaria | Records |
|---|---|---|---|---|
| Existing networks | | | | |
| California herbaria (CA) | | Vascular plants; California | 20 (1) | 1,454,000 |
| Pacific Northwest Herbaria (PNW) | | US: Alaska to Oregon + Idaho and Montana. CANADA: British Columbia, Yukon. | 57 | 1,763,040 (174,160 images) |
| Southwest (SEINet) and Intermountain (IRHN) | | US: Southern California east to New Mexico, north to Nevada, Idaho, and Colorado. MEXICO: Baja California, Sonora; Vascular plants. | 32 (2) | 2,069,025 (67% Geo) |
| Pacific Islands (CPH) | | Hawai’i and the Pacific basin [Currently 3 of 15 herbaria connected]; Vascular plants. | 15 | 60,000 |
| Northeast (CNH) | | US: north and east from Pennsylvania. CANADA: Ontario eastward; All taxa. | 58 | 409,883 |
| Southeast (SERNEC) | | From Eastern Texas to Virginia to the Atlantic and Gulf Coasts; All taxa. | 14 | 140,000 |
| Wisconsin Flora | | Wisconsin; Vascular plants, lichens | 8 | 370,000 |
| Alabama Plant Atlas | | Alabama; Vascular plants | 9 | 78,000 |
| Bryophytes | | North America; Bryophytes. | 10 | 922,047 (38% Geo) |
| Lichens | | North America; Lichens. | 16 (1) | 627,756 (55% Geo) |
| Macrofungi | | North America; Macrofungi | 5 | 154,526 (13% Geo) |
| American Myrtaceae | | Myrtaceae in the Americas | 4 | 64,158 (63%) |
• Commitment, energy, time, resources, and funding are the most critical needs of the USVH project. Of these, time is usually the most scarce resource, particularly in smaller herbaria in which a single individual has to fulfill many different functions. It can, of course, be alleviated to some extent by funding but digitization will require a time commitment on the part of the person or persons responsible for a herbarium. Funding for other resources is also needed but much can be done with minimal financial support now that effective software and work flows have been developed, particularly if hardware is shared.
• The range in size of US herbaria (from less than 1000 to over 8 million) and their diverse roles is matched by the diversity of their resources and goals. Many have little or no IT support and little or no budget; others, even some smaller herbaria, have strong IT support, significant endowments, and substantial volunteer support. Goals range from research on a global level to being a reference collection for training of seasonal employees.
• Curators have diverse backgrounds. Most, particularly in mid-sized to large herbaria, are professionally trained taxonomists with memberships in professional societies such as the Botanical Society of America and the American Society of Plant Taxonomists. Others have backgrounds that range from ecology to paleobotany, with their professional associations being equally diverse. This presents a challenge to developing an effective information flow among all herbaria. Regional collaborations on multiple scales are effective in addressing this challenge but require a leader with time to commit to the task.
• There is no best approach for digitizing herbaria; there are multiple effective approaches. The needs and resources of large research herbaria with multiple type specimens and collections from many countries and multiple centuries differ from those of small herbaria serving a forest district or a teaching institution. In working with those in charge of herbaria, one must recognize and respect their differing priorities and resources. Adopting theoretically suboptimal procedures for digitization may be the best procedure if the resources needed for adopting a better procedure are not available.
• Broadening participation requires minimizing barriers while maximizing benefits. Symbiota (
• It does not matter whether a herbarium starts with imaging or databasing. The important thing is to start. Specimen records that consist only of text-based information can be used for generating checklists, georeferencing, and searching. Specimen records that consist only of an image are of little value until the label information is databased but imaging can accelerate databasing and enable offsite-databasing. Establishing both of these, however, requires infrastructure development, both technical and human.
• Remote data entry and incorporation of optical character recognition technology into the data entry process can speed up data entry, but both require access to images which, in turn, requires access to appropriate equipment.
• Integrating optical character recognition (OCR) technology into data entry tools will accelerate data entry for the very large number of specimens with clean, typewritten or computer generated labels but entries need to be reviewed before being accepted. Major obstacles to widespread adoption of OCR-assisted data capture are a) lack of imaging equipment and b) the need to incorporate OCR-assistance into the data entry module of the various database systems used in herbaria, a process that is underway. For interpreting hand-written or unclear labels, OCR is less effective than humans.
• Automated georeferencing tools, such as Geolocate (
• Batch georeferencing, in which multiple specimens with the same locality information are georeferenced simultaneously, greatly accelerates georeferencing. The acceleration is greatest if records from multiple herbaria can be georeferenced simultaneously. Technological impediments to effective batch georeferencing include the absence of a mechanism for sharing specimen records among networks and the need for tools that “repatriate” the georeferencing information back to the specimen records. The human impediments include lack of knowledge as to how to georeference specimens and/or use the tools available for assisting in the task, impediments that can be overcome by workshops and online tutorials. Another impediment is the need for effective management of such collaborations.
• Enabling collectors to enter their collection information directly into a database that can both generate labels and provide data to the databases of recipient herbaria should be given high priority. Ideally, such programs should make it possible to enter information whether offline or online and for multiple taxonomic groups because individuals frequently collect more than one kind of organism. If data are entered offline, it should be possible to clean them when the connection is restored. (see, e.g., Atrium
Label generating tools will not help digitize the specimens currently in herbaria but early adoption of database-driven label production combined with aggressive pursuit of funding opportunities enabled the herbaria of the University of Wyoming and the Missouri Botanical Garden (1.4 and 6.3 million specimens, respectively) to have over 50% of their collections databased by the time of the survey. The only other large US herbarium to have more than 50% of its 950,000 specimens databased is the National Fungus Collection which has 89% of its collections databased, a noteworthy accomplishment.
• Regional collaborations are the most effective method of spreading digitization. They make it easier to share imaging equipment and develop the localized resources (e.g., checklists, identification tools) that give immediate, easily recognized value to regional portals. They also make establishing personal relationships among data providers easier, relationships that subsequently become effective social networks for sharing ideas and information. Development of regional networks is also critical to building the long term, broadly based support required to create and sustain a truly national herbarium network, one that involves all herbaria.
• The map (
• There is often a lag time between agreeing to establish a network and actually having a network that people can use. Herbaria with their own specimen databases need to develop scripts for exporting their data to the network database and ensuring that new and modified records are exported at regular intervals. Constructing and testing these scripts takes time. It may also be found that the existing data has to be cleaned up before being exported. Another source of delays can come from establishing formal memoranda of understanding. Delays are greatest if the herbaria are located in different countries or belong to a private institution. Some networks operate without formal memoranda.
• There is a need for the single, all-embracing network that is being established by iDigBio (see below). At present, herbaria with specimens from different taxonomic groups need to send their data to multiple networks (there are separate networks for bryophytes, lichens, and macrofungi). Moreover, at present regional nodes only provide access to specimens from herbaria within their region, e.g., data for specimens from the northeastern US residing in herbaria of the intermountain region are not currently made available to the northeastern network. The current arrangement also means that users wishing to examine all biodiversity within a region have to go to multiple networks to obtain the information they seek. To maximize the value of a truly integrated network, however, its data must be readily accessible and easily queried not just by biodiversity informatics specialists but also by the general public and educators at all levels and in many different disciplines because it is, ultimately, these people whose support will be required to sustain the network’s maintenance and development.
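The batch georeferencing idea in the list above (georeference each distinct locality once, then "repatriate" the coordinates to every matching record, whichever herbarium holds it) can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration: the record keys (`herbarium`, `catalog`, `county`, `locality`), the normalization rule, and the `lookup` callable, which stands in for whatever georeferencing tool or human review a network actually uses.

```python
from collections import defaultdict


def normalize_locality(text):
    """Lower-case the text and replace punctuation with spaces so that
    near-identical label strings fall into the same group."""
    cleaned = "".join(ch.lower() if ch.isalnum() else " " for ch in text)
    return " ".join(cleaned.split())


def batch_georeference(records, lookup):
    """Georeference each distinct (county, locality) pair once.

    records: list of dicts with the hypothetical keys
             'herbarium', 'catalog', 'county', 'locality'.
    lookup:  hypothetical callable taking (county, locality) and
             returning (lat, lon), or None if it cannot resolve them.
    Returns the number of distinct locality groups found.
    """
    groups = defaultdict(list)
    for rec in records:
        key = (rec["county"].strip().lower(), normalize_locality(rec["locality"]))
        groups[key].append(rec)

    for (county, locality), members in groups.items():
        coords = lookup(county, locality)  # one lookup per group, not per record
        if coords is None:
            continue  # leave unresolved records for manual georeferencing
        for rec in members:
            # "Repatriate" the result to every record in the group,
            # regardless of which herbarium contributed it.
            rec["lat"], rec["lon"] = coords
    return len(groups)
```

In this sketch the saving comes from the grouping step: the number of lookups equals the number of distinct localities rather than the number of specimens, which is why pooling records from several herbaria before georeferencing increases the benefit.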
In February 2010, an NSF-funded workshop brought together individuals with knowledge in different aspects of digitization to discuss how best to develop a national herbarium network. Several useful discussions and contacts resulted from the workshop but that fall the NSF announced its Advancing Digitization of Biological Collections (ADBC) Program. ADBC projects fall into two categories, creation of “a permanent database of digitized information from all biological collections in the U.S. (
These two developments forced us to rethink how the US Virtual Herbarium project could best achieve its objectives, assuming they were still valuable, while complementing the work of ADBC-funded projects. The goals of the US Virtual Herbarium project are similar to those of iDigBio apart from its sole focus on herbaria, but it has a somewhat different emphasis. For iDigBio, extending participation to all collections in the US, both large and small, is a third phase, while for USVH, it is the priority. A recent analysis of the botanical capacity of the US (
The BISON project should provide the access to herbarium records and tools for working with them that were part of the original vision for the US Virtual Herbarium project, at least so far as the US is concerned. It is, however, dependent on the quantity and quality of records made available to it. The USVH project’s primary focus is on helping herbaria both provide the needed records and ensure that they meet the quality standards needed for use in environmental analyses. In doing so, the project will expand the number of individuals who understand the concepts involved and enable interested individuals to obtain data as they become available. Moreover, making information available now has resulted in the herbaria involved receiving feedback concerning some of their specimens, feedback that comes from knowledgeable individuals and will, ultimately, benefit BISON.
Much has been learned about building a herbarium data layer in the US but the majority of herbaria are still not contributing to its development. There are some herbaria that, although digitizing their specimens, do not make the resulting resources available other than on their own network and some that have not started any part of the digitization process. In the latter cases, the problem may be that the herbarium forms a very small part of the responsibilities of the person in charge, or that the person in charge does not know how to start, or that he or she simply does not have the time. Personal contact is often a key step to bringing isolated herbaria into a network. When making such contacts, the benefits that will accrue from membership in a network need to be presented in terms that are relevant to the mission of the herbarium concerned and the person or persons running it. These benefits should, to the maximum extent possible, be immediate and direct. The greatest benefit, without question, is funding but software developments combined with the ability to share resources with and tap into the knowledge of those already in a network have substantially reduced the amount of funding required.
The benefits to medium-sized and smaller herbaria of participating in a regional herbarium network include greater publicity, the ability to show how their specimens contribute to overall knowledge, and a mechanism for identifying where to focus future collecting efforts, all of which help validate their worth to institutional administrators. Participation gives students at academic herbaria an opportunity to take part in a regional and national informatics enterprise while improving the currency of their education. In addition, it helps build professional relationships among individuals who, because of disparate interests and obligations, might not normally connect with each other. Other benefits depend on the resources made available at the network level. These need to benefit a wide range of individuals because it is by offering such benefits that herbaria, and collections in general, earn public support. Such resources can range from quizzes about plants in a grocery store to games in which participants score points for identifying plants from images.
Investment in medium-sized and smaller herbaria can have major impacts on the botanical sciences in the US. These herbaria, their associated curatorial staff and users often provide the experiences that steer students towards the botanical sciences. This is important because a disproportionate number of graduate students come from such institutions. Research intensive universities, state and federal agencies, and non-government organizations are dependent upon these “feeder institutions” to provide a flow of graduate students and professional botanists.
All larger herbaria are digitizing their collections, usually maintaining their own database and web site in addition to participating in one or more networks. If, as is the case in several large herbaria, much of their current research and collection activity lies outside the US, these activities may be most appreciated outside the US but they are essential to attainment of the US Virtual Herbarium’s overall goal, digitization of all specimens in all US herbaria. Large herbaria can benefit from joining a network by becoming the “go-to” herbarium for web-related resources. They are also usually better positioned to attract funding for positions to support a regional network. In addition, contributing records to the region where they are located helps them demonstrate that they are “good neighbors” which may assist them in obtaining benefits from the jurisdiction in which they lie.
An area that still needs improvement is building the bridges needed for sharing ideas, information, and concepts between those directly responsible for herbaria and those with specialized knowledge in areas relating to digitization and use of the flood of information it is providing. There are many such areas: biodiversity informatics, information technology, computer science, geography, and education. Working with specialists in these areas will develop a richness and synergy that benefits all involved. The US Virtual Herbarium project can help extend the benefits of such interactions throughout the herbarium community. Among these benefits are increased efficiency in herbarium management which will, ultimately, free up the time of those involved for research and educational activities. Developing these interactions requires that all involved respect each other’s different backgrounds, obligations, interests, and knowledge.
What of the immediate future? There are several steps that the USVH project plans to take. Regional consortia or networks are extremely beneficial in helping move multiple herbaria forward, but some parts of the country have, as yet, no effective network. One of our immediate targets is to facilitate linking all herbaria to a regional network. This can be accomplished either by expanding the region covered by an existing network, possibly with separate portals for subregions (e.g., SEINet and IRHN), or by creating new networks. Both scenarios will require acquisition of additional server space and support personnel.
Georeferencing vastly increases the value of collection records and enables searches across space which may be more relevant to some research questions than searches across taxa (
Data cleaning is another aspect that has, as yet, received surprisingly little attention from herbarium networks. The primary reason may be that the focus is on obtaining records and engaging herbaria, but there are now enough records in each network that building mechanisms for routinely identifying problems is highly desirable. These should be run at the herbarium level with cleaning at the regional level being a second line of defense. The need is for tools that check that georeference and elevation data are at least consistent with the lowest political unit used (usually county for the US, often state for other countries). The scientific name used must also be checked for accuracy because some herbaria may have recorded data in databases (or spreadsheets) without verifying that the names entered were valid. Another check, one that is probably best combined with georeferencing, is for the spelling of place names. Some will be found to be phonetic renditions (Chian for Cheyenne); others are merely misspellings.
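The kinds of checks described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a description of any network's tooling: the record keys (`state`, `county`, `lat`, `lon`, `place`), the bounding box (rough illustrative numbers, not surveyed boundaries), and the short place-name list are all hypothetical, and a real check would use county polygons and a proper gazetteer. Spelling is compared with `difflib.get_close_matches` from the Python standard library.

```python
import difflib

# Hypothetical reference data: (min_lat, max_lat, min_lon, max_lon) per county.
COUNTY_BOUNDS = {
    ("Wyoming", "Laramie"): (40.9, 41.7, -105.3, -104.0),
}
# Hypothetical gazetteer of accepted place-name spellings.
KNOWN_PLACES = ["Cheyenne", "Pine Bluffs", "Burns"]


def check_record(rec):
    """Return a list of human-readable problems found in one specimen record."""
    problems = []

    # Check 1: coordinates must be consistent with the lowest political unit.
    bounds = COUNTY_BOUNDS.get((rec["state"], rec["county"]))
    if bounds is None:
        problems.append("county not in reference list")
    elif rec.get("lat") is not None and rec.get("lon") is not None:
        min_lat, max_lat, min_lon, max_lon = bounds
        inside = min_lat <= rec["lat"] <= max_lat and min_lon <= rec["lon"] <= max_lon
        if not inside:
            problems.append("coordinates fall outside county")

    # Check 2: place-name spelling against the gazetteer.
    place = rec.get("place")
    if place and place not in KNOWN_PLACES:
        close = difflib.get_close_matches(place, KNOWN_PLACES, n=1, cutoff=0.8)
        if close:
            problems.append(f"place name '{place}' may be a misspelling of '{close[0]}'")
        else:
            problems.append(f"place name '{place}' not recognized")
    return problems
```

A string-similarity cutoff of this kind catches simple misspellings but not phonetic renditions such as "Chian", which are too dissimilar character by character; those would be flagged only as unrecognized and left for a person, or a phonetic matching method, to resolve. A scientific-name check could follow the same pattern against a names list, which this sketch omits.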
Crowd-sourcing of data capture is already being explored in the US and elsewhere. It is not yet clear how many volunteers can be found to take a short, online training session and then enter data for herbarium specimens online, or whether it is best to focus on identifying and capturing critical data, leaving capture of the remaining data to a later stage, or to try to capture all data at once. As with so many other decisions, there are pros and cons to both approaches. It is important, however, that we are transparent in reporting our accomplishments. Capturing a few fields from a million labels is not the same as capturing all label information from a million records.
Taxpayer funds, whether federal, state, or local, will not cover the cost of digitizing herbaria and maintaining herbarium networks. We must aggressively pursue other funding opportunities, including some that most of us involved with herbaria do not normally approach, such as wealthy individuals with an interest in the environment and stores that sell equipment and clothing to people who enjoy hiking. “We” in this case involves all in charge of herbaria but the approach each person takes has to reflect their abilities and interests as well as those of the herbarium for which they are responsible. It should also complement their other responsibilities (and conform to their institution’s guidelines). The US Virtual Herbarium project can help by disseminating information about successful approaches, developing templates, and seeking funds that will benefit multiple herbaria or networks.
Requests for financial support are more likely to be well received if it can be demonstrated that they will result in a product that benefits many user groups. To encourage use of the information available through existing herbarium networks, we need to work with K-12 educators to develop units that make use of network associated information while meeting state and national science standards. We must also work with state native plant societies, recognizing their value and asking their assistance in promoting use of our networks and their further development. We also need to make sure that government employees are aware of the information being made available, emphasizing its value in their work and to their constituents. And in all these interactions, we must not forget to ask what would make the resources we are developing more useful.
In addition to seeking funding from new sources, all those involved in herbaria must keep looking at work flows to see if they can be made more efficient. Sometimes simple changes, such as using preprinted barcodes to put a catalog number on a specimen rather than using a stamping machine, can save considerable time, time that can be used for other purposes. Another possible change is to enable and expect those who borrow specimens to enter their information into the owner’s database or into a regional database from which the owning herbarium could import the records and images. Since almost anyone borrowing specimens nowadays enters information from them into a database, this would require little additional work for the borrower but would greatly aid the lending institution.
Sustaining the networks also requires maintaining the integrity of the data over time. The costs of doing so are non-trivial because, as
The number and distribution of herbaria in the US, together with the number of specimens they house, make them a prime resource for research in many different disciplines. Providing access to their information will enable sophisticated analyses at levels of scale, scope and accuracy that are unparalleled in the life sciences. It can also be used to introduce and encourage a fascination with plants, fungi, and algae by students at all levels in ways that incorporate inquiry. Digitizing herbaria will also give those who work in herbaria more opportunities to study the organisms they love, and their interactions, by increasing the ease with which diverse user groups can access herbarium-based information without assistance from herbarium personnel.
The impediments to achieving the goal of the US Virtual Herbarium project, digitizing all specimens in all US herbaria, are resource-based, but they can be offset by focusing on the human factor. The project is dedicated to unlocking the vast resource represented by herbarium specimens by assisting in development of the human and knowledge infrastructure needed. It is accomplishing this task by linking people, ideas and tools into an integrated whole. Much of this involves extending the tools, knowledge, and resources developed by funded projects to more herbaria by establishing connections among people with the varied skills and interests needed, thereby building an integrated community of people working towards a common goal.
We thank all the herbarium curators who responded to the survey and Ben Legler for preparing
This shows the questions asked. It is not the original form; that had a lot more blank space. The survey was kept short out of respect for the respondents’ time.
Herbarium Code: ________________________________________________
Specimen total (estimate): _________________________________________
Number of specimens databased: ___________________________________
Number of specimens imaged:______________________________________
URL for searching database: _______________________________________
URL of regional node through which data are available: __________________
Other nodes through which your specimen data are available: _______________
Herbarium Name:________________________________________________
Department:____________________________________________________
Address 1: _____________________________________________________
Address 2: _____________________________________________________
City: ________________________ Zip Code: ________________________
Phone: ________________________________________________________
PO Box:_________________________ Mail Stop:______________________
Lat.:_________________________ Lon.:_____________________________
Name of contact person: ___________________________________________
Email of contact person:___________________________________________
Taxonomic focus:________________________________________________
Geographic focus:________________________________________________
This is a listing of all web sites mentioned in the text and a brief synopsis of their significance to the paper.
Alabama Plant Atlas: Provides information about plants in Alabama, including information derived from several herbarium databases.
Algaebase: AlgaeBase is a database of information on algae that includes terrestrial, marine and freshwater organisms.
Apiary: Program for enabling capture of collection data in the field.
Atrium: Technology for managing diverse biodiversity data.
Consortium of California Herbaria: State herbarium network.
Consortium of North American Bryophyte Herbaria: Taxonomically focused herbarium network.
Consortium of North American Lichen Herbaria: Taxonomically focused herbarium network.
Consortium of Pacific Northwest Herbaria: Regionally focused herbarium network.
Cooperative Taxonomic resource for American
Index fungorum: Synonymized list of fungal names.
Institute for Museum and Library Services (IMLS): US federal agency that has funded some of the work described.
Intermountain Region Herbarium Network: Regionally focused herbarium network. Shares database with SEINet.
International Plant Names Index (IPNI): List of plant names and an indication of whether or not they are valid. Only shows nomenclatural synonyms.
Mycoportal: Taxonomically focused herbarium network.
National Information Management and Support System (NIMSS): Information system that serves the Agricultural Experiment Stations and the Extension Service in each state.
National Science Foundation (NSF): US federal agency that has funded much of the work described.
SERNEC: Regional network for strengthening communication and promoting data sharing among herbaria, now also serving as a regional herbarium network.
SourceForge: Web site that provides access to open source software.
Southwestern Environmental Information Network (SEINet): Regionally focused herbarium network. Herbaria in the Intermountain Region share data with this network.
Symbiota: Open source software for promoting collaboration and data sharing among herbaria.
Tropicos: Nomenclatural resource for bryophytes and vascular plants that shows how a name has been treated in different publications. Also the specimen database of the Missouri Botanical Garden.
US Virtual Herbarium (USVH): Project for promoting digitization in US Herbaria. This web site is not being maintained because of funding decisions by the US government. Arrangements are being made to move it, or something similar, to another site.
Utah State University Herbarium: Provides access to the results of the 2012 herbarium survey.
WisFlora: Provides information about plants in Wisconsin, including information derived from several herbarium databases.
Presented below are the questions asked on the 2012 survey. To save space, only the questions asked about digitization are shown. For more information, see
About how many specimens are there in your herbarium? Please provide a single number, not separate estimates for different kinds of specimens.
How many specimens in your collection have been at least partially databased?
How many specimens have been fully databased (you may answer unknown)?
How many of your
How many of your
The next questions ask about the web site(s) through which your specimen information is available. If your database cannot be searched via a web site, you have finished the survey. Thank you for taking the time to complete it. If you wish to make a comment or suggestion, please use the space at the end. Handwritten comments are welcome.
If your records are searchable via an institutional web site, what is its URL?
If your records are searchable via one or more regional websites, what are their URLs?
If your records are searchable via one or more taxonomically focused web sites, what are their URLs?
If you provide searchable access to your records through a regional web site that lies primarily outside the US, please indicate the focus of the site(s) and its(their) URL(s).
YOUR Comments: