D-Lib Magazine
March 2003

Volume 9 Number 3

ISSN 1082-9873

Examples of Practical Digital Libraries

Collections Built Internationally Using Greenstone

 

Ian H. Witten
New Zealand Digital Library Project
University of Waikato, New Zealand
<ihw@cs.waikato.ac.nz>

The Greenstone Digital Library Software [1] provides a way of building and distributing digital library collections, opening up new possibilities for organizing information and making it available over the Internet or on CD-ROM [Witten and Bainbridge, 2003]. Produced by the New Zealand Digital Library project [2], Greenstone is intended to lower the bar for construction of practical digital libraries, yet at the same time leave a great deal of flexibility in the hands of the user.

In accordance with the maxim "simple things should be easy, complex things should be possible," new users can quickly put together standard-looking collections from a set of source documents that may be HTML, Word, PDF, or many other formats [Witten et al., 2001]. Given an existing collection, it is easy to clone its structure and populate a structurally identical copy with entirely new documents, provided they are in the same formats as those in the existing collection. A more committed user who studies the options that Greenstone offers can personalize the digital library system and create new kinds of collections that take advantage of available metadata to provide different kinds of browsing facilities, which are akin to different perspectives on the collection. Users with programming skills can extend the system by adding modular units called "plugins" that accommodate new document and metadata formats, and new browsing and document access facilities.

Greenstone has been used to make many digital library collections. Some were created within the New Zealand Digital Library as demonstration collections. However, the use of Greenstone internationally is growing rapidly, and several web sites show collections created by external users. Most contain unusual and interesting material, presented in novel and imaginative ways. This article briefly reviews a selection of Greenstone digital library sites to give a feeling for how Greenstone is being used for public digital libraries throughout the world. Examples are given:

  • from different countries (China, Germany, India, Russia, the UK, and the US);
  • from different kinds of library (historical, educational, cultural, and research);
  • with different sorts of source material (text, document images, pictures, and voice).

There are many more examples of Greenstone in other places. This is just a small sampling.

Examples of Digital Libraries

China: Peking University digital library

Site: <http://162.105.138.23/tapian/tp.htm>

Figure 1.1: Home Page

Figure 1.2: Illustrative Collage

An experimental collection was created by the Chinese Department at Peking University with the assistance of a New Zealand Digital Library project member who visited there some years ago. The collection contains rubbings of Tang Dynasty poetry, whose originals were carved into wood or stone. These are collections of images, but the text has been hand-entered into electronic form. The entire interface is in Chinese, and, like all Greenstone collections, is fully searchable.

Non-alphabetic languages present interesting problems because they require different techniques for presenting ordered lists of titles. Chinese has no single universally used way of ordering text strings analogous to alphabetic ordering in European languages. Several different ordering schemes are used as the basis of printed dictionaries and telephone directories, and browsers for these languages therefore call for special design (see [Witten and Bainbridge, 2003] for further discussion). This is a good illustration of the open-ended nature of digital library requirements.
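To make the ordering problem concrete, here is a minimal Python sketch that sorts the same four single-character titles in two ways: by romanized (pinyin) spelling and by stroke count. The lookup table and the function names are assumptions made for this illustration; they are not part of Greenstone, and a real browser would draw on a full dictionary resource.

```python
# Hypothetical lookup table for four common surnames. A real browser would
# take pinyin spellings and stroke counts from a complete dictionary resource.
TITLES = {
    "丁": {"pinyin": "ding", "strokes": 2},
    "王": {"pinyin": "wang", "strokes": 4},
    "李": {"pinyin": "li", "strokes": 7},
    "林": {"pinyin": "lin", "strokes": 8},
}


def order_by_pinyin(titles):
    """Order titles alphabetically by their romanized spelling."""
    return sorted(titles, key=lambda t: TITLES[t]["pinyin"])


def order_by_strokes(titles):
    """Order titles by stroke count, breaking ties by code point."""
    return sorted(titles, key=lambda t: (TITLES[t]["strokes"], t))


names = ["林", "王", "丁", "李"]
by_pinyin = order_by_pinyin(names)
by_strokes = order_by_strokes(names)
print(by_pinyin)
print(by_strokes)
```

The two orderings disagree on where 王 belongs, which is why a title browser for such a collection has to commit to one scheme or offer several.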

Germany: Digitale Bibliothek Information und Medien

Site: <http://diana.iuk.hdm-stuttgart.de/digbib/gsdl/cgi-bin/library>

Figure 2.1: Home Page

This "Information and Media" digital library, created by the University of Applied Sciences, Stuttgart, Germany, includes three collections: a bibliography, full-text documents about digital libraries and related topics, and technical documentation on the Linux operating system.

While most digital libraries give every appearance of being unmanned, this one advertises an administrator and helpdesk, both reachable by email.

The bibliography collection has full-text-searchable author and title indexes, but does not permit combined searches on multiple metadata fields. However, Greenstone does have a "form search" option that provides the facilities normally available with conventional bibliographic metadata search: a "simple" version that implements an AND operation between one or more fields, and an "advanced" version that provides general Boolean combination (and also allows stemming and case-folding to be controlled individually for each field). Greenstone has plugin import options for numerous metadata formats, including OAI, MARC, BibTeX, and Refer.
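The difference between the two kinds of form search can be illustrated with a small Python sketch. It is not Greenstone's implementation: the record fields and the function names are assumptions for illustration, stemming is left out, and the "advanced" search is modeled simply as an arbitrary Boolean predicate over per-field matches.

```python
# Toy bibliographic records; the field names are assumptions for this sketch.
RECORDS = [
    {"author": "Witten", "title": "How to Build a Digital Library"},
    {"author": "Witten", "title": "Managing Gigabytes"},
    {"author": "Bainbridge", "title": "Digital Music Libraries"},
]


def field_matches(record, field, term, casefold=True):
    """Return True if `term` occurs as a whole word in the given field."""
    words = record.get(field, "").split()
    if casefold:
        return term.lower() in [w.lower() for w in words]
    return term in words


def simple_form_search(records, query, casefold=True):
    """AND together one term per field, as in a 'simple' form search."""
    return [r for r in records
            if all(field_matches(r, f, t, casefold) for f, t in query.items())]


def advanced_form_search(records, predicate):
    """Apply an arbitrary Boolean combination of per-field tests."""
    return [r for r in records if predicate(r)]


hits_and = simple_form_search(RECORDS, {"author": "witten", "title": "digital"})

# "digital" in the title (case-folded) AND NOT "Witten" as author (exact case).
hits_bool = advanced_form_search(
    RECORDS,
    lambda r: field_matches(r, "title", "digital")
    and not field_matches(r, "author", "Witten", casefold=False),
)
print([r["title"] for r in hits_and])
print([r["title"] for r in hits_bool])
```

Passing `casefold` per call mirrors the idea that case-folding can be switched on or off individually for each field.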

Russia: Mari El Republic government information

Site: <http://gov.mari.ru/gsdl/cgi/library>

Figure 3.1: Home Page

Figure 3.2: Sample Page

The regional government department in the Mari El Republic of the Russian Federation has built several Russian-language collections. Most of the collections are administrative, but one contains folk tales. This site is interesting because the site operators, working alone and on their own initiative, added a Russian-language interface to Greenstone, which at the time already offered several other user interface languages. Since then, interfaces in languages such as Hebrew and Indonesian have been added to the standard list. The current list includes Arabic, Chinese, Dutch, English, French, German, Hebrew, Indonesian, Italian, Maori, Portuguese, Russian, Spanish, and Turkish.

India: Indian language demonstrations

Site: <http://144.16.72.189/cgi-bin/library.exe>

Figure 4.1: Home Page

Figure 4.2: First Sample Page

Figure 4.3: Second Sample Page

Indian languages are particularly difficult to deal with because operating system (Windows) support for them tends to lag behind that for other widely used languages. The Indian Institute of Science at Bangalore has built a demonstration collection that gives examples in both Hindi (written in the Devanagari script) and Kannada. Viewing these properly requires downloading special fonts; the sample pages give an example of each.

India: Archives of Indian Labour

Site: <http://www.indialabourarchives.org/>

Figure 5.1: Home Page

A collaborative project between the National Labour Institute and the Association of Indian Labour Historians, the Archives of Indian Labour are dedicated to preserving and making accessible historic documents on the Indian working class. This library, which is in the English language, contains collections on the All India Trade Union Congress (1928-96), the Commission on Labour (1930-1991), an Oral History of the Labour Movement in India, and special collections on key events in India's labour history.

United Kingdom: Gresham College Archives

Site: <http://www.gresham.ac.uk/greenstone/frameset.html>

Figure 6.1: Home Page

This collection includes all lectures given at Gresham College, London, from 1987, along with many other special publications, such as the Brief History of Gresham College (1597-1997). It is divided manually into the various subjects covered by the College. The collection is also issued on a standalone Greenstone CD-ROM that self-installs on any Windows computer and is accessed through a Web browser in exactly the same way as the online version.

United Kingdom: Kids' digital library

Site: temporarily unavailable.

Figure 7.1: Home Page

A project at Middlesex University has been experimenting with a "Kids' digital library," deployed in a school in North London. Children can submit stories and poems to the library, which contains a collection of their work. Teachers can monitor submissions before they are incorporated. This project has involved significant changes to Greenstone at the coding level, which is possible because Greenstone is open source software.

United States: New York Botanical Garden

Site: <http://image2.nybg.org/cgi-bin/nybg.exe>

Figure 8.1: Home Page

Figure 8.2: First Sample Page

Figure 8.3: Second Sample Page

The LuEsther T. Mertz Library has begun to digitize and make Web-accessible three rare 19th-century works on American trees by French botanists André and François André Michaux. This eight-volume collection of three important illustrated botanical books reflects the early investigation of the flora of North America by botanists who were seeking new plants for commerce and horticulture. It contains many gorgeous full-color plates.

United States: Aladin digital library

Site: <http://www.aladin.wrlc.org/gsdl/>

Figure 9.1: Home Page

This site contains digital material from the special collections of the seven universities of the Washington Research Library Consortium in Washington D.C. There are presently four collections. The first contains documents recording the foundation and day-to-day operation of the American National Theatre and Academy. The second has images of deeds, certificates, brochures, studies, reports, and correspondence documenting the history of Reston, Virginia. The third has one hundred illustrations produced for Harper's Weekly during 1861-1865 that relate specifically to the Commonwealth of Virginia's involvement in the Civil War. The fourth is a predominantly audio collection that gives recordings of interviews conducted by Felix Grant of jazz and blues artists. For copyright reasons, access to the digital audio files is restricted to members of the Washington Research Library Consortium.

United States: Center for the Study of Digital Libraries

Site: <http://botany.cs.tamu.edu/greenstone>

Figure 10.1: Home Page

This digital libraries research site at Texas A&M University has an emphasis on digital floras — collections of digital images of plants. There are several prototype Greenstone collections containing numerous plant images, classified according to a family tree, and a separate collection with detailed biological descriptions of plants. Good use is made of Greenstone's hierarchical browsing facilities to allow access through standard biological taxonomic structures.
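As a rough illustration of browsing through a taxonomic structure, the Python sketch below groups image records into a nested hierarchy keyed by family, genus, and species. The file names, the `_items` key, and the `build_hierarchy` function are hypothetical; the sketch does not reproduce how Greenstone's classifiers are actually configured.

```python
def build_hierarchy(items):
    """Group item names into a nested dict keyed by classification path."""
    root = {}
    for name, path in items:
        node = root
        for level in path:
            # Descend one taxonomic level, creating the node if needed.
            node = node.setdefault(level, {})
        # Items are stored under a reserved key at the deepest level.
        node.setdefault("_items", []).append(name)
    return root


# Hypothetical image records: (file name, (family, genus, species)).
IMAGES = [
    ("oak_leaf.jpg", ("Fagaceae", "Quercus", "Quercus alba")),
    ("oak_bark.jpg", ("Fagaceae", "Quercus", "Quercus alba")),
    ("beech.jpg", ("Fagaceae", "Fagus", "Fagus grandifolia")),
]

tree = build_hierarchy(IMAGES)
print(sorted(tree["Fagaceae"]))
print(tree["Fagaceae"]["Quercus"]["Quercus alba"]["_items"])
```

A browsing interface can then present each dictionary level as one page of the hierarchy, with the stored items shown at the leaves.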

United States: Project Gutenberg

Site: <http://public.ibiblio.org/gsdl/cgi-bin/library.cgi?a=p&p=about&c=gberg>

Figure 11.1: Home Page

An ongoing project to produce and distribute free electronic editions of literature, Gutenberg now contains more than 3,700 titles from Shakespeare to Dickens to the Brontë sisters. This site, maintained by Ibiblio (one of the original Gutenberg mirror sites), uses Greenstone to make the entire collection available in fully searchable form. Access to large collections through full-text search is simple, fast, requires no metadata, and scales easily to massive amounts of text.
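The claim that full-text search needs no metadata can be seen in a minimal inverted-index sketch in Python. This is a generic textbook technique, not Greenstone's own indexing engine; the two sample snippets and all names are chosen only for illustration.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split text into lowercase alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Map every word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents containing every word of the query."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Two short sample snippets standing in for full books.
DOCS = {
    "hamlet": "To be, or not to be, that is the question",
    "twist": "Please, sir, I want some more",
}

index = build_index(DOCS)
print(search(index, "the question"))
print(search(index, "more question"))
```

The index is built from the text alone, so adding a document requires nothing beyond the document itself.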

United States: Lehigh University Library

Site: <http://bridges.lib.lehigh.edu/cgi-bin/library>

Figure 12.1: Home Page

Figure 12.2: First Sample Page

Figure 12.3: Second Sample Page

This site contains two collections: one of ancient Chinese vessels, and another called "Bridges of the 19th Century" that contains thirty monographs, manuals, and documents on American bridge engineering from the library's Special Collections. The latter collection is noteworthy for the care and attention that has been paid to designing the pages, and for the way in which the default Greenstone design has been overridden, as the sample pages show.

United States: Pictures of the world

Site: <http://tuatara.ucr.edu/gsdl-bin/library?a=p&p=about&c=pictures>

Figure 13.1: Home Page

This is a personal collection of photographs, made available by Gordon Paynter, that presents a rich set of searching and browsing options: by date, place, title, and reel of film. Although the collection is quite small at present, there are virtually no limits to how large it can grow, because its structure is based throughout on metadata, which is quite small in volume. During early testing, Greenstone was used to build collections of over 10 million relatively lengthy metadata items (in the form of MARC records) without any problems arising.

The metadata for these photographs was entered in a succinct XML format that allows multiple metadata assignments to the same item, and allows a single metadata assignment to apply to several items. The directory hierarchy containing the source files, or the filename conventions, can be used to assign the same metadata value simultaneously to large numbers of files. When new metadata values are assigned to an item, any previous values can either be kept and added to, or ignored.
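The following Python sketch shows one way directory-level assignment could work: each rule attaches a metadata value to a directory, every file beneath that directory inherits it, and a mode decides whether a new value is added to earlier ones or replaces them. The rule format, the mode names, the paths, and the place names are all assumptions made for this illustration, not a description of the XML format used for the collection.

```python
from pathlib import PurePosixPath


def resolve_metadata(file_path, assignments):
    """Collect metadata for a file from rules on its ancestor directories.

    `assignments` is a list of (directory, field, value, mode) tuples.
    Rules are applied from the outermost directory inwards; `mode` is
    "accumulate" (add to earlier values) or "override" (discard them).
    """
    path = PurePosixPath(file_path)
    # parents runs innermost-first, so reverse it to start at the top.
    ancestors = [str(p) for p in list(path.parents)[::-1]]
    metadata = {}
    for directory in ancestors:
        for rule_dir, field, value, mode in assignments:
            if rule_dir != directory:
                continue
            if mode == "accumulate":
                metadata.setdefault(field, []).append(value)
            else:
                metadata[field] = [value]
    return metadata


# Hypothetical rules for a photo archive organized by country and reel.
RULES = [
    ("photos", "Collection", "Pictures of the world", "override"),
    ("photos/nz", "Place", "New Zealand", "accumulate"),
    ("photos/nz/reel1", "Place", "Hamilton", "accumulate"),
    ("photos/nz/reel2", "Place", "Auckland", "override"),
]

first = resolve_metadata("photos/nz/reel1/img01.jpg", RULES)
second = resolve_metadata("photos/nz/reel2/img07.jpg", RULES)
print(first)
print(second)
```

One rule on `photos` reaches every file in the archive, while the two reel directories show the difference between adding to and discarding an inherited value.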

United States: Mercy Corps

Site: Not available.

Figure 14.1: Sample Page

The Mercy Corps, based in Portland, Oregon, and with operations in about thirty of the world's most unstable countries, is using Greenstone to organize its extensive collection of in-house documents, manuals, forms, and memos. This is not a public site. However, it is especially noteworthy because the Mercy Corps has made significant enhancements to Greenstone to support a workflow for new acquisitions to the library. Field offices submit new documents by filling out metadata on a simple web-based form and attaching the document. Each submission arrives in the in-tray of a central librarian, who checks it for correctness and integrity before finally including the metadata in the appropriate collection. Collections, rebuilt automatically every night, are available on the web for in-house use and are written at regular intervals to CD-ROM for physical distribution.

The sample page shows a particular collection, displayed in a frame that contains the standard Greenstone interface.

New Zealand: New Zealand Digital Library

Site: <http://nzdl.org>

Figure 15.1: Home Page

Figure 15.2: Continuation of Home Page

The New Zealand Digital Library website shows several dozen demonstration collections built by project staff. Some, such as the Musical Digital Library and First Aid in Pictures, highlight particularly unusual capabilities.

The Musical Digital Library subsection offers several innovative collections that involve music retrieval by singing or humming a snatch of the desired tune. In some collections, text search can be combined with melody matching to yield a more comprehensive search technique.

First Aid in Pictures is a collection designed for illiterate users: it presents purely pictorial, diagrammatic information on basic first aid. All the indexing mechanisms are also purely visual. Explanatory text can be displayed at the bottom of each page and spoken by a voice synthesizer.

International: Humanitarian collections

Site: <http://humaninfo.org>

Figure 16.1: CD-ROM

Figure 16.2: Sample Home Page

Greenstone is being used to deliver humanitarian and related information in developing countries on CD-ROM [Witten et al., 2002]. There are about twenty different collections from organizations such as the United Nations University, United Nations Food and Agriculture Organization, World Health Organization, Pan American Health Organization, GTZ, United Nations Development Programme, and UNAIDS. Many of these are produced by Human Info, a small NGO in Belgium, in conjunction with an OCR service bureau in Romania.

The image shows the CD-ROM cover of one of the collections, and the sample page shows the home page of the Medical and Health Library.

International: UNESCO project

Site: <http://www.unesco.org/webworld/build_info/gct/bestpractices/anthologies.shtml>

Figure 17.1: CD-ROM

UNESCO is participating in developing and distributing Greenstone. Digital libraries are radically reforming how information is disseminated and acquired in UNESCO's partner communities and institutions in the fields of education, science, and culture around the world, particularly in developing countries. The Greenstone project is an international cooperative effort with UNESCO established in August 2000. This initiative will encourage the effective deployment of digital libraries to share information and place it in the public domain. The software is distributed on a CD-ROM (the cover of which is shown in the sample image above). The current version, released in August 2002, is monolingual (English), but a trilingual version is about to be released with all interfaces, help text, documentation (several hundred pages of user guides and developer manuals), installation instructions, and readme files in English, French, and Spanish.

International: Global Library Services Network

Site: <http://www.glsn.com>

Figure 18.1: First Sample Page (central)

Figure 18.2: Second Sample Page (central)

Figure 18.3: Sample Page (local)

Figure 18.4: Access Panel (local)

GLSN provides remote communities with access to digital libraries for use offline. It implements an architecture and infrastructure to allow large-scale, non-networked digital libraries in remote places to be acquired, installed, and updated on a commercial (but low-cost) basis. GLSN makes arrangements with information providers, both commercial and non-commercial, to populate the collections. There are many freely available demonstration collections, mostly on health topics such as Adolescent Health, Asthma, and Chinese Medicine, to name just a few.

The first two sample images show the library catalog at the central GLSN site, and a user selecting a particular collection (on Malaria) to download. The next shows the user examining the library on their local computer after downloading and installing the software, while the last shows the GLSN main panel that gives access to the local collections ("Personal knowledge network") and the central library.

GLSN uses Greenstone at its core. GLSN has extended the basic facilities of Greenstone with an interactive web-based interface for selecting documents and gathering metadata.

Conclusions

The examples above show a wide variety of digital library types — and they are by no means exhaustive. We know of many other Greenstone collections in countries from Canada to South Africa, some of which have unusual features such as collections optimized for viewing on small-screen handheld devices.

The last four examples represent institutional rather than individual users. Each institution has large numbers of different collections. The New Zealand Digital Library, which originated Greenstone, offers scores of collections and represents the cutting edge of digital library research using Greenstone as a vehicle for dissemination. The Humanitarian collections involve a huge ancillary effort in digitizing thousands of books, reports, and other documents for inclusion on Greenstone CD-ROMs, and a vast distribution mechanism — fifty thousand copies are distributed annually, of which 60% are provided free. The UNESCO project distributes not information collections themselves but the capacity to build new information collections, which is a more effective strategy for sustained long-term human development. Finally, Global Library Services Network is a large-scale commercial application of Greenstone aimed at the educational and health sectors, particularly in remote regions.

Virtually every new collection involves its own idiosyncratic requirements. Consequently, those building digital libraries need constant access to advice and assistance from others in order to continue to learn how to tailor the software to meet ever-changing requirements. There is a lively email discussion group for assistance with Greenstone; the group is made up of participants from over 40 different countries. The software itself is being downloaded an average of 1,500 times per month.

Truly, the world of practical digital libraries is burgeoning. The time has come to stop talking about digital libraries and get on with building them!

Acknowledgements

The Greenstone Digital Library software has grown out of the stimulating research environment of the New Zealand Digital Library project, and I would like to acknowledge the profound influence of all project members.

Notes

[1] Greenstone Digital Library Software. Available from <http://www.greenstone.org>.

[2] New Zealand Digital Library project. Available at <http://nzdl.org>.

References

[Witten et al., 2001] Witten, I.H., Bainbridge, D. and Boddie, S. (2001). "Power to the people: end-user building of digital library collections." Proc. ACM Digital Libraries, Roanoke, VA.

[Witten et al., 2002] Witten, I.H., Loots, M., Trujillo, N.F. and Bainbridge, D. (2002). "The promise of digital libraries in developing countries," The Electronic Library, Vol. 20, No. 1, pp. 7-13.

[Witten and Bainbridge, 2003] Witten, I.H. and Bainbridge, D. (2003). How to Build a Digital Library. Morgan Kaufmann, San Francisco, CA.

Copyright © Ian H. Witten

DOI: 10.1045/march2003-witten