The Magazine of Digital Library Research

D-Lib Magazine

July/August 2016
Volume 22, Number 7/8

 

Deploying Islandora as a Digital Repository Platform: a Multifaceted Experience at the University of Denver Libraries

Shea-Tinn Yeh, Fernando Reyes, Jeff Rynhart, Philip Bain
University of Denver
{sheila.yeh, fernando.reyes, jeff.rynhart, philip.bain}@du.edu

DOI: 10.1045/july2016-yeh

 


 

Abstract

The purpose of a highly integrated software framework such as Islandora is to support as many digital repository workflows as possible within a single ecosystem. The Library Technology Department at the University of Denver was tasked with implementing the open-source Islandora framework for its Special Collections Department because the current host was being retired. Although Islandora's front end is tailored for librarians, its back end is complex and built upon many subsystems. A failure in any one of them can set off a chain reaction that obscures the root cause of the problem. Although product documentation and community support channels exist, many of the problems we faced were specific to our hardware and software configuration. The development team had to learn fast and be innovative, agile, and systematic to work with such a complicated system. This article describes the tactics used in this repository development effort, as well as the library's management of stakeholder relationships. We believe our experience will be useful to administrators, librarians, and developers, and will help them better understand the many facets of an in-house, open-source digital repository development project.

Keywords: Digital Repository, Islandora, Open-source Software, Systems Development

 

1 Introduction

As a member of the Colorado Alliance of Research Libraries (the Alliance), the University of Denver (DU) participated in the Alliance Digital Repository (ADR) collaborative from October 2006 to August 2015. [1] The repository was originally built on the Flexible Extensible Digital Object Repository Architecture (Fedora) and the Fez open-source software (OSS). Because that platform could not scale and the Fez community had limited resources to expand its functionality, ADR was migrated to Islandora in June 2012. Islandora is OSS built on Fedora, the Drupal content-management framework, and an array of smaller applications. For nearly three years, DU's digital repository was hosted on Islandora 6 by the Alliance, an early Islandora adopter. However, because of various organizational and budgetary constraints, it was decided that the ADR collaborative would be dissolved, forcing its members to re-evaluate their repository alternatives and migrate their content by the end of August 2015, leaving only five months to complete the task.

At the time, the market offered many digital repository platforms, including open-source options such as DSpace, Hydra, and Islandora, and proprietary solutions such as Digital Commons and Artstor. The immediate questions were: (1) which is the best option for the new home of the University's special collections? and (2) how do we make a rational decision under such a tight time constraint? In addition to attending product demos, the stakeholders used a decision-matrix analysis to make the final decision. The matrix is built in a spreadsheet by first identifying the key factors (e.g. cost, preservation readiness, user interface) and assigning a score to each factor; a simple calculation then combines the scores so that the choices can be compared on a weighted basis (Figure 1). In the end, the library selected Islandora and migrated more than 38,000 objects in more than 70 special collections from ADR to an in-house Islandora 7.x-1.5 platform within the five months available.
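The weighted calculation behind a decision matrix can be sketched in a few lines of Python. The factor names echo the examples above, but the weights, scores, and option names are hypothetical placeholders, not the values in Figure 1. The sketch assumes every option is scored on every factor and that each score is multiplied by its factor's weight before summing; integer weights (percentages) keep the arithmetic exact.

```python
# Hypothetical weights (percent) and 1-5 scores; NOT the DU values from Figure 1.
WEIGHTS = {"cost": 40, "preservation readiness": 35, "user interface": 25}

SCORES = {
    "Option A": {"cost": 4, "preservation readiness": 3, "user interface": 4},
    "Option B": {"cost": 2, "preservation readiness": 5, "user interface": 3},
}


def weighted_totals(weights, scores):
    """Return each option's total: the sum of weight * score over all factors."""
    return {
        option: sum(weights[factor] * score for factor, score in factor_scores.items())
        for option, factor_scores in scores.items()
    }


def rank(weights, scores):
    """Return (option, total) pairs, best first."""
    totals = weighted_totals(weights, scores)
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for option, total in rank(WEIGHTS, SCORES):
        print(f"{option}: {total}")
```

With these placeholder numbers, Option A totals 365 and Option B totals 330, so A ranks first even though B scores higher on preservation readiness: the weights decide how much each factor's strength counts.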

Although Islandora is promoted as a universal solution with many functions to support digital asset management, preservation, and presentation, a significant number of these functions do not work as advertised "out of the box." This article is partly motivated by a desire to correct such misguided assumptions and to offer technical advice on development tactics and stakeholder management. The article is organized as follows: Section 2 describes Islandora's architecture, Section 3 describes our experience and lessons learned, and Section 4 presents our conclusions.


Figure 1: University of Denver Libraries Decision Matrix

 

2 Islandora Architecture

Islandora's ecosystem consists of three major components: Drupal, Islandora, and Fedora. Drupal, the front end of the system, offers an array of collaborative tools and applications for presentation. Fedora, the back end, provides the data store in which digital assets are managed. [2] Islandora, the "glue" that holds the system together, handles communication and messaging between the other parts of the ecosystem. [3] The core design principle is the separation of data from presentation, which lets the system leverage the strengths of the individual components and make extensive use of other OSS tools. The most commonly integrated tools are Djatoka, an open-source JPEG 2000 image server, and FFmpeg, a video conversion utility that produces derivatives so that different video file types can be played back. To communicate with Fedora, Islandora uses Fedora's Representational State Transfer (REST) interface to retrieve content. REST is a widely used architectural style in which systems send or request data over HTTP. To manipulate and prepare the data for presentation in Drupal, Islandora makes use of Drupal's hook system. [4] Hooks are functions that modules implement under a naming convention so that Drupal can call them at defined points; Islandora uses them to supply functionality for content models, which describe the types of data objects stored in Fedora. Figure 2 illustrates the Islandora architecture.


Figure 2: Islandora Architecture as Adapted from Mark Leggott's Islandora Blog Post
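To make the REST interaction concrete, the following is a minimal Python sketch of retrieving a datastream from Fedora 3.x, which Islandora 7 runs on, using the `getDatastreamDissemination` method (`GET /objects/{pid}/datastreams/{dsID}/content`). The base URL, PID, and datastream ID are hypothetical, authentication and error handling are omitted, and this is not how Islandora's own PHP code is structured; it only shows the shape of the request.

```python
from urllib.parse import quote
from urllib.request import Request, urlopen


def datastream_url(base_url, pid, dsid):
    """Build a Fedora 3.x REST URL for a datastream's content.

    The colon in a PID such as 'islandora:123' is kept as-is; other
    reserved characters are percent-encoded.
    """
    base = base_url.rstrip("/")
    return f"{base}/objects/{quote(pid, safe=':')}/datastreams/{quote(dsid)}/content"


def fetch_datastream(base_url, pid, dsid, timeout=30):
    """GET the raw datastream bytes (no credentials; sketch only)."""
    with urlopen(Request(datastream_url(base_url, pid, dsid)), timeout=timeout) as resp:
        return resp.read()


if __name__ == "__main__":
    # Hypothetical local Fedora endpoint and object.
    print(datastream_url("http://localhost:8080/fedora", "islandora:123", "MODS"))
```

Separating URL construction from the network call keeps the request easy to inspect and test without a running Fedora server.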

 

3 Experience and Lessons Learned

Although a complete Islandora ecosystem can be downloaded with one click, the "out of the box" system was not ready to deliver our content. The following summarizes our experience setting up a functional system from the ground up, organized into these categories: documentation, server environment, module implementation, user interface, results display, permissions control, preservation, and stakeholder management.

 

3.1 Documentation

A well-documented system development project is more likely to produce a system that meets the expectations of its intended users and developers. [5] Documentation for deploying Islandora is available online, and further questions and answers are spread across a Google Groups discussion list, where community members help each other, wiki pages, and GitHub pages. In general, our experience is that the documentation is adequate only when everything goes right. Most often, the reported issues and the discussions that follow are specific to a particular institution's hardware and software environment, so the solutions do not always apply elsewhere. In addition, when problems arise, we have to search every one of these channels to find a solution. We believe a single-site knowledge base would benefit Islandora users.

 

3.2 Server Environment

The complex architecture of an Islandora ecosystem, along with its high and often unpredictable use of system resources, can make servers unstable (e.g. system crashes or incomplete indexing), especially during intensive testing, experimenting, and troubleshooting. This is one reason it is important to set up a development environment on a separate server for pre-production work. For our production system, we used a dual-server setup: one server for Drupal and Islandora, and the other for Fedora and the supporting components. Even so, memory use was very high. At one point during the migration, more memory had to be installed on the server to meet the growing demand, a use of time and money that was neither efficient nor ideal. In hindsight, we believe our memory problems could have been better managed by allocating sufficient swap space (disk space the operating system uses as a memory reserve when physical RAM runs short) to absorb the large memory demands of data ingestion, and by having better open-source documentation of resource requirements. Figure 3 illustrates the server environment at DU.


Figure 3: University of Denver Islandora Digital Repository Server Environment
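Before starting a large batch ingest, a quick headroom check can flag when a server is likely to run short of memory. The following is a minimal sketch, assuming a Linux host (it reads `/proc/meminfo`, which reports values in kB) and a hypothetical 4 GB threshold; the function names are our own and are not part of Islandora or Fedora.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: value in kB}."""
    info = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue
        parts = rest.split()
        if parts and parts[0].isdigit():
            info[name.strip()] = int(parts[0])
    return info


def ingest_headroom_kb(info):
    """RAM the kernel estimates is available (MemAvailable, else MemFree) plus free swap."""
    return info.get("MemAvailable", info.get("MemFree", 0)) + info.get("SwapFree", 0)


def safe_to_start_batch(info, required_kb):
    """True if the combined headroom meets the (hypothetical) requirement."""
    return ingest_headroom_kb(info) >= required_kb


if __name__ == "__main__":
    with open("/proc/meminfo") as fh:
        info = parse_meminfo(fh.read())
    # Hypothetical requirement: 4 GB of headroom before a large ingest.
    print(safe_to_start_batch(info, required_kb=4 * 1024 * 1024))
```

Counting free swap as headroom is deliberately lenient: swap keeps processes alive but is far slower than RAM, so an ingest that depends on it will complete, but slowly.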

 

3.3 Module Implementation

Locating modules (pieces of code that each perform a small, specific task) compatible with our Islandora version was more difficult than anticipated. This was mostly because modules are not kept in a centralized repository, and without meticulous checking, a poor choice can cause hard-to-trace problems. Attempting to install an incompatible module often leads to confusion, frustration, and downtime. Additionally, some modules must be installed or configured by one method while others require a different procedure. Overall, the diversity of and inconsistency among modules and installation methods are a significant challenge during development and maintenance. We have learned that this process can be streamlined with a package manager, such as Composer for PHP, in which packages are hosted in a single repository and installed with simple, consistent console commands.
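Absent a package manager, a small pre-install check can catch obvious mismatches. The following is a minimal sketch that parses a Drupal 7 `.info` file (`key = value` lines, `;` comments, and `dependencies[]` entries) and applies a hypothetical house rule: a module's core must be `7.x` and its release tag must match the installed Islandora release (7.x-1.5 here). The function names and the sample `.info` text are illustrative, not taken from any module.

```python
def parse_info(text):
    """Parse Drupal 7 .info text into a dict; 'dependencies[]' entries become a list."""
    info = {"dependencies": []}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(";"):  # blank line or .info comment
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue
        key, value = key.strip(), value.strip().strip('"')
        if key == "dependencies[]":
            info["dependencies"].append(value)
        else:
            info[key] = value
    return info


def compatibility_problems(info_text, module_release, islandora_release="7.x-1.5"):
    """List reasons a module release may not fit the installed Islandora release."""
    info = parse_info(info_text)
    problems = []
    if info.get("core") != "7.x":
        problems.append(f"core = {info.get('core')!r}, expected '7.x'")
    if module_release != islandora_release:
        problems.append(f"release {module_release} differs from Islandora {islandora_release}")
    return problems


# Illustrative .info content (hypothetical values).
SAMPLE_INFO = """; example module
name = Islandora Solr Metadata
description = "Example description"
core = 7.x
dependencies[] = islandora
dependencies[] = islandora_solr
"""

if __name__ == "__main__":
    print(compatibility_problems(SAMPLE_INFO, "7.x-1.7"))
```

An empty list means the module passes both checks; anything else is a reason to look more closely before installing.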

 

3.4 User Interface

The Islandora default theme can be customized with HTML and CSS in a straightforward process. Theme development takes place within Drupal, so anyone with HTML and CSS experience can customize a theme, and Drupal offers plenty of room for creativity in the user interface. We consider the use of Drupal in the Islandora architecture an advantage, because it lets institutions leverage existing web user interface development expertise.

 

3.5 Metadata Object Description Schema (MODS) Results Display

After installing the Islandora Solr Metadata module, we realized that no MODS fields were available for display in search results because, contrary to the documentation, none were present in the Solr search platform's index. We traced this problem to the XSLT that transforms metadata for indexing upon ingestion: it did not contain a correct transformation for the MODS schema. To build the MODS results display requested by the stakeholders, we designed and implemented a set of recursive transformations that extract the MODS data to be indexed. This process can be customized to accommodate other schemas, which greatly expands the flexibility of Islandora's metadata display. We plan to contribute our work to the community on GitHub. Figures 4 and 5 compare the metadata elements displayed by the Dublin Core (DC) module with those displayed by our MODS module. Figure 6 shows XSLT code snippets.


Figure 4: A Result Display in Dublin Core Module


Figure 5: A Result Display in MODS Module


Figure 6: XSLT Code Snippets
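The recursive idea behind such a transformation can be illustrated outside XSLT. The following Python sketch is not the authors' code (their implementation is XSLT, excerpted in Figure 6); it walks a MODS record and emits one index field per path from the root to each leaf element. The `_ms` suffix stands in for a multi-valued Solr dynamic field and is a hypothetical naming convention, the sample record is invented, and attributes and mixed content are ignored for brevity.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def local_name(tag):
    """Strip the namespace: '{http://www.loc.gov/mods/v3}titleInfo' -> 'titleInfo'."""
    return tag.split("}", 1)[-1]


def flatten_mods(element, prefix="mods", fields=None):
    """Recursively walk a MODS tree, collecting leaf text under '<path>_ms'."""
    if fields is None:
        fields = defaultdict(list)
    children = list(element)
    if not children:
        text = (element.text or "").strip()
        if text:
            fields[f"{prefix}_ms"].append(text)  # hypothetical multi-valued field name
        return fields
    for child in children:
        flatten_mods(child, f"{prefix}_{local_name(child.tag)}", fields)
    return fields


# Invented sample record for illustration.
SAMPLE_MODS = """<mods xmlns="http://www.loc.gov/mods/v3">
  <titleInfo><title>Campus aerial view</title></titleInfo>
  <name><namePart>Doe, Jane</namePart></name>
  <subject><topic>Architecture</topic><topic>Campuses</topic></subject>
</mods>"""

if __name__ == "__main__":
    for name, values in flatten_mods(ET.fromstring(SAMPLE_MODS)).items():
        print(name, values)
```

Because field names are derived from element paths rather than hard-coded, the same function indexes any element that appears in a record, which is what makes a recursive approach adaptable to other schemas.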

 

3.6 Islandora Permissions

Permissions control access to specific functions of the system. Setting up the correct permissions in Islandora requires thorough planning, because all user permission settings appear on one long screen, making it easy to miss a setting. In addition, each group of users (a role, in Drupal terms) has its own permission settings, so creating multiple user groups (a common and recommended practice) further increases the complexity of the permissions page. Many of the problems we encountered were corrected by updating the permissions configuration, so we recommend verifying permissions as the first step whenever an issue occurs. For example, if a viewer shows no content, the cause may be a missing permission for the module that displays it. We have since trained system users to understand permissions and the roles they play in their workflows, and shown them how to inspect and adjust permission settings. This is an easy task that empowers users and builds trust between development teams and stakeholders.
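A permissions-first diagnosis rests on one rule: in Drupal 7, a user's effective permissions are the union of the permissions granted to all of that user's roles. The following Python sketch models that rule to report which required permissions a user lacks. The role names and permission strings are illustrative only; the real list for a given site is on its Drupal permissions page (`admin/people/permissions`).

```python
# Illustrative role -> permission mapping; not a real site's configuration.
ROLE_PERMISSIONS = {
    "anonymous user": {"view fedora repository objects"},
    "authenticated user": {"view fedora repository objects"},
    "archivist": {"ingest fedora objects", "edit fedora metadata"},
}


def effective_permissions(roles, role_permissions):
    """Union of the permissions granted to each of the user's roles."""
    perms = set()
    for role in roles:
        perms |= role_permissions.get(role, set())
    return perms


def missing_permissions(roles, required, role_permissions=ROLE_PERMISSIONS):
    """Sorted list of required permissions that none of the user's roles grants."""
    return sorted(set(required) - effective_permissions(roles, role_permissions))


if __name__ == "__main__":
    needed = ["view fedora repository objects", "ingest fedora objects"]
    print(missing_permissions(["authenticated user"], needed))
```

Listing exactly what is missing for a given set of roles turns a blank viewer or a failed ingest into a specific setting to look for on that long permissions screen.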

 

3.7 Preservation with DuraCloud

At the time of migration we had a large amount of data (about 5 TB) and expected our storage needs to double in the near term, which made robust backup and data-integrity validation a challenge. While the University Technology Services Department provides weekly snapshots and monthly backups, we now rely on DuraCloud for data preservation. DuraCloud, developed by DuraSpace, is a subscription-based cloud storage service that ensures a copy of the Library's content is always available and readily accessible. It lets us customize how backups are created and maintained and manage the network connection. Our data repository can be monitored continuously down to the folder level, and we can run an immediate backup at any time. The most important feature for us is the Bit Integrity Checker, which continuously checks the data to ensure there has been no 'silent data corruption.' If any data becomes corrupted, even a single bit in a file, the system notifies the administrators and takes preventive action. This is an advantage over a typical enterprise backup system, which may lack dynamic, bit-level integrity checking and so would continue to back up corrupted data without detecting it. DuraCloud also provides a Spaces feature, which allows data from multiple servers, or backups of multiple applications on a single server, to be stored in separate locations. This prevents folders from accidentally merging and overwriting each other's data. Together, these features give us confidence that if a server in the local data center fails, we will be able to quickly transfer any missing files from DuraCloud back to our systems, and that the restored data will be accurate.
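Bit-level integrity checking is, at its core, a fixity check: record a checksum for each file, then periodically recompute and compare. The following Python sketch illustrates that general technique on a local directory using MD5, a common fixity checksum. It is not DuraCloud's implementation or API, and the function names are our own.

```python
import hashlib
import tempfile
from pathlib import Path


def md5_of(path, chunk_size=1 << 20):
    """Stream a file through MD5 so large files need not fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Record a checksum for every file under root, keyed by relative path."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): md5_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify(root, manifest):
    """Return relative paths that are missing or whose checksum has changed."""
    root = Path(root)
    return [
        rel for rel, expected in manifest.items()
        if not (root / rel).is_file() or md5_of(root / rel) != expected
    ]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "object.tif").write_bytes(b"original bytes")
        manifest = build_manifest(tmp)
        Path(tmp, "object.tif").write_bytes(b"original bytez")  # simulate corruption
        print(verify(tmp, manifest))
```

The key point is that the manifest is captured while the data is known to be good; a backup process that only copies files, without comparing them against such a record, would copy corrupted bytes as faithfully as good ones.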

 

3.8 Stakeholder Management

Developing and deploying an information system is a complex activity, and the complexity is known to be magnified by continuous changes in stakeholder requirements. If this increasing complexity is not managed appropriately, the system can fail. [6] The stakeholders for this project included both internal library users (the archives and special collections units) and campus users outside the library (the art, marketing, and athletic departments). The collaborators were the ADR and the University's Technology Services. Of all the project management essentials, such as planning, procurement, risk management, change management, stakeholder management, and communication, we consider stakeholder management the most important element for this complex project.

Our stakeholders were informed of progress on a regular basis and were involved in system and user interface testing. They were encouraged to log the bugs they discovered in a shared directory, where the developers prioritized and fixed them and noted their status. These notes also helped clear up confusion and served as a documented record of discussion, to everyone's benefit, whenever a change in requirements was requested arbitrarily. Because of the limited development timeframe, we addressed functional bugs and 'must have' features before cosmetic bugs and enhancement requests. By taking advantage of Drupal's content management capabilities, we empowered our stakeholders to perform many non-critical tasks directly in the Drupal front end.
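The prioritization rule above can be expressed as a sort key. In this minimal Python sketch, functional bugs and must-have features share the top rank, followed by cosmetic bugs and then enhancement requests; breaking ties by the date an issue was logged (oldest first) is our own assumption, and the category labels and issue records are hypothetical.

```python
# Hypothetical ranks: lower numbers are handled first.
PRIORITY = {"functional bug": 0, "must have": 0, "cosmetic bug": 1, "enhancement": 2}


def triage(issues):
    """Sort issues by category rank, then by date logged (ISO dates sort chronologically)."""
    return sorted(issues, key=lambda issue: (PRIORITY[issue["category"]], issue["logged"]))


# Illustrative issue log entries.
ISSUES = [
    {"id": 1, "category": "enhancement", "logged": "2015-05-01"},
    {"id": 2, "category": "functional bug", "logged": "2015-06-01"},
    {"id": 3, "category": "cosmetic bug", "logged": "2015-04-01"},
    {"id": 4, "category": "must have", "logged": "2015-05-15"},
    {"id": 5, "category": "functional bug", "logged": "2015-05-20"},
]

if __name__ == "__main__":
    print([issue["id"] for issue in triage(ISSUES)])
```

Note that the oldest issue here (a cosmetic bug) still waits behind every functional bug and must-have feature; age only orders issues within the same rank.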

 

4 Conclusions

This article is intended to provide readers with helpful information about our experience developing with a complex open-source repository system, and to share insights that may not otherwise be known. The Library's Special Collections @ DU has been live on Islandora 7 since August 2015. Overall, the development team, users, and stakeholders are satisfied that the project achieved its goals, and the development team has completed all bug fixes and enhancement requests. When it becomes available, we plan to migrate to the new Fedora 4-based version of Islandora to take advantage of the expanded performance and functionality of this major Fedora update. Finally, we urge institutions undertaking an in-house digital repository project to involve staff developers with Java, PHP, HTML, CSS, XSLT, systems administration, and project management skills. We found that many skills are needed to make such a challenging undertaking a success.

 

References

[1] Colorado Alliance of Research Libraries.
[2] Leggott, M. Islandora: a Drupal/Fedora Repository System. http://hdl.handle.net/1853/28495
[3] Owens, T. Islandora's Open Source Ecosystem and Digital Preservation: An Interview with Mark Leggott.
[4] What's new with Islandora 7? An Interview with Jonathan Green.
[5] Nasution, M. F. F., and Weistroffer, H. R. (2009). Documentation in Systems Development: A Significant Criterion for Project Success. In Proceedings of the 42nd Hawaii International Conference on System Sciences (HICSS '09), pp. 1-9.
[6] Benbya, H., and McKelvey, B. (2006). Toward a complexity theory of information systems development. Information Technology & People, 19(1), 12-34. http://doi.org/10.1108/09593840610649952
 

About the Authors

Shea-Tinn (Sheila) Yeh is a Ph.D. candidate in Computer Science and Information Systems at the Business School, University of Colorado Denver. She has also earned an M.L.S. degree in Library and Information Science from the University of Maryland and an M.S.E. degree in Industrial and Human Computer Engineering from Wright State University. Sheila is currently an assistant professor and the Digital Infrastructure and Technology Coordinator at the University of Denver Libraries. She leads the Library's innovative and service-oriented information technology department.

 

Fernando Reyes is a Senior Systems Developer at the University of Denver Libraries. He holds a Master's degree in Computer Information Systems from the University of Denver and is an adjunct faculty member at the University of Denver's University College. He specializes in web application development using a broad spectrum of technologies. He leads the library's application development efforts.

 

Jeff Rynhart is a Systems Programmer II at the University of Denver Libraries and is currently pursuing a Bachelor's degree in Information Technology at the University of Denver. He began his programming career as a C++ developer. He also has many years of experience working as a full-stack web application developer and plans to pursue a postgraduate degree related to robotics or remote automation systems.

 

Philip Bain holds an M.A. degree in Digital Media Studies from the Emergent Digital Practices program at the University of Denver. He is currently a Systems Programmer I at the University of Denver Libraries. Phil designs and builds web applications for faculty and staff in support of their teaching and learning.