Appendix A. Harvesting metadata with OAI-PMH
Note: This is a pre-edited version of a previously published article, Eric Lease Morgan "What is the Open Archives Initiative?" interChange: Newsletter of the International SGML/XML User's Group 8(2):June 2002, pgs. 18-22.
The article describes the intent of the Open Archives Initiative and illustrates a way to implement version 1.1 of the protocol. As of this writing, the protocol has been renamed to the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH), and it is now at version 2.0. Don't let this dissuade you from reading this section. The majority of it is still quite valid.
What is the Open Archives Initiative?
In a sentence, the Open Archives Initiative (OAI) is a protocol built on top of HTTP designed to distribute, gather, and federate metadata. The protocol is expressed in XML. This article describes the problems the OAI is trying to address and outlines how the OAI system is intended to work. By the end of the article you will know more about the OAI and, hopefully, be inspired to implement your own OAI repository or even become a service provider. The canonical home page for the Open Archives Initiative is http://www.openarchives.org/ .
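To make "built on top of HTTP, expressed in XML" a little more concrete, here is a minimal sketch in Python of what a request and a response might look like. The base URL http://example.org/oai, the helper names build_request and repository_name, and the trimmed sample response are all assumptions made for illustration. Only the verb names, the oai_dc metadata prefix, and the XML namespace are taken from version 2.0 of the protocol.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Hypothetical repository address; substitute the base URL of a real repository.
BASE_URL = "http://example.org/oai"

# XML namespace used by version 2.0 of the protocol.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def build_request(verb, **arguments):
    """Return the HTTP GET URL for a request: a verb plus optional arguments."""
    query = {"verb": verb}
    query.update(arguments)
    return BASE_URL + "?" + urlencode(query)


# Invented, heavily trimmed Identify response so the sketch runs offline.
SAMPLE_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <Identify>
    <repositoryName>Example Repository</repositoryName>
    <baseURL>http://example.org/oai</baseURL>
    <protocolVersion>2.0</protocolVersion>
  </Identify>
</OAI-PMH>
"""


def repository_name(xml_text):
    """Extract the repository name from an Identify response."""
    root = ET.fromstring(xml_text)
    return root.findtext("oai:Identify/oai:repositoryName", namespaces=OAI_NS)


if __name__ == "__main__":
    print(build_request("Identify"))
    print(build_request("ListRecords", metadataPrefix="oai_dc"))
    print(repository_name(SAMPLE_RESPONSE))
```

In real use the URL returned by build_request would be fetched over HTTP and the XML that comes back handed to a parser. The sample string merely stands in for that network round trip.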
Simply stated, the problem is, "How do I identify and locate the information I need?"
We all seem to be drinking from the proverbial fire hose and suffering from at least a little bit of information overload. Internet search engines, used to find the information we need and desire, return literally thousands of hits. Items in these search results are often inadequately described, making the selection of particular returned items a hit-or-miss proposition. Bibliographic databases -- indexes of scholarly, formally published journal and magazine literature -- overwhelm the user with too many input options and rarely return the full text of identified articles. Instead, these databases leave the user with a citation requiring a trip to the library, where they will have to navigate a physically large space and hope the article is on the shelf.
From a content provider's point of view, the problem can be stated conversely, "How do I make people aware of the data and information I disseminate?"
There are many people, content providers, who have information to share with people who really need it.