Are you a developer looking to build better experiences for library users? Are you a data scientist studying library information architecture? Are you interested in text mining Harvard Library's records to look for trends and insights related to your field of study?
Harvard Library is among the world's largest academic libraries. The data behind our collections has the power to tell compelling stories and open our eyes to new ways of doing things — making the knowledge we preserve for the world accessible in new and exciting ways.
That's why we provide open access to our metadata through bibliographic datasets and APIs.
Available APIs & Datasets
Harvard LibraryCloud is a metadata hub that provides granular, open access to a large aggregation of Harvard library bibliographic metadata.
The public LibraryCloud Item API supports searching LibraryCloud and obtaining results in a normalized MODS or Dublin Core format.
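As a rough illustration of what a search against the Item API might look like, the sketch below builds a query URL and pulls titles out of a MODS response. The base URL and the `q` and `limit` parameter names are assumptions that do not appear on this page, so check the LibraryCloud documentation before relying on them. The sample XML is invented and is parsed offline. Only the MODS namespace and the `titleInfo`/`title` element names come from the MODS standard itself.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Assumed base URL; it is not stated on this page.
ITEM_API_BASE = "https://api.lib.harvard.edu/v2/items"
# Standard MODS version 3 namespace.
MODS_NS = {"mods": "http://www.loc.gov/mods/v3"}


def build_item_search_url(query, limit=10, base=ITEM_API_BASE):
    """Return a search URL; `q` and `limit` are assumed parameter names."""
    return f"{base}?{urlencode({'q': query, 'limit': limit})}"


def extract_mods_titles(xml_text):
    """Collect every MODS titleInfo/title value found in a response body."""
    root = ET.fromstring(xml_text)
    return [
        el.text.strip()
        for el in root.iterfind(".//mods:titleInfo/mods:title", MODS_NS)
        if el.text
    ]


# Invented sample; a real response will carry more fields and wrappers.
SAMPLE_MODS = """
<results xmlns:mods="http://www.loc.gov/mods/v3">
  <mods:mods>
    <mods:titleInfo><mods:title>Walden</mods:title></mods:titleInfo>
  </mods:mods>
  <mods:mods>
    <mods:titleInfo><mods:title>Moby-Dick</mods:title></mods:titleInfo>
  </mods:mods>
</results>
"""

if __name__ == "__main__":
    print(build_item_search_url("peanuts"))
    print(extract_mods_titles(SAMPLE_MODS))
```

To run this against the live service, the URL could be fetched with `urllib.request.urlopen` and the response body passed to `extract_mods_titles`.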
LibraryCloud contains records from Harvard's Alma instance (over 12.7M bibliographic records), SharedShelf (4M image records), and ArchivesSpace finding aids (2M finding aid components). Alma metadata has also been enriched with the Stackscore usage metric, holdings, and LC classification subject headings.
LibraryCloud also contains an alpha release of a Collections API, which is planned for use as a digital collection definition and export service. The Collections API allows a group of LibraryCloud records to be labeled as part of a named collection. The collection may then be harvested through OAI-PMH to import metadata into online digital exhibit platforms, such as Spotlight or DPLA. The full build-out of the Collections API and a collection builder web application is still a work in progress.
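Harvesting a named collection over OAI-PMH could look roughly like the sketch below. The verb `ListRecords`, the `metadataPrefix`, `set`, and `resumptionToken` arguments, and the XML namespace come from the OAI-PMH 2.0 specification. The endpoint URL, the set name, and the sample response are hypothetical, since this page does not give them.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespace defined by the OAI-PMH 2.0 specification.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base_url, set_spec=None, metadata_prefix="oai_dc",
                     resumption_token=None):
    """Build a ListRecords request for a first page or a follow-up page."""
    if resumption_token:
        # resumptionToken is an exclusive argument: send it with the verb only.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if set_spec:
            params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"


def parse_page(xml_text):
    """Return (record identifiers, resumption token or None) for one page."""
    root = ET.fromstring(xml_text)
    ids = [el.text for el in root.iterfind(".//oai:header/oai:identifier", OAI_NS)]
    token_el = root.find(".//oai:resumptionToken", OAI_NS)
    has_token = token_el is not None and token_el.text
    return ids, (token_el.text.strip() if has_token else None)


# Invented response used to exercise the parser offline.
SAMPLE_OAI = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>example:1</identifier></header></record>
    <record><header><identifier>example:2</identifier></header></record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>
"""

if __name__ == "__main__":
    # "https://example.org/oai" and "my-collection" are placeholders.
    print(list_records_url("https://example.org/oai", set_spec="my-collection"))
    print(parse_page(SAMPLE_OAI))
```

A harvester would repeat the request with each returned token until `parse_page` yields `None` for the token, which signals the final page.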
Caselaw Access Project
The Caselaw Access Project (“CAP”) expands public access to U.S. law. Our goal is to make all published U.S. court decisions freely available to the public online, in a consistent format, digitized from the collection of the Harvard Law Library.
CAP includes cases from all state courts, federal courts, and territorial courts for American Samoa, Dakota Territory, Guam, Native American Courts, Navajo Nation, and the Northern Mariana Islands. The earliest case is from 1658, and the most recent cases are from 2018.
CAP includes a robust set of tools which facilitate access to the cases and associated metadata. We currently offer five ways to access the data: API, bulk downloads, search, browse, and a historical trends viewer.
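For the API route, a client might look something like the sketch below, which builds a search URL and walks through paginated JSON results. The base URL, the `search` and `jurisdiction` parameters, and the `results`/`next` response keys are all assumptions not stated on this page. The `fetch` argument is a hypothetical hook so the paging logic can be tried offline with fake pages.

```python
import json
from urllib.parse import urlencode

# Assumed base URL; it is not stated on this page.
CAP_API_BASE = "https://api.case.law/v1/cases/"


def case_search_url(search, jurisdiction=None, base=CAP_API_BASE):
    """Return a case search URL; parameter names are assumptions."""
    params = {"search": search}
    if jurisdiction:
        params["jurisdiction"] = jurisdiction
    return f"{base}?{urlencode(params)}"


def iter_cases(first_url, fetch):
    """Yield case dicts, following `next` links until none remain.

    `fetch` is any callable that takes a URL and returns the response text.
    The `results` and `next` keys are an assumed response layout.
    """
    url = first_url
    while url:
        page = json.loads(fetch(url))
        yield from page.get("results", [])
        url = page.get("next")


# Fake two-page response keyed by "URL", used instead of a network call.
FAKE_PAGES = {
    "page1": json.dumps({"results": [{"name_abbreviation": "A v. B"}], "next": "page2"}),
    "page2": json.dumps({"results": [{"name_abbreviation": "C v. D"}], "next": None}),
}

if __name__ == "__main__":
    print(case_search_url("habeas corpus", jurisdiction="ill"))
    for case in iter_cases("page1", FAKE_PAGES.__getitem__):
        print(case["name_abbreviation"])
```

Passing the fetcher in as an argument keeps the paging loop independent of any particular HTTP library or authentication scheme.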
Presto Data Lookup
The Presto Data Lookup service is a RESTful web API that offers programmatic access to data in the library's central online systems.
The Data Lookup API uses a simple URL request syntax and returns results in XML or JSON format.
Note that some of the resources available in this service must be accessed from a pre-registered IP address. Write to the Presto support team to request access. Please include the IP address that needs access and your planned usage.
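Because the service can return either XML or JSON, a client may want one code path that handles both. The sketch below is purely illustrative: the URL layout, the `format` parameter, and the flat record shape are hypothetical and are not taken from the Presto documentation.

```python
import json
from urllib.parse import quote
import xml.etree.ElementTree as ET


def lookup_url(base_url, resource, record_id, fmt="json"):
    """Hypothetical URL layout: <base>/<resource>/<id>?format=<fmt>."""
    if fmt not in ("json", "xml"):
        raise ValueError("fmt must be 'json' or 'xml'")
    path = f"{quote(resource, safe='')}/{quote(str(record_id), safe='')}"
    return f"{base_url.rstrip('/')}/{path}?format={fmt}"


def parse_response(body, fmt):
    """Turn a response body into a dict, whichever format was requested.

    For XML, this assumes a flat record whose child elements hold text.
    """
    if fmt == "json":
        return json.loads(body)
    if fmt == "xml":
        root = ET.fromstring(body)
        return {child.tag: (child.text or "").strip() for child in root}
    raise ValueError("fmt must be 'json' or 'xml'")


if __name__ == "__main__":
    # Placeholder host and resource name, for illustration only.
    print(lookup_url("https://example.org/lookup/", "records", 123, fmt="xml"))
    print(parse_response('{"title": "Walden"}', "json"))
    print(parse_response("<record><title>Walden</title></record>", "xml"))
```

Normalizing both formats to a plain dict means downstream code does not need to know which format was requested.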
The Bibliographic Dataset contains over 12 million bibliographic records for materials held by the Harvard Library, including books, journals, electronic resources, manuscripts, archival materials, scores, audio, video, and other materials.
The metadata has been created, acquired and modified over decades, and represents a range of cataloging rules and practices. The records have not been altered or quality-checked during the export process and are offered as is.
We suggest the following language to provide proper attribution when using this dataset:
This [title of report or article or dataset] contains information from the Bibliographic Dataset, which is provided by the Harvard Library under its Bibliographic Dataset Use Terms and includes data made available by, among others, OCLC Online Computer Library Center, Inc. and the Library of Congress.
Bibliographic Dataset Use Terms
Pursuant to its Open Metadata Policy, the Harvard Library makes this set of bibliographic records and the metadata contained therein (together, the “Metadata”) available for public use under the CC0 1.0 Public Domain Dedication.
Although Harvard does not impose any legally binding conditions on access to the Metadata, Harvard requests that you act in accordance with the following Community Norms of the Harvard Library with respect to the Metadata:
- Harvard requests that the Harvard Library, OCLC Online Computer Library Center, Inc. (“OCLC”), and the Library of Congress be given attribution as sources of the Metadata, to the extent it is technologically feasible to do so.
- Harvard requests that you make the Metadata and any improvements thereto freely available on the same terms as Harvard has done, i.e., without claiming any legal right in, or imposing any legally binding conditions on access to, the Metadata or your improvements, and with a request to act in accordance with these Community Norms.
- With respect to Metadata consisting of or contained in records Harvard has obtained from the OCLC WorldCat database, Harvard requests that you respect and act in accordance with the community norms set forth in the WorldCat Rights and Responsibilities for the OCLC Cooperative. Use of metadata from the WorldCat database for study and research is consistent with those norms, but if you plan to use such Metadata for other purposes, whether or not you are an OCLC member, we ask that you review and comply with those norms.
Harvard Library Policy On Open Metadata
The Harvard Library provides open access to library metadata, subject to legal and privacy factors. In particular, the Library makes available its own catalog metadata under appropriate broad use licenses. The Library Board is responsible for interpreting this policy, resolving disputes concerning its interpretation and application, and modifying it as necessary.
This policy applies to all metadata that the library holds. For instance, the metadata from the DASH repository is also distributed under an open license.
Some metadata may have been placed under contractual obligations preventing distribution prior to the establishment of this policy. In such cases, of course, the library cannot legally, and will not, distribute the metadata beyond what such agreements allow.
Metadata that involves the usage of library materials by individual patrons will not be distributed without sufficient anonymization or aggregation to provide reasonable protection against the reconstruction of individual patron usage.
Because each metadata set may have individual legal and privacy characteristics, appropriate licenses are designed on an individual dataset basis. However, the goal is to make these licenses as broad as possible.