Harvard Library APIs & Datasets

Harvard Library provides open access to our metadata through APIs, including LibraryCloud and the Caselaw Access Project.


Are you a developer looking to build better experiences for library users? Are you a data scientist studying library information architecture? Are you interested in text mining Harvard Library's records to look for trends and insights related to your field of study? 

Harvard Library is among the world's largest academic libraries. The data behind our collections has the power to tell compelling stories and open our eyes to new ways of doing things — making the knowledge we preserve for the world accessible in new and exciting ways.  

That's why we provide open access to our metadata through our APIs. 

Available APIs & Datasets

LibraryCloud


Harvard LibraryCloud is a metadata hub that provides granular, open access to a large aggregation of Harvard library bibliographic metadata.

The public LibraryCloud Item API supports searching LibraryCloud and obtaining results in a normalized MODS or Dublin Core format.
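
As a rough illustration of searching and choosing between the two result formats, the sketch below only builds a request URL and makes no network call. The base URL, the `q` and `limit` parameter names, and the `.dc` suffix for Dublin Core are assumptions that are not stated on this page; check them against the LibraryCloud documentation before relying on them.

```python
from urllib.parse import urlencode

# Assumed base URL for the Item API; not confirmed by this page.
ITEMS_BASE = "https://api.lib.harvard.edu/v2/items"

# Assumed mapping from result format to URL suffix (MODS as the default).
FORMAT_SUFFIX = {"mods": "", "dc": ".dc"}


def build_item_search_url(keyword, fmt="mods", limit=10, base=ITEMS_BASE):
    """Return a keyword-search URL for the (assumed) Item API endpoint."""
    if fmt not in FORMAT_SUFFIX:
        raise ValueError(f"unsupported format: {fmt!r}")
    # "q" and "limit" are assumed parameter names.
    query = urlencode({"q": keyword, "limit": limit})
    return f"{base}{FORMAT_SUFFIX[fmt]}?{query}"


url = build_item_search_url("peanuts", fmt="dc", limit=5)
```

Keeping URL construction separate from the HTTP request makes it easy to inspect or log the query before sending it with whatever client you prefer.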

LibraryCloud contains records from Harvard's Alma instance (over 12.7 million bibliographic records), SharedShelf (4 million image records), and ArchivesSpace finding aids (2 million finding aid components). Alma metadata has additionally been enriched with the Stackscore usage metric, holdings, and LC classification subject headings.

LibraryCloud also contains an alpha release of a Collections API, which is planned for use as a digital collection definition and export service. The Collections API allows a group of LibraryCloud records to be labeled as part of a named collection. The collection may then be harvested through OAI-PMH in order to import metadata into online digital exhibit platforms, such as Spotlight or DPLA. The full build-out of the Collections API and a collection builder web application is still a work in progress.
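
To make the OAI-PMH harvesting step more concrete, here is a minimal sketch that builds a standard `ListRecords` request for one set. The `verb`, `metadataPrefix`, and `set` arguments come from the OAI-PMH protocol itself; the base URL and the set name shown are hypothetical placeholders, since this page does not give the LibraryCloud OAI endpoint or how collections map to sets.

```python
from urllib.parse import urlencode


def build_list_records_url(base_url, set_spec, metadata_prefix="oai_dc"):
    """Return an OAI-PMH ListRecords URL for a single set.

    base_url and set_spec are supplied by the caller; the values used
    below are placeholders, not real LibraryCloud identifiers.
    """
    params = {
        "verb": "ListRecords",
        "metadataPrefix": metadata_prefix,  # oai_dc is required of all OAI-PMH repositories
        "set": set_spec,
    }
    return f"{base_url}?{urlencode(params)}"


# Hypothetical endpoint and set name, for illustration only.
harvest_url = build_list_records_url("https://example.org/oai", "my-collection")
```

A full harvester would also follow the `resumptionToken` that OAI-PMH repositories return when a result list is split across several responses.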


Harvard Library Bibliographic Metadata

The Harvard Library Bibliographic Metadata collection is an open access data set that provides a snapshot of HOLLIS bibliographic records and holdings records. These are available as a bulk download via Harvard Dataverse.

Generated from metadata in Harvard's Alma instance, this collection contains all active (i.e., not suppressed or deleted) bibliographic records that have one or more active holdings in Alma, the library’s information management system. Due to size limitations, the over 12.7 million bibliographic records are split across multiple files. Each file contains approximately 200,000 bibliographic records, as well as their associated holdings records, in MARC XML format. 
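
Because each file holds roughly 200,000 records, streaming the XML is usually more practical than loading a whole file into memory. The sketch below is one way to do that with Python's standard library, assuming the files use the standard MARC 21 XML namespace. How holdings records are arranged inside the files is not described here, so the sketch simply skips any record without a 245 (title) field; the sample data is made up.

```python
import io
import xml.etree.ElementTree as ET

# Standard MARC 21 XML namespace (assumed to apply to these files).
MARC_NS = "http://www.loc.gov/MARC21/slim"
RECORD = f"{{{MARC_NS}}}record"
CONTROL_001 = f"{{{MARC_NS}}}controlfield[@tag='001']"
TITLE_245A = f"{{{MARC_NS}}}datafield[@tag='245']/{{{MARC_NS}}}subfield[@code='a']"


def iter_titles(source):
    """Yield (control number, title) for each record that has a 245$a."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != RECORD:
            continue
        control = elem.find(CONTROL_001)
        title = elem.find(TITLE_245A)
        if title is not None:
            yield (control.text if control is not None else None, title.text)
        # Drop the record's children so memory use stays modest on large files.
        elem.clear()


# Made-up sample: one bibliographic record and one holdings-like record.
SAMPLE = b"""<collection xmlns="http://www.loc.gov/MARC21/slim">
  <record>
    <controlfield tag="001">000000001</controlfield>
    <datafield tag="245" ind1="1" ind2="0">
      <subfield code="a">An example title /</subfield>
    </datafield>
  </record>
  <record>
    <controlfield tag="001">h0000001</controlfield>
    <datafield tag="852" ind1=" " ind2=" ">
      <subfield code="b">EXAMPLE</subfield>
    </datafield>
  </record>
</collection>"""

titles = list(iter_titles(io.BytesIO(SAMPLE)))
```

For a real download, pass an open file (or the path to a file) in place of the in-memory sample.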

Additional information about the contents of the data set is available in an informational datasheet posted along with the data in Dataverse.

View the Metadata Collection

Caselaw Access Project

The Caselaw Access Project (“CAP”) expands public access to U.S. law. Our goal is to make all published U.S. court decisions freely available to the public online, in a consistent format, digitized from the collection of the Harvard Law Library.

CAP includes all state courts, federal courts, and territorial courts for American Samoa, Dakota Territory, Guam, Native American Courts, Navajo Nation, and the Northern Mariana Islands. The earliest case is from 1658, and the most recent cases are from 2018.

CAP includes a robust set of tools which facilitate access to the cases and associated metadata. We currently offer five ways to access the data: API, bulk downloads, search, browse, and a historical trends viewer.

View CAP Tools

Presto Data Lookup

The Presto Data Lookup service is a RESTful web API that offers programmatic access to data in the library's central online systems.

The Data Lookup API uses a simple URL request syntax and returns results in XML or JSON format.

Note that some of the resources available in this service must be accessed from a pre-registered IP address. Write to the Presto support team to request access. Please include the IP address that needs access and a description of your planned usage.

View API

Harvard Library Policy On Open Metadata

The Harvard Library provides open access to library metadata, subject to legal and privacy factors. In particular, the Library makes available its own catalog metadata under appropriate broad use licenses. The Library Board is responsible for interpreting this policy, resolving disputes concerning its interpretation and application, and modifying it as necessary.

This policy applies to all metadata that the library holds. For instance, the metadata from the DASH repository is also distributed under an open license.

Some metadata may have been placed under contractual obligations preventing distribution prior to the establishment of this policy. In such cases, of course, the library cannot legally, and will not, distribute the metadata beyond what such agreements allow.

Metadata that involves the usage of library materials by individual patrons will not be distributed without sufficient anonymization or aggregation to provide reasonable protection against the reconstruction of individual patron usage.

Because each metadata set may have individual legal and privacy characteristics, appropriate licenses are designed on an individual dataset basis. However, the goal is to make these licenses as broad as possible.