WAX History

Overview of the history of WAX and web archiving at Harvard.


On February 4, 2009, the public interface for Harvard's new Web Archive Collection Service (WAX) launched. WAX began as a pilot project in July 2006, funded by the University's Library Digital Initiative (LDI) to address how collection managers can manage web sites for long-term archiving. It was the first LDI project specifically oriented toward preserving "born-digital" material. WAX has now transitioned to a production system supported by the University Library's central infrastructure.

Collection managers, working in the online environment, must continue to acquire the content that they have always collected physically. With blogs supplanting diaries, e-mail supplanting traditional correspondence, and HTML materials supplanting many forms of print collateral, collection managers have grown increasingly concerned about potential gaps in the documentation of our cultural heritage.

WAX was developed as an initial – and only partial – response to these and other concerns, which range from technical feasibility to legal and financial implications. The pilot focused on harvesting content from the surface web: content that search engines can discover through web crawlers, as opposed to content hidden from web crawlers in a database or restricted by password or login protection.

The WAX pilot was designed to address the capture, management, storage, and display of web sites for long-term archiving. It was a collaboration of the University Library's Office for Information Systems with three University partners, each fielding a single project: the Harvard University Archives (Harvard University Library); the Arthur and Elizabeth Schlesinger Library on the History of Women in America (Radcliffe Institute for Advanced Study); and the Edwin O. Reischauer Institute of Japanese Studies (Faculty of Arts and Sciences, with sponsorship from Harvard College Library).

The WAX system was built using several open source tools developed by the Internet Archive and other International Internet Preservation Consortium (IIPC) members. These IIPC tools include the Heritrix web crawler, the Wayback index and rendering tool, and the NutchWAX index and search tool. WAX also uses Quartz, open source job scheduling software from OpenSymphony.

Partners and Projects

Arthur and Elizabeth Schlesinger Library on the History of Women in America, Radcliffe Institute for Advanced Study: Capturing Women's Voices on the Web

As part of its mission to capture the voices of women whose points of view might not be found elsewhere, as well as to document the use of blogs and other forms of web publishing by American women in the early 21st century, the Schlesinger Library has selected and archived a sample of approximately 20 blogs. These blogs illuminate the lives of African-American and Latina women, lesbians, and women grappling with health and reproductive issues, and typically reflect their engagement with politics, their personal lives and philosophies, and their work lives.

Edwin O. Reischauer Institute of Japanese Studies, Harvard University: Constitutional Revision in Japan Research Project/憲法改正論議に関する研究

The purpose of the Constitutional Revision in Japan Research Project is to document and preserve the move to revise the Japanese constitution and to understand its implications for Japan's politics, society, economy, and culture. The 57 websites in the archive serve as a stable source of born-digital information on the debate and on the activities of individuals and groups involved in constitutional revision. The array of archived websites includes the following groups: political parties, politicians, governmental organizations, citizens' groups and non-governmental organizations, research institutes, labor groups, business groups, religious organizations, and scholars. These websites were selected as the most active in debating constitutional revision, and in their selection the Project seeks to be as comprehensive as possible rather than to represent only majority views. In collecting these websites, the Project seeks to capture the engagement of civil society on the issue of constitutional revision, which in essence is part of a larger discussion on the future of Japanese society.

Harvard University Archives, Harvard Library: A-Sites: Archived Harvard Web Sites

A main mission of the Harvard University Archives is to identify, collect, and preserve the documentary heritage of Harvard University. Forty web sites have been selected to archive born-digital materials from degree-granting departments and committees of the Harvard University Faculty of Arts and Sciences. The web archive will address gaps that exist in previously well-documented areas, such as university publications, biographical information, and the history of the Harvard curriculum. The web archive will also enable the capture of new forms of Harvard communications such as blogs and online calendars.
