LTS

Overview: Web Archive Collection Service (WAX)

The Web Archive Collection Service (WAX) supports the collection of selected web content to ensure its long-term preservation and accessibility for teaching and research. WAX began as a pilot service in 2006 and became a regular production service in February 2009. See WAX History for more information about the origins of this service.

What is WAX?

The WAX system lets a Harvard curator harvest one or more thematically related web sites into an archived collection. The curator uses a web-based administrative interface (called WAXI) to select, capture (harvest), organize, and describe the collection. The archived web collection is stored in the HL Digital Repository Service (DRS) and can be searched or browsed from the WAX Public Interface.

The WAX system was built by LTS using several open source tools developed by the Internet Archive and other International Internet Preservation Consortium (IIPC) members. These tools include the Heritrix web crawler (used to capture web sites for archiving), the Wayback index and rendering tool, and the NutchWAX index and search tool. WAX also uses Quartz, open source job scheduling software from OpenSymphony.

WAX harvests content from the surface web -- content that is discoverable by search engines or web crawlers, as opposed to content hidden from web crawlers in a database or restricted by password or login protection.

Who can use WAX?

Harvard libraries, museums and archives are eligible to use the WAX Service. Other Harvard organizational units and individual members of the Harvard community are eligible when sponsored by a Harvard library, museum or archive.

What materials are eligible for WAX?

Web sites considered for WAX archiving should consist of materials that have library-like qualities (materials with persistent value, intended to support research or teaching). WAX is not designed for short-term use. WAX collections can belong to any academic discipline or subject domain.

How to participate

The planning needed to create a WAX collection usually takes about three months. Both new and returning participants will need to prepare as noted below. Questions about WAX participation should be directed to LTS.

Note that the scheduling of WAX projects is based on the availability of LTS resources. 

  • Assign a “curator” for the project. The curator should be a staff member from the library, archive, or museum that is sponsoring the WAX collection. The curator will be the main contact for LTS regarding the project and will take the lead on planning and setup.
  • Complete the Digital Project Inquiry Form. Use this form to describe the collection and project goals. A group of specialists working on digital services will evaluate the inquiry, determine the best approach, and respond to the inquirer.

Fees

For current setup and maintenance fee rates, see Library Systems Fees and Assessments.