Federal Data Backup

Locate and preserve the federal data you need.

Web Archiving Resources

One of the best ways to preserve federal or institutional data is through web archiving, which aims to capture a website in its original form. A number of free or low-cost tools are available for this task.

The Internet Archive

  • The Internet Archive's tool for accessing archived websites is the Wayback Machine. Given the URL of a live or no-longer-accessible page, this tool presents a calendar showing every time the website was crawled. You can then view any of these crawls in your browser as the page appeared at the time of the crawl.
  • You can request that a live website be crawled and added to the Wayback Machine by using the "Save Page Now" text box on the Wayback Machine's homepage. Paste in the URL of the site you would like archived, and the tool will initiate a crawl.
  • You can also access the Wayback Machine or initiate new archive crawls using browser plugins or mobile apps.
  • The Internet Archive also has a subscription service called Archive-It that allows for more granular control over web archive crawls. While this service may not be ideally suited for individual users due to cost, the Lehigh Library and Technology Services has a subscription to this service. The Lehigh Libraries Special Collections actively backs up all Lehigh web pages as well as the pages of select community partners. If you believe that a Lehigh web page is not being preserved or have a site that you think should be added to Lehigh's web archive collection, please reach out to inspc@lehigh.edu or 610-758-5337.
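
For readers who would rather script these lookups than use the website, the sketch below shows one possible way to work with the Wayback Machine from Python. It assumes the Internet Archive's public availability endpoint (https://archive.org/wayback/available) and the Save Page Now URL pattern (https://web.archive.org/save/ followed by the page URL). The function names and the example.gov address are illustrative only, and the endpoints' behavior and rate limits may change, so treat this as a starting point rather than a complete client.

```python
import json
import urllib.parse
import urllib.request

# Assumed public endpoints; check the Internet Archive's documentation before relying on them.
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"
SAVE_ENDPOINT = "https://web.archive.org/save/"


def availability_url(page_url, timestamp=None):
    """Build the lookup URL for the archived snapshot closest to `timestamp`.

    `timestamp` is an optional string such as "20250101" (YYYYMMDD).
    """
    params = {"url": page_url}
    if timestamp:
        params["timestamp"] = timestamp
    return AVAILABILITY_ENDPOINT + "?" + urllib.parse.urlencode(params)


def save_url(page_url):
    """Build the URL that asks Save Page Now to crawl `page_url`."""
    return SAVE_ENDPOINT + page_url


def closest_snapshot(response):
    """Return the snapshot URL from a decoded availability response, or None."""
    closest = response.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest["url"]
    return None


def fetch_closest_snapshot(page_url, timestamp=None):
    """Query the availability endpoint over the network (not run on import)."""
    request_url = availability_url(page_url, timestamp)
    with urllib.request.urlopen(request_url, timeout=30) as resp:
        return closest_snapshot(json.load(resp))
```

Calling fetch_closest_snapshot("https://www.example.gov/") would return the address of the nearest archived copy if one exists, or None if the page has never been crawled. Opening the address returned by save_url(...) in a browser should start a new crawl in the same way as the "Save Page Now" text box.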

Webrecorder

  • Webrecorder started as an open source tool for crawling and saving web pages. A number of tools built on this open source software are now available.
  • ArchiveWeb.page is a Chrome extension that allows a user to archive web data by simply browsing pages. All data captured using this tool is saved locally using interoperable standards (WARC and WACZ). This tool is also available as a standalone desktop application.
  • ReplayWeb.page is a tool for viewing and interacting with archived web pages. It can be used within a browser or as a standalone desktop application. It accepts the WARC and WACZ files generated by ArchiveWeb.page or by any other web archiving tool.
  • Browsertrix combines these open source tools into a "cloud-based web archiving platform." This is a subscription service available from $30/month to $120/month depending on the amount of storage, crawling time, and concurrent crawls.
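
Because the Webrecorder tools save captures in open formats, a saved archive can be inspected without any special software. The sketch below is a minimal illustration that assumes a WACZ file follows the published layout: an ordinary ZIP file holding WARC data under archive/ and a page list at pages/pages.jsonl. The function name summarize_wacz and the file name my-crawl.wacz are hypothetical, and files produced by other tools may be organized differently.

```python
import json
import zipfile


def summarize_wacz(path_or_file):
    """List the WARC files and count the recorded pages in a WACZ archive.

    Assumes the WACZ layout described above; missing parts are treated as empty.
    """
    with zipfile.ZipFile(path_or_file) as wacz:
        names = wacz.namelist()

        # WARC data is expected under archive/, either plain or gzip-compressed.
        warc_files = [
            name for name in names
            if name.startswith("archive/") and name.endswith((".warc", ".warc.gz"))
        ]

        # pages.jsonl holds one JSON object per line; the first line is a header
        # without a "url" field, so only lines that have a "url" are counted.
        page_count = 0
        if "pages/pages.jsonl" in names:
            text = wacz.read("pages/pages.jsonl").decode("utf-8")
            for line in text.splitlines():
                if line.strip() and "url" in json.loads(line):
                    page_count += 1

    return {"warc_files": warc_files, "page_count": page_count}
```

For example, summarize_wacz("my-crawl.wacz") returns a dictionary with the names of the WARC files in the archive and the number of pages listed, which is a quick way to confirm that a capture contains what you expected before storing it.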

Conifer

  • Conifer is another tool built on the open source Webrecorder tools. It is a hosted service that offers users 5 GB of free storage.
  • This tool is maintained by Rhizome, a non-profit dedicated to preserving born-digital art and culture.
  • This tool is supported by The Andrew W. Mellon Foundation, the John S. and James L. Knight Foundation, Google and the Google Cultural Institute, the National Endowment for the Arts, and the New York State Council on the Arts.