Internet Archive
Wayback Machine

Introduction

I have rewritten two websites, with the original owners' permission, that were only available on the Internet Archive's Wayback Machine.

One was the HMS Gambia Association's website, which was active from 2003 but disappeared in 2014. It was on the Internet Archive and now has a new site.

The other was the Bristol Gunners website, which was active in 2008 and again from 2013 to 2016. It was also on the Internet Archive and now also has a new site.

This page was written to show some of the things I learned while using the Internet Archive's Wayback Machine.


Finding the Pages

The Internet Archive has several Application Programming Interfaces (APIs) available, such as CDX. I found it just as easy to go to the Internet Archive or Wayback Machine home pages and look for the site or related items using their search boxes.

If you already know which site you are looking for then you can use:

https://web.archive.org/web/*/[site-url]

For example:

https://web.archive.org/web/*/http://www.hmsgambia.com/

To find all the pages and resources of a site saved by the Internet Archive, click on the URLs menu item, which gives a paginated list.

Wayback Machine's menu


Alternatively, you can use:

https://web.archive.org/web/*/[site-url]/*

For example:

https://web.archive.org/web/*/http://www.thebristolgunners.webspace.virginmedia.com/*

Alternatively, you can use the CDX API to obtain the list by using:

https://web.archive.org/cdx/search/cdx?url=[site-url]/*

For example:

https://web.archive.org/cdx/search/cdx?url=http://www.thebristolgunners.webspace.virginmedia.com/*
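As a rough sketch, the CDX request above could be built and its reply read in Python along the following lines. The names cdx_query_url, parse_cdx and Capture are made up for this example. The seven field names are an assumption based on what the CDX server returns by default in its plain-text output, so check them against a real reply before relying on them.

```python
from typing import NamedTuple
from urllib.parse import urlencode

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


class Capture(NamedTuple):
    # Assumed default field order of a plain-text CDX line.
    urlkey: str
    timestamp: str
    original: str
    mimetype: str
    statuscode: str
    digest: str
    length: str


def cdx_query_url(site_url: str) -> str:
    """Build the CDX address that lists everything saved under site_url."""
    target = site_url.rstrip("/") + "/*"
    # Keep ':', '/' and '*' unescaped so the result matches the example above.
    return CDX_ENDPOINT + "?" + urlencode({"url": target}, safe=":/*")


def parse_cdx(text: str) -> list[Capture]:
    """Turn the space-separated CDX reply into Capture records."""
    captures = []
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines or anything that does not have the seven fields.
        if len(fields) == 7:
            captures.append(Capture(*fields))
    return captures
```

The text of the reply can be fetched with any HTTP client, or simply saved from the browser, and then passed to parse_cdx. The timestamp and original fields of each record are the two pieces needed to build a Wayback Machine address for that capture.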


Downloading the Pages

As the sites I wanted to save were fairly small, I visited each page in the Wayback Machine and put if_ in the Wayback Machine URL after the date to remove the Wayback Machine overlays. For example:

https://web.archive.org/web/20160315145149if_/http://www.thebristolgunners.webspace.virginmedia.com/
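An address like this one can be put together from its parts. The short sketch below assumes the layout seen in the example (prefix, 14-digit date, code, slash, original address); snapshot_url is a made-up helper name, not part of any Wayback Machine library.

```python
WAYBACK_PREFIX = "https://web.archive.org/web/"


def snapshot_url(timestamp: str, original_url: str, modifier: str = "if_") -> str:
    """Return the Wayback Machine address for one capture.

    timestamp is the date part of the address, e.g. "20160315145149".
    modifier is the code placed after the date; "" leaves the overlay on.
    """
    if not timestamp.isdigit():
        raise ValueError("timestamp should contain digits only")
    return f"{WAYBACK_PREFIX}{timestamp}{modifier}/{original_url}"


print(snapshot_url(
    "20160315145149",
    "http://www.thebristolgunners.webspace.virginmedia.com/",
))
```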

I then saved each page by right-clicking on it and choosing "Save as..." > "Webpage, Complete", which saves the page and the resources used by it.

If the sites were larger or I was short of time, I would probably look for one of the Wayback Machine downloaders.
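For anyone who would rather script it, a very small downloader might look something like the sketch below. This is not how I saved the sites, and it is not a replacement for a proper downloader. The names raw_url, local_path and save_captures are invented for the example. It uses the id_ code to fetch each file as originally archived, and it assumes you already have a list of (date, original address) pairs for the captures you want.

```python
import urllib.request
from pathlib import Path
from urllib.parse import urlsplit


def raw_url(timestamp: str, original: str) -> str:
    """Wayback Machine address for the capture, using the id_ code."""
    return f"https://web.archive.org/web/{timestamp}id_/{original}"


def local_path(out_dir: Path, original: str) -> Path:
    """Choose a file name under out_dir that mirrors the original path."""
    path = urlsplit(original).path.lstrip("/")
    # A directory-style address such as "/" or "/news/" becomes index.html.
    if path == "" or path.endswith("/"):
        path += "index.html"
    return out_dir / path


def fetch_bytes(url: str) -> bytes:
    """Download one address and return its contents."""
    with urllib.request.urlopen(url) as response:
        return response.read()


def save_captures(captures, out_dir, fetch=fetch_bytes):
    """Save each (timestamp, original) pair to disk and return the paths."""
    saved = []
    for timestamp, original in captures:
        target = local_path(Path(out_dir), original)
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(fetch(raw_url(timestamp, original)))
        saved.append(target)
    return saved
```

Query strings are ignored when choosing file names, and links inside the saved pages are left exactly as archived, so they will still need tidying by hand. Please also pause between requests if you fetch more than a handful of files.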


URL Modifiers

The Wayback Machine's URL for a page can be modified by adding a code after the date. Depending on the page, the codes will have different effects, but most will simply remove the Wayback Machine overlay, especially on older pages.

id_ - gives you the raw page, image, JavaScript, etc., as originally archived, without the overlay.

if_ - a better option to use for viewing web pages with no overlay. It is meant for framed or iframed content pages.

im_ - used for images.

cs_ - used for CSS stylesheet files.

js_ - used for JavaScript.

fw_ - also meant for framed or iframed content. The overlay is removed.

oe_ - for embedded content. It works similarly to if_ and fw_, so no overlay.

mp_ - for media content. It works similarly to oe_, if_ and fw_.
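To try the different codes on the same capture, the code in an existing Wayback Machine address can be swapped with a small helper. The sketch below is only an illustration: with_modifier is a made-up name, and the pattern assumes the address has the usual form of a date of up to 14 digits followed by an optional two-letter code and underscore.

```python
import re

_WAYBACK = re.compile(
    r"^(https?://web\.archive\.org/web/\d{1,14})"  # prefix and date
    r"(?:[a-z]{2}_)?"                              # existing code, if any
    r"(/.+)$"                                      # slash and original address
)


def with_modifier(wayback_url: str, modifier: str = "") -> str:
    """Return wayback_url with its code replaced by modifier.

    An empty modifier removes the code, which brings the overlay back.
    """
    match = _WAYBACK.match(wayback_url)
    if match is None:
        raise ValueError(f"not a Wayback Machine capture address: {wayback_url}")
    return f"{match.group(1)}{modifier}{match.group(2)}"


page = "https://web.archive.org/web/20160315145149if_/http://www.thebristolgunners.webspace.virginmedia.com/"
for code in ("id_", "if_", ""):
    print(with_modifier(page, code))
```

Addresses that use * in place of the date, like the listing addresses, are rejected because they do not point at a single capture.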


Browser Extensions

I use the Wayback Machine fairly often and have tried several Chrome browser extensions for it. My favourite is the official Wayback Machine one.

Wayback Machine - the official extension. It can look for the oldest and newest versions of the saved page and allows you to save particular pages.


Sources and Resources

5 basic techniques for automating investigations using the Wayback Machine - Medium
How to Recover your Content from Wayback Machine - InMotion Hosting
Internet Archive
Internet Archive Developers
Is there a way to disable the top bar? - Reddit
The Ultimate Wayback Machine Cheat Sheet for OSINT, Cybersecurity, and Archival Research - LinkedIn
Internet Archive's Wayback Machine
Wayback Machine - Wikipedia