Introduction
I have rewritten two websites, with the original owners' permission, that were only available on the Internet Archive's Wayback Machine.
One was the HMS Gambia Association's website, which was active from 2003 but disappeared in 2014. It survived on the Internet Archive and now has a new site.
The other was the Bristol Gunners website, which was active in 2008 and again from 2013 to 2016. It too survived on the Internet Archive and now also has a new site.
This page was written to show some of the things I learned while using the Internet Archive's Wayback Machine.
Finding the Pages
The Internet Archive has several Application Programming Interfaces (APIs) available, such as CDX. I found it just as easy to go to the Internet Archive or Internet Archive's Wayback Machine home pages and look for the site or related items from their search box.
If you already know which site you are looking for then you can use:
https://web.archive.org/web/*/[site-url]
For example:
https://web.archive.org/web/*/http://www.hmsgambia.com/
To find all the pages and resources of a site saved by the Internet Archive, click on the URLs menu item, which will give a paginated list.

Wayback Machine's menu
Alternatively, you can use:
https://web.archive.org/web/*/[site-url]/*
For example:
https://web.archive.org/web/*/http://www.thebristolgunners.webspace.virginmedia.com/*
Alternatively, you can use the CDX API to obtain the list by using:
https://web.archive.org/cdx/search/cdx?url=[site-url]/*
For example:
https://web.archive.org/cdx/search/cdx?url=http://www.thebristolgunners.webspace.virginmedia.com/*
Using the CDX API
You can use the Wayback Machine's CDX API to get all the URLs it has saved for a site. The query takes the form:
https://web.archive.org/cdx/search/cdx?url=[site-url]/*
For example:
https://web.archive.org/cdx/search/cdx?url=https://hmsgambia.org/*
The following fields are returned: urlkey, timestamp, original (URL), mimetype, statuscode, digest, and length.
The Internet Archive's Wayback CDX Server GitHub page details how you can filter or limit the amount of data that is returned.
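As a rough sketch, a CDX response can be fetched and split into those seven fields with nothing more than Python's standard library. The `parse_cdx_line` and `fetch_captures` helpers below are illustrative names, not part of any official client; the hmsgambia.org URL is the example from above.

```python
import urllib.parse
import urllib.request

# The seven default CDX fields, in the order they are returned.
FIELDS = ("urlkey", "timestamp", "original", "mimetype",
          "statuscode", "digest", "length")

def parse_cdx_line(line):
    """Split one space-separated CDX result line into a field dict."""
    return dict(zip(FIELDS, line.split(" ")))

def fetch_captures(site, limit=10):
    """Fetch up to `limit` capture records for `site` from the CDX API."""
    params = urllib.parse.urlencode({"url": site + "/*", "limit": str(limit)})
    url = "https://web.archive.org/cdx/search/cdx?" + params
    with urllib.request.urlopen(url) as resp:
        return [parse_cdx_line(l) for l in resp.read().decode().splitlines()]

# Example usage (makes a live request):
#   for rec in fetch_captures("https://hmsgambia.org"):
#       print(rec["timestamp"], rec["original"])
```

The `limit` parameter keeps the response small while experimenting, which avoids the browser-choking problem described below.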
I wanted to get a list of subdomains used by a university in its early web presence. A plain CDX query such as https://web.archive.org/cdx/search/cdx?url=https://university.edu/* returned so many results that the browser stopped responding.
Reading through the GitHub page and knowing I just wanted the early URLs and their timestamp, I changed the query to:
https://web.archive.org/cdx/search/cdx?url=*.university.edu&collapse=urlkey&to=2005&fl=timestamp,original
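The same filtered query can be assembled programmatically, which makes it easier to tweak the parameters. This is just a sketch using Python's standard library, with university.edu standing in as the placeholder domain from the text (note that urlencode percent-escapes the * and the comma, which the CDX server accepts):

```python
import urllib.parse

# Build the filtered CDX query: collapse duplicate URLs, keep only
# captures up to 2005, and return just the timestamp and original URL.
params = urllib.parse.urlencode({
    "url": "*.university.edu",
    "collapse": "urlkey",
    "to": "2005",
    "fl": "timestamp,original",
})
query = "https://web.archive.org/cdx/search/cdx?" + params
print(query)
```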
I could have added an &output=json parameter, but did not. The query returned a little under half a million lines, but did not choke the browser. I copied and pasted the results into Excel, used the Data > Text to Columns function to split them into columns, and sorted the sheet by URL. I then created a new column and used a formula to extract the subdomains. The formula I used was:
=MID(B2, FIND("//", B2, 1)+2,FIND(".", B2, 7)-8)
I highlighted the column with the formula and used the Home > Fill > Down function to fill the rest of the column. This worked very well, but I still had almost half a million rows in the spreadsheet.
In a new column, I used the formula =UNIQUE(C:C) to get the unique values from the calculated subdomain column.
This may seem a little complicated and there are probably methods to do it more easily, but this worked and gave me a list of the unique subdomains used to work with.
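One such easier method is to skip the spreadsheet entirely. The sketch below assumes the CDX results were requested with fl=timestamp,original, as in the query above, and uses urllib.parse to pull out the hostnames; `unique_subdomains` and the sample lines are illustrative, not part of the original workflow:

```python
import urllib.parse

def unique_subdomains(cdx_lines):
    """Given CDX result lines of the form '<timestamp> <original-url>',
    return the sorted set of hostnames that appear."""
    hosts = set()
    for line in cdx_lines:
        parts = line.split(" ", 1)
        if len(parts) != 2:
            continue
        host = urllib.parse.urlsplit(parts[1]).hostname
        if host:
            hosts.add(host)
    return sorted(hosts)

# Illustrative sample of CDX output lines (timestamp + original URL).
sample = [
    "19990101000000 http://www.university.edu/",
    "20000101000000 http://library.university.edu/index.html",
    "20000101000000 http://www.university.edu/about.html",
]
print(unique_subdomains(sample))
# ['library.university.edu', 'www.university.edu']
```

Using urlsplit rather than a character-position formula also copes with URLs of varying length, which the Excel MID/FIND approach depends on.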
Downloading the Pages
As the sites I wanted to save pages from were fairly small, I visited each page in the Wayback Machine and put if_ in the URL after the date to remove the Wayback Machine overlays. For example:
https://web.archive.org/web/20160315145149if_/http://www.thebristolgunners.webspace.virginmedia.com/
I then saved each page by right-clicking on it and choosing "Save as..." > "Webpage, Complete", which saves the page and the resources it uses.
If the sites were larger or I was short of time then I would probably look around for one of the Wayback Machine downloaders.
URL Modifiers
The Wayback Machine's URL for a page can be modified by adding a code after the date. Depending on the page, the codes have different effects, but most simply remove the Wayback Machine overlay, especially on older pages.
id_ will give you the raw page / image / javascript, etc, as originally archived without the overlay.
if_ this is a better option to use for viewing web pages with no overlay. It is meant for framed or iframed content pages.
im_ this is used for images.
cs_ this is used for css stylesheet files.
js_ this is used for javascript.
fw_ this is also meant for framed or iframed content. The overlay is removed.
oe_ this is for embedded content and works similarly to if_ and fw_, so no overlay.
mp_ this is for media content, and works similarly to oe_ / if_ / fw_, so no overlay.
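Since the modifier always goes in the same place, it can be spliced into a snapshot URL mechanically. The helper below is a sketch, not an official tool: it inserts the chosen code directly after the 14-digit timestamp, using the Bristol Gunners capture from earlier as the example.

```python
import re

def with_modifier(wayback_url, modifier="if_"):
    """Insert a modifier (if_, id_, im_, cs_, js_, fw_, oe_, mp_)
    after the 14-digit timestamp in a Wayback Machine URL."""
    return re.sub(r"(/web/\d{14})(?=/)", r"\1" + modifier, wayback_url, count=1)

url = ("https://web.archive.org/web/20160315145149/"
       "http://www.thebristolgunners.webspace.virginmedia.com/")
print(with_modifier(url))
# https://web.archive.org/web/20160315145149if_/http://www.thebristolgunners.webspace.virginmedia.com/
```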
Browser Extensions
I use the Wayback Machine fairly often and have tried several Chrome browser extensions for it. My favourite is the official Wayback Machine one.
Wayback Machine - the official extension. It can look for the oldest and newest versions of a saved page and allows you to save particular pages.
Sources and Resources
5 basic techniques for automating investigations using the Wayback Machine - Medium
How to Recover your Content from Wayback Machine - InMotion Hosting
Internet Archive
Internet Archive Developers
Is there a way to disable the top bar? - Reddit
The Ultimate Wayback Machine Cheat Sheet for OSINT, Cybersecurity, and Archival Research - LinkedIn
Wayback CDX Server - GitHub
Internet Archive's Wayback Machine
Wayback Machine - Wikipedia