Google Search Console index page
Introduction
I use the free version of Google's Programmable Search Engine on my sites. In November 2023, I started receiving more notifications than usual from Google Search Console (GSC) saying that pages and videos on my sites could not be indexed, so I decided to take a look at what was happening.
It is important to remember that Google Search Console only gives information about how a site appears in Google Search. It is not a full analytics suite like Google Analytics, and Google's Programmable Search Engine is a separate Google product.
My Site
Using file-counting tools, I found that my sitemap.xml lists 992 HTML files. I have 4,106 HTML files in total, 224 of which are in a folder of test pages, odd bits of code I've written and half-formed ideas that I do not want indexed. This means there are 3,882 HTML files that Google should be able to find.
Instead, Google has indexed 888 of them and knows about another 2,644 that it has not indexed, a total of 3,532 HTML files.
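If you want to repeat this kind of count yourself, the following is one way to do it with Python's standard library. It is a minimal sketch, not the tool I used. The folder name "test" and the site path are hypothetical stand-ins, and the code treats both .html and .htm files as HTML.

```python
import xml.etree.ElementTree as ET
from pathlib import Path, PurePosixPath
from typing import List, Tuple, Union

# Namespace used by standard sitemap.xml files (sitemaps.org protocol).
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
HTML_SUFFIXES = {".html", ".htm"}


def html_paths(site_root: str) -> List[str]:
    """List every HTML file under site_root as a relative POSIX-style path."""
    root = Path(site_root)
    return [
        p.relative_to(root).as_posix()
        for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() in HTML_SUFFIXES
    ]


def split_by_folder(paths: List[str], excluded_folder: str) -> Tuple[List[str], List[str]]:
    """Return (paths outside excluded_folder, paths inside it at any depth)."""
    outside: List[str] = []
    inside: List[str] = []
    for path in paths:
        folders = PurePosixPath(path).parts[:-1]  # directory names only, not the file name
        (inside if excluded_folder in folders else outside).append(path)
    return outside, inside


def sitemap_locs(xml_data: Union[str, bytes]) -> List[str]:
    """Return the <loc> value of every <url> entry in a sitemap."""
    root = ET.fromstring(xml_data)
    return [loc.text.strip() for loc in root.findall("sm:url/sm:loc", SITEMAP_NS) if loc.text]
```

For example, `split_by_folder(html_paths("/path/to/site"), "test")` returns the files to be indexed and the test files. Comparing the first count with `len(sitemap_locs(Path("sitemap.xml").read_bytes()))` shows how many files the sitemap leaves out.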
My site was started in May 1999 and it's now November 2023, so it is almost 25 years old. In that time, files and whole sections of the site have been created, renamed, moved or deleted, but Google still sends around 50 visitors a day to it. That's not bad for a hobby site. Some pages are ancient by internet standards and not very well written HTML-wise, which seems to be why some of them are not being indexed. Some of these older pages will need rewriting, so fixing the errors that can be fixed may be a long process.
This page was written to help me understand how GSC works and what I could do to improve the number of pages it is indexing. Barry Hunter has written an article in the Search Console Help about which errors can be left for a while and which need to be fixed.
Sitemap
Before doing anything else, I checked the Console's sitemap section. It was obvious something was wrong with my sitemap.xml as no pages were being read from it!
Google Search Console sitemap section before and after fixing sitemap.xml
I opened the sitemap in the Chrome browser and it told me where the error was. It turned out it was a missing > character! It appears Google Search Console cannot read anything from the file if there is any error at all in it.
Error checking sitemap.xml
There are other sitemap.xml checkers around such as the ones from My Sitemap Generator and XML-Sitemaps.
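A local check is also possible. This minimal sketch uses Python's built-in XML parser, which, like Chrome, stops at the first error and reports where it is. It only tests that the file is well-formed XML, which is exactly what a missing > breaks. It does not validate the file against the sitemap schema.

```python
import xml.etree.ElementTree as ET
from typing import Optional, Tuple, Union


def find_xml_error(xml_data: Union[str, bytes]) -> Optional[Tuple[int, int, str]]:
    """Return None if xml_data is well-formed XML, otherwise
    (line, column, message) for the first error the parser hits."""
    try:
        ET.fromstring(xml_data)
    except ET.ParseError as err:
        line, column = err.position  # location as reported by the expat parser
        return line, column, str(err)
    return None
```

Passing it `Path("sitemap.xml").read_bytes()` (with `from pathlib import Path`) returns None for a readable file, or the line and column to look at.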
To bring up a report of what Google thinks of your sitemap, go to GSC and click "Sitemaps" in the left-hand menu. Click the current sitemap in the right-hand pane and then "See Page Indexing". Mine had quite a few errors, which I will try to fix before the other errors on the main screens.
Verification
At some point while adding a property, Google Search Console gives you a verification code that needs to be added to the domain's DNS as a TXT record. This can cause problems if the domain already has TXT records, as explained in Multiple TXT fields for same subdomain. Some registrars allow several values in one record with each value enclosed in quotes, others allow each value on a new line, and others need a new TXT record.
I found that, for my DNS records, I needed to create a new TXT record rather than edit an existing one.
Adding a TXT entry to a DNS record
Once the entry has been made, Google Search Console checks that it has been done properly. One nice thing is that the sitemap from the old property is copied to the new property. The bad news is that the statistics start afresh for the new property and take a while to populate properly.
Google Search Console domain ownership verification
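Before asking Google to verify, it can be worth checking that the new TXT record is actually visible, since DNS changes can take a while to propagate. This sketch assumes you have already run `dig +short TXT example.com` and pasted its output in. dig is a common DNS lookup tool, and example.com and the token are placeholders. Google's verification value takes the form google-site-verification=..., and a record split into several quoted strings is joined back into one value.

```python
import re
from typing import List

# One double-quoted character-string; backslash escapes are matched but left as-is.
_QUOTED = re.compile(r'"((?:[^"\\]|\\.)*)"')


def txt_records(dig_output: str) -> List[str]:
    """Turn `dig +short TXT` output into one string per TXT record.

    dig prints one record per line; a record made of several quoted
    strings ("abc" "def") is joined into a single value.
    """
    records = []
    for line in dig_output.splitlines():
        parts = _QUOTED.findall(line)
        if parts:
            records.append("".join(parts))
    return records


def has_google_verification(dig_output: str, token: str) -> bool:
    """True if one of the TXT records is exactly google-site-verification=<token>."""
    return f"google-site-verification={token}" in txt_records(dig_output)
```

If the check fails while the registrar's control panel shows the record, the record may not have propagated yet. The value may also have been entered in the wrong form for that registrar, as described above.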
If the domain cannot be verified this way, another method has to be used. The alternatives are listed on the Verify your site ownership pages in the GSC help.
Errors After Adding SSL
I got a shock when I visited the statistics pages a little while after installing the SSL certificates: both the impressions in Google Search and the traffic Google was sending had dropped to almost zero!
The statistics in Google Search Console dropped to next to zero after installing the SSL Certificates
What was happening?
It happened because of the way I originally added the sites: I used the domain name starting with HTTP. After adding the SSL certificates, the sites use HTTPS, and those URLs were not being recorded by the Console.
What to do about it?
A new property has to be created. Do this by clicking the dropdown in the top left of the Google Search Console page and choosing "Add property".
The Google Search Console "Add property" dropdown
Google Search Console recently added a choice of what happens next. You can add a URL prefix, which includes the protocol (for example, HTTPS), or you can register the entire domain. As I do not use subdomains, I opted to register the entire domain for each of my sites. This only requires the main domain name, without HTTP or HTTPS, but you must have access to the sites' DNS records because they need to be edited.
The Google Search Console property types
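The difference between the two property types explains the drop in statistics. A URL-prefix property only covers URLs that begin with exactly that prefix, protocol included. A Domain property covers every protocol and subdomain of the domain. This is a simplified model of those rules for illustration, not Google's implementation; it ignores ports and path case, for example. example.com is a placeholder.

```python
from urllib.parse import urlsplit


def in_url_prefix_property(url: str, prefix: str) -> bool:
    """URL-prefix property: only URLs starting with the exact prefix,
    so an http:// property does not see https:// pages."""
    return url.startswith(prefix)


def in_domain_property(url: str, domain: str) -> bool:
    """Domain property: the domain itself and any subdomain, on any protocol."""
    host = (urlsplit(url).hostname or "").lower()
    domain = domain.lower()
    return host == domain or host.endswith("." + domain)
```

With an http:// property, every page served over HTTPS falls outside it, which is why the old property's figures dropped to almost nothing.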
Why pages aren’t indexed
Google crawls sites mainly with a mobile-first bot. This means that some reported errors only show up on a page when it is viewed on a smartphone, or when the browser window is narrowed on a larger screen.
The left-hand side of GSC has a number of menu items under "Indexing" that give more detail on why pages and videos are not being indexed. Each report has a column named "Source". If it says "Website", there is a link somewhere on your website that needs to be fixed in some way. If it says "Google systems", Google may have discovered the URL elsewhere, but there may still be something not quite right on your site.
Page menu with reasons pages are not being indexed
The reasons I commonly see for my pages not being indexed are:
Page with redirect
Blocked by robots.txt
Blocked due to access forbidden (403)
Blocked due to other 4xx issue
Crawled - currently not indexed
Discovered - currently not indexed
Duplicate without user-selected canonical
Indexed, though blocked by robots.txt
Not found (404)
Redirect error
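Some of these reasons can be checked before Google finds them. For "Blocked by robots.txt", Python's standard urllib.robotparser can test URLs against a robots.txt file offline. This sketch assumes the test pages live under a hypothetical /test/ folder.

There are two caveats. First, robots.txt stops crawling rather than indexing, which is why "Indexed, though blocked by robots.txt" can appear. Second, the standard parser uses simple prefix matching in file order, without Google's * and $ wildcards or its most-specific-rule matching, so treat its answer as a rough guide.

```python
from typing import List
from urllib.robotparser import RobotFileParser


def blocked_urls(robots_txt: str, urls: List[str], agent: str = "Googlebot") -> List[str]:
    """Return the URLs that robots_txt disallows for the given user agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse text directly, no network fetch
    return [url for url in urls if not parser.can_fetch(agent, url)]
```

Running the sitemap URLs through it is a quick way to spot pages that are listed for indexing but blocked from crawling.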
For videos, the reasons they are not being indexed are:
No thumbnail URL provided
Video outside the viewport
Clicking any section in the left-hand pane opens it and gives a list of the files in that section. Clicking one of the files opens a panel with information about that file, including a link named "Inspect URL". Clicking that link gives more information, the most important of which, I find, is "Referring page". This lists the page(s) on which Google found links to the file, which is especially useful if the file is not in your sitemap.xml.
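When a file is not in the sitemap, it can help to find the linking pages in your own copy of the site. This is a rough local equivalent of "Referring page", not what Google does. It reads <a href> links with Python's standard html.parser, resolves them against each page's URL and drops any #fragment. The pages dictionary (page URL to HTML text) is something you would build yourself, for example from the files counted earlier and your site's base URL.

```python
from html.parser import HTMLParser
from typing import Dict, List
from urllib.parse import urldefrag, urljoin


class _LinkCollector(HTMLParser):
    """Collects the href values of <a> tags (tag and attribute names arrive lowercased)."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def links_in(html: str, page_url: str) -> List[str]:
    """Absolute link targets found in one page, with #fragments removed."""
    collector = _LinkCollector()
    collector.feed(html)
    collector.close()
    return [urldefrag(urljoin(page_url, href)).url for href in collector.hrefs]


def referring_pages(pages: Dict[str, str], target_url: str) -> List[str]:
    """Given {page_url: html}, list the pages that link to target_url."""
    return [url for url, html in pages.items() if target_url in links_in(html, url)]
```

Links added by JavaScript, or written with a different host name or protocol than the target, will not be matched by this simple comparison.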
Sources and Resources
Google's Programmable Search Engine
Google's Search Central Documentation
Google's Search Console
Google's Search Console Help
Multiple TXT fields for same subdomain (Stack Exchange / Server Fault) - Adding TXT records to the DNS records
My Sitemap Generator - sitemap.xml checker
Verify your site ownership
XML-Sitemaps - sitemap.xml checker