Just about every website needs Google. The idea is that Google indexes each page on the internet, so that when people search for products or services like yours, they'll find your site in Google's search results. But the reality is rarely this simple. Some pages never actually get indexed by Google, making them impossible to find in search results.
That's not to say that entire websites are often missed – generally, only a few pages on a website will fail to be indexed, while others could wait weeks before being crawled. There are a number of reasons for this, some of which relate to general Search Engine Optimisation (SEO). For instance, poor-quality content may cause your site to rank lower, as well as suffer from indexing issues.
For the most part though, many websites have plenty of content that should be indexed but hasn't been. Below, we've explored a few of the most common causes, as well as ways to prevent your pages from being missed by Googlebot.
If you're not sure what percentage of your website is currently being indexed, you can easily check this in the Index Coverage report in Google Search Console. Here, you'll see the number of excluded pages, and you can try to work out whether there is any correlation between them. Essentially, see if there's a reason these particular pages haven't been indexed.
A good example of unindexed pages is product pages on a large retail website – while you want as many of these as possible to appear in search results, you wouldn't expect to find every single one there. Pages for out-of-stock products, or duplicate product pages, for instance, might not be considered high enough quality for Google to index.
The three main statuses Google reports when Googlebot doesn't index pages are duplicate content issues, 'Crawled – currently not indexed', and 'Discovered – currently not indexed'. The first two tend to be relatively easy to fix, but pages classified as 'Discovered – currently not indexed' can be trickier to rectify.
Duplicate Content

There are a few reasons you might have duplicate content on your website. Perhaps the most common is describing a product using content supplied by its manufacturer. Your competitors are likely doing the same thing, so this copy will be flagged as duplicate content.
Another reason for duplicate content is that you have particular pages on your site that are targeted at different countries. For example, you might have different versions of a page that’s tailored to a UK, US, Canadian and Australian audience.
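One standard way to handle country-specific variants – a technique worth naming here, though not covered above – is to mark the pages as alternates of each other with hreflang annotations, so Google treats them as intentional regional versions rather than duplicates. The URLs below are purely hypothetical examples:

```html
<!-- Placed in the <head> of each regional variant of the page.
     Every variant lists all versions, including itself. -->
<link rel="alternate" hreflang="en-gb" href="https://example.com/uk/widgets" />
<link rel="alternate" hreflang="en-us" href="https://example.com/us/widgets" />
<link rel="alternate" hreflang="en-ca" href="https://example.com/ca/widgets" />
<link rel="alternate" hreflang="en-au" href="https://example.com/au/widgets" />
```

The annotations must be reciprocal – each variant needs to reference the others – or Google may ignore them.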
'Crawled – Currently Not Indexed'

If you're seeing this status, it means Googlebot visited the page but didn't index it. While it's impossible to be certain of the reason, in most cases it's down to poor-quality content. And considering the amount of content that can be found on the web, it should come as no surprise that Google is becoming increasingly picky about which pages it indexes.
The best way to prevent this from happening is to make sure all the content on your site is valuable as well as unique – including titles and page descriptions. So do your best to use keywords in your copy, and make each page interesting enough that people want to read it. There's no point publishing content just for the sake of it.
'Discovered – Currently Not Indexed'

As mentioned above, this is arguably the hardest issue to deal with, because you can't be sure whether the problem is your content – which may not be considered high enough quality – or your crawl budget. If there are too many URLs in the crawling queue, which is more common with larger sites, some pages might simply not have been indexed yet, but are due to be eventually.
The only way to address 'Discovered – currently not indexed' pages is to cover all your bases: improve your content wherever possible, and block Google from crawling less valuable pages to optimise your crawl budget. You can do this using the noindex tag or the robots.txt file.
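As a concrete illustration, here's a minimal robots.txt sketch – the paths are hypothetical examples of low-value sections, not recommendations for any specific site:

```
# robots.txt, served at the site root (e.g. https://example.com/robots.txt)
# The paths below are hypothetical examples of low-value sections.
User-agent: Googlebot
Disallow: /internal-search/
Disallow: /print-versions/
```

For pages you'd rather keep crawlable but out of the index, the page itself can serve `<meta name="robots" content="noindex">` in its `<head>`. Bear in mind that Googlebot has to be able to crawl a page to see its noindex tag, so avoid blocking the same URL in robots.txt that you're relying on noindex to exclude.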
Although there’s no guarantee that you’ll be able to get all of your pages into Google’s index, there are a few best practices you can use to increase your chances. We’ve discussed these in more detail below:
Use Internal Links

There's a good chance you're doing this anyway for SEO purposes, but it's always worth remembering to include internal links in your copy. That way, you're signalling to Google that the page you're pointing to is important, and that it should be indexed.
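If you want to audit which links on a page actually count as internal, here's a small sketch using only Python's standard library – the domain and markup in the usage example are hypothetical:

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def internal_links(html, site_domain):
    """Return hrefs that are relative, or that point at site_domain."""
    parser = LinkCollector()
    parser.feed(html)
    return [
        href for href in parser.links
        if urlparse(href).netloc in ("", site_domain)
    ]
```

For example, `internal_links('<a href="/blog/post-1">Post</a> <a href="https://other.com/x">Other</a>', "example.com")` would keep only the relative `/blog/post-1` link.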
Optimise the Right Pages
The key thing with indexing is to get your most valuable pages into Google search results. So make sure you’re fully optimising your main pages, such as the homepage, contact page, and your primary sales pages.
Remove Obsolete Pages

As your website grows and evolves, you're bound to have some pages that become obsolete. It's sensible to review your site periodically, to ensure you're getting rid of these pages, or at least marking them as non-indexable.
Avoid ‘Soft 404’ Signals
This is easily overlooked – if you include phrases like 'not available' or 'not found' in the copy of a page, it might suggest a 404 status to Google even though the page returns a normal response. Having the number '404' in the page URL can also cause an issue.
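To make the idea concrete, here's a small Python sketch of the kind of heuristic involved – a page that responds with HTTP 200 but whose copy reads like an error page. The phrase list is purely illustrative; it is not Google's actual signal set:

```python
def looks_like_soft_404(status_code, page_text):
    """Flag pages that return HTTP 200 but whose copy contains
    'not found'-style phrases, which may read as a soft 404.
    The phrase list below is an illustrative guess, not Google's."""
    if status_code != 200:
        return False  # a real 404/410 is not a *soft* 404
    text = page_text.lower()
    phrases = ("not found", "not available", "no longer exists")
    return any(phrase in text for phrase in phrases)
```

Running a check like this over your own templates can catch pages (e.g. empty search results or sold-out product pages) that accidentally send these signals.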
Even if you use all these strategies, it's important to remember that Google has finite resources, whereas the content available online is nearly limitless – so not every page will be indexed. As long as your main pages have been indexed by Google, and are appearing in search results, you don't need to worry too much.