Managing Search - How to remove pages from search engines

How to influence search engine indexing through Google Search Console and Meta-Tags.
SEO
Author

Allen Kamp

Published

November 15, 2021

Search engines such as Google and Bing provide free tools that let a webmaster monitor, submit, and withdraw or remove pages from search listings. As Google handles the largest volume of search queries and traffic, we will show how to use its tool, but Bing has similar features and processes.

After running through these instructions, you will have your web property added to Google Search Console.

We will not cover SEO topics or how to improve your search ranking.

gsc-01.png

Getting Started with Google Search Console

  • Sign up for Google Search Console.

  • Add the property that you are interested in monitoring. If you are not asked to add one when you first sign in, use the menu at the top left, which lets you search your registered properties and add new ones via + Add Property.

  • Select Property Type. Initially, I recommend using URL prefix rather than the new Domain method, because it allows more granularity if you wish to grant/share access, and also offers more verification options.

  • Verify the property using the option that you prefer. From experience, I have found that the DNS option (adding a specified TXT record) is often the fastest, provided you have access to your domain's DNS settings.
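
If you go the DNS route, it can help to confirm the TXT record is actually published before clicking Verify. The sketch below is only a rough check: it assumes you have already fetched the TXT values yourself (for example by copying them from a DNS lookup tool), and the token shown is a made-up placeholder. The `find_verification_tokens` helper is my own name, not part of any Google tool.

```python
# Prefix Google uses for its site-verification TXT records.
GOOGLE_TXT_PREFIX = "google-site-verification="


def find_verification_tokens(txt_records):
    """Return the tokens from any Google verification TXT records."""
    tokens = []
    for record in txt_records:
        # DNS tools often print TXT values wrapped in double quotes.
        value = record.strip().strip('"')
        if value.startswith(GOOGLE_TXT_PREFIX):
            tokens.append(value[len(GOOGLE_TXT_PREFIX):])
    return tokens


# Hypothetical TXT values for a domain; the token is a placeholder.
records = [
    '"v=spf1 include:_spf.example.com ~all"',
    '"google-site-verification=abc123-hypothetical-token"',
]
print(find_verification_tokens(records))
```

An empty list means the record is not visible yet, which usually just means the DNS change has not propagated.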

Once you have been granted access, you can explore the options in the left-hand menu. If this is the first time this domain has been added to Google Search Console, you may have to wait 24 hours before the reports are populated. (There is also a Python API to access some of this data.)
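
As a rough idea of what the Python API route looks like, here is a minimal sketch. It assumes you have installed google-api-python-client, set up credentials, and created a service object with something like `build("searchconsole", "v1", credentials=creds)`. The helper names `build_query_body` and `top_pages` are mine, and the dates and site URL are placeholders.

```python
def build_query_body(start_date, end_date, dimensions=("page",), row_limit=25):
    """Build the request body for a Search Analytics query."""
    return {
        "startDate": start_date,   # ISO dates, e.g. "2021-10-01"
        "endDate": end_date,
        "dimensions": list(dimensions),
        "rowLimit": row_limit,
    }


def top_pages(service, site_url, start_date, end_date):
    """Return (page_url, clicks) pairs for a verified property.

    `service` is assumed to be a Search Console API client created
    elsewhere; nothing here handles authentication.
    """
    body = build_query_body(start_date, end_date)
    response = service.searchanalytics().query(siteUrl=site_url, body=body).execute()
    # Each row's "keys" list follows the order of the requested dimensions.
    return [(row["keys"][0], row["clicks"]) for row in response.get("rows", [])]


# Usage (needs real credentials, so it is left as a comment):
# pages = top_pages(service, "https://example.com/", "2021-10-01", "2021-10-31")
```

Passing the service in as a parameter keeps the sketch free of any particular authentication setup, which varies between service accounts and OAuth flows.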

Sharing / Granting Access

Sadly, this is where we see Google product silos and poor integration with legacy systems in action. Mind the cobwebs: we will see glimpses here of the old Webmaster Tools product, which was replaced by Search Console in 2015. Google has a habit of replacing products without reaching feature parity, and then re-using the old bits “temporarily/forever” for the use-cases they forgot to cover before the team disbanded.

You can grant access to a given property (site) so that other specified people can see it in their own Search Console. You retain control and can remove their access later. This is the best way to grant access to consultants.

  • Go to Settings near the bottom of the left menu.

  • Click Users and permissions

  • Click the three vertical dots on the right of your owner listing. Select Manage Property Owners

  • Click Verification details of the property

  • Click Add an owner and enter the email address of the relevant Google account.

If you check the Verification Details, you will see that their permission is by delegation, and you have the option to Unverify them. They should now be able to see the property when they log in to Google Search Console.


Removing Urls

Why would I want to remove a URL from search indexes? A common reason is that the page is for an older product, but Google is giving it too much emphasis in search results compared with newer content. You aren’t ready to remove the pages from the site and you still want them accessible to existing users, but you want new people to discover the new product when they search. If the old product was popular, it may continue to outrank the newer one, and possibly even get listed in the sitelinks section of the site’s search results. The only real option here is removal from the search index.

The Removals option in the left menu allows you to quickly remove indexed pages temporarily, or to flag outdated content. Whilst this sounds useful, in many cases it is only a short-term remedy, as the removal is temporary and Google can list the page again after it is re-crawled.

The long-term method is to add meta tags to the pages to give search engines a strong hint. Good search engines will obey, and the page will be removed in time (this may take up to 14 days, depending on the search engine's refresh cycle).

The two main directives for the robots meta tag are noindex and nofollow.

  • noindex - Do not show this page, media, or resource in search results

  • nofollow - Do not follow the links on this page. If you don’t specify this directive, Google may use the links on the page to discover those linked pages.

To prevent (well-behaved) search engine web crawlers from indexing a page on your site, add the tags to the <head>..</head> of the page or pages you wish to remove from the index.

<meta name="robots" content="noindex">
<meta name="robots" content="nofollow">

or combine them

<meta name="robots" content="noindex,nofollow">

If you want to do this for all pages on a particular site, you could add this to your page template.
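
After editing a page or template, it is worth checking that the tag really ends up in the rendered HTML. The following is a minimal sketch using only Python's standard library. The `robots_directives` helper is my own name, and the sample page is made up; in practice you would feed it the HTML you downloaded from your site.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            # content may hold several comma-separated directives
            for item in attr.get("content", "").split(","):
                item = item.strip().lower()
                if item:
                    self.directives.add(item)


def robots_directives(html):
    """Return the set of robots directives found in an HTML string."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


page = '<html><head><meta name="robots" content="noindex,nofollow"></head><body></body></html>'
print(sorted(robots_directives(page)))
```

It treats the two separate tags and the combined tag the same way, since both forms end up as the same set of directives.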

Other Methods - meta tags in response header.

See Add meta tag in response header for more header options. A response carrying the directives looks like this:

HTTP/1.1 200 OK
(…)
X-Robots-Tag: noindex, nofollow
(…)

This may be an option if you have access to edit or add to the response headers of your server. The advantage of this method is that it requires no modification to any pages, but be warned that it will apply to all pages (unless your server has a way to apply it selectively).
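
How you apply the header selectively depends entirely on your server. Purely as an illustration, here is a sketch for a Python WSGI application that adds X-Robots-Tag only to paths under chosen prefixes. The `add_robots_header` wrapper, the toy `site` app, and the /old-product/ prefix are all hypothetical, so adapt the idea to whatever your own stack offers.

```python
def add_robots_header(app, prefixes, value="noindex, nofollow"):
    """Wrap a WSGI app so matching paths get an X-Robots-Tag header."""

    def middleware(environ, start_response):
        path = environ.get("PATH_INFO", "")

        def patched_start_response(status, headers, exc_info=None):
            # Only tag responses whose path starts with a listed prefix.
            if any(path.startswith(prefix) for prefix in prefixes):
                headers = list(headers) + [("X-Robots-Tag", value)]
            return start_response(status, headers, exc_info)

        return app(environ, patched_start_response)

    return middleware


def site(environ, start_response):
    """Stand-in for your real application."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello"]


# Only pages under /old-product/ are marked noindex, nofollow.
wrapped = add_robots_header(site, prefixes=["/old-product/"])
```

Everything outside the listed prefixes is served unchanged, so the newer pages stay eligible for indexing.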

I am not sure if this is an option with regular github.io pages, but if you are using a custom domain with GitHub Pages and Cloudflare, you can try Custom Headers with Cloudflare Workers.

Other Methods - Robots.txt

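Another method to control crawler access, without altering the pages, is to use a robots.txt file. But it is the wrong mechanism for removing pages that are already indexed. Blocking a page in robots.txt also stops the crawler from fetching it, so it never sees a noindex tag on that page. As Google says: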

A robots.txt file tells search engine crawlers which URLs the crawler can access on your site. This is used mainly to avoid overloading your site with requests; it is not a mechanism for keeping a web page out of Google. To keep a web page out of Google, block indexing with noindex or password-protect the page.
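
To see what a robots.txt rule actually controls, you can test it with Python's standard library. This sketch uses made-up rules and URLs, and the `can_crawl` helper is my own name. Note that it answers "may this crawler fetch the URL?", which is a different question from "will this URL appear in search results?".

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content blocking one section of the site.
RULES = """\
User-agent: *
Disallow: /old-product/
"""


def can_crawl(rules_text, user_agent, url):
    """Return True if the rules allow user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(rules_text.splitlines())
    return parser.can_fetch(user_agent, url)


for url in (
    "https://example.com/old-product/index.html",
    "https://example.com/new-product/",
):
    print(url, can_crawl(RULES, "Googlebot", url))
```

A blocked URL that other sites link to can still be listed, which is why the noindex approach is the one to use for removal.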

Other Methods - Site Map

Sitemaps are a great way to let search engines discover all of your pages and to hint at the emphasis for each page you list. However, they are not effective for removing pages from the search index. You can find out more about sitemaps here: [What is a Sitemap?](https://developers.google.com/search/docs/advanced/sitemaps/overview).
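
Although a sitemap will not remove anything, it seems sensible housekeeping to stop listing pages that you have marked noindex, so you are not sending mixed signals. Below is a minimal sketch that builds a sitemap while skipping chosen URL prefixes. The `build_sitemap` helper, the URLs, and the excluded prefix are assumptions for illustration only.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls, excluded_prefixes=()):
    """Return sitemap XML listing urls, minus any excluded prefixes."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        # Leave out pages we are asking search engines not to index.
        if any(url.startswith(prefix) for prefix in excluded_prefixes):
            continue
        ET.SubElement(ET.SubElement(urlset, "url"), "loc").text = url
    return ET.tostring(urlset, encoding="unicode")


pages = [
    "https://example.com/new-product/",
    "https://example.com/old-product/index.html",
]
print(build_sitemap(pages, excluded_prefixes=["https://example.com/old-product/"]))
```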


Reference: Remove a page hosted on your site from Google