How to Fix Crawl Errors in Webmaster Tools

Since relaunching my website Martial Arts Videos two months ago, I have been faced with a difficult problem. The website gets little to no traffic from Google. Some days I only get one or two visits from Google, and the daily total rarely exceeds ten.

A quick search on Google shows that most of my recent articles are not indexed. Some of my older articles are listed, but they seem to appear on the second page of Google's search results. Annoyingly, the corresponding update from the Martial Arts Videos Facebook page is listed first.

Martial Arts Videos Search Engine Traffic

I have never used any black-hat techniques to promote the website (or any website for that matter) and Google confirmed that it did not have any penalties. This led me to believe that my low search engine rankings were being caused by too many low quality links. In particular, Martial Arts Videos was linked in the sidebar on a related website I own (MMAClips) that had hundreds of thousands of pages indexed in Google. The result was that I had over 600,000 incoming links from a low quality website.

I discussed this issue in depth in my article “How To Stop Bad Incoming Links Hurting Your Search Engine Rankings”. As you may recall, I used the Google Disavow Tool to request that Google remove all links from MMAClips.

The result has been a little surprising. I had assumed that disavowing a full domain would remove all links at once. It doesn't. Instead, I have seen the number of incoming links from MMAClips slowly decrease every week. Currently, 418,133 incoming links remain.

Disavowed MMAClips

With hundreds of thousands of links still pointing at MartialArtsVideos.com, there is a chance that my ranking is still being affected by the number of incoming links. There is also a chance that it is being caused by something else.

How to Fix Crawl Errors

I initially developed Martial Arts Videos in the first half of 2012. The site used an automated script to publish YouTube videos about selected martial arts topics. When the website was relaunched in June 2013, I reduced the total number of posts from 10,300 to only 18.

As you would expect, I saw a large increase in 404 errors (also known as not found errors) as a result. When I checked Google Webmaster Tools last night, I had a total of 17,493 not found errors.

Martial Arts Videos Crawl Errors

Google allows you to download a list of your 404 error pages in CSV format or via Google Docs.

Download Crawl Errors CSV

I was surprised to see that only around 2,200 URLs were listed in the CSV file, despite there being around 17,500 listed in Google Webmaster Tools. Perhaps even stranger was that the list included both 404 and 418 error codes. The 418 error code apparently stands for “I'm a teapot”.

It is not clear whether this shorter list of error URLs means that the export is incomplete, or that the total figure quoted in Webmaster Tools is incorrect.

View Your Crawl Error List
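To see how the exported errors break down by status code, a short script can tally them. The sketch below is written in Python and assumes the export has a header row with columns named "URL" and "Response Code". Those column names are my assumption, so check them against the first line of your own file. The sample data is made up.

```python
import csv
import io
from collections import Counter


def count_response_codes(csv_file, code_column="Response Code"):
    """Count how many rows in a crawl error export carry each status code.

    `code_column` is an assumed header name; change it to match your file.
    """
    reader = csv.DictReader(csv_file)
    return Counter(row[code_column].strip() for row in reader)


# A tiny made-up sample standing in for the downloaded CSV file.
sample = io.StringIO(
    "URL,Response Code\n"
    "http://www.example.com/old-post-1/,404\n"
    "http://www.example.com/old-post-2/,404\n"
    "http://www.example.com/old-post-3/,418\n"
)

counts = count_response_codes(sample)
for code, total in counts.most_common():
    print(code, total)
```

For a real export, open the file with `open("crawl-errors.csv", newline="")` (the filename is hypothetical) and pass the file object to `count_response_codes` in place of the sample.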

If you search the internet for advice on handling 404 error pages, you will see two conflicting pieces of advice. One group advises that you 301 redirect all of your 404 error pages to your home page. For those of you who are not familiar with HTTP status codes, a 301 code means that a page or website has been permanently moved somewhere else. It is commonly used when your permalink structure is changed, however I have seen many websites redirect their 404 errors directly to their home page using a 301 redirect.

The other group advises the complete opposite. They state that your 404 not found error pages should remain as they are. Alternatively, you can use the status code 410 to signal that a page is gone for good. Google claim that 404 errors do not hurt a website. Google also state that they handle 404 and 410 errors in exactly the same way, so it seems pointless to set up lots of 410 errors for pages that are gone if you can simply let your website generate 404 errors automatically. After all, in the eyes of Google, they are one and the same.

If you’re getting rid of that content entirely and don’t have anything on your site that would fill the same user need, then the old URL should return a 404 or 410. Currently Google treats 410s (Gone) the same as 404s (Not found).

So what should you do? Should you redirect your 404 errors or just leave them alone and let the search engines do their thing? Most respected SEO authorities, and Google themselves, advise against redirecting 404 pages to your home page. Apparently, this can actually hurt your rankings, as a not found error message could be passed on to your home page; however, with most large travel websites using this tactic, I am not sure this is actually true in practice.

If your pages still exist in another location, you should redirect the content to the correct URL. You can do this easily by adding a 301 redirection request to your .htaccess file. All you need to do is enter “Redirect 301”, followed by your old address and new address. For example:

Redirect 301 /article-about-water.html http://www.yourwebsite.com/water.html
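If you have more than a couple of moved pages, you could generate these lines from a list instead of typing each one. The following is a minimal Python sketch, not a finished tool. The second entry in the mapping is a made-up page added for illustration, and the function name is my own.

```python
def build_redirect_rules(moved_pages):
    """Return one `Redirect 301` line per (old path, new URL) pair.

    Paths containing spaces would need quoting in .htaccess,
    which this sketch does not handle.
    """
    rules = []
    for old_path, new_url in moved_pages.items():
        if not old_path.startswith("/"):
            raise ValueError(f"old path must start with '/': {old_path!r}")
        rules.append(f"Redirect 301 {old_path} {new_url}")
    return rules


# The first entry is the example above; the second is hypothetical.
moved_pages = {
    "/article-about-water.html": "http://www.yourwebsite.com/water.html",
    "/article-about-fire.html": "http://www.yourwebsite.com/fire.html",
}

rules = build_redirect_rules(moved_pages)
print("\n".join(rules))
```

The printed lines can then be pasted into your .htaccess file. This only makes sense for a short list of pages that genuinely moved.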

A 404 error page seems to be the best option if the page has been deleted permanently. That is, after all, what the code was created for. I can see why some people would want to redirect their error pages to their home page, however if you have an informative 404 page that directs users to good content, it should not be a major issue. WordPress handles 404 errors natively, though it is prudent to check that Google is getting the right response from your error pages. A template that displays a 404 message is not necessarily sending a 404 status code back to Google.

You can check this easily using the “Fetch as Google” tool within the crawl section of Webmaster Tools. It shows you exactly what Google sees when it visits your URL and tells you the status code it receives.

Fetch as Google
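You can also check the status code yourself from a script. The Python sketch below sends a HEAD request without following redirects and translates the code into plain English. It is a rough complement to Fetch as Google, not a replacement, since it does not show what Googlebot itself receives, and some servers answer HEAD requests differently from normal page requests. The function names and the wording of the descriptions are my own.

```python
import http.client
from urllib.parse import urlsplit


def fetch_status(url, timeout=10):
    """Send a HEAD request and return the raw HTTP status code.

    http.client does not follow redirects, so a 301 is reported as 301.
    """
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    try:
        conn.request("HEAD", path)
        return conn.getresponse().status
    finally:
        conn.close()


def describe_status(code):
    """Explain what a status code means for a page you have deleted."""
    if code == 200:
        return "200: page exists (for a deleted page this is a soft 404)"
    if code == 301:
        return "301: permanently moved to another URL"
    if code == 404:
        return "404: not found"
    if code == 410:
        return "410: gone"
    return f"{code}: other response"
```

To try it, call `describe_status(fetch_status("http://www.yourwebsite.com/deleted-page/"))` with the address of one of your own removed pages (the URL shown here is a placeholder).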

I have shown that, according to Google and respected SEO companies, you should simply leave your deleted pages as 404s and let Google work out everything themselves. I realise this is what you are supposed to do, however I am not 100% sure whether or not it is definitely the best thing to do in practice.

Bizarre Behaviour From Google

From what I understand (and admittedly, I am not an expert on this subject), Google will check a 404 URL again and again to see if the error has been corrected (i.e. the page content has returned or the URL has been 301'd to the correct place). So I initially thought that a 410 status code would be the best solution for me as my content was not going to return. Yet Google advises that they handle 404 and 410 status codes in the same way.

I therefore assumed that after removing around 10,000 posts, Google would report an increase in 404 errors and then I would see them disappear over the following weeks. That is not what has actually occurred. Google has actually been reporting more and more not found errors every single day. The screenshot that I published earlier in this article was taken last night and shows that I had 17,493 not found errors. Less than 12 hours later, Google increased the number of not found errors by 64.

Martial Arts Videos Crawl Errors

This increase in not found errors is baffling. The sitemap for Martial Arts Videos lists 73 URLs (i.e. 73 unique pages of content). The URLs that are being listed as not found are not listed on the website anywhere, nor are they linked anywhere. This makes it all the more confusing that Google is increasing the number of not found errors on the site.

If you click on any of the URLs that are listed as not found, you will see how Google found the link. It explains the date the URL was first detected and the last crawled date.

Not Found Details

None of the URLs are currently linked in my active sitemap.

Not Linked from Sitemap
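With thousands of URLs, checking each one by hand is not realistic, so this comparison could be scripted. The Python sketch below parses a sitemap and reports any error URLs that still appear in it. It assumes a standard sitemap using the sitemaps.org namespace, and the sample sitemap and URLs are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace used by standard XML sitemaps.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(sitemap_xml):
    """Return the set of <loc> URLs listed in a sitemap document."""
    root = ET.fromstring(sitemap_xml)
    return {loc.text.strip() for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)}


def errors_still_in_sitemap(error_urls, sitemap_xml):
    """Return the error URLs that the sitemap still lists, sorted."""
    listed = sitemap_urls(sitemap_xml)
    return sorted(url for url in error_urls if url in listed)


# Made-up sitemap and error list.
sample_sitemap = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://www.example.com/</loc></url>
  <url><loc>http://www.example.com/karate-basics/</loc></url>
</urlset>"""

error_urls = [
    "http://www.example.com/old-video-123/",
    "http://www.example.com/karate-basics/",
]

overlap = errors_still_in_sitemap(error_urls, sample_sitemap)
print(overlap)
```

An empty result means none of the reported URLs are in the sitemap, which is what I found for Martial Arts Videos.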

What I find strange is that Google is finding links from pages that were removed more than six weeks before. One of the links is from a sitemap that does not even exist.

Linked From URLs

As I write this, Martial Arts Videos has:

  • 10 Server Errors – These are reported as 500 codes, which refer to internal server errors.
  • 17,558 Not Found Errors – These are 404 error codes.
  • 1,834 Other Errors – All other errors had a 418 error code. As I mentioned before, this apparently means “I'm a teapot”!!

My aim is to remove all of these errors. Once they have been removed, I hope to see a jump in my search engine traffic.

On the information page for soft 404 errors, Google notes:

Returning a code other than 404 or 410 for a non-existent page (or redirecting users to another page, such as the homepage, instead of returning a 404) can be problematic. Firstly, it tells search engines that there’s a real page at that URL. As a result, that URL may be crawled and its content indexed. Because of the time Googlebot spends on non-existent pages, your unique URLs may not be discovered as quickly or visited as frequently and your site’s crawl coverage may be impacted (also, you probably don’t want your site to rank well for the search query [File not found]).

I have read a few comments from people on Google Groups who also believe that the part of that statement about Googlebot's time is true. That is, when Google's spiders spend too long trying to index non-existent pages, your crawl rate for the pages you do want indexed is affected.

Perhaps that is what is occurring with Martial Arts Videos. Could it be that Google is spending so much time trying to index pages that were removed from the website two months ago, that it is not indexing the pages that should be indexed? Who knows. What Google says and what Google does are not always the same.

Putting that aside, how do I remove my old URLs from Google's index? One option appears to be the “Remove URLs” tool under the “Google Index” section of Webmaster Tools.

Remove URL Through Webmaster Tools

Unfortunately, you can only remove one URL at a time.

Enter Your URL

During the removal process, Google advises that for permanent removal, the content has to be blocked using the robots.txt file. This suggests that removing a URL from Google's index does not guarantee that the page will no longer generate a 404 error.

Confirm URL Removal Request
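If you do go down the robots.txt route, it is worth confirming which addresses a given set of rules would actually block. The Python sketch below uses the standard library's robots.txt parser for that. The rule and the URLs are hypothetical, and the parser's matching is simpler than Google's own, so treat the result as a rough check only.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules blocking one directory of old content.
robots_lines = [
    "User-agent: *",
    "Disallow: /old-videos/",
]

parser = RobotFileParser()
parser.parse(robots_lines)

# Made-up URLs: one under the blocked directory, one that should stay open.
blocked_url = "http://www.example.com/old-videos/clip-1/"
open_url = "http://www.example.com/karate-basics/"

blocked_allowed = parser.can_fetch("Googlebot", blocked_url)
open_allowed = parser.can_fetch("Googlebot", open_url)

print(blocked_url, "allowed:", blocked_allowed)
print(open_url, "allowed:", open_allowed)
```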

Page removal is not instant. When you request a page to be removed from Google's search results, the request will be placed in a pending status. I only made this request today, therefore I am unsure how long this process takes (I suspect it is at least a few days).

URL Pending Deletion

Removing 10,000 URLs from Google's index in this manner is not practical. Further inspection led me back to Google's “Do 404s hurt my site?” post. It confirms that removal requests are not necessary if a URL already returns a 404 error message:

Q: Can I use Google’s URL removal tool to make 404 errors disappear from my account faster?
A: No; the URL removal tool removes URLs from Google’s search results, not from your Webmaster Tools account. It’s designed for urgent removal requests only, and using it isn’t necessary when a URL already returns a 404, as such a URL will drop out of our search results naturally over time. See the bottom half of this blog post for more details on what the URL removal tool can and can’t do for you.

Many website owners state that the best thing is to just wait for Google to correct everything. Some people are reporting this takes three months, others are saying that errors are still there after nine to twelve months.

One of the best responses I have read was published on Stack Exchange. It said:

Webmaster Tools is notoriously slow at updating the links/errors page. In particular, even when a page is no longer linked to, Google's bot keeps requesting the page and reporting that it cannot be found.

If any of the URLs follow a common pattern you can do a 301 redirect to the correct page, which should speed up Google's removal of those errors. (Note: I wouldn't recommend adding thousands of lines to htaccess because that can seriously impact performance.)

Aside from that there isn't much you can do unfortunately besides wait it out. If there are definitely no links pointing to the non-existent pages then the Crawl Errors section will slowly shrink over time. It can take up to 3 months in my experience.
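To judge whether your not found URLs follow a common pattern, you can group them by the first part of their path. The Python sketch below does this for a list of URLs. The sample URLs are invented, and grouping by the first path segment is just one simple way of spotting a pattern.

```python
from collections import Counter
from urllib.parse import urlsplit


def count_by_first_segment(urls):
    """Count URLs by the first segment of their path ('/' for the root)."""
    counts = Counter()
    for url in urls:
        segments = [s for s in urlsplit(url).path.split("/") if s]
        counts[segments[0] if segments else "/"] += 1
    return counts


# Made-up not found URLs.
not_found = [
    "http://www.example.com/video/judo-throw-1/",
    "http://www.example.com/video/judo-throw-2/",
    "http://www.example.com/tag/karate/",
    "http://www.example.com/",
]

segment_counts = count_by_first_segment(not_found)
for segment, total in segment_counts.most_common():
    print(segment, total)
```

If one prefix accounts for most of the errors, a single pattern-based rule (for example, Apache's RedirectMatch directive) might cover them without adding thousands of lines to .htaccess. That only applies if there is a sensible page to redirect them to.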

An Overview of Fixing Crawl Errors

In theory, fixing crawl errors is easy. You simply need to 301 live pages to their current location and let deleted pages go to a 404 error page.

In practice, that does not seem to be happening... at least not for Martial Arts Videos. Webmaster Tools is reporting more errors for the website every day. This is a bizarre occurrence when you consider all of these articles were removed close to two months ago.

It seems I only have two options:

  • 301 all 404 error pages to the home page.
  • Leave everything as it is and let Google resolve everything in their own time.

My dilemma is this: Can I afford to wait for Google to resolve this issue? I am paying writers hundreds of dollars every month to write for Martial Arts Videos and the website will not be profitable until traffic from Google gets to the level it should be at. Many website owners have stated that it can take a year or longer for Google to remove all errors. I really cannot afford to wait a year for this to be addressed.

The problem is that I do not even know if this is the cause of my low search engine traffic. It could be caused by something else. If you recall, Google states that 404s do not hurt my site, so perhaps my traffic will return when the low quality incoming links pointing to the site are gone.

I would love to tell you all what the best thing to do in this situation is. The truth is, I am far from an expert on this issue. Therefore, for the time being, I will listen to the advice of people a lot smarter than me and leave my website as it is.

Hopefully, Google will not take months to resolve the issue. If you know of a better way to handle this problem, please leave a comment and let me know what needs to be done.

Thanks for reading :)

Kevin

I am an experienced blogger who has been working on the internet since 2000. On this blog, I talk about WordPress, internet marketing, YouTube, technology and travelling.