I’ve spent a lot of quality time with the Google Webmaster Tools (GWT) this week, and it has been both a frustrating and an enlightening experience. The bottom line is that it shows my site as having a lot of errors of the 404 – Not Found variety, which caused a bit of concern because 250+ of those have got to be hurting my search engine ranking.
It is additionally frustrating because I’ve gone to great lengths to prevent this very sort of thing from happening. I use Robots Meta to prevent certain pages from being indexed by search engines, All In One SEO to create meta data, and Redirection to make sure modification or deletion of posts doesn’t cause any disruption. And yet, there they are, staring me in the face. A bunch of pages that can’t be found and are returning errors. First, I’m going to talk about where these errors came from–because not all errors are equal–and whether they actually need to be fixed or not. Second, I’ll let you in on the secret to 404s and SEO.
What causes the errors?
GWT itself admits, in the following note, that not all errors are really a problem:
Note: Not all errors may be actual problems. For example, you may have chosen to deliberately block crawlers from some pages. If that’s the case, there’s no need to fix the error.
If you have deleted a post or page, updated your sitemap, and you consider the case closed, you probably don’t need to worry about it. Eventually Google will stop trying to reach the link and the error will disappear all on its own. The problem is if you have other pages on your site that link to those you have deleted. GWT will tell you what those pages are, and you should edit them to remove the offending links.
This is probably the most benign of the errors because you can see it coming. Others are more mysterious.
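If you'd rather not hunt for those leftover links by eye, the check can be roughed out in a few lines of Python. This is only a sketch: the `links_to_deleted` helper, the example.com URLs, and the `/old-post` path are all made up for illustration, and it assumes you already have the page's HTML and a list of deleted paths in hand.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def links_to_deleted(page_url, html, deleted_paths):
    """Return the hrefs in `html` that resolve to a path in `deleted_paths`."""
    collector = LinkCollector()
    collector.feed(html)
    site = urlparse(page_url).netloc
    offending = []
    for href in collector.hrefs:
        target = urlparse(urljoin(page_url, href))
        # Only same-site links count; ignore the trailing slash when comparing.
        if target.netloc == site and target.path.rstrip("/") in deleted_paths:
            offending.append(href)
    return offending


# Hypothetical page content and deleted post, for illustration only.
sample_html = '<p><a href="/old-post/">old</a> <a href="/about/">about</a></p>'
print(links_to_deleted("http://example.com/some-page/", sample_html, {"/old-post"}))
```

Anything the function returns is a link you would then edit out of the post by hand.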
Related Posts Plugin
Similar to the last, Related Posts plugins (I use YARPP and rather like it) generate a ton of internal links on your site. These links aren’t generally set to nofollow because 1) they’re internal and 2) if you delete a post, Related Posts will update automatically and won’t link to the deleted post anymore. Unfortunately, Google has already recorded that Page A links to Page B, so when Page B gets deleted, Google decides there’s an error. This, too, will pass in time as Google catches up, but it’s something of which you should be aware.
Back-end or Codeish Errors
I have no idea what causes these or where they come from, but GWT claims that a lot of my pages are linking to things that simply don’t exist. Namely, some pages are supposedly linking to */function.include, but as near as I can tell, there are no links on the originating pages that point at */function.include. This would suggest a problem with the theme I’m using (maybe it has some code pointing to the wrong place and that’s throwing errors), but if that were the case, the errors should be coming from every single page, not just a few.
I went through and manually removed these links from Google’s index, but I’m skeptical of that solution. I’d rather know what is causing it and get it fixed, but this issue is so perplexing that I don’t know how. The good news is that actual users of the site aren’t attempting to follow these links because they don’t really exist on the page, so while the crawler may have trouble, the readers won’t.
This one is more because I’m spastic than anything else. For those of you who have followed this site for a while, you might recall that it has undergone significant changes in the last four years. I’ve gone from WordPress to Mambo!+WordPress to Joomla!+WordPress and then back to WordPress exclusively. I have created a dozen different sub-sites, spin-off blogs, forums, wikis, etc., and consequently deleted those blogs and come back to just having the one centralized site.
As such, I should have gone back and edited my robots.txt to exclude… well, pretty much everything. I’ve done that now, in addition to removing those links from Google’s index, so hopefully that will take care of it.
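Before trusting a new robots.txt, it's worth confirming that it blocks what you think it blocks. Here is a minimal sketch using Python's standard `urllib.robotparser`. The rules and the `/forum/` and `/wiki/` directories are placeholders standing in for retired sub-sites, not my real paths, and `is_blocked` is just a name I picked for the helper.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules; the directory names are placeholders
# for retired sections of a site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /forum/
Disallow: /wiki/
"""


def is_blocked(robots_txt, url, user_agent="Googlebot"):
    """Return True if `robots_txt` forbids `user_agent` from fetching `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


# A URL under a retired section versus a live post (both made up).
print(is_blocked(ROBOTS_TXT, "http://example.com/forum/topic-1"))
print(is_blocked(ROBOTS_TXT, "http://example.com/2009/some-post/"))
```

The first URL should come back blocked and the second should not. If a live post ever shows up as blocked, the rules are too broad.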
Combining WordPress blogs
When I closed the blogs I mentioned above, I usually imported their posts into my primary site. This causes so many headaches if you’re not careful, so be prepared to sort out the kinks. GWT’s ability to tell you where the errors are happening is great for going back and editing posts to remove or update links, but it’s definitely a manual process. There is simply no way around fixing this stuff: you’re going to have to set aside a block of time, sit down, and get it right.
This one originally perplexed me, as I had pages and pages of errors due to pagination. This is where you’re browsing through the site and you’re on */page/108, and you can go to either */page/107 or */page/109. While I was typing this, it finally hit me what caused them: going from a single blog post on each page to 5 or 10. I suddenly have fewer pages, but Google hasn’t caught up yet and is still trying to hit those old links. It’ll learn eventually.
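The arithmetic behind this is simple enough to sketch. The post counts below are invented to keep the output short, and the function names are mine. The point is only that raising the posts-per-page setting shrinks the number of */page/N URLs, and every page number past the new total turns into a 404.

```python
import math


def page_count(total_posts, posts_per_page):
    """Number of archive pages needed to list every post."""
    return math.ceil(total_posts / posts_per_page)


def stale_page_urls(total_posts, old_per_page, new_per_page):
    """Return the /page/N paths that existed under the old setting only."""
    old_pages = page_count(total_posts, old_per_page)
    new_pages = page_count(total_posts, new_per_page)
    return [f"/page/{n}" for n in range(new_pages + 1, old_pages + 1)]


# Hypothetical numbers: 12 posts, switching from 1 per page to 5 per page.
print(page_count(12, 1))           # 12 pages before the change
print(page_count(12, 5))           # 3 pages after the change
print(stale_page_urls(12, 1, 5))   # /page/4 through /page/12 no longer exist
```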
So, do 404s hurt SEO?
That depends, as I alluded to above, on whether they are internal or external links that are Not Found. Search engines won’t penalize you if other sites link incorrectly to your content and those links can’t be followed. If they did penalize you for that, then spammers or trolls could create sites with massive amounts of broken links to any site they wanted and drop its pagerank immediately. This obviously wouldn’t be fair, and thankfully search engines don’t work that way. Regardless, it is best to have a custom 404 page to deal with external links that 404. The key is making sure that actual people (rather than bots or crawlers) find your site helpful and get to the information they need/want.
Internal 404s will most certainly cause harm, and that’s where GWT can be of great benefit. By displaying not just the pages that can’t be found but also the pages that link to them, it helps you find the broken links and fix them. As far as search engines are concerned, if your site can’t maintain internal link integrity, it isn’t trustworthy or helpful, so why would they send people your way? If Google started sending people to a bunch of broken sites that didn’t work well, people would stop trusting Google to provide good search results and they’d use a different search provider. That’s why the search engine checks to make sure sites are holding up and working well, and if the site isn’t, its pagerank will drop.
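Since the internal/external split decides which errors deserve your time, it can help to sort a list of errors by where the broken link lives. The sketch below assumes you have the data as (broken URL, linking page) pairs. That layout, the `split_by_source` name, and every URL in it are assumptions for illustration, not GWT's actual export format.

```python
from urllib.parse import urlparse


def bare_host(host):
    """Lowercase a hostname and drop a leading 'www.' so variants compare equal."""
    host = host.lower()
    return host[4:] if host.startswith("www.") else host


def split_by_source(site_host, report):
    """Split (broken_url, linked_from) pairs into internal and external lists."""
    internal, external = [], []
    for broken_url, linked_from in report:
        source_host = bare_host(urlparse(linked_from).netloc)
        if source_host == bare_host(site_host):
            internal.append((broken_url, linked_from))
        else:
            external.append((broken_url, linked_from))
    return internal, external


# Hypothetical error report, for illustration only.
report = [
    ("http://example.com/old-post/", "http://www.example.com/archive/"),
    ("http://example.com/typo-url/", "http://someone-else.example.org/links/"),
]
internal, external = split_by_source("example.com", report)
print(len(internal), "internal;", len(external), "external")
```

The internal list is the one to work through first, since those are the links you control.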
Maintaining internal link integrity is essential, not just for SEO, but also for keeping your readers happy. If someone clicks on a link on your site that goes to your site, they expect that link to work. When it doesn’t, no custom 404 page is going to make them happy. They might accept one error, but beyond that they’re more likely to just surf away.
While it would be ideal to never generate errors, chances are you’ll have at least a few if you’ve been around for a while and actually do something with your website. After 4+ years of active development and changes and well over 300 blog posts in just the last year and a half, these things happen, so I’m going to try to not let them get me down. Use the Google Webmaster Tools to your benefit and get your errors sorted. The work will be worth it in the end, and both the crawlers and your users will be happier when they are able to breeze through without hitting brick walls.
And once you get them taken care of, make sure to check back with GWT regularly so the problem never gets out of hand. Once I get this all fixed, I’ll be logging into GWT at least once a week to make sure nothing new has cropped up. I am confident that my pagerank will benefit from the diligence, and it’ll make my readers happier to have a site that functions entirely as it should. For that happiness, it is well worth the extra work.