Will 404 links hurt my SEO? Only if I’m bleeding internally.

I’ve spent a lot of quality time with the Google Webmaster Tools (GWT) this week, and it has been an altogether frustrating and enlightening experience. The bottom line is that it is showing my site as having a lot of errors of the 404 – Not Found variety, and this caused a bit of concern because 250+ of those have got to be hurting my search engine ranking.

It is additionally frustrating because I’ve gone to great lengths to prevent this very sort of thing from happening. I use Robots Meta to prevent certain pages from being indexed by search engines, All In One SEO to create meta data, and Redirection to make sure modification or deletion of posts doesn’t cause any disruption. And yet, there they are, staring me in the face. A bunch of pages that can’t be found and are returning errors. First, I’m going to talk about where these errors came from–because not all errors are equal–and whether they actually need to be fixed or not. Second, I’ll let you in on the secret to 404s and SEO.

What causes the errors?

Deleted Posts/Pages

GWT itself admits, in the following note, that not all errors are really a problem:

Note: Not all errors may be actual problems. For example, you may have chosen to deliberately block crawlers from some pages. If that’s the case, there’s no need to fix the error.

If you have deleted a post or page, updated your sitemap, and you consider the case closed, you probably don’t need to worry about it. Eventually Google will stop trying to reach the link and the error will disappear all on its own. The problem is if you have other pages on your site that link to those you have deleted. GWT will tell you what those pages are, and you should edit them to remove the offending links.
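If you’d rather script that cleanup list than work through it by eye, here is a minimal sketch in Python. It assumes you have already collected, from GWT’s report or elsewhere, which links appear on which pages; the function name and all the URLs below are hypothetical and only there for illustration.

```python
def pages_linking_to_deleted(links_by_page, deleted_urls):
    """Return {page: [dead links]} for pages that still link to deleted URLs."""
    deleted = set(deleted_urls)
    report = {}
    for page, links in links_by_page.items():
        dead = [link for link in links if link in deleted]
        if dead:
            report[page] = dead
    return report


# Hypothetical data: each page mapped to the internal links found on it.
links_by_page = {
    "/about/": ["/contact/", "/old-post/"],
    "/archive/": ["/old-post/", "/gone-page/"],
    "/contact/": ["/about/"],
}
# Hypothetical list of posts/pages that have been deleted.
deleted_urls = ["/old-post/", "/gone-page/"]

for page, dead in pages_linking_to_deleted(links_by_page, deleted_urls).items():
    print(page, "->", ", ".join(dead))
```

Every page that shows up in the output is one you would open and edit to remove the offending links.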

This is probably the most benign of the errors because you can see it coming. Others are more mysterious.

Related Posts Plugin

Similar to the last, Related Posts plugins (I use YARPP and rather like it) don’t generally set all of their links nofollow, so they generate a ton of internal links on your site. These links aren’t generally set to nofollow because 1) they’re internal and 2) if you delete a post, Related Posts will update automatically and won’t link to the deleted post anymore. Unfortunately, Google has indexed that Page A links to Page B, so when Page B gets deleted, Google decides there’s an error. This, too, will pass in time as Google catches up, but it’s something of which you should be aware.

Back-end or Codeish Errors

Some errors are beyond comprehension.

I have no idea what causes these or where they come from, but GWT claims that a lot of my pages are linking to things that simply don’t exist. Namely, some pages are supposedly linking to */function.include, but near as I can tell, there are no links on the originating page that point at */function.include. This would point to there being a problem with the theme I’m using–maybe it has some code pointing to the wrong place and that’s throwing errors–but if that were the case, the errors should be happening from every single page, not just a few.

I went through and manually removed these links from Google’s index, but I’m skeptical of that solution. I’d rather know what is causing it and get it fixed, but this issue is so perplexing that I don’t know how. The good news is that actual users of the site aren’t attempting to follow these links because they don’t really exist on the page, so while the crawler may have trouble, the readers won’t.

Subdirectories

This one is more because I’m spastic than anything else. For those of you who have followed this site for a while, you might recall that it has undergone significant changes in the last four years. I’ve gone from WordPress to Mambo!+WordPress to Joomla!+WordPress and then back to WordPress exclusively. I have created a dozen different sub-sites, spin-off blogs, forums, wikis, etc., and consequently deleted those blogs and come back to just having the one centralized site.

As such, I should have gone back and edited my robots.txt to exclude… well, pretty much everything. I’ve done that now, in addition to removing those links from Google’s index, so hopefully that will take care of it.
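Before trusting a robots.txt change, it can help to check that the rules block what you think they block. The sketch below uses Python’s standard urllib.robotparser for that. The directory names (/forum/, /wiki/) and the example.com address are assumptions made up for illustration, not the actual rules from this site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt excluding two retired sub-sites.
ROBOTS_TXT = """\
User-agent: *
Disallow: /forum/
Disallow: /wiki/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def is_blocked(path, site="http://example.com", agent="Googlebot"):
    """True if the rules above tell the given crawler not to fetch the path."""
    return not parser.can_fetch(agent, site + path)


for path in ["/forum/index.php", "/wiki/Main_Page", "/2009/some-post/"]:
    print(path, "blocked" if is_blocked(path) else "allowed")
```

Python’s parser is not guaranteed to interpret every rule exactly the way Google’s crawler does, so treat this as a sanity check rather than the final word.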

Combining WordPress blogs

When I closed the blogs I mentioned above, I usually imported their posts into my primary site. This causes so many headaches if you’re not careful, so be prepared to sort out the kinks. GWT’s ability to tell you where the errors are happening is great for going back and editing posts to remove or update links, but it’s definitely a manual process. There is simply no way around fixing this stuff: you’re going to have to set aside a block of time, sit down, and get it right.

Pagination

This one originally perplexed me, as I had pages and pages of errors due to Pagination. This is where you’re browsing through the site and you’re on */page/108, and you can go to either */page/107 or */page/109. When I was typing this, it finally hit me what caused this: going from a single blog post on each page to 5 or 10. I suddenly have fewer pages, but Google hasn’t caught up yet and is still trying to hit those old links. It’ll learn eventually.
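The arithmetic behind this is simple enough to sketch in Python. The numbers here are illustrative assumptions (300 posts, going from 1 post per page to 10), not exact figures from my site, and the function names are made up for the example.

```python
import math


def page_count(total_posts, posts_per_page):
    """Number of paginated archive pages needed to list every post."""
    return math.ceil(total_posts / posts_per_page)


def orphaned_pages(total_posts, old_per_page, new_per_page):
    """Page numbers that existed under the old setting but not the new one."""
    old_last = page_count(total_posts, old_per_page)
    new_last = page_count(total_posts, new_per_page)
    return range(new_last + 1, old_last + 1)


# Illustrative assumption: 300 posts, switching from 1 to 10 posts per page.
stale = orphaned_pages(300, old_per_page=1, new_per_page=10)
print(len(stale), "old page URLs no longer exist")
print("/page/108 is stale:", 108 in stale)
```

Under those assumptions, everything past */page/30 stops existing the moment the setting changes, which is why the errors arrive in bulk.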

So, do 404s hurt SEO?

That depends, as I alluded to above, on whether they are internal or external links that are Not Found. Search engines won’t penalize you if other sites link incorrectly to your content and those links can’t be followed. If they did penalize you for that, then spammers or trolls could create sites with massive amounts of broken links to any site they wanted and drop its pagerank immediately. This obviously wouldn’t be fair, and thankfully search engines don’t work that way. Regardless, it is best to have a custom 404 page to deal with external links that 404. The key is making sure that actual people (rather than bots or crawlers) find your site helpful and get to the information they need/want.

Internal 404s will most certainly cause harm, and that’s where GWT can be of great benefit. By displaying not just the pages that can’t be found but also the pages that link to them, it helps you find the broken links and fix them. As far as search engines are concerned, if your site can’t maintain internal link integrity, it isn’t trustworthy or helpful, so why would they send people your way? If Google started sending people to a bunch of broken sites that didn’t work well, people would stop trusting Google to provide good search results and they’d use a different search provider. That’s why the search engine checks to make sure sites are holding up and working well, and if the site isn’t, its pagerank will drop.
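Since the internal ones are the ones worth your time, it can be handy to sort a list of broken links by where they are linked from. Here is a rough Python sketch of that split. It assumes you have (linking page, missing URL) pairs with absolute URLs; the host example.com and every URL below are placeholders, and is_internal is a hypothetical helper, not anything GWT provides.

```python
from urllib.parse import urlparse


def is_internal(link_source, site_host):
    """True if the linking page lives on site_host or one of its subdomains."""
    host = urlparse(link_source).hostname or ""
    return host == site_host or host.endswith("." + site_host)


# Hypothetical (linking page, missing URL) pairs.
broken = [
    ("http://example.com/archive/", "http://example.com/old-post/"),
    ("http://www.example.com/about/", "http://example.com/gone-page/"),
    ("http://someone-else.org/links/", "http://example.com/old-post/"),
]

internal = [pair for pair in broken if is_internal(pair[0], "example.com")]
external = [pair for pair in broken if not is_internal(pair[0], "example.com")]

print("Fix these yourself:", [source for source, _ in internal])
print("Leave these to the custom 404 page:", [source for source, _ in external])
```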

Maintaining internal link integrity is essential, not just for SEO, but also for keeping your readers happy. If someone clicks on a link on your site that goes to your site, they expect that link to work. When it doesn’t, no custom 404 page is going to make them happy. They might accept one error, but beyond that they’re more likely to just surf away.

In Conclusion

While it would be ideal to never generate errors, chances are you’ll have at least a few if you’ve been around for a while and actually do something with your website. After 4+ years of active development and changes and well over 300 blog posts in just the last year and a half, these things happen, so I’m going to try to not let them get me down. Use the Google Webmaster Tools to your benefit and get your errors sorted. The work will be worth it in the end, and both the crawlers and your users will be happier when they are able to breeze through without hitting brick walls.

And once you get them taken care of, check back with GWT regularly so the problem never gets out of hand. Once I get this all fixed, I’ll be logging into GWT at least once a week to make sure nothing new has cropped up. I am confident that my pagerank will benefit from the diligence, and it’ll make my readers happier to have a site that functions entirely as it should. For that happiness, it is well worth the extra work.

My WordPress SEO Strategy

So you’ve built your website, but now you want to know how to get people there. You’ve got a great CMS (WordPress, in my case) and you’ve heard something about the black sorcery that is Search Engine Optimization (SEO), but you aren’t really sure what to do.

No worries, because I’ve got two very easy steps for you. That is to say, they’re easy to wrap your mind around, and a LOT less ambiguous than what you’ll find on other sites. When I first started researching SEO, it seemed like no one quite wanted to get to the heart of the matter, which was a step-by-step account of what you should actually do to ensure that search engines index your site correctly.

First step: Follow this guide on Yoast’s website. You don’t have to do everything on it, but certainly give everything due consideration. For instance, I don’t use Headspace, but I absolutely use Robots Meta and Redirection. If you walk through that guide, taking his suggestions seriously and implementing most/all of them, you’ll see your traffic from searches increase.

Second step: Find topics about which no one has written… and write about them.

At face value, this seems a lot harder than it is, but you’d be amazed at how much has not been done on the Internet. I’m not going to give you a list of topics (because I’ve got big plans to start this second step myself next year), but look around for stuff that hasn’t been covered and cover it. This is what journalists do when they try to be the first to break a story. For bloggers, you don’t even need to break it open, you just have to do it right.

If you’ve ever gone looking for help online with a technical problem, be it with Windows, Adobe Photoshop, Linux, whatever, you’ve probably ended up browsing for minutes, hours, or days through myriad forums, wikis, and guides. You finally find the answers you need and figure out the problem, but now you have two options. You can move on to the next hurdle you have to jump, or you can document it.

If you are looking for help with a specific process/problem, chances are other people are too. So put together a very detailed, specific step-by-step blog post on how to do the specific thing you are trying to do, and be sure to use a boring but precisely accurate title (Using the blend tool in Adobe Photoshop to combine two landscapes, or something like that). In a lot of these cases, you’re not even doing much original writing, you’re just copying from forums/wikis/etc. (always providing citation to the original sources) and bringing everything together into one easy to find, read, and use page.

A good example of this is my post about How to Install Wrath of the Lich King on Linux. It’s not a particularly hard process, but a friend of mine was having a lot of difficulty with it. I got it installed, emailed him how to make it work, then thought I’d go ahead and throw those same instructions on my blog. To someone who has used Linux and Cedega for a while, it was relatively easy, but if this is your first go-around, it’s impossible. By providing the instructions in a simple, easy to read/use page, my traffic has increased significantly and I’ve now got a page that’s the first search result on Google for a topic.

You can do this too; just follow the two simple instructions above. 1) Optimize your WordPress setup to improve SEO, and 2) Write about stuff other people haven’t (admittedly, it helps if you’re also writing stuff that other people want to read!). Do that, and you can’t lose.