Will 404 links hurt my SEO? Only if I’m bleeding internally.

I’ve spent a lot of quality time with the Google Webmaster Tools (GWT) this week, and it has been an altogether frustrating and enlightening experience. The bottom line is that it is showing my site as having a lot of errors of the 404 – Not Found variety, and this caused a bit of concern because 250+ of those have got to be hurting my search engine ranking.

It is additionally frustrating because I’ve gone to great lengths to prevent this very sort of thing from happening. I use Robots Meta to prevent certain pages from being indexed by search engines, All In One SEO to create metadata, and Redirection to make sure modification or deletion of posts doesn’t cause any disruption. And yet, there they are, staring me in the face: a bunch of pages that can’t be found and are returning errors. First, I’m going to talk about where these errors came from–because not all errors are equal–and whether they actually need to be fixed. Second, I’ll let you in on the secret to 404s and SEO.

What causes the errors?

Deleted Posts/Pages

GWT itself admits that not all errors are really a problem:

Note: Not all errors may be actual problems. For example, you may have chosen to deliberately block crawlers from some pages. If that’s the case, there’s no need to fix the error.

If you have deleted a post or page, updated your sitemap, and you consider the case closed, you probably don’t need to worry about it. Eventually Google will stop trying to reach the link and the error will disappear all on its own. The problem is if you have other pages on your site that link to those you have deleted. GWT will tell you what those pages are, and you should edit them to remove the offending links.
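If a deleted post has a natural successor, a 301 redirect is kinder than letting the old URL die, since it carries both visitors and accumulated link value along to the new page. Here’s a minimal sketch, assuming an Apache server and hypothetical URLs (the Redirection plugin I mentioned does the same thing from the WordPress dashboard):

    # .htaccess: permanently redirect a deleted post to its replacement
    Redirect 301 /old-deleted-post/ http://example.com/new-related-post/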

This is probably the most benign of the errors because you can see it coming. Others are more mysterious.

Related Posts Plugin

Similar to the last, Related Posts plugins (I use YARPP and rather like it) don’t generally set all of their links nofollow, so they generate a ton of internal links on your site. These links aren’t generally set to nofollow because 1) they’re internal and 2) if you delete a post, Related Posts will update automatically and won’t link to the deleted post anymore. Unfortunately, Google has indexed that Page A links to Page B, so when Page B gets deleted, Google decides there’s an error. This, too, will pass in time as Google catches up, but it’s something of which you should be aware.
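For reference, a nofollowed link is just an ordinary anchor with a rel attribute that tells crawlers not to follow the link or pass ranking credit through it (the URL here is made up):

    <a href="http://example.com/some-related-post/" rel="nofollow">Some Related Post</a>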

Back-end or Codeish Errors

Some errors are beyond comprehension.

I have no idea what causes these or where they come from, but GWT claims that a lot of my pages are linking to things that simply don’t exist. Namely, some pages are supposedly linking to */function.include, but as near as I can tell, there are no links on the originating pages that point at */function.include. This would suggest a problem with the theme I’m using–maybe it has some code pointing to the wrong place and that’s throwing errors–but if that were the case, the errors should be happening on every single page, not just a few.

I went through and manually removed these links from Google’s index, but I’m skeptical of that solution. I’d rather know what is causing it and get it fixed, but this issue is so perplexing that I don’t know how. The good news is that actual users of the site aren’t attempting to follow these links because they don’t really exist on the page, so while the crawler may have trouble, the readers won’t.
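If you’d rather hunt down the source yourself, a recursive search through the theme and plugin files is a reasonable first step. A sketch, assuming shell access and a standard WordPress directory layout:

    # search theme and plugin code for the phantom path
    grep -rn "function.include" wp-content/themes/ wp-content/plugins/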

Subdirectories

This one is more because I’m spastic than anything else. For those of you who have followed this site for a while, you might recall that it has undergone significant changes in the last four years. I’ve gone from WordPress to Mambo+WordPress to Joomla!+WordPress and then back to WordPress exclusively. I have created a dozen different sub-sites, spin-off blogs, forums, wikis, etc., and have since deleted those blogs and come back to just having the one centralized site.

As such, I should have gone back and edited my robots.txt to exclude… well, pretty much everything. I’ve done that now, in addition to removing those links from Google’s index, so hopefully that will take care of it.
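For anyone in the same boat, the robots.txt syntax for walling off dead subdirectories is mercifully short. A sketch with made-up directory names standing in for my actual defunct sub-sites:

    # robots.txt: keep crawlers out of retired sub-sites
    User-agent: *
    Disallow: /forum/
    Disallow: /wiki/
    Disallow: /oldblog/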

Combining WordPress blogs

When I closed the blogs I mentioned above, I usually imported their posts into my primary site. This can cause a lot of headaches if you’re not careful, so be prepared to sort out the kinks. GWT’s ability to tell you where the errors are happening is great for going back and editing posts to remove or update links, but it’s definitely a manual process. There is simply no way around fixing this stuff: you’re going to have to set aside a block of time, sit down, and get it right.
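That said, if a lot of the broken links share a predictable pattern (an old subdomain, say), a database search-and-replace can knock out a chunk of the grunt work before you do the rest by hand. A sketch with hypothetical URLs; back up your database first, since this edits post content directly:

    -- rewrite links from a retired sub-blog to its new home
    UPDATE wp_posts
    SET post_content = REPLACE(post_content, 'http://stories.example.com/', 'http://example.com/stories/')
    WHERE post_content LIKE '%http://stories.example.com/%';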

Pagination

This one originally perplexed me, as I had pages and pages of errors due to pagination. This is where you’re browsing through the site and you’re on */page/108, and you can go to either */page/107 or */page/109. When I was typing this, it finally hit me what caused it: going from a single blog post on each page to 5 or 10. Say you have 500 posts at one per page: that’s 500 archive pages; at 10 per page there are only 50, so */page/51 through */page/500 all start returning 404s. I suddenly have fewer pages, but Google hasn’t caught up yet and is still trying to hit those old links. It’ll learn eventually.

So, do 404s hurt SEO?

That depends, as I alluded to above, on whether they are internal or external links that are Not Found. Search engines won’t penalize you if other sites link incorrectly to your content and those links can’t be followed. If they did penalize you for that, then spammers or trolls could create sites with massive amounts of broken links to any site they wanted and drop its pagerank immediately. This obviously wouldn’t be fair, and thankfully search engines don’t work that way. Regardless, it is best to have a custom 404 page to deal with external links that 404. The key is making sure that actual people (rather than bots or crawlers) find your site helpful and get to the information they need/want.
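In WordPress, a custom 404 page is just a 404.php template in your active theme; outside WordPress, a single line of Apache configuration will do. A minimal sketch of the latter, assuming Apache and a hypothetical page name:

    # .htaccess: serve a friendly page for anything not found
    ErrorDocument 404 /not-found.html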

Internal 404s will most certainly cause harm, and that’s where GWT can be of great benefit. By displaying not just the pages that can’t be found but also the pages that link to them, it helps you track down the broken links and fix them. As far as search engines are concerned, if your site can’t maintain internal link integrity, it isn’t trustworthy or helpful, so why would they send people your way? If Google started sending people to a bunch of broken sites that didn’t work well, people would stop trusting Google to provide good search results and they’d use a different search provider. That’s why the search engine checks to make sure sites are holding up and working well, and if a site isn’t, its pagerank will drop.

Maintaining internal link integrity is essential, not just for SEO, but also for keeping your readers happy. If someone clicks on a link on your site that goes to your site, they expect that link to work. When it doesn’t, no custom 404 page is going to make them happy. They might forgive one error, but beyond that they’re more likely to just surf away.

In Conclusion

While it would be ideal never to generate errors, chances are you’ll have at least a few if you’ve been around for a while and actually do something with your website. After 4+ years of active development and changes, and well over 300 blog posts in just the last year and a half, these things happen, so I’m going to try not to let them get me down. Use the Google Webmaster Tools to your benefit and get your errors sorted. The work will be worth it in the end, and both the crawlers and your users will be happier when they are able to breeze through without hitting brick walls.

And once you get them taken care of, make sure to check back with GWT regularly so the problem never gets out of hand. Once I get this all fixed, I’ll be logging into GWT at least once a week to make sure nothing new has cropped up. I am confident that my pagerank will benefit from the diligence, and it’ll make my readers happier to have a site that functions entirely as it should. For that happiness, it is well worth the extra work.

Cleaning house

Though I am an avid collector of site statistics, spending hours playing with Google Analytics on both my personal website and the ones I run for the university, I don’t really do much with the stats. They don’t drastically shape the way I run my sites, and though I find them intriguing, I don’t do much beyond becoming intrigued. And because I have Google Analytics running, it doesn’t often occur to me to log into Google’s Webmaster Tools to see how things are going on that end. I’ve got a sitemap in place and everything’s solid, but until recently I never thought much about the health of my site.

All 198 errors are 404s, for the love...

When I do look, however, I notice that things aren’t as wonderful as I’d like. GWT tells me that there are 198 errors, which sounds pretty serious, and I’ve read elsewhere that an abundance of errors like this can really hurt a site’s pagerank. What’s worse, that number (198) is pretty new; the number of errors seems to be growing. I suppose I shouldn’t be surprised, what with the recent overhaul of SilverPen and all, but a lot of pages aren’t being found. Even more frustrating, a lot of the pages that can’t be found are ones that never existed to begin with. I also find it odd and rather perturbing that Google treats the sitemap as more of a handy reference than as the set of instructions I had intended it to be, so it is indexing some subdirectories and sites I don’t use or particularly want indexed. I’m not entirely sure what Google’s smoking.

I decided to clean things up a bit to see if I could reduce the number of errors I have listed. I didn’t hit all 198 issues, but the ones causing the most errors and some of the more obvious ones have been fixed, often with either a 301 redirect or by editing the page to fix whatever was wrong with it. I’ll know in a few weeks whether this helped or not–unfortunately, you can’t just reset Google’s findings and tell it to crawl your site again.

Remember back when SilverPen was actually five blogs (Reader, Writer, Tech, Theology, Main)? That was only 1.5-2 years ago (though it definitely feels longer), but let me tell you, it was a bad idea. The whole thing, total failure, shouldn’t be done again. As I have been wont to say lately, “If I knew then what I know now…” *shakes fist and waves cane in the air*. I have been slowly working to reintegrate the blogs back into one, because in addition to those five there were also a later poetry blog, one for stories about the elven character Arias Stormsworn, and one for entries directly related to being newlyweds (written by both April and me). At long last, I have all those shut down, cleaned out, and brought under one roof.

They were really bugging me, not just because of the page errors, but because they represented security risks. Because I wasn’t using or signing into them, the software and plugins got further and further out of date, and that is always a hazard. What’s more, I had a creep of databases and RSS feeds that was getting out of hand. When I look at my DBs and I’m not sure which is which, that raises a flag for me. Now everything is pared down to where I know what is what and it’s all solid. In addition to all this, I also signed into April’s site and added some plugins to manage her SEO and tighten everything up, which should help with SilverPen Publishing’s overall health.

I have discovered the wonder of how WordPress manages RSS feeds, so I have created a couple of specific feeds for the items that need them. If I’d known this was possible years ago, everything might have been different. Still, I won’t let that get me down. It’s nice to have everything cleaned up and ship-shape.
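In case you haven’t stumbled onto this yourself: WordPress exposes a feed for nearly everything out of the box, so a “specific feed” is often just a URL away. With pretty permalinks enabled, and with hypothetical category and tag slugs standing in for mine:

    http://example.com/feed/                    all posts
    http://example.com/category/poetry/feed/    just one category
    http://example.com/tag/newlyweds/feed/      just one tag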