Gathering Statistics

As bloggers, we invest a lot of time in our websites: skinning them and making them aesthetically pleasing, generating helpful or interesting content, and working to make sure they are fully accessible so everyone can get to that content. So it’s no surprise that we want to know how many people are visiting, and perhaps when they are visiting, and certainly what it is they are looking for and reading when they come. There are a multitude of statistics programs you can employ to serve up this data, but not all statistics programs are equal. I’m going to compare and contrast some of the leading analytics programs to give you an idea of what you might try.

AWStats

This was the first web statistics program I used, and until recently it was my favourite. It displayed everything I might want to know on a single page, so I could quickly see how many visitors I had, where they came from, what search terms brought them to my site, and so on. To be honest, though, what I liked most about it was that the numbers it gave me were very high. Looking at my AWStats page was an ego boost, because it claimed to filter out bots and put them in a separate category. That assured me the high numbers I was seeing were real people and unique visitors. AWStats gave me the impression that it was accurate.

Sadly, as I discovered after installing Bad Behavior, that was not the case. This intriguing little bit of code keeps bots and scrapers from even hitting your page, and after I installed it, the traffic AWStats reported dropped by a factor of about 5. AWStats can't differentiate bots from people well enough, so it was over-inflating my traffic. All in all, it ended up being worthless.

WordPress Stats Plugin

When I installed this plugin, I got my first clue that my visitors weren't as numerous as AWStats had intimated. It showed much smaller numbers, but they also felt much more accurate. What's also nice about this plugin is that it fits right into your blog dashboard, but the data isn't stored in your database. Rather, it is hosted and stored by WordPress, which means WPStats has virtually no impact on the performance of your website. As an added bonus, it will work in any WP blog, regardless of your configuration. Since I use WordPress-MU, this is particularly nice, as a lot of plugins and programs don't work right with MU. The Stats Plugin is probably my most reliable statistics-gathering tool.

An additional bonus of the WordPress Stats Plugin is that it can be set to not log your own visits. If you, the administrator, are logged into your WP blog, the stats plugin won't record your traffic. With other statistics programs, I would find myself hesitant to visit or browse my site because I didn't want to falsely inflate my statistics, but such concerns are a thing of the past! Unfortunately, WPStats doesn't offer the level of detail others do, such as geographical location or bounce rate, so you might want to augment it with something more powerful.

Google Analyticator

This is another program that runs outside your site, so it doesn't slow your blog down. It also provides much of the same information the others do: geographic location, number of visits, bounce rate… but I had some problems getting it to work with MU. Like Woopra, which I'll talk about in a second, it doesn't handle the subdirectory structure I have at SilverPen Publishing. It will record data from the root site, but not its sub-sites. It does work with subdomains rather than subdirectories, but I'd have to do a lot of wrangling with Bluehost to get them to edit my Apache files for subdomains to work right, and it's just not worth the hassle.

Google Analytics is powerful and accurate, and I do use it for the front part of my site here, but it is not one I’ve come to love and rely on.

Woopra

This might be the coolest one I use, though like the Google Analyticator, it also refuses to handle subdirectories. Woopra is currently only in beta, and accounts are approved manually, so there's no guarantee you'll be able to get one. It is a full client you install on your computer that displays the data stored on their servers. A bit of code on your page allows their servers to monitor your traffic, so again, this has little impact on the performance of your website because the stats are collected and stored in their database, not yours.

What sets Woopra apart is that it is real-time. You can watch people visit your site, see where in the world they are coming from, see exactly what page they are reading, and see when they leave. Woopra also gives you the ability to tag your users so you can see names rather than just IPs. This functionality isn’t rock-solid in the beta yet, but it is certainly made easier with WordPress. If a visitor leaves a comment and puts their name to it (rather than leaving it anonymous), Woopra will pick that up and tag their IP accordingly so you can more easily understand your stats.

No other stats program I have seen shows you real-time information like Woopra, and it’s really fun to play with. If you can get an account, this is definitely one to check out.

Please remember that Woopra is still in beta, and development cycles will cause fluctuation in the performance of the program. There are a number of issues that still need to be resolved before Woopra leaves beta, but they’ve got a solid foundation and this program certainly has the potential to be one of the top analytics programs on the Web.

Feedburner Feeds

Rather than tracking your site, Feedburner provides stats for your RSS feeds, which are also quite important. I encourage all of my readers to get my content through RSS readers (such as Google Reader, which I use) because I have found it to be so much more convenient. However, with Feedburner I was able to discover that, thus far, very few of my readers use the feeds.

With this information, I can change how I display and advertise my content, rather than relying on a method that isn’t working as well as I had hoped. As a sidenote, because Feedburner focuses on feeds, it doesn’t have the subdirectory problems I encountered with other stats programs. But, because it is such a niche product (focusing on the feed rather than the site), it is also one I visit and rely on less. Perhaps once my RSS traffic gets higher, Feedburner will be more worthwhile for me.

Concluding Thoughts

Especially when starting out with a website, it is important to get all the help you can. Statistics programs can give you an idea of what’s working and what’s not, and perhaps show some pathways towards what will work better. They highlight your strengths and weaknesses, but don’t get too caught up in the numbers. Web statistics programs aren’t all-powerful, all-knowing pieces of software (as AWStats aptly demonstrates). Do what you enjoy, write what you want, and use the stats programs to tweak the site. Don’t let your web presence revolve around the stats; they’re not everything.

Support Open Formats

Much like the current campaigns for presidential nominees in the United States, I'm somewhat tired of hearing about this subject. It is old and tired, and an uphill battle that feels as if it will never end. Formats like Microsoft's Office Open XML (OOXML) are simply bad, kludgy, poorly designed… and in the case of OOXML, Microsoft's own products, namely their Office 2007 suite, don't or can't implement the spec correctly.

What's frustrating is that Microsoft has the money and the power granted through monopoly to make all of that irrelevant. NoOOXML.org has eight top-notch reasons why OOXML should be struck down, but the ISO (International Organization for Standardization) board approved the format anyway.

The bottom line is that governments should not encode, encrypt, archive, present, or distribute public documents in a proprietary format. Microsoft holds all the strings on their formats, and at any point they could pull the plug and say that no programs but their own can open their formats. At that point it would be illegal for anything other than a Microsoft program to open a Microsoft document. And then, should Microsoft go bankrupt and stop producing their software… and if their software then becomes so out-of-date it can no longer run on modern computers…

No empire lasts forever, folks, so by using Microsoft’s formats, we’re handing our documents to a company that will fade away and take our archives and records with them.

The solution is to use open standards, such as the Open Document Format (ODF), which was approved by ISO in 2006 and is fully implementable, open source, and actually works as its spec claims it should. The ISO's decision on OOXML is being appealed by a number of countries around the world that are unwilling to be bought off and would rather we have a proper, standardized format. If we cannot rely on a supposedly independent, unbiased body to produce standards, intercommunication between regions will become impossible. Without standards, we will eventually degenerate into Babel.

I urge you to sign both the petition above on NoOOXML's site and the Hague Declaration's petition. A part of me recognizes what most of us already know: that online petitions are pretty much worthless. But I still feel that it is important to put my name out there and to have something I can point at to say, "That's what I support." When I write to my members of Congress, I can point at these documents and say, "This is what I want you to do."

Educate yourselves about the issue, and speak accordingly.

Garbage in, garbage out

The adage is pretty widely known, but every once in a while the realization creeps up on me that my input is occasionally of the type that should be gently placed in the recycle bin and carried to the curb.

I now have Confluence bound to our Active Directory, and following that accomplishment I got it to automatically put users into the confluence-users group when they log in. I was pretty excited that it pulled all the usernames over and was syncing live with the LDAP directory, but I hadn't gotten groups figured out until a guy from Enterprise Systems (a different group in my department; I'm in User Support) dropped in. He's been helping me understand LDAP better, and I've been bouncing ideas off him. When I mentioned the groups issue (I wanted Confluence to pull Groups/Roles from our AD so we can manage them from our standard account management system), he asked how I had it set up and which container I was pointing Confluence at. Something he said sparked a realization, and I changed one small tag (two letters, in fact), which fixed Confluence. It's now pulling the groups I wanted it to pull.

I simply hadn’t given it the right parameters, and while I wish something could “Just Work,” that’s never actually the case. So now, both users and their groups are pulling from the AD. It’s not set up correctly on the AD for Confluence (I’m pulling the groups associated with our current “Common” folders, which are often used for shared documents, so it seemed like a good idea at the time), but it’s a fantastic proof of concept and it wouldn’t be hard to create a new locker with groups structured just for Confluence, should we decide to buy the software.
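For anyone trying the same thing, the two pieces that matter in an LDAP group lookup are the search base (the container you point at) and the search filter. The sketch below is a minimal illustration, not Confluence's own configuration; the post doesn't say which two letters were changed, so it doesn't try to reproduce that fix. The DNs and the `GROUP_SEARCH_BASE` value are hypothetical placeholders, and the filter assumes standard AD group objects that list members in their `member` attribute. Values inserted into a filter are escaped as RFC 4515 requires.

```python
# Sketch: build an LDAP search filter that finds the AD groups a user belongs to.
# All DNs below are hypothetical placeholders, not a real directory layout.


def escape_filter_value(value: str) -> str:
    """Escape characters that are special inside an LDAP filter value (RFC 4515)."""
    replacements = {
        "\\": r"\5c",  # a literal backslash
        "*": r"\2a",
        "(": r"\28",
        ")": r"\29",
        "\0": r"\00",  # NUL character
    }
    return "".join(replacements.get(ch, ch) for ch in value)


def group_filter_for_user(user_dn: str) -> str:
    """Return a filter matching group objects whose member attribute lists user_dn."""
    return f"(&(objectClass=group)(member={escape_filter_value(user_dn)}))"


# Hypothetical search base: the container the groups live in. If this points at
# the wrong container, the search simply returns no groups rather than an error.
GROUP_SEARCH_BASE = "OU=Groups,DC=example,DC=edu"

if __name__ == "__main__":
    user = "CN=Jane Doe,OU=Staff,DC=example,DC=edu"
    print(GROUP_SEARCH_BASE)
    print(group_filter_for_user(user))
```

The filter is the same whichever container you search; only the base changes, which is why a small edit to where Confluence is pointed can make the difference between no groups and the right ones.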

I’m just excited that it works. It has certainly been a frustrating process, but learning AD binding, how to search for and run queries against an LDAP server, and how to structure all this stuff has been a really valuable experience. I should probably spend some time tomorrow documenting how I got it to work so I have some point of reference for the next time this comes around (hopefully soon, if we buy Confluence).

My brain is mush

I've spent the day working on two things. I'm configuring Confluence as a wiki solution for the ERP, and I needed it to 1) run as a service, and 2) email comments and discussion to certain members as a tool for collaboration. The second requirement meant our server needed to function as both a web server and an email server.

I've set up email servers before, but I absolutely hate the SMTP functionality built into Microsoft Windows Server 2003. Therefore, the last time I did this, I used hMailServer. Unfortunately, my documentation was incomplete and failed to note that, when authenticating to hMailServer, your username is the full email address (so username@hmailserver.edu). I was using just the username; everything else was configured correctly, but it wasn't working, and it wasn't giving me any errors. A simple mistake, but it took me the better part of the day to figure it out.
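A quick way to check the credentials outside the application is a small test script. This is a minimal sketch using Python's standard `smtplib`, assuming a server that requires SMTP authentication and, per the note above, expects the full address as the login name. The host, port, and addresses are hypothetical, and TLS is not shown; if your server supports STARTTLS, call `server.starttls()` before `login()`.

```python
import smtplib
from email.message import EmailMessage


def smtp_login_name(local_part: str, domain: str) -> str:
    """Build the login name as the full email address, not just the short username."""
    return f"{local_part}@{domain}"


def send_notification(host: str, port: int, local_part: str, domain: str,
                      password: str, to_addr: str, subject: str, body: str) -> None:
    """Send one message through an authenticating SMTP server.

    Host, port, and credentials are hypothetical; substitute your own.
    """
    sender = smtp_login_name(local_part, domain)
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = to_addr
    msg["Subject"] = subject
    msg.set_content(body)

    with smtplib.SMTP(host, port) as server:
        # Authenticate as "user@domain". A rejected login raises
        # smtplib.SMTPAuthenticationError instead of failing silently.
        server.login(sender, password)
        server.send_message(msg)
```

Unlike a silent failure in the application, a bad login here surfaces as an `SMTPAuthenticationError`, which makes a username problem like this one much quicker to spot.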

As for running Confluence as a service, our installation of Windows was missing a .dll that Confluence needed for that to work properly. After copying the .dll from the Confluence directory to the system32 folder, everything worked perfectly.

Two good milestones reached in one day. I’m pretty proud of it, and tomorrow I’ll start looking into LDAP integration. For now, though, I’m done.