Log File Analysis As An SEO Tool
Not many site owners know or understand what log file analysis is, but I think every webmaster should be familiar with it, because this exercise uncovers so much intelligence about how your site is being crawled, on both a micro and a macro level.
I'm going to cover what log file analysis is, why it's important, and what it can show you: which search engine bots are crawling your site and, just as importantly, which pages are being crawled and indexed.
Let's dive in!
What's A Log File Anyways?
First, you need to know what a log file is. It's a line-by-line text record of every time a web crawler comes to your web hosting server and views, or crawls, your pages. For each request to a URL on your domain, it logs the date, time, how many bytes were downloaded, how long the request took, and which web crawler came knocking.
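If you're curious what one of those raw lines looks like, here's a quick Python sketch that picks apart a made-up log line in the widely used Apache "combined" format. Your host's log layout may differ slightly, and the IP, URL, and user agent here are purely illustrative:

```python
import re

# A hypothetical raw access-log line (Apache "combined" format, made up for illustration)
line = ('66.249.66.1 - - [12/Aug/2019:06:25:24 +0000] '
        '"GET /blog/seo-tips HTTP/1.1" 200 5316 "-" '
        '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"')

# One field at a time: IP, timestamp, request, status code, bytes, referer, user agent
pattern = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<bytes>\S+) "[^"]*" "(?P<agent>[^"]*)"'
)

m = pattern.match(line)
print(m.group("time"))    # when the crawler visited
print(m.group("url"))     # which URL it fetched
print(m.group("status"))  # the response code (200, 301, 404...)
print(m.group("agent"))   # which crawler came knocking
```

Every question we answer later (which bots, which URLs, which response codes) comes from tallying fields like these across thousands of lines, which is exactly the grunt work an analyzer tool does for you.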
It's boring as hell and hard to read in what's called the "raw" format. Hence, we have log analyzers that compile our log files and pretty them up so we can read them more easily. Hang tight, screenshots are coming.
Why Do I Want To Analyze My Log Files?
Because the analysis tells you the following:
- Which URLs are being crawled - hopefully, your evergreen and/or most important content is being regularly visited by search engine crawlers
- Any 4XX URLs that are hogging your domain's crawl budget - more on this in a minute! Dead pages should NOT.BE.CRAWLED. That's why you do log file analysis from time to time!
- Which search engines are knocking on your web server door - you want Google mobile, Google desktop and other popular search engines crawling and indexing your URLs. Analyzing your logs is the only way you're going to find out!
- If you've re-launched your site - it's important to make sure your site is getting search engine crawler attention as soon as you re-launch. What if you're an ecommerce company? We're talking money!
- If your traffic has dropped dramatically - it could be your site's been penalized by a search engine like Google. You can prove it by analyzing your log files to see if any Google crawlers have come to visit your site.
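That last point - proving which crawlers have (or haven't) come to visit - can be sketched in a few lines of Python. This is just an illustration with made-up log lines and a hand-picked list of bot name tokens; it's nowhere near as thorough as a real analyzer, which verifies bots rather than trusting the user-agent string:

```python
from collections import Counter

# Illustrative list of user-agent tokens for well-known crawlers (an assumption,
# not exhaustive; real tools also verify the bot's IP address)
BOT_TOKENS = ["Googlebot", "bingbot", "DuckDuckBot", "YandexBot"]

def count_bot_hits(log_lines):
    """Tally how many log lines each known crawler accounts for."""
    hits = Counter()
    for line in log_lines:
        for token in BOT_TOKENS:
            if token in line:
                hits[token] += 1
                break  # count each line once
    return hits

# Made-up sample lines, abbreviated for readability
sample = [
    '... "GET /home HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; ...)"',
    '... "GET /blog HTTP/1.1" 200 902 "-" "Mozilla/5.0 (compatible; bingbot/2.0; ...)"',
]
print(count_bot_hits(sample))
```

If Googlebot's count is zero over a long window, that's a red flag worth investigating.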
Let's veer off for a moment and talk about crawl budget - what is it?
Google tells us that their crawlers do not crawl every single freaking page on your site. What if your domain has 12K pages? Are they all that important? Google also says this is normal, so don't panic. However, by using your robots.txt file and XML sitemaps, you can exert control over which pages have priority in Google's crawl budget.
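As a purely hypothetical example, a robots.txt along these lines tells well-behaved crawlers to skip low-value pages so more of your crawl budget goes to the content you care about (the paths and domain here are made up):

```text
User-agent: *
Disallow: /cart/
Disallow: /search/

Sitemap: https://www.example.com/sitemap.xml
```

Pair this with an XML sitemap that lists your priority URLs, and you've nudged crawlers toward the pages that actually matter.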
Where The Hell Do I Get My Log Files?
If you have access to your web hosting server, you can go into your file manager and find a folder typically called "Logs." You can usually download a month at a time or, painfully, one day at a time. The files come down as zip archives.
If this seems too daunting, have your friendly neighborhood web developer who maintains your site do this for you. Have them email you the files, or share them on a common drive. You'll need these!
I recommend getting at least 90 days if you've not done this before. Of course, if you feel really perky, get more than that, but 90 days is a good start.
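If you'd rather script the unpacking than click through archives one by one, here's a minimal Python sketch, assuming your host delivers one gzipped access log per day (the file-name pattern is an assumption; adjust it to whatever your host actually uses):

```python
import glob
import gzip

def combine_logs(pattern, out_path):
    """Concatenate gzipped daily logs (in date order, assuming sortable
    file names) into one plain-text file an analyzer can ingest."""
    with open(out_path, "w", encoding="utf-8") as out:
        for path in sorted(glob.glob(pattern)):
            with gzip.open(path, "rt", encoding="utf-8", errors="replace") as f:
                out.writelines(f)

# Hypothetical file layout: logs/access_log-20190801.gz, logs/access_log-20190802.gz, ...
combine_logs("logs/access_log-*.gz", "combined_access.log")
```

Ninety days of logs combined into one file is a lot easier to hand to a tool (or a script) than ninety separate downloads.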
It's Time For...Log File Analyzer Software!!
So now you need a tool to unpack these zip files and do some heavy lifting for you. The most popular is Screaming Frog Log File Analyser. It's a free download, but the free version only lets you analyze 1,000 log events.
With Screaming Frog, all you need to do is import the zip files, and it'll do the rest.
So here is what a log file analysis looks like:
In the upper left, you get an overview of the stats. The upper right shows response codes, such as 3XX and 4XX. What you don't want to see is a bunch of 4XX codes; those waste crawl budget. The lower left shows events - was the crawler successful in fetching the URL? The lower right shows the URLs crawled.
So here you see the response codes that were found in this four-month analysis period. Again, we don't want to see 4XX codes if at all possible.
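If you ever want to sanity-check what the analyzer shows you, a few lines of Python can bucket response codes straight from the raw lines. The sample lines below are made up, and the regex assumes the status code follows the quoted request, as in Apache-style logs:

```python
import re
from collections import Counter

# In Apache-style logs, the 3-digit status code follows the closing quote
# of the request line (an assumption about your log format)
STATUS_RE = re.compile(r'" (\d{3}) ')

def status_buckets(log_lines):
    """Group each line's status code into 2XX/3XX/4XX/5XX buckets."""
    buckets = Counter()
    for line in log_lines:
        m = STATUS_RE.search(line)
        if m:
            buckets[m.group(1)[0] + "XX"] += 1
    return buckets

# Illustrative sample lines
sample = [
    '1.2.3.4 - - [01/Aug/2019:00:00:01 +0000] "GET /a HTTP/1.1" 200 100 "-" "Googlebot"',
    '1.2.3.4 - - [01/Aug/2019:00:00:02 +0000] "GET /old HTTP/1.1" 404 0 "-" "Googlebot"',
]
print(status_buckets(sample))
```

A big 4XX bucket means crawlers are burning budget on dead pages, which is exactly what this exercise is meant to catch.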
Here are search engine crawlers that visited URLs. Isn't this much easier to see than trying to parse individual lines?
And what you should find very useful is this table listing the URLs that were crawled:
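You can approximate that URL table yourself with a quick tally of which URLs show up most often in the logs. Again, the sample lines here are made up, and the regex assumes an Apache-style quoted request:

```python
import re
from collections import Counter

# Pull the requested path out of the quoted request line (format assumption)
URL_RE = re.compile(r'"(?:GET|POST|HEAD) (\S+) ')

def top_urls(log_lines, n=10):
    """Return the n most-crawled URLs with their hit counts."""
    counts = Counter()
    for line in log_lines:
        m = URL_RE.search(line)
        if m:
            counts[m.group(1)] += 1
    return counts.most_common(n)

# Illustrative sample lines, abbreviated
sample = [
    '... "GET /blog/seo-tips HTTP/1.1" 200 ...',
    '... "GET /blog/seo-tips HTTP/1.1" 200 ...',
    '... "GET /about HTTP/1.1" 200 ...',
]
print(top_urls(sample))  # → [('/blog/seo-tips', 2), ('/about', 1)]
```

If your evergreen money pages aren't near the top of this list, that's your cue to revisit your internal linking, sitemap, and robots.txt.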
There's more to log file analyzer software, but I hope this basic overview shows you how useful the tool can be.
I recommend doing a log analysis exercise three to four times per year, or whenever a significant event occurs, such as a template redesign, a move to a new shopping cart platform, a switch from non-secure (HTTP) to secure (HTTPS), etc.
It's amazing what you can learn just from seeing which crawlers visit, how often they come, and what content they're analyzing for indexing and ranking.
If you'd like to know more about crawl budget, XML sitemaps, robots.txt files and troubleshooting website issues, take an advanced SEO training class with Invenio to gain confidence in better understanding why your site may not be ranking as well as it could.
Until we meet again, stay safely between the ditches!
All the very best to you,
All screenshots courtesy of author August 2019