Information Architecture & SEO
Information architecture (IA) is the discipline of organizing a website's content so users can find what they need quickly. There are multiple layers to implementing IA, but what I want to concentrate on today is how a lack of IA can hurt user experience (UX) in ways that affect your search engine optimization efforts.
I'll tell this in a story about a client I currently work for. It's a fairly robust e-commerce business that does pretty well, has been around for a lot of years and is just now starting to get a grip on what they need for SEO.
I'm part of a team that is tackling multiple issues, and last month, I was assigned to review, audit and make recommendations to clean up multiple XML sitemaps for the site.
So, here's what happens when nobody is in charge of site architecture, and everyone just kind of creates their own file structure. The sitemaps get bloated with duplicate URLs in different file folders. The impact is multi-fold:
- Crawl budget is severely wasted with the same URLs/content showing up in different folders and getting crawled and crawled and crawled. I'm talking thousands, not a few.
- New content isn't even seeing the light of day in terms of crawling because of the issue above.
- Many of these URLs contain extremely outdated content that is no longer relevant. This makes for a poor user experience, because who wants to read web pages that are no longer relevant?
No crawling means no indexing, no ranking, no visibility in search results.
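If you want a quick first look at this kind of bloat, here's a minimal sketch that reads a sitemap and flags pages whose final path segment shows up under more than one folder. The sample sitemap, the example.com URLs and the function names are my own placeholders, and matching on the last path segment is only a rough stand-in for "same content". Treat the output as a list of candidates to review by hand.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from urllib.parse import urlparse

# Namespace used by standard XML sitemaps (sitemaps.org protocol).
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

# Hypothetical sample data, for illustration only.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/shop/blue-widget</loc></url>
  <url><loc>https://example.com/archive/2015/blue-widget</loc></url>
  <url><loc>https://example.com/shop/red-widget</loc></url>
</urlset>"""


def sitemap_urls(xml_text):
    """Return every <loc> value found in a sitemap document."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.iter(SITEMAP_NS + "loc") if loc.text]


def duplicates_across_folders(urls):
    """Group URLs by final path segment; keep segments seen in 2+ folders."""
    folders_by_slug = defaultdict(set)
    for url in urls:
        path = urlparse(url).path.rstrip("/")
        folder, _, slug = path.rpartition("/")
        if slug:
            folders_by_slug[slug].add(folder or "/")
    return {
        slug: sorted(folders)
        for slug, folders in folders_by_slug.items()
        if len(folders) > 1
    }


if __name__ == "__main__":
    for slug, folders in duplicates_across_folders(sitemap_urls(SAMPLE)).items():
        print(slug, "appears under", folders)
```

On a real site you would feed in the text of each sitemap file instead of the sample string.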
The other problem is that new content may be buried several clicks down, and if you know human nature, then you realize most people aren't going to dig deep into your site, right?
Can you see the issue now? I bet you can!
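To put a number on "buried several clicks down," you can measure click depth: the fewest clicks it takes to get from the home page to a given page. The sketch below assumes you already have an internal link map (most crawlers can export one). The tiny link map, the paths and the three-click threshold are made up for illustration.

```python
from collections import deque


def click_depths(links, home):
    """Breadth-first search: fewest clicks needed to reach each page from home."""
    depths = {home: 0}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depths:  # first visit is the shortest route
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths


# Hypothetical internal link map: page -> pages it links to.
LINKS = {
    "/": ["/shop", "/about"],
    "/shop": ["/shop/widgets"],
    "/shop/widgets": ["/shop/widgets/new-blue-widget"],
}

if __name__ == "__main__":
    depths = click_depths(LINKS, "/")
    # Assumed threshold: anything three or more clicks deep gets flagged.
    for page in sorted(p for p, d in depths.items() if d >= 3):
        print(page, "is", depths[page], "clicks from the home page")
```

Pages that never show up in the result aren't reachable from the home page at all, which is worth checking too.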
What's worse is that many of the old, old URLs are 301 redirects that have been in place for more than four years. In general, you only need to keep 301 redirects in place for about a year; after that, you can remove the old pages, return a 410 (Gone) status for them, and remove them from Google's index via the URL removal request function in Google Search Console.
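If you keep a record of when each redirect went in, finding the ones past that one-year mark is simple date arithmetic. This is only a sketch: the redirect list, the dates and the 365-day cutoff are assumptions for illustration, and you'd still want to check whether an old URL gets traffic or has inbound links before retiring it.

```python
from datetime import date, timedelta


def retire_candidates(redirects, today, max_age_days=365):
    """Return old URLs whose redirect was added more than max_age_days ago."""
    cutoff = today - timedelta(days=max_age_days)
    return sorted(old_url for old_url, added in redirects.items() if added < cutoff)


# Hypothetical records: old URL -> date the 301 redirect was put in place.
REDIRECTS = {
    "/old-catalog/widgets": date(2019, 3, 1),
    "/promo/spring-sale": date(2023, 11, 15),
}

if __name__ == "__main__":
    for url in retire_candidates(REDIRECTS, today=date(2024, 1, 10)):
        print("Review for removal:", url)
```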
With IA, the site is reviewed, a content inventory is done, and the site is organized in whatever way makes sense. Some of the ways to organize a site's pages are:
- By product
- By service (function)
- By location(s)
- By departments
- By staff
And so on. You get the picture, right?
So if someone were the master architect for the site, this person could keep tabs on content through the months and years, keep page counts at a sane level, recommend removing old URLs that don't matter anymore, and make sure that newly published, updated, fresh information is crawled in a timely manner. That leads to more opportunities to rank higher in search results.
If this is done, more visitors come to the site, go deeper into the web pages, stay on the site longer and find what they are looking for in fewer clicks. These are critical engagement metrics that matter!
If you find your site is in this situation, here are some steps you can take to tame the beast:
- Do a thorough site crawl - I use Screaming Frog
- Download all the 404 error URLs and 3XX redirects
- Go through all of these URLs, and see if there are candidates for removal
- Access your XML sitemap - you can usually do this by putting your domain in the browser bar and adding "/sitemap_index.xml" after the domain URL.
Print out the XML sitemap and compare its URLs against the crawl results. If you see a large number of URLs that no longer need to be in the sitemap, take them out of the sitemap, have the server return a 410 (Gone) status for them, and submit them for removal in Google Search Console (Google Index ---> Remove URLs).
Be sure to add new URLs if they aren't in the XML sitemap already. If you're not comfortable doing this, get your friendly neighborhood web dev person to do this for you.
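Comparing a sitemap against a crawl by eye gets old fast once you're into thousands of URLs, so here's a rough sketch of the same comparison in code. It assumes you've already pulled the sitemap URLs into a list and the crawl results into a URL-to-status-code mapping. The function names and sample URLs are my own, and it doesn't read any particular tool's export format.

```python
def is_stale(status):
    """Treat missing (404), gone (410) and redirected (3XX) URLs as stale."""
    return status in (404, 410) or 300 <= status < 400


def plan_sitemap_changes(sitemap, crawl_status):
    """Return (urls_to_remove, urls_to_add) for the sitemap."""
    in_sitemap = set(sitemap)
    to_remove = sorted(
        url for url in in_sitemap
        if url in crawl_status and is_stale(crawl_status[url])
    )
    to_add = sorted(
        url for url, status in crawl_status.items()
        if status == 200 and url not in in_sitemap
    )
    return to_remove, to_add


# Hypothetical data for illustration.
SITEMAP = [
    "https://example.com/shop/blue-widget",
    "https://example.com/archive/2015/blue-widget",
    "https://example.com/old-promo",
]
CRAWL = {
    "https://example.com/shop/blue-widget": 200,
    "https://example.com/archive/2015/blue-widget": 301,
    "https://example.com/old-promo": 404,
    "https://example.com/shop/new-red-widget": 200,
}

if __name__ == "__main__":
    remove, add = plan_sitemap_changes(SITEMAP, CRAWL)
    print("Remove from sitemap:", remove)
    print("Add to sitemap:", add)
```

Sitemap URLs that the crawl never reached are left alone here, since a missing status isn't proof that the page is gone.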
- Find out exactly which pages are being crawled by the search engine spiders:
- Use a log analyzer tool - I use the Screaming Frog Log Analyser.
- Download your raw log files from your web hosting server. If you're not comfortable with this, have your friendly neighborhood web dev person do this for you. Download at least 30 days' worth, to give you a good inventory.
- Run those log files through the Screaming Frog tool.
Review the ones that are being crawled. You might be surprised - unpleasantly so - at what your site's crawl budget is being used for. This is your chance to use it wisely! Read more about analyzing and maximizing your crawl budget here.
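If you'd like a feel for what a log analyzer is doing under the hood, here's a small sketch that counts crawler requests per URL. It assumes your server writes the common "combined" access log format and that matching "Googlebot" in the user agent string is good enough for a first pass. User agents can be faked, and this sketch doesn't verify them. The sample log lines are invented.

```python
import re
from collections import Counter

# Pattern for the "combined" access log format (assumed; check your server config).
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def crawler_hits(lines, bot_token="Googlebot"):
    """Count requests per path where the user agent mentions bot_token."""
    hits = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match and bot_token.lower() in match.group("agent").lower():
            hits[match.group("path")] += 1
    return hits


GOOGLEBOT = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
# Hypothetical log lines for illustration.
SAMPLE_LOG = [
    f'66.249.66.1 - - [10/Oct/2023:13:55:36 +0000] "GET /archive/2015/blue-widget HTTP/1.1" 200 5120 "-" "{GOOGLEBOT}"',
    f'66.249.66.1 - - [11/Oct/2023:09:12:01 +0000] "GET /archive/2015/blue-widget HTTP/1.1" 200 5120 "-" "{GOOGLEBOT}"',
    f'66.249.66.1 - - [11/Oct/2023:09:12:05 +0000] "GET /shop/new-blue-widget HTTP/1.1" 200 7300 "-" "{GOOGLEBOT}"',
    '203.0.113.9 - - [11/Oct/2023:10:00:00 +0000] "GET /shop/new-blue-widget HTTP/1.1" 200 7300 "-" "Mozilla/5.0 (Windows NT 10.0)"',
]

if __name__ == "__main__":
    for path, count in crawler_hits(SAMPLE_LOG).most_common():
        print(count, path)
```

In practice you'd pass in the lines of your downloaded log files rather than the sample list, then sort by count to see where the crawl budget is really going.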
If you do these two exercises, I'm betting you'll see opportunities to develop and implement a better IA structure for your site. And if so, good for you - you'll improve user experience, use your crawl budget better and improve key performance indicators for your site!
Good information architecture requires constant application, review and diligence to adhere to the organizational structure for the site. But it's worth it.
BTW, I have another client - a higher education website - and we've discovered approximately 4000 URLs that could be removed. We haven't done a log file analysis, but I'm guessing we would find some nasty surprises about what's being crawled by the search engine spiders.
Want to know more about IA, crawl budget and XML sitemaps? Take an advanced SEO training class with Invenio, and you'll be an expert at tightening up your site for better indexing and ranking.
Until we meet again, stay safely between the ditches!
All the very best to you,
Messy desk image courtesy of flickr.com