By Mary Wilson
24 Apr 2019

Stories of the Strange: Site Auditing Oddities

Advanced SEO     Technical SEO


A technically sound website is the basis for digital marketing success. Consider your website as if it were a storefront: just as a brick-and-mortar store needs investment to stay ready for customers, your site does as well.

As a technical SEO team within an SEO agency, we review a lot of websites. Our goal is to uncover issues that could keep our clients from getting the most out of their investment in SEO. While we specialize in off-page SEO (specifically, link building), “optimization” applies to the entire process of improving a site’s performance in search. That includes the strategy itself, and strategically, it typically makes sense to address on-site issues before looking for ways to optimize performance beyond the domain.

When auditing a site, there is no one-size-fits-all checklist. Every site is different: each has been touched by different developers, hosted on different platforms, and has had its own journey as it’s grown. And so, each audit follows a process unique to the site.

Some issues come up often: unsecured sites, improper indexing, a missing XML sitemap or robots.txt. Other times, we’ll have suggestions that wouldn’t necessarily be considered technical but could still improve performance: keyword targeting opportunities, content the site may need, or internal linking optimization.

But sometimes, we discover things that make us scratch our heads. These are a few of the odd site issues we’ve encountered over the last year.

Caching Conundrums

While this mysterious error has since been resolved, the great caching conundrum threw me for a loop when I first ran into it.

Viewing Google’s cached version of a webpage allows you to access previous versions of recently-updated pages, or view pages that are not currently responding.

Doing so is typically simple; to look at the cached version of a page, you can either:

  • Type the “cache:” operator (with a colon) before the URL in the address bar:

cache:examplesite.com

  • Find the site in the SERPs and click the drop-down “Cached” option within the result.

However, over the course of the summer of 2018, both of these options started returning a 404 Not Found error. We saw 404s on big sites like Moz, but not on informational authority sites like Wikipedia. We began tracking organic traffic, indexation, and keywords in a state of anticipation and fear of what might come.

While we were exploring the issue internally, a workaround appeared!

If you reached a 404’d cache page from the SERP drop-down, you could manually adjust the URL, and a cache would very likely appear.

It was as simple as adding or deleting a “www.”

Example of the Caching Error in Google

From the SERP drop-down, Page One Power’s 404-cache is:

https://webcache.googleusercontent.com/search?q=cache:zSgWJuQ6SiAJ:https://www.pageonepower.com/+&cd=1&hl=en&ct=clnk&gl=us

However, if you removed the “www.” like so:

https://webcache.googleusercontent.com/search?q=cache:zSgWJuQ6SiAJ:https://pageonepower.com/+&cd=1&hl=en&ct=clnk&gl=us

You’d see a live cache!

This trick worked both ways:

The SERP drop-down 404-cache for Moz’s website is:

https://webcache.googleusercontent.com/search?q=cache:0mHt2cKI_FIJ:https://moz.com/+&cd=1&hl=en&ct=clnk&gl=us

But, when you added a “www.” like so:

https://webcache.googleusercontent.com/search?q=cache:0mHt2cKI_FIJ:https://www.moz.com/+&cd=1&hl=en&ct=clnk&gl=us

You’d see a live cached version! How strange is that?
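If you want to check both variants quickly, a short script can generate the two cache-lookup URLs for you. This is only a rough sketch, not anything Google documents: the helper name is made up, and the URL format is a simplified version of the examples above (without the hash and tracking parameters). Pasting the results into a browser is the most reliable way to check them, since Google may block or rate-limit automated requests.

# Rough sketch: build cache-lookup URLs for both the "www" and bare-domain
# form of a page's host. Names and the exact query format are illustrative.
from urllib.parse import urlparse

def cache_url_variants(page_url: str) -> list[str]:
    """Return cache-lookup URLs for both the www and non-www form of the host."""
    parsed = urlparse(page_url)
    host = parsed.netloc
    bare_host = host[4:] if host.startswith("www.") else host
    hosts = [bare_host, f"www.{bare_host}"]
    return [
        f"https://webcache.googleusercontent.com/search?q=cache:{parsed.scheme}://{h}{parsed.path or '/'}"
        for h in hosts
    ]

# Print both variants so you can try each one in a browser.
for variant in cache_url_variants("https://www.pageonepower.com/"):
    print(variant)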

And perhaps the most mystifying thing about it? Google never mentioned it. The peculiarity was never addressed or acknowledged as an issue, despite plenty of discussion among SEO professionals. And then one day, it was all fixed again.

Later, it was determined that the most likely cause of the issue was the recent switch to mobile-first indexing. This had likely triggered an unexpected chain reaction which happened to impact caching on desktop.

The lesson learned? Sometimes we just really don’t know what’s going on behind the SERPs, but SEOs will do their best to figure it out — and often, they’ll find a solution.

Exploring Hidden Elements

We love Google Search Console’s Fetch and Render tool. It’s a great way to see what Googlebot picks up on your site and what it thinks users are seeing. But using this tool can turn into an adventure when content is unexpectedly hidden!

Pleasing webpage design and UX can sometimes seem to be at odds with SEO best practices. A well-designed site that utilizes JavaScript might look great to users, but hiding content can have an impact on your rankings. The jury is still out on exactly how hiding your content affects the way Google assesses those pages, but based on what we’ve been told by Google representatives, there is good reason to believe that hidden content carries less weight in rankings. With that in mind, it’s important to ensure any content essential to a page is placed somewhere highly visible on your site.

Sometimes, however, content on your site may be hidden unexpectedly and unintentionally. There are common reasons content ends up hidden: it may be served via JavaScript or an iframe, or blocked by a robots directive. An experienced SEO can often spot the symptoms of hidden content before the tools lay it bare.

Luckily, the Fetch and Render tool will give you a couple of hints about what content it can’t see.

Fetch and Render is the “gold standard” for seeing what Googlebot sees; taking a peek can send you down a rabbit hole of hidden elements.

One caveat: not all CSS- or JavaScript-delivered content is treated the same by Googlebot, even if it can see all of it.
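As a quick supplement to Fetch and Render, you can also check whether a key phrase appears in the raw HTML before any JavaScript runs. This is just an assumed, simplified check (the requests library doesn’t execute JavaScript, and Googlebot’s rendering is far more sophisticated), but it can flag content that only shows up after client-side scripts fire:

# Minimal, assumed check (not a Google tool): fetch the raw HTML of a page and
# see whether a phrase you expect to rank for is present before any JavaScript
# runs. A phrase missing here but visible in the browser is likely injected
# client-side.
import requests

def phrase_in_raw_html(url: str, phrase: str) -> bool:
    """Return True if `phrase` appears in the server-rendered HTML of `url`."""
    response = requests.get(url, timeout=10, headers={"User-Agent": "site-audit-check"})
    response.raise_for_status()
    return phrase.lower() in response.text.lower()

if phrase_in_raw_html("https://www.example.com/", "example domain"):
    print("Phrase found in the raw HTML.")
else:
    print("Phrase not in the raw HTML; it may be added by JavaScript.")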

Disobeyed Directives

There are a few ways to tell Googlebot how to treat your site. The most common are a robots.txt file at the root of your domain or a meta tag in the <head> of your page. If your site has them, they’re easy to find.

Examples of How to Direct Googlebot

yoursiteurl.com/robots.txt is where crawlers go first, and where you can tell those robots what to do on your site.

There are also page-specific directions like:

<meta name="robots" content="index, follow">

Placed in the <head> of a page, this asks any robots crawling the page to index it and follow the links on it.
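If you want to sanity-check what a robots.txt file allows, Python’s standard library ships with a parser for it. The sketch below is purely illustrative (the second URL is a made-up example path), and it only tells you what the file says, not what crawlers will actually do with it:

# Small sketch using the standard library to check what a robots.txt file
# allows for a given user agent. It reads the directives only; whether a
# crawler obeys them is another matter.
from urllib import robotparser

parser = robotparser.RobotFileParser()
parser.set_url("https://www.pageonepower.com/robots.txt")
parser.read()

for url in [
    "https://www.pageonepower.com/",
    "https://www.pageonepower.com/some-private-page",  # hypothetical example path
]:
    allowed = parser.can_fetch("Googlebot", url)
    print(f"{url} -> {'allowed' if allowed else 'disallowed'} for Googlebot")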

But every once in a while, we see the Googlebot ignoring the directions given to it.

While it’s definitely a mystery, and initially causes alarm, after a deep dive we sometimes find that the blocked page is incredibly useful to users — and perhaps the roadblocks in front of Google should be changed.

Evaluating user data is a great way to spot especially useful pages that should be highlighted in your internal link structure, and re-checked to make sure they aren’t being hidden by your directions to Google.

When we do website audits, there are always areas that send us on a whirlwind of discovery.

Every audit is an adventure, and each website is as unique as the organization it represents. Getting to the bottom of a technical issue can have a huge impact on the success of a site, so while our hope is that we’ll discover our clients’ sites are already technically optimized, it’s always exciting to find ways to help them improve as well. Sometimes, we discover oddities like these along the way.

These three are just a few of the oddities that took us by surprise in 2018. Hopefully, there are many more interesting times to come!

Mary Wilson

Mary Wilson is a Technical SEO Specialist at Page One Power. She loves lazy adventures, walks through the woods, and endless learning. She's the 'Elle Woods' of SEO and is excited to see people reading a bio. Follow her on LinkedIn.