We’ve been in the SEO space long enough to know that it's links that really move the needle, but the site also needs to be technically ready to capitalize on them. Over time, we’ve added a robust tech team to our quiver of services because it’s so important that websites are ready to rank once link velocity starts rolling in.
What we’ve found is that most technical SEO audits follow the same script. Crawlability, indexing, site speed, and Core Web Vitals. Maybe a quick pass at redirect chains if you're feeling thorough. Check the boxes, export the report, send it to the dev team, wait six months for nothing to get fixed.
Now AI gives us a whole other reason to push technical SEO. The sites showing up in AI Overviews aren't just technically clean; they're technically sound for AI search. Here's the difference. “Clean” means what it has always meant: no crawl errors, fast load times, and appropriate technical markup. The newer idea is “legible”: the machines reading your content can understand what it is, who made it, and whether it can be cited with confidence.
Schema markup is what bridges those two things. And most audits treat it as an afterthought, if they address it at all.
This checklist fixes that. Work through it section by section. Schema gets its own chapter because in 2026, it deserves one.
Section 1: Crawlability and Indexing
This is still the foundation. Nothing else matters if Googlebot and the growing fleet of AI crawlers can't get to your pages.
robots.txt
Let's start with the old standard: robots.txt. Pull up your robots.txt and read it like it's a code of conduct document. Most robots.txt files haven't been touched in years, often since before AI crawlers existed. If you skip the basics here, you may be inadvertently blocking bots that could be citing your content, or leaving the door wide open to training bots that consume your content without sending a single visitor back.
Check for:
- Are Googlebot and Bingbot given full access to your important pages?
- Are low-value URLs (faceted navigation, internal search, parameter URLs) blocked appropriately?
- Have you made an intentional decision about AI crawlers? GPTBot, ClaudeBot, and PerplexityBot each have a separate user-agent string. Blocking training bots while allowing retrieval bots is now a legitimate strategic call, not paranoia.
- Is your sitemap referenced at the bottom?
If your robots.txt looks identical to what you set up three years ago, update it this week.
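To make this concrete, here's a minimal sketch of what a deliberately configured robots.txt might look like. The user-agent tokens are the crawlers' published names; the blocked paths and sitemap URL are placeholders to adapt to your own site:

```
# Full access for the major search crawlers
User-agent: Googlebot
Allow: /

User-agent: Bingbot
Allow: /

# Deliberate call: block a training crawler entirely (only if that's your strategy)
User-agent: GPTBot
Disallow: /

# Keep low-value URLs out of everyone's crawl path (hypothetical paths)
User-agent: *
Disallow: /search/
Disallow: /*?sort=

Sitemap: https://www.example.com/sitemap.xml
```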
XML Sitemaps
- Does your sitemap contain only indexable, canonical URLs?
- Are 404s, redirects, and noindexed pages excluded?
- Is it segmented by content type for large sites (articles, products, categories)?
- Is it auto-updating when new content is published? Content freshness is a real signal now -- for both crawlers and LLMs.
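For reference, a single sitemap entry with the freshness signal in place looks like this; the URL and date are placeholders:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://www.example.com/blog/technical-seo-audit-checklist</loc>
    <lastmod>2026-02-10</lastmod>
  </url>
</urlset>
```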
Crawl Budget
Crawl budget used to be an enterprise problem. It isn't anymore. AI bots are aggressive, and even mid-sized sites are seeing server strain from the combined load of Googlebot, Bingbot, GPTBot, ClaudeBot, PerplexityBot, and a dozen lesser-known scrapers hitting the same pages.
Run a log file analysis. Look at what's being crawled and by what. You may find a significant percentage of crawl activity is going to pages that contribute nothing -- paginated archives, duplicate filtered views, old staging URLs that never got cleaned up.
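If you don't have a log analysis platform handy, even a rough one-liner shows the crawler mix. A sketch assuming an Nginx access log at the default path; adjust the path and tokens to your stack:

```
# Count requests per crawler by extracting user-agent tokens from the log
grep -ohE 'Googlebot|Bingbot|GPTBot|ClaudeBot|PerplexityBot' /var/log/nginx/access.log \
  | sort | uniq -c | sort -rn
```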
Every crawler cycle wasted on junk is a cycle not spent on your content that matters.
Indexing
- Run a site:yourdomain.com check against your actual page count. A big discrepancy means indexing issues.
- Check Search Console's Pages report for "Discovered but not indexed" and "Crawled but not indexed" -- these are often symptoms of crawl budget issues or quality signals Google doesn't like.
- Verify that canonical tags are implemented correctly and pointing to where you intend. Canonical adoption across the web is still only around 67% -- and misconfigured canonicals silently funnel authority to the wrong pages.
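As a reminder of what "pointing where you intend" looks like: a filtered or parameterized URL should usually canonicalize to its clean parent. The URLs here are hypothetical:

```html
<!-- On https://www.example.com/shoes/?color=blue -->
<link rel="canonical" href="https://www.example.com/shoes/" />
```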
Section 2: Rendering and JavaScript
If your site is JavaScript-heavy, this section can silently invalidate everything else in your audit.
Google renders JavaScript, but it does so in a two-wave process that can delay indexing by days or weeks. More critically, AI retrieval bots are often less capable than Googlebot at rendering JavaScript at all. If your content is dependent on client-side rendering to appear, parts of your site may be effectively invisible to the systems you most want to read it.
Check for:
- Is critical content (headings, body copy, internal links) present in the raw HTML before JavaScript executes?
- Use Google's Rich Results Test and URL Inspection tool to compare rendered vs. raw HTML. If they look different in meaningful ways, that's a rendering problem.
- After Google's December 2025 Rendering Update, pages returning non-200 status codes may be entirely excluded from the rendering pipeline. If your site uses client-side JavaScript to display anything on error pages -- recommended products, redirects, helpful content -- Googlebot may never see it. Audit your status code behavior.
- Are you using server-side rendering (SSR) or static generation for key landing pages? If not, it's worth evaluating. SSR is the clearest path to making your content accessible to the widest range of crawlers, AI, and otherwise.
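A quick gut check from the terminal: fetch a page the way a non-rendering crawler would and see whether the content you care about is in the raw response at all. The URL and search string are placeholders:

```
curl -s https://www.example.com/key-landing-page | grep -ic "your most important h1 text"
```

A count of zero means that text only exists after JavaScript runs.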
Section 3: Site Architecture and Internal Linking
Structure tells machines how your content relates to itself. It's one of the clearest signals you have for establishing topical authority -- and it's free.
- Is your site hierarchy logical and shallow? Pages should be reachable within three clicks from the homepage. Deep architectures mean crawlers prioritize pages higher up.
- Are your most important pages receiving the most internal links? Run a crawl and look at internal link distribution. Orphaned pages -- pages with no internal links pointing to them -- are invisible to crawlers regardless of how good the content is.
- Do your internal links use descriptive anchor text? "Click here" and "learn more" waste the opportunity to communicate topic relevance.
- Is there a logical relationship between your pillar content and supporting pages? Content clusters don't just help with topical authority -- they help AI systems understand the conceptual relationships between your pages, which influences citation behavior.
- Check for redirect chains. Any internal link pointing to a URL that redirects to another URL is leaking equity. Clean these up systematically.
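Chains are easy to spot from the command line: request a URL, follow the redirects, and count the hops. One status line plus one Location header is a single redirect; more is a chain. The URL is hypothetical:

```
curl -sIL https://www.example.com/old-page | grep -iE '^(HTTP|location)'
```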
Section 4: Core Web Vitals and Page Experience
Google has used Core Web Vitals as a ranking signal since 2021. They're table stakes at this point, not a differentiator -- but failing them is still a handicap, and a surprising number of sites still do.
Focus on the three metrics that matter:
LCP (Largest Contentful Paint) -- should be under 2.5 seconds. The most common culprits are large, unoptimized hero images, render-blocking resources, and slow server response times.
INP (Interaction to Next Paint) -- replaced FID in March 2024. Measures responsiveness to user interactions. JavaScript-heavy sites tend to struggle here. If your INP is above 200ms, dig into your JavaScript execution.
CLS (Cumulative Layout Shift) -- should be under 0.1. Usually caused by images without defined dimensions, late-loading ads, or web fonts that swap after load.
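CLS in particular is often a one-attribute fix: declaring intrinsic dimensions lets the browser reserve space before the image loads. The file name and alt text here are placeholders:

```html
<img src="/images/hero.jpg" width="1200" height="630" alt="Hero image" />
```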
Use PageSpeed Insights and the Chrome User Experience Report (CrUX) for field data. Don't rely solely on lab data -- real-world performance is what Google actually measures.
A fast, stable, responsive page isn't just good for rankings. It's good for downstream signals (dwell time, return visits, engagement) that feed into the broader authority picture, which AI systems are increasingly weighing.
Section 5: Schema Markup (The Part Most Audits Skip)
Here it is. The section that turns a solid technical audit into a 2026-ready one.
Schema markup is structured data that tells machines—search engines and AI systems—explicitly what your content is. Not what it looks like. Not what words it uses. What it is. And in an era where AI search is assembling answers from content it can confidently interpret and attribute, "what it is" matters more than ever.
Most audits treat schema as a nice-to-have. Check if there's any markup, note a few missing types, and move on. That's not enough anymore. Schema is infrastructure, and it needs to be audited with the same rigor as crawlability or Core Web Vitals.
Work through each type methodically:
Article / BlogPosting
Every content page on your site should have Article or BlogPosting markup. At minimum, it should include: headline, author (as a Person entity, not just a name string), datePublished, dateModified, and publisher.
Why dateModified matters: AI systems weigh content recency. A page with a recent dateModified in its structured data is a more citable object than one with no date signal.
Check: Are author fields populated with Person schema, or just a plain text name? Plain text name strings don't build entity associations. A Person markup with a name, URL, and sameAs links to authoritative profiles does.
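Pulling those requirements together, a minimal BlogPosting block might look like the sketch below. Every name, date, and URL is a placeholder; the structure is what your audit should verify:

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "BlogPosting",
  "headline": "The Technical SEO Audit Checklist for 2026",
  "datePublished": "2026-01-05",
  "dateModified": "2026-02-10",
  "author": {
    "@type": "Person",
    "name": "Jane Doe",
    "url": "https://www.example.com/authors/jane-doe",
    "sameAs": ["https://www.linkedin.com/in/janedoe"]
  },
  "publisher": {
    "@type": "Organization",
    "name": "Example Agency",
    "logo": {
      "@type": "ImageObject",
      "url": "https://www.example.com/logo.png"
    }
  }
}
</script>
```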
Person
If your site has contributors, authors, or subject matter experts, every one of them should have a Person schema entity -- ideally on a dedicated author page. Name, jobTitle, URL, and sameAs links to LinkedIn, Wikipedia if applicable, and other authoritative profiles.
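A sketch of what that looks like on a dedicated author page, with every detail hypothetical:

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Person",
  "name": "Jane Doe",
  "jobTitle": "Head of Technical SEO",
  "url": "https://www.example.com/authors/jane-doe",
  "sameAs": [
    "https://www.linkedin.com/in/janedoe",
    "https://en.wikipedia.org/wiki/Jane_Doe"
  ]
}
</script>
```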
This is how you build machine-readable credibility. E-E-A-T isn't just a content quality framework; it also has a structured data component. Anonymous content is invisible to AI systems trying to decide whether a source is trustworthy.
Organization
Your homepage or About page should include Organization markup with name, URL, logo, contactPoint, and sameAs links to your verified social profiles, as well as any authoritative external references (Crunchbase, Wikipedia, industry directories).
Organization schema builds entity recognition. It's how AI systems learn that your website, your LinkedIn page, your news mentions, and your Google Business Profile are all the same entity. Without it, you're a collection of disconnected signals rather than a coherent brand.
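A minimal sketch, with placeholder names and URLs:

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Organization",
  "name": "Example Agency",
  "url": "https://www.example.com",
  "logo": "https://www.example.com/logo.png",
  "contactPoint": {
    "@type": "ContactPoint",
    "contactType": "customer service",
    "email": "hello@example.com"
  },
  "sameAs": [
    "https://www.linkedin.com/company/example-agency",
    "https://www.crunchbase.com/organization/example-agency"
  ]
}
</script>
```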
FAQPage -- Read This Carefully
The FAQ schema was demoted in late 2025, when Google restricted rich FAQ results in SERPs to government and health sites. This caught many SEOs off guard because FAQPage markup had been a staple recommendation for years.
Here's what this means for your audit: FAQPage schema is no longer worth implementing for SERP rich results. But it's still worth evaluating for AI search. The FAQ format -- question followed by a direct answer -- maps cleanly to how AI systems construct responses. Structured Q&A content remains among the most cited in AI-generated answers. Whether you keep FAQPage markup on existing pages or remove it is a judgment call; just don't implement it expecting SERP rich results that no longer exist for most sites.
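If you do keep the markup for AI legibility, the shape is simple. A single-question sketch with hypothetical wording:

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "FAQPage",
  "mainEntity": [
    {
      "@type": "Question",
      "name": "Do FAQ pages still earn rich results in Google?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "For most sites, no. But the question-and-answer structure still maps cleanly onto how AI systems assemble responses."
      }
    }
  ]
}
</script>
```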
HowTo
If you have step-by-step tutorials or instructional content, the HowTo schema should be on those pages. Step-by-step content is heavily cited in AI Overviews. Mark it up so the structure is explicit, not inferred.
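A minimal sketch of the structure, using this article's own robots.txt advice as hypothetical step content:

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "HowTo",
  "name": "How to Audit Your robots.txt",
  "step": [
    {
      "@type": "HowToStep",
      "name": "Open the file",
      "text": "Load yourdomain.com/robots.txt in a browser."
    },
    {
      "@type": "HowToStep",
      "name": "Review each user-agent block",
      "text": "Confirm every allow and disallow rule is intentional, including rules for AI crawlers."
    },
    {
      "@type": "HowToStep",
      "name": "Verify the Sitemap directive",
      "text": "Check that it points to a live, current XML sitemap."
    }
  ]
}
</script>
```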
BreadcrumbList
Simple, low-effort, high-value. The BreadcrumbList schema clarifies the site hierarchy to crawlers and reinforces your internal architecture signal. If you don't have it, add it.
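The whole thing is a short block per page. A sketch with placeholder paths:

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "BreadcrumbList",
  "itemListElement": [
    { "@type": "ListItem", "position": 1, "name": "Home", "item": "https://www.example.com/" },
    { "@type": "ListItem", "position": 2, "name": "Blog", "item": "https://www.example.com/blog/" },
    { "@type": "ListItem", "position": 3, "name": "Technical SEO Audit Checklist" }
  ]
}
</script>
```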
Product (if applicable)
For e-commerce or product-adjacent sites, a Product schema with complete attributes—name, description, price, availability, reviews—is critical. AI shopping agents and product-focused AI answers pull directly from structured product data. Incomplete markup means incomplete representation in AI-generated product answers.
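A sketch of a complete-enough Product block; the product, price, and ratings are invented for illustration:

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Product",
  "name": "Example Trail Running Shoe",
  "description": "Lightweight trail shoe with a grippy outsole.",
  "image": "https://www.example.com/images/trail-shoe.jpg",
  "offers": {
    "@type": "Offer",
    "price": "129.99",
    "priceCurrency": "USD",
    "availability": "https://schema.org/InStock"
  },
  "aggregateRating": {
    "@type": "AggregateRating",
    "ratingValue": "4.6",
    "reviewCount": "212"
  }
}
</script>
```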
Validation
Once you've audited coverage, validate everything. Use Google's Rich Results Test on a representative sample of page types. Check Search Console's Enhancements reports for markup errors and warnings. Fix errors before anything else -- invalid schema can suppress rich results and confuse the very systems you're trying to signal.
Section 6: Security and Accessibility Basics
These are table stakes but worth confirming in every audit.
- HTTPS is now at 91%+ adoption across the web. If any pages are still served over HTTP or generating mixed content warnings, fix them.
- Mobile-friendliness: Google retired its standalone Mobile-Friendly Test, but Lighthouse and Search Console's URL Inspection tool still show how a page renders on mobile. Mobile-first indexing has been the default for years -- a site that doesn't render well on mobile is being indexed from a degraded mobile version.
- Core accessibility basics (proper heading hierarchy, alt text on images, descriptive link text) overlap meaningfully with both SEO and AI legibility. A heading hierarchy that makes semantic sense to a screen reader also makes semantic sense to an LLM.
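To illustrate that last point, the heading outline of a page like this one should nest cleanly, with one h1 and orderly h2/h3 levels beneath it:

```html
<h1>The Technical SEO Audit Checklist</h1>
<h2>Crawlability and Indexing</h2>
<h3>robots.txt</h3>
<h3>XML Sitemaps</h3>
<h2>Schema Markup</h2>
```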
Running the Audit: A Practical Order of Operations
If you're working through this for the first time on a site, prioritize in this sequence:
- Crawlability and indexing -- nothing else matters if bots can't get in
- Rendering -- confirm content is accessible in raw HTML before JavaScript runs
- robots.txt update -- make deliberate decisions about AI crawlers now, not later
- Schema coverage audit -- map what you have against what you should have
- Schema implementation and validation -- close the gaps, fix the errors
- Core Web Vitals -- address LCP and INP issues with dev support
- Internal linking -- fix orphans, clean redirect chains, improve anchor text
- Site architecture review -- confirm hierarchy, cluster relationships, crawl depth
The schema section sits higher than most audits would put it. That's intentional. In 2026, schema is no longer the polish you apply after everything else is clean. It's part of the foundation.
A technically clean site that machines can't interpret is an opportunity missed. A technically clean site with complete, accurate structured data is one that's positioned to show up not just in search results, but in the answers those results are increasingly being replaced by.
