
Agent-Native: Making Your Website Readable for AI Agents

Michael Jauk
· AI & Strategy

When someone asks ChatGPT or Claude about a company today, what happens is rarely what the CEO thinks. The agent does not google; it fetches a handful of pages directly from the site. Then one of three things happens:

  1. It gets a Cloudflare bot challenge back and renders “Turnstile” HTML into the answer.
  2. It gets the full HTML including navigation, footer, cookie banner and scripts, parses out what is essential and burns half its token budget along the way.
  3. It gets a clean Markdown version of the page, directly consumable.

Case three is the exception. We rebuilt dectria.com last week to make it the rule. This post describes what the rebuild looks like, what pitfalls lurk in it and why the pattern transfers to almost any static website.

Why HTML is the wrong format for agents

A human skims a layout. An agent counts in tokens. A typical marketing page is roughly 80% boilerplate that humans take for granted: navigation, footer links, social icons, script loaders, comment widgets. None of it is relevant for an agent, and every byte raises the chance that some of that boilerplate ends up in the user’s answer.

Markdown inverts the ratio. No scripts, no CSS, no layout containers. Just headings, lists, links and prose. An agent that receives a page as Markdown produces measurably better answers at a fraction of the cost.

Two approaches, one of them wrong

The obvious route is to publish a second URL per page. /about becomes /about.md. Easy to ship, but long-term painful: two URLs per piece of content create a duplicate-content problem, force canonical tags and leave the agent guessing whether to switch paths.

The better route is a convention that has been in the HTTP standard since 1999 and is still barely used: content negotiation. A client sends an Accept header, the server picks the format. One URL, two representations.

GET /about HTTP/2
Accept: text/markdown

The server returns Markdown. If a browser calls the same URL with Accept: text/html, HTML comes back. Same URL, same canonical, no SEO conflict.
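For illustration, a successful negotiated response might carry headers along these lines (a sketch, not a capture from dectria.com):

```
HTTP/2 200
Content-Type: text/markdown; charset=utf-8
Vary: Accept
```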

Implementation in three parts

Part 1 - The build step. After the normal site build, a post-processor walks through every generated HTML file, extracts the main content and writes a same-name .md file next to it. For dectria.com, this is roughly 40 lines of JavaScript using cheerio to parse and turndown to convert. Result: 133 Markdown files alongside 133 HTML files.

Part 2 - The middleware. Before every request, an edge function checks the Accept header. If the client explicitly names text/markdown, the function serves the .md shadow file. Otherwise it falls through to HTML. Important: generic Accept headers like */* must be treated as a preference for HTML, not a request for Markdown. Otherwise every cURL user accidentally receives Markdown.

Another detail many implementations get wrong: Vary: Accept must appear on every response. Without it, Cloudflare or any reverse proxy caches the HTML version and then serves it to agents asking for Markdown.
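Both details can be sketched in a few lines. The handler signature below is illustrative (the actual API depends on the edge platform); the Accept check only fires on an explicit text/markdown entry, so wildcards fall through to HTML:

```javascript
// True only if text/markdown is explicitly named in the Accept header.
// Generic values like */* or text/* must NOT count as a Markdown preference.
function wantsMarkdown(acceptHeader) {
  if (!acceptHeader) return false;
  return acceptHeader
    .split(",")
    .map((part) => part.split(";")[0].trim().toLowerCase())
    .includes("text/markdown");
}

// Hypothetical edge handler: serveAsset stands in for whatever the platform
// provides to read a static file and wrap it in a Response.
async function handle(request, serveAsset) {
  const url = new URL(request.url);
  const md = wantsMarkdown(request.headers.get("accept"));
  const response = await serveAsset(md ? url.pathname + ".md" : url.pathname);
  // Vary: Accept on every response, or shared caches will mix the two formats.
  response.headers.set("Vary", "Accept");
  return response;
}
```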

Part 3 - Discovery. For agents to know the site offers this feature at all, three signals need to be in place:

  • /llms.txt and /llms-full.txt as a machine-readable site summary.
  • Link: headers on every response pointing at those files plus the sitemap-index.xml.
  • /.well-known/agent-skills/index.json with a SKILL.md documenting the Accept-negotiation convention.

The latter two follow established standards (RFC 8288 Link header, Agent Skills Discovery v0.2.0) and are actively consumed by modern agent frameworks.
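As a sketch, a Link header advertising those discovery files could look like this — the rel values are illustrative, since neither llms.txt nor the sitemap relation is a registered RFC 8288 link relation yet:

```
Link: </llms.txt>; rel="llms-txt"; type="text/markdown",
      </sitemap-index.xml>; rel="sitemap"
```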

Training yes, training no

Opening a website to agents does not automatically mean opening it to foundation-model training. That distinction maps precisely into robots.txt via Cloudflare’s Content-Signal spec, which builds on Article 4 of the EU CDSM Directive 2019/790:

User-agent: *
Content-Signal: search=yes, ai-input=yes, ai-train=no
Allow: /

Translated: search indexing allowed, real-time fetching for user answers allowed, use for training forbidden. On top of that, hard blocks remove crawlers that only collect training corpora: GPTBot, Bytespider, CCBot, Amazonbot, Applebot-Extended, Meta-ExternalAgent. The real-time crawlers ClaudeBot, ChatGPT-User and PerplexityBot remain open.
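The hard blocks themselves are plain robots.txt groups appended after the Content-Signal section; a sketch covering three of the six named training crawlers:

```
# Training-only crawlers get a hard block; real-time answer
# crawlers (ClaudeBot, ChatGPT-User, PerplexityBot) stay open.
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Bytespider
Disallow: /
```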

This is the distinction that gets blurred most often. ClaudeBot and GPTBot are not the same thing with different names. ClaudeBot fetches when Anthropic’s Claude is answering a user question right now. GPTBot collects data for the next training run. Treat them the same and you either sacrifice reach or give away training data.

The Cloudflare trap

If you host on Cloudflare, you run into a duplication that is never shown together in the dashboard. Cloudflare has two independent crawler-blocking layers:

  • Bot Fight Mode (Security → Bots → Settings) issues an HTML challenge to AI crawlers. This is why ChatGPT occasionally quotes “Turnstile” pages.
  • AI Crawl Control (Security → Bots → AI Crawl Control) has a per-crawler Allow/Block table. By default it blocks ClaudeBot and PerplexityBot.

Both must be configured for an agent-native site. Disable just one and you still get 403 responses, possibly in a different format. The diagnosis takes exactly as long as it takes to realise they are two separate features.

What this measures

Whether a site is agent-ready is objectively testable. isitagentready.com runs a checklist scan and returns a score from 0-100. A realistic target:

| Level | Check | Status |
| --- | --- | --- |
| Basics | llms.txt present | Must |
| Discovery | Link header on every response | Must |
| Negotiation | Accept: text/markdown returns Markdown | Must |
| Policy | Content-Signal in robots.txt | Should |
| Skills | .well-known/agent-skills/index.json | Nice to have |

The first three are enough to serve 80% of agent traffic cleanly. The last two are future-facing but can be added today with little effort.

Why this is worth doing

The number of user queries answered live by AI agents is growing faster than any other web-traffic channel right now. Ship Markdown today and you have a clean representation inside the answers produced by Claude, ChatGPT and Perplexity. Skip it and the choice is between error-prone HTML parsing and a Turnstile challenge inside the agent's response.

Retrofitting an existing static site takes roughly half a day. The architecture is framework-agnostic: what works for Astro works for Next, Nuxt, Hugo or 11ty. The cost of the pattern is one-off; the effect is permanent.

What comes next

The approach is new enough that standards and conventions are still forming in parallel. Agent Skills Discovery is at version 0.2.0, Content-Signal is a Cloudflare proposal on its way to the IETF, and llms.txt is a grassroots proposal without a formal RFC. So anyone adopting now should expect to bump a version or two within six months. A small price for belonging to the first wave of sites that show up cleanly in AI answers rather than as parsing accidents.

To check how your own site looks to agents today, one command is enough:

curl -I -H "Accept: text/markdown" https://your-domain.example/about

If the result is 200 text/markdown, the setup works. If it is 200 text/html or 403, there is work to do.

