llms.txt: Do AI Engines Actually Read It?

Every few months a new file lands at the root of websites, promising to fix how machines read the web. The latest is llms.txt. The pitch is simple and appealing: drop a Markdown file at yourdomain.com/llms.txt, list your most important pages in a clean, summarized form, and AI models will use it to understand your site. No more guessing whether ChatGPT or Claude parsed your navigation correctly. You hand them a curated map.
It is a good idea on paper. The problem is the gap between the proposal and the reality. As of 2026, there is no public evidence that any major AI engine fetches llms.txt at crawl time. The companies behind ChatGPT, Claude, Gemini, and Perplexity have not announced support. Server logs from sites that added the file show, in many cases, zero hits from AI bots. This guide walks through what llms.txt claims to do, where it came from, whether the big models read it, what the server log evidence actually shows, why some people add it anyway, and what AI engines genuinely rely on when they cite your site.
What llms.txt Claims to Do
The llms.txt proposal asks website owners to publish a Markdown file at the domain root. The format is loose but conventional: an H1 with the site or project name, a short blockquote summary, then sections of links to key pages, each link optionally followed by a one line description. A companion convention, llms-full.txt, holds the actual expanded content rather than just links.
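For concreteness, a minimal llms.txt following those conventions might look like the sketch below. The site name, URLs, and descriptions are invented for illustration:

```markdown
# Example Widgets

> Example Widgets makes a widget management API. The links below cover the pages most useful to a language model.

## Docs

- [Quickstart](https://example.com/docs/quickstart): install, authenticate, make a first request
- [API reference](https://example.com/docs/api): every endpoint with parameters and examples

## Optional

- [Changelog](https://example.com/changelog): release notes by version
```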
The stated purpose is to give large language models a clean, low noise version of a site. A normal web page is wrapped in navigation, ads, cookie banners, scripts, and boilerplate. An LLM with a limited context window has to wade through all of that to find the substance. llms.txt, the argument goes, is the substance with the wrapper removed: here are the pages that matter, here is what each one is about, here is the canonical text if you want it.
There is a real problem underneath this. Context windows are finite. Rendering and cleaning HTML is expensive. A curated index would, in principle, help a model spend its budget on signal rather than markup. The proposal is not solving a fake problem. It is proposing a fix that, so far, the parties who would have to implement it have not adopted.
It is worth being precise about what llms.txt is not. It is not part of the Robots Exclusion Protocol. It does not block or allow anything. It is not robots.txt for AI. It does not influence training data collection. It is purely a hint file, and a hint only works if the recipient is listening.
Where It Came From
The llms.txt proposal was published in September 2024 by Jeremy Howard, co-founder of Answer.AI and fast.ai, and a well known figure in the machine learning community. The proposal site, llmstxt.org, lays out the format and the rationale. Howard’s framing was pragmatic: LLMs are increasingly used to read and reason about websites, the web is built for browsers not models, so let site owners offer a model friendly version.
The idea spread quickly through developer tooling. Documentation platforms like Mintlify added automatic llms.txt generation. Frameworks and static site generators shipped plugins. Within months you could find llms.txt files on the docs sites of major developer products. A directory of sites publishing the file appeared. The convention had momentum in the tooling layer almost immediately.
What it did not get, in the months that followed, was adoption by the consumers it was designed for. OpenAI, Anthropic, Google, and Perplexity did not announce that their crawlers or retrieval systems fetch llms.txt. No documentation from those companies references it. The proposal exists, the files exist, the tooling exists. The reading end of the pipe is the part that has not connected.
This is not unusual for web standards. robots.txt took years to become universal. Schema.org needed Google, Bing, Yahoo, and Yandex to jointly back it before it mattered. A file format proposed by one person, however respected, becomes meaningful only when the large platforms decide to honor it. So far, with llms.txt, they have not said they do.
Do the Big Models Actually Read It?

Here is the state of play as of 2026, engine by engine. The honest answer for every one of them is some version of “no confirmed support.”
| Engine | Crawler | Fetches llms.txt? | Notes |
|---|---|---|---|
| ChatGPT (OpenAI) | GPTBot, OAI-SearchBot, ChatGPT-User | No confirmed support | OpenAI’s bot docs describe robots.txt handling, not llms.txt. No statement that retrieval reads the file. |
| Claude (Anthropic) | ClaudeBot, Claude-User | No confirmed support | Anthropic’s crawler documentation references robots.txt. No mention of llms.txt. |
| Gemini (Google) | Googlebot, Google-Extended | No confirmed support | Google representatives have publicly said Google does not use llms.txt. Search and AI Overviews rely on the normal crawl. |
| Perplexity | PerplexityBot, Perplexity-User | No confirmed support | Perplexity documents robots.txt behavior. No llms.txt support announced. |
| Copilot (Microsoft) | Bingbot | No confirmed support | Bing’s crawling docs do not mention llms.txt. |
A few clarifications, because this table gets misread.
First, “no confirmed support” is not the same as “the file is forbidden.” Nothing bad happens if you publish llms.txt. It just sits there.
Second, an AI agent acting on a user’s behalf is a different case from a crawler building an index. If you tell ChatGPT or Claude “read yourdomain.com/llms.txt and summarize it,” it will fetch that exact URL because you asked it to, the same way it would fetch any URL you name. That is not the model discovering and preferring the file on its own. It is the model following an explicit instruction. People sometimes cite that behavior as proof that “Claude reads llms.txt.” It is proof that Claude can fetch a URL you hand it, which was never in doubt.
Third, Google has been the most direct. Google representatives have stated that Google does not use llms.txt and has no plans to. For a search engine that also runs the most widely used AI answer surface, that is a strong signal about where the format stands.
What the Server Logs Show
The cleanest test of whether AI engines read llms.txt is to add the file, wait, and look at who fetched it. Several site owners have done exactly this and published the results. The pattern is consistent and unflattering for the format.
The “zero hits” case is the common one. A site adds llms.txt, leaves it in place for weeks or months, then greps the access logs for requests to /llms.txt. The result, repeatedly reported on developer forums and in blog write ups, is that the only fetches come from a few sources, and AI training and search crawlers are usually not among them.
When /llms.txt does get requested, the requesters tend to fall into predictable buckets:
- Curiosity traffic. Developers who heard about the file and want to see if a given site has one. Browsers, curl, the occasional headless tool.
- Directory and aggregator bots. Services that catalog which sites publish llms.txt. They fetch the file to list it, not to feed a model.
- SEO and monitoring tools. Crawlers that check for the file as a box on a checklist, the same way they check for sitemap.xml or humans.txt.
- The site's own monitoring. Uptime checks, the owner testing the URL.
What is usually missing from that list is GPTBot, ClaudeBot, PerplexityBot, or Googlebot fetching /llms.txt as part of their normal crawl. They fetch robots.txt. They fetch your pages. They fetch sitemap.xml. They do not, in the logs people have published, reliably fetch llms.txt.
This is not definitive proof of a universal negative, and crawler behavior can change without announcement. But the burden of evidence runs the other way. If the major engines were quietly reading llms.txt, it would show up in logs across many sites. It does not. The simplest reading is that they are not. If you want to settle the question for your own domain, the method is trivial: add the file, then watch your logs. A regular crawl and log audit is the same routine you should already be running to see which bots reach which URLs.
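A sketch of that audit, assuming combined-format access logs. The log path and entries below are fabricated samples; the user agent strings are the published AI crawler names, but swap in your own log file and whatever agents you care about:

```shell
# Build a tiny sample access log so the example is self-contained.
# In practice you would point these greps at your real server log.
cat > /tmp/access_sample.log <<'EOF'
203.0.113.7 - - [12/Jan/2026:10:01:22 +0000] "GET /robots.txt HTTP/1.1" 200 312 "-" "GPTBot/1.2"
203.0.113.7 - - [12/Jan/2026:10:01:23 +0000] "GET /docs/intro HTTP/1.1" 200 18233 "-" "GPTBot/1.2"
198.51.100.4 - - [12/Jan/2026:11:42:09 +0000] "GET /llms.txt HTTP/1.1" 200 1441 "-" "curl/8.5.0"
EOF

# Which user agents ever requested /llms.txt?
grep ' /llms.txt ' /tmp/access_sample.log | awk -F'"' '{print $6}' | sort | uniq -c

# Did any known AI crawler touch it? (prints a match count; 0 means none)
grep ' /llms.txt ' /tmp/access_sample.log \
  | grep -Ec 'GPTBot|ClaudeBot|PerplexityBot|Google-Extended' || true
```

In the sample above, the only /llms.txt fetch comes from curl, while GPTBot fetches robots.txt and a content page, which matches the pattern the published logs describe.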
So Why Do People Add It Anyway?
Given all that, plenty of sites still ship llms.txt. The reasons are not all irrational.
Optionality and low cost. Generating the file is often a one click feature in the docs platform or a small build plugin. If the format ever does get adopted, the file is already there. The cost of being early is close to zero. The cost of being late, if it matters, is a config change. People hedge.
Documentation hygiene. Producing a clean, link rich, summarized index of a site is a useful exercise regardless of who reads it. Some teams find the llms.txt they generated is a better sitemap for humans than their actual sitemap. The file has value as a byproduct even if no model fetches it.
Marketing and signaling. Publishing llms.txt says “we are thinking about AI” to a certain audience. For a developer tools company, that signal has some value. It is the same logic that put humans.txt on a wave of sites a decade ago.
Misunderstanding. Some sites add it because a blog post or a vendor implied that AI engines read it and that not having it hurts visibility. That premise is not supported by current evidence. This is the category worth being careful about, because it leads to time spent on the file that would be better spent elsewhere.
The honest framing: llms.txt today is a bet on a future convention, plus a side benefit as a clean index. It is not, in 2026, a working channel into AI answers. Calling it a “scam” is too strong. It was proposed in good faith by a credible person to solve a real problem. But anyone selling it as a current ranking or visibility lever in AI search is overstating what the evidence shows.
What AI Engines Actually Use

If llms.txt is not the channel, what is? The same infrastructure that has worked for search, with a few AI specific wrinkles. Four things carry real weight.
robots.txt. This is the file AI crawlers genuinely fetch first. GPTBot, ClaudeBot, PerplexityBot, Google-Extended, and the rest read robots.txt and adjust their behavior. If you want to influence what AI models can access, this is the lever that exists today. The block versus allow decision, and the full list of AI bot names, are covered in the complete guide to robots.txt and AI crawlers. Note the asymmetry: robots.txt is read and honored, while llms.txt, as far as the evidence shows, is ignored.
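A sketch of what that lever looks like. The user agent tokens are the published AI crawler names; the allow/block policy and domain are illustrative, not a recommendation:

```text
# Example robots.txt: block training crawlers, allow search/retrieval bots.
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: *
Allow: /

Sitemap: https://example.com/sitemap.xml
```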
Structured data. Schema.org markup gives machines explicit, unambiguous facts about a page: this is an article, here is the author, here is the publish date, this is a product, here is the price and the rating. AI answer engines lean on structured data the same way search does, because it removes interpretation from the equation. A page that states its facts in JSON-LD is easier for a model to cite correctly than a page where the same facts are buried in prose. The mechanics are in the schema markup guide for SEO and AI search.
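For example, an article page might state its facts in JSON-LD like this; the headline, author, and date are placeholders:

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Article",
  "headline": "llms.txt: Do AI Engines Actually Read It?",
  "author": { "@type": "Person", "name": "Jane Doe" },
  "datePublished": "2026-01-12"
}
</script>
```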
XML sitemaps. Sitemaps are how crawlers, AI ones included, discover the full set of URLs you want known. A complete, current, valid sitemap is a far more reliable way to make sure a model sees your important pages than an llms.txt it does not fetch. This is the discovery layer that actually functions.
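A minimal valid sitemap, with example URLs and dates, looks like this:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/docs/quickstart</loc>
    <lastmod>2026-01-12</lastmod>
  </url>
  <url>
    <loc>https://example.com/docs/api</loc>
    <lastmod>2026-01-10</lastmod>
  </url>
</urlset>
```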
Readable, well structured content. The most underrated factor. AI models read your actual pages. A page with a clear heading hierarchy, a direct answer near the top, short scannable sections, and minimal junk between the user and the substance is easier to extract from and more likely to be quoted. This is, ironically, exactly what llms.txt was trying to provide as a separate file. The thing is, you can just build your real pages that way. Clean HTML, semantic headings, content that answers the question without making the reader dig. That is the version of “model friendly content” that works today, because it lives at the URLs the crawlers already fetch.
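In HTML terms, that structure is nothing exotic. A sketch, with invented content:

```html
<!-- Semantic structure: one H1, a direct answer up top, scannable sections. -->
<article>
  <h1>Does llms.txt affect AI visibility?</h1>
  <p>No. No major AI engine currently fetches llms.txt; robots.txt,
     structured data, and sitemaps carry the weight instead.</p>
  <h2>What the server logs show</h2>
  <p>Sites that publish the file report zero fetches from AI crawlers.</p>
  <h2>What to do instead</h2>
  <p>Keep robots.txt, JSON-LD, and your XML sitemap correct and current.</p>
</article>
```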
The broader playbook for being cited in AI answers, ranking in Google and AI surfaces at the same time, is the Generative Engine Optimization guide. And if you want to see which AI bots actually reach your site and which URLs they fetch, that is a crawl and log question, the kind the complete guide to SEO crawlers walks through.
The Verdict
Putting the pieces together:
- The claim: AI engines read llms.txt to understand your site. Status: unsupported by current evidence. No major engine has confirmed it. Google has explicitly said it does not.
- The history: proposed by Jeremy Howard in September 2024, adopted fast by docs tooling, not adopted by the AI platforms it targets.
- The logs: sites that add the file commonly see zero fetches from AI training or search crawlers. The traffic that does arrive is curiosity, directories, and monitoring.
- The cost of adding it: near zero. It will not hurt you. It might help if the convention ever lands.
- The risk: treating llms.txt as a working visibility lever and skipping the things that actually work: robots.txt, structured data, sitemaps, clean content.
So: add llms.txt if it is a one click option and you like keeping a clean index. Do not add it expecting AI engines to read it. Do not pay anyone to “optimize” it for you. And do not let it distract from the four things that genuinely move the needle in AI search, all of which live in files and pages the crawlers already fetch.
“Is llms.txt a scam?” No. It is a sincere proposal for a real problem that has not been adopted by the people who would need to adopt it. It is a bet, not a tool. Treat it like one.
Conclusion
llms.txt is the kind of idea that should work and does not, yet. The web genuinely is built for browsers, not models, and a curated, clean index would genuinely help. But a hint file is worthless without a reader, and as of 2026 the major AI engines are not reading it. Google has said so directly. Server logs across many sites back that up. The file is harmless and cheap to publish, and it doubles as a tidy site index, so there is no reason to fight it. There is also no reason to believe it is doing anything for your AI visibility.
What does the work is unglamorous and familiar: a correct robots.txt, structured data on every page that has facts worth stating, a complete and valid XML sitemap, and pages built so a machine can read them without digging through clutter. Those are the files and pages AI crawlers actually fetch. If you want to verify which bots reach your site, which URLs they hit, and whether your structured data and sitemaps are in order, download Seodisias and run a crawl on your own machine. It runs locally, has no URL limit, and reports the signals that AI engines actually use, no upload, no sign up, all data stays with you.