llms.txt is a markdown file placed at your website's root that tells AI models which pages matter most and where to find clean, readable versions of your content.
AI systems don't crawl your website the way Google does. They pull content on demand during a conversation, and if your best pages are buried behind JavaScript layouts, navigation menus, or gated sections, they get skipped. The result: AI gives users incomplete or inaccurate information about your brand.
The llms.txt file is a proposed fix for this problem. It gives AI models a single, structured document that points them to your most important content. Think of it as a curated reading list for machines.
Here's the honest picture: adoption is still early. Most websites haven't implemented llms.txt yet, and the standard is young and still evolving. But momentum is building fast: AI coding tools like Cursor actively parse it, documentation platforms like Mintlify generate it for all customers, and Google referenced it in their Agent2Agent (A2A) protocol.
So why bother? This guide covers what llms.txt actually is (sometimes written as llms txt or llm txt in search), how AI models interact with it, how to set it up properly, and how it connects to your broader answer engine optimization strategy.
An llms.txt file is a plain-text markdown document hosted at your website's root directory (e.g., yoursite.com/llms.txt) that gives AI systems a curated index of your most important content. It tells language models what your site is about, what pages matter most, and where to find clean, readable versions of that content.
The concept was proposed by Jeremy Howard, co-founder of Answer.AI, in September 2024. His argument was simple: site authors know their content best, and giving them a way to guide AI retrieval would produce better results than letting models figure it out on their own.
The file follows a specific markdown structure: an H1 title, a blockquote summary, and H2 sections containing lists of links with short descriptions, plus an optional "Optional" section for lower-priority content.
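A minimal sketch of that structure (the company name, pages, and URLs below are placeholders, not from the spec):

```markdown
# Example Co

> Example Co makes project-management software. This file lists the pages most useful to AI systems.

## Docs

- [Quickstart](https://example.com/docs/quickstart.md): Install and create your first project in five minutes
- [API Reference](https://example.com/docs/api.md): REST endpoints, authentication, and rate limits

## Optional

- [Changelog](https://example.com/changelog.md): Release history
```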
This is different from the other standard files you might already have on your site. robots.txt controls which crawlers can access which parts of your site. It's restrictive. sitemap.xml lists all your indexable pages for search engines. It's comprehensive. llms.txt does neither of those things. It's a curated guide that says "start here" to any AI system reading your site.
There is also llms-full.txt, a companion file that contains the complete text content of your key pages in a single markdown document. While llms.txt provides an index with links, llms-full.txt gives AI systems the full content without needing to follow any links. This is useful for large-context models that can ingest more information at once.
When an LLM or an AI-powered agent needs information from a website, it can fetch the site's llms.txt to figure out which pages are most relevant to the user's query. This lets the model skip the slow, messy process of crawling raw HTML and go straight to clean, structured content.
The process works in three stages:
First, the model or its orchestration framework (like a RAG system) fetches /llms.txt from the site's root. This gives it a map of what the site offers and which pages cover which topics.
Second, it retrieves the linked markdown files for the sections that match the query. These files strip out nav menus, ads, cookie banners, and JavaScript, leaving only the actual content.
Third, if the model's context window is limited, it drops anything marked as "Optional" in the llms.txt file and focuses on the prioritized resources.
For example, if a developer asks an AI coding assistant "How do I set up authentication with Cloudflare Workers?", the assistant could fetch Cloudflare's llms.txt, find the Workers section, pull the authentication docs in markdown, and generate an answer from clean source material instead of parsing a complex HTML page.
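The three-stage flow can be sketched in a few lines of Python. This is an illustrative parser, not an official implementation; the sample file, section names, and URLs are made up:

```python
import re

def parse_llms_txt(text):
    """Stage 1: split an llms.txt document into {section_name: [(title, url), ...]}."""
    sections = {}
    current = None
    for line in text.splitlines():
        if line.startswith("## "):
            current = line[3:].strip()
            sections[current] = []
        elif current is not None:
            # Match markdown list links like "- [Title](url): description"
            m = re.match(r"- \[(.+?)\]\((.+?)\)", line.strip())
            if m:
                sections[current].append((m.group(1), m.group(2)))
    return sections

def relevant_links(sections, query, drop_optional=True):
    """Stages 2-3: pick links matching the query, dropping 'Optional' when context is tight."""
    links = []
    for name, items in sections.items():
        if drop_optional and name.lower() == "optional":
            continue
        for title, url in items:
            if query.lower() in title.lower() or query.lower() in name.lower():
                links.append(url)
    return links

sample = """# Example Docs

> Placeholder docs index.

## Workers

- [Authentication](https://example.com/workers/auth.md): Setting up auth

## Optional

- [Archive](https://example.com/archive.md): Old posts
"""

sections = parse_llms_txt(sample)
print(relevant_links(sections, "auth"))  # ['https://example.com/workers/auth.md']
```

A real agent would then fetch the returned markdown URLs and answer from that clean content instead of raw HTML.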
The strongest confirmed use cases today are in AI coding assistants like Cursor and in documentation platforms like Mintlify, where the file is actively parsed during development workflows. Google referenced llms.txt in their Agent2Agent (A2A) protocol, signaling growing institutional interest in the standard. As agentic AI workflows become more common, the number of systems that parse llms.txt is only going to grow.
The takeaway: llms.txt already has real traction in developer tools and AI agent frameworks, and the trajectory points toward broader adoption. Having the file in place now means your content is ready as more systems start looking for it.
Place your llms.txt file in your website's root directory so it's accessible at yoursite.com/llms.txt. The file should be UTF-8 encoded and written in plain markdown.
In practice, setup comes down to three steps: draft the file in markdown, sanity-check its structure, and upload it so it's publicly reachable at your root.
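Those steps can be scripted. The sketch below writes a placeholder llms.txt (the site name and URLs are hypothetical) and runs basic structural checks before you upload it to your web root:

```python
from pathlib import Path

# Hypothetical content; replace with your own pages and descriptions.
content = """# Your Site

> One-sentence summary of what your site offers.

## Docs

- [Getting Started](https://yoursite.com/start.md): Setup and first steps
"""

path = Path("llms.txt")
path.write_text(content, encoding="utf-8")  # spec expects UTF-8 plain markdown

# Sanity checks before uploading so the file serves at yoursite.com/llms.txt:
lines = path.read_text(encoding="utf-8").splitlines()
assert lines[0].startswith("# "), "file must open with an H1 title"
assert any(l.startswith("> ") for l in lines), "include a blockquote summary"
print("llms.txt structure looks valid")
```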
The best way to understand llms.txt is to look at how companies that actually use it have structured their files. Here are three implementations that show different approaches to the same problem.
Cloudflare runs one of the most comprehensive llms.txt setups on the web. Their root file at developers.cloudflare.com/llms.txt is organized by product vertical: Workers, Pages, R2, AI Gateway, Agents, and dozens more. Each product section links to that product's own llms.txt, which then lists every documentation page in markdown format. They also offer per-product llms-full.txt files and support markdown content negotiation through HTTP headers. This is the enterprise-scale approach: deep, hierarchical, and built for a platform with hundreds of products.
Mintlify takes a documentation-first approach. As a docs platform, they generate llms.txt files automatically for all customers and maintain their own at mintlify.com/llms.txt. Their file is tighter and more focused than Cloudflare's because the scope is narrower. It's a good model for companies with a single product and strong documentation.
Cursor, the AI coding IDE, structures its llms.txt around developer workflows. The file prioritizes integration docs, keyboard shortcuts, and configuration guides because that's what AI coding assistants actually need to reference during a session. It's a good example of building the file around your users' real questions rather than your own site architecture.
The pattern across all of these: the best llms.txt files are specific in their descriptions, organized by how users (or AI agents) actually look for information, and ruthlessly curated. They all keep the "Optional" section for content that's useful but not essential.
llms.txt works best when it's part of a broader answer engine optimization strategy, not treated as a standalone SEO tactic. It's an access and discovery layer that makes your optimized content easier for AI to find and process, one layer in a larger stack.
Each layer solves a different problem. Schema handles semantics. Structured content handles quality. llms.txt handles discovery. None of them work as well in isolation as they do together.
The biggest gap most teams have after implementing llms.txt is measurement. They create the file, upload it, and then have zero visibility into whether it's actually making a difference. There's no built-in analytics, no tracking, and no feedback loop.
This is where an AEO platform changes the equation. Instead of a standalone generator that creates the file once and leaves you on your own, an AEO platform handles the full lifecycle. It can generate the file based on which pages AI is already citing or ignoring, so the llms.txt reflects real visibility data, not guesswork. It continuously audits whether your file is up to date, flags when content goes stale or pages get removed, and tracks changes in your AI visibility after implementation.
The practical difference: with a standalone generator, you're setting up llms.txt as a one-time project that slowly decays. With an AEO platform, you set it up once and the ongoing maintenance, auditing, and measurement happen automatically. You know exactly which pages AI is picking up, which ones it's missing, and whether the file is actually moving the needle on your AI visibility.
An llms.txt generator (also called an llms.txt file generator) is a tool that scans your website and outputs a formatted llms.txt file based on your pages, metadata, and content structure. Several options exist, ranging from free one-time generators to full AEO platforms that handle the ongoing lifecycle.
AEO platforms. Cognizo generates your llms.txt informed by real AI visibility data, so the file reflects which pages AI is already citing or missing rather than guessing based on metadata alone. It then continuously audits whether the file is up to date, flags when content changes, and tracks visibility impact after implementation. This is the difference between a one-time file and an ongoing workflow.
Standalone generators. Firecrawl's generator (llmstxt.firecrawl.dev) crawls your site and uses AI to generate descriptions for each page. SiteSpeakAI offers a free generator that requires no signup. LLMrefs provides a generator that performs a deep site scan and groups pages by type. Writesonic also has a free generator with similar functionality.
CMS plugins. WordPress users can use Yoast SEO's built-in llms.txt feature or the dedicated Website LLMs.txt plugin. Mintlify generates the file automatically for documentation sites. Docusaurus, Astro Starlight, VitePress, Hugo, and MkDocs all have community plugins that support llms.txt generation.
Open-source tools. The official llms_txt2ctx CLI tool from the specification's creators parses and validates llms.txt files. There are also reference implementations in JavaScript and PHP.
These tools are good starting points. They save time on the initial file creation and handle the formatting automatically. But they share a common limitation: they generate the file once and then stop. They can't tell you whether the file is actually improving your AI visibility. They can't audit for gaps over time. They can't warn you when your content changes and the file needs updating. And they can't make editorial decisions about which pages truly matter most for AI retrieval.
If you just need a file to exist at /llms.txt, a free generator gets the job done in minutes. If you need to know whether that file is working, keep it current, and connect it to a measurement system, you'll need something more.
llm.txt is a common shorthand for llms.txt, which is the official name of the standard. The "llms" stands for Large Language Models (plural). The role of llm.txt (or more accurately, llms.txt) is to provide AI systems with a structured, curated index of your website's most important content so they can retrieve and process it efficiently. If you've been searching for llm.txt, you're in the right place.
No. They serve different purposes. robots.txt controls which crawlers can access which parts of your site. It blocks or allows access. llms.txt guides AI systems toward your most important content. It's inclusive, not restrictive. Both can and should exist on the same site.
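A side-by-side sketch makes the contrast concrete (the rules and URL below are placeholders):

```text
# robots.txt — restrictive: tells crawlers what they may NOT access
User-agent: *
Disallow: /admin/

# llms.txt — inclusive: points AI systems at what matters most
## Docs
- [API Reference](https://yoursite.com/api.md): Endpoints and authentication
```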
Yes, and adoption is growing. AI coding assistants like Cursor actively parse it, documentation platforms like Mintlify generate it for all customers, and Google referenced it in their A2A protocol. The strongest use cases today are in AI agents and developer tools, with broader adoption expected as agentic workflows become standard.
llms.txt is a discovery layer, not a content quality signal. It helps AI systems find your best content, but the content itself still needs to be structured, accurate, and citable. The real impact comes when llms.txt is part of a broader AEO strategy alongside schema markup, structured content, and AI visibility tracking.
It's a companion file that contains the complete text of your key pages in a single markdown document. While llms.txt links to individual pages, llms-full.txt gives AI systems the full content without following any links. Useful for smaller sites or when working with large-context models.
Whenever you publish major new content, restructure your site, or retire old pages. For most sites, a quarterly review is enough. CMS plugins can automate regeneration, but manual review ensures the file stays editorially sharp. An AEO platform can handle this automatically by flagging when tracked content changes.
Google has not confirmed that AI Overviews use llms.txt. Google's systems rely on their existing crawling and indexing infrastructure. The file is more relevant for non-Google AI systems like ChatGPT, Claude, and Perplexity, which handle content retrieval differently.
llms.txt is one of the simplest things you can do to make your content accessible to AI systems. It takes under an hour to implement, costs nothing, and puts you ahead of the vast majority of companies that haven't started yet. But the real value isn't in the file itself. It's in what the file forces you to do: decide which pages actually matter, write clear descriptions of them, and think about how AI systems interact with your content.
If you're just getting started, grab a free generator, create the file, and upload it to your root directory. That takes less than an hour. If you want to go further, plug it into an answer engine optimization strategy where the file is one layer alongside schema markup, structured content, and AI visibility tracking. That's where the compounding returns are.
Ready to see where your brand stands in AI search? Track your visibility, citation sources, sentiment, and share of voice across the AI engines your buyers are already using. Start your free Cognizo trial today.