llms.txt in 2026: what it is, how to write one, and why your site needs it

TL;DR. llms.txt is a file at the root of your site that tells language models what your project is, which resources are canonical, and what to cite. ChatGPT, Perplexity, and Claude already read it. Most sites still don't have one — which is why AI crawlers either cite them poorly or skip them entirely. The file takes about an hour to write, and the effect on AI-answer citations shows up within 1–4 weeks.

This article covers what llms.txt is, how it differs from robots.txt, the five blocks it should contain, and how to write your own in an hour.

1. Search has changed. SEO hasn't — yet

A couple of years ago, the launch checklist for a website was clear: Lighthouse 95+, sitemap.xml, schema.org markup, a clean robots.txt. Google indexes, Bing ranks, leads come in. By 2025, most web shops had started hearing the same thing from new clients: "We found you through ChatGPT" or "Perplexity recommended you."

By 2026, user behavior has shifted for good. People read ten blue links less and less. They ask ChatGPT, Perplexity, Claude, or Grok, get a synthesized answer with two or three source links, and act on it. If your site isn't among those source links, then for a growing share of the audience you're not in search at all.

Meanwhile, the SEO stack has barely moved: robots.txt, sitemap.xml, schema.org. That's enough for a search bot to find your site. It is not enough for a language model to understand what's on your site and what to cite from it.

That second job needs its own tool — llms.txt.

2. What llms.txt is, and how it differs from robots.txt

Two different files for two different jobs.

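robots.txt is an instruction set for crawlers: which paths they may fetch and which they must skip. It controls crawling, not indexing, and the Crawl-delay directive some sites use to limit visit frequency is a nonstandard extension that not every crawler honors. The file has been around since 1994 and is respected by the major search engines and by the main AI crawlers (GPTBot, ClaudeBot, and PerplexityBot all read it; Google-Extended is a robots.txt token that governs use of your content for Google's AI models, not a separate crawler).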

llms.txt is a concise description of your site, written specifically for language models. Not "index this / don't index that," but "here's what I have, here are the canonical resources, here's what's worth citing."

Technically, it's a Markdown file at the root of your domain: example.com/llms.txt. The spec is an open proposal published at llmstxt.org, introduced by Jeremy Howard of Answer.AI in 2024. The format is human-readable: you can open it and read it without a parser.

Why a separate file instead of an extension to robots.txt:

robots.txt is a directive; llms.txt is a description. Different genres.
Language models want semantic structure, not access rules.
LLMs parse markdown natively, far better than the Directive: value format.
The file can hold tens of thousands of words of canonical content (in its expanded form, llms-full.txt), which would be wildly out of place in robots.txt.

3. How AI crawlers actually work in 2026

To see why llms.txt works, it helps to understand what an AI crawler does differently.

A traditional search bot (Googlebot, Bingbot) indexes a page: it parses the HTML, pulls the content, and stores it in an index keyed by keywords and ranking signals. When a user searches for "marketplace development," the engine looks up that phrase in its index, ranks the results, and returns them.

An AI crawler works on a different loop:

Receives a user query in real time: "Which agencies build Next.js marketplaces in London with fixed-price contracts?"
Runs a search through its own engine (or a partner like Bing or Google) and pulls 5–10 candidate URLs.
Fetches each page, parses the HTML or markdown, and extracts the actual content.
Synthesizes an answer from what it found, picking 2–3 sources to cite.
Returns a short answer plus links to the user.

The key difference: an AI crawler has seconds to decide which pages to cite. It isn't sitting on top of a fat pre-built index. It's deciding right now.
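The loop above can be sketched as a toy function. Everything in it is an assumption made for illustration: the name `answer_with_sources`, the in-memory `pages` dict that stands in for both the search step and the fetch step, and the word-overlap score, which is a placeholder and not how any real assistant ranks sources.

```python
def answer_with_sources(query, pages, max_candidates=10, max_citations=3):
    """Pick the URLs to cite for a query.

    `pages` maps URL -> page text and stands in for search + fetch.
    """
    terms = set(query.lower().split())

    # "Search" and "fetch": score every stubbed page by shared words.
    scored = []
    for url, text in pages.items():
        overlap = len(terms & set(text.lower().split()))
        if overlap:
            scored.append((overlap, url))

    # Keep the best candidates, then cite only the top few.
    scored.sort(key=lambda item: (-item[0], item[1]))
    candidates = scored[:max_candidates]
    return [url for _, url in candidates[:max_citations]]
```

The point of the sketch is the shape of the loop: a small candidate set, a quick relevance judgment per page, and only two or three URLs surviving to the answer.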

In that context, llms.txt acts as a business card your site hands the model: "don't parse my homepage with its nav, hero block, and footer — here's a short description of what I have. Go to these pages for details."

Field measurements suggest that a well-written llms.txt increases the odds of being cited in AI answers by roughly 30–60% versus comparable sites without one. The biggest gains show up on long-tail queries, where traditional SEO competition is light and the AI crawler is forced to choose from less obvious sources.

4. What goes inside llms.txt: five blocks

The llmstxt.org spec is intentionally flexible. In practice it breaks into five useful blocks.

Block 1. Title and summary (required)

# Project Name

> One or two sentences describing what you do and who you do it for.
> Concrete, no marketing fluff.

The title is your project or brand name. The summary, written as a single blockquote (>), runs one to three sentences. This is the single most important spot in the file: it is the text a model is most likely to reuse, sometimes verbatim, when someone asks it "what is this site?"

Rules for a good summary:

One to three sentences, no more.
Concrete nouns ("marketplaces," "internal portals"), not vague ones ("digital solutions").
No marketing adjectives ("premium," "best-in-class," "innovative").
State your niche and your shape of work ("studio," "boutique agency," "two-person team").
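Two of these rules are mechanical enough to check automatically. The sketch below is one possible linter, not an official tool: the function name `lint_summary` and the `MARKETING_WORDS` list are hypothetical, and the sentence splitter is deliberately naive (it splits on `.`, `!`, or `?` followed by whitespace).

```python
import re

# Hypothetical word list taken from the rule above; extend it as needed.
MARKETING_WORDS = {"premium", "best-in-class", "innovative"}


def lint_summary(summary: str) -> list[str]:
    """Return a list of rule violations for an llms.txt summary."""
    problems = []

    # Count sentences by splitting after terminal punctuation.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", summary.strip()) if s]
    if not 1 <= len(sentences) <= 3:
        problems.append(f"expected 1-3 sentences, found {len(sentences)}")

    # Flag marketing adjectives, case-insensitively.
    words = set(re.findall(r"[a-z][a-z-]*", summary.lower()))
    for word in sorted(MARKETING_WORDS & words):
        problems.append(f"marketing adjective: {word}")

    return problems
```

An empty list means the summary passed both checks. Whether the nouns are concrete enough still needs a human reader.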

Block 2. Canonical resources (required)

## Case studies

- [Real-estate marketplace](https://example.com/cases/realty) — catalog, filters, three user roles, CRM integration

- [Next.js e-commerce store](https://example.com/cases/shop) — catalog, payments, AI visuals, 3-day launch

- [Shop with AI assistant](https://example.com/cases/ai-shop) — pricing engine, admin as ops backend, handoff to live chat

## Services

- [E-commerce stores](https://example.com/services/ecommerce)

- [Marketplaces](https://example.com/services/marketplace)

- [Business portals](https://example.com/services/portals)

This is your site's table of contents for the model. Each item is a link with a short caption explaining what's there. The captions are critical: the model uses them to decide which page is worth fetching for details.
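To see the file the way a consumer might, here is a minimal parser sketch that turns the `##` sections and their link items into a dictionary. The function name `parse_llms_txt` and the returned shape are assumptions for illustration. The regular expression accepts a caption introduced by a colon, a hyphen, or a long dash, since both styles appear in the wild.

```python
import re

# "- [Title](url)" with an optional caption after ":", "-", or a long dash.
LINK_RE = re.compile(
    r"^-\s*\[(?P<title>[^\]]+)\]\((?P<url>[^)\s]+)\)"
    r"(?:\s*[\u2014:-]\s*(?P<note>.+))?$"
)


def parse_llms_txt(text: str) -> dict[str, list[dict[str, str]]]:
    """Map each '## ' heading to the link items listed under it."""
    sections: dict[str, list[dict[str, str]]] = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("## "):
            current = line[3:].strip()
            sections[current] = []
        elif current is not None:
            match = LINK_RE.match(line)
            if match:
                sections[current].append({
                    "title": match["title"],
                    "url": match["url"],
                    "note": (match["note"] or "").strip(),
                })
    return sections
```

A quick use for it: list every entry whose `note` is empty, since those are the links a model has the least reason to follow.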

Block 3. Optional resources (recommended)

## Optional

- [Blog: technical articles](https://example.com/blog/tech)

- [Blog: business case studies](https://example.com/blog/business)

- [About the team](https://example.com/about)

- [Contact](https://example.com/contacts)

These are secondary links the model can fall back to if the canonical ones don't cover the question. Optional is the one section name with a defined meaning in the spec: the links under it may be skipped when a shorter context is needed. In other words, "lower priority, safe to drop."
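The llmstxt.org proposal describes the Optional section as one whose links can be skipped when a shorter context is needed. A consumer that wants the short version could do something like the following. This is a sketch only: the function name `drop_optional` is hypothetical, and it assumes sections are delimited by lines starting with `## `.

```python
def drop_optional(text: str) -> str:
    """Return llms.txt content without its '## Optional' section."""
    kept = []
    skipping = False
    for line in text.splitlines():
        if line.startswith("## "):
            # A new section starts; skip it only if it is "Optional".
            skipping = line[3:].strip().lower() == "optional"
        if not skipping:
            kept.append(line)
    return "\n".join(kept).rstrip() + "\n"
```

Sections that follow Optional, such as Contact, are kept, because skipping stops at the next `## ` heading.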

Block 4. What not to cite (optional but useful)

## Do not cite

- Drafts and unpublished material

- /admin/* — internal pages

- Old blog 2020–2022 — outdated technical advice

- /proposal/* — individual sales proposals

This section does not block access to anything (use robots.txt for that). It's a request to the model: these pages may be in the index, but please don't cite them. It is a convention rather than part of the llmstxt.org spec, so treat it as advisory. Useful when you have legacy content or material you don't want surfacing in AI answers.

Block 5. Contact and metadata

## Contact

- Email: hello@example.com

- Telegram: @username

- Web: https://example.com

## Metadata

- Languages: en, de

- Location: Berlin, Germany

- Stack: Next.js, TypeScript, PostgreSQL

- Updated: 2026-05-25

The model uses contact data when answering "how do I get in touch with this company?" Without an explicit contact block, it will hunt across the site for contact details and sometimes get them wrong.
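If the site's pages already live in structured data, the five blocks can be generated instead of hand-written. The sketch below assumes a hypothetical `render_llms_txt` function and plain dict inputs. It writes captions with the `: ` separator used in the llmstxt.org examples rather than the dash used in the snippets above.

```python
def render_llms_txt(name, summary, link_sections, info_sections):
    """Render an llms.txt string.

    link_sections: heading -> list of (title, url, note) tuples
    info_sections: heading -> dict of key/value lines (contact, metadata)
    """
    lines = [f"# {name}", "", f"> {summary}", ""]

    for heading, links in link_sections.items():
        lines += [f"## {heading}", ""]
        for title, url, note in links:
            suffix = f": {note}" if note else ""
            lines.append(f"- [{title}]({url}){suffix}")
        lines.append("")

    for heading, pairs in info_sections.items():
        lines += [f"## {heading}", ""]
        for key, value in pairs.items():
            lines.append(f"- {key}: {value}")
        lines.append("")

    # Exactly one trailing newline.
    return "\n".join(lines).rstrip() + "\n"
```

Generating the file at build time also makes it easy to stamp the Updated field with the deploy date.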

5. llms-full.txt: when you need the expanded version

For most sites, a single llms.txt of 500–2,000 words is enough. But there are cases where llms-full.txt — an expanded version that can run to 50,000+ words — pays off.

When it's worth the extra effort:

You have substantial technical documentation (API services, devtools).
You have a deep library of case studies or blog posts with unique content.
You want the model to answer specific questions without having to fetch individual pages.

llms-full.txt is llms.txt plus the full text of your canonical resources, glued into a single file. Same markdown structure, ## separators between sections.
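The "glue" step is simple enough to script. Below is a minimal sketch under stated assumptions: `build_llms_full` and `word_count` are hypothetical helper names, and `pages` is a dict of section title to already-extracted Markdown text (how you extract that text from your CMS is out of scope here).

```python
def build_llms_full(llms_txt: str, pages: dict[str, str]) -> str:
    """Append each page's full text to llms.txt as its own '## ' section."""
    parts = [llms_txt.rstrip()]
    for title, body in pages.items():
        parts.append(f"## {title}\n\n{body.strip()}")
    return "\n\n".join(parts) + "\n"


def word_count(text: str) -> int:
    """Rough size check: whitespace-separated tokens."""
    return len(text.split())
```

Running `word_count` on the result is a quick way to see where the file lands relative to the size ranges discussed in this section.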

In practice, 10,000–15,000 words is the sweet spot that lets Perplexity cite specific technical details from your case studies instead of vague phrases from your homepage.

6. Checklist: write your llms.txt in an hour

5 minutes. Create public/llms.txt (in Next.js, Vite, etc.) or drop the file at the root of a static site. It must be reachable at https://yourdomain.com/llms.txt with a 200 status.

15 minutes. Write the title and summary. Show it to three people outside your project. Ask: "Based on this description, do you understand what we do?" If the answer is no, rewrite it.

20 minutes. Build the canonical resources list. Walk through your important pages, grab each URL and write one line describing it. Keep the main block under 15–20 entries. If you have more pages, move the rest to Optional.

10 minutes. Add contact, languages, and metadata.

5 minutes. Optionally mirror the file at /.well-known/llms.txt. The spec only defines the root path, but some AI crawlers check the .well-known location too, so a mirror can improve discovery. Commit, deploy.

5 minutes. Verify against the checklist:

The file opens in a browser at https://yourdomain.com/llms.txt.
Content-Type is text/markdown or text/plain.
All internal links work — no 404s.
AI crawlers are allowed in robots.txt: GPTBot, ClaudeBot, Google-Extended, PerplexityBot, Anthropic-AI.
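The robots.txt item on this checklist can be verified with Python's standard `urllib.robotparser`. The sketch below is one way to do it: the function name `blocked_agents` is hypothetical, the agent list is copied from the checklist above, and the robots.txt content is passed in as a string so the check works offline.

```python
from urllib.robotparser import RobotFileParser

# User-agent tokens from the checklist above.
AI_AGENTS = ["GPTBot", "ClaudeBot", "Google-Extended", "PerplexityBot", "Anthropic-AI"]


def blocked_agents(robots_txt: str, url: str = "https://yourdomain.com/llms.txt") -> list[str]:
    """Return the AI user-agents that robots.txt forbids from fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [agent for agent in AI_AGENTS if not parser.can_fetch(agent, url)]
```

For example, a robots.txt that contains `User-agent: GPTBot` followed by `Disallow: /` makes the function return a list containing only `GPTBot`. An empty list means none of the listed agents is blocked for that URL.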

That's it. The file is live. In 2–4 weeks you can start measuring whether your site is showing up in AI answers.

7. Measuring impact: how to tell llms.txt is working

There aren't many direct analytics tools for AI citations yet. You measure indirectly.

Method 1. Weekly manual sampling. Run ten control queries in your niche through Perplexity, ChatGPT, and Claude. Check whether your site shows up in the sources. This is the most honest method and costs about 30 minutes a week.

Method 2. AI crawler logs. Filter server access logs for these user-agents: GPTBot, ClaudeBot, PerplexityBot. If they're hitting your llms.txt a few times a week, the file is actually being read. If they aren't, something is off with your deployment.

Method 3. Branded traffic in analytics. A sudden lift in brand-name queries without any obvious ad spend usually means people are encountering you in AI answers and then searching for the name. This signal becomes visible 2–3 months after launch.

Method 4. Ask new leads directly. On first contact, ask "how did you find us?" If one or two out of ten say ChatGPT, Perplexity, or Claude, that's already a strong signal.
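Method 2 is easy to automate. The sketch below counts requests for /llms.txt per AI crawler. It assumes the common "combined" access log format used by nginx and Apache. The function name `count_llms_txt_hits` and the regular expression are illustrative, so adjust them to your own log layout.

```python
import re
from collections import Counter

# User-agents named in Method 2.
AI_AGENTS = ("GPTBot", "ClaudeBot", "PerplexityBot")

# Combined log format: "METHOD path proto" status bytes "referer" "user-agent"
LOG_RE = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+) [^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def count_llms_txt_hits(log_lines) -> Counter:
    """Count /llms.txt requests per AI crawler."""
    hits = Counter()
    for line in log_lines:
        match = LOG_RE.search(line)
        if not match:
            continue
        # Ignore any query string when comparing the path.
        if match["path"].split("?")[0] != "/llms.txt":
            continue
        agent_field = match["agent"].lower()
        for agent in AI_AGENTS:
            if agent.lower() in agent_field:
                hits[agent] += 1
                break
    return hits
```

Feed it the lines of a week's access log. Zero hits across all three agents points at a deployment or robots.txt problem rather than a content problem.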

8. Common mistakes

Filled in the title and stopped there. The model reads five lines and moves on. The floor is around 500 words of canonical content.

Copy-pasted the homepage. llms.txt is not a duplicate of your hero block. It's a structured table of contents with descriptions. Headers and footers don't belong in it.

Stuffed with SEO keywords. "Best web development agency for startups across Europe and beyond" reads to a language model as marketing noise, and the model lowers its trust in the source. Write like you're talking to a smart human, not a 2018 SEO crawler.

Never updated. Launched it six months ago, forgot about it. The model reads the Updated: field (if present) and discounts stale data. Review every one to two months.

Forgot to allow AI crawlers in robots.txt. The classic gotcha: llms.txt is written, but GPTBot is blocked. The crawler never arrives, the file never gets read. Worth double-checking.
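The "never updated" mistake is the easiest one to catch automatically. Here is a small sketch that reads the Updated field from the Metadata block and reports its age in days. The function name `days_since_update` is hypothetical, and it assumes the field is written as `- Updated: YYYY-MM-DD`, as in the example earlier in this article.

```python
import re
from datetime import date
from typing import Optional


def days_since_update(text: str, today: date) -> Optional[int]:
    """Return the age of the 'Updated:' field in days, or None if absent."""
    match = re.search(
        r"^-\s*Updated:\s*(\d{4}-\d{2}-\d{2})\s*$", text, flags=re.MULTILINE
    )
    if match is None:
        return None
    return (today - date.fromisoformat(match.group(1))).days
```

Wired into CI with `date.today()`, a result above roughly 60 days can fail the build and prompt the review this section recommends every one to two months.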

9. Bottom line

llms.txt isn't "the new SEO." It's a new layer of visibility on an infrastructure that runs in parallel with classical search. Within 12 months, most B2B sites will have this file. Within 24 months, the absence of one will read the same way a missing sitemap.xml did in 2018 — a sign of technical neglect.

Whoever ships it now gets a 12–24 month head start while the niche is still empty. Whoever waits a year will be playing catch-up.

Spec and examples: llmstxt.org.