Optimizing for AI Search
For years, companies have invested heavily in SEO to win visibility on search engines. But that investment is losing its payoff. Search is undergoing a major shift: instead of browsing links, users increasingly receive direct answers from AI engines like ChatGPT, Claude, or Perplexity. Traditional SEO therefore delivers less value - visibility is no longer about ranking on a results page, but about being correctly interpreted by large language models and, ideally, cited in their responses.
Existing standards such as robots.txt and sitemap.xml were created to guide search engine crawlers, ultimately feeding into how pages were ranked. But AI models operate differently: they don’t simply index; they consume, compress, and reason over content. Most websites are not designed with this in mind, which makes them difficult for AI systems to parse.
This is where llms.txt comes in - a simple proposal to make websites AI-readable, in the same way earlier standards once made them search-engine friendly. To understand why this matters, we need to look closer at the obstacles LLMs face when trying to read today’s websites.
The Invisible Barrier LLMs Face
AI tools like ChatGPT and Claude promise impressive capabilities such as deep research and web search, aiming to deliver up-to-date, factually accurate information. In practice, however, they often stumble when navigating modern websites.
Endless scripts, menus, ads, and complex HTML dilute the content, consuming precious context tokens and limiting utility. On top of that, context windows are often simply too small to take in an entire website. This barrier isn’t just technical: it affects usability, accuracy, and trust. Without intentional design for AI-readability, even the most technically sound websites risk being lost in translation.
The core issue is that websites are optimised for human viewing and not for machine reasoning. Key challenges include:
• Cluttered HTML: Navigation bars, JavaScript assets, and advertisements obscure the main content.
• Token Waste: LLMs waste valuable context absorbing irrelevant code and layout data (a rough comparison is sketched below).
• Ambiguous Discovery: Without guidance, AI must search aimlessly through content, increasing the risk of incomplete or outdated responses.
Traditional SEO files such as robots.txt and sitemap.xml focus on crawling and indexing; there is no equivalent standard that helps LLMs find the signal in the noise. Without one, even critical business content risks being lost - from product information in e-commerce to key insights from thought leaders.
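To make the token-waste point concrete, here is a rough sketch (not part of the llms.txt proposal) that compares how many tokens a raw HTML page consumes against its visible text alone. It uses Python with OpenAI's tiktoken library as a stand-in tokenizer; the URL is a placeholder, the extraction is deliberately naive, and real numbers will vary by page and model.

import urllib.request
from html.parser import HTMLParser
import tiktoken  # pip install tiktoken

# Naive visible-text extractor: keeps text nodes, skips script/style contents.
class TextExtractor(HTMLParser):
    SKIP = {"script", "style"}
    def __init__(self):
        super().__init__()
        self.chunks, self._skip = [], 0
    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip += 1
    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip:
            self._skip -= 1
    def handle_data(self, data):
        if not self._skip and data.strip():
            self.chunks.append(data.strip())

enc = tiktoken.get_encoding("cl100k_base")  # tokenizer used by several OpenAI models

with urllib.request.urlopen("https://example.com/") as resp:  # placeholder URL
    html = resp.read().decode("utf-8", errors="replace")

extractor = TextExtractor()
extractor.feed(html)
visible_text = " ".join(extractor.chunks)

print("tokens in raw HTML:    ", len(enc.encode(html)))
print("tokens in visible text:", len(enc.encode(visible_text)))

Even this crude comparison usually shows the markup dwarfing the actual content, which is exactly the budget an LLM has to spend before it can reason about the page.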
Solution: llms.txt
Enter /llms.txt: a simple yet powerful proposal for making websites AI-readable, much as robots.txt once made them crawler-friendly. It is a root-level markdown file offering a curated, structured overview of key site content, designed for both humans and models. It provides links, context, and structure optimised for AI agents.
Format
What makes this file unique is its use of Markdown, rather than traditional web formats like XML. This choice is intentional, as the file is designed to be lightweight, human-readable, and, most importantly, easily digestible by agents, LLMs, and their applications. It also enables consistent processing with classical programming techniques such as parsers and regex.
The llms.txt specification defines a file that should be placed at the root of a website (/llms.txt), though it can also exist within a subpath if needed. A compliant file is written in Markdown and follows a specific structure, in this order:
• H1 header: The site name (this is the only required element).
• Blockquote: A concise summary of the site, highlighting the essential context for interpreting the rest of the file.
• Optional descriptive sections: Additional markdown content (paragraphs, lists, etc., but not headings) that provide more background or guidance.
• File list sections: One or more sections introduced by H2 headers, each containing lists of relevant resources.
• File lists: Each entry is a Markdown bullet containing a required hyperlink ([name](url)), optionally followed by a colon (:) and notes describing the file.
An example of the format is as follows:
# Project Name
> Brief summary
[Optional notes]
## Core Documentation
- [Quick Start](URL): Setup guide
- [API Reference](URL): Detailed docs
## Optional
- [Blog](URL): Less critical but useful
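Because the file is plain Markdown, the classical processing mentioned earlier stays simple. As a rough, non-authoritative sketch, the following Python snippet (standard library only) parses a file like the example above into its title, summary, and link sections; the file name is a placeholder.

import re
from pathlib import Path

# Matches bullets of the form "- [name](url): optional notes".
LINK_RE = re.compile(r"^-\s*\[(?P<name>[^\]]+)\]\((?P<url>[^)]+)\)(?::\s*(?P<notes>.*))?$")

def parse_llms_txt(text):
    title, summary, sections, current = None, None, {}, None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("# ") and title is None:
            title = line[2:].strip()                  # H1: site or project name
        elif line.startswith("> ") and summary is None:
            summary = line[2:].strip()                # blockquote: short summary
        elif line.startswith("## "):
            current = line[3:].strip()                # H2: start of a file-list section
            sections[current] = []
        elif current is not None:
            match = LINK_RE.match(line)
            if match:
                sections[current].append(match.groupdict())
    return {"title": title, "summary": summary, "sections": sections}

parsed = parse_llms_txt(Path("llms.txt").read_text())   # placeholder path
print(parsed["title"], "-", parsed["summary"])
for section, links in parsed["sections"].items():
    print(section, [link["url"] for link in links])

A real application would add validation and error handling, but the point stands: no HTML parsing or rendering is needed to extract the structure.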
The llms.txt proposal does not prescribe a specific method for processing the file, as the approach will vary by application. For instance, the FastHTML project automatically expands llms.txt into two markdown files containing the linked content, structured in a way that works well with LLMs like Claude. These are llms-ctx.txt (excluding optional URLs) and llms-ctx-full.txt (including them). Both are generated via the llms_txt2ctx command-line tool, with accompanying documentation to guide users on how to work with them. This clean structure plays a similar role for AI systems as structured data and sitemaps did for search engines: it helps them find what matters most.
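The expansion idea itself is easy to approximate. The sketch below is not the FastHTML or llms_txt2ctx implementation - just a rough illustration of turning an llms.txt file into a single context document by fetching each linked page, skipping the Optional section unless requested. The file names and behaviour here are assumptions for illustration only.

import re
import urllib.request

LINK_RE = re.compile(r"\[([^\]]+)\]\((https?://[^)]+)\)")

def expand(llms_txt, include_optional=False):
    parts = []
    for section in llms_txt.split("\n## ")[1:]:           # each H2 section
        heading, _, body = section.partition("\n")
        if heading.strip().lower() == "optional" and not include_optional:
            continue                                       # skip the Optional section
        for name, url in LINK_RE.findall(body):
            with urllib.request.urlopen(url) as resp:      # fetch the linked page
                content = resp.read().decode("utf-8", errors="replace")
            parts.append(f"# {name}\n\n{content}")
    return "\n\n".join(parts)

source = open("llms.txt", encoding="utf-8").read()
open("llms-ctx.txt", "w", encoding="utf-8").write(expand(source))
open("llms-ctx-full.txt", "w", encoding="utf-8").write(expand(source, include_optional=True))

A production tool would also convert fetched HTML to Markdown and cache results, but the shape of the output is the same: one flat, token-friendly document per context size.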
Implementation Guide
Getting started with llms.txt doesn’t require complex setup, yet it can directly influence how your site appears in AI search results. Follow these steps to create, deploy, and maintain the file so AI systems can quickly understand and use your content:
1. Identify what’s crucial: e.g., product documentation, key policies, API reference.
2. Create llms.txt in Markdown with a clear H1 header, a short summary, and links.
3. Optionally generate llms-full.txt, which bundles the full Markdown content into a single file.
4. Deploy both files at the site root - for example: https://yoursite.com/llms.txt and /llms-full.txt (a quick reachability check is sketched after this list).
5. Test with AI tools by manually loading the file into prompts to check clarity and relevance.
6. Maintain the file regularly, updating it as your content evolves.
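To make steps 4 and 5 a bit more concrete, here is a small Python sketch (standard library only) that checks the deployed files are reachable and that llms.txt starts with the required H1 header. The domain is a placeholder; this verifies reachability and basic structure only, not the quality of the content.

import urllib.request
import urllib.error

SITE = "https://yoursite.com"   # placeholder domain

def check(path, require_h1=False):
    url = SITE + path
    try:
        with urllib.request.urlopen(url) as resp:
            body = resp.read().decode("utf-8", errors="replace")
    except urllib.error.URLError as exc:           # covers HTTP errors and DNS failures
        print(f"{url}: unreachable ({exc})")
        return
    has_h1 = body.lstrip().startswith("# ")
    status = "ok" if (not require_h1 or has_h1) else "missing H1 header"
    print(f"{url}: {len(body)} characters, {status}")

check("/llms.txt", require_h1=True)
check("/llms-full.txt")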
You can explore the full code and usage instructions in the project’s GitHub repo.
Conclusion
In an era where AI interfaces increasingly mediate how users discover, interpret, and engage with web content, being easily understandable by LLMs is no longer optional; it's essential. The llms.txt proposal bridges the gap between traditional web design and AI-native accessibility, offering a straightforward, lightweight standard for content owners.
By implementing both llms.txt and llms-full.txt, you are not just future-proofing your site - you are ensuring that your most important content is clear, token-efficient, and ready to be read by AI-driven systems without a bottleneck. Just as SEO once evolved to meet the demands of Google’s crawlers and page-ranking algorithms, it must now adapt to generative models. Those who act early won’t just keep pace - they’ll shape how their content ranks in the emerging era of AI search.