November 19, 2024

How to generate llms.txt and llms-full.txt with Astro

I didn’t find a simple example of how to generate llms.txt and llms-full.txt with Astro, so I decided to quickly write this short guide on how to do it.

What are llms.txt and llms-full.txt?

Websites today serve not only human users but also LLMs, which may read a site's documentation from within tools such as code editors, chatbots, and other chat interfaces.

Because LLMs process information differently from people, it helps to present concise, centralized content in a simple format. That makes it easier for AI assistants to reach the key information despite context window limits and complex web pages.

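That’s what llms.txt files are for: llms.txt lists links to the pages, while llms-full.txt holds the complete content of all the pages in one file.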

Usage

For example, I generated both llms.txt and llms-full.txt for ScreenshotOne. Now I can use them in an editor like Cursor or in ChatGPT like this:

An example of using llms.txt with ChatGPT

How to generate llms.txt and llms-full.txt with Astro

For ScreenshotOne, I use Astro. It is one of the best platforms for managing content-oriented websites, and it is not limited to them.

I added two new routes: src/pages/docs/llms.txt.ts and src/pages/docs/llms-full.txt.ts.

For llms.txt, I used the following code to generate all the links:

import { getCollection } from "astro:content";
import type { APIRoute } from "astro";

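// Load every entry of the "docs" collection once, when the module is loaded.
const docs = await getCollection("docs");

// Respond with a Markdown list containing one link per documentation page.
export const GET: APIRoute = async () => {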
    return new Response(
        `## ScreenshotOne.com Documentation\n\n${docs
            .map((doc) => {
                return `- [${doc.data.title}](https://screenshotone.com/${doc.slug}/)\n`;
            })
            .join("")}`,
        { headers: { "Content-Type": "text/plain; charset=utf-8" } }
    );
};

And for llms-full.txt, I used the following code to generate the complete content of all the pages in one file:

import { getCollection } from "astro:content";
import type { APIRoute } from "astro";

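// Load every entry of the "docs" collection once, when the module is loaded.
const docs = await getCollection("docs");

// Respond with the title and raw body of every page, concatenated into one file.
export const GET: APIRoute = async () => {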
    return new Response(
        `## ScreenshotOne.com Full Documentation\n\n${docs
            .map((doc) => {
                return `# ${doc.data.title}\n\n${doc.body}\n\n`;
            })
            .join("")}`,
        { headers: { "Content-Type": "text/plain; charset=utf-8" } }
    );
};

I get all the pages from the docs collection defined by Astro Starlight, but the same approach works with any other collection, too.
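To illustrate that the approach does not depend on a particular collection, here is a minimal sketch that moves the rendering into two plain functions. The DocEntry shape, the function names, and the baseUrl parameter are assumptions made for this example. They are not part of Astro's API, and a real collection entry carries more fields.

```typescript
// Hypothetical minimal shape: only the fields the two routes above read.
interface DocEntry {
    slug: string;
    body: string;
    data: { title: string };
}

// Renders the link index served as llms.txt.
function renderLlmsIndex(title: string, baseUrl: string, entries: DocEntry[]): string {
    const links = entries
        .map((entry) => `- [${entry.data.title}](${baseUrl}/${entry.slug}/)\n`)
        .join("");
    return `## ${title}\n\n${links}`;
}

// Renders the complete content of all pages served as llms-full.txt.
function renderLlmsFull(title: string, entries: DocEntry[]): string {
    const pages = entries
        .map((entry) => `# ${entry.data.title}\n\n${entry.body}\n\n`)
        .join("");
    return `## ${title}\n\n${pages}`;
}

// Made-up sample data standing in for the result of getCollection().
const sampleEntries: DocEntry[] = [
    {
        slug: "docs/getting-started",
        body: "Install the SDK.",
        data: { title: "Getting started" },
    },
];

const indexText = renderLlmsIndex("Example Docs", "https://example.com", sampleEntries);
const fullText = renderLlmsFull("Example Docs Full", sampleEntries);
```

In a route, the entries from any getCollection() call could be passed to these functions and the returned string wrapped in a Response, as in the routes above.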

You don’t need to do anything to exclude these files from your sitemap if you use the Astro sitemap integration: by default, it excludes endpoints. But if you think they should be indexed, you can add custom entries to your sitemap.
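If you do decide to list the files, a configuration along these lines might work. It is a sketch that assumes the customPages option of the @astrojs/sitemap integration and assumes the two routes are served at /docs/llms.txt and /docs/llms-full.txt; adjust the URLs to match your own site.

```typescript
// astro.config.ts (sketch; the URLs below are assumptions based on the route paths)
import { defineConfig } from "astro/config";
import sitemap from "@astrojs/sitemap";

export default defineConfig({
    site: "https://screenshotone.com",
    integrations: [
        sitemap({
            // Endpoints are not picked up automatically, so list them explicitly.
            customPages: [
                "https://screenshotone.com/docs/llms.txt",
                "https://screenshotone.com/docs/llms-full.txt",
            ],
        }),
    ],
});
```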

On the one hand, these files duplicate content from across your website. On the other hand, it would be great if somebody could find your llms.txt by searching Google or Bing.

I would start by preventing indexing and consider opening it up later, once more information about this is available.

You can do that by adding the following header to your response:

X-Robots-Tag: noindex, nofollow

Unfortunately, that won’t work in static generation mode. You will need to set the header either on the server side or through a proxy like Cloudflare.
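For the server-side case, the header can be set on the same Response object the route already returns. The sketch below uses only the standard Response API; the helper name noindexTextResponse is made up for this example, and the note about rendering the route on demand is an assumption about your setup.

```typescript
// Hypothetical helper: wraps a text body in a response that asks crawlers
// not to index the file or follow its links.
function noindexTextResponse(body: string): Response {
    return new Response(body, {
        headers: {
            "Content-Type": "text/plain; charset=utf-8",
            "X-Robots-Tag": "noindex, nofollow",
        },
    });
}

// Inside a route rendered on demand (not prerendered), the GET handler
// could return noindexTextResponse(...) instead of building the Response inline.
const exampleResponse = noindexTextResponse("## Example Docs\n\n");
```

When the site is built statically, the file is written to disk without any response headers, which is why this only helps when the route is rendered by a server.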

Conclusion

Don’t sleep on opportunities to help your users work with your documentation in a more convenient way.