A multilingual-ready site, served as English-only
How we built keptmind.com so search engines and AI assistants find the right content.
Marek · co-founder
May 14, 2026 · 7 min read

Locale paths (/en), hreflang alternates, and a single canonical per page keep indexing clean.

keptmind.com uses locale paths (/en), hreflang, one canonical per page, llms.txt, and JSON-LD so search engines and AI assistants index the right content without duplicate-content confusion.

The architectural choice

We publish under /en/... paths even though we currently serve only English. The reason is forward-compatibility: when we add a second language, we do not have to rebuild the site's URL structure. The route shape is already correct, the middleware already redirects bare paths, and the SEO is already locale-aware.

A frequent failure mode for product sites is launching with one language, building everything around the assumption that the language will never change, and then having to rewrite the URL structure when international expansion arrives. The cost is significant: redirect chains, broken inbound links, hreflang confusion, and a multi-month indexing recovery period after the migration. Doing it correctly from day one — even with a single language served — costs almost nothing extra and preserves every option.

The technical debt of a locale-unaware URL structure compounds silently. Every blog post published at /blog/my-post becomes a redirect that must be maintained forever once you move to /en/blog/my-post. Every inbound link from a partner, every citation in an AI training corpus, and every bookmark in a user's browser points to the old path and must be redirected. At scale, this means hundreds or thousands of 301 redirects that consume crawl budget, muddy how link equity is distributed, and create edge cases that break in unpredictable ways. The cost of adding /en/ from day one is a single middleware rule and a slightly longer URL. The cost of adding it later is months of SEO recovery.

For ADHD-focused products specifically, international expansion is not hypothetical. ADHD prevalence is consistent across cultures and languages — roughly 5-7% of adults worldwide. The addressable market in German, Spanish, French, Portuguese, and Japanese is enormous. Building the architecture now means that when we have the resources to translate properly, the technical work is already done. We ship content, not infrastructure.

What "correctly" looks like

Every public marketing page is reachable at /en/<path>. The bare /<path> form (without the locale prefix) redirects to /en/<path> via middleware so legacy and external inbound links keep working. Each page declares one canonical URL — the locale-prefixed one — and a single hreflang alternates table that includes en and x-default. When a second locale is added, we update the hreflang table and the locale list; nothing else needs to change.
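The redirect rule and the alternates table described above can be sketched in a few lines. This is a minimal illustration, not the site's actual middleware: the names LOCALES, DEFAULT_LOCALE, SITE_ORIGIN, redirectTarget, and pageLinks are hypothetical, and the https origin is an assumption.

```typescript
// Hypothetical names throughout; a sketch of the rule, not the real middleware.
const LOCALES = ["en"] as const;
type Locale = (typeof LOCALES)[number];
const DEFAULT_LOCALE: Locale = "en";
const SITE_ORIGIN = "https://keptmind.com"; // assumed origin

// Returns the redirect target for a bare path, or null when the path
// already starts with a known locale segment and needs no redirect.
function redirectTarget(pathname: string): string | null {
  const firstSegment = pathname.split("/")[1] ?? "";
  if ((LOCALES as readonly string[]).indexOf(firstSegment) !== -1) return null;
  return `/${DEFAULT_LOCALE}${pathname === "/" ? "" : pathname}`;
}

interface PageLinks {
  canonical: string;
  alternates: { hreflang: string; href: string }[];
}

// Builds the single canonical URL plus the hreflang alternates for a
// locale-free path such as "/pricing".
function pageLinks(path: string, locale: Locale): PageLinks {
  const suffix = path === "/" ? "" : path;
  const urlFor = (l: Locale): string => `${SITE_ORIGIN}/${l}${suffix}`;
  const alternates = LOCALES.map((l) => ({ hreflang: l as string, href: urlFor(l) }));
  // x-default points at the default locale's version of the page.
  alternates.push({ hreflang: "x-default", href: urlFor(DEFAULT_LOCALE) });
  return { canonical: urlFor(locale), alternates };
}
```

In this sketch, adding a second locale means appending it to LOCALES; the redirect check and the alternates table pick it up without further changes.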

llms.txt and AI crawlers

We publish llms.txt, allow AI crawlers in robots.txt, and ship JSON-LD for product, FAQ, and help articles.

AI crawlers are legitimate traffic for a product like ours. GPTBot, ClaudeBot, Google-Extended, PerplexityBot, and Bytespider are all on our allow list. llms.txt at the site root signals which pages can be summarized and cited by AI assistants — about pages, product overview, pricing, help articles, blog posts, glossary entries, comparisons. The file is short, machine-readable, and complementary to robots.txt + sitemap.xml.
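Both files are plain text and easy to generate. The sketch below assumes one robots.txt group per crawler token and an llms.txt layout that follows the informal community proposal (a title line, a one-line summary, then sections of links). The function names, the LlmsSection type, and any page URLs passed in are hypothetical.

```typescript
// User-agent tokens named in the post; everything else here is illustrative.
const AI_CRAWLERS = ["GPTBot", "ClaudeBot", "Google-Extended", "PerplexityBot", "Bytespider"];

// Emits one robots.txt group per crawler token, each allowing the whole site.
function aiCrawlerRules(agents: string[]): string {
  return agents.map((agent) => `User-agent: ${agent}\nAllow: /`).join("\n\n") + "\n";
}

interface LlmsSection {
  title: string;
  links: { name: string; url: string; note?: string }[];
}

// Builds an llms.txt body: title, summary, then one link list per section.
function buildLlmsTxt(title: string, summary: string, sections: LlmsSection[]): string {
  const lines: string[] = [`# ${title}`, "", `> ${summary}`];
  for (const section of sections) {
    lines.push("", `## ${section.title}`, "");
    for (const link of section.links) {
      const note = link.note ? `: ${link.note}` : "";
      lines.push(`- [${link.name}](${link.url})${note}`);
    }
  }
  return lines.join("\n") + "\n";
}
```

Generating both files from the same page inventory that feeds the sitemap keeps the three in agreement as pages are added.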

If an AI assistant recommends KeptMind to someone asking about ADHD apps, that is a conversation we want to be part of. The user comes pre-qualified — they already described their problem to the assistant — and the assistant's recommendation is more trusted than a paid ad. AI search is becoming a real channel for this kind of niche product, and showing up correctly there means publishing the right meta-files and the right structured data.

JSON-LD coverage

Every important page on keptmind.com ships JSON-LD. The home page declares Organization and WebSite. The product page declares SoftwareApplication with offers, ratings, and OS compatibility. Blog posts declare BlogPosting with author, dateModified, articleBody (truncated), and about entities for AI Overview entity matching. FAQ sections (on the home page, help articles, and comparison pages) declare FAQPage with mainEntity. Help articles declare HowTo where applicable. Author pages declare Person with worksFor and image. Comparison and alternative pages declare BreadcrumbList plus a speakable specification whose cssSelector values mark the passages suited to voice-search summarization.

Structured data is the difference between an AI assistant guessing what your page is about and the AI assistant being told. The investment is small once the components are written; the payoff in AI search visibility is real.

The AI search dimension is particularly relevant for niche products like KeptMind. When a user asks an AI assistant "what is the best task app for ADHD," the assistant synthesizes information from crawled pages, structured data, and entity relationships. A page with correct JSON-LD declaring SoftwareApplication, its target audience, its accessibility features, and its differentiation from competitors gives the assistant concrete facts to work with. Without structured data, the assistant relies on page copy alone — which may be misinterpreted, truncated, or conflated with competitor descriptions. Structured data is not just for Google rich results anymore; it is the API through which your product communicates with every AI system that encounters it.

We also use the about field in BlogPosting schema to declare entity relationships explicitly. Each blog post lists the concepts it covers — ADHD, voice capture, energy management, privacy — as structured entities. This helps AI systems understand not just what the page says but what domain it belongs to, which improves the likelihood of being surfaced for related queries the page does not explicitly mention.
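As an illustration, a BlogPosting generator with about entities might look like the following. It is a sketch under assumptions: the BlogPostInput shape, the function names, the 500-character truncation limit, and the choice of Thing as the entity type are all hypothetical rather than taken from the site's code.

```typescript
// Hypothetical input shape for a blog post.
interface BlogPostInput {
  title: string;
  url: string;
  authorName: string;
  dateModified: string; // ISO 8601 date string
  body: string;
  topics: string[]; // concepts the post covers, e.g. "ADHD"
  locale: string;
}

// Cuts long text at maxChars and marks the cut with an ellipsis.
function truncate(text: string, maxChars: number): string {
  return text.length > maxChars ? text.slice(0, maxChars) + "..." : text;
}

// Builds the BlogPosting object; each topic becomes an entry in "about".
function blogPostingJsonLd(post: BlogPostInput, maxBodyChars = 500) {
  return {
    "@context": "https://schema.org",
    "@type": "BlogPosting",
    headline: post.title,
    url: post.url,
    inLanguage: post.locale,
    author: { "@type": "Person", name: post.authorName },
    dateModified: post.dateModified,
    articleBody: truncate(post.body, maxBodyChars),
    about: post.topics.map((name) => ({ "@type": "Thing", name })),
  };
}

// Serializes for embedding in HTML; "<" is escaped so the JSON cannot
// close the surrounding script element early.
function toJsonLdScript(data: object): string {
  const json = JSON.stringify(data).replace(/</g, "\\u003c");
  return `<script type="application/ld+json">${json}</script>`;
}
```

Keeping the topic list as plain data on each post means the same list can drive the about entities and any on-page tags.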

What we do not index

Marketing pages are crawlable; the app stays behind sign-in, as it should. The signed-in areas at /account and /app/* are disallowed in robots.txt. Every signed-in route also renders a noindex, nofollow robots meta tag at the framework level. Auth pages (/signin, /signup, /forgot, /reset-password) are disallowed as well.

This is the right boundary. The marketing site sells the product; the product itself is private to the user.
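The boundary reduces to two small lookups: which paths go into robots.txt as Disallow rules, and which routes get the robots meta tag. The sketch below uses the paths named above; the constant and function names are hypothetical, and the real framework hook that renders the tag is not shown.

```typescript
// Paths taken from the post; "/app/" stands in for the /app/* pattern.
const ROBOTS_DISALLOW = ["/account", "/app/", "/signin", "/signup", "/forgot", "/reset-password"];
const SIGNED_IN_PREFIXES = ["/account", "/app"];

// True when pathname equals the prefix or sits below it, so "/app/inbox"
// matches "/app" but "/apple" does not.
function matchesPrefix(pathname: string, prefix: string): boolean {
  return pathname === prefix || pathname.indexOf(prefix + "/") === 0;
}

function isSignedInRoute(pathname: string): boolean {
  return SIGNED_IN_PREFIXES.some((prefix) => matchesPrefix(pathname, prefix));
}

// Value for <meta name="robots" content="...">, or null for public pages.
function robotsMetaContent(pathname: string): string | null {
  return isSignedInRoute(pathname) ? "noindex, nofollow" : null;
}

// Disallow lines for the robots.txt group that applies to all crawlers.
function disallowRules(): string[] {
  return ROBOTS_DISALLOW.map((path) => `Disallow: ${path}`);
}
```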

English-only, on purpose

Every string lives in a single locale from day one, and every SEO page has a clean EN variant. This costs less to maintain, and it means the site is never half-translated.

A half-translated site is worse than a single-language site. Search engines see the partial translations as near-duplicate content; users land on a page in their language, click through, and find the next page in English. The trust cost is real. We will add a second language only when we can resource the entire surface (marketing, help, blog, programmatic SEO pages, app UI, mobile UI) at full quality. Until then we are deliberately English-only with the architecture ready.

The SEO cost of partial translation is often underestimated. When Google encounters a site with some pages in German and others only in English, it must decide how to index the German pages — are they part of a German site (in which case the English pages are gaps) or part of an English site (in which case the German pages are anomalies)? The hreflang signals become contradictory, crawl budget splits unpredictably, and the site's topical authority in both languages is diluted. A clean single-language site with correct hreflang self-referencing builds authority faster than a half-translated site that confuses the crawler about its identity.

What we built into the platform

The shared site primitives all accept a locale parameter. The translation layer is a flat dictionary namespaced by feature (auth, billing, settings, etc.), so when we add a second language we add files in a parallel folder rather than restructuring components. The middleware honors an explicit km-lang cookie if one is ever set; today locale resolution always returns en. JSON-LD generators all take locale as input and emit the correct inLanguage attribute. Sitemap generation already loops over LOCALES even though LOCALES.length === 1.
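A compact sketch of these primitives follows. LOCALES and the km-lang cookie come from the description above; the function names, the sample strings, and the fallback of returning the namespaced key for a missing string are assumptions made for illustration.

```typescript
const LOCALES = ["en"] as const;
type Locale = (typeof LOCALES)[number];
const DEFAULT_LOCALE: Locale = "en";

// Resolves the locale from the km-lang cookie value, falling back to the
// default when the cookie is absent or names an unsupported locale.
function resolveLocale(cookieValue?: string): Locale {
  for (const locale of LOCALES) {
    if (locale === cookieValue) return locale;
  }
  return DEFAULT_LOCALE;
}

// Flat dictionary namespaced by feature: namespace -> key -> text.
type Messages = Record<string, Record<string, string>>;
const MESSAGES: Record<Locale, Messages> = {
  en: {
    auth: { signIn: "Sign in" }, // sample strings, not the real copy
    billing: { upgrade: "Upgrade" },
  },
};

// Looks up a string; a missing entry returns "namespace.key" (assumed fallback).
function t(locale: Locale, namespace: string, key: string): string {
  const text = MESSAGES[locale][namespace]?.[key];
  return text ?? `${namespace}.${key}`;
}

// Emits one URL per locale per path; with a single locale the outer loop
// runs once, and a second locale doubles the output with no code change.
function sitemapUrls(paths: string[], origin: string): string[] {
  const urls: string[] = [];
  for (const locale of LOCALES) {
    for (const path of paths) {
      urls.push(`${origin}/${locale}${path === "/" ? "" : path}`);
    }
  }
  return urls;
}
```

Because Record<Locale, Messages> requires an entry for every locale, adding a locale to LOCALES without its dictionary would fail type checking in this sketch.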

When the second language ships, the work will be content (translation), not architecture. That is the optimization.

This separation of architecture from content is the key insight for any product team considering internationalization. Most teams treat i18n as a translation project — hire translators, run strings through a pipeline, ship. But the expensive part is never the translation itself; it is the structural work that translation exposes. Routes that assumed a single language, components that hardcoded English text, SEO metadata that was generated without locale awareness, sitemaps that did not account for alternate versions — these are the costs that explode when you try to add a language to a system that was not designed for it. By paying the architectural cost upfront with a single language, we converted a future emergency into a future content task.

"The right language, for humans and crawlers alike."

Frequently asked questions

Is the app indexed?
Marketing pages are public and indexed; the signed-in app stays behind login and is excluded via robots.txt and a per-page noindex, nofollow meta tag.
Why locale paths if you only serve English?
Forward compatibility. Adding a second language later does not require rebuilding the URL structure or breaking inbound links — only writing the new content.
Do AI crawlers actually visit?
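Yes. GPTBot, ClaudeBot, and PerplexityBot all hit the site regularly, and we see them in logs. Google-Extended is a robots.txt control token rather than a separate crawler, so it does not appear in logs under its own user agent. Allowing these agents is a deliberate choice, not a default.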
What is llms.txt?
A short text file at the root of the site listing the high-value pages an AI assistant should be able to summarize. It is machine-readable and complementary to robots.txt + sitemap.xml. The format is informal but useful.
Is JSON-LD worth the effort?
For a product like KeptMind, yes. The investment is small once the components are written, and the payoff in AI search and rich-result eligibility is real. We see structured data citations in AI search results frequently.