Francesco Di Costanzo

Karpathy's LLM Wiki Pattern Is the First Practical AI Second Brain

The Problem Bush Couldn't Solve

On April 4, 2026, Andrej Karpathy published a GitHub Gist he called an "idea file" for building a personal knowledge base with a large language model. The document was designed to be pasted into an LLM agent, which would then collaborate with the user to instantiate a three-layer system: an immutable folder of raw sources, an LLM-generated markdown wiki, and a schema file that teaches the agent how to behave across sessions. The gist spread quickly — within days, implementations had appeared on GitHub, including a pgvector-backed version by Y Combinator president Garry Tan, a CLI tool, and an Obsidian-native variant. By the time commentary had settled, the framing that circulated was that Karpathy had described the first AI second brain.
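
Concretely, the three layers map onto an ordinary directory tree. The sketch below scaffolds one possible layout in Python; the folder and file names are illustrative choices, not prescriptions from the gist.

```python
from pathlib import Path

# Hypothetical layout for the three-layer pattern; names are illustrative.
LAYOUT = {
    "sources": "Layer 1: immutable raw material (PDFs, clippings, transcripts)",
    "wiki": "Layer 2: LLM-generated markdown pages, regenerable from sources",
}
FILES = {
    "CLAUDE.md": "Layer 3: schema file the agent loads at the start of each session",
    "wiki/log.md": "Append-only record of every operation the agent performs",
}

def scaffold(root: str) -> None:
    """Create the skeleton of an LLM wiki under `root`."""
    base = Path(root)
    for folder, purpose in LAYOUT.items():
        # In practice sources/ would also be write-protected (e.g. chmod a-w).
        (base / folder).mkdir(parents=True, exist_ok=True)
        print(f"{folder}/  # {purpose}")
    for file, purpose in FILES.items():
        path = base / file
        path.parent.mkdir(parents=True, exist_ok=True)
        path.touch()
        print(f"{file}  # {purpose}")

if __name__ == "__main__":
    scaffold("llm-wiki")
```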

That framing is wrong in one important way and right in a more precise way. Karpathy himself is clear about where the idea comes from. He opens the gist by citing Vannevar Bush's 1945 essay "As We May Think," in which Bush imagined the Memex — a personal, associative knowledge store built around trails through documents rather than hierarchical filing. Karpathy writes that Bush's vision was closer to what he has in mind than to what the web became, and then identifies exactly what Bush couldn't solve: who does the maintenance. The LLM, Karpathy argues, handles that. That single sentence contains the actual argument. The architecture is not new. The maintenance economics are.

What the Second Brain Was Actually For

To evaluate whether the LLM wiki pattern changes anything fundamental, it helps to be precise about what Tiago Forte's second brain framework actually promised. The Building a Second Brain methodology, developed since approximately 2017 and codified in a 2022 book, rests on a four-step process called CODE: Capture, Organize, Distill, and Express. The organizing principle is actionability, not subject matter. Notes are filed under four categories — Projects, Areas, Resources, Archive — according to how soon they will be used. The system's core promise is that a second brain converts information into output, not merely into a better-organized pile of saved links.

Forte's own 2024 analysis of what AI changes in this framework is unusually direct. He argues that AI radically disrupts the middle two steps — Organize and Distill — because AI can extract structure from unstructured text and summarize large volumes of material rapidly. What AI does not change, on his analysis, is Capture, where the human must still decide what matters and collect it, and Express, where the human must still add voice, judgment, and finishing direction. His conclusion is that AI concentrates human creativity at the first and last steps and automates the middle, but does not eliminate the need for a personal knowledge system as a staging area for inputs and outputs.

This framing maps onto the Karpathy pattern more closely than it initially appears. Karpathy's stated division of labor assigns sourcing, exploration, and asking the right questions to the human, and assigns summarizing, cross-referencing, filing, and bookkeeping to the LLM. That is structurally identical to Forte's 2024 AI analysis of CODE. The LLM wiki does not replace the second brain methodology. It operationalizes the part of it that human knowledge workers have historically been worst at sustaining.

What Is Genuinely New

The vocabulary confusion in most coverage of this topic is worth resolving before going further. Retrieval-augmented generation is a technique for improving the grounding of individual LLM answers by supplying retrieved external passages at query time. It does not, by itself, create a durable knowledge base. A knowledge graph is a directed, labeled graph with semantically typed nodes and edges — a structured representation of entities and relations. An Obsidian graph view is a visualization of file links; it is not automatically a semantic knowledge graph and does not enforce typed relationships. A linked-note system or wiki is a corpus of human-readable pages connected by explicit links. Karpathy's pattern sits in this last category, not in knowledge-graph territory, unless additional structured semantics are imposed.
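
The distinction is easy to make concrete. In the illustrative Python below (not from the gist), extracting Obsidian-style [[wikilinks]] yields only untyped edges, while a knowledge graph demands a typed triple for every relation.

```python
import re

# An Obsidian-style page links to other pages, but says nothing about *how*
# they relate: the edge is untyped.
page = "Karpathy cites [[Vannevar Bush]] and contrasts the Memex with [[the web]]."
wiki_edges = [("page.md", target) for target in re.findall(r"\[\[([^\]]+)\]\]", page)]
# -> [('page.md', 'Vannevar Bush'), ('page.md', 'the web')]

# A knowledge graph makes the relation itself a first-class, typed object:
# (subject, predicate, object) triples that a query engine can reason over.
kg_edges = [
    ("Karpathy", "cites", "Vannevar Bush"),
    ("Memex", "proposed_by", "Vannevar Bush"),
    ("Memex", "predates", "the web"),
]

# The wiki edge answers "what is connected?"; only the typed triple answers
# "connected how?" -- which is what makes a graph semantic.
```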

What is not new about the pattern is the concept of externalized, associative personal memory. Bush imagined the Memex in 1945. Ward Cunningham built the first functioning wiki in 1995, demonstrating that linked, editable pages could become usable knowledge infrastructure. Niklas Luhmann maintained a slip-box of approximately 90,000 handwritten note cards connected by explicit links over several decades, crediting the system as an intellectual co-author. Personal information management, cognitive offloading, and linked notes all have extensive prior art. AI-assisted personal knowledge products also predate the Karpathy gist: Google launched NotebookLM in July 2023 as an AI-first notebook grounded in user documents; Mem, Reflect, Notion AI, and Capacities all offered AI over personal notes before April 2026. "First practical AI second brain" as a broad market claim is wrong.

What is new is the economics of upkeep, and that distinction turns out to matter more than it initially seems. Risko and Gilbert's 2016 review in Trends in Cognitive Sciences establishes that external memory systems reduce internal cognitive demand, and Kiewra's 1989 review of note-taking research shows that reviewing stored notes has well-documented benefits in every study examined. But neither body of work solved the maintenance problem: keeping summaries current, updating cross-references when new sources arrive, revising overview pages, logging operations. That labor cost is precisely what kills linked-note systems in practice. A 2020 study by Alon and Nachmias surveying 465 participants on 25 personal information management practices found significant aspirational-to-actual gaps in 22 of the 25 practices, with larger gaps correlated with negative feelings and lower self-efficacy. MITRE's case study on enterprise wiki adoption found reluctance driven by perceived extra effort, uncertainty about what should be shared, and cultural barriers around contribution norms. The failure mode is consistent: complex personal knowledge systems collapse under their own maintenance requirements.

Karpathy's central claim is that today's LLM agents lower that bookkeeping cost sharply enough that a maintained personal wiki becomes feasible where a manual one typically fails. That claim is supported by what agents can now actually do. Anthropic's Claude Code documentation describes an agentic tool capable of reading and writing files, running commands, following CLAUDE.md schema files loaded at the start of every session, managing version control, and spawning sub-agents. OpenAI documents AGENTS.md as a mechanism for layering global and project-specific instructions for Codex. Those capabilities — local file access, schema-governed operation, multi-file updating, structured logging — are exactly what the LLM wiki requires. The bottleneck was not technical capability. It was the absence of a clear, reproducible reference architecture. That is what April 2026 changed.
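
What such a schema file contains is mundane but load-bearing. The fragment below is a hypothetical example of the kind of rules a CLAUDE.md (or AGENTS.md) might carry; the page types and constraints are illustrative, not quoted from Karpathy's gist or from Anthropic's documentation.

```python
from pathlib import Path

# Illustrative schema-file content: examples of the constraints a schema
# file might impose on the agent, written out by a setup script.
SCHEMA = """\
# Wiki schema

## Page types
- source-summary: one page per document in sources/, cite the file path
- entity: one page per person/project/concept, backlinks required
- synthesis: cross-source claims only; every claim cites >= 1 source page
- speculation: clearly labeled inference; never cited as fact elsewhere

## Rules
- Never write to sources/ (it is immutable).
- Every edit appends one line to wiki/log.md: date, operation, files touched.
- Deletions and merges require explicit human approval.
- Unsure about a fact? Label it `confidence: low` in the page frontmatter.
"""

Path("CLAUDE.md").write_text(SCHEMA, encoding="utf-8")
```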

The Hallucination Infrastructure Problem

The most serious objection to the LLM wiki is not philosophical but operational, and it deserves more space than it usually receives in discussions of the pattern. Belém and colleagues, in a 2025 NAACL paper on multi-document LLM summarization, find hallucination rates of up to 75% in the conversational domain and up to 45% in news; GPT-4o still produced a summary roughly 44% of the time even when the topic-relevant information it was asked to summarize did not exist in the source documents. A Nature analysis published April 1, 2026, in collaboration with Grounded AI, finds that at least tens of thousands of 2025 scholarly publications may contain invalid AI-generated references, with the problem spanning journal articles, conference proceedings, and book chapters. Dahl and colleagues' 2024 work in the Journal of Legal Analysis documents systematic hallucination of legal case citations by public-facing LLMs.

The implication for an LLM-maintained wiki is categorically different from the implication for a disposable chat session. When a chatbot hallucinates, the error appears in a response that the user typically evaluates immediately and discards. When a wiki agent hallucinates during an ingest operation, the error is written to a page that persists across sessions, is referenced by subsequent pages, and may eventually be treated as authoritative source material for future queries. The wiki's core strength — persistent, compounding synthesis — becomes its most dangerous property when the compounding includes errors. Steph Ango, Obsidian's CEO, publicly warned following the gist's circulation that personal vaults should be kept "clean," recommending a separate vault for agent-generated content to prevent contamination of the personal knowledge base. That is a design-level caution from the maker of the primary recommended substrate for the pattern.
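
One partial mitigation follows directly from the architecture: because layer one is immutable and layer two is required to cite it, a periodic lint pass can at least verify that every source a wiki page cites actually exists on disk. A minimal sketch follows; the `source:` frontmatter convention is an assumption made for illustration, not something the gist specifies.

```python
from pathlib import Path
import re

def lint_citations(wiki_dir: str, sources_dir: str) -> list[str]:
    """Flag wiki pages whose cited source files do not exist on disk.

    Assumes each page records provenance as lines like
    `source: sources/paper.pdf` -- an illustrative convention.
    """
    sources = {p.name for p in Path(sources_dir).rglob("*") if p.is_file()}
    problems = []
    for page in Path(wiki_dir).glob("*.md"):
        cited = re.findall(r"^source:\s*sources/(\S+)", page.read_text(), re.M)
        for name in cited:
            if name not in sources:
                problems.append(f"{page.name}: cites missing source '{name}'")
        if not cited:
            problems.append(f"{page.name}: no provenance at all")
    return problems

# Run periodically to surface dangling citations before they are linked
# into further synthesis pages.
for issue in lint_citations("wiki", "sources"):
    print(issue)
```

Such a pass catches dangling provenance before it compounds; it cannot catch a fluent summary that misrepresents a real source, which is why the human-review checkpoints discussed in the next section remain necessary.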

The long-context alternative deserves honest treatment as well. Li and colleagues' 2024 EMNLP study shows that long-context LLMs consistently outperform RAG on average quality when sufficiently resourced. Liu and colleagues' 2024 TACL paper on "Lost in the Middle" finds that performance degrades substantially when relevant information is not at the beginning or end of a long context. Hsieh and colleagues' RULER benchmark shows that almost all models exhibit large performance drops as context length increases, with multi-hop reasoning and synthesis tasks showing the largest degradation. The practical conclusion is straightforward: plain search or long-context prompting is adequate for one-off questions over a small corpus. For sustained synthesis over a growing archive, exactly the use case the LLM wiki targets, structured intermediate layers retain meaningful value. The right choice depends on the use case, not on architectural absolutism.

Design Principles for a Trustworthy System

The NIST Generative AI Profile (NIST AI 600-1, 2024) and the UK ICO's guidance on generative AI converge on the same demands: provenance, version history, human review at appropriate checkpoints, and documentation of modifications to AI-generated content. Those requirements map directly onto the design decisions an LLM wiki needs to make. Synthesizing from Karpathy's gist, the NIST and ICO guidance, and the hallucination risk literature, eight principles emerge. Since no controlled studies yet compare LLM wiki outcomes against simpler alternatives, these principles represent the best-supported current inference rather than empirically validated requirements.

1. Separate sources from synthesis, immutably. Raw sources live in a write-protected directory; LLM-generated content lives in a separate wiki layer.
2. Record provenance everywhere. NIST calls for tracing the origin and modifications of digital content with sources, timestamps, and metadata; every wiki page should carry its source documents and ingest date, and high-value pages should include claim-level traceability.
3. Govern the agent with a schema file. A CLAUDE.md (Claude Code) or AGENTS.md (Codex) specifying page types, citation format, naming conventions, confidence labeling, and prohibited actions converts the agent from a generic chatbot into a governed operator.
4. Keep version history. Git plus an append-only log.md provide rollback and auditability.
5. Label page types explicitly. Source summaries, entity pages, synthesis pages, and speculative inference pages are distinct epistemic categories; mixing them in unmarked prose is how errors become durable infrastructure.
6. Tier the automation by risk. Low-risk operations such as index updates and backlink additions can run unattended; deletions, merges, and changes to factual claims require human approval. NIST notes that organizations using generative AI may warrant additional human review as capability increases.
7. Prefer local hybrid search to unbounded context. A tool like qmd, which combines BM25 retrieval with vector reranking over local markdown, is often preferable to expanding prompt context indefinitely (see the sketch after this list).
8. Keep professional archives local-first. Storage on Obsidian's file-system model satisfies confidentiality requirements more readily than hosted AI note-chat stacks.
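
To make the seventh principle concrete, here is a minimal sketch of the two-stage pattern qmd implements: BM25 for cheap lexical recall, then vector reranking of the surviving candidates. It illustrates the technique, not qmd's actual code, and the `embed` function is a stand-in for whatever local embedding model is available.

```python
from rank_bm25 import BM25Okapi  # pip install rank-bm25
import numpy as np

def embed(text: str) -> np.ndarray:
    """Stand-in for a real local embedding model (toy vectors for the sketch)."""
    rng = np.random.default_rng(abs(hash(text)) % 2**32)
    v = rng.standard_normal(128)
    return v / np.linalg.norm(v)

def hybrid_search(query: str, docs: list[str], k: int = 20, top: int = 5) -> list[str]:
    # Stage 1: cheap lexical recall with BM25 over tokenized markdown pages.
    bm25 = BM25Okapi([d.lower().split() for d in docs])
    scores = bm25.get_scores(query.lower().split())
    candidates = np.argsort(scores)[::-1][:k]
    # Stage 2: rerank candidates by cosine similarity to the query embedding.
    q = embed(query)
    reranked = sorted(candidates, key=lambda i: float(q @ embed(docs[i])), reverse=True)
    return [docs[i] for i in reranked[:top]]
```

The design point: stage one keeps retrieval cheap and local, stage two recovers semantic matches that keyword search misses, and the prompt only ever carries the few pages that survive both filters.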

The Conditional Verdict

The LLM wiki should not be called the first practical AI second brain in the broad market sense. That claim is too strong. It should be called the first widely shared and practically reproducible architecture for an inspectable, file-native, schema-governed, agent-maintained personal knowledge base — and that distinction is a meaningful contribution. The value proposition is conditional, and the conditions are specific: large source accumulation, repeated need for synthesis across time, and willingness to govern the system editorially. For knowledge workers without those three characteristics, simpler retrieval — NotebookLM-style grounded chat, plain search over local files, or long-context prompting — will remain adequate and substantially cheaper to operate.

What the evidence supports, and what the commentary usually misses, is that the innovation here is architectural rather than philosophical. The concept of a second brain has been described since Bush imagined the Memex in 1945. The concept of linked, maintained notes dates to Luhmann. The concept of AI over personal documents dates to NotebookLM's 2023 launch. What Karpathy contributed is a precise articulation of how to make the maintenance problem tractable: three layers with defined roles, three operations with defined outputs, a schema file that governs agent behavior, an append-only log that makes every change auditable. The knowledge base, as Karpathy puts it, is the codebase; the LLM is the programmer; and Obsidian is the IDE. That metaphor is not decorative. It describes a software engineering mindset applied to personal knowledge management, one that treats provenance, schema, version control, and review as first-class concerns rather than optional additions.

Whether that mindset will prove durable depends on questions the current evidence cannot resolve: whether users sustain the practice longer than manual note systems, whether hallucination accumulation can be managed by lint passes alone, and whether the governance principles translate from individual use to professional deployment. Those are empirical questions with no current answers. What can be said with confidence is that the architecture is sound enough to build on, the risks are well-identified, and the moment for doing so — when frontier models have the agentic capabilities the pattern requires and open-source tooling is rapidly maturing — has arrived.

Sources

Karpathy's LLM Wiki — Primary Source

  1. Karpathy, Andrej. "LLM Wiki." GitHub Gist, April 4, 2026. https://gist.github.com/karpathy/442a6bf555914893e9891c11519de94f

Tiago Forte and Second Brain Methodology

  1. Forte Labs. "Building a Second Brain: The Definitive Introductory Guide." https://fortelabs.com/blog/basboverview/

  2. Forte Labs. "Will Artificial Intelligence Replace the Need for Second Brains Entirely?" 2024. https://fortelabs.com/blog/will-artificial-intelligence-replace-the-need-for-second-brains-entirely/

  3. Forte, Tiago. Building a Second Brain. Atria Books, 2022. https://www.buildingasecondbrain.com/

  4. Forte Labs. "The PARA Method: The Simple System for Organizing Your Digital Life in Seconds." https://fortelabs.com/blog/para/

Academic — Cognitive Science and Note-Taking

  1. Risko, Evan F. and Sam J. Gilbert. "Cognitive Offloading." Trends in Cognitive Sciences, September 2016, PMID 27542527. https://pubmed.ncbi.nlm.nih.gov/27542527/

  2. Kiewra, Kenneth A. "A Review of Note-Taking: The Encoding-Storage Paradigm and Beyond." Educational Psychology Review, 1989, vol. 1, no. 2. https://link.springer.com/article/10.1007/BF01326640

  3. Alon, Lilach and Rafi Nachmias. "Gaps between Actual and Ideal Personal Information Management Behavior." Computers in Human Behavior, 2020. https://www.sciencedirect.com/science/article/abs/pii/S0747563220300480

Academic — Hallucination and Factual Accuracy

  1. Belém, Catarina G. et al. "From Single to Multi: How LLMs Hallucinate in Multi-Document Summarization." Findings of NAACL 2025. https://aclanthology.org/2025.findings-naacl.293/

  2. Dahl, Matthew et al. "Large Legal Fictions: Profiling Legal Hallucinations in Large Language Models." Journal of Legal Analysis, 2024, vol. 16, no. 1. https://academic.oup.com/jla/article/16/1/64/7699227

  3. Charlotin, Damien. AI Hallucination Cases Database. https://www.damiencharlotin.com/hallucinations/

  4. Nature. "Hallucinated citations are polluting the scientific literature." April 1, 2026. https://www.nature.com/articles/d41586-026-00969-z

Academic — Retrieval, RAG, and Long-Context

  1. Lewis, Patrick et al. "Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks." NeurIPS 2020. https://dl.acm.org/doi/abs/10.5555/3495724.3496517

  2. Liu, Nelson F. et al. "Lost in the Middle: How Language Models Use Long Contexts." Transactions of the Association for Computational Linguistics, 2024. https://aclanthology.org/2024.tacl-1.9/

  3. Hsieh, Cheng-Ping et al. "RULER: What's the Real Context Size of Your Long-Context Language Models?" COLM 2024. https://arxiv.org/abs/2404.06654

  4. Li, Zhuowan et al. "Retrieval Augmented Generation or Long-Context LLMs? A Comprehensive Study and Hybrid Approach." EMNLP Industry 2024. https://aclanthology.org/2024.emnlp-industry.66/

Academic — Zettelkasten and Smart Notes

  1. Zettelkasten.de. "Introduction to the Zettelkasten Method." https://zettelkasten.de/introduction/

  2. Ahrens, Sönke. How to Take Smart Notes. 2017. https://www.soenkeahrens.de/en/takesmartnotes

Regulatory and Governance

  1. NIST. Artificial Intelligence Risk Management Framework (AI RMF 1.0), 2023. https://nvlpubs.nist.gov/nistpubs/ai/nist.ai.100-1.pdf

  2. NIST. Artificial Intelligence Risk Management Framework: Generative Artificial Intelligence Profile, 2024. https://nvlpubs.nist.gov/nistpubs/ai/NIST.AI.600-1.pdf

  3. UK ICO. "Human review." https://ico.org.uk/for-organisations/advice-and-services/audits/data-protection-audit-framework/toolkits/artificial-intelligence/human-review/

  4. UK ICO. "Generative AI: eight questions that developers and users need to ask." https://ico.org.uk/about-the-ico/media-centre/blog-generative-ai-eight-questions-that-developers-and-users-need-to-ask/

Agentic Tools and Product Documentation

  1. Anthropic. "Claude Code Overview." https://docs.anthropic.com/en/docs/claude-code/overview

  2. Anthropic. "Files API documentation." https://docs.anthropic.com/en/docs/build-with-claude/files

  3. OpenAI Developers. "Custom instructions with AGENTS.md." https://developers.openai.com/codex/guides/agents-md

  4. OpenAI. "Codex documentation." https://platform.openai.com/docs/guides/codex

  5. Obsidian Help. "How Obsidian stores data." https://help.obsidian.md/Files+and+folders/How+Obsidian+stores+data

  6. Obsidian Help. "Internal links." https://obsidian.md/help/links

  7. Obsidian. "Graph view." https://help.obsidian.md/Plugins/Graph+view

  8. Google Blog. "Introducing NotebookLM." July 12, 2023. https://blog.google/innovation-and-ai/technology/ai/notebooklm-google-ai/

  9. Google Support. "Learn about NotebookLM." https://support.google.com/notebooklm/answer/16164461

Historical Precedents

  1. Bush, Vannevar. "As We May Think." The Atlantic, July 1945. https://www.w3.org/History/1945/vbush/vbush7.shtml

  2. Luhmann-Archiv, Bielefeld University. Digitized Zettelkasten. https://niklas-luhmann-archiv.de/bestand/zettelkasten/zettel/ZK_1_NB_1_1_V

  3. Wikipedia. "Wiki." https://en.wikipedia.org/wiki/Wiki

Obsidian Design and Community

  1. Ango, Steph. "How I use Obsidian." https://stephango.com/vault

  2. Konik, Eleanor. "It's Not Just a Pretty Gimmick: In Defense of Obsidian's Graph View." September 2021. https://www.eleanorkonik.com/p/its-not-just-a-pretty-gimmick-in-defense-of-obsidians-graph-view

Open-Source Implementations

  1. garrytan/gbrain. https://github.com/garrytan/gbrain

  2. Tan, Garry. "I got inspired by Karpathy's LLM Wiki." X (Twitter), April 9, 2026. https://x.com/garrytan/status/2042300939525402875

  3. danielmiessler/Personal_AI_Infrastructure. https://github.com/danielmiessler/Personal_AI_Infrastructure

  4. Miessler, Daniel. "Personal AI Infrastructure." https://danielmiessler.com/blog/personal-ai-infrastructure

  5. tobi/qmd (local markdown hybrid search). https://github.com/tobi/qmd

  6. kytmanov/obsidian-llm-wiki-local. https://github.com/kytmanov/obsidian-llm-wiki-local

  7. doum1004/llmwiki-cli. https://github.com/doum1004/llmwiki-cli

Technical and Industry Media

  1. VentureBeat. "Karpathy shares 'LLM Knowledge Base' architecture that bypasses RAG." April 3, 2026. https://venturebeat.com/data/karpathy-shares-llm-knowledge-base-architecture-that-bypasses-rag-with-an

  2. Antigravity Codes. "Karpathy's LLM Knowledge Bases: The Post-Code AI Workflow." April 3, 2026. https://antigravity.codes/blog/karpathy-llm-knowledge-bases

Enterprise Knowledge Management Research

  1. MITRE. "Factors Impeding Wiki Use in the Enterprise: A Case Study." https://www.mitre.org/sites/default/files/pdf/09_3961.pdf

AI Market — Personal Knowledge Products

  1. Mem. "AI Thought Partner for Ideas and Research." https://mem.ai/

  2. Reflect. "Notes with an AI Assistant." https://reflect.app/

  3. Capacities. "Linked-Object Personal Knowledge Management." https://capacities.io/

  4. Notion. "Notion AI." https://www.notion.so/product/ai