Francesco Di Costanzo

How We Built a Whitepaper in a Morning

Every executive is experimenting with AI, but most interactions remain single-turn and forgettable. A 2023 Harvard/BCG field experiment with 758 consultants found that consultants using GPT-4 completed tasks 25% faster and improved output quality by 40%. The lowest-skilled workers saw a 43% performance improvement. Yet the real insight is not the percentage gain but the pattern: productivity emerges from delegation and iteration, not from better prompts in isolation. Treating AI like a magic tool—ask once, expect miracles—produces disappointment. The unlock is managerial.

The Briefing Problem

The productivity trap starts earlier, with how we manage what we already know. Knowledge workers spend an average of 2.5 hours per day searching for information. A 2023 survey found that 82% of knowledge workers had tried multiple productivity systems and abandoned all of them, cycling through three to five setups per decade. Capture is easy. Keeping knowledge current is what breaks every system. Your second brain is already out of date. The same pattern repeats with AI: adoption stalls not because the model is weak but because the briefing is. A poorly scoped request produces a poorly scoped output, and the user blames the tool rather than their own management of it.

The Three-Tier Framework

The difference between a toy demo and productive collaboration is architectural, and it maps cleanly onto three managerial disciplines. First, memory: modern agent frameworks distinguish episodic, semantic, and procedural memory. A 2025 LangChain survey found that 32% of organisations cite output quality as the top barrier to production agent deployment, traced directly to agents starting each session without context. Second, iteration: a 2025 field experiment on the Pairit platform found human-AI teams were 73% more productive per worker, with participants delegating 17% more work to AI agents and performing 62% fewer direct edits. Memory-backed iteration produces compounding returns that stateless chatbots cannot replicate. Third, scope discipline: the agent executes, the human directs, and the boundaries are explicit. The agent does not widen the argument, invent sources, or make judgment calls beyond its remit. Each tier is a managerial decision, not a technical one.
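The three tiers can be made concrete in a few lines of code. This is a minimal sketch, not any real framework's API: the `Memory` and `AgentBriefing` classes, the `allowed_actions` field, and the example contents are all invented here to illustrate the shape of a memory-backed, scope-bounded briefing.

```python
from dataclasses import dataclass, field

@dataclass
class Memory:
    # Tier 1: the three memory types agent frameworks distinguish.
    episodic: list[str] = field(default_factory=list)       # what happened in past sessions
    semantic: dict[str, str] = field(default_factory=dict)  # stable facts and preferences
    procedural: list[str] = field(default_factory=list)     # routines learned over time

@dataclass
class AgentBriefing:
    task: str
    memory: Memory                # Tier 2: carried across sessions, so iteration compounds
    allowed_actions: set[str]     # Tier 3: explicit scope; the agent executes, the human directs

    def check_scope(self, action: str) -> bool:
        """Reject anything outside the remit, e.g. inventing sources."""
        return action in self.allowed_actions

# Hypothetical briefing for one whitepaper session.
briefing = AgentBriefing(
    task="Draft section 2 of the whitepaper",
    memory=Memory(
        episodic=["Session 1: outline agreed", "Session 2: tone set to plain English"],
        semantic={"audience": "executives", "length": "800 words"},
        procedural=["Open with the finding, then the caveat"],
    ),
    allowed_actions={"draft", "revise", "summarise"},
)

print(briefing.check_scope("draft"))           # within remit
print(briefing.check_scope("invent_sources"))  # outside remit
```

Note that each tier is a field the manager fills in before the session starts, which is the point: all three are briefing decisions, not model capabilities.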

The Underrated Risk

Memory and iteration improve quality, but they do not eliminate every risk. Human-AI teams produce higher average quality yet reduced output diversity, a pattern researchers call diversity collapse. Outputs homogenise around AI-generated patterns. The Pairit experiment confirmed this jagged frontier: teams scored higher on text quality but lower on tasks where AI capabilities remain uneven. The risk is not that the agent fails but that the collaboration converges too smoothly on a predictable mean. Good managers notice this and course-correct. Bad managers accept the output and move on.

The Real Moat

The deeper tension is between retrieval and recombination. Most current AI tooling optimises for retrieval through RAG, search, and memory layers. A memory-backed agent that remembers your preferences is still fundamentally a retrieval system. The real leap is when memory becomes generative, when the agent starts making non-obvious connections across sessions. Productivity research shows retrieval is table stakes; recombination is the moat. Until our agents move from recall to genuine recombination, we have built faster filing cabinets, not smarter collaborators. The organisations that will compound returns are not those with the best models. They are those with managers who know how to brief, iterate, and scope.
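The retrieval/recombination distinction can be shown with a toy note store. Everything here is illustrative: the tagged notes, the session numbers, and the idea that shared tags across sessions stand in for "non-obvious connections" are assumptions of this sketch, not a description of any shipping system.

```python
from itertools import combinations

# Invented note store: each note carries topic tags and a session number.
notes = {
    "s1-pricing":    {"tags": {"pricing", "churn"}, "session": 1},
    "s2-onboarding": {"tags": {"onboarding", "churn"}, "session": 2},
    "s3-copy":       {"tags": {"landing-page"}, "session": 3},
}

def retrieve(query_tag: str) -> list[str]:
    """Retrieval: answer a query with notes that already match it."""
    return [name for name, note in notes.items() if query_tag in note["tags"]]

def recombine() -> list[tuple[str, str, frozenset[str]]]:
    """Recombination sketch: without a query, pair notes from *different*
    sessions that share a tag, surfacing cross-session links."""
    links = []
    for (a, note_a), (b, note_b) in combinations(notes.items(), 2):
        shared = note_a["tags"] & note_b["tags"]
        if shared and note_a["session"] != note_b["session"]:
            links.append((a, b, frozenset(shared)))
    return links

print(retrieve("churn"))  # the filing cabinet: finds what you ask for
print(recombine())        # the connection no single query would have asked for
```

The asymmetry is the argument: `retrieve` needs the user to already know the question, while `recombine` volunteers the churn link between the pricing and onboarding sessions unprompted. Real recombination is far harder than tag intersection, but the managerial implication is the same either way.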

Sources

Dell'Acqua, F. et al. (2023). "Navigating the Jagged Technological Frontier: Field Experimental Evidence of the Effects of AI on Knowledge Worker Productivity and Quality." Harvard Business School Working Paper. https://www.gabormelli.com/RKB/Dell'Acqua_et_al.,_2023

Pairit Field Experiment (2025). "Collaborating with AI Agents: A Field Experiment on Teamwork, Productivity, and Performance." arXiv:2503.18238. https://arxiv.org/html/2503.18238

LangChain (2025). "State of Agent Engineering Survey." (Cited in research memo.)

Soda (2026). "The Maintenance Problem: Why Your Second Brain Is Already Out of Date." https://getsoda.app/hub/the-knowledge-maintenance-problem

myah.dev (2026). "The second brain that never actually worked." https://www.myah.dev/blog/second-brain-failed

Atlan (2026). "Memory Layer for AI Agents: Definition, Types, How It Works." https://atlan.com/know/memory-layer-for-ai-agents/