Knowledge Collapse and Cognitive Debt: Two MIT Papers on AI
Daron Acemoglu's new MIT paper proves there's an optimal AI accuracy, and it's not 100%. The knowledge-collapse equilibrium hinges on one parameter: how elastic human effort is. A separate MIT EEG study and three behavioral RCTs say it's elastic enough to worry about.
I keep getting the same question from people I respect, phrased a few different ways but always the same shape: am I getting dumber while my team ships faster? They've been working with large language models (LLMs) for two years. The models have gotten better. The output has gotten faster. Some of it has gotten better. But something underneath feels off, and they can't name what.
Two papers from the Massachusetts Institute of Technology (MIT), both from the last twelve months, name what they're feeling. The first is a 69-page formal model with a 30-page mathematical appendix, written by Daron Acemoglu (MIT Department of Economics, 2024 Nobel laureate in economic sciences), Dingwen Kong, and Asuman Ozdaglar (head of MIT's Electrical Engineering and Computer Science department). It's titled “AI, Human Cognition and Knowledge Collapse” and it dropped as National Bureau of Economic Research (NBER) Working Paper 34910 on February 20, 2026.[1] The second is an electroencephalography (EEG) study from MIT Media Lab. Fifty-four college students wrote essays with and without ChatGPT, and the brain-connectivity numbers went viral last summer.[2]
The Acemoglu paper makes one structural claim worth taking seriously, and it isn't “AI makes us dumber.” It's that the welfare-maximizing level of agentic AI accuracy is not 100%, and that there's a closed-form ratio between the optimal level and the level at which the public stock of human knowledge collapses to zero. The Kosmyna et al. EEG study is the early empirical evidence that the assumption Acemoglu's model needs, that human effort actually does fall when AI gets accurate enough, is showing up at the level of individual brains.
Summary
Two things up front. First, Acemoglu's paper is a possibility result, not a forecast. It says “here's the parameter region in which the system collapses.” It doesn't say “we're in that region.” Anyone who tells you otherwise is misreading theory. Second, the Kosmyna study has been criticized on real methodological grounds, and the strongest critique I can find is from clinical psychologists at the University of Vienna.[3] I'll come back to both.
The model: general knowledge is a commons
The setup goes like this, scaffolded for readers who don't read NBER PDFs for fun.
You are an agent in a community. Each period, you make a decision. To make it well you need two things. You need to know something general about the world that everyone shares. Call it the common state. And you need to know something about your specific situation that nobody else has. Call it your idiosyncratic state. In the paper's notation these are θ_t for the common state and θ_{i,t} for the individual one. They are complements, not substitutes. Knowing only one without the other doesn't help you make a good decision. The paper formalizes this with a binary-tolerance function that gives positive output only when both predictions land within a tolerance band.[1]
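A minimal sketch of what that complementarity means in practice; the tolerance value, function name, and 1-or-0 payoff are my illustrative choices, not the paper's exact specification:

```python
# Toy illustration of the complementarity, not the paper's exact payoff:
# the decision pays off only when BOTH the common-state estimate and the
# idiosyncratic-state estimate land inside a tolerance band around the truth.
def decision_payoff(est_common, est_idio, theta_t, theta_it, tol=0.5):
    hit_common = abs(est_common - theta_t) <= tol
    hit_idio = abs(est_idio - theta_it) <= tol
    return 1.0 if (hit_common and hit_idio) else 0.0

# Nailing one state while missing the other earns nothing.
print(decision_payoff(0.1, 3.0, theta_t=0.0, theta_it=0.0))  # 0.0
print(decision_payoff(0.1, 0.2, theta_t=0.0, theta_it=0.0))  # 1.0
```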
You produce both kinds of knowledge by paying for them with effort. The cost of effort is e^α/α with α > 1. That convex cost curve corresponds to a constant-elasticity supply of effort with elasticity ε = 1/(α-1), and that elasticity is the single parameter the rest of the paper turns on. The crucial detail is that costly effort produces two signals, not one. It gives you a precise private signal about your own situation. And it generates a thin public signal, a small noisy contribution to the community's stock of general knowledge. This is the learning externality. When you bother to figure something out, some piece of what you learned leaks into the commons.[1]
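Where ε = 1/(α-1) comes from: a one-line derivation, assuming (my simplification, not the paper's full objective) a linear benefit R per unit of effort set against that convex cost.

```latex
\max_{e \ge 0}\; R\,e - \frac{e^{\alpha}}{\alpha}
\;\;\Longrightarrow\;\;
R = e^{\alpha-1}
\;\;\Longrightarrow\;\;
e^{*}(R) = R^{\frac{1}{\alpha-1}},
\qquad
\frac{d\ln e^{*}}{d\ln R} = \frac{1}{\alpha-1} = \varepsilon
```

The closer α is to 1, the flatter the cost curve and the more violently effort responds to the return on effort. Hold that thought for the collapse condition below.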
Now agentic AI enters the model. Agentic AI is the paper's term for systems that act on a user's behalf rather than just answering questions, the kind of product the frontier labs have spent the last two years racing to ship. In the model it delivers a signal about your context with precision τ_A. It substitutes for the private signal you'd otherwise have produced through effort. What it does not do is add to the public stock. The community-level general knowledge is built from human effort or it is not built at all. This is not an arbitrary modeling choice. The synthetic-data literature, including the famous Shumailov et al. paper in Nature on model collapse, has shown that AI trained on AI doesn't preserve the tails of the original distribution.[4] Synthetic public knowledge runs into the same problem the labs already know about.
The structural asymmetry
Human effort produces two signals. AI substitutes for one.
What you put in
- Costly human effort: reasoning from scratch, debugging, drafting.
Learning
- Joint production with economies of scope.
What comes out
- Private signal: your own context. Substitutable by AI.
- Thin public signal: adds to the community stock of general knowledge.
The whole paper hinges on this asymmetry. AI replaces what you would have produced for yourself. The crumbs you would have left for everyone else are not replaced.
“While agentic AI can improve contemporaneous decision quality, it can also erode learning incentives that sustain long-run collective knowledge.”
What's clever about the model is that it gets the asymmetry to tip into a self-reinforcing loop. The richer the stock of general knowledge, the more useful your effort becomes, because complementarity raises the marginal return on your private signal. The thinner the stock, the lower the return on effort, the less effort gets supplied, the thinner the stock gets next period. That's the dynamic. It's not a static “AI replaces work” claim. It's a feedback loop on the supply side of human learning.
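To make the loop concrete, here is a deliberately crude simulation, NOT the paper's model. The functional forms, parameter values, and the two τ_A settings are my own illustrative assumptions; only the structure mirrors the mechanism: constant-elasticity effort supply, a public stock fed only by human effort, and a return on effort that rises with the stock and falls as AI precision rises.

```python
# Toy feedback-loop simulation; all forms and numbers are illustrative assumptions.

def simulate(tau_a, epsilon, k0=0.9, gamma=0.5, delta=0.1, steps=60):
    """Evolve the public knowledge stock K under agentic-AI precision tau_a."""
    k = k0
    for _ in range(steps):
        ret = k * (1.0 - tau_a)      # return on effort: rises with the stock, crowded out by AI precision
        effort = ret ** epsilon      # constant-elasticity effort supply, e = R**epsilon
        k = min(1.0, (1.0 - delta) * k + gamma * effort)  # thin public signal replenishes the stock
    return k

for eps in (2.0, 5.0):               # inelastic vs highly elastic effort
    for tau in (0.1, 0.4):           # modest vs very accurate agentic AI
        print(f"epsilon={eps}, tau_A={tau}: stock after 60 periods = {simulate(tau, eps):.3f}")
```

Run it and only the high-elasticity, high-precision cell collapses toward zero; the other three settle at the cap. That's the qualitative shape of the result, not its quantitative content.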
Knowledge collapse: when better AI is worse
I expected the model to produce a slow erosion. It produces an absorbing barrier. The paper proves that under a specific condition on effort elasticity, there is a stable equilibrium in which general knowledge converges to zero in the limit. Not to some small positive floor. To zero.
Recall ε = 1/(α-1) from the cost curve. The paper shows that the knowledge-collapse steady state is locally stable when α-1 < 1/4, equivalent to ε > 4.[1] In plain English: if a 1% drop in the marginal return on effort produces more than a 4% drop in the effort people supply, the system is fragile to AI. Once AI precision crosses a threshold, you fall into the collapse basin and you don't come back.
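The “1% in, more than 4% out” reading falls straight out of a constant-elasticity supply curve e = R^ε; a quick numerical check, with the 1% figure as the illustration:

```python
# With constant-elasticity effort supply e = R**epsilon, a 1% drop in the
# return R cuts effort by roughly epsilon percent.
for epsilon in (2.0, 4.0, 5.0):
    effort_lost = 1 - 0.99 ** epsilon
    print(f"epsilon={epsilon}: 1% lower return -> {effort_lost:.1%} less effort")
```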
The paper gives an asymptotic closed form for two precision thresholds. Call τ_A^c the precision at which the high-knowledge equilibrium disappears entirely, the full-collapse threshold, and call τ_A^* the precision that maximizes welfare. As the community size I scales, both thresholds grow logarithmically in I, and the ratio between them has a clean closed form.
Receipt
You don't see the welfare falling, because contemporaneous decisions still look great. You see it later, when the public stock has thinned enough that the marginal return on human effort isn't worth paying anymore, and by then you're inside the basin. The paper writes the welfare derivative as ∂Ū/∂τ_A = DE + IE. DE is the direct effect: better AI, better decisions today, positive. IE is the indirect effect: better AI crowds out effort, the public stock thins, and that's negative. At low precision DE dominates. At high precision IE wins. The peak is in the interior.
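Schematically, writing K(τ_A) for the public stock that equilibrium effort sustains (my notation for the decomposition, not necessarily the paper's):

```latex
\frac{d\bar{U}}{d\tau_A}
\;=\;
\underbrace{\left.\frac{\partial \bar{U}}{\partial \tau_A}\right|_{K \text{ fixed}}}_{\text{DE} \,>\, 0}
\;+\;
\underbrace{\frac{\partial \bar{U}}{\partial K}\cdot\frac{dK}{d\tau_A}}_{\text{IE} \,<\, 0}
```

The interior peak sits where the two terms cancel.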
Takeaway
The welfare-maximizing level of agentic AI precision is not the maximum. It's some interior value. Past it, marginal precision gains get eaten by the collateral damage to the public stock that complements your effort. The frontier labs are racing along an axis where, after a point, more is worse.
The brain study supplies the elasticity, sort of
Acemoglu's model collapses when effort elasticity exceeds 4. The empirical question is whether real-world effort actually drops sharply when AI gets accurate enough. This is where Kosmyna et al. come in.
The setup. Eight authors at MIT Media Lab. Fifty-four college students from the Boston area, drawn from MIT, Wellesley, Harvard, Tufts, and Northeastern. Three groups of eighteen: an LLM group allowed to use ChatGPT-4o freely, a Search Engine group with Google but no AI summaries, and a Brain-only group with no tools. Each participant wrote three SAT-style essays, twenty minutes each, across three sessions. The team recorded 32-channel EEG throughout. A subset of 18 came back four months later for a fourth session. LLM users had to write Brain-only, and Brain-only users had to write with the LLM.[2]
Kosmyna et al. study design
Same students, different sessions, watch the brain react
- Sessions 1–3 (N=54): same condition each week. Three groups of 18: LLM, Search, Brain-only. Twenty-minute SAT-style essays. EEG recorded throughout.
- Headline neural finding (~55% lower in the LLM group): brain connectivity drops with tool use. Directed Phase Lag Index across alpha and beta bands. Brain-only showed the most distributed networks; LLM showed the weakest coupling, especially in frontal-parietal pathways.[2]
- Headline behavioral finding (83% failure): LLM users couldn't quote their own essays. In Sessions 1–3, 83% of LLM users could not produce a single sentence verbatim from the essay they had just submitted. Brain-only failure was near zero.
- Session 4, ~4 months later (78% still couldn't quote): crossover with 18 participants, switched conditions. LLM-to-Brain participants still showed reduced alpha and beta connectivity. Brain-to-LLM participants re-engaged occipito-parietal and prefrontal regions. The authors call the gap “cognitive debt.”
Takeaway
Within-task connectivity differs sharply by tool. The persistent session-4 gap is the headline-grabbing finding, and it's the one most worth interrogating: it rests on 18 people and one crossover.
The Kosmyna paper does not, strictly, measure what Acemoglu calls effort elasticity. It measures within-task brain activity in 54 college students writing essays. That's not the same object. The Vienna critique team made the point politely: “some results by Kosmyna et al. (2025) could be interpreted more conservatively.”[3] A rougher critique is that LLM users were typing and clicking and reading, while Brain-only users were composing from memory. Some of the connectivity gap could just reflect different motor and visual activity, not effort substitution.
Stronger behavioral evidence sits in two other studies. Noy and Zhang in Science, 2023. 444 college-educated professionals, occupation-specific writing tasks, ChatGPT or no AI. Time fell by 0.8 standard deviations. Quality rose by 0.4 standard deviations.[5] The authors' own characterization of the mechanism is exactly the substitution-not-complementarity assumption Acemoglu's model needs:
“ChatGPT mostly substitutes for worker effort rather than complementing worker skills, and restructures tasks towards idea-generation and editing and away from rough-drafting.”
That's effort substitution at population scale. Not a brain scan, not 18 students, 444 working professionals. The Dell'Acqua et al. study at Harvard Business School adds the sharper edge. 758 Boston Consulting Group consultants, working with GPT-4. Inside the AI's frontier, consultants completed 12.2% more tasks, 25.1% faster, and quality rose 40%. Outside the frontier, consultants using AI were 19 percentage points less likely to produce correct solutions than the no-AI control. The paper's framing: they “fell asleep at the wheel.”[6]
Three pieces of evidence on the same parameter, none of them a clean estimate. Together they suggest effort elasticity isn't low.
What the structural argument gets right
Stack Overflow is the poster child. A peer-reviewed PNAS Nexus paper by del Rio-Chanona, Laurentsyeva, and Wachs found that Stack Overflow's posting activity dropped roughly 25% within six months of ChatGPT's release, measured relative to Russian and Chinese counterpart forums where ChatGPT had no access. Stack Overflow's own data explorer shows monthly question volume back to 2009 levels.[7] The public stock of “how do I solve this Postgres problem at 2 a.m.” is being built more slowly than the answers are being consumed. The thin public signal really is getting thinner.
The author matters too. This is the same Acemoglu who in May 2024 estimated AI's total-factor productivity contribution at no more than 0.66% over a decade.[8] That paper was widely treated as a contrarian take. The 2026 paper is a sharpening of the same skepticism. He's no longer saying “AI doesn't matter for gross domestic product (GDP).” He's saying “AI matters in a way the GDP accounting can't see.” If you ignored his earlier work because you thought 0.66% was wrong, this is the version of the argument that's harder to dismiss. The mechanism is structural, not parametric.
Why this matters
Where I push back
The paper itself raises the first thing worth pushing back on: the authors flag the analysis as purely theoretical in the conclusion.[1] The model proves a knowledge-collapse equilibrium exists in specific parameter regions. It does not estimate whether we're in those regions. Anyone who tells you Acemoglu just proved AI will end human knowledge is reading the paper backwards.
Second, Kosmyna's most viral finding (the persistent connectivity gap in Session 4) leans on 18 people. The Vienna critique is real. Independent reviewers have flagged a 1,000-comparison repeated-measures analysis of variance corrected only by false discovery rate, not family-wise error rate. The motor-and-visual confound is real. The four-month follow-up is shorter than what would establish actual lasting change.[3] Take it as a yellow flag, not a red one. (And if the “83% can't quote their own essay” number gives you a chuckle, side note: most of us couldn't quote a Slack message we sent ten minutes ago either. The interesting part is the connectivity gap, not the recall gap.)
Third, the elasticity isn't uniform. METR ran a randomized controlled trial in July 2025 with 16 experienced open-source maintainers. AI tools made them 19% slower, despite the developers themselves predicting a 24% speedup.[9] When the human is the expert, the relationship inverts: AI degrades performance rather than enhancing it. And Brynjolfsson, Li, and Raymond's customer-support study, 5,179 agents, found AI assistance acting as a learning multiplier, not a substitute. High performers' patterns got disseminated to low performers through the AI, and the productivity distribution narrowed.[10] That is a positive learning effect, not a substitution effect. It's evidence that “effort goes down on contact with AI” isn't universal.
Effort elasticity isn't a single number. It's high in domains where the AI is competent and the human is a novice (Noy-Zhang). It's low or even negative in domains where the AI is uncertain and the human is an expert (METR). Acemoglu's model assumes one. The world has many.
Takeaway
The same parameter the model assumes as a scalar shows up in the empirics as a function of who's using the AI and what for. That doesn't kill the result. It narrows it.
What this means for builders, labs, and policy
The paper proposes an information-design regulation that throttles agentic AI precision. The mechanism is a two-phase policy: full suppression in phase one to let the public stock recover, then a permanent cap on effective precision in phase two.[1] As policy this is hard to enforce. As a design principle for AI products it's interesting. Frontier labs racing to push precision higher might be racing past the welfare peak on every benchmark that doesn't measure downstream commons replenishment, which is all of them.
The result that builders should treat as a hard design constraint, not a regulatory hypothetical, is the aggregation result. The paper shows that increasing aggregation capacity, meaning making it cheaper and easier to pool human-generated signals into the commons, raises welfare unambiguously.[1] There's no parameter region in the paper where better aggregation is worse. If you build with LLMs, this is the lever to pull. Agentic systems should generate public signal, not just suppress it. When a developer solves a hard bug with an AI coding agent, what's the equivalent of the Stack Overflow answer they would have written? When a consultant uses a frontier model to draft a market-sizing model, what's the version of that artifact that gets pooled into the firm's collective intelligence? Most products today don't ask this question. The math says they should. (For a recent example of a product that explicitly chose to be an on-ramp rather than a destination, see my piece on Claude Design: that scoping decision is, accidentally or not, the commons-respecting move in this paper's framing.)
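What “generate public signal” could look like in product terms. This is a hypothetical sketch, not any real product's API; every name in it (`CodingSession`, `draft_commons_entry`, the review status) is made up to illustrate the design principle: when the agent helps a human solve something, it also drafts the artifact the human would otherwise have left behind, and a human reviews it before it enters the commons.

```python
from dataclasses import dataclass

# Hypothetical types and function names, purely illustrative.
@dataclass
class CodingSession:
    problem: str             # what the human was stuck on
    resolution: str          # what actually fixed it
    human_effort_notes: str  # the reasoning the human contributed

def draft_commons_entry(session: CodingSession) -> dict:
    """Turn a solved session into a draft knowledge-base entry.

    Design choice: the entry is drafted automatically but published only
    after human review, so the public stock grows from verified
    human-in-the-loop knowledge rather than raw model output.
    """
    return {
        "title": session.problem,
        "body": f"{session.resolution}\n\nContext: {session.human_effort_notes}",
        "status": "pending_human_review",  # never auto-published
    }

entry = draft_commons_entry(CodingSession(
    problem="Postgres deadlock on concurrent upserts",
    resolution="Ordered the upserts by primary key inside the transaction.",
    human_effort_notes="Reproduced with two sessions; lock graph showed a cycle.",
))
print(entry["status"])  # pending_human_review
```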
The labs are not, currently, treating the open web as a commons they're stewarding. Anthropic settled Bartz v. Anthropic for roughly $1.5 billion in August 2025 over training-data sourcing, the largest US copyright settlement on record.[11] OpenAI signed Reddit for an estimated $70 million per year in May 2024 to license what was, until that moment, a public archive of questions and answers.[12] The current frame is “the open web is raw material,” not “the open web is a commons we have to keep replenishing.” The Acemoglu paper is a structural argument for why that frame is short-sighted, even on commercial grounds. (For where I think the frontier-model accuracy race actually matters, see my piece on the GPT-5.5 vs Claude Opus 4.7 comparison; the relevant point here is that those benchmarks measure decision quality and not the externality this paper is about.)
Pay the cost yourself
For Acemoglu, Kong, and Ozdaglar, the load-bearing assumption is effort elasticity. Three behavioral studies and one EEG study point in the direction of “yes, in the domains where AI is competent and humans are novices, effort really does collapse.” That's not the same as proving collapse. It is enough to show the model isn't just a math exercise.
The math is on the page. The empirics are starting to come in. Don't be the elasticity.
Sources and further reading
- 1. [Primary] Acemoglu, Kong, Ozdaglar (2026): AI, Human Cognition and Knowledge Collapse (NBER WP 34910). Full PDF, including the 30-page mathematical appendix, on the NBER and MIT Economics websites.
- 2. [Primary] Kosmyna et al. (2025): Your Brain on ChatGPT, Accumulation of Cognitive Debt (arXiv:2506.08872, v2 Dec 2025). MIT Media Lab. 54 participants, 32-channel EEG, four sessions over four months.
- 3. [Primary] Stankovic et al. (2025): Comment on Your Brain on ChatGPT (arXiv:2601.00856). University of Vienna critique covering sample size, reproducibility, EEG methodology, reporting consistency, and transparency.
- 4. [Primary] Shumailov et al. (2024): AI models collapse when trained on recursively generated data (Nature 631). The canonical model-collapse paper. Synthetic-on-synthetic training erases the tails of the input distribution.
- 5. [Primary] Noy & Zhang (2023): Experimental evidence on the productivity effects of generative artificial intelligence (Science 381). 444 college-educated professionals. Time fell 0.8 SD, quality rose 0.4 SD; ChatGPT substitutes for effort rather than complementing skills.
- 6. [Primary] Dell’Acqua et al. (2023): Navigating the Jagged Technological Frontier (HBS WP 24-013). 758 BCG consultants. Inside-frontier productivity gains, outside-frontier 19-pp accuracy loss.
- 7. [Primary] del Rio-Chanona, Laurentsyeva, Wachs (2024): Large language models reduce public knowledge sharing on online Q&A platforms (PNAS Nexus 3(9), pgae400). Stack Overflow activity fell ~25% within six months of ChatGPT’s release, relative to Russian and Chinese counterpart forums.
- 8. [Primary] Acemoglu (2024): The Simple Macroeconomics of AI (NBER WP 32487). The earlier Acemoglu paper that put a 0.66% TFP ceiling on a decade of AI productivity gains.
- 9. [Primary] Becker et al. (METR, 2025): Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity. 16 experienced maintainers, 246 tasks. AI tools produced a 19% slowdown despite a predicted 24% speedup.
- 10. [Primary] Brynjolfsson, Li, Raymond (2023): Generative AI at Work (NBER WP 31161). 5,179 customer-support agents. AI assistance raised productivity 14%, with the strongest gains for less-experienced workers.
- 11. [Reporting] Reuters (2025): Anthropic agrees to pay $1.5 billion to settle authors’ lawsuit. Bartz v. Anthropic. Reported as the largest publicly disclosed copyright settlement in US history.
- 12. [Reporting] OpenAI/Reddit data licensing deal (TechCrunch, May 2024). Estimated ~$70M per year, reverse-engineered from Reddit COO disclosure that licensing is ~10% of revenue.
Written by
Tech Talk News Editorial
Tech Talk News covers engineering, AI, and tech investing for people who build and invest in technology.