The OECD's Digital Education Outlook 2026, published earlier this year, documents something most learning professionals have suspected and few have wanted to state outright: students using general-purpose AI tools produce better outputs but develop less actual competence. Remove the AI in a closed assignment and the performance advantage disappears – in some cases it becomes a performance disadvantage. The report distinguishes between general-purpose AI, which produces an output advantage with no corresponding learning gain, and pedagogically designed educational AI, which can sustain genuine improvement in knowledge and argumentation. Most institutions are not making that distinction, and neither are most learning and development (L&D) functions. They tend to buy general-purpose AI and call it pedagogically designed AI.
This is not really an AI problem; it is a measurement problem that AI has exposed. L&D has always struggled to distinguish between people who completed the training and people who developed the capability. Completion rates, satisfaction scores, output quality on assessed tasks – these have always been proxies, used because they are convenient and measurable. Actual capability transfer was inconvenient because it is much harder to measure. So organizations measured what they could see and called it what they needed to believe. What AI has done is collapse the distance between those two things to zero. Most people can now produce excellent work with no capability development at all – not because they lack the capacity to develop it, but because nothing in the process requires them to. The confusion that was already present has become structurally undeniable.
The TalentLMS 2026 Annual L&D Benchmark Report makes the scale of the problem visible from a different angle: 79% of HR managers say their organizations have adopted skills-based approaches to hiring, training, and development, yet fewer than 40% report having a reskilling strategy to match. That gap – between the mandate and the method – exists precisely because organizations have never been able to reliably measure the thing that matters: whether people can perform a relevant skill, under pressure, without the support structure. The skills-based pivot is the right instinct, but the measurement infrastructure to execute it does not yet exist in most organizations, and the diffusion of AI has made that absence impossible to ignore.
The reckoning is coming for corporate L&D on roughly an 18-month lag behind what is happening in the education sector now. EdTech investors are already requiring verifiable proof of learning outcomes – logic models, third-party research, demonstrated improvements in actual mastery – before signing contracts, and that pressure is likely to reach procurement departments soon. The organizations currently investing in AI-enhanced L&D platforms are making budget commitments they cannot verify against the outcome that actually matters. They are paying for the appearance of capability development. The OECD evidence suggests they may be receiving better-looking outputs in exchange.
The standard for genuine learning is not better outputs. It is performance under pressure: when the AI is not available, when the stakes are real, and when what people actually know is the only thing that counts. That is not a technology question but a design question – what environments organizations build for learning, and what they are willing to test.
The measurement failure isn't a bug in a few L&D programs; it is being systematized at the level of how we assess readiness itself, and the pattern extends beyond corporate training. When the College Board redesigned the SAT in 2024, it replaced reading passages of 500-750 words with passages of 25-150 words, each paired with a single question – its own Digital SAT Suite assessment framework describes the new length as comparable to "a social-media post." The old extended passages required sustained inference, synthesis, and interpretation across a complex text; the new format does not, testing factual understanding only. The College Board's stated justification: the ability to read and analyze extended texts is "not an essential prerequisite for college," and shorter passages better serve "students who might have struggled to connect with the subject-matter."
What the College Board is describing is accommodation, not measurement. The instrument has been recalibrated to fit current student capacity rather than to surface the gap between that capacity and what genuine college-level work demands. Scores stabilize, but the readiness gap doesn't close; it just stops appearing in the data. The same logic operates in corporate L&D: when completion rates and output quality stay high despite declining capability transfer, the temptation is to trust the metric rather than redesign the learning environment. The metric confirms what the organization needed to believe, but the capability it is supposed to represent isn't there.
Here is the diagnostic every CLO and VP of L&D should be sitting with: can your organization distinguish between a team that has learned something and a team that has learned to produce outputs that look like it has? If the answer depends on completion rates, satisfaction scores, or the quality of AI-assisted assessments, it is probably no. And it's not a problem AI created; it's a problem AI has made more apparent.

