Completion vs Proficiency

Your people completed the training. Do you know if it worked?

Most enterprises measure AI readiness through completion rates, course hours, and certificates. None of these reliably predict whether employees can actually use AI in their work. The distinction between tracking training activity and measuring professional capability is the most consequential gap in enterprise AI adoption today.

Data from BCG-Harvard, Stanford, and Section AI. Peer-reviewed evidence.
The Completion Trap

Completion rates tell you who finished a course. They tell you nothing about who can use AI.

Enterprises have invested heavily in AI training over the past three years. The programmes are well-intentioned. The completion rates look healthy. The certificates are prominently displayed. And the underlying assumption — that completing training produces competence — goes largely unexamined.

The research does not support that assumption.

The BCG-Harvard study of 758 consultants, published in Organization Science in March 2026, found that AI training alone showed no statistically significant performance advantage over simple tool access. The consultants who received training and those who simply received the tool performed indistinguishably. The variance in outcomes was driven by individual proficiency — a characteristic that training completion cannot detect.

Section AI's 2026 Proficiency Report, which combined surveys with hands-on skill testing across 5,000 knowledge workers, confirmed the pattern at scale: 97% of the workforce are using AI poorly or not at all. Among them, employees who had completed AI training programmes still scored only 40 out of 100 on proficiency assessments. They remained firmly in the “experimenter” category — capable of basic prompting but unable to reliably evaluate AI output, identify when AI use is inappropriate, or manage the risks of AI-generated content in professional settings.

Measuring training completion is like measuring gym memberships instead of fitness levels. The data is easy to collect, impressive to report, and almost entirely uncorrelated with the outcome that matters.
What Completion Actually Measures

What completion metrics track — and what they miss

Training completion metrics record that an employee watched the videos, clicked through the slides, and passed a recall-based quiz. They can tell you who engaged with the content and who did not. This information is not worthless — it reveals who showed up.

What completion metrics cannot tell you is whether the employee can now formulate a prompt that produces usable output on the first attempt, identify a selectively accurate statement in an AI-generated analysis, determine when a task falls outside AI's capability boundary, or navigate the disclosure and privacy obligations of using AI with client data. These are the capabilities that determine whether AI use creates value or creates risk in professional work. And they require a different kind of measurement entirely.

The old way: tracking whether someone attended the lecture. The new way: testing whether they can diagnose the patient.
The old way: asking someone how healthy they feel. The new way: running a blood panel.

The distinction is not academic. Classic transfer-of-training research estimates that only 10–15% of training effectively transfers to workplace application (Georgenson 1982, Ford et al. 2018). The Association for Talent Development found only 12% of employees effectively apply new skills on the job. These transfer rates were established before AI — where the distance between watching a tutorial and applying judgment under uncertainty is particularly wide.

The Hidden Cost

The gap between completion and proficiency has a price — and someone is paying it

$2,232 — annual rework cost per employee from AI-generated “workslop” (BetterUp Labs & Stanford, HBR Sep 2025)
4% — of learning leaders can communicate tangible business outcomes of their programmes (CEB research)
92% — of business leaders fail to see the impact of learning initiatives (CEB research)

Consider the 4% figure. In 96% of organisations, the executive team cannot see what L&D delivers. When the board asks “how AI-ready is our workforce?” and the only available answer is a completion percentage, the L&D function is evaluated on the wrong metric — and the investment is protected by faith rather than evidence.

Meanwhile, the cost of unmeasured AI use accumulates. BetterUp Labs and Stanford's Social Media Lab found that 40% of employees receive AI-generated “workslop” each month — low-quality content that requires an average of 1 hour and 56 minutes to resolve per incident. For a 1,000-person organisation, that translates to over $2.2 million per year in invisible rework. The completion rate of the training programme that was supposed to prevent this looks fine. The spreadsheet doesn't show what completion failed to produce.
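The roll-up behind that figure is simple arithmetic. A minimal sketch, treating the per-employee cost from the HBR study as a workforce-wide average; the 1,000-person headcount is the illustrative assumption used above:

```python
# Back-of-envelope roll-up of the rework cost cited above.
# $2,232/employee/year is the BetterUp Labs & Stanford figure (HBR, Sep 2025),
# already averaged across the workforce; the headcount is illustrative.
annual_rework_per_employee = 2232  # USD per employee per year
headcount = 1000

total_annual_rework = annual_rework_per_employee * headcount
print(f"Invisible rework for a {headcount:,}-person organisation: "
      f"${total_annual_rework:,}/year")
# → Invisible rework for a 1,000-person organisation: $2,232,000/year
```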

The Comparison

What changes when you measure proficiency instead of completion

Dimension | Completion Tracking | Proficiency Measurement
What it measures | Who finished the course | Who can apply AI effectively in their role
Underlying science | Classical Test Theory at most — percentage-correct scoring | Item Response Theory — the methodology behind major standardised assessments worldwide
Question difficulty | All items treated equally | Each item calibrated for difficulty and discrimination — harder questions tell you more
Score precision | Percentage correct — no error estimate | Confidence interval on every score — you know how precise the estimate is
Comparability | Scores depend on which quiz version was taken | Scores comparable across different forms — because ability is estimated independently of specific items
What it detects | Who engaged with content | Who can evaluate AI output, identify errors, exercise judgment, and manage risk
Gaming resistance | Low — fixed questions, no adaptation | High — unique forms, adaptive difficulty, response pattern analysis, timing monitoring
Growth measurement | Repeated completion measures re-engagement, not growth | Pre/post designed on the same psychometric scale — growth reported only when it exceeds measurement error
Board presentation | “87% completed the programme” | “Advisory is at Competent level. Tax is Developing. Here's where to invest next quarter.”
The analogy | Counting gym memberships | Measuring fitness levels
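The growth-measurement row can be made concrete. A minimal sketch of a reliable-change check, assuming independent pre/post ability estimates on the same psychometric scale; the function name and the example numbers are illustrative, not from any vendor's implementation:

```python
import math

def reliable_growth(theta_pre, se_pre, theta_post, se_post, z=1.96):
    """Report growth only when the pre/post change exceeds what
    measurement error alone could plausibly produce."""
    change = theta_post - theta_pre
    # Standard error of the difference between two independent estimates
    se_diff = math.sqrt(se_pre ** 2 + se_post ** 2)
    return change if abs(change) > z * se_diff else 0.0

# A gain of 0.30 with a standard error of 0.25 on each score is
# indistinguishable from noise...
print(f"{reliable_growth(0.10, 0.25, 0.40, 0.25):.2f}")  # → 0.00
# ...while the same gain measured more precisely counts as real growth.
print(f"{reliable_growth(0.10, 0.08, 0.40, 0.08):.2f}")  # → 0.30
```

This is why a percentage-correct delta between two quiz sittings is not evidence of learning: without an error estimate on each score, there is no way to tell change from noise.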
Why Self-Assessment Fails Too

The people with the largest gaps are the least accurate at identifying them

The instinctive response to the completion gap is often “we'll survey employees on their AI confidence.” The evidence shows why this produces worse data, not better.

Aalto University researchers published findings in Computers in Human Behavior (February 2026) that upend the assumption behind self-assessment. In two studies with approximately 500 participants, they found a reverse Dunning-Kruger effect: higher AI literacy correlated with greater overconfidence, not better self-calibration. Participants using ChatGPT overestimated their correct answers by 4 points out of 20 — a gap larger than the actual performance improvement from using AI. Financial incentives for accurate self-assessment did not correct the bias.

The industry data confirms it at scale. 79% of tech workers admit to pretending they know more about AI than they do (Pluralsight 2025). 81% profess confidence in their AI skills, but only 12% have significant hands-on experience. And 64% of workers pass off AI-generated content as their own (Salesforce 2024) — a behaviour that self-assessment by definition will not reveal.

Asking employees how well they use AI is the equivalent of asking patients to self-diagnose. The people who most need the diagnosis are the least equipped to provide it.

Kruger and Dunning's foundational 1999 research found that bottom-quartile performers rate themselves at the 58th–62nd percentile on average. The gap is structural, not motivational — people lack the very skills needed to recognise their own deficiency. In the context of AI, where outputs appear fluent and authoritative regardless of their accuracy, this metacognitive blind spot is particularly dangerous.

Completion tracking misses the problem. Self-assessment misrepresents it. Standard LMS quizzes — built on Classical Test Theory where all items count equally — lack the precision to detect it. The difference between an LMS quiz and psychometric proficiency measurement is the difference between a pop quiz and a medical board exam: one checks recall, the other measures whether you can practise. Performance-based psychometric assessment measures the capability that matters — with known precision, calibrated difficulty, and scores that mean the same thing regardless of which questions were asked.

The Evidence

Three findings that close the argument

The BCG-Harvard study (Dell'Acqua et al., Organization Science, March 2026) enrolled 758 consultants in a pre-registered randomised controlled trial. AI-proficient workers produced 40% higher-quality output on suitable tasks. Workers who misjudged AI's capability boundary performed 19 percentage points worse than colleagues using no AI at all. And approximately 10% — the “Sleeping Drivers” — passively delegated to AI without exercising judgment, producing the worst outcomes of any group. Training completion could not distinguish proficient users from Sleeping Drivers. Proficiency measurement can.

Gartner's 2025 Strategic Predictions forecast that 75% of hiring processes will include AI proficiency certifications and testing by 2027 — while simultaneously predicting that 50% of organisations will require “AI-free” skills assessments to counter critical-thinking atrophy from generative AI use. Both predictions point in the same direction: the era of treating AI readiness as a training checkbox is ending.

A 2024 systematic review in npj Science of Learning (a Nature portfolio journal) evaluated 16 AI literacy measurement scales across 22 studies and concluded that no psychometrically validated gold standard for measuring AI literacy exists. Most scales demonstrated adequate structural validity, but very few had been tested for cross-cultural validity, measurement error, or criterion validity. The gap between AI adoption and validated proficiency measurement is documented at the highest levels of academic research.

The Shift

From tracking activity to measuring capability

AI proficiency measurement applies the same psychometric science that has been trusted for 60 years in the highest-stakes assessments — from graduate admissions to medical licensing to military selection — to the specific question of how effectively professionals use AI in their work.

The approach differs from completion tracking in three fundamental ways. First, it accounts for question difficulty. Answering a hard question correctly reveals more about proficiency than answering an easy one — a principle that percentage-correct scoring ignores entirely. Second, it produces comparable scores across different test forms. Because ability is estimated independently of the specific items asked, two employees who take different versions of the assessment receive scores on the same scale. Third, every score includes a confidence interval — an explicit estimate of how precise the measurement is, preventing managers from over-interpreting small differences that may be noise.
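A toy sketch of the first and third properties, using the two-parameter logistic (2PL) IRT model. The item parameters and response patterns below are invented for illustration, not drawn from any real assessment:

```python
import math

# Hypothetical item bank: each item has a pre-calibrated
# discrimination (a) and difficulty (b). Values are illustrative.
items = [
    {"a": 1.2, "b": -1.0},  # easy item
    {"a": 0.8, "b": -0.5},
    {"a": 1.5, "b": 0.0},   # medium, highly discriminating
    {"a": 1.0, "b": 0.8},
    {"a": 1.3, "b": 1.5},   # hard item
]

def p_correct(theta, item):
    """2PL model: probability of a correct response at ability theta."""
    return 1.0 / (1.0 + math.exp(-item["a"] * (theta - item["b"])))

def estimate_ability(responses, items, steps=50):
    """Maximum-likelihood ability estimate via Newton-Raphson,
    plus its standard error from the Fisher information."""
    theta = 0.0
    for _ in range(steps):
        grad = sum(i["a"] * (r - p_correct(theta, i))
                   for r, i in zip(responses, items))
        info = sum(i["a"] ** 2 * p_correct(theta, i) * (1 - p_correct(theta, i))
                   for i in items)
        theta += grad / info
    info = sum(i["a"] ** 2 * p_correct(theta, i) * (1 - p_correct(theta, i))
               for i in items)
    return theta, 1.0 / math.sqrt(info)

# Two candidates, both 3 of 5 correct (identical percentage score) --
# one missed the hard items, the other missed the easy ones.
for label, resp in [("missed hard items", [1, 1, 1, 0, 0]),
                    ("missed easy items", [0, 0, 1, 1, 1])]:
    theta, se = estimate_ability(resp, items)
    print(f"{label}: theta = {theta:.2f} +/- {1.96 * se:.2f} (95% CI)")
```

Percentage-correct scoring would call these two candidates identical; the 2PL estimate separates them, because getting hard items right is stronger evidence of ability, and it attaches a confidence interval to each score.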

The result is organisational visibility that completion metrics cannot provide: which teams can deploy AI effectively today, which need targeted development, where the risks of unsupervised AI use are highest, and whether training investments are producing measurable change over time. The board presentation shifts from “87% completed the programme” to “Advisory is at Competent level, Tax is Developing, and here is where the next quarter's investment should go.”

The question is not whether to invest in AI training. It is whether you can verify that the investment is producing the capability your organisation needs.
Common Questions

Completion, proficiency, and measurement

Why doesn't training completion predict AI proficiency?
Completion measures whether someone finished a course — not whether they can apply what they learned. The BCG-Harvard study found no significant performance advantage from training alone. Section AI's hands-on testing found trained employees scored only 40 out of 100. Transfer-of-training research estimates only 10–15% of training transfers to workplace application.
What is the completion trap?
The completion trap describes the enterprise practice of measuring AI readiness through completion rates, course hours, and certificates — metrics that track activity rather than capability. Only 4% of learning leaders can communicate tangible business outcomes of their programmes, and 92% of business leaders cannot see the impact. High completion rates create a false sense of readiness.
What is the difference between Classical Test Theory and Item Response Theory?
Classical Test Theory (used in most LMS quizzes) treats all items equally. Item Response Theory accounts for item difficulty and discrimination, produces ability estimates independent of the specific questions asked, and provides confidence intervals on every score. IRT is the methodology behind major standardised assessments worldwide.
Can self-assessments measure AI proficiency?
Self-assessments are unreliable for AI skills. Aalto University research (February 2026) found higher AI literacy correlates with greater overconfidence, not better calibration. Separately, 79% of tech workers admit to pretending they know more about AI than they do. Performance-based psychometric measurement is the evidence-based alternative.
Can LMS quizzes measure AI proficiency?
Standard LMS quizzes use Classical Test Theory with basic percentage-correct scoring. They cannot account for question difficulty, adapt to the test-taker's level, produce comparable scores across versions, or provide confidence intervals. For a multidimensional capability like AI proficiency — spanning prompting, evaluation, judgment, and responsible use — the measurement instrument needs to match the complexity of the construct.

References

BetterUp Labs & Stanford Social Media Lab. (2025). The hidden toll of AI-generated work. Harvard Business Review, September 2025.

Dell'Acqua, F., McFowland, E., Mollick, E.R., et al. (2026). Navigating the jagged technological frontier. Organization Science. DOI: 10.1287/orsc.2025.21838.

Fernandes, D., Welsch, R., et al. (2026). The effects of AI on metacognitive accuracy. Computers in Human Behavior.

Ford, J.K., Baldwin, T.T., & Prasad, J. (2018). Transfer of training: The known and the unknown. Annual Review of Organizational Psychology and Organizational Behavior, 5, 201–225.

Gartner. (2025). Top Strategic Predictions for 2026 and Beyond. Gartner Research.

Kruger, J., & Dunning, D. (1999). Unskilled and unaware of it. Journal of Personality and Social Psychology, 77(6), 1121–1134.

Nature. (2024). Systematic review of AI literacy measurement instruments. npj Science of Learning.

Pluralsight. (2025). 2025 AI Skills Report: Mind the Confidence Gap.

Salesforce. (2024). Trends in AI for CRM. Salesforce Research.

Section AI. (2026). 2026 AI Proficiency Report. Section.

See what proficiency measurement looks like in practice

The methodology page explains how psychometric proficiency measurement works in practice — and how it differs from every other approach to assessing AI readiness. The research page presents the full evidence base.