Assessment After Proxy Collapse
Assessment after proxy collapse is a developing problem in educational assessment. It describes what happens when a traditional assignment artifact stops working as reliable evidence of student understanding because generative AI makes the artifact easier to produce without the intended learning.
The phrase "proxy collapse" is useful practitioner language, but the underlying issue is older than generative AI. Assessment designers have long distinguished between the visible task a student completes and the construct the task is meant to measure. A written reflection, essay, quiz answer, or project report may stand in for understanding, judgment, or skill. When the link between the artifact and the intended construct weakens, the assessment interpretation becomes less secure.
Generative AI changes this relationship because it can help produce plausible written, visual, analytic, or conversational outputs. The central question is not whether students use AI. It is what evidence an assessment now provides, what claims instructors can reasonably make from that evidence, and what review, policy, and design changes are needed.
Background
Educational assessment depends on evidence. An assessment task asks students to do something observable so instructors can infer something less directly observable, such as understanding, reasoning, fluency, judgment, or skill.
The Standards for Educational and Psychological Testing frame assessment quality around evidence for the intended interpretation and use of scores, including validity, reliability, fairness, administration, and scoring. These concepts matter more when technology changes how students produce assessment artifacts.
Authentic assessment offers one response. Grant Wiggins argued for assessment based on meaningful performance tasks rather than only indirect indicators. Authentic tasks can make learning more visible, but they are not automatically immune to generative AI. If an AI system can produce the visible artifact, the instructor still has to decide what evidence remains: the final product, the process, the defense, the revision history, the live explanation, or some combination of these.
Generative AI and Proxy Assignments
A proxy assignment is not necessarily a bad assignment. Many assignments use a visible artifact as evidence for a broader learning goal. A reflection paper may stand in for engagement with a reading. A business memo may stand in for analysis and judgment. A programming assignment may stand in for systems understanding. A project report may stand in for design reasoning.
Proxy collapse occurs when the cost of producing the artifact falls so sharply that the artifact can no longer carry the evidence value it once had. Generative AI can create fluent text, summarize material, draft code, simulate analysis, or produce plausible explanations. When students can submit these outputs without doing the intended cognitive work, the artifact alone becomes a weaker basis for inference.
This does not make written assignments useless. It does mean that some assignments need a clearer answer to two questions: what is being assessed, and how will the instructor know? In some cases, the response may be to redesign the task. In others, it may be to add process evidence, oral defense, supervised work, version history, peer critique, or explicit AI-use documentation.
Assessment Responses
Assessment reform in the age of generative AI is not one technique. It is a set of design choices.
One response is to make assessment more authentic: ask students to perform tasks that more closely resemble the actual use of knowledge or skill. Another is to make the process visible through drafts, logs, annotated decisions, oral explanation, or live work. A third is to separate learning activity from certification activity, allowing AI-supported practice while reserving certain demonstrations for controlled or defended settings.
Institutional guidance from TEQSA, UNESCO, and the U.S. Department of Education points toward assessment designs that preserve human responsibility, align AI use with learning goals, and address equity, privacy, and transparency. These sources do not imply that every course needs the same policy. They do suggest that assessment decisions should be explicit rather than hidden inside inherited assignment formats.
AI-Mediated Oral and Interview Assessments
Oral exams and interview-style assessments are one possible response to proxy collapse. They can ask students to explain reasoning in real time, respond to follow-up questions, and demonstrate whether they understand the submitted work.
A 2026 preprint describes scalable voice-AI oral assessments that use dynamic questioning from instructor-defined rubrics. Teaching centers have also discussed renewed interest in oral exams as generative AI changes the trust model for written work. These examples are early. They show that AI-mediated oral assessment is possible, not that it is settled.
Oral and interview assessments introduce their own design problems. They can create stress for students, require accessibility accommodations, generate sensitive records, and raise questions about consistency across students. If an AI interviewer is involved, instructors also need to decide how prompts are written, how transcripts are stored, how scoring is reviewed, and what happens when a student challenges the result.
Rubrics, Calibration, and Human Review
Rubrics become more important when AI assists with assessment. A rubric can clarify what counts as evidence, how different performance levels are distinguished, and how an instructor or AI system should respond to student work. Rubrics do not remove judgment. They structure it.
Instructor calibration is the process of aligning scoring criteria, examples, feedback, and review practices. For AI-supported assessment, calibration may include benchmark responses, sample interviews, instructor overrides, prompt revisions, and audit trails. If the AI system asks questions or suggests scores, the instructor still needs a basis for trusting, correcting, or rejecting those outputs.
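One way to make calibration concrete is to compare AI-suggested scores with instructor scores on a shared set of benchmark responses. The sketch below is illustrative only. It assumes a single rubric criterion scored on discrete levels, and the function names and sample scores are hypothetical, not drawn from any particular tool. It reports raw agreement, Cohen's kappa (agreement corrected for chance), and the benchmark items an instructor would want to inspect.

```python
from collections import Counter


def exact_agreement(instructor_scores, ai_scores):
    """Share of benchmark responses where both scores match exactly."""
    if len(instructor_scores) != len(ai_scores) or not instructor_scores:
        raise ValueError("score lists must be non-empty and the same length")
    matches = sum(1 for h, a in zip(instructor_scores, ai_scores) if h == a)
    return matches / len(instructor_scores)


def cohens_kappa(instructor_scores, ai_scores):
    """Agreement corrected for the agreement expected by chance alone."""
    n = len(instructor_scores)
    observed = exact_agreement(instructor_scores, ai_scores)
    human_counts = Counter(instructor_scores)
    ai_counts = Counter(ai_scores)
    # Chance agreement: product of each rater's marginal rate, summed over levels.
    expected = sum(human_counts[lvl] * ai_counts[lvl] for lvl in human_counts) / (n * n)
    if expected == 1.0:
        # Both raters used one identical level throughout; kappa is undefined,
        # so this sketch treats the case as perfect agreement (an assumption).
        return 1.0
    return (observed - expected) / (1 - expected)


def disagreements(instructor_scores, ai_scores):
    """Indices of benchmark responses that need instructor review."""
    return [i for i, (h, a) in enumerate(zip(instructor_scores, ai_scores)) if h != a]


# Hypothetical rubric levels (1-4) for eight benchmark responses.
instructor = [3, 2, 4, 3, 1, 2, 3, 4]
ai_suggested = [3, 2, 3, 3, 1, 2, 4, 4]

print(exact_agreement(instructor, ai_suggested))         # 0.75
print(round(cohens_kappa(instructor, ai_suggested), 3))  # 0.652
print(disagreements(instructor, ai_suggested))           # [2, 6]
```

An agreement figure does not decide anything on its own. It gives the instructor a starting point for revising prompts, adjusting rubric language, or rejecting AI-suggested scores for a given criterion.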
Human review is especially important for high-stakes decisions. AI can assist with questioning, summarizing, feedback, or preliminary scoring, but final grades, misconduct decisions, accommodations, and appeals should remain tied to accountable human processes.
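The division of labor described here can be written down as an explicit routing rule rather than left implicit in a tool's defaults. The following is a minimal sketch under stated assumptions: the step names and the `decision_owner` function are hypothetical labels chosen for illustration, and a real institution would define its own categories.

```python
# Steps where AI may assist, per the paragraph above (labels are hypothetical).
AI_ASSISTABLE = frozenset({"questioning", "summary", "feedback", "preliminary_score"})

# Steps that stay with an accountable human process.
HUMAN_DECIDED = frozenset({"final_grade", "misconduct", "accommodation", "appeal"})


def decision_owner(step: str) -> str:
    """Return who owns a given assessment step in this illustrative policy."""
    if step in HUMAN_DECIDED:
        return "accountable_human"
    if step in AI_ASSISTABLE:
        return "ai_may_assist"
    # Unrecognized steps default to a person rather than to automation.
    return "accountable_human"


print(decision_owner("preliminary_score"))  # ai_may_assist
print(decision_owner("final_grade"))        # accountable_human
print(decision_owner("new_unlisted_step"))  # accountable_human
```

The design choice worth noting is the default: anything not explicitly listed as AI-assistable falls back to human ownership.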
Policy, Privacy, Fairness, and Accessibility
Assessment after proxy collapse is not only a teaching-design problem. It is also a policy and rights problem.
AI-mediated assessment can create new data: prompts, transcripts, recordings, model outputs, scores, feedback, and reviewer notes. In U.S. educational settings, student privacy obligations and education-record rights may be relevant. In any setting, institutions need to decide what is stored, who can inspect it, how long it is retained, and how students can challenge errors.
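These storage, access, and retention decisions can be recorded as explicit configuration so they are reviewable rather than implied. The sketch below is an assumption-laden illustration: the record kinds, roles, retention periods, and function names are all hypothetical, and none of the numbers reflect legal or institutional requirements.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class RecordPolicy:
    retention_days: int          # how long the record is kept (illustrative value)
    viewers: frozenset           # roles allowed to inspect the record
    student_can_challenge: bool  # whether a challenge process applies


# Hypothetical policy table; every value here is a placeholder.
POLICIES = {
    "transcript": RecordPolicy(365, frozenset({"student", "instructor"}), True),
    "recording": RecordPolicy(90, frozenset({"student", "instructor"}), True),
    "score": RecordPolicy(1825, frozenset({"student", "instructor", "registrar"}), True),
}


def can_inspect(kind: str, role: str) -> bool:
    """True only if the record kind is known and the role is listed as a viewer."""
    policy = POLICIES.get(kind)
    return policy is not None and role in policy.viewers


def is_due_for_deletion(kind: str, created: date, today: date) -> bool:
    """True once the retention window for this record kind has fully passed."""
    policy = POLICIES[kind]
    return today > created + timedelta(days=policy.retention_days)


print(can_inspect("transcript", "student"))   # True
print(can_inspect("recording", "registrar"))  # False
print(is_due_for_deletion("recording", date(2026, 1, 1), date(2026, 4, 2)))  # True
```

Writing the policy as data makes it possible for students, instructors, and reviewers to see the same rules that the system applies.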
Fairness and accessibility also matter. A live oral exam may advantage some students and disadvantage others. A voice interface may create barriers for students with speech, hearing, language, anxiety, or disability-related needs. Digital assessment systems should be designed with accessibility principles and learner variability in mind, not treated as neutral by default.
The NIST AI Risk Management Framework provides broader language for AI system risk, describing trustworthy AI in terms of validity and reliability, safety, security and resilience, accountability and transparency, explainability and interpretability, privacy, and fairness. These categories map well onto AI-supported assessment, where the harm is not only a wrong answer but a wrong educational judgment.
Source Session Anchor
This topic was sparked by a June 2026 RaidGuild fireside session with Adam Kerpelman, a University of Virginia McIntire School of Commerce professor and assistant director of student entrepreneurship. Kerpelman described how written and reflective assignments can stop functioning as reliable proxies when AI makes plausible output cheap to produce. He also described experimenting with Claude-based oral or interview-style assessments to evaluate whether students understand course material at classroom scale.
The session is a useful practitioner anchor. It should not be treated as the only source for the topic. The broader page depends on assessment literature, institutional guidance, AI risk frameworks, privacy guidance, accessibility guidance, and emerging examples of AI-mediated oral assessment.
Open Questions
Several questions remain open:
- Should proxy collapse become the page title, or should it be treated as a practitioner phrase inside a broader page on assessment validity after generative AI?
- Which assessment formats provide credible evidence of understanding without creating excessive surveillance, stress, bias, or accessibility burdens?
- When AI is used to interview, score, summarize, or give feedback, what parts of the process require human review?
- How should instructors document prompts, model behavior, rubrics, overrides, and appeals?
- How should students inspect or challenge AI-generated assessment records?
Related Topics
- AI-mediated oral exams
- Instructor calibration for AI assessment
- Academic integrity policy after generative AI
- Authentic assessment
- AI-assisted grading
- Student privacy in AI assessment
- Accessible assessment design