RaidGuild Cohort

Assessment After Proxy Collapse

A source-backed wiki draft on how generative AI weakens some artifact-based assessment proxies and pushes educators toward clearer evidence of student understanding, including oral assessment, calibration, privacy, fairness, and accessibility concerns.

Reviewed | Confidence: medium | public

Assessment after proxy collapse is a developing problem in educational assessment. It describes what happens when a traditional assignment artifact stops working as reliable evidence of student understanding because generative AI makes the artifact easier to produce without the intended learning.

The phrase "proxy collapse" is useful practitioner language, but the underlying issue is older than generative AI. Assessment designers have long distinguished between the visible task a student completes and the construct the task is meant to measure. A written reflection, essay, quiz answer, or project report may stand in for understanding, judgment, or skill. When the link between the artifact and the intended construct weakens, the assessment interpretation becomes less secure.

Generative AI changes this relationship because it can help produce plausible written, visual, analytic, or conversational outputs. The central question is not whether students use AI. It is what evidence an assessment now provides, what claims instructors can reasonably make from that evidence, and what review, policy, and design changes are needed.

Background

Educational assessment depends on evidence. An assessment task asks students to do something observable so instructors can infer something less directly observable, such as understanding, reasoning, fluency, judgment, or skill.

The Standards for Educational and Psychological Testing frame assessment quality around evidence for the intended interpretation and use of scores, including validity, reliability, fairness, administration, and scoring. These concepts matter more when technology changes how students produce assessment artifacts.

Authentic assessment offers one response. Grant Wiggins argued for assessment based on meaningful performance tasks rather than only indirect indicators. Authentic tasks can make learning more visible, but they are not automatically immune to generative AI. If an AI system can produce the visible artifact, the instructor still has to decide what evidence remains: the final product, the process, the defense, the revision history, the live explanation, or some combination of these.

Generative AI and Proxy Assignments

A proxy assignment is not necessarily a bad assignment. Many assignments use a visible artifact as evidence for a broader learning goal. A reflection paper may stand in for engagement with a reading. A business memo may stand in for analysis and judgment. A programming assignment may stand in for systems understanding. A project report may stand in for design reasoning.

Proxy collapse occurs when the cost of producing an artifact falls so far that the artifact no longer reliably indicates the learning it was meant to show. Generative AI can create fluent text, summarize material, draft code, simulate analysis, or produce plausible explanations. When students can submit these outputs without doing the intended cognitive work, the artifact alone becomes a weaker basis for inference.

This does not make written assignments useless. It does mean that some assignments need a clearer answer to what is being assessed and how the instructor will know. In some cases, the response may be to redesign the task. In others, it may be to add process evidence, oral defense, supervised work, version history, peer critique, or explicit AI-use documentation.
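The weakening of inference described in this section can be illustrated with a small Bayes' rule calculation. This is a minimal sketch, not a method taken from the cited sources: the function name and every probability below are hypothetical, chosen only to show the direction of the effect.

```python
def posterior_understanding(prior, p_artifact_if_understands, p_artifact_if_not):
    """Bayes' rule: P(student understands | an acceptable artifact was submitted)."""
    evidence = (prior * p_artifact_if_understands
                + (1 - prior) * p_artifact_if_not)
    return prior * p_artifact_if_understands / evidence


# Hypothetical numbers for illustration only.
# Before cheap AI output: a student who does not understand the material
# rarely produces an acceptable artifact (assumed 10% of the time).
before = posterior_understanding(0.5, 0.9, 0.1)

# After: the same student can often submit a plausible artifact (assumed 80%).
after = posterior_understanding(0.5, 0.9, 0.8)

print(round(before, 2), round(after, 2))
```

Under these assumed numbers, an acceptable artifact moves the instructor's belief from 0.50 to about 0.90 in the first case but only to about 0.53 in the second. The artifact has not changed; what it licenses the instructor to conclude has.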

Assessment Responses

Assessment reform in the age of generative AI is not one technique. It is a set of design choices.

One response is to make assessment more authentic: ask students to perform tasks that more closely resemble the actual use of knowledge or skill. Another is to make the process visible through drafts, logs, annotated decisions, oral explanation, or live work. A third is to separate learning activity from certification activity, allowing AI-supported practice while reserving certain demonstrations for controlled or defended settings.

Institutional guidance from TEQSA, UNESCO, and the U.S. Department of Education points toward assessment designs that preserve human responsibility, align AI use with learning goals, and address equity, privacy, and transparency. These sources do not imply that every course needs the same policy. They do suggest that assessment decisions should be explicit rather than hidden inside inherited assignment formats.

AI-Mediated Oral and Interview Assessments

Oral exams and interview-style assessments are one possible response to proxy collapse. They can ask students to explain reasoning in real time, respond to follow-up questions, and demonstrate whether they understand the submitted work.

A 2026 preprint describes scalable voice-AI oral assessments that use dynamic questioning from instructor-defined rubrics. Teaching centers have also discussed renewed interest in oral exams as generative AI changes the trust model for written work. These examples are early. They show that AI-mediated oral assessment is possible, not that it is settled.

Oral and interview assessments introduce their own design problems. They can create stress for students, require accessibility accommodations, generate sensitive records, and raise questions about consistency across students. If an AI interviewer is involved, instructors also need to decide how prompts are written, how transcripts are stored, how scoring is reviewed, and what happens when a student challenges the result.

Rubrics, Calibration, and Human Review

Rubrics become more important when AI assists with assessment. A rubric can clarify what counts as evidence, how different performance levels are distinguished, and how an instructor or AI system should respond to student work. Rubrics do not remove judgment. They structure it.

Instructor calibration is the process of aligning scoring criteria, examples, feedback, and review practices. For AI-supported assessment, calibration may include benchmark responses, sample interviews, instructor overrides, prompt revisions, and audit trails. If the AI system asks questions or suggests scores, the instructor still needs a basis for trusting, correcting, or rejecting those outputs.
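One way to give the instructor such a basis is to compare AI-suggested rubric scores with instructor scores on a set of benchmark responses. The sketch below is an illustration under stated assumptions, not a procedure from the cited sources: the scores are made-up data on an assumed 1 to 4 rubric scale, and exact agreement plus Cohen's kappa are swapped in here as common agreement measures.

```python
from collections import Counter


def exact_agreement(a, b):
    """Fraction of benchmark responses where both raters gave the same score."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohens_kappa(a, b):
    """Agreement corrected for the agreement expected by chance."""
    n = len(a)
    observed = exact_agreement(a, b)
    count_a, count_b = Counter(a), Counter(b)
    expected = sum(count_a[k] * count_b[k] for k in set(a) | set(b)) / (n * n)
    if expected == 1.0:  # both raters used a single identical score throughout
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical benchmark scores (rubric levels 1-4), for illustration only.
instructor = [3, 2, 4, 1, 3, 2, 4, 3]
ai_suggested = [3, 2, 3, 1, 3, 3, 4, 3]

agreement = exact_agreement(instructor, ai_suggested)
kappa = cohens_kappa(instructor, ai_suggested)

# Indices of benchmark responses the instructor should re-read before
# revising the prompt or the rubric wording.
to_review = [i for i, (x, y) in enumerate(zip(instructor, ai_suggested)) if x != y]
```

The disagreement list matters as much as the summary numbers: each mismatch is a concrete case for deciding whether the rubric, the prompt, or the benchmark score needs revising. What level of agreement counts as acceptable is a local policy decision that this sketch does not settle.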

Human review is especially important for high-stakes decisions. AI can assist with questioning, summarizing, feedback, or preliminary scoring, but final grades, misconduct decisions, accommodations, and appeals should remain tied to accountable human processes.
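The rule in the paragraph above can be sketched as a simple gate in an assessment workflow. Everything here is hypothetical: the `Decision` record, the `HIGH_STAKES` categories, and the status labels are illustrative names, and a real institution would define its own categories and review process.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical decision kinds that must never be finalized by AI output alone.
HIGH_STAKES = {"final_grade", "misconduct", "accommodation", "appeal"}


@dataclass
class Decision:
    kind: str                               # e.g. "practice_feedback", "final_grade"
    ai_suggestion: Optional[str] = None     # preliminary output from an AI system
    reviewer: Optional[str] = None          # accountable human, if one has reviewed
    reviewer_outcome: Optional[str] = None  # what that human decided


def resolve(decision: Decision) -> Tuple[str, Optional[str]]:
    """Return (status, outcome) without letting AI finalize high-stakes kinds."""
    reviewed = (decision.reviewer is not None
                and decision.reviewer_outcome is not None)
    if decision.kind in HIGH_STAKES:
        if not reviewed:
            return ("pending_human_review", None)
        return ("final", decision.reviewer_outcome)
    # Lower-stakes work: a human outcome wins when present; otherwise the
    # AI suggestion is passed along, explicitly labeled as advisory.
    if reviewed:
        return ("final", decision.reviewer_outcome)
    return ("advisory", decision.ai_suggestion)
```

For example, `resolve(Decision("final_grade", ai_suggestion="B+"))` stays in `pending_human_review` until a named reviewer records an outcome, and the recorded outcome, not the AI suggestion, becomes final.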

Policy, Privacy, Fairness, and Accessibility

Assessment after proxy collapse is not only a teaching-design problem. It is also a policy and rights problem.

AI-mediated assessment can create new data: prompts, transcripts, recordings, model outputs, scores, feedback, and reviewer notes. In U.S. educational settings, student privacy obligations and education-record rights, such as those under FERPA, may be relevant. In any setting, institutions need to decide what is stored, who can inspect it, how long it is retained, and how students can challenge errors.

Fairness and accessibility also matter. A live oral exam may advantage some students and disadvantage others. A voice interface may create barriers for students with speech, hearing, language, anxiety, or disability-related needs. Digital assessment systems should be designed with accessibility principles and learner variability in mind, not treated as neutral by default.

The NIST AI Risk Management Framework provides a broader language for AI system risk: validity and reliability, safety, accountability, transparency, explainability, privacy, and fairness. These categories map well onto AI-supported assessment, where the harm is not only a wrong answer but a wrong educational judgment.

Source Session Anchor

This topic was sparked by a June 2026 RaidGuild fireside session with Adam Kerpelman, a University of Virginia McIntire School of Commerce professor and assistant director of student entrepreneurship. Kerpelman described how written and reflective assignments can stop functioning as reliable proxies when AI makes plausible output cheap to produce. He also described experimenting with Claude-based oral or interview-style assessments to evaluate whether students understand course material at classroom scale.

The session is a useful practitioner anchor. It should not be treated as the only source for the topic. The broader page depends on assessment literature, institutional guidance, AI risk frameworks, privacy guidance, accessibility guidance, and emerging examples of AI-mediated oral assessment.

Open Questions

Several questions remain open.

Should proxy collapse become the page title, or should it be treated as a practitioner phrase inside a broader page on assessment validity after generative AI?

Which assessment formats provide credible evidence of understanding without creating excessive surveillance, stress, bias, or accessibility burdens?

When AI is used to interview, score, summarize, or give feedback, what parts of the process require human review?

How should instructors document prompts, model behavior, rubrics, overrides, and appeals?

How should students inspect or challenge AI-generated assessment records?

Related Topics

- AI-mediated oral exams

- Instructor calibration for AI assessment

- Academic integrity policy after generative AI

- Authentic assessment

- AI-assisted grading

- Student privacy in AI assessment

- Accessible assessment design

Key Claims

Traditional assignments can function as indirect evidence or proxies for student understanding rather than direct demonstrations of the intended construct.
Sources: Standards for Educational and Psychological Testing; Wiggins, The Case for Authentic Assessment

Generative AI can weaken the evidentiary value of some unsupervised written or reflective assignments by making plausible artifacts cheaper to produce.
Sources: Kerpelman fireside session; TEQSA assessment reform; BJET generative AI and authentic assessment article

The phrase proxy collapse is useful practitioner framing but should be tied to established assessment concepts such as validity, reliability, fairness, and assessment consequences.
Sources: Kerpelman fireside session plus assessment standards

Oral or interview-style assessment is one possible response, but AI-mediated oral exams remain an emerging practice with unresolved reliability, privacy, accessibility, and human-review questions.
Sources: Voice AI oral assessment preprint; Duke Kunshan and Penn teaching resources; NIST AI RMF

AI-supported assessment requires explicit rubric design, instructor calibration, human accountability, privacy review, and accessibility design.
Sources: Frontiers rubric article; UNESCO guidance; U.S. Department of Education AI report; NIST AI RMF; FERPA and WCAG/UDL resources

Open Questions

  • Should proxy collapse remain the title, or should the final title use a more established assessment-validity phrase?
  • Which assessment formats provide credible evidence without creating disproportionate surveillance, stress, bias, or accessibility burdens?
  • What level of human review is required before AI-mediated interview evidence affects grades?
  • How should students inspect or challenge AI-generated assessment records?

Prompts

Course assessment redesign prompt

Given a course learning objective and a current written assignment, identify which parts of the assignment are direct evidence, which are proxy evidence, and how generative AI changes the evidentiary value of each part.

AI oral exam design prompt

Design a low-stakes oral assessment plan that tests understanding without turning the interview transcript into the only grading evidence. Include rubric criteria, human review, accommodation paths, and appeal handling.

Sibling Topics

- LLM-as-Judge Evaluation: Using language models as evaluators while preserving calibration and review.

- Human-Calibrated Assessment Workflows: Calibration, review, and reliability in human-guided AI assessment.

- AI-Assisted Grading: Rubrics, educator review, privacy, fairness, and grading reliability.

Source Artifacts

- Session: Portal event: June Cohort Fireside Chats (Adam Kerpelman)
