New Delhi: Law professors prefer AI-generated answers for their students over responses written by fellow instructors approximately 75 per cent of the time, a new study has found. The findings came as a surprise to the researchers themselves.
The study, Law Professors Prefer AI Over Peer Answer, was published on 27 May. It was led by Stanford Law School professor Julian Nyarko.
“This study challenges important assumptions about AI’s role in legal education. We focused on law precisely because it requires judgment, nuanced reasoning, and the ability to navigate ambiguity—not just factual recall,” said Nyarko.
‘Surprised by the results’
The study enrolled sixteen contract law professors from fourteen US law schools, all of whom teach from the same casebook. Each professor wrote short answers to office-hours-style questions. The same questions were then put to two AI systems, Google’s Gemini 2.5 Pro and its casebook-grounded variant, NotebookLM. The participating professors then evaluated 2,918 anonymised answer pairs, selecting which response they would rather give to a student.
The results were decisive. Professors preferred the answers from the large language models (LLMs) over those of their peers about 75 per cent of the time, with AI performing as well as the best-rated instructor. Every professor preferred AI answers over human ones more than half the time.
“We were frankly surprised by the magnitude of the results. These weren’t just simple questions with obvious answers. Many of them required synthesizing complex material, applying it to new situations, and explaining legal concepts in ways that would help students develop their own analytical skills,” Nyarko added.
That AI outperformed humans on hypotheticals, and not just on straightforward recall questions, is what makes the findings particularly significant. Hypotheticals are considered the most intellectually challenging questions in law schools, as they require applying legal rules to novel facts rather than simply recalling doctrines. LLMs proved effective at this critical application of doctrine.
Furthermore, AI responses were notably safer. Professors flagged LLM answers as likely to harm student learning only about 3.5 per cent of the time, compared with an average of 12 per cent for answers from human instructors.
The study also examined whether the preference for LLMs was driven by writing style, length or clarity. While these textual features did offer an advantage, their influence was minimal.
Responses aligned to legal profession
One of the most theoretically interesting findings lies in what the study calls a “shared professional standard.” When multiple professors evaluated the same comparisons, they were in agreement with each other. The study argues that this proves that the AI responses were genuinely aligned with the norms and standards of the legal profession.
To extend their comparison beyond the two models evaluated by human judges, the researchers used Meta’s Llama-4 Maverick, another AI model, as an automated evaluator. Across eleven AI systems tested, Claude Opus 4.7 ranked highest overall, followed by ChatGPT 5.4 and Gemini 2.5 Pro. Every AI model outperformed human instructors on average.
The research noted NotebookLM’s poor performance in comparison to other LLMs. The Google AI model, which answers questions by drawing on the specific documents supplied to it in order to prevent off-topic hallucinations, performed worse than the stock version of Gemini on which it was built. The researchers suggest that feeding the model a casebook may have introduced noise rather than focus.
While the research probes the question of whether AI might have surpassed professors, the authors note that the quality of a response is not the same as its impact on learning.
“We therefore treat our results as an encouraging first indication that LLMs can reflect a shared professional standard in short-answer Contracts pedagogy, not as proof of improved student outcomes, and not as evidence regarding richer tutoring interactions,” reads the research discussion.
(Edited by Saptak Datta)

