Australian researchers test performance of ChatGPT on optometry exam questions
Scientists find the latest version of the AI tool “excelled” across a range of optometry and vision science written questions
06 August 2025
Researchers from the University of New South Wales have described the performance of a large language model (LLM) across a variety of optometry and vision science written response questions.
Writing in Ophthalmic and Physiological Optics, scientists highlighted that earlier models of ChatGPT (GPT-3.5 and GPT-4) demonstrated “variable but generally passable performance” across the set of sample questions – which included past written exam questions.
The latest version of ChatGPT (o1) “excelled across all questions,” the authors noted.
“The results of the study have shown that LLMs are able to generate satisfactory responses to various assessment questions in the field of optometry and vision science, and in many cases excel at these,” the researchers highlighted.
“Subsequent models showed significantly greater capabilities over preceding models,” they added.
The authors also assessed the performance of ChatGPT as a grader of written questions, by exploring the concordance between the AI tool and a human grader.
They found that while ChatGPT graders generally awarded higher marks than human graders, the difference was statistically significant only for GPT-3.5.
“The result of the study suggests there is an urgent need for optometry and vision science educators to adopt new learning and teaching strategies in the ‘ChatGPT-era’,” the researchers stated.
Comments (1)
Don Williams, 13 August 2025
This is a valuable and timely study. It confirms what many of us have observed at the coalface: model quality has moved from variable to consistently strong, with o1 now producing coherent, clinically plausible written answers to optometry and vision science questions. That is encouraging for educators who want richer explanations, rapid feedback and new ways to support learning.
The caution is that written response performance is not the same as safe clinical reasoning. Real practice involves incomplete data, image interpretation, atypical presentations and choices under uncertainty. There is also a prompt effect: the model’s best work often reflects a well-structured prompt. If the authors used excellent prompts, results may overstate what an average student or educator will obtain. Uneven prompt skill can introduce equity issues and inflate perceived competence.
For education, the response should be pragmatic. Redesign assessments to reveal thinking and judgement through viva-style questioning, data interpretation with OCT and fields, and supervised case work. If AI is used for formative marking, keep human moderation, clear rubrics and version transparency. Teach students to critique model outputs against primary sources and local policy, disclose any AI assistance and show their reasoning.
In short, this is good news for teaching and learning. Treat the models as accelerators for explanation and practice, not as substitutes for expertise or accountability.