Generative AI in assessing written responses of geography exams: challenges and potential - UTU Research Portal

A1 Refereed original research article in a scientific journal

Generative AI in assessing written responses of geography exams: challenges and potential

Authors: Jauhiainen, Jussi S.; Garagorry Guerra, Agustín; Nylén, Tua; Mäki, Sanna

Publisher: Informa UK Limited

Publication year: 2025

Journal: Journal of Geography in Higher Education

ISSN: 0309-8265

eISSN: 1466-1845

DOI: https://doi.org/10.1080/03098265.2025.2593484

Publication's open availability at the time of reporting: Open Access

Publication channel's open availability: Partially Open Access publication channel

Web address: https://doi.org/10.1080/03098265.2025.2593484

Self-archived copy’s web address: https://research.utu.fi/converis/portal/detail/Publication/505817244

Abstract

This article examines the application of large language models (LLMs) – GPT-4, Claude, Cohere, and Llama – to assessing students’ open-ended responses in Geography exams. The models’ scores were compared with the scores from the original multi-stage human assessment and from two additional human expert scorings. The case study considers the high-stakes national matriculation exam in Finland. The exam results play a crucial role in determining individuals’ eligibility for higher education, including the right to study Geography at university. We selected 18 essays that had originally been given 5 (basic), 10 (good), or 15 (excellent) points on a scale from 0 to 15. Findings show variability between LLMs and notable differences between LLM and human evaluations. The language of the responses and the grading instructions influenced LLM performance. These results highlight both the potential and the complexities of currently integrating generative AI into learning assessments to score open-ended responses. Precise control of prompts and LLM settings proved crucial for aligning LLM scores more closely with the original assessment scores.

Downloadable publication

This is an electronic reprint of the original article.
This reprint may differ from the original in pagination and typographic detail. Please cite the original version.

Generative AI in assessing written responses of geography exams challenges and potential.pdf