A4 Refereed article in a conference publication
BeLLE: Exploring the Accuracy of the Task Difficulty Ratings in the Bebras Challenge
Authors: Kaarto, Heidi; Niemensivu, Timi; Dagienė, Valentina; Lehtonen, Daranee; Laakso, Mikko-Jussi; Van Hoof, Jo
Editors: Staub, Jacqueline; Singla, Adish
Conference name: International Conference on Informatics in Schools: Situation, Evolution, and Perspectives
Publisher: Springer Science and Business Media Deutschland GmbH
Publication year: 2025
Journal: Lecture Notes in Computer Science
Book title: Informatics in Schools: Fostering Problem-Solving, Creativity, and Critical Thinking Through Computer Science Education: 18th International Conference on Informatics in Schools: Situation, Evolution, and Perspectives, ISSEP 2025
Volume: 15958
First page: 167
Last page: 181
ISBN: 978-3-032-01221-0
eISBN: 978-3-032-01222-7
ISSN: 0302-9743
eISSN: 1611-3349
DOI: https://doi.org/10.1007/978-3-032-01222-7_13
Web address: https://doi.org/10.1007/978-3-032-01222-7_13
Informatics has been adopted more systematically in many curricula around the world to prepare school students for the ever-changing digital world. One prominent initiative is the annual Bebras Challenge on Informatics and Computational Thinking, created in Lithuania in 2004. Since then, the Challenge has grown into an international phenomenon, and in 2021 an international research consortium, BeLLE, was created by adopting the digital learning environment ViLLE for the Challenge. Prior to the Challenge, tasks are assigned difficulty ratings (easy, medium, hard) both by the Bebras Community and by the organizers in each country. The Community ratings guide the organizers, while the country ratings determine the points awarded or deducted based on students’ answers; it is therefore vital that the difficulty ratings be as accurate as possible. This study investigates the accuracy of these ratings by comparing them with student performance, with a particular focus on determining whether the Bebras Community or the country ratings are more accurate. Data from two years of the Challenge (2022 and 2023) are used, comprising 99 tasks answered by more than 200,000 primary and secondary school students in eleven countries that are part of the BeLLE Consortium. Item Response Theory (IRT) results show that the Challenge is generally quite difficult and that the Community difficulty categories overlap considerably, with only the easy category differing statistically from the other two. For the country difficulty ratings, all three categories differ from each other, and a regression model indicates that the country ratings explain the IRT difficulty estimates more accurately. This implies that the organizers in each country should evaluate the difficulty of the tasks themselves rather than adopt the Community ratings as they are.
Funding information in the publication:
The present study is part of the EDUCA Flagship funded by the Research Council of Finland (#358924, #358947) and the EDUCA-Doc Doctoral Education pilot funded by the Ministry of Education and Culture (Doctoral school pilot #VN/3137/2024-OKM-4).