Automatic language proficiency assessment of written texts: Training a CEFR classifier in L2 Finnish
Jenny Tarvainen, Ida Toivanen & Ari Huhta, University of Jyväskylä, Finland
https://doi.org/10.58379/YWAV5140
Volume 14, Issue 2, 2025
Abstract: This paper explores the automatic assessment of written language proficiency. We trained a CEFR classifier for texts written by learners of Finnish as a second or foreign language (L2). The study investigates to what extent human ratings can be modelled with the available L2 Finnish data, whether the accuracy of the predictions varies across CEFR levels, and whether the observed misclassifications can be explained. The FinBERT model was trained on the largest available CEFR-annotated datasets for L2 Finnish: ICLFI, LAS2, CEFLING, and TOPLING, which represent a range of genres, first-language backgrounds, ages, and genders. The results are promising, with an F1-score of 72.7% and a Pearson correlation of .86 between machine predictions and human assessors. Learners' gender was not related to classification accuracy, but learners' L1 background may have some effect. Text length, however, appeared to cause misclassifications: unusually short or long samples were often assessed lower or higher than expected, respectively. While more annotated data is needed to train a model accurate enough for higher-stakes assessment, the open-source model developed in this study is likely to be useful for formative feedback and paves the way for further work.
Keywords: CEFR, automatic assessment, writing, L2, AI, NLP
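To illustrate the evaluation metrics reported in the abstract, the sketch below shows how agreement between machine-predicted and human-assigned CEFR levels can be quantified with a macro-averaged F1-score and a Pearson correlation. This is a hypothetical illustration, not the authors' code: the toy label data and the integer mapping of CEFR levels (A1=0 … C2=5, so that correlation is defined) are assumptions for demonstration only.

```python
# Hypothetical illustration (not the study's actual code): quantifying
# agreement between human CEFR ratings and model predictions.
# CEFR levels are mapped to integers: A1=0, A2=1, B1=2, B2=3, C1=4, C2=5.

def macro_f1(gold, pred):
    """Macro-averaged F1 over the CEFR levels present in the gold labels."""
    scores = []
    for level in sorted(set(gold)):
        tp = sum(1 for g, p in zip(gold, pred) if g == level and p == level)
        fp = sum(1 for g, p in zip(gold, pred) if g != level and p == level)
        fn = sum(1 for g, p in zip(gold, pred) if g == level and p != level)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        scores.append(2 * precision * recall / (precision + recall)
                      if precision + recall else 0.0)
    return sum(scores) / len(scores)

def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

# Toy data: human ratings vs. model predictions on the 0-5 scale above.
human = [0, 1, 1, 2, 2, 3, 3, 4, 5, 5]
model = [0, 1, 2, 2, 2, 3, 2, 4, 5, 4]

print(f"macro F1:  {macro_f1(human, model):.3f}")
print(f"Pearson r: {pearson(human, model):.3f}")
```

Note that the two metrics capture different things: F1 penalises every misclassification equally, while Pearson correlation rewards predictions that stay close to the human rating on the ordered CEFR scale, which is why a classifier that errs mostly by one adjacent level can show a high correlation alongside a more modest F1.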