Automatic language proficiency assessment of written texts: Training a CEFR classifier in L2 Finnish
Jenny Tarvainen, Ida Toivanen & Ari Huhta, University of Jyväskylä, Finland
https://doi.org/10.58379/YWAV5140
Volume 14, Issue 2, 2025
Abstract: This paper explores the automatic assessment of written language proficiency. We trained a CEFR classifier for texts written by learners of Finnish as a second or foreign language (L2). The aim of the study is to investigate to what extent we can model human ratings with the available L2 data for Finnish, whether the accuracy of the predictions varies across CEFR levels, and whether we can explain the misclassifications we discovered. The FinBERT model was trained on the largest available CEFR-annotated datasets for L2 Finnish: ICLFI, LAS2, CEFLING, and TOPLING, which represent a range of genres, first language backgrounds, ages, and genders. The results are promising, with an F1-score of 72.7% and a Pearson correlation of .86 between machine predictions and human assessors. Learners’ gender was not related to classification accuracy, but learners’ L1 background may have some effect. However, text length seemed to cause misclassifications, as unusually short or long samples were often rated lower or higher, respectively, than expected. While more annotated data is needed to train a more accurate model for higher-stakes assessment purposes, the open-source model developed in the study is likely useful for formative feedback and paves the way for further work.
Keywords: CEFR, automatic assessment, writing, L2, AI, NLP
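To make the two reported evaluation metrics concrete, the sketch below shows one common way to score a CEFR classifier against human ratings: macro-averaged F1 over the six CEFR labels, and Pearson correlation after mapping the ordinal levels to integers (A1=0 … C2=5). This is an illustration of the metrics only, not the authors' code, and the small `human`/`model` rating lists are invented for the example.

```python
# Illustrative sketch (not the study's actual code): computing macro F1 and
# Pearson correlation between human CEFR ratings and model predictions.
import math

CEFR = ["A1", "A2", "B1", "B2", "C1", "C2"]  # ordinal proficiency scale

def macro_f1(gold, pred, labels=CEFR):
    """Unweighted mean of per-label F1 over labels that actually occur."""
    scores = []
    for lab in labels:
        tp = sum(1 for g, p in zip(gold, pred) if g == lab and p == lab)
        fp = sum(1 for g, p in zip(gold, pred) if g != lab and p == lab)
        fn = sum(1 for g, p in zip(gold, pred) if g == lab and p != lab)
        if tp == fp == fn == 0:
            continue  # label absent from both gold and predictions
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        scores.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(scores) / len(scores)

def pearson(gold, pred):
    """Pearson r, treating CEFR levels as ordinal integers (A1=0 ... C2=5)."""
    x = [CEFR.index(g) for g in gold]
    y = [CEFR.index(p) for p in pred]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Invented example ratings for eight texts:
human = ["A2", "B1", "B1", "B2", "C1", "A1", "B2", "B1"]
model = ["A2", "B1", "B2", "B2", "C1", "A1", "B1", "B1"]
print(round(macro_f1(human, model), 3), round(pearson(human, model), 3))
```

Because CEFR levels are ordered, the two metrics answer different questions: F1 penalises every misclassification equally, while Pearson correlation rewards predictions that stay close to the human level even when they miss it by one step.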