Academic vocabulary in Wikipedia articles: Frequency and dispersion in uneven datasets




Turo Hiltunen, Jukka Tyrkkö

Carla Suhr, Terttu Nevalainen, Irma Taavitsainen

2019

From Data to Evidence in English Language Research

Digital Linguistics

282–306

978-90-04-39065-2

https://doi.org/10.1163/9789004390652_013



Despite its popularity, the status of Wikipedia in higher education
settings remains somewhat controversial, and the linguistic
characteristics of the genre have not been exhaustively described. This
exploratory paper takes a data-driven approach to assessing the use of
academic vocabulary in Wikipedia articles. Our analysis is based on
Coxhead’s Academic Word List, and the data comes from the Westbury Lab Wikipedia Corpus.
We employ methods of statistical data analysis to classify Wikipedia
articles according to the frequencies of academic words, and apply the
same procedure to a comparable set of texts representing another genre,
published research articles (RAs). The unsupervised classification procedure
groups the articles according to academic content regardless of topic,
which allows us to measure genre-specific similarities. The findings
show that academic words are common in both genres and, more
interestingly, that in terms of aggregate frequencies of academic
words, Wikipedia articles are not markedly different from RAs within the
same discipline. That said, we can observe disciplinary
differences in the distribution of academic words in Wikipedia:
Economics writing contains more academic words than the other two
disciplines in focus. Disciplinary differences can likewise be observed
in the distribution of individual academic words.



Last updated on 2024-26-11 at 19:59