Refereed article in compilation book (A3)
Academic vocabulary in Wikipedia articles: Frequency and dispersion in uneven datasets
List of Authors: Turo Hiltunen, Jukka Tyrkkö
Editors: Carla Suhr, Terttu Nevalainen, Irma Taavitsainen
Publication year: 2019
Book title: From Data to Evidence in English Language Research
Title of series: Digital Linguistics
Start page: 282
End page: 306
ISBN: 978-90-04-39065-2
DOI: http://dx.doi.org/10.1163/9789004390652_013
URL: https://doi.org/10.1163/9789004390652_013
Despite its popularity, the status of Wikipedia in higher education
settings remains somewhat controversial, and the linguistic
characteristics of the genre have not been exhaustively described. This
exploratory paper takes a data-driven approach to assessing the use of
academic vocabulary in Wikipedia articles. Our analysis is based on
Coxhead’s Academic Word List, and the data comes from the Westbury Lab Wikipedia Corpus.
We employ methods of statistical data analysis to classify Wikipedia
articles according to the frequencies of academic words, and apply the
same procedure to a comparable set of texts representing another genre,
published research articles (RAs). The unsupervised classification procedure
groups the articles according to academic content regardless of topic,
which allows us to measure genre-specific similarities. The findings
show that academic words are common in both genres and, more
interestingly, that in terms of aggregate frequencies of academic
words, Wikipedia articles are not markedly different from RAs within the
same discipline. That said, we can observe disciplinary
differences in the distribution of academic words in Wikipedia, such
that Economics writing contains more academic words than the other two
disciplines in focus. Disciplinary differences can likewise be observed
in the distribution of individual academic words.