Refereed article in compilation book (A3)

Academic vocabulary in Wikipedia articles: Frequency and dispersion in uneven datasets




List of Authors: Turo Hiltunen, Jukka Tyrkkö

Editors: Carla Suhr, Terttu Nevalainen, Irma Taavitsainen

Publication year: 2019

Book title: From Data to Evidence in English Language Research

Title of series: Digital Linguistics

Start page: 282

End page: 306

ISBN: 978-90-04-39065-2

DOI: http://dx.doi.org/10.1163/9789004390652_013

URL: https://doi.org/10.1163/9789004390652_013


Abstract

Despite its popularity, the status of Wikipedia in higher education
settings remains somewhat controversial, and the linguistic
characteristics of the genre have not been exhaustively described. This
exploratory paper takes a data-driven approach to assessing the use of
academic vocabulary in Wikipedia articles. Our analysis is based on
Coxhead’s Academic Word List, and the data comes from the Westbury Lab Wikipedia Corpus.
We employ methods of statistical data analysis to classify Wikipedia
articles according to the frequencies of academic words, and apply the
same procedure to a comparable set of texts representing another genre,
published research articles (RAs). The unsupervised classification procedure
groups the articles according to academic content regardless of topic,
which allows us to measure genre-specific similarities. The findings
show that academic words are common in both genres and, more
interestingly, that in terms of aggregate frequencies of academic
words, Wikipedia articles are not markedly different from RAs within the
same discipline. That said, we can observe disciplinary
differences in the distribution of academic words in Wikipedia, such
that Economics writing contains more academic words than the other two
disciplines in focus. Disciplinary differences can likewise be observed
in the distribution of individual academic words.


Last updated on 2021-24-06 at 09:47