A1 Refereed original research article in a scientific journal
Brazilian Portuguese-Russian (BraPoRus) corpus: automatic transcription and acoustic quality of elderly speech during the COVID-19 pandemic
Authors: Sekerina A. Irina, Henriques Smirnova Anna, Skorobogatova S. Aleksandra, Tyulina Natalia, Kachkovskaia V. Tatiana, Ruseishvili Svetlana, Madureira Sandra
Publication year: 2024
Journal: Linguistics Vanguard
DOI: https://doi.org/10.1515/lingvan-2021-0149
Publication's open availability at the time of reporting: No Open Access
Publication channel's open availability: Partially Open Access publication channel
Web address: https://doi.org/10.1515/lingvan-2021-0149
This article presents the Brazilian Portuguese-Russian (BraPoRus) corpus, whose goal is to collect, analyze, and preserve for posterity the spoken heritage Russian still used today in Brazil by approximately 1,500 elderly bilingual heritage Russian–Brazilian Portuguese speakers. Their unique 100-year-old variety of moribund Russian is disappearing because it has not been passed to their descendants born in Brazil. During the COVID-19 pandemic, we remotely collected 170 hours of speech samples in heritage Russian from 26 participants (mean age = 75.7 years) in naturalistic settings using Zoom or a phone call. To estimate the quality of the collected data, we focus on two methodological challenges: automatic transcription and the acoustic quality of remote recordings. First, we find that among commercially available transcription programs, Sonix far outperforms Google Transcribe and Vocalmatic on the measure of word error rate (WER). Second, we establish that the acoustic quality of the remote recordings was adequate for intonational and speech rate analysis. Moreover, this remote method of collecting and analyzing speech samples works successfully with elderly bilingual participants who speak a heritage language different from their dominant societal language, and it can become a new norm when face-to-face communication with elderly participants is not possible.
Keywords: acoustic quality; aging; automatic transcription; corpus; moribund heritage Russian; remote data collection; word error rate (WER)
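The abstract compares transcription programs by word error rate (WER) but does not spell out how the metric is computed. As a minimal sketch, assuming the standard definition (substitutions, deletions, and insertions needed to turn the automatic transcript into the reference transcript, divided by the number of reference words), WER can be computed with a word-level edit distance. The function name `word_error_rate`, the whitespace tokenization, and the example sentences are illustrative assumptions, not the authors' procedure or data.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Standard WER: word-level edit distance / number of reference words.

    Assumption: both transcripts are tokenized on whitespace, with no
    normalization of case or punctuation.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion (reference word missing)
                curr[j - 1] + 1,                  # insertion (extra hypothesis word)
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Made-up example: one substitution ("b" -> "x") and one deletion ("d")
# against a four-word reference.
print(word_error_rate("a b c d", "a x c"))  # 0.5
```

A lower value means the automatic transcript is closer to the manual reference. Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.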