D4 Published development or research report or study
Typologies in Sequence Analysis: Practical Guidelines for Identifying Robust Cluster Solutions
Authors: Andrade, Stefan B.; Fasang, Anette Eva; Helske, Satu; Karhula, Aleksi
Publisher: Center for Open Science
Publication year: 2023
DOI: https://doi.org/10.31235/osf.io/kj8d5
Web address: http://doi.org/10.31235/osf.io/kj8d5
Self-archived copy’s web address: https://research.utu.fi/converis/portal/detail/Publication/459134509
Sequence analysis in the social sciences relies heavily on clustering techniques to identify typologies. Clustering techniques and statistical cut-off criteria for selecting the optimal number of clusters have improved greatly. In contrast, we lack a systematic assessment of how data features, such as the sequence sample size, the number of time points in the sequences, and the number of distinct states in the sequence alphabet, affect the identification of sequence typologies. Drawing on both simulated data from mixture Markov models and real data from the German Family Panel survey, we provide best-practice guidelines that help applied researchers gauge whether their data are sufficient for extracting robust sequence typologies, if such typologies empirically exist. Sequence typologies are most robust for samples with at least 500 sequences, sequence lengths greater than 10 time points, and state alphabets that have at least as many states as the “true” number of clusters.
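The three rules of thumb in the abstract can be written as a simple data check. The sketch below is illustrative only and is not taken from the publication: the function name `check_sequence_data` and the parameter `expected_clusters` are hypothetical, and `expected_clusters` stands in for the "true" number of clusters, which is unknown in practice and has to be supplied as the analyst's own guess. Using the shortest sequence as the length criterion is also an assumption, since the abstract does not say how sequences of unequal length should be handled.

```python
from typing import Dict, Hashable, Sequence

MIN_SEQUENCES = 500   # abstract: "at least 500 sequences"
LENGTH_THRESHOLD = 10  # abstract: "greater than 10 time points"


def check_sequence_data(
    sequences: Sequence[Sequence[Hashable]],
    expected_clusters: int,
) -> Dict[str, bool]:
    """Check a sequence data set against the abstract's three rules of thumb.

    `expected_clusters` is a hypothetical input: the analyst's guess at the
    number of clusters, used in place of the unknown "true" number.
    """
    n_sequences = len(sequences)
    # Assumption: judge length by the shortest sequence in the data set.
    shortest = min((len(seq) for seq in sequences), default=0)
    # The observed alphabet is the set of distinct states across all sequences.
    alphabet = {state for seq in sequences for state in seq}
    return {
        "enough_sequences": n_sequences >= MIN_SEQUENCES,
        "long_enough": shortest > LENGTH_THRESHOLD,
        "alphabet_large_enough": len(alphabet) >= expected_clusters,
    }


if __name__ == "__main__":
    # Toy data: 600 identical sequences of 12 time points over 3 states.
    toy = [["S", "S", "C", "M"] * 3 for _ in range(600)]
    print(check_sequence_data(toy, expected_clusters=3))
```

A passing check only indicates that the data meet the stated minimums; it does not show that a robust typology exists in the data.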
Downloadable publication: This is an electronic reprint of the original article.
Funding information in the publication:
We gratefully acknowledge funding from the project EQUALLIVES, which is financially supported by the NORFACE Joint Research Programme on Dynamics of Inequality Across the Life-course, which is co-funded by the European Commission through Horizon 2020 under grant agreement No 724363. This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 724363.