D4 Published development or research report or study

Typologies in Sequence Analysis: Practical Guidelines for Identifying Robust Cluster Solutions




Authors: Andrade, Stefan B.; Fasang, Anette Eva; Helske, Satu; Karhula, Aleksi

Publisher: Center for Open Science

Publication year: 2023

DOI: https://doi.org/10.31235/osf.io/kj8d5

Web address: http://doi.org/10.31235/osf.io/kj8d5

Self-archived copy’s web address: https://research.utu.fi/converis/portal/detail/Publication/459134509


Abstract

Sequence analysis in the social sciences relies heavily on clustering techniques to identify typologies. Clustering techniques and statistical cluster cut-off criteria for selecting the optimal number of clusters have greatly improved. In contrast, we lack a systematic assessment of how data features, such as the sequence sample size, the number of time points in the sequences, and the number of distinct states in the sequence alphabet, might affect the identification of sequence typologies. Drawing on both simulated data from mixture Markov models and real data from the German Family Panel survey, we provide best-practice guidelines for applied researchers to gauge whether their data are sufficient for extracting robust sequence typologies, if such typologies empirically exist. Sequence typologies are most robust for samples with at least 500 sequences, sequence lengths greater than 10 time points, and state alphabets that have at least as many states as the “true” number of clusters.


Downloadable publication

This is an electronic reprint of the original article.
This reprint may differ from the original in pagination and typographic detail. Please cite the original version.




Funding information in the publication
We gratefully acknowledge funding from the project EQUALLIVES, which is financially supported by the NORFACE Joint Research Programme on Dynamics of Inequality Across the Life-course, which is co-funded by the European Commission through Horizon 2020 under grant agreement No 724363. This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 724363.


Last updated on 2025-01-27 at 19:37