Robust data-driven identification of risk factors and their interactions: A simulation and a study of parental and demographic risk factors for schizophrenia
Authors: Gyllenberg D, McKeague IW, Sourander A, Brown AS
Publisher: WILEY
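Publication year: 2020
Journal: International Journal of Methods in Psychiatric Research
Journal name in source: INTERNATIONAL JOURNAL OF METHODS IN PSYCHIATRIC RESEARCH
Journal acronym: INT J METH PSYCH RES
Article number: e1834
Volume: 29
Issue: 4
First page: 1
Last page: 11
Number of pages: 11
ISSN: 1049-8931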
DOI: https://doi.org/10.1002/mpr.1834
Research portal record: https://research.utu.fi/converis/portal/detail/Publication/48528693
Objectives: Few interactions between risk factors for schizophrenia have been replicated, but fitting all such interactions is difficult due to high-dimensionality. Our aims are to examine significant main and interaction effects for schizophrenia and the performance of our approach using simulated data.

Methods: We apply the machine learning technique elastic net to a high-dimensional logistic regression model to produce a sparse set of predictors, and then assess the significance of odds ratios (OR) with Bonferroni-corrected p-values and confidence intervals (CI). We introduce a simulation model that resembles a Finnish nested case-control study of schizophrenia which uses national registers to identify cases (n = 1,468) and controls (n = 2,975). The predictors include nine sociodemographic factors and all interactions (31 predictors).

Results: In the simulation, interactions with OR = 3 and prevalence = 4% were identified with <5% false positive rate and ≥80% power. None of the studied interactions were significantly associated with schizophrenia, but main effects of parental psychosis (OR = 5.2, CI 2.9-9.7; p < .001), urbanicity (1.3, 1.1-1.7; p = .001), and paternal age ≥35 (1.3, 1.004-1.6; p = .04) were significant.

Conclusions: We have provided an analytic pipeline for data-driven identification of main and interaction effects in case-control data. We identified highly replicated main effects for schizophrenia, but no interactions.
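
The significance step described in the Methods (an odds ratio with a Bonferroni-corrected p-value and confidence interval) can be illustrated with a minimal sketch. This is not the authors' pipeline: it assumes a single binary predictor summarized as a 2x2 case-control table, uses the standard Wald approximation on the log odds ratio, and omits the elastic net selection step entirely. The function name `odds_ratio_wald`, the cell counts, and the choice of 31 tests as the Bonferroni factor are illustrative assumptions; the counts are hypothetical and were chosen only to resemble the simulated scenario (OR near 3, about 4% exposure among controls, 1,468 cases and 2,975 controls).

```python
import math
from statistics import NormalDist


def odds_ratio_wald(a, b, c, d, n_tests=1, alpha=0.05):
    """Wald inference for an odds ratio from a 2x2 case-control table.

    a: exposed cases      b: unexposed cases
    c: exposed controls   d: unexposed controls
    n_tests: number of tests used for the Bonferroni correction (assumption).
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive for the Wald approximation")

    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of log(OR)
    z = log_or / se

    # Two-sided p-value; erfc keeps precision when |z| is large.
    p_raw = math.erfc(abs(z) / math.sqrt(2))
    # Bonferroni: multiply by the number of tests, capped at 1.
    p_bonf = min(1.0, p_raw * n_tests)

    # Bonferroni-adjusted interval: split alpha across the tests.
    z_crit = NormalDist().inv_cdf(1 - alpha / (2 * n_tests))
    ci_low = math.exp(log_or - z_crit * se)
    ci_high = math.exp(log_or + z_crit * se)

    return {
        "or": math.exp(log_or),
        "ci_low": ci_low,
        "ci_high": ci_high,
        "p_raw": p_raw,
        "p_bonferroni": p_bonf,
    }


# Hypothetical counts (not from the study): 163 of 1,468 cases and
# 119 of 2,975 controls exposed, corrected for 31 tests.
result = odds_ratio_wald(a=163, b=1305, c=119, d=2856, n_tests=31)
print(result)
```

A predictor would be called significant under this sketch when `p_bonferroni` falls below 0.05, or equivalently when the adjusted interval excludes 1. In a penalized-regression setting the odds ratios would come from a multivariable model rather than a single table, so the numbers above are only a stand-in for that step.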