Research Article
Simon Thornley, Roger J Marsha
Abstract
By testing for conditional dependence between variables, algorithms can generate directed acyclic graphs (DAGs), which may help inform variable selection when building models for statistical risk prediction or for assessing causal influence. Here, we demonstrate how the method may help us understand the relationships between variables commonly used to predict cardiovascular disease (CVD) risk. The sample included people aged 30 to 80 years who were free of CVD, had a CVD risk assessment in primary care and had at least 2 years of follow-up. The endpoint was combined CVD events, and the other variables were age, sex, diabetes, smoking, ethnic group, preventive drug use (statins or antihypertensives), blood pressure, family history and cholesterol ratio. We used the ‘grow-shrink’ algorithm in the bnlearn library of R software to generate a DAG. A total of 6256 individuals were included, and 101 CVD events occurred during follow-up. The accepted causal associations of tobacco smoking and age with CVD were identified in the DAG. Ethnic group also influenced the risk of CVD events, but it did so indirectly, through the effect of smoking. Drug treatment at baseline was influenced by a wide range of other variables, such as family history of CVD, age and diabetes status, but drug treatment did not have a ‘causal’ association with CVD events. Algorithms that generate DAGs are a useful adjunct to traditional statistical methods when deciding on the structure of a regression model to test causal hypotheses.
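To illustrate the kind of conditional independence test on which such algorithms rely, the following is a minimal Python sketch, not the bnlearn implementation and not the analysis reported here. It computes a stratified likelihood-ratio (G) statistic for two categorical variables given a conditioning set. The function name `g_statistic`, the helper `expand` and the counts in `COUNTS` are hypothetical; the counts are synthetic and were constructed so that ethnic group is associated with CVD only through smoking.

```python
import math
from collections import Counter


def g_statistic(rows, x, y, given=()):
    """Return (G, df) for a test of x independent of y given `given`.

    The G statistic is summed over the strata defined by the conditioning
    variables; degrees of freedom are summed in the same way.
    """
    strata = {}
    for row in rows:
        key = tuple(row[g] for g in given)
        strata.setdefault(key, Counter())[(row[x], row[y])] += 1

    g_total, df_total = 0.0, 0
    for cells in strata.values():
        n = sum(cells.values())
        x_tot, y_tot = Counter(), Counter()
        for (xv, yv), count in cells.items():
            x_tot[xv] += count
            y_tot[yv] += count
        # Empty cells are absent from `cells` and contribute zero to G.
        for (xv, yv), observed in cells.items():
            expected = x_tot[xv] * y_tot[yv] / n
            g_total += 2.0 * observed * math.log(observed / expected)
        df_total += (len(x_tot) - 1) * (len(y_tot) - 1)
    return g_total, df_total


def expand(counts):
    """Turn a table of (ethnicity, smoking, cvd) counts into one row per person."""
    rows = []
    for (eth, smoke, cvd), n in counts.items():
        rows.extend({"ethnicity": eth, "smoking": smoke, "cvd": cvd} for _ in range(n))
    return rows


# Synthetic, hypothetical counts (NOT the study data). Within each smoking
# stratum the CVD rate is the same in both ethnic groups (5% in non-smokers,
# 20% in smokers), but group B contains more smokers.
COUNTS = {
    ("A", "no", 0): 760, ("A", "no", 1): 40,
    ("B", "no", 0): 190, ("B", "no", 1): 10,
    ("A", "yes", 0): 160, ("A", "yes", 1): 40,
    ("B", "yes", 0): 640, ("B", "yes", 1): 160,
}
ROWS = expand(COUNTS)

# Marginal test: ethnicity vs CVD, ignoring smoking.
g_marg, df_marg = g_statistic(ROWS, "ethnicity", "cvd")
# Conditional test: ethnicity vs CVD within strata of smoking.
g_cond, df_cond = g_statistic(ROWS, "ethnicity", "cvd", given=("smoking",))

print(f"marginal:    G = {g_marg:.2f}, df = {df_marg}")
print(f"conditional: G = {g_cond:.2f}, df = {df_cond}")
```

In this constructed example the marginal statistic lies far above the 5% chi-squared critical value for 1 degree of freedom (3.84), whereas the conditional statistic is zero. A constraint-based algorithm would therefore omit a direct edge between ethnic group and CVD once smoking is in the conditioning set, which mirrors the indirect pathway described above.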