Research Article
Li Hua Yue, Wenqing He, Duncan
Abstract
Variable selection is a difficult problem in building statistical models. Identifying cost-efficient diagnostic factors is very important to health researchers, but most variable selection methods do not take into account the cost of collecting data for the predictors. Our focus is the trade-off between statistical significance and the cost of collecting data for a statistical model. In this paper, we extend the least angle regression (LARS) variable selection method to incorporate the costs of factors; the extension also works with other variable selection methods, such as Lasso and adaptive Lasso. A branch and bound search method combined with LARS is employed to select cost-efficient factors. We apply the resulting branching LARS method to a dataset from an Assertive Community Treatment project conducted in Southwestern Ontario to demonstrate the cost-efficient variable selection process. The results show that a “cheaper” model can be selected by sacrificing a user-selected amount of model accuracy.