Research Article
Qingzhao Yu, Han Zhu and Xia
Abstract
Accurate information on cancer stage at diagnosis is essential not only for assessing quality of care and the associated prognosis but also for monitoring trends in cancer stage and for assessing the effectiveness of early detection interventions. Because cancer stage is associated with many factors that are not under the control of cancer registries, it is infeasible for registry databases to record the stage of every case. It is therefore necessary to reduce, through statistical adjustment, the bias that unknown-stage cases introduce into stage analysis. In this paper, we propose a new adaptive, robust method that estimates the stage distribution of unknown-stage cases using both essential and nonessential predictors of cancer stage. Multiple additive regression trees were used to assess the association of explanatory variables (including patient demographics, tumor characteristics, and treatment) with unknown stage. The 2004-2009 incidence data on invasive lung cancer from 38 population-based cancer registries that met NAACCR’s high data quality criteria were used to estimate the population stage distribution of lung cancer over those years. Multiple artificial incomplete datasets with unknown stages and predictors were created from the complete datasets, with varying missing-data mechanisms and different proportions of missingness. The simulated datasets were used to test the efficiency of the proposed method in estimating the population stage distribution. In general, the proposed method was more accurate and required less computation time than traditional methods such as multiple imputation and weighting.
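To make the source of bias concrete, the following is a minimal Python sketch of the simulation idea described above, not the proposed method itself. It assumes a single hypothetical predictor (age group), invented stage probabilities and missingness rates, and a missing-at-random mechanism in which the chance of an unknown stage depends only on age group. It compares a complete-case estimate of the stage distribution with a simple weighting adjustment, one of the traditional approaches mentioned above. All function names and numbers are illustrative assumptions.

```python
import random
from collections import Counter

STAGES = ["localized", "regional", "distant"]

def simulate_cases(n, rng):
    """Generate hypothetical complete cases as (age_group, stage) pairs."""
    cases = []
    for _ in range(n):
        older = rng.random() < 0.5
        # Assumed stage probabilities: older patients skew toward distant stage.
        probs = [0.15, 0.25, 0.60] if older else [0.35, 0.30, 0.35]
        stage = rng.choices(STAGES, weights=probs)[0]
        cases.append(("older" if older else "younger", stage))
    return cases

def impose_mar(cases, rng, p_missing):
    """Set stage to None with a probability that depends on age group only."""
    return [(g, None if rng.random() < p_missing[g] else s) for g, s in cases]

def distribution(counts):
    """Normalize stage counts (or weights) to proportions."""
    total = sum(counts.values())
    return {s: counts.get(s, 0) / total for s in STAGES}

def complete_case(cases):
    """Estimate the stage distribution from known-stage cases only."""
    return distribution(Counter(s for _, s in cases if s is not None))

def weighted(cases):
    """Weight each known-stage case by the inverse of the observed
    fraction of known stages within its age group."""
    n_all = Counter(g for g, _ in cases)
    n_obs = Counter(g for g, s in cases if s is not None)
    totals = Counter()
    for g, s in cases:
        if s is not None:
            totals[s] += n_all[g] / n_obs[g]
    return distribution(totals)

rng = random.Random(0)
full = simulate_cases(50_000, rng)
truth = distribution(Counter(s for _, s in full))
# Assumed missingness rates: stage is unknown far more often for older patients.
observed = impose_mar(full, rng, {"older": 0.40, "younger": 0.05})
cc = complete_case(observed)
wt = weighted(observed)

for name, est in [("truth", truth), ("complete-case", cc), ("weighted", wt)]:
    print(name, {s: round(p, 3) for s, p in est.items()})
```

Under these assumptions the complete-case estimate understates the share of distant-stage disease, because the group with more distant-stage cases is also the group with more unknown stages, while the weighted estimate stays close to the full-data distribution. A weighting scheme of this kind only corrects for predictors it conditions on, which is one motivation for methods that can draw on many predictors at once.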