Research Article
Mikhail Pyatnitskiy, Maria
Abstract
Mass spectral profiling of serum or plasma is one of the tools widely used to build experimental diagnostic systems for different cancer types. In this approach, a set of discriminatory peaks serves as a multiplex cancer biomarker, so adequate peak selection is a crucial stage in developing a diagnostic rule. In the present paper we propose sequential filter and wrapper feature selection within a complete cross-validation scheme, in which feature selection is performed separately at each cross-validation run. The filter step is hierarchical cluster analysis of peaks; the wrapper step is recursive feature elimination coupled with a support vector machine (SVM-RFE). The performance of the method is demonstrated on a previously obtained dataset of ovarian cancer and non-cancer sera. Applying our approach led to a slight but statistically significant increase in accuracy. Peak clustering yielded more stable feature selection results and gave biological meaning to the selected m/z values. We recommend peak clustering as a filter dimensionality reduction step in future mass spectral studies.
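The scheme summarised above (a clustering filter followed by SVM-RFE, both refitted inside every training fold so the held-out fold never influences which peaks are chosen) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the distance 1 - |Pearson r|, average linkage, the cut height 0.3, the rule for picking one representative peak per cluster, the number of retained peaks, `LinearSVC`, and the helper names `cluster_representatives`, `nested_selection_cv` and `make_toy_data` are all hypothetical choices. The toy data are synthetic and are not the ovarian cancer dataset.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform
from sklearn.feature_selection import RFE
from sklearn.model_selection import StratifiedKFold
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC


def cluster_representatives(X, distance_threshold=0.3):
    """Filter step (assumed form): average-linkage clustering of peaks on
    1 - |Pearson r|, keeping one representative peak per cluster."""
    corr = np.corrcoef(X, rowvar=False)
    dist = 1.0 - np.abs(corr)
    np.fill_diagonal(dist, 0.0)
    condensed = squareform(dist, checks=False)
    labels = fcluster(linkage(condensed, method="average"),
                      t=distance_threshold, criterion="distance")
    reps = []
    for lab in np.unique(labels):
        members = np.flatnonzero(labels == lab)
        # Hypothetical rule: keep the member most correlated with the cluster mean.
        centroid = X[:, members].mean(axis=1)
        scores = [abs(np.corrcoef(X[:, m], centroid)[0, 1]) for m in members]
        reps.append(members[int(np.argmax(scores))])
    return np.array(sorted(reps))


def nested_selection_cv(X, y, n_keep=5, n_splits=5, seed=0):
    """Run filter + SVM-RFE separately inside each training fold and score the
    resulting classifier on the untouched test fold."""
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    accuracies, selections = [], []
    for train, test in cv.split(X, y):
        scaler = StandardScaler().fit(X[train])          # fitted on training fold only
        Xtr, Xte = scaler.transform(X[train]), scaler.transform(X[test])
        reps = cluster_representatives(Xtr)              # filter: training fold only
        rfe = RFE(LinearSVC(C=1.0, dual=False, max_iter=10000),
                  n_features_to_select=min(n_keep, len(reps)), step=1)
        rfe.fit(Xtr[:, reps], y[train])                  # wrapper: training fold only
        accuracies.append(rfe.score(Xte[:, reps], y[test]))
        selections.append(reps[rfe.support_])            # original column indices
    return float(np.mean(accuracies)), selections


def make_toy_data(n=80, n_groups=10, copies=3, seed=0):
    """Synthetic stand-in: each latent signal appears as several highly
    correlated noisy 'peaks'; the class depends on latent signals 0 and 1."""
    rng = np.random.default_rng(seed)
    latent = rng.normal(size=(n, n_groups))
    y = (latent[:, 0] + latent[:, 1] > 0).astype(int)
    X = np.repeat(latent, copies, axis=1) + 0.1 * rng.normal(size=(n, n_groups * copies))
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    acc, sels = nested_selection_cv(X, y)
    print(f"mean CV accuracy: {acc:.2f}")
    for i, s in enumerate(sels):
        print(f"fold {i}: selected columns {s.tolist()}")
```

Because each cluster collapses a group of strongly correlated peaks to a single column, RFE ranks largely non-redundant candidates, which is one plausible reason clustering could make the selected set more stable across folds. Comparing `sels` between folds is a simple way to inspect that stability in such a sketch.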