Research Article
Se-Eun Bae, Sunghoon Jung,
Abstract
Classifying protein structures provides preliminary information for more detailed theoretical analyses. Fold classification can be used for structural alignment, while fold class prediction might aid ab initio prediction of protein structures. Here, the structural fold class of proteins was predicted using a torsion-angle-based secondary structure profile library and multi-class linear discriminant analysis. An all-versus-all method was used to circumvent the data imbalance problem of the one-versus-others approach. A tripeptide secondary structure profile library was constructed from nonredundant structure files and used to calculate the probable secondary structure content of protein folds. The mean vectors and covariance matrices of the reference classes in the training set were derived using this library. Based on this information, the fold classes of the test set proteins were predicted using multi-class linear discriminant analysis. The low error rates indicate that the predictions were highly accurate. This fold class prediction might be further applied to secondary structure prediction, exploiting the benefits of larger scrutinizing windows. The results also partly demonstrate the appropriateness of the torsion angle representation for local structure analysis.
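The classification step described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes equal class priors, a pooled covariance matrix shared by all classes, and majority voting over pairwise discriminants as one possible reading of the all-versus-all method. The function names `fit_lda` and `predict_all_vs_all` and the three-component feature vector (for example, predicted fractions of three secondary structure states) are hypothetical.

```python
import numpy as np
from itertools import combinations


def fit_lda(X, y):
    """Estimate class means and a pooled within-class covariance matrix.

    X: (n_samples, n_features) array of secondary structure content features
       (hypothetical feature layout, assumed for illustration).
    y: (n_samples,) array of integer fold class labels.
    """
    classes = np.unique(y).tolist()
    means = {c: X[y == c].mean(axis=0) for c in classes}
    n_features = X.shape[1]
    pooled = np.zeros((n_features, n_features))
    for c in classes:
        diff = X[y == c] - means[c]
        pooled += diff.T @ diff
    # Unbiased pooled estimate: divide by (samples - number of classes).
    pooled /= (len(X) - len(classes))
    return classes, means, pooled


def predict_all_vs_all(x, classes, means, pooled):
    """Predict a class by majority vote over all pairwise linear discriminants.

    Assumes equal priors, so each pairwise boundary passes through the
    midpoint of the two class means.
    """
    inv = np.linalg.pinv(pooled)  # pseudo-inverse guards against singularity
    votes = {c: 0 for c in classes}
    for a, b in combinations(classes, 2):
        w = inv @ (means[a] - means[b])
        midpoint = 0.5 * (means[a] + means[b])
        winner = a if w @ (x - midpoint) > 0 else b
        votes[winner] += 1
    return max(votes, key=votes.get)
```

In this sketch each pair of classes gets its own two-class decision, so no single discriminant has to separate one small class from the much larger union of all the others, which is the imbalance that the one-versus-others approach introduces.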