Protein Encoding: A Matlab toolbox for representing or encoding protein sequences as numerical vectors for bioinformatics

Wen Zhang* and Meng Ke

Abstract

Machine learning methods have recently been applied successfully to bioinformatics problems such as protein function and structure prediction. Popular methods, such as support vector machines and decision trees, usually require numerical vectors as inputs. Representing protein sequences as numerical vectors is known as 'encoding'. In this paper, we present a Matlab toolbox, 'Protein Encoding', which helps to represent or encode protein sequences as numerical vectors for bioinformatics. The toolbox provides a user-friendly interface. More importantly, it also provides Matlab APIs that researchers can easily call from their own programs. The toolbox is available at: http://proteinencoding.sourceforge.net/
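To make the notion of encoding concrete, the following minimal sketch maps a protein sequence to a fixed-length numerical vector using amino acid composition, one commonly used scheme. It is an illustration only: the abstract does not list which schemes the toolbox implements, the function name `composition_encoding` and the example sequence are hypothetical, and the sketch is written in Python rather than through the toolbox's Matlab APIs.

```python
# Illustrative sketch only: amino acid composition, one common encoding scheme.
# It is not taken from the Protein Encoding toolbox and does not mirror its API.

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, one-letter codes


def composition_encoding(sequence):
    """Return a 20-element list of residue frequencies for a protein sequence.

    Characters outside the 20 standard residues (e.g. 'X') are ignored.
    An empty or fully non-standard sequence yields a vector of zeros.
    """
    seq = sequence.upper()
    counts = [seq.count(aa) for aa in AMINO_ACIDS]
    total = sum(counts)
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [c / total for c in counts]


if __name__ == "__main__":
    vector = composition_encoding("MKVLAAGIVK")  # hypothetical example sequence
    print(len(vector))  # the vector length is fixed regardless of sequence length
```

Because every sequence, whatever its length, is mapped to a vector of the same dimension, the result can be passed directly to a classifier such as a support vector machine or a decision tree.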

Relevant Publications in Journal of Chemical and Pharmaceutical Research