A web server of reduced amino acid alphabet for sequence-dependent inference


The recent success of AlphaFold in building 3D protein models has demonstrated that sequence-dependent inference plays a crucial role in computational proteomics. With the emergence of big data in the proteomics era, protein sequence data are growing far faster than experimentally verified in vivo biochemical functional annotations. Compared with nucleic acid sequences, protein sequences exhibit higher complexity and diversity. From a combinatorial standpoint, an almost endless variety of sequences can be made from a 20-letter code: a sequence of only 100 residues admits 20^100 possible sequence-order combinations. It is therefore important to compress the natural amino acid alphabet to reduce protein complexity, which motivated us to develop RAACBook (http://bioinfor.imu.edu.cn/raacbook). This online repository provides an integrated reduced amino acid cluster (RAAC) database and web servers, offering free public data analysis services worldwide.
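To give a feel for the combinatorial argument above, the following minimal Python sketch compares the size of the sequence space for a 100-residue chain under the natural 20-letter alphabet and under a reduced alphabet. The cluster count of 3 is an illustrative assumption, not a value prescribed by RAACBook.

```python
def sequence_space(alphabet_size: int, length: int) -> int:
    """Number of distinct sequences of the given length over the alphabet."""
    return alphabet_size ** length


full = sequence_space(20, 100)     # natural amino acid alphabet
reduced = sequence_space(3, 100)   # hypothetical 3-cluster reduced alphabet

# Python integers are arbitrary precision, so the exact values are available;
# the number of decimal digits is easier to read than the values themselves.
print("digits in 20^100:", len(str(full)))
print("digits in 3^100: ", len(str(reduced)))
```

Even a coarse reduction shrinks the search space by dozens of orders of magnitude, which is the practical motivation for working with reduced alphabets.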

Property              Class 1         Class 2         Class 3
Hydrophobicity        P | polar       N | neutral     H | hydrophobicity
Polarity              S | 4.9~6.2     M | 8.0~9.2     L | 10.4~13.0
Charge                + | positive    = | neutral     - | negative
Secondary structure   α | helix       β | strand      C | coil
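As an illustration of how a three-class scheme such as the charge property above can be applied, here is a minimal Python sketch that rewrites a protein sequence in a reduced alphabet. The residue-to-class assignment (K and R as positive, D and E as negative, everything else including histidine as neutral) is a common textbook convention used here as an assumption; it is not taken from the RAACBook database, and the names `CHARGE_CLASSES` and `reduce_sequence` are hypothetical.

```python
# Assumed grouping for illustration only; RAACBook's own clusters may differ.
CHARGE_CLASSES = {
    "+": "KR",                 # positively charged side chains
    "-": "DE",                 # negatively charged side chains
    "=": "ACFGHILMNPQSTVWY",   # treated as neutral in this sketch
}

# Invert the grouping into a per-residue lookup table.
RESIDUE_TO_CLASS = {
    aa: label for label, members in CHARGE_CLASSES.items() for aa in members
}


def reduce_sequence(seq, mapping=RESIDUE_TO_CLASS, unknown="X"):
    """Replace each residue by its class label; unmapped symbols become `unknown`."""
    return "".join(mapping.get(aa, unknown) for aa in seq.upper())


print(reduce_sequence("MKDELR"))
```

The same function works for any of the other properties once a residue-to-class mapping for that property is supplied.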

RAACBook incorporates the following three online services:
(i) The RAACBook repository offers 74 types of reduced amino acid alphabets, which can generate 673 reduced amino acid descriptors. A multi-layer browser tool lets users easily navigate and filter the details of the appropriate reduction descriptors;
(ii) An online analysis server provides visualized alignment of primary sequences. The analysis reports display the visualized sequence alignment together with the merging and distribution of the reduced amino acid composition. Three correlation parameters (k-tuple, gap, λ-correlation) are used to define the k-tuple reduced amino acid composition. The report page provides FASTA, CSV and SVM vector files for further research;
(iii) The machine learning server supports model training for protein classification based on k-tuple reduced amino acid compositions. The Support Vector Machine (SVM), K-Nearest Neighbor (KNN) and Random Forest (RF) algorithms are available for running classifiers online. In addition, several published web-server applications are also listed in this online repository.
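The k-tuple composition mentioned in (ii) and (iii) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the server's implementation: here `gap` is taken to mean the number of positions skipped between consecutive members of a tuple, the λ-correlation parameter is left out, and the function name `ktuple_composition` is hypothetical.

```python
from collections import Counter
from itertools import product


def ktuple_composition(seq, alphabet, k=2, gap=0):
    """Frequency of every gapped k-tuple over `alphabet` in `seq`.

    Tuple members are taken at positions i, i+(gap+1), ..., i+(k-1)*(gap+1).
    Returns a dict with one entry per possible tuple, in a fixed order,
    so the values can be used directly as a feature vector.
    """
    step = gap + 1
    span = (k - 1) * step
    counts = Counter(
        tuple(seq[i + j * step] for j in range(k))
        for i in range(len(seq) - span)
    )
    total = sum(counts.values())
    return {
        "".join(t): (counts[t] / total if total else 0.0)
        for t in product(alphabet, repeat=k)
    }


# A sequence already rewritten in a 3-letter reduced alphabet.
features = ktuple_composition("=+--=+", alphabet="+-=", k=2, gap=0)
print(features)
```

With a reduced alphabet of size n the vector has n^k entries (9 here) instead of 20^k, which keeps the feature space small enough for classifiers such as SVM, KNN or RF.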

Cite and Contact:

Lei Zheng, Shenghui Huang, Nengjiang Mu, Haoyue Zhang, Jiayu Zhang, Yu Chang, Lei Yang, Yongchun Zuo*, RAACBook: a web server of reduced amino acid alphabet for sequence-dependent inference by using the Chou's 5-steps rule, Database, 2019, DOI: 10.1093/database/baz131.

Email: yczuo@imu.edu.cn