A web server of reduced amino acid alphabet for sequence-dependent inference
Users can run fuzzy searches on the type ID, method, cluster, reference and year entries in the browser tool. The filter helps users find the entries that meet their needs among the large number of records.
(1) Select more entries from the popup menu.
(2) Filter the results across all columns with the input box.
(3) Sort the entries by column name.
(4) Filter the results of a single column with its input box, e.g. by method (BLOSUM62) or by year (2018).
(5) Click the "Number" button to visit the INFORMATION page.
The type ID, name, description and other details are shown on the INFORMATION page. Importantly, the amino acid reduction clusters for each method are visualized in different colors. Users can analyze the selected RAAC by clicking the "Analysis" button to enter the ANALYSIS page.
(1) The information includes the selected type, name, description and reduced amino acid cluster.
(2) The reduced amino acid cluster (RAAC) is shown as a bar with amino acid residues in different colors.
(3) Click the button to open the analysis interface with the selected RAAC types.
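To make the idea of a reduced amino acid cluster concrete, the following minimal sketch rewrites a protein sequence in a reduced alphabet. The grouping `EXAMPLE_CLUSTERS` and the function names are hypothetical and chosen only for illustration; they are not taken from the PseRAACBook database, and using the first residue of each cluster as its representative is an assumption.

```python
# Hypothetical 5-cluster alphabet for illustration only; it is NOT an
# entry from the PseRAACBook database.
EXAMPLE_CLUSTERS = ["AVLIMC", "FWYH", "STNQ", "KRDE", "GP"]


def build_reduction_map(clusters):
    """Map every residue to the first residue of its cluster (an assumption)."""
    mapping = {}
    for cluster in clusters:
        representative = cluster[0]
        for residue in cluster:
            mapping[residue] = representative
    return mapping


def reduce_sequence(sequence, clusters):
    """Rewrite a sequence in the reduced alphabet.

    Residues that belong to no cluster (e.g. X) are kept unchanged.
    """
    mapping = build_reduction_map(clusters)
    return "".join(mapping.get(res, res) for res in sequence.upper())


print(reduce_sequence("MKVLHDE", EXAMPLE_CLUSTERS))  # AKAAFKK
```

With this grouping the 20 natural amino acids collapse to 5 symbols, so a composition vector built on the reduced sequence is correspondingly shorter.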
The online server was developed to reduce the primary sequences of proteins. Three parameters (k-tuple, gap, λ-correlation) define the k-tuple reduced amino acid composition. Users can analyze primary protein sequences and obtain the results in the following three steps:
This parameter is selected in a two-dimensional selection box with the alphabet types and amino acid cluster sizes, which are recorded in the PseRAACBook database.
(1) In the "Type" column, each number represents an amino acid alphabet reduction type. Hovering the mouse over a type shows a brief description of that type. Clicking on the type opens the information page.
(2) In the "Cluster size" column, each row shows all the clustering information for that type, and each number indicates a cluster size.
(3) Click a gray button to select that cluster; the button turns blue.
(4) Click the "Selected All" button at the bottom of the two-dimensional selection box to select all the RAAC types.
K-tuple, g-gap and λ-correlation are the three main parameters used to generate the reduced amino acid composition of a protein primary sequence. A detailed explanation follows:
The K-tuple value follows the definition used by other researchers: it is the number of residues in each counted peptide. For example, K = 1 means a monopeptide (a single amino acid), K = 2 a dipeptide, K = 3 a tripeptide, and so on. In a typical K-tuple analysis, a window of width K slides along the protein one residue at a time. With K = 1, for a protein with N amino acids represented by R1, R2, ..., RN, the frequency of every natural or reduced amino acid is calculated. With K = 2, the dipeptide frequency is counted along the protein one residue at a time, so a protein with N amino acids contains N-1 dipeptides in total: R1R2, R2R3, R3R4, ..., etc.
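The sliding-window count described above can be sketched as follows. This is a minimal illustration, not the server's implementation; the function name `ktuple_counts` is hypothetical.

```python
from collections import Counter


def ktuple_counts(sequence, k):
    """Count every window of k consecutive residues, sliding one residue at a time."""
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


seq = "ACDAC"                      # N = 5
counts = ktuple_counts(seq, 2)     # windows: AC, CD, DA, AC
print(sum(counts.values()))        # 4, i.e. N - 1 dipeptides
print(counts["AC"])                # 2
```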
The g-gap value is the gap between two consecutive counted amino acids or K-tuple peptides along the protein: the start of the window advances by g + 1 residues in each slide instead of by one. For a protein with N amino acids represented by R1, R2, ..., RN, a gap of g therefore reduces the number of counted amino acids or K-tuple peptides by a factor of about (g + 1). With K = 1 and g = 3, only 1/4 of the amino acids along the protein sequence are counted: R1, R5, R9, ..., etc. With K = 2 and g = 1, the dipeptide frequency is counted by skipping one window position in every slide: R1R2, R3R4, R5R6, ..., etc. In short, the g-gap is the number of positions skipped in each slide along the protein.
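The two g-gap examples above can be reproduced with a short sketch. It assumes, consistently with those examples, that the window start advances by g + 1 positions; `gapped_windows` is a hypothetical name.

```python
def gapped_windows(residues, k, g):
    """Return contiguous k-residue windows whose start advances by g + 1."""
    step = g + 1
    last_start = len(residues) - k
    return ["".join(residues[i:i + k]) for i in range(0, last_start + 1, step)]


residues = [f"R{i}" for i in range(1, 13)]   # R1 ... R12
print(gapped_windows(residues, 1, 3))
# ['R1', 'R5', 'R9']
print(gapped_windows(residues, 2, 1))
# ['R1R2', 'R3R4', 'R5R6', 'R7R8', 'R9R10', 'R11R12']
```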
The λ-correlation value is the number of residues skipped between every two adjacent amino acids within a K-tuple peptide. K >= 2 is required for λ to be meaningful. For example, for a protein with N amino acids represented by R1, R2, ..., RN, with K = 2 and λ = 1 the dipeptide frequency is counted from the combinations R1R3, R2R4, R3R5, ..., etc. With K = 3 and λ = 1, the tripeptide frequency is counted from the combinations R1R3R5, R2R4R6, ..., etc. In short, the λ-correlation is the number of residues skipped between every two adjacent amino acids within the K-tuple peptide.
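A minimal sketch of the λ-correlation, assuming the residues of one peptide are taken λ + 1 positions apart as in the examples above (`lambda_tuples` is a hypothetical name):

```python
def lambda_tuples(residues, k, lam):
    """Build k-tuples whose adjacent residues are lam + 1 positions apart."""
    spacing = lam + 1
    span = (k - 1) * spacing          # offset from the first to the last residue
    return [
        "".join(residues[i + j * spacing] for j in range(k))
        for i in range(len(residues) - span)
    ]


residues = [f"R{i}" for i in range(1, 9)]    # R1 ... R8
print(lambda_tuples(residues, 2, 1))
# ['R1R3', 'R2R4', 'R3R5', 'R4R6', 'R5R7', 'R6R8']
print(lambda_tuples(residues, 3, 1))
# ['R1R3R5', 'R2R4R6', 'R3R5R7', 'R4R6R8']
```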
Taking K = 3, λ = 1, g = 2 as an example, one residue is skipped between adjacent residues within each tripeptide, and two positions are skipped in each slide along the protein. The resulting combinations are R1R3R5, R4R6R8, R7R9R11, ..., etc.
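Combining the three parameters, a composition vector can be sketched as below. This is an illustration under stated assumptions, not the server's code: the names `raac_tuples` and `composition_vector` are hypothetical, frequencies are normalized by the number of counted tuples, and the vector follows the lexicographic order produced by `itertools.product`.

```python
from collections import Counter
from itertools import product


def raac_tuples(residues, k, g=0, lam=0):
    """Enumerate K-tuples with intra-tuple spacing lam + 1 and slide step g + 1."""
    spacing = lam + 1
    step = g + 1
    span = (k - 1) * spacing
    return [
        "".join(residues[i + j * spacing] for j in range(k))
        for i in range(0, len(residues) - span, step)
    ]


def composition_vector(sequence, alphabet, k, g=0, lam=0):
    """Frequency of every possible k-tuple over `alphabet`, in a fixed order.

    Tuples containing symbols outside `alphabet` still count toward the total
    but have no slot in the vector (an assumption of this sketch).
    """
    tuples = raac_tuples(sequence, k, g, lam)
    counts = Counter(tuples)
    total = len(tuples) or 1          # avoid division by zero on short input
    keys = ["".join(p) for p in product(alphabet, repeat=k)]
    return [counts[key] / total for key in keys]


residues = [f"R{i}" for i in range(1, 12)]   # R1 ... R11
print(raac_tuples(residues, 3, g=2, lam=1))
# ['R1R3R5', 'R4R6R8', 'R7R9R11']
print(composition_vector("AKAK", "AK", 2))   # order: AA, AK, KA, KK
```

The vector length is the alphabet size raised to the power K, which is why a smaller reduced alphabet yields a much shorter feature vector for the same K.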
Finally, if everything is ready, users can click the "Analysis" button to get the PseRAACBook report.
Each RAAC type provides corresponding fasta, csv and libsvm vector files for download. All these files are compressed into a zip file.
(1) Click the types to visit INFORMATION.
(2) The reduced amino acid cluster (RAAC) is shown as a bar with amino acid residues in different colors.
(3) Download the three vector files (fasta, csv, svm) with the blue button for future research.
(4) Visualize the data according to the primary sequence, features, and conserved regions of the protein.
(5) Download all results in a compressed file (ZIP format).
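For readers unfamiliar with the vector formats, the sketch below writes one feature vector as a CSV row and as a LIBSVM row. The LIBSVM layout (`<label> <index>:<value> ...`, 1-based indices) is the standard sparse format; the column order, the position of the label, and the number formatting are assumptions and may differ from the files the server actually produces. Both function names are hypothetical.

```python
def to_csv_row(label, features):
    """One dense row: label first, then every feature value (assumed layout)."""
    return ",".join([str(label)] + [f"{v:.6f}" for v in features])


def to_libsvm_row(label, features):
    """One sparse row in LIBSVM format; zero-valued features are omitted."""
    pairs = [f"{i}:{v:.6f}" for i, v in enumerate(features, start=1) if v != 0]
    return " ".join([str(label)] + pairs)


vector = [0.0, 0.5, 0.25]
print(to_csv_row(1, vector))     # 1,0.000000,0.500000,0.250000
print(to_libsvm_row(1, vector))  # 1 2:0.500000 3:0.250000
```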
The machine learning server supports training protein classification models based on Support Vector Machine, K-Nearest Neighbor and Random Forest.
(1) Upload a fasta file as the dataset. To avoid slowing down the web server for other users, uploaded files are limited to 200 KB and must use fasta as the file extension.
(2) Click the "Example" button to select example data.
(3) The k-tuple and types are used to generate k-tuple reduced amino acid composition features.
Machine learning algorithms are used to train the classifier models.
(4) Click the "Reset" button to reset all settings.
(5) Click the "Submit" button to start the training program for the classifier.
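Before uploading, a dataset can be checked locally against the limits stated in step (1). The sketch below is a hypothetical pre-check, not the server's validation code; it assumes 1 KB = 1024 bytes and a `.fasta` extension, and it includes a minimal FASTA reader to confirm the file parses.

```python
import os

MAX_UPLOAD_BYTES = 200 * 1024   # assumption: 200 KB with 1 KB = 1024 bytes


def check_upload(filename, data):
    """Return an error message, or None if the file looks acceptable.

    `data` is the raw file content as bytes.
    """
    extension = os.path.splitext(filename)[1].lower()
    if extension != ".fasta":
        return "file extension must be .fasta"
    if len(data) > MAX_UPLOAD_BYTES:
        return "file is larger than 200 KB"
    return None


def parse_fasta(text):
    """Parse FASTA text into {header: sequence}, joining wrapped sequence lines."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = ""
        elif header is not None:
            records[header] += line
    return records


content = b">p1\nACDE\nFGH\n>p2\nKKR\n"
print(check_upload("train.fasta", content))   # None
print(parse_fasta(content.decode("ascii")))   # {'p1': 'ACDEFGH', 'p2': 'KKR'}
```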
Currently, RAAC has been used in many tools, such as iDPF-PseRAAAC, iHSP-PseRAAAC, Antimicrobial Peptide Scanner, Bastion6 and iDNA-Prot|dis. A recent example of collaborative work within PseRAACBook is the identification of secretory proteins of the malaria parasite using RAAC, implemented as ISP-RAAAC. This is an online implementation of the SVM method that classifies proteins based on a reduced amino acid alphabet from PseRAACBook. The classifier takes protein sequences as input and provides scores that allow users to identify secreted proteins.