RAACBook

A web server of reduced amino acid alphabet for sequence-dependent inference

Machine Learning

The Machine Learning tab builds protein classification models from reduced amino acid sequence features.

The server supports binary (two-category) discrimination, for example between different subcellular locations, protein families or subfamilies, or other functional classes.

Step1. Enter Query Dataset:
Users upload two valid benchmark datasets, one positive and one negative, to train a machine learning model that can predict the category of new sequences. In the example provided, the positive dataset consists of secretory proteins and the negative dataset consists of non-secretory proteins.
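
As an illustration of how a positive and a negative dataset can be combined into one labeled training set, here is a minimal Python sketch. It assumes the datasets are in FASTA format, which this page does not state. The function names read_fasta and load_labeled and the toy sequences are hypothetical and are not part of RAACBook.

```python
from io import StringIO


def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA-formatted text stream."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts: emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            # Sequence lines may wrap, so collect and join them.
            chunks.append(line.upper())
    if header is not None:
        yield header, "".join(chunks)


def load_labeled(positive_handle, negative_handle):
    """Return (sequences, labels), with label 1 = positive and 0 = negative."""
    sequences, labels = [], []
    for label, handle in ((1, positive_handle), (0, negative_handle)):
        for _, seq in read_fasta(handle):
            sequences.append(seq)
            labels.append(label)
    return sequences, labels


# Toy records invented for illustration only; not real benchmark data.
positive = StringIO(">pos1\nMKTLLVAG\nLLA\n>pos2\nMKKLAAVL\n")
negative = StringIO(">neg1\nMDEEQRSP\n")
sequences, labels = load_labeled(positive, negative)
```

In practice the two StringIO objects would be replaced by open file handles for the uploaded positive and negative files.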

Step2. Parameters Selection:
The K-tuple size and the reduced amino acid type are the parameters for extracting sequence features. The extracted features are composition frequencies over the reduced amino acid alphabet. The features are automatically passed to the selected classification algorithm: K-Nearest Neighbors (KNN), Random Forest (RF) or Support Vector Machine (SVM). Leave-one-out cross-validation is the default evaluation method.
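
The parameters above can be sketched in a few lines of Python. This is a rough illustration under stated assumptions, not the server's implementation: the five-cluster reduction in CLUSTERS is hypothetical and is not one of RAACBook's reduction types, the features are assumed to be overlapping K-tuple frequencies, and a plain 1-nearest-neighbour rule stands in for the KNN, RF and SVM options. All function names are invented for this example.

```python
from itertools import product

# Hypothetical 5-cluster reduction of the 20 amino acids (illustration only).
CLUSTERS = ["AVLIMC", "FWYH", "STNQ", "KR", "DEGP"]
# Each residue is replaced by the first letter of its cluster.
REDUCE = {aa: group[0] for group in CLUSTERS for aa in group}
ALPHABET = [group[0] for group in CLUSTERS]


def reduce_sequence(seq):
    """Map a protein sequence onto the reduced alphabet, skipping unknown residues."""
    return "".join(REDUCE[aa] for aa in seq.upper() if aa in REDUCE)


def ktuple_features(seq, k):
    """Return frequencies of all overlapping K-tuples of the reduced sequence."""
    reduced = reduce_sequence(seq)
    keys = ["".join(p) for p in product(ALPHABET, repeat=k)]
    counts = dict.fromkeys(keys, 0)
    total = max(len(reduced) - k + 1, 0)
    for i in range(total):
        counts[reduced[i:i + k]] += 1
    return [counts[key] / total if total else 0.0 for key in keys]


def nearest_label(x, train_X, train_y):
    """Predict the label of x from its single nearest neighbour (squared Euclidean)."""
    best = min(
        range(len(train_X)),
        key=lambda j: sum((a - b) ** 2 for a, b in zip(x, train_X[j])),
    )
    return train_y[best]


def leave_one_out_accuracy(X, y):
    """Hold out each sample once, train on the rest, and report the accuracy."""
    correct = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        correct += nearest_label(X[i], train_X, train_y) == y[i]
    return correct / len(X)
```

With K-tuple size k and a reduced alphabet of n clusters, each sequence becomes a vector of n^k frequencies, so larger k or finer clustering increases the feature dimension quickly. Leave-one-out trains one model per sequence, which suits small benchmark datasets but grows costly on large ones.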