📝 Input
⚙️ Select Properties
MW, net charge, pI, hydrophobicity (Sequence only)
*Requires protein sequence input above
Current Best Models Configuration
This table shows the models and thresholds currently being used for predictions:
🔬 Permeability (Penetrance) | xgb_wt_log | 0.2801 | Transformer | 0.4343 | Classifier |
Note: Models marked as SVM, SVR, or ENET are automatically replaced with XGB as these models are not currently supported in the deployment environment.
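The substitution described in the note can be sketched as a small lookup. This is only an illustration: the function name `resolve_model`, the prefix list, and the fallback label `"xgb"` are hypothetical and not taken from the app's code.

```python
# Hypothetical sketch of the fallback rule: model names that start with
# SVM, SVR, or ENET are swapped for an XGBoost model at prediction time.
UNSUPPORTED_PREFIXES = ("svm", "svr", "enet")


def resolve_model(name: str, fallback: str = "xgb") -> str:
    """Return the model name to use in the deployment environment."""
    if name.lower().startswith(UNSUPPORTED_PREFIXES):
        return fallback
    return name


for configured in ("SVR", "enet_gpu", "xgb_wt_log", "Transformer"):
    print(configured, "->", resolve_model(configured))
```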
Input Requirements and Constraints
Supported Inputs
- Amino acid sequences: Linear peptides composed of standard 20 amino acids
- SMILES: Chemically modified peptides, including cyclization, D-amino acids, and noncanonical residues
Validation
- Invalid sequences or SMILES will be rejected
- Properties not supported are labeled as (Not Supported)
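A minimal sketch of the sequence check, assuming that "valid" means every character is one of the 20 standard one-letter codes. The function name, the whitespace stripping, and the upper-casing are assumptions for illustration. SMILES validation is not shown because it would need a cheminformatics toolkit.

```python
# The 20 standard amino acids, one-letter codes.
STANDARD_AMINO_ACIDS = frozenset("ACDEFGHIKLMNPQRSTVWY")


def validate_sequence(sequence: str) -> str:
    """Return the cleaned, upper-cased sequence, or raise ValueError."""
    cleaned = sequence.strip().upper()
    if not cleaned:
        raise ValueError("empty sequence")
    invalid = sorted(set(cleaned) - STANDARD_AMINO_ACIDS)
    if invalid:
        raise ValueError("non-standard residues: " + ", ".join(invalid))
    return cleaned


print(validate_sequence(" acdk "))  # cleaned sequence
try:
    validate_sequence("ACDX")  # X is not a standard residue
except ValueError as err:
    print("rejected:", err)
```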
Training Data Collection
Classification (samples per class label)

| Property | Amino Acid Sequences (0) | Amino Acid Sequences (1) | SMILES Sequences (0) | SMILES Sequences (1) |
|---|---|---|---|---|
| Hemolysis | 4765 | 1311 | 4765 | 1311 |
| Non-Fouling | 13580 | 3600 | 13580 | 3600 |
| Solubility | 9668 | 8785 | - | - |
| Permeability (Penetrance) | 1162 | 1162 | - | - |
| Toxicity | - | - | 5518 | 5518 |

Regression (N)

| Property | Amino Acid Sequences (N) | SMILES Sequences (N) |
|---|---|---|
| Permeability (PAMPA) | - | 6869 |
| Permeability (CACO2) | - | 606 |
| Half-Life | 130 | 245 |
| Binding Affinity | 1436 | 1597 |
Our models are trained on curated datasets from multiple sources. For details of the data-cleaning procedures, please refer to our paper.
🩸 Hemolysis Dataset
- Primary Source: the Database of Antimicrobial Activity and Structure of Peptides (DBAASPv3)
- Secondary Source: peptide-dashboard
- Description: Probability of peptide disrupting red blood cell membranes.
- Interpretation: HC50 is the concentration (x µg/mL) at which 50% of red blood cells are lysed. Peptides with HC50 < 100 µM are labeled hemolytic (1) and all others non-hemolytic (0), giving a binary dataset. Scores close to 1 indicate a high probability of red blood cell membrane disruption, while scores close to 0 indicate low hemolytic risk. The predicted probability should therefore be interpreted as a risk indicator, not an exact concentration estimate.
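The binarization rule for the training labels can be written in a few lines. This is a sketch, assuming the HC50 value and the threshold of 100 are expressed in the same unit. The function name is hypothetical.

```python
def hemolysis_label(hc50: float, threshold: float = 100.0) -> int:
    """Return 1 (hemolytic) if HC50 is below the threshold, else 0.

    Assumes hc50 and threshold share the same concentration unit.
    """
    return 1 if hc50 < threshold else 0


for value in (12.5, 100.0, 250.0):
    print(value, "->", hemolysis_label(value))
```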
💧 Solubility Dataset
- Primary Source: PROSO-II
- Secondary Source: peptideBERT
- Description: Probability of peptide remaining dissolved in aqueous conditions.
- Interpretation: Outputs a probability (0–1) that a peptide remains soluble in aqueous conditions. Higher scores indicate lower aggregation risk and better formulation stability.
👯 Non-Fouling Dataset
- Primary Source: Classifying antimicrobial and multifunctional peptides with Bayesian network models
- Secondary Source: peptideBERT
- Description: A nonfouling peptide resists nonspecific interactions and protein adsorption.
- Interpretation: Outputs the probability (0–1) that a peptide resists nonspecific protein adsorption. Higher scores indicate stronger non-fouling behavior, desirable for circulation and surface-exposed applications.
🪣 Permeability Dataset
- Primary Source: CycPeptMPDB, PAMPA
- Secondary Source: PepLand
- Description: Probability of peptide penetrating the cell membrane.
- Interpretation: For PAMPA and CACO-2 regression, outputs are log-scaled permeability values. Following CycPeptMPDB conventions, log Pexp ≥ −6.0 indicates favorable permeability, while values below −6.0 indicate weak permeability. For penetrance prediction, a probability closer to 1 indicates a higher likelihood of cell penetrance, and a probability closer to 0 indicates a lower likelihood.
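As an illustration of the −6.0 cutoff, a regression output can be mapped to the two categories named above. The function name and the category strings are hypothetical.

```python
def permeability_category(log_pexp: float, cutoff: float = -6.0) -> str:
    """Classify a predicted log Pexp value against the -6.0 cutoff."""
    return "favorable" if log_pexp >= cutoff else "weak"


for predicted in (-5.2, -6.0, -7.4):
    print(predicted, "->", permeability_category(predicted))
```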
⏱️ Half-Life Dataset
- Primary Source: Thpdb2, PepTherDia, peplife
- Interpretation: Predicted values are half-lives in hours and reflect relative peptide stability. Higher values indicate longer persistence in serum, while lower values suggest faster degradation.
☠️ Toxicity Dataset
- Primary Source: ToxinPred3.0
- Interpretation: Outputs a probability (0–1) that a peptide exhibits toxic effects. Higher scores indicate increased toxicity risk.
🔗 Binding Affinity Dataset
- Primary Source: PepLand
- Description: The model predicts a continuous binding affinity score, where higher values indicate stronger binding. Training labels combine Kd, Ki, and IC50 measurements and are already normalized in PepLand. Scores are comparable across peptides binding to the same protein target.
- Interpretation:
- Scores ≥ 9 correspond to tight binders (K ≤ 10⁻⁹ M, nanomolar to picomolar range)
- Scores between 7 and 9 correspond to medium binders (10⁻⁷–10⁻⁹ M, nanomolar to micromolar range)
- Scores < 7 correspond to weak binders (K ≥ 10⁻⁶ M, micromolar and weaker)
- A difference of 1 unit in score corresponds to an approximately tenfold change in binding affinity.
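The thresholds above are consistent with reading the score as a negative base-10 logarithm of the affinity constant in molar units, so that one score unit is a tenfold change. The sketch below makes that assumption explicit. The function names are hypothetical, and the relation `K = 10 ** (-score)` is inferred from the listed ranges rather than stated by the source.

```python
def score_to_molar(score: float) -> float:
    """Convert a score to an affinity constant in M, assuming score = -log10(K)."""
    return 10.0 ** (-score)


def binding_category(score: float) -> str:
    """Bucket a score using the thresholds from the interpretation list."""
    if score >= 9:
        return "tight"
    if score >= 7:
        return "medium"
    return "weak"


for s in (9.5, 8.0, 6.2):
    print(f"score {s}: {binding_category(s)} binder, K ~ {score_to_molar(s):.1e} M")
```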
Model Architecture
- Sequence Embeddings: ESM-2 650M model / PeptideCLM model. Foundational embeddings are frozen.
- XGBoost Model: Gradient boosting on pooled embedding features for efficient, high-performance prediction.
- CNN/Transformer Model: One-dimensional convolutional/self-attention transformer networks operating on unpooled embeddings to capture local sequence patterns.
- Binding Model: Transformer-based architecture with cross-attention between protein and peptide representations.
- SVR Model: Support Vector Regression applied to pooled embeddings, providing a kernel-based, nonparametric regression baseline that is robust on smaller or noisy datasets.
- Others: SVM and Elastic Net models were trained with RAPIDS cuML, which requires a CUDA environment and is therefore not supported in the web app. Model checkpoints remain available in the Hugging Face repository.
Model Training and Weight Hosting
- More instructions can be found in Classifier_weights
🧪 Physicochemical Properties
Net Charge Calculation
- Uses Henderson-Hasselbalch equation
- pH-dependent calculation
- Considers all ionizable groups (K, R, H, D, E, C, Y, termini)
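A minimal sketch of a Henderson-Hasselbalch net charge calculation over the groups listed above. The pKa values are illustrative textbook-style numbers and are an assumption; the app may use a different pKa set. The function name is hypothetical.

```python
# Illustrative pKa values (assumed, not the app's constants).
POSITIVE_PKA = {"K": 10.8, "R": 12.5, "H": 6.5}
NEGATIVE_PKA = {"D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}
N_TERM_PKA = 8.6
C_TERM_PKA = 3.6


def net_charge(sequence: str, ph: float = 7.0) -> float:
    """Sum the fractional charges of the termini and ionizable side chains."""
    # Basic groups carry +1 when protonated: fraction = 1 / (1 + 10^(pH - pKa)).
    charge = 1.0 / (1.0 + 10.0 ** (ph - N_TERM_PKA))
    # Acidic groups carry -1 when deprotonated: fraction = 1 / (1 + 10^(pKa - pH)).
    charge -= 1.0 / (1.0 + 10.0 ** (C_TERM_PKA - ph))
    for aa in sequence:
        if aa in POSITIVE_PKA:
            charge += 1.0 / (1.0 + 10.0 ** (ph - POSITIVE_PKA[aa]))
        elif aa in NEGATIVE_PKA:
            charge -= 1.0 / (1.0 + 10.0 ** (NEGATIVE_PKA[aa] - ph))
    return charge


print(round(net_charge("KKKK", 7.0), 2))  # strongly positive at neutral pH
print(round(net_charge("DDDD", 7.0), 2))  # strongly negative at neutral pH
```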
Isoelectric Point (pI)
- Bisection method to find pH where net charge = 0
- Precision: ±0.01 pH units
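The bisection search can be sketched as follows. Net charge decreases monotonically with pH, so the interval is halved until it is narrower than the stated precision. The pKa values, the 0 to 14 search interval, and the function names are assumptions for illustration.

```python
# Illustrative (pKa, charge sign) pairs; assumed values, not the app's constants.
IONIZABLE = {
    "K": (10.8, +1), "R": (12.5, +1), "H": (6.5, +1),
    "D": (3.9, -1), "E": (4.1, -1), "C": (8.5, -1), "Y": (10.1, -1),
}
TERMINI = [(8.6, +1), (3.6, -1)]  # N-terminus, C-terminus


def group_charge(pka: float, sign: int, ph: float) -> float:
    """Fractional charge of one ionizable group (Henderson-Hasselbalch)."""
    if sign > 0:
        return 1.0 / (1.0 + 10.0 ** (ph - pka))
    return -1.0 / (1.0 + 10.0 ** (pka - ph))


def net_charge(sequence: str, ph: float) -> float:
    groups = TERMINI + [IONIZABLE[aa] for aa in sequence if aa in IONIZABLE]
    return sum(group_charge(pka, sign, ph) for pka, sign in groups)


def isoelectric_point(sequence: str, tol: float = 0.01) -> float:
    """Bisect on pH until the bracketing interval is narrower than tol."""
    lo, hi = 0.0, 14.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if net_charge(sequence, mid) > 0:
            lo = mid  # still positive: the pI lies at higher pH
        else:
            hi = mid  # zero or negative: the pI lies at lower pH
    return (lo + hi) / 2.0


print(round(isoelectric_point("ACDEKR"), 2))
```

If the net charge never crosses zero inside the interval, this sketch simply converges to the nearest boundary (0 or 14).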
Hydrophobicity (GRAVY)
- Grand Average of Hydropathy
- Uses Kyte-Doolittle scale
- Range: -4.5 (hydrophilic) to +4.5 (hydrophobic)
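GRAVY is the mean Kyte-Doolittle hydropathy value over all residues. A short sketch follows. The function name is hypothetical, and rejecting empty input is an assumption about how edge cases are handled.

```python
# Kyte-Doolittle hydropathy scale.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(sequence: str) -> float:
    """Return the grand average of hydropathy for a standard sequence."""
    if not sequence:
        raise ValueError("empty sequence")
    return sum(KYTE_DOOLITTLE[aa] for aa in sequence) / len(sequence)


print(gravy("IIII"))  # most hydrophobic end of the scale
print(gravy("RRRR"))  # most hydrophilic end of the scale
```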
Citation
If you use this tool, please cite:
place holder
Contact
For questions or collaborations: yzhang@u.duke.nus.edu or pranam@seas.upenn.edu
📊 Results
PeptiVerse - A Unified Platform for peptide therapeutic property prediction.
Please cite our work if you use this tool in your research.