# AbMelt Trained Models

This directory contains the trained machine learning models for predicting antibody thermostability properties.

## Model Files

### T<sub>agg</sub> (Aggregation Temperature)
- **Model:** `tagg/efs_best_knn.pkl`
- **Algorithm:** K-Nearest Neighbors (KNN)
- **Features:** `tagg/rf_efs.csv`
  - `rmsf_cdrs_mu_400`
  - `gyr_cdrs_Rg_std_400`
  - `all-temp_lamda_b=25_eq=20`

### T<sub>m</sub> (Melting Temperature)
- **Model:** `tm/efs_best_randomforest.pkl`
- **Algorithm:** Random Forest
- **Features:** `tm/rf_efs.csv`
  - `gyr_cdrs_Rg_std_350`
  - `bonds_contacts_std_350`
  - `rmsf_cdrl1_std_350`

### T<sub>m,onset</sub> (Onset Melting Temperature)
- **Model:** `tmon/efs_best_elasticnet.pkl`
- **Algorithm:** ElasticNet
- **Features:** `tmon/rf_efs.csv`
  - `bonds_contacts_std_350`
  - `all-temp-sasa_core_mean_k=20_eq=20`
  - `all-temp-sasa_core_std_k=20_eq=20`
  - `r-lamda_b=2.5_eq=20`

## Model Training Details

These models were trained using:
- **Feature Selection:** Exhaustive Feature Selection (EFS) with Random Forest
- **Optimization:** Bayesian optimization (skopt) for hyperparameter tuning
- **Cross-validation:** Repeated K-Fold cross-validation
- **Training Data:** Internal antibody thermostability dataset
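
To make the selection procedure concrete, here is a minimal, dependency-free sketch of exhaustive feature selection scored by repeated K-fold cross-validation. It is not the original training code: the function names, the feature names `f1` to `f3`, the toy data, and the 1-nearest-neighbour scorer are all hypothetical stand-ins. A real run would fit a Random Forest on each subset and tune hyperparameters with skopt.

```python
import itertools
import random


def repeated_kfold(n_samples, n_splits, n_repeats, seed=0):
    """Yield (train_idx, test_idx) pairs for repeated K-fold cross-validation."""
    rng = random.Random(seed)
    for _ in range(n_repeats):
        indices = list(range(n_samples))
        rng.shuffle(indices)  # fresh shuffle for every repeat
        folds = [indices[i::n_splits] for i in range(n_splits)]
        for k in range(n_splits):
            test_idx = folds[k]
            train_idx = [i for j, fold in enumerate(folds) if j != k for i in fold]
            yield train_idx, test_idx


def exhaustive_feature_selection(features, score_fn, max_size):
    """Score every feature subset up to max_size; return (best_subset, best_score)."""
    best_subset, best_score = None, float("-inf")
    for size in range(1, max_size + 1):
        for subset in itertools.combinations(features, size):
            score = score_fn(subset)
            if score > best_score:  # ties keep the first subset found
                best_subset, best_score = subset, score
    return best_subset, best_score


def cv_score(subset, X, y, n_splits=3, n_repeats=2):
    """Toy scorer: negative mean absolute error of a 1-nearest-neighbour predictor."""
    errors = []
    for train_idx, test_idx in repeated_kfold(len(y), n_splits, n_repeats):
        for t in test_idx:
            nearest = min(
                train_idx,
                key=lambda i: sum((X[f][i] - X[f][t]) ** 2 for f in subset),
            )
            errors.append(abs(y[nearest] - y[t]))
    return -sum(errors) / len(errors)


# Made-up data for illustration only (not real descriptors or temperatures).
X = {
    "f1": [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
    "f2": [5.0, 1.0, 7.0, 3.0, 8.0, 0.0, 6.0, 2.0, 4.0],
    "f3": [2.0, 2.0, 2.5, 2.5, 3.0, 3.0, 3.5, 3.5, 4.0],
}
y = [60.0, 61.0, 62.0, 63.0, 64.0, 65.0, 66.0, 67.0, 68.0]

best, score = exhaustive_feature_selection(
    list(X), lambda subset: cv_score(subset, X, y), max_size=2
)
print(best, score)
```

The number of subsets grows combinatorially with the size of the candidate pool, which is why the search is capped with `max_size`. With scikit-learn available, `sklearn.model_selection.RepeatedKFold` would replace the hand-written splitter.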

## Usage

Models are automatically loaded by the `AbMeltPredictor` class in `src/model_inference.py`:

```python
from src.model_inference import AbMeltPredictor

predictor = AbMeltPredictor()
predictions = predictor.predict_all(descriptors_df)
```

## Source

These models and feature definitions were copied from the original AbMelt paper implementation:
- Original location: `../AbMelt/models/`
- Paper: Rollins, Z.A., Widatalla, T., Cheng, A.C., & Metwally, E. (2024). AbMelt: Learning antibody thermostability from molecular dynamics.

## File Sizes

- `tagg/efs_best_knn.pkl`: ~1-5 KB (KNN is lightweight)
- `tm/efs_best_randomforest.pkl`: ~100-500 KB (depends on tree depth/count)
- `tmon/efs_best_elasticnet.pkl`: ~1-5 KB (linear model is lightweight)

## Notes

- Each model expects its features in the exact column order given in the corresponding `rf_efs.csv` file
- Feature values must be normalized/scaled the same way as the training data
- The `rf_efs.csv` files contain both the feature columns and the target columns (`tagg`, `tm`, `tmonset`)
- These are the best models selected from the exhaustive feature selection experiments
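
The ordering and target-column notes above can be illustrated with a small, dependency-free sketch. It assumes the header row of an `rf_efs.csv` file lists the feature columns in model order alongside the target columns. The helper names `feature_order` and `to_model_row` are hypothetical and not part of the AbMelt code, and the sample CSV values are made up.

```python
import csv
import io

# Target column names mentioned in the notes above.
TARGET_COLUMNS = {"tagg", "tm", "tmonset"}


def feature_order(csv_text):
    """Return the feature columns of an rf_efs.csv-style header, in file order."""
    header = next(csv.reader(io.StringIO(csv_text)))
    # Skip targets and empty names (e.g. an unnamed index column, if present).
    return [name for name in header if name and name not in TARGET_COLUMNS]


def to_model_row(descriptors, order):
    """Arrange a {feature: value} mapping into the order the model expects."""
    missing = [name for name in order if name not in descriptors]
    if missing:
        raise KeyError(f"missing features: {missing}")
    return [descriptors[name] for name in order]


# Illustrative CSV content; the numbers are not real training data.
sample_csv = (
    "gyr_cdrs_Rg_std_350,bonds_contacts_std_350,rmsf_cdrl1_std_350,tm\n"
    "0.12,3.4,0.05,71.0\n"
)
order = feature_order(sample_csv)

# Descriptors may arrive in any order; the helper restores the file order.
row = to_model_row(
    {
        "rmsf_cdrl1_std_350": 0.05,
        "gyr_cdrs_Rg_std_350": 0.12,
        "bonds_contacts_std_350": 3.4,
    },
    order,
)
print(order)
print(row)
```

Failing loudly on a missing feature is a deliberate choice here: silently filling a default would produce a prediction from the wrong input.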