Tutorial: using ModelManager
[ ]:
from peptdeep.pretrained_models import ModelManager
ModelManager is the main entry point for accessing the MS2, RT, and CCS (collision cross section) models.
[ ]:
model_mgr = ModelManager(mask_modloss=True, device='cpu')
Most default parameters and attributes of the ModelManager class are controlled by peptdeep.settings.global_settings, which is a dict:
from peptdeep.settings import global_settings
The default values in peptdeep.settings.global_settings are defined in default_settings.yaml.
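Because global_settings is a plain nested dict, its defaults can be read or overridden by key. The sketch below imitates that pattern with a hypothetical `DEFAULT_SETTINGS` dict (it mirrors only the `['model_mgr']['model_type']` key path mentioned in this tutorial, and its 'generic' value is an assumption, not the content of default_settings.yaml) and a hypothetical `deep_update` helper that merges user overrides without modifying the defaults:

```python
import copy

# Hypothetical stand-in for the nested dict loaded from default_settings.yaml.
# Only the key path used in this tutorial is mirrored; the value is assumed.
DEFAULT_SETTINGS = {
    "model_mgr": {
        "model_type": "generic",
    },
}

def deep_update(base: dict, overrides: dict) -> dict:
    """Return a copy of `base` with `overrides` merged in recursively."""
    merged = copy.deepcopy(base)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Descend into nested sections instead of replacing them wholesale
            merged[key] = deep_update(merged[key], value)
        else:
            merged[key] = value
    return merged

settings = deep_update(DEFAULT_SETTINGS, {"model_mgr": {"model_type": "digly"}})
print(settings["model_mgr"]["model_type"])          # digly
print(DEFAULT_SETTINGS["model_mgr"]["model_type"])  # generic (defaults untouched)
```

With peptdeep itself, the comparable step would be assigning into global_settings before creating a ModelManager; which settings are re-read after construction is not covered here.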
ModelManager.load_installed_models
ModelManager.load_installed_models(model_type) loads the installed (pretrained) models of the given type. model_type can be:
generic: generic RT/CCS/MS2 models including HLA
HLA: currently the same as generic
genericphos: RT/CCS/MS2 models for Phospho@S/T/Y
digly: RT/CCS/MS2 models for GlyGly@K
Calling ModelManager(...) also calls load_installed_models implicitly, with model_type defaulting to global_settings['model_mgr']['model_type'].
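The model types listed above act as names that select a set of RT/CCS/MS2 weights. A minimal sketch of that selection, assuming a hypothetical `resolve_model_type` helper with case-insensitive matching and, following the list above, HLA reusing the generic models (peptdeep's real lookup, including any aliases it accepts, may differ):

```python
# Documented model types mapped to the model set they load (hypothetical table).
MODEL_TYPES = {
    "generic": "generic",
    "hla": "generic",        # generic models already cover HLA peptides
    "genericphos": "genericphos",
    "digly": "digly",
}

def resolve_model_type(model_type: str) -> str:
    """Map a user-supplied model_type to the name of the model set to load."""
    key = model_type.lower()
    if key not in MODEL_TYPES:
        raise ValueError(
            f"Unknown model_type {model_type!r}; expected one of {sorted(MODEL_TYPES)}"
        )
    return MODEL_TYPES[key]

print(resolve_model_type("HLA"))  # generic
```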
Test the RT model
Use the 11 iRT peptides in IRT_PEPTIDE_DF to test the RT model:
[ ]:
from peptdeep.model.rt import IRT_PEPTIDE_DF
[ ]:
df = IRT_PEPTIDE_DF.copy()
# Manually add a phosphorylation to S5 of peptide 1 for illustration; this may change its real iRT
df.loc[1,'mods'] = 'Phospho@S'
df.loc[1,'mod_sites'] = '5'
df
| | sequence | pep_name | irt | mods | mod_sites | nAA |
|---|---|---|---|---|---|---|
| 0 | LGGNEQVTR | RT-pep a | -24.92 | | | 9 |
| 1 | GAGSSEPVTGLDAK | RT-pep b | 0.00 | Phospho@S | 5 | 14 |
| 2 | VEATFGVDESNAK | RT-pep c | 12.39 | | | 13 |
| 3 | YILAGVENSK | RT-pep d | 19.79 | | | 10 |
| 4 | TPVISGGPYEYR | RT-pep e | 28.71 | | | 12 |
| 5 | TPVITGAPYEYR | RT-pep f | 33.38 | | | 12 |
| 6 | DGLDAASYYAPVR | RT-pep g | 42.26 | | | 13 |
| 7 | ADVTPADFSEWSK | RT-pep h | 54.62 | | | 13 |
| 8 | GTFIIDPGGVIR | RT-pep i | 70.52 | | | 12 |
| 9 | GTFIIDPAAVIR | RT-pep k | 87.23 | | | 12 |
| 10 | LFLQFGAQGSPFLK | RT-pep l | 100.00 | | | 14 |
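In this table a modification is stored by name (e.g. Phospho@S) in mods, with its position in mod_sites. A quick consistency check, assuming (as the example suggests) that sites are 1-based residue positions and that multiple modifications would be ';'-separated in both columns: the hypothetical `check_mod_sites` helper below verifies that each modified residue matches the amino acid after '@'. Terminal modifications, which use other site conventions, are not handled.

```python
def check_mod_sites(sequence: str, mods: str, mod_sites: str) -> bool:
    """Return True if every 'Name@AA' modification sits on residue AA.

    Assumes 1-based positions and ';'-separated lists (illustrative only).
    """
    if not mods:
        return True  # unmodified peptide
    mod_list = mods.split(";")
    site_list = mod_sites.split(";")
    if len(mod_list) != len(site_list):
        return False
    for mod, site in zip(mod_list, site_list):
        residue = mod.split("@")[-1]
        pos = int(site)
        if pos < 1 or pos > len(sequence) or sequence[pos - 1] != residue:
            return False
    return True

# GAGSSEPVTGLDAK: residue 5 is S, so Phospho@S at site 5 is consistent.
print(check_mod_sites("GAGSSEPVTGLDAK", "Phospho@S", "5"))  # True
print(check_mod_sites("GAGSSEPVTGLDAK", "Phospho@S", "6"))  # False (residue 6 is E)
```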
[ ]:
model_mgr.load_installed_models('phos')
model_mgr.predict_rt(df)
model_mgr.rt_model.add_irt_column_to_precursor_df(df)
2022-09-09 21:54:02> Predicting RT ...
100%|██████████| 5/5 [00:00<00:00, 125.27it/s]
| | sequence | pep_name | irt | mods | mod_sites | nAA | rt_pred | rt_norm_pred | irt_pred |
|---|---|---|---|---|---|---|---|---|---|
| 0 | LGGNEQVTR | RT-pep a | -24.92 | | | 9 | 0.184235 | 0.184235 | -26.123537 |
| 1 | GAGSSEPVTGLDAK | RT-pep b | 0.00 | Phospho@S | 5 | 14 | 0.266746 | 0.266746 | 11.916059 |
| 2 | VEATFGVDESNAK | RT-pep c | 12.39 | | | 13 | 0.266133 | 0.266133 | 11.633120 |
| 3 | YILAGVENSK | RT-pep d | 19.79 | | | 10 | 0.290495 | 0.290495 | 22.864811 |
| 4 | TPVISGGPYEYR | RT-pep e | 28.71 | | | 12 | 0.303847 | 0.303847 | 29.020259 |
| 5 | TPVITGAPYEYR | RT-pep f | 33.38 | | | 12 | 0.316514 | 0.316514 | 34.860122 |
| 6 | DGLDAASYYAPVR | RT-pep g | 42.26 | | | 13 | 0.324423 | 0.324423 | 38.506308 |
| 7 | ADVTPADFSEWSK | RT-pep h | 54.62 | | | 13 | 0.345197 | 0.345197 | 48.083890 |
| 8 | GTFIIDPGGVIR | RT-pep i | 70.52 | | | 12 | 0.394248 | 0.394248 | 70.697474 |
| 9 | GTFIIDPAAVIR | RT-pep k | 87.23 | | | 12 | 0.434775 | 0.434775 | 89.381150 |
| 10 | LFLQFGAQGSPFLK | RT-pep l | 100.00 | | | 14 | 0.459583 | 0.459583 | 100.818303 |
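In the output above, irt_pred is a linear function of rt_pred: rows 0 and 10 give a slope of (100.818303 + 26.123537) / (0.459583 - 0.184235), about 461, and the same line reproduces row 2 (about 11.63). This is consistent with calibrating the model's RT predictions to the iRT scale with a straight-line fit. The sketch below shows such a fit by ordinary least squares; the helper names are hypothetical, and peptdeep's own calibration details are not shown in this tutorial. Here the line is recovered from three (rt_pred, irt_pred) pairs in the table; a real calibration would instead use the known irt values of the reference peptides as y.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of ys ~ slope * xs + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def to_irt(rt_preds, slope, intercept):
    """Map RT predictions onto the iRT scale with the fitted line."""
    return [slope * rt + intercept for rt in rt_preds]

# (rt_pred, irt_pred) pairs from rows 0, 2 and 10 of the table above.
rt_pred = [0.184235, 0.266133, 0.459583]
irt_pred = [-26.123537, 11.633120, 100.818303]
slope, intercept = fit_line(rt_pred, irt_pred)

# Apply the line to row 1, which was not used in the fit.
print(to_irt([0.266746], slope, intercept)[0])
```

The fitted line gives about 11.92 for row 1's rt_pred of 0.266746, in line with the irt_pred of 11.916059 in the table.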
Train the RT model on df. Training needs an rt_norm column, so normalize_irt first min-max scales the iRT values into [0, 1]:
[ ]:
def normalize_irt(df):
    # Min-max scale the iRT values into [0, 1] and store them as 'rt_norm'
    min_rt = df.irt.min()
    df['rt_norm'] = (df.irt - min_rt) / (df.irt.max() - min_rt)
normalize_irt(df)
model_mgr.epoch_to_train_rt_ccs = 50  # number of epochs used to train the RT/CCS models
model_mgr.train_rt_model(df)
model_mgr.predict_rt(df)
model_mgr.rt_model.add_irt_column_to_precursor_df(df)
2022-09-09 21:54:02> 11 PSMs for RT training/fine-tuning
2022-09-09 21:54:09> Predicting RT ...
100%|██████████| 5/5 [00:00<00:00, 151.56it/s]
| | sequence | pep_name | irt | mods | mod_sites | nAA | rt_pred | rt_norm_pred | irt_pred | rt_norm |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | LGGNEQVTR | RT-pep a | -24.92 | | | 9 | 0.127189 | 0.127189 | -18.916407 | 0.000000 |
| 1 | GAGSSEPVTGLDAK | RT-pep b | 0.00 | Phospho@S | 5 | 14 | 0.199919 | 0.199919 | -5.504272 | 0.199488 |
| 2 | VEATFGVDESNAK | RT-pep c | 12.39 | | | 13 | 0.295237 | 0.295237 | 12.073141 | 0.298671 |
| 3 | YILAGVENSK | RT-pep d | 19.79 | | | 10 | 0.357351 | 0.357351 | 23.527389 | 0.357909 |
| 4 | TPVISGGPYEYR | RT-pep e | 28.71 | | | 12 | 0.429762 | 0.429762 | 36.880596 | 0.429315 |
| 5 | TPVITGAPYEYR | RT-pep f | 33.38 | | | 12 | 0.392419 | 0.392419 | 29.994243 | 0.466699 |
| 6 | DGLDAASYYAPVR | RT-pep g | 42.26 | | | 13 | 0.387393 | 0.387393 | 29.067502 | 0.537784 |
| 7 | ADVTPADFSEWSK | RT-pep h | 54.62 | | | 13 | 0.634485 | 0.634485 | 74.633402 | 0.636728 |
| 8 | GTFIIDPGGVIR | RT-pep i | 70.52 | | | 12 | 0.671310 | 0.671310 | 81.424123 | 0.764009 |
| 9 | GTFIIDPAAVIR | RT-pep k | 87.23 | | | 12 | 0.699334 | 0.699334 | 86.592033 | 0.897775 |
| 10 | LFLQFGAQGSPFLK | RT-pep l | 100.00 | | | 14 | 0.607442 | 0.607442 | 69.646337 | 1.000000 |
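The normalize_irt function above is a min-max scaling, rt_norm = (irt - min) / (max - min), which maps the earliest iRT peptide to 0 and the latest to 1. A stdlib-only sketch of the same scaling plus its inverse (the inverse and both helper names are illustrative additions, not part of peptdeep), checked against the rt_norm column of the table:

```python
def min_max_normalize(values):
    """Scale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values], lo, hi

def denormalize(norm_values, lo, hi):
    """Invert min_max_normalize using the stored minimum and maximum."""
    return [lo + n * (hi - lo) for n in norm_values]

# iRT values of the 11 reference peptides (the irt column above).
irt = [-24.92, 0.00, 12.39, 19.79, 28.71, 33.38, 42.26, 54.62, 70.52, 87.23, 100.00]
rt_norm, lo, hi = min_max_normalize(irt)
print(round(rt_norm[1], 6))  # 0.199488, matching the rt_norm column
```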
Test the CCS model
[ ]:
df['charge'] = 2
model_mgr.predict_mobility(df)
2022-09-09 21:54:09> Predicting mobility ...
100%|██████████| 5/5 [00:00<00:00, 117.53it/s]
| | sequence | pep_name | irt | mods | mod_sites | nAA | rt_pred | rt_norm_pred | irt_pred | rt_norm | charge | ccs_pred | precursor_mz | mobility_pred |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | LGGNEQVTR | RT-pep a | -24.92 | | | 9 | 0.127189 | 0.127189 | -18.916407 | 0.000000 | 2 | 331.279816 | 487.256705 | 0.815533 |
| 1 | GAGSSEPVTGLDAK | RT-pep b | 0.00 | Phospho@S | 5 | 14 | 0.199919 | 0.199919 | -5.504272 | 0.199488 | 2 | 381.067841 | 684.805772 | 0.941902 |
| 2 | VEATFGVDESNAK | RT-pep c | 12.39 | | | 13 | 0.295237 | 0.295237 | 12.073141 | 0.298671 | 2 | 394.208893 | 683.827889 | 0.974369 |
| 3 | YILAGVENSK | RT-pep d | 19.79 | | | 10 | 0.357351 | 0.357351 | 23.527389 | 0.357909 | 2 | 364.828003 | 547.298039 | 0.899500 |
| 4 | TPVISGGPYEYR | RT-pep e | 28.71 | | | 12 | 0.429762 | 0.429762 | 36.880596 | 0.429315 | 2 | 394.317596 | 669.838059 | 0.974434 |
| 5 | TPVITGAPYEYR | RT-pep f | 33.38 | | | 12 | 0.392419 | 0.392419 | 29.994243 | 0.466699 | 2 | 399.848633 | 683.853709 | 0.988309 |
| 6 | DGLDAASYYAPVR | RT-pep g | 42.26 | | | 13 | 0.387393 | 0.387393 | 29.067502 | 0.537784 | 2 | 399.736542 | 699.338423 | 0.988252 |
| 7 | ADVTPADFSEWSK | RT-pep h | 54.62 | | | 13 | 0.634485 | 0.634485 | 74.633402 | 0.636728 | 2 | 405.532562 | 726.835714 | 1.002953 |
| 8 | GTFIIDPGGVIR | RT-pep i | 70.52 | | | 12 | 0.671310 | 0.671310 | 81.424123 | 0.764009 | 2 | 379.443451 | 622.853512 | 0.936954 |
| 9 | GTFIIDPAAVIR | RT-pep k | 87.23 | | | 12 | 0.699334 | 0.699334 | 86.592033 | 0.897775 | 2 | 387.886780 | 636.869163 | 0.958034 |
| 10 | LFLQFGAQGSPFLK | RT-pep l | 100.00 | | | 14 | 0.607442 | 0.607442 | 69.646337 | 1.000000 | 2 | 435.544861 | 776.929751 | 1.077836 |
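The mobility_pred column is derived from ccs_pred, precursor_mz and charge. The sketch below reproduces rows 0 and 1 under the assumption that the conversion is the Mason-Schamp-based formula AlphaBase (on which peptdeep builds) uses for Bruker timsTOF data: mobility = CCS * sqrt(mu) / (z * 1059.62245), where mu is the reduced mass of the ion (taken as precursor_mz * charge) and an N2 drift gas of mass 28. The constant and gas mass come from that assumption, not from this tutorial, and `ccs_to_mobility` is a hypothetical name.

```python
import math

CCS_IM_COEF = 1059.62245  # assumed Bruker conversion constant
IM_GAS_MASS = 28.0        # assumed drift gas: N2

def ccs_to_mobility(ccs: float, charge: int, precursor_mz: float) -> float:
    """Convert a CCS value (Å^2) to reduced ion mobility 1/K0 (V·s/cm^2)."""
    ion_mass = precursor_mz * charge
    # Reduced mass of the ion/drift-gas pair
    reduced_mass = ion_mass * IM_GAS_MASS / (ion_mass + IM_GAS_MASS)
    return ccs * math.sqrt(reduced_mass) / charge / CCS_IM_COEF

# Row 0 of the table above (LGGNEQVTR, charge 2).
print(round(ccs_to_mobility(331.279816, 2, 487.256705), 4))  # 0.8155
```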
Test the MS2 model
[ ]:
df['charge'] = 2
inten_df = model_mgr.predict_ms2(df)
inten_df
2022-09-09 21:54:10> Predicting MS2 ...
100%|██████████| 5/5 [00:00<00:00, 82.83it/s]
| | b_z1 | b_z2 | y_z1 | y_z2 | b_modloss_z1 | b_modloss_z2 | y_modloss_z1 | y_modloss_z2 |
|---|---|---|---|---|---|---|---|---|
| 0 | 0.000000 | 0.0 | 1.000000 | 0.021727 | 0.0 | 0.0 | 0.0 | 0.0 |
| 1 | 0.191613 | 0.0 | 0.343992 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 |
| 2 | 0.063825 | 0.0 | 0.119938 | 0.015200 | 0.0 | 0.0 | 0.0 | 0.0 |
| 3 | 0.033420 | 0.0 | 0.257022 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 |
| 4 | 0.027311 | 0.0 | 0.340053 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 118 | 0.000000 | 0.0 | 0.101413 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 |
| 119 | 0.000000 | 0.0 | 0.672498 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 |
| 120 | 0.000000 | 0.0 | 0.034437 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 |
| 121 | 0.000000 | 0.0 | 0.125430 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 |
| 122 | 0.000000 | 0.0 | 0.112338 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 |
123 rows × 8 columns
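The intensity table has one row per backbone cleavage site, so a peptide with nAA residues contributes nAA - 1 rows; the 11 peptides have nAA summing to 134, giving 134 - 11 = 123 rows. Each peptide's slice is recorded in the frag_start_idx and frag_stop_idx columns of df, which the next cell uses. A minimal sketch of how such offsets follow from a cumulative sum (the `fragment_offsets` helper is hypothetical, not peptdeep's code):

```python
from itertools import accumulate

def fragment_offsets(n_aas):
    """Return (start, stop) row offsets for each peptide's fragment block.

    A peptide of length nAA has nAA - 1 cleavage sites, hence nAA - 1 rows;
    blocks are laid out back to back in peptide order.
    """
    sizes = [n - 1 for n in n_aas]
    stops = list(accumulate(sizes))
    starts = [stop - size for stop, size in zip(stops, sizes)]
    return list(zip(starts, stops))

# nAA column of the 11 iRT peptides.
n_aas = [9, 14, 13, 10, 12, 12, 13, 13, 12, 12, 14]
offsets = fragment_offsets(n_aas)
print(offsets[1])      # (8, 21): rows 8-20 belong to GAGSSEPVTGLDAK
print(offsets[-1][1])  # 123, the total number of rows
```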
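Modloss fragment intensities (the b_modloss_* and y_modloss_* columns, i.e. fragments that have lost the modification as a neutral loss, such as H3PO4 from phosphopeptides) are predicted only when ModelManager is created with mask_modloss=False. The manager above was created with mask_modloss=True, so these columns are zero even for the phosphopeptide: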
[ ]:
phos_precursor_id = 1 # we manually assigned this peptide as phospho
inten_df.iloc[
df.loc[phos_precursor_id,'frag_start_idx']:
df.loc[phos_precursor_id,'frag_stop_idx'],:
]
| | b_z1 | b_z2 | y_z1 | y_z2 | b_modloss_z1 | b_modloss_z2 | y_modloss_z1 | y_modloss_z2 |
|---|---|---|---|---|---|---|---|---|
| 8 | 0.000000 | 0.0 | 0.000000 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 |
| 9 | 0.063835 | 0.0 | 0.012835 | 0.000606 | 0.0 | 0.0 | 0.0 | 0.0 |
| 10 | 0.066177 | 0.0 | 0.000000 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 |
| 11 | 0.061181 | 0.0 | 0.064921 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 |
| 12 | 0.000000 | 0.0 | 0.082699 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 |
| 13 | 0.000000 | 0.0 | 1.000000 | 0.080108 | 0.0 | 0.0 | 0.0 | 0.0 |
| 14 | 0.000000 | 0.0 | 0.068587 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 |
| 15 | 0.000000 | 0.0 | 0.293111 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 |
| 16 | 0.000000 | 0.0 | 0.185996 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 |
| 17 | 0.000000 | 0.0 | 0.024486 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 |
| 18 | 0.000000 | 0.0 | 0.105864 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 |
| 19 | 0.000000 | 0.0 | 0.148301 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 |
| 20 | 0.000000 | 0.0 | 0.046693 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 |
To disable modloss prediction explicitly, create the manager with ModelManager(mask_modloss=True, ...):
[ ]:
model_mgr = ModelManager(mask_modloss=True, device='cpu')
model_mgr.load_installed_models('phos')
df = IRT_PEPTIDE_DF.copy()
df.loc[1,'mods'] = 'Phospho@S'
df.loc[1,'mod_sites'] = '5'
df['charge'] = 2
inten_df = model_mgr.predict_ms2(df)
inten_df.iloc[
df.loc[1,'frag_start_idx']:
df.loc[1,'frag_stop_idx'],:
]
2022-09-09 21:54:13> Predicting MS2 ...
100%|██████████| 5/5 [00:00<00:00, 86.70it/s]
| | b_z1 | b_z2 | y_z1 | y_z2 | b_modloss_z1 | b_modloss_z2 | y_modloss_z1 | y_modloss_z2 |
|---|---|---|---|---|---|---|---|---|
| 8 | 0.000000 | 0.0 | 0.000000 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 |
| 9 | 0.063835 | 0.0 | 0.012835 | 0.000606 | 0.0 | 0.0 | 0.0 | 0.0 |
| 10 | 0.066177 | 0.0 | 0.000000 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 |
| 11 | 0.061181 | 0.0 | 0.064921 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 |
| 12 | 0.000000 | 0.0 | 0.082699 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 |
| 13 | 0.000000 | 0.0 | 1.000000 | 0.080108 | 0.0 | 0.0 | 0.0 | 0.0 |
| 14 | 0.000000 | 0.0 | 0.068587 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 |
| 15 | 0.000000 | 0.0 | 0.293111 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 |
| 16 | 0.000000 | 0.0 | 0.185996 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 |
| 17 | 0.000000 | 0.0 | 0.024486 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 |
| 18 | 0.000000 | 0.0 | 0.105864 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 |
| 19 | 0.000000 | 0.0 | 0.148301 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 |
| 20 | 0.000000 | 0.0 | 0.046693 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 |
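Conceptually, masking modloss sets the modloss columns to zero and leaves the regular b/y intensities as predicted. A sketch of that operation on a column-oriented dict, assuming this is what mask_modloss=True amounts to (the `mask_modloss_columns` helper and the sample values are hypothetical; peptdeep applies its masking internally):

```python
def mask_modloss_columns(intensities: dict) -> dict:
    """Return a copy with every *_modloss_* column set to zero."""
    return {
        name: [0.0] * len(values) if "_modloss_" in name else list(values)
        for name, values in intensities.items()
    }

# Two fragment rows with made-up values, for illustration only.
frag = {
    "b_z1": [0.06, 0.07],
    "y_z1": [0.01, 0.00],
    "b_modloss_z1": [0.02, 0.03],
    "y_modloss_z1": [0.04, 0.00],
}
masked = mask_modloss_columns(frag)
print(masked["b_modloss_z1"], masked["b_z1"])  # [0.0, 0.0] [0.06, 0.07]
```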