Tutorial: using ModelManager

[ ]:
from peptdeep.pretrained_models import ModelManager

ModelManager is the main entry point for accessing the MS2/RT/CCS models.

[ ]:
model_mgr = ModelManager(mask_modloss=False, device='cpu')

Most default parameters and attributes of the ModelManager class are taken from peptdeep.settings.global_settings, which is a dict:

from peptdeep.settings import global_settings

The default values of peptdeep.settings.global_settings are defined in default_settings.yaml.
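
As a rough illustration of how a nested settings dict can supply defaults, the sketch below uses a stand-in dict rather than the real peptdeep.settings.global_settings. The values and the helper pick_setting are hypothetical; only the key names model_mgr, model_type and mask_modloss mirror names used in this tutorial.

```python
# Stand-in for peptdeep.settings.global_settings; the values are illustrative only.
settings_stub = {
    "model_mgr": {
        "model_type": "generic",
        "mask_modloss": True,
    }
}


def pick_setting(explicit, section, key, settings=settings_stub):
    """Hypothetical helper: return the explicit value if given, else the dict default."""
    if explicit is not None:
        return explicit
    return settings[section][key]


# No explicit value: fall back to the dict.
default_type = pick_setting(None, "model_mgr", "model_type")
# Explicit value: it takes precedence over the dict.
chosen_mask = pick_setting(False, "model_mgr", "mask_modloss")
print(default_type, chosen_mask)
```

The point of the pattern is that editing the dict (or the YAML file it is loaded from) changes the defaults everywhere, while an explicit argument still takes precedence.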

ModelManager.load_installed_models

ModelManager.load_installed_models(model_type) loads the installed models of the given type. The model_type can be one of:

  • generic: generic RT/CCS/MS2 models including HLA

  • HLA: currently the same as generic

  • phos: RT/CCS/MS2 models for Phospho@S/T/Y

  • digly: RT/CCS/MS2 models for GlyGly@K

Calling ModelManager(...) will also call ModelManager.load_installed_models implicitly, and the default model_type is global_settings['model_mgr']['model_type'].
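
One way to decide which model_type string to pass is to look at the modifications in the peptide table. The helper below is a hypothetical sketch, not part of peptdeep; it only assumes the modification names listed above (Phospho@S/T/Y and GlyGly@K).

```python
def guess_model_type(mods_per_peptide):
    """Hypothetical helper: pick a model_type string from modification names."""
    joined = ";".join(mods_per_peptide)
    if "Phospho@" in joined:
        return "phos"
    if "GlyGly@K" in joined:
        return "digly"
    # No specialised modification found: fall back to the generic models.
    return "generic"


print(guess_model_type(["", "Phospho@S"]))
```

The returned string could then be passed to model_mgr.load_installed_models(...).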

Test the RT model

Use the 11 iRT peptides to test the RT model.

[ ]:
from peptdeep.model.rt import IRT_PEPTIDE_DF
[ ]:
df = IRT_PEPTIDE_DF.copy()
# arbitrarily add a modification to one peptide; this may change its real iRT
df.loc[1,'mods'] = 'Phospho@S'
df.loc[1,'mod_sites'] = '5'
df
    sequence        pep_name  irt     mods       mod_sites  nAA
0   LGGNEQVTR       RT-pep a  -24.92                        9
1   GAGSSEPVTGLDAK  RT-pep b  0.00    Phospho@S  5          14
2   VEATFGVDESNAK   RT-pep c  12.39                         13
3   YILAGVENSK      RT-pep d  19.79                         10
4   TPVISGGPYEYR    RT-pep e  28.71                         12
5   TPVITGAPYEYR    RT-pep f  33.38                         12
6   DGLDAASYYAPVR   RT-pep g  42.26                         13
7   ADVTPADFSEWSK   RT-pep h  54.62                         13
8   GTFIIDPGGVIR    RT-pep i  70.52                         12
9   GTFIIDPAAVIR    RT-pep k  87.23                         12
10  LFLQFGAQGSPFLK  RT-pep l  100.00                        14
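
The mods and mod_sites columns describe modifications as strings. Below is a minimal sketch of a consistency check. It assumes (this tutorial does not confirm it) that multiple entries are separated by ';' and that residue sites are 1-based. mod_sites_match is a hypothetical helper, not a peptdeep function.

```python
def mod_sites_match(sequence, mods, mod_sites):
    """Check that each 'Name@X' modification sits on residue X (1-based site)."""
    if not mods:
        return True
    mod_list = mods.split(";")
    site_list = mod_sites.split(";")
    if len(mod_list) != len(site_list):
        return False
    for mod, site in zip(mod_list, site_list):
        target = mod.split("@")[1]
        pos = int(site)
        # Only single-residue targets are checked in this sketch;
        # terminal modifications are skipped.
        if len(target) != 1:
            continue
        if pos < 1 or pos > len(sequence) or sequence[pos - 1] != target:
            return False
    return True


# Row 1 of the table: Phospho@S at site 5 of GAGSSEPVTGLDAK.
print(mod_sites_match("GAGSSEPVTGLDAK", "Phospho@S", "5"))
```
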
[ ]:
model_mgr.load_installed_models('phos')
model_mgr.predict_rt(df)
model_mgr.rt_model.add_irt_column_to_precursor_df(df)
2022-09-09 21:54:02> Predicting RT ...
100%|██████████| 5/5 [00:00<00:00, 125.27it/s]
    sequence        pep_name  irt     mods       mod_sites  nAA  rt_pred   rt_norm_pred  irt_pred
0   LGGNEQVTR       RT-pep a  -24.92                        9    0.184235  0.184235      -26.123537
1   GAGSSEPVTGLDAK  RT-pep b  0.00    Phospho@S  5          14   0.266746  0.266746      11.916059
2   VEATFGVDESNAK   RT-pep c  12.39                         13   0.266133  0.266133      11.633120
3   YILAGVENSK      RT-pep d  19.79                         10   0.290495  0.290495      22.864811
4   TPVISGGPYEYR    RT-pep e  28.71                         12   0.303847  0.303847      29.020259
5   TPVITGAPYEYR    RT-pep f  33.38                         12   0.316514  0.316514      34.860122
6   DGLDAASYYAPVR   RT-pep g  42.26                         13   0.324423  0.324423      38.506308
7   ADVTPADFSEWSK   RT-pep h  54.62                         13   0.345197  0.345197      48.083890
8   GTFIIDPGGVIR    RT-pep i  70.52                         12   0.394248  0.394248      70.697474
9   GTFIIDPAAVIR    RT-pep k  87.23                         12   0.434775  0.434775      89.381150
10  LFLQFGAQGSPFLK  RT-pep l  100.00                        14   0.459583  0.459583      100.818303
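
The irt_pred values above look like a linear rescaling of rt_pred onto the iRT scale. The sketch below shows how such a mapping could be obtained with an ordinary least-squares line. That this is a plain linear fit is an assumption, and fit_line is a hypothetical helper, not peptdeep's implementation.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Three (rt_pred, irt) pairs taken from rows 0, 3 and 10 of the table.
rt_pred = [0.184235, 0.290495, 0.459583]
irt = [-24.92, 19.79, 100.00]

slope, intercept = fit_line(rt_pred, irt)
# Map any predicted RT onto the iRT scale with the fitted line.
print(slope * 0.303847 + intercept)
```

Because only three reference points are used here, the printed value is not expected to reproduce the irt_pred column exactly.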

Fine-tune the RT model on df, using the rt_norm column as the training target:

[ ]:
def normalize_irt(df):
    # Min-max scale iRT to [0, 1]; the result is stored in place as 'rt_norm'
    min_rt = df.irt.min()
    df['rt_norm'] = (
        df.irt - min_rt
    ) / (df.irt.max() - min_rt)

normalize_irt(df)
model_mgr.epoch_to_train_rt_ccs = 50  # number of training epochs
model_mgr.train_rt_model(df)
model_mgr.predict_rt(df)
model_mgr.rt_model.add_irt_column_to_precursor_df(df)
2022-09-09 21:54:02> 11 PSMs for RT training/fine-tuning
2022-09-09 21:54:09> Predicting RT ...
100%|██████████| 5/5 [00:00<00:00, 151.56it/s]
    sequence        pep_name  irt     mods       mod_sites  nAA  rt_pred   rt_norm_pred  irt_pred    rt_norm
0   LGGNEQVTR       RT-pep a  -24.92                        9    0.127189  0.127189      -18.916407  0.000000
1   GAGSSEPVTGLDAK  RT-pep b  0.00    Phospho@S  5          14   0.199919  0.199919      -5.504272   0.199488
2   VEATFGVDESNAK   RT-pep c  12.39                         13   0.295237  0.295237      12.073141   0.298671
3   YILAGVENSK      RT-pep d  19.79                         10   0.357351  0.357351      23.527389   0.357909
4   TPVISGGPYEYR    RT-pep e  28.71                         12   0.429762  0.429762      36.880596   0.429315
5   TPVITGAPYEYR    RT-pep f  33.38                         12   0.392419  0.392419      29.994243   0.466699
6   DGLDAASYYAPVR   RT-pep g  42.26                         13   0.387393  0.387393      29.067502   0.537784
7   ADVTPADFSEWSK   RT-pep h  54.62                         13   0.634485  0.634485      74.633402   0.636728
8   GTFIIDPGGVIR    RT-pep i  70.52                         12   0.671310  0.671310      81.424123   0.764009
9   GTFIIDPAAVIR    RT-pep k  87.23                         12   0.699334  0.699334      86.592033   0.897775
10  LFLQFGAQGSPFLK  RT-pep l  100.00                        14   0.607442  0.607442      69.646337   1.000000
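
The rt_norm column is a min-max scaling of the irt column. As a worked check in plain Python (no pandas; min_max is a helper name chosen here for illustration), the same arithmetic applied to the 11 iRT values gives the rt_norm values shown above to six decimals.

```python
# iRT values of the 11 peptides, in table order.
irt = [-24.92, 0.00, 12.39, 19.79, 28.71, 33.38, 42.26, 54.62, 70.52, 87.23, 100.00]


def min_max(values):
    """Scale values linearly so that min -> 0.0 and max -> 1.0."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


rt_norm = min_max(irt)
# Second peptide: (0.00 - (-24.92)) / (100.00 - (-24.92)) = 24.92 / 124.92
print([round(v, 6) for v in rt_norm[:3]])
```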

Test the CCS model

[ ]:
df['charge'] = 2
model_mgr.predict_mobility(df)
2022-09-09 21:54:09> Predicting mobility ...
100%|██████████| 5/5 [00:00<00:00, 117.53it/s]
    sequence        pep_name  irt     mods       mod_sites  nAA  rt_pred   rt_norm_pred  irt_pred    rt_norm   charge  ccs_pred    precursor_mz  mobility_pred
0   LGGNEQVTR       RT-pep a  -24.92                        9    0.127189  0.127189      -18.916407  0.000000  2       331.279816  487.256705    0.815533
1   GAGSSEPVTGLDAK  RT-pep b  0.00    Phospho@S  5          14   0.199919  0.199919      -5.504272   0.199488  2       381.067841  684.805772    0.941902
2   VEATFGVDESNAK   RT-pep c  12.39                         13   0.295237  0.295237      12.073141   0.298671  2       394.208893  683.827889    0.974369
3   YILAGVENSK      RT-pep d  19.79                         10   0.357351  0.357351      23.527389   0.357909  2       364.828003  547.298039    0.899500
4   TPVISGGPYEYR    RT-pep e  28.71                         12   0.429762  0.429762      36.880596   0.429315  2       394.317596  669.838059    0.974434
5   TPVITGAPYEYR    RT-pep f  33.38                         12   0.392419  0.392419      29.994243   0.466699  2       399.848633  683.853709    0.988309
6   DGLDAASYYAPVR   RT-pep g  42.26                         13   0.387393  0.387393      29.067502   0.537784  2       399.736542  699.338423    0.988252
7   ADVTPADFSEWSK   RT-pep h  54.62                         13   0.634485  0.634485      74.633402   0.636728  2       405.532562  726.835714    1.002953
8   GTFIIDPGGVIR    RT-pep i  70.52                         12   0.671310  0.671310      81.424123   0.764009  2       379.443451  622.853512    0.936954
9   GTFIIDPAAVIR    RT-pep k  87.23                         12   0.699334  0.699334      86.592033   0.897775  2       387.886780  636.869163    0.958034
10  LFLQFGAQGSPFLK  RT-pep l  100.00                        14   0.607442  0.607442      69.646337   1.000000  2       435.544861  776.929751    1.077836
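
The mobility_pred column is derived from ccs_pred, charge and precursor_mz. The sketch below uses a Mason-Schamp-type relation with nitrogen as drift gas. The two constants and the function name are assumptions made for illustration, not taken from this tutorial, although the result agrees with the first row of the output above to about four decimals.

```python
import math

CCS_IM_COEF = 1059.62245  # assumed conversion constant
N2_MASS = 28.0            # assumed drift-gas (nitrogen) mass in Da


def ccs_to_mobility(ccs, charge, precursor_mz):
    """Convert a CCS value to an inverse reduced mobility (1/K0) estimate."""
    ion_mass = precursor_mz * charge
    # Reduced mass of the ion / drift-gas pair.
    reduced_mass = ion_mass * N2_MASS / (ion_mass + N2_MASS)
    return ccs * math.sqrt(reduced_mass) / charge / CCS_IM_COEF


# Row 0 of the table: ccs_pred, charge, precursor_mz.
mobility = ccs_to_mobility(331.279816, 2, 487.256705)
print(mobility)
```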

Test the MS2 model

[ ]:
df['charge'] = 2
inten_df = model_mgr.predict_ms2(df)
inten_df
2022-09-09 21:54:10> Predicting MS2 ...
100%|██████████| 5/5 [00:00<00:00, 82.83it/s]
b_z1 b_z2 y_z1 y_z2 b_modloss_z1 b_modloss_z2 y_modloss_z1 y_modloss_z2
0 0.000000 0.0 1.000000 0.021727 0.0 0.0 0.0 0.0
1 0.191613 0.0 0.343992 0.000000 0.0 0.0 0.0 0.0
2 0.063825 0.0 0.119938 0.015200 0.0 0.0 0.0 0.0
3 0.033420 0.0 0.257022 0.000000 0.0 0.0 0.0 0.0
4 0.027311 0.0 0.340053 0.000000 0.0 0.0 0.0 0.0
... ... ... ... ... ... ... ... ...
118 0.000000 0.0 0.101413 0.000000 0.0 0.0 0.0 0.0
119 0.000000 0.0 0.672498 0.000000 0.0 0.0 0.0 0.0
120 0.000000 0.0 0.034437 0.000000 0.0 0.0 0.0 0.0
121 0.000000 0.0 0.125430 0.000000 0.0 0.0 0.0 0.0
122 0.000000 0.0 0.112338 0.000000 0.0 0.0 0.0 0.0

123 rows × 8 columns

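Note that modloss fragment intensities are enabled when the manager is created with ModelManager(mask_modloss=False, ...), so that modloss intensities can be non-zero for phosphopeptides. The fragments of one precursor are the rows of inten_df from frag_start_idx to frag_stop_idx: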

[ ]:
phos_precursor_id = 1 # we manually assigned this peptide as phospho
inten_df.iloc[
    df.loc[phos_precursor_id,'frag_start_idx']:
    df.loc[phos_precursor_id,'frag_stop_idx'],:
]
b_z1 b_z2 y_z1 y_z2 b_modloss_z1 b_modloss_z2 y_modloss_z1 y_modloss_z2
8 0.000000 0.0 0.000000 0.000000 0.0 0.0 0.0 0.0
9 0.063835 0.0 0.012835 0.000606 0.0 0.0 0.0 0.0
10 0.066177 0.0 0.000000 0.000000 0.0 0.0 0.0 0.0
11 0.061181 0.0 0.064921 0.000000 0.0 0.0 0.0 0.0
12 0.000000 0.0 0.082699 0.000000 0.0 0.0 0.0 0.0
13 0.000000 0.0 1.000000 0.080108 0.0 0.0 0.0 0.0
14 0.000000 0.0 0.068587 0.000000 0.0 0.0 0.0 0.0
15 0.000000 0.0 0.293111 0.000000 0.0 0.0 0.0 0.0
16 0.000000 0.0 0.185996 0.000000 0.0 0.0 0.0 0.0
17 0.000000 0.0 0.024486 0.000000 0.0 0.0 0.0 0.0
18 0.000000 0.0 0.105864 0.000000 0.0 0.0 0.0 0.0
19 0.000000 0.0 0.148301 0.000000 0.0 0.0 0.0 0.0
20 0.000000 0.0 0.046693 0.000000 0.0 0.0 0.0 0.0
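
The row range 8 to 20 follows from the peptide lengths: a peptide with nAA residues has nAA - 1 cleavage positions, and the fragments appear to be stored in precursor order. Below is a minimal sketch of that bookkeeping, assuming this contiguous layout. It reproduces the 123 rows of inten_df and the slice used for precursor 1.

```python
from itertools import accumulate

# nAA of the 11 iRT peptides, in table order.
nAA = [9, 14, 13, 10, 12, 12, 13, 13, 12, 12, 14]

# A peptide with n residues has n - 1 backbone cleavage positions.
n_frag_rows = [n - 1 for n in nAA]

# Running totals give the (exclusive) stop index of each precursor's block.
frag_stop_idx = list(accumulate(n_frag_rows))
frag_start_idx = [stop - n for stop, n in zip(frag_stop_idx, n_frag_rows)]

print(frag_start_idx[1], frag_stop_idx[1], frag_stop_idx[-1])
```

Here frag_stop_idx is exclusive, which matches how iloc[start:stop] slices rows.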

To disable modloss intensities, use ModelManager(mask_modloss=True, ...):

[ ]:
model_mgr = ModelManager(mask_modloss=True, device='cpu')
model_mgr.load_installed_models('phos')
df = IRT_PEPTIDE_DF.copy()
df.loc[1,'mods'] = 'Phospho@S'
df.loc[1,'mod_sites'] = '5'
df['charge'] = 2
inten_df = model_mgr.predict_ms2(df)
inten_df.iloc[
    df.loc[1,'frag_start_idx']:
    df.loc[1,'frag_stop_idx'],:
]
2022-09-09 21:54:13> Predicting MS2 ...
100%|██████████| 5/5 [00:00<00:00, 86.70it/s]
b_z1 b_z2 y_z1 y_z2 b_modloss_z1 b_modloss_z2 y_modloss_z1 y_modloss_z2
8 0.000000 0.0 0.000000 0.000000 0.0 0.0 0.0 0.0
9 0.063835 0.0 0.012835 0.000606 0.0 0.0 0.0 0.0
10 0.066177 0.0 0.000000 0.000000 0.0 0.0 0.0 0.0
11 0.061181 0.0 0.064921 0.000000 0.0 0.0 0.0 0.0
12 0.000000 0.0 0.082699 0.000000 0.0 0.0 0.0 0.0
13 0.000000 0.0 1.000000 0.080108 0.0 0.0 0.0 0.0
14 0.000000 0.0 0.068587 0.000000 0.0 0.0 0.0 0.0
15 0.000000 0.0 0.293111 0.000000 0.0 0.0 0.0 0.0
16 0.000000 0.0 0.185996 0.000000 0.0 0.0 0.0 0.0
17 0.000000 0.0 0.024486 0.000000 0.0 0.0 0.0 0.0
18 0.000000 0.0 0.105864 0.000000 0.0 0.0 0.0 0.0
19 0.000000 0.0 0.148301 0.000000 0.0 0.0 0.0 0.0
20 0.000000 0.0 0.046693 0.000000 0.0 0.0 0.0 0.0
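
Conceptually, masking sets every modloss column to zero while leaving the regular b/y ion columns untouched. The sketch below illustrates that idea on plain dicts with made-up intensities. mask_modloss_columns is a hypothetical helper and does not describe how peptdeep implements masking internally.

```python
def mask_modloss_columns(rows):
    """Return copies of the rows with every '*modloss*' column set to 0.0."""
    return [
        {col: (0.0 if "modloss" in col else val) for col, val in row.items()}
        for row in rows
    ]


# Made-up intensities, only to show the effect of masking.
rows = [{"b_z1": 0.06, "y_z1": 0.01, "b_modloss_z1": 0.2, "y_modloss_z1": 0.3}]
masked = mask_modloss_columns(rows)
print(masked)
```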