{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Tutorial: using `ModelManager`"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from peptdeep.pretrained_models import ModelManager"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"`ModelManager` is the main entry to access MS2/RT/CCS models."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"model_mgr = ModelManager(mask_modloss=True, device='cpu')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Most of the default parameters and attributes of `ModelManager` class are controlled by `peptdeep.settings.global_settings` which is a dict.\n",
"\n",
"```\n",
"from peptdeep.settings import global_settings\n",
"```\n",
"\n",
"The default values of `peptdeep.settings.global_settings` is defined in [default_settings.yaml](../peptdeep/constants/default_settings.yaml)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### `ModelManager.load_installed_models`\n",
"\n",
"`ModelManager.load_installed_models(model_type)` enables users to load different model types. The `model_type` could be: \n",
"- generic: generic RT/CCS/MS2 models including HLA\n",
"- HLA: currently the same as `generic`\n",
"- phos: RT/CCS/MS2 models for Phospho@S/T/Y\n",
"- digly: RT/CCS/MS2 models for GlyGly@K\n",
"\n",
"Calling `ModelManager(...)` will also call `ModelManager.load_installed_models` implicitly, and the default model_type is `global_settings['model_mgr']['model_type']`."
]
},
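{
"cell_type": "markdown",
"metadata": {},
"source": [
"The fallback to the settings value can be sketched in plain Python. This is an illustration, not peptdeep code: `settings` is a hypothetical stand-in for `global_settings` holding only the key path named above, its `'generic'` value is an assumption, and `resolve_model_type` is a made-up helper whose error on unknown names is a choice of this sketch.\n",
"\n",
"```python\n",
"# Hypothetical stand-in for peptdeep.settings.global_settings;\n",
"# the 'generic' value is an assumption for illustration.\n",
"settings = {'model_mgr': {'model_type': 'generic'}}\n",
"\n",
"# The four model types listed in this tutorial.\n",
"KNOWN_MODEL_TYPES = ('generic', 'HLA', 'phos', 'digly')\n",
"\n",
"\n",
"def resolve_model_type(model_type=None, settings=settings):\n",
"    '''Return the explicit model_type, or fall back to the settings dict.'''\n",
"    if model_type is None:\n",
"        model_type = settings['model_mgr']['model_type']\n",
"    if model_type not in KNOWN_MODEL_TYPES:\n",
"        raise ValueError('unknown model_type: ' + repr(model_type))\n",
"    return model_type\n",
"\n",
"\n",
"print(resolve_model_type())        # falls back to the settings value\n",
"print(resolve_model_type('phos'))  # explicit choice wins\n",
"```\n",
"\n",
"Passing an explicit `model_type` takes precedence; the settings value is only consulted when none is given."
]
},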
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Test the RT model\n",
"\n",
"Use the 11 iRT peptides to test the RT model"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from peptdeep.model.rt import IRT_PEPTIDE_DF"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" sequence | \n",
" pep_name | \n",
" irt | \n",
" mods | \n",
" mod_sites | \n",
" nAA | \n",
"
\n",
" \n",
" \n",
" \n",
" | 0 | \n",
" LGGNEQVTR | \n",
" RT-pep a | \n",
" -24.92 | \n",
" | \n",
" | \n",
" 9 | \n",
"
\n",
" \n",
" | 1 | \n",
" GAGSSEPVTGLDAK | \n",
" RT-pep b | \n",
" 0.00 | \n",
" Phospho@S | \n",
" 5 | \n",
" 14 | \n",
"
\n",
" \n",
" | 2 | \n",
" VEATFGVDESNAK | \n",
" RT-pep c | \n",
" 12.39 | \n",
" | \n",
" | \n",
" 13 | \n",
"
\n",
" \n",
" | 3 | \n",
" YILAGVENSK | \n",
" RT-pep d | \n",
" 19.79 | \n",
" | \n",
" | \n",
" 10 | \n",
"
\n",
" \n",
" | 4 | \n",
" TPVISGGPYEYR | \n",
" RT-pep e | \n",
" 28.71 | \n",
" | \n",
" | \n",
" 12 | \n",
"
\n",
" \n",
" | 5 | \n",
" TPVITGAPYEYR | \n",
" RT-pep f | \n",
" 33.38 | \n",
" | \n",
" | \n",
" 12 | \n",
"
\n",
" \n",
" | 6 | \n",
" DGLDAASYYAPVR | \n",
" RT-pep g | \n",
" 42.26 | \n",
" | \n",
" | \n",
" 13 | \n",
"
\n",
" \n",
" | 7 | \n",
" ADVTPADFSEWSK | \n",
" RT-pep h | \n",
" 54.62 | \n",
" | \n",
" | \n",
" 13 | \n",
"
\n",
" \n",
" | 8 | \n",
" GTFIIDPGGVIR | \n",
" RT-pep i | \n",
" 70.52 | \n",
" | \n",
" | \n",
" 12 | \n",
"
\n",
" \n",
" | 9 | \n",
" GTFIIDPAAVIR | \n",
" RT-pep k | \n",
" 87.23 | \n",
" | \n",
" | \n",
" 12 | \n",
"
\n",
" \n",
" | 10 | \n",
" LFLQFGAQGSPFLK | \n",
" RT-pep l | \n",
" 100.00 | \n",
" | \n",
" | \n",
" 14 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" sequence pep_name irt mods mod_sites nAA\n",
"0 LGGNEQVTR RT-pep a -24.92 9\n",
"1 GAGSSEPVTGLDAK RT-pep b 0.00 Phospho@S 5 14\n",
"2 VEATFGVDESNAK RT-pep c 12.39 13\n",
"3 YILAGVENSK RT-pep d 19.79 10\n",
"4 TPVISGGPYEYR RT-pep e 28.71 12\n",
"5 TPVITGAPYEYR RT-pep f 33.38 12\n",
"6 DGLDAASYYAPVR RT-pep g 42.26 13\n",
"7 ADVTPADFSEWSK RT-pep h 54.62 13\n",
"8 GTFIIDPGGVIR RT-pep i 70.52 12\n",
"9 GTFIIDPAAVIR RT-pep k 87.23 12\n",
"10 LFLQFGAQGSPFLK RT-pep l 100.00 14"
]
},
"execution_count": null,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df = IRT_PEPTIDE_DF.copy()\n",
"# randomly add some modifications, this may change the real irt\n",
"df.loc[1,'mods'] = 'Phospho@S'\n",
"df.loc[1,'mod_sites'] = '5'\n",
"df"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"2022-09-09 21:54:02> Predicting RT ...\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"100%|██████████| 5/5 [00:00<00:00, 125.27it/s]\n"
]
},
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" sequence | \n",
" pep_name | \n",
" irt | \n",
" mods | \n",
" mod_sites | \n",
" nAA | \n",
" rt_pred | \n",
" rt_norm_pred | \n",
" irt_pred | \n",
"
\n",
" \n",
" \n",
" \n",
" | 0 | \n",
" LGGNEQVTR | \n",
" RT-pep a | \n",
" -24.92 | \n",
" | \n",
" | \n",
" 9 | \n",
" 0.184235 | \n",
" 0.184235 | \n",
" -26.123537 | \n",
"
\n",
" \n",
" | 1 | \n",
" GAGSSEPVTGLDAK | \n",
" RT-pep b | \n",
" 0.00 | \n",
" Phospho@S | \n",
" 5 | \n",
" 14 | \n",
" 0.266746 | \n",
" 0.266746 | \n",
" 11.916059 | \n",
"
\n",
" \n",
" | 2 | \n",
" VEATFGVDESNAK | \n",
" RT-pep c | \n",
" 12.39 | \n",
" | \n",
" | \n",
" 13 | \n",
" 0.266133 | \n",
" 0.266133 | \n",
" 11.633120 | \n",
"
\n",
" \n",
" | 3 | \n",
" YILAGVENSK | \n",
" RT-pep d | \n",
" 19.79 | \n",
" | \n",
" | \n",
" 10 | \n",
" 0.290495 | \n",
" 0.290495 | \n",
" 22.864811 | \n",
"
\n",
" \n",
" | 4 | \n",
" TPVISGGPYEYR | \n",
" RT-pep e | \n",
" 28.71 | \n",
" | \n",
" | \n",
" 12 | \n",
" 0.303847 | \n",
" 0.303847 | \n",
" 29.020259 | \n",
"
\n",
" \n",
" | 5 | \n",
" TPVITGAPYEYR | \n",
" RT-pep f | \n",
" 33.38 | \n",
" | \n",
" | \n",
" 12 | \n",
" 0.316514 | \n",
" 0.316514 | \n",
" 34.860122 | \n",
"
\n",
" \n",
" | 6 | \n",
" DGLDAASYYAPVR | \n",
" RT-pep g | \n",
" 42.26 | \n",
" | \n",
" | \n",
" 13 | \n",
" 0.324423 | \n",
" 0.324423 | \n",
" 38.506308 | \n",
"
\n",
" \n",
" | 7 | \n",
" ADVTPADFSEWSK | \n",
" RT-pep h | \n",
" 54.62 | \n",
" | \n",
" | \n",
" 13 | \n",
" 0.345197 | \n",
" 0.345197 | \n",
" 48.083890 | \n",
"
\n",
" \n",
" | 8 | \n",
" GTFIIDPGGVIR | \n",
" RT-pep i | \n",
" 70.52 | \n",
" | \n",
" | \n",
" 12 | \n",
" 0.394248 | \n",
" 0.394248 | \n",
" 70.697474 | \n",
"
\n",
" \n",
" | 9 | \n",
" GTFIIDPAAVIR | \n",
" RT-pep k | \n",
" 87.23 | \n",
" | \n",
" | \n",
" 12 | \n",
" 0.434775 | \n",
" 0.434775 | \n",
" 89.381150 | \n",
"
\n",
" \n",
" | 10 | \n",
" LFLQFGAQGSPFLK | \n",
" RT-pep l | \n",
" 100.00 | \n",
" | \n",
" | \n",
" 14 | \n",
" 0.459583 | \n",
" 0.459583 | \n",
" 100.818303 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" sequence pep_name irt mods mod_sites nAA rt_pred \\\n",
"0 LGGNEQVTR RT-pep a -24.92 9 0.184235 \n",
"1 GAGSSEPVTGLDAK RT-pep b 0.00 Phospho@S 5 14 0.266746 \n",
"2 VEATFGVDESNAK RT-pep c 12.39 13 0.266133 \n",
"3 YILAGVENSK RT-pep d 19.79 10 0.290495 \n",
"4 TPVISGGPYEYR RT-pep e 28.71 12 0.303847 \n",
"5 TPVITGAPYEYR RT-pep f 33.38 12 0.316514 \n",
"6 DGLDAASYYAPVR RT-pep g 42.26 13 0.324423 \n",
"7 ADVTPADFSEWSK RT-pep h 54.62 13 0.345197 \n",
"8 GTFIIDPGGVIR RT-pep i 70.52 12 0.394248 \n",
"9 GTFIIDPAAVIR RT-pep k 87.23 12 0.434775 \n",
"10 LFLQFGAQGSPFLK RT-pep l 100.00 14 0.459583 \n",
"\n",
" rt_norm_pred irt_pred \n",
"0 0.184235 -26.123537 \n",
"1 0.266746 11.916059 \n",
"2 0.266133 11.633120 \n",
"3 0.290495 22.864811 \n",
"4 0.303847 29.020259 \n",
"5 0.316514 34.860122 \n",
"6 0.324423 38.506308 \n",
"7 0.345197 48.083890 \n",
"8 0.394248 70.697474 \n",
"9 0.434775 89.381150 \n",
"10 0.459583 100.818303 "
]
},
"execution_count": null,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"model_mgr.load_installed_models('phos')\n",
"model_mgr.predict_rt(df)\n",
"model_mgr.rt_model.add_irt_column_to_precursor_df(df)"
]
},
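{
"cell_type": "markdown",
"metadata": {},
"source": [
"The `irt_pred` values above are consistent with a straight-line mapping from `rt_pred` to the iRT scale. The sketch below shows how such a line could be fitted by ordinary least squares. That `add_irt_column_to_precursor_df` works exactly this way is an assumption, `fit_line` is a hypothetical helper, and the inputs are `(rt_pred, irt)` pairs copied from three unmodified rows of the table above.\n",
"\n",
"```python\n",
"def fit_line(xs, ys):\n",
"    '''Ordinary least-squares fit of y = slope * x + intercept.'''\n",
"    n = len(xs)\n",
"    x_mean = sum(xs) / n\n",
"    y_mean = sum(ys) / n\n",
"    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))\n",
"    sxx = sum((x - x_mean) ** 2 for x in xs)\n",
"    slope = sxy / sxx\n",
"    intercept = y_mean - slope * x_mean\n",
"    return slope, intercept\n",
"\n",
"\n",
"# (rt_pred, irt) pairs from rows 0, 4 and 8 of the table above.\n",
"rt_pred = [0.184235, 0.303847, 0.394248]\n",
"irt = [-24.92, 28.71, 70.52]\n",
"\n",
"slope, intercept = fit_line(rt_pred, irt)\n",
"# Map predicted RT values onto the iRT scale with the fitted line.\n",
"irt_from_rt = [slope * x + intercept for x in rt_pred]\n",
"print(slope, intercept)\n",
"```\n",
"\n",
"With a fitted slope and intercept, any predicted RT can be placed on the iRT scale, which makes predictions comparable across gradients."
]
},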
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Training RT model on df with the `rt_norm` column:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"2022-09-09 21:54:02> 11 PSMs for RT training/fine-tuning\n",
"2022-09-09 21:54:09> Predicting RT ...\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"100%|██████████| 5/5 [00:00<00:00, 151.56it/s]\n"
]
},
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" sequence | \n",
" pep_name | \n",
" irt | \n",
" mods | \n",
" mod_sites | \n",
" nAA | \n",
" rt_pred | \n",
" rt_norm_pred | \n",
" irt_pred | \n",
" rt_norm | \n",
"
\n",
" \n",
" \n",
" \n",
" | 0 | \n",
" LGGNEQVTR | \n",
" RT-pep a | \n",
" -24.92 | \n",
" | \n",
" | \n",
" 9 | \n",
" 0.127189 | \n",
" 0.127189 | \n",
" -18.916407 | \n",
" 0.000000 | \n",
"
\n",
" \n",
" | 1 | \n",
" GAGSSEPVTGLDAK | \n",
" RT-pep b | \n",
" 0.00 | \n",
" Phospho@S | \n",
" 5 | \n",
" 14 | \n",
" 0.199919 | \n",
" 0.199919 | \n",
" -5.504272 | \n",
" 0.199488 | \n",
"
\n",
" \n",
" | 2 | \n",
" VEATFGVDESNAK | \n",
" RT-pep c | \n",
" 12.39 | \n",
" | \n",
" | \n",
" 13 | \n",
" 0.295237 | \n",
" 0.295237 | \n",
" 12.073141 | \n",
" 0.298671 | \n",
"
\n",
" \n",
" | 3 | \n",
" YILAGVENSK | \n",
" RT-pep d | \n",
" 19.79 | \n",
" | \n",
" | \n",
" 10 | \n",
" 0.357351 | \n",
" 0.357351 | \n",
" 23.527389 | \n",
" 0.357909 | \n",
"
\n",
" \n",
" | 4 | \n",
" TPVISGGPYEYR | \n",
" RT-pep e | \n",
" 28.71 | \n",
" | \n",
" | \n",
" 12 | \n",
" 0.429762 | \n",
" 0.429762 | \n",
" 36.880596 | \n",
" 0.429315 | \n",
"
\n",
" \n",
" | 5 | \n",
" TPVITGAPYEYR | \n",
" RT-pep f | \n",
" 33.38 | \n",
" | \n",
" | \n",
" 12 | \n",
" 0.392419 | \n",
" 0.392419 | \n",
" 29.994243 | \n",
" 0.466699 | \n",
"
\n",
" \n",
" | 6 | \n",
" DGLDAASYYAPVR | \n",
" RT-pep g | \n",
" 42.26 | \n",
" | \n",
" | \n",
" 13 | \n",
" 0.387393 | \n",
" 0.387393 | \n",
" 29.067502 | \n",
" 0.537784 | \n",
"
\n",
" \n",
" | 7 | \n",
" ADVTPADFSEWSK | \n",
" RT-pep h | \n",
" 54.62 | \n",
" | \n",
" | \n",
" 13 | \n",
" 0.634485 | \n",
" 0.634485 | \n",
" 74.633402 | \n",
" 0.636728 | \n",
"
\n",
" \n",
" | 8 | \n",
" GTFIIDPGGVIR | \n",
" RT-pep i | \n",
" 70.52 | \n",
" | \n",
" | \n",
" 12 | \n",
" 0.671310 | \n",
" 0.671310 | \n",
" 81.424123 | \n",
" 0.764009 | \n",
"
\n",
" \n",
" | 9 | \n",
" GTFIIDPAAVIR | \n",
" RT-pep k | \n",
" 87.23 | \n",
" | \n",
" | \n",
" 12 | \n",
" 0.699334 | \n",
" 0.699334 | \n",
" 86.592033 | \n",
" 0.897775 | \n",
"
\n",
" \n",
" | 10 | \n",
" LFLQFGAQGSPFLK | \n",
" RT-pep l | \n",
" 100.00 | \n",
" | \n",
" | \n",
" 14 | \n",
" 0.607442 | \n",
" 0.607442 | \n",
" 69.646337 | \n",
" 1.000000 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" sequence pep_name irt mods mod_sites nAA rt_pred \\\n",
"0 LGGNEQVTR RT-pep a -24.92 9 0.127189 \n",
"1 GAGSSEPVTGLDAK RT-pep b 0.00 Phospho@S 5 14 0.199919 \n",
"2 VEATFGVDESNAK RT-pep c 12.39 13 0.295237 \n",
"3 YILAGVENSK RT-pep d 19.79 10 0.357351 \n",
"4 TPVISGGPYEYR RT-pep e 28.71 12 0.429762 \n",
"5 TPVITGAPYEYR RT-pep f 33.38 12 0.392419 \n",
"6 DGLDAASYYAPVR RT-pep g 42.26 13 0.387393 \n",
"7 ADVTPADFSEWSK RT-pep h 54.62 13 0.634485 \n",
"8 GTFIIDPGGVIR RT-pep i 70.52 12 0.671310 \n",
"9 GTFIIDPAAVIR RT-pep k 87.23 12 0.699334 \n",
"10 LFLQFGAQGSPFLK RT-pep l 100.00 14 0.607442 \n",
"\n",
" rt_norm_pred irt_pred rt_norm \n",
"0 0.127189 -18.916407 0.000000 \n",
"1 0.199919 -5.504272 0.199488 \n",
"2 0.295237 12.073141 0.298671 \n",
"3 0.357351 23.527389 0.357909 \n",
"4 0.429762 36.880596 0.429315 \n",
"5 0.392419 29.994243 0.466699 \n",
"6 0.387393 29.067502 0.537784 \n",
"7 0.634485 74.633402 0.636728 \n",
"8 0.671310 81.424123 0.764009 \n",
"9 0.699334 86.592033 0.897775 \n",
"10 0.607442 69.646337 1.000000 "
]
},
"execution_count": null,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"def normalize_irt(df):\n",
" min_rt = df.irt.min()\n",
" df['rt_norm'] = (\n",
" df.irt - min_rt\n",
" ) / (df.irt.max()-min_rt)\n",
"normalize_irt(df)\n",
"model_mgr.epoch_to_train_rt_ccs=50\n",
"model_mgr.train_rt_model(df)\n",
"model_mgr.predict_rt(df)\n",
"model_mgr.rt_model.add_irt_column_to_precursor_df(df)"
]
},
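{
"cell_type": "markdown",
"metadata": {},
"source": [
"`normalize_irt` above is a min-max rescaling of the `irt` column to the range 0 to 1. As a dependency-free check of the arithmetic, the sketch below applies the same formula to the 11 iRT values from the table; `min_max` is a hypothetical helper name, not part of peptdeep.\n",
"\n",
"```python\n",
"def min_max(values):\n",
"    '''Rescale values linearly so that the minimum maps to 0 and the maximum to 1.'''\n",
"    lo, hi = min(values), max(values)\n",
"    return [(v - lo) / (hi - lo) for v in values]\n",
"\n",
"\n",
"# iRT values of the 11 peptides, copied from the table above.\n",
"irt = [-24.92, 0.00, 12.39, 19.79, 28.71, 33.38,\n",
"       42.26, 54.62, 70.52, 87.23, 100.00]\n",
"\n",
"rt_norm = min_max(irt)\n",
"print([round(v, 6) for v in rt_norm[:3]])\n",
"```\n",
"\n",
"The second value comes out at roughly 0.1995, in line with the `rt_norm` column shown for RT-pep b."
]
},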
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Test the CCS model"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"2022-09-09 21:54:09> Predicting mobility ...\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"100%|██████████| 5/5 [00:00<00:00, 117.53it/s]\n"
]
},
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" sequence | \n",
" pep_name | \n",
" irt | \n",
" mods | \n",
" mod_sites | \n",
" nAA | \n",
" rt_pred | \n",
" rt_norm_pred | \n",
" irt_pred | \n",
" rt_norm | \n",
" charge | \n",
" ccs_pred | \n",
" precursor_mz | \n",
" mobility_pred | \n",
"
\n",
" \n",
" \n",
" \n",
" | 0 | \n",
" LGGNEQVTR | \n",
" RT-pep a | \n",
" -24.92 | \n",
" | \n",
" | \n",
" 9 | \n",
" 0.127189 | \n",
" 0.127189 | \n",
" -18.916407 | \n",
" 0.000000 | \n",
" 2 | \n",
" 331.279816 | \n",
" 487.256705 | \n",
" 0.815533 | \n",
"
\n",
" \n",
" | 1 | \n",
" GAGSSEPVTGLDAK | \n",
" RT-pep b | \n",
" 0.00 | \n",
" Phospho@S | \n",
" 5 | \n",
" 14 | \n",
" 0.199919 | \n",
" 0.199919 | \n",
" -5.504272 | \n",
" 0.199488 | \n",
" 2 | \n",
" 381.067841 | \n",
" 684.805772 | \n",
" 0.941902 | \n",
"
\n",
" \n",
" | 2 | \n",
" VEATFGVDESNAK | \n",
" RT-pep c | \n",
" 12.39 | \n",
" | \n",
" | \n",
" 13 | \n",
" 0.295237 | \n",
" 0.295237 | \n",
" 12.073141 | \n",
" 0.298671 | \n",
" 2 | \n",
" 394.208893 | \n",
" 683.827889 | \n",
" 0.974369 | \n",
"
\n",
" \n",
" | 3 | \n",
" YILAGVENSK | \n",
" RT-pep d | \n",
" 19.79 | \n",
" | \n",
" | \n",
" 10 | \n",
" 0.357351 | \n",
" 0.357351 | \n",
" 23.527389 | \n",
" 0.357909 | \n",
" 2 | \n",
" 364.828003 | \n",
" 547.298039 | \n",
" 0.899500 | \n",
"
\n",
" \n",
" | 4 | \n",
" TPVISGGPYEYR | \n",
" RT-pep e | \n",
" 28.71 | \n",
" | \n",
" | \n",
" 12 | \n",
" 0.429762 | \n",
" 0.429762 | \n",
" 36.880596 | \n",
" 0.429315 | \n",
" 2 | \n",
" 394.317596 | \n",
" 669.838059 | \n",
" 0.974434 | \n",
"
\n",
" \n",
" | 5 | \n",
" TPVITGAPYEYR | \n",
" RT-pep f | \n",
" 33.38 | \n",
" | \n",
" | \n",
" 12 | \n",
" 0.392419 | \n",
" 0.392419 | \n",
" 29.994243 | \n",
" 0.466699 | \n",
" 2 | \n",
" 399.848633 | \n",
" 683.853709 | \n",
" 0.988309 | \n",
"
\n",
" \n",
" | 6 | \n",
" DGLDAASYYAPVR | \n",
" RT-pep g | \n",
" 42.26 | \n",
" | \n",
" | \n",
" 13 | \n",
" 0.387393 | \n",
" 0.387393 | \n",
" 29.067502 | \n",
" 0.537784 | \n",
" 2 | \n",
" 399.736542 | \n",
" 699.338423 | \n",
" 0.988252 | \n",
"
\n",
" \n",
" | 7 | \n",
" ADVTPADFSEWSK | \n",
" RT-pep h | \n",
" 54.62 | \n",
" | \n",
" | \n",
" 13 | \n",
" 0.634485 | \n",
" 0.634485 | \n",
" 74.633402 | \n",
" 0.636728 | \n",
" 2 | \n",
" 405.532562 | \n",
" 726.835714 | \n",
" 1.002953 | \n",
"
\n",
" \n",
" | 8 | \n",
" GTFIIDPGGVIR | \n",
" RT-pep i | \n",
" 70.52 | \n",
" | \n",
" | \n",
" 12 | \n",
" 0.671310 | \n",
" 0.671310 | \n",
" 81.424123 | \n",
" 0.764009 | \n",
" 2 | \n",
" 379.443451 | \n",
" 622.853512 | \n",
" 0.936954 | \n",
"
\n",
" \n",
" | 9 | \n",
" GTFIIDPAAVIR | \n",
" RT-pep k | \n",
" 87.23 | \n",
" | \n",
" | \n",
" 12 | \n",
" 0.699334 | \n",
" 0.699334 | \n",
" 86.592033 | \n",
" 0.897775 | \n",
" 2 | \n",
" 387.886780 | \n",
" 636.869163 | \n",
" 0.958034 | \n",
"
\n",
" \n",
" | 10 | \n",
" LFLQFGAQGSPFLK | \n",
" RT-pep l | \n",
" 100.00 | \n",
" | \n",
" | \n",
" 14 | \n",
" 0.607442 | \n",
" 0.607442 | \n",
" 69.646337 | \n",
" 1.000000 | \n",
" 2 | \n",
" 435.544861 | \n",
" 776.929751 | \n",
" 1.077836 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" sequence pep_name irt mods mod_sites nAA rt_pred \\\n",
"0 LGGNEQVTR RT-pep a -24.92 9 0.127189 \n",
"1 GAGSSEPVTGLDAK RT-pep b 0.00 Phospho@S 5 14 0.199919 \n",
"2 VEATFGVDESNAK RT-pep c 12.39 13 0.295237 \n",
"3 YILAGVENSK RT-pep d 19.79 10 0.357351 \n",
"4 TPVISGGPYEYR RT-pep e 28.71 12 0.429762 \n",
"5 TPVITGAPYEYR RT-pep f 33.38 12 0.392419 \n",
"6 DGLDAASYYAPVR RT-pep g 42.26 13 0.387393 \n",
"7 ADVTPADFSEWSK RT-pep h 54.62 13 0.634485 \n",
"8 GTFIIDPGGVIR RT-pep i 70.52 12 0.671310 \n",
"9 GTFIIDPAAVIR RT-pep k 87.23 12 0.699334 \n",
"10 LFLQFGAQGSPFLK RT-pep l 100.00 14 0.607442 \n",
"\n",
" rt_norm_pred irt_pred rt_norm charge ccs_pred precursor_mz \\\n",
"0 0.127189 -18.916407 0.000000 2 331.279816 487.256705 \n",
"1 0.199919 -5.504272 0.199488 2 381.067841 684.805772 \n",
"2 0.295237 12.073141 0.298671 2 394.208893 683.827889 \n",
"3 0.357351 23.527389 0.357909 2 364.828003 547.298039 \n",
"4 0.429762 36.880596 0.429315 2 394.317596 669.838059 \n",
"5 0.392419 29.994243 0.466699 2 399.848633 683.853709 \n",
"6 0.387393 29.067502 0.537784 2 399.736542 699.338423 \n",
"7 0.634485 74.633402 0.636728 2 405.532562 726.835714 \n",
"8 0.671310 81.424123 0.764009 2 379.443451 622.853512 \n",
"9 0.699334 86.592033 0.897775 2 387.886780 636.869163 \n",
"10 0.607442 69.646337 1.000000 2 435.544861 776.929751 \n",
"\n",
" mobility_pred \n",
"0 0.815533 \n",
"1 0.941902 \n",
"2 0.974369 \n",
"3 0.899500 \n",
"4 0.974434 \n",
"5 0.988309 \n",
"6 0.988252 \n",
"7 1.002953 \n",
"8 0.936954 \n",
"9 0.958034 \n",
"10 1.077836 "
]
},
"execution_count": null,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df['charge'] = 2\n",
"model_mgr.predict_mobility(df)"
]
},
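{
"cell_type": "markdown",
"metadata": {},
"source": [
"The `mobility_pred` column is derived from `ccs_pred`, `charge`, and `precursor_mz`. The sketch below uses a Mason-Schamp-style conversion with nitrogen as the drift gas. The function name, the coefficient `1059.62245`, and the gas mass `28.0` are assumptions rather than values stated in this tutorial, although they do reproduce the first row of the table above.\n",
"\n",
"```python\n",
"import math\n",
"\n",
"CCS_IM_COEF = 1059.62245  # assumed conversion coefficient\n",
"N2_MASS = 28.0            # assumed drift-gas mass (nitrogen), in Da\n",
"\n",
"\n",
"def ccs_to_mobility(ccs, charge, precursor_mz):\n",
"    '''Convert a CCS value to an ion mobility value under the assumptions above.'''\n",
"    ion_mass = precursor_mz * charge\n",
"    reduced_mass = ion_mass * N2_MASS / (ion_mass + N2_MASS)\n",
"    return ccs * math.sqrt(reduced_mass) / charge / CCS_IM_COEF\n",
"\n",
"\n",
"# Row 0 of the table above: LGGNEQVTR with charge 2.\n",
"mobility = ccs_to_mobility(331.279816, 2, 487.256705)\n",
"print(round(mobility, 4))\n",
"```\n",
"\n",
"Because the conversion depends on charge and m/z, the same CCS value maps to different mobilities for different precursors."
]
},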
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Test the MS2 model"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"2022-09-09 21:54:10> Predicting MS2 ...\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"100%|██████████| 5/5 [00:00<00:00, 82.83it/s]\n"
]
},
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" b_z1 | \n",
" b_z2 | \n",
" y_z1 | \n",
" y_z2 | \n",
" b_modloss_z1 | \n",
" b_modloss_z2 | \n",
" y_modloss_z1 | \n",
" y_modloss_z2 | \n",
"
\n",
" \n",
" \n",
" \n",
" | 0 | \n",
" 0.000000 | \n",
" 0.0 | \n",
" 1.000000 | \n",
" 0.021727 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
"
\n",
" \n",
" | 1 | \n",
" 0.191613 | \n",
" 0.0 | \n",
" 0.343992 | \n",
" 0.000000 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
"
\n",
" \n",
" | 2 | \n",
" 0.063825 | \n",
" 0.0 | \n",
" 0.119938 | \n",
" 0.015200 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
"
\n",
" \n",
" | 3 | \n",
" 0.033420 | \n",
" 0.0 | \n",
" 0.257022 | \n",
" 0.000000 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
"
\n",
" \n",
" | 4 | \n",
" 0.027311 | \n",
" 0.0 | \n",
" 0.340053 | \n",
" 0.000000 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
"
\n",
" \n",
" | ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
"
\n",
" \n",
" | 118 | \n",
" 0.000000 | \n",
" 0.0 | \n",
" 0.101413 | \n",
" 0.000000 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
"
\n",
" \n",
" | 119 | \n",
" 0.000000 | \n",
" 0.0 | \n",
" 0.672498 | \n",
" 0.000000 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
"
\n",
" \n",
" | 120 | \n",
" 0.000000 | \n",
" 0.0 | \n",
" 0.034437 | \n",
" 0.000000 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
"
\n",
" \n",
" | 121 | \n",
" 0.000000 | \n",
" 0.0 | \n",
" 0.125430 | \n",
" 0.000000 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
"
\n",
" \n",
" | 122 | \n",
" 0.000000 | \n",
" 0.0 | \n",
" 0.112338 | \n",
" 0.000000 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
"
\n",
" \n",
"
\n",
"
123 rows × 8 columns
\n",
"
"
],
"text/plain": [
" b_z1 b_z2 y_z1 y_z2 b_modloss_z1 b_modloss_z2 \\\n",
"0 0.000000 0.0 1.000000 0.021727 0.0 0.0 \n",
"1 0.191613 0.0 0.343992 0.000000 0.0 0.0 \n",
"2 0.063825 0.0 0.119938 0.015200 0.0 0.0 \n",
"3 0.033420 0.0 0.257022 0.000000 0.0 0.0 \n",
"4 0.027311 0.0 0.340053 0.000000 0.0 0.0 \n",
".. ... ... ... ... ... ... \n",
"118 0.000000 0.0 0.101413 0.000000 0.0 0.0 \n",
"119 0.000000 0.0 0.672498 0.000000 0.0 0.0 \n",
"120 0.000000 0.0 0.034437 0.000000 0.0 0.0 \n",
"121 0.000000 0.0 0.125430 0.000000 0.0 0.0 \n",
"122 0.000000 0.0 0.112338 0.000000 0.0 0.0 \n",
"\n",
" y_modloss_z1 y_modloss_z2 \n",
"0 0.0 0.0 \n",
"1 0.0 0.0 \n",
"2 0.0 0.0 \n",
"3 0.0 0.0 \n",
"4 0.0 0.0 \n",
".. ... ... \n",
"118 0.0 0.0 \n",
"119 0.0 0.0 \n",
"120 0.0 0.0 \n",
"121 0.0 0.0 \n",
"122 0.0 0.0 \n",
"\n",
"[123 rows x 8 columns]"
]
},
"execution_count": null,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df['charge'] = 2\n",
"inten_df = model_mgr.predict_ms2(df)\n",
"inten_df"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Note that modloss fragment intensities are enabled in this case (`ModelManager(mask_modloss=False, ...)`), so modloss intensities are not zero for Phosphopeptides:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" b_z1 | \n",
" b_z2 | \n",
" y_z1 | \n",
" y_z2 | \n",
" b_modloss_z1 | \n",
" b_modloss_z2 | \n",
" y_modloss_z1 | \n",
" y_modloss_z2 | \n",
"
\n",
" \n",
" \n",
" \n",
" | 8 | \n",
" 0.000000 | \n",
" 0.0 | \n",
" 0.000000 | \n",
" 0.000000 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
"
\n",
" \n",
" | 9 | \n",
" 0.063835 | \n",
" 0.0 | \n",
" 0.012835 | \n",
" 0.000606 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
"
\n",
" \n",
" | 10 | \n",
" 0.066177 | \n",
" 0.0 | \n",
" 0.000000 | \n",
" 0.000000 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
"
\n",
" \n",
" | 11 | \n",
" 0.061181 | \n",
" 0.0 | \n",
" 0.064921 | \n",
" 0.000000 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
"
\n",
" \n",
" | 12 | \n",
" 0.000000 | \n",
" 0.0 | \n",
" 0.082699 | \n",
" 0.000000 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
"
\n",
" \n",
" | 13 | \n",
" 0.000000 | \n",
" 0.0 | \n",
" 1.000000 | \n",
" 0.080108 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
"
\n",
" \n",
" | 14 | \n",
" 0.000000 | \n",
" 0.0 | \n",
" 0.068587 | \n",
" 0.000000 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
"
\n",
" \n",
" | 15 | \n",
" 0.000000 | \n",
" 0.0 | \n",
" 0.293111 | \n",
" 0.000000 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
"
\n",
" \n",
" | 16 | \n",
" 0.000000 | \n",
" 0.0 | \n",
" 0.185996 | \n",
" 0.000000 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
"
\n",
" \n",
" | 17 | \n",
" 0.000000 | \n",
" 0.0 | \n",
" 0.024486 | \n",
" 0.000000 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
"
\n",
" \n",
" | 18 | \n",
" 0.000000 | \n",
" 0.0 | \n",
" 0.105864 | \n",
" 0.000000 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
"
\n",
" \n",
" | 19 | \n",
" 0.000000 | \n",
" 0.0 | \n",
" 0.148301 | \n",
" 0.000000 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
"
\n",
" \n",
" | 20 | \n",
" 0.000000 | \n",
" 0.0 | \n",
" 0.046693 | \n",
" 0.000000 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" b_z1 b_z2 y_z1 y_z2 b_modloss_z1 b_modloss_z2 \\\n",
"8 0.000000 0.0 0.000000 0.000000 0.0 0.0 \n",
"9 0.063835 0.0 0.012835 0.000606 0.0 0.0 \n",
"10 0.066177 0.0 0.000000 0.000000 0.0 0.0 \n",
"11 0.061181 0.0 0.064921 0.000000 0.0 0.0 \n",
"12 0.000000 0.0 0.082699 0.000000 0.0 0.0 \n",
"13 0.000000 0.0 1.000000 0.080108 0.0 0.0 \n",
"14 0.000000 0.0 0.068587 0.000000 0.0 0.0 \n",
"15 0.000000 0.0 0.293111 0.000000 0.0 0.0 \n",
"16 0.000000 0.0 0.185996 0.000000 0.0 0.0 \n",
"17 0.000000 0.0 0.024486 0.000000 0.0 0.0 \n",
"18 0.000000 0.0 0.105864 0.000000 0.0 0.0 \n",
"19 0.000000 0.0 0.148301 0.000000 0.0 0.0 \n",
"20 0.000000 0.0 0.046693 0.000000 0.0 0.0 \n",
"\n",
" y_modloss_z1 y_modloss_z2 \n",
"8 0.0 0.0 \n",
"9 0.0 0.0 \n",
"10 0.0 0.0 \n",
"11 0.0 0.0 \n",
"12 0.0 0.0 \n",
"13 0.0 0.0 \n",
"14 0.0 0.0 \n",
"15 0.0 0.0 \n",
"16 0.0 0.0 \n",
"17 0.0 0.0 \n",
"18 0.0 0.0 \n",
"19 0.0 0.0 \n",
"20 0.0 0.0 "
]
},
"execution_count": null,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"phos_precursor_id = 1 # we manually assigned this peptide as phospho\n",
"inten_df.iloc[\n",
" df.loc[phos_precursor_id,'frag_start_idx']:\n",
" df.loc[phos_precursor_id,'frag_stop_idx'],:\n",
"]"
]
},
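{
"cell_type": "markdown",
"metadata": {},
"source": [
"The row range used above follows from how the fragment table is laid out: a peptide with `nAA` residues contributes `nAA - 1` fragment rows, stored one peptide after another. The sketch below recomputes the ranges from the `nAA` column alone. It assumes the rows are stored in precursor order, which is consistent with the 123 rows and the 8 to 20 slice shown above; `frag_ranges` is a hypothetical helper.\n",
"\n",
"```python\n",
"def frag_ranges(n_aa_list):\n",
"    '''Return one (start, stop) row range per peptide, with stop exclusive.'''\n",
"    ranges = []\n",
"    start = 0\n",
"    for n_aa in n_aa_list:\n",
"        stop = start + n_aa - 1  # one row per cleavage position\n",
"        ranges.append((start, stop))\n",
"        start = stop\n",
"    return ranges\n",
"\n",
"\n",
"# nAA column of the 11 iRT peptides from the tables above.\n",
"n_aa = [9, 14, 13, 10, 12, 12, 13, 13, 12, 12, 14]\n",
"\n",
"ranges = frag_ranges(n_aa)\n",
"print(ranges[1])      # (8, 21): rows of the phosphopeptide (precursor 1)\n",
"print(ranges[-1][1])  # 123: total number of fragment rows\n",
"```\n",
"\n",
"Storing only a start and a stop index per precursor keeps all fragment intensities in one flat table while still allowing a cheap slice per peptide."
]
},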
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To disable this, use `ModelManager(mask_modloss=False, ...)`:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"2022-09-09 21:54:13> Predicting MS2 ...\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"100%|██████████| 5/5 [00:00<00:00, 86.70it/s]\n"
]
},
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" b_z1 | \n",
" b_z2 | \n",
" y_z1 | \n",
" y_z2 | \n",
" b_modloss_z1 | \n",
" b_modloss_z2 | \n",
" y_modloss_z1 | \n",
" y_modloss_z2 | \n",
"
\n",
" \n",
" \n",
" \n",
" | 8 | \n",
" 0.000000 | \n",
" 0.0 | \n",
" 0.000000 | \n",
" 0.000000 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
"
\n",
" \n",
" | 9 | \n",
" 0.063835 | \n",
" 0.0 | \n",
" 0.012835 | \n",
" 0.000606 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
"
\n",
" \n",
" | 10 | \n",
" 0.066177 | \n",
" 0.0 | \n",
" 0.000000 | \n",
" 0.000000 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
"
\n",
" \n",
" | 11 | \n",
" 0.061181 | \n",
" 0.0 | \n",
" 0.064921 | \n",
" 0.000000 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
"
\n",
" \n",
" | 12 | \n",
" 0.000000 | \n",
" 0.0 | \n",
" 0.082699 | \n",
" 0.000000 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
"
\n",
" \n",
" | 13 | \n",
" 0.000000 | \n",
" 0.0 | \n",
" 1.000000 | \n",
" 0.080108 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
"
\n",
" \n",
" | 14 | \n",
" 0.000000 | \n",
" 0.0 | \n",
" 0.068587 | \n",
" 0.000000 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
"
\n",
" \n",
" | 15 | \n",
" 0.000000 | \n",
" 0.0 | \n",
" 0.293111 | \n",
" 0.000000 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
"
\n",
" \n",
" | 16 | \n",
" 0.000000 | \n",
" 0.0 | \n",
" 0.185996 | \n",
" 0.000000 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
"
\n",
" \n",
" | 17 | \n",
" 0.000000 | \n",
" 0.0 | \n",
" 0.024486 | \n",
" 0.000000 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
"
\n",
" \n",
" | 18 | \n",
" 0.000000 | \n",
" 0.0 | \n",
" 0.105864 | \n",
" 0.000000 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
"
\n",
" \n",
" | 19 | \n",
" 0.000000 | \n",
" 0.0 | \n",
" 0.148301 | \n",
" 0.000000 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
"
\n",
" \n",
" | 20 | \n",
" 0.000000 | \n",
" 0.0 | \n",
" 0.046693 | \n",
" 0.000000 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" b_z1 b_z2 y_z1 y_z2 b_modloss_z1 b_modloss_z2 \\\n",
"8 0.000000 0.0 0.000000 0.000000 0.0 0.0 \n",
"9 0.063835 0.0 0.012835 0.000606 0.0 0.0 \n",
"10 0.066177 0.0 0.000000 0.000000 0.0 0.0 \n",
"11 0.061181 0.0 0.064921 0.000000 0.0 0.0 \n",
"12 0.000000 0.0 0.082699 0.000000 0.0 0.0 \n",
"13 0.000000 0.0 1.000000 0.080108 0.0 0.0 \n",
"14 0.000000 0.0 0.068587 0.000000 0.0 0.0 \n",
"15 0.000000 0.0 0.293111 0.000000 0.0 0.0 \n",
"16 0.000000 0.0 0.185996 0.000000 0.0 0.0 \n",
"17 0.000000 0.0 0.024486 0.000000 0.0 0.0 \n",
"18 0.000000 0.0 0.105864 0.000000 0.0 0.0 \n",
"19 0.000000 0.0 0.148301 0.000000 0.0 0.0 \n",
"20 0.000000 0.0 0.046693 0.000000 0.0 0.0 \n",
"\n",
" y_modloss_z1 y_modloss_z2 \n",
"8 0.0 0.0 \n",
"9 0.0 0.0 \n",
"10 0.0 0.0 \n",
"11 0.0 0.0 \n",
"12 0.0 0.0 \n",
"13 0.0 0.0 \n",
"14 0.0 0.0 \n",
"15 0.0 0.0 \n",
"16 0.0 0.0 \n",
"17 0.0 0.0 \n",
"18 0.0 0.0 \n",
"19 0.0 0.0 \n",
"20 0.0 0.0 "
]
},
"execution_count": null,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"model_mgr = ModelManager(mask_modloss=True, device='cpu')\n",
"model_mgr.load_installed_models('phos')\n",
"df = IRT_PEPTIDE_DF.copy()\n",
"df.loc[1,'mods'] = 'Phospho@S'\n",
"df.loc[1,'mod_sites'] = '5'\n",
"df['charge'] = 2\n",
"inten_df = model_mgr.predict_ms2(df)\n",
"inten_df.iloc[\n",
" df.loc[1,'frag_start_idx']:\n",
" df.loc[1,'frag_stop_idx'],:\n",
"]"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3.8.3 ('base')",
"language": "python",
"name": "python3"
},
"language_info": {
"name": "python",
"version": "3.8.3"
},
"vscode": {
"interpreter": {
"hash": "8a3b27e141e49c996c9b863f8707e97aabd49c4a7e8445b9b783b34e4a21a9b2"
}
}
},
"nbformat": 4,
"nbformat_minor": 2
}