peptdeep.pretrained_models
The main entry of pretrained MS2/RT/CCS models.
Classes:
ModelManager – The manager class to access MS2/RT/CCS models.
- class peptdeep.pretrained_models.ModelManager(mask_modloss: bool = False, device: str = 'gpu')
Bases: object
The manager class to access MS2/RT/CCS models.
- ms2_model
The MS2 prediction model.
- rt_model
The RT prediction model.
- ccs_model
The CCS prediction model.
- psm_num_to_train_ms2
Number of PSMs to train the MS2 model. Defaults to global_settings['model_mgr']['transfer']['psm_num_to_train_ms2'].
- Type:
int
- epoch_to_train_ms2
Number of epochs to train the MS2 model. Defaults to global_settings['model_mgr']['transfer']['epoch_ms2'].
- Type:
int
- psm_num_to_train_rt_ccs
Number of PSMs to train the RT/CCS models. Defaults to global_settings['model_mgr']['transfer']['psm_num_to_train_rt_ccs'].
- Type:
int
- epoch_to_train_rt_ccs
Number of epochs to train the RT/CCS models. Defaults to global_settings['model_mgr']['transfer']['epoch_rt_ccs'].
- Type:
int
- nce
Default NCE value for a precursor_df without the 'nce' column. Defaults to global_settings['model_mgr']['default_nce'].
- Type:
float
- instrument
Default instrument type for a precursor_df without the 'instrument' column. Defaults to global_settings['model_mgr']['default_instrument'].
- Type:
str
- use_grid_nce_search
Whether self.ms2_model uses peptdeep.model.ms2.pDeepModel.grid_nce_search() to determine the optimal NCE and instrument type. This will change the values of self.nce and self.instrument. Defaults to global_settings['model_mgr']['transfer']['grid_nce_search'].
- Type:
bool
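The attribute defaults above are all read from nested keys of global_settings. As a minimal sketch of that lookup pattern, the snippet below uses a stand-in dictionary: the key paths come from the descriptions above, but `example_settings`, `read_manager_defaults` and every value are hypothetical placeholders, not peptdeep's actual settings or code.

```python
# Stand-in for peptdeep's global_settings. The key paths follow the
# attribute descriptions above; every value is a made-up placeholder.
example_settings = {
    "model_mgr": {
        "default_nce": 30.0,
        "default_instrument": "Lumos",
        "transfer": {
            "psm_num_to_train_ms2": 1000,
            "epoch_ms2": 20,
            "psm_num_to_train_rt_ccs": 1000,
            "epoch_rt_ccs": 40,
            "grid_nce_search": False,
        },
    }
}


def read_manager_defaults(settings: dict) -> dict:
    """Map the documented settings keys onto the attribute names."""
    mgr = settings["model_mgr"]
    transfer = mgr["transfer"]
    return {
        "psm_num_to_train_ms2": transfer["psm_num_to_train_ms2"],
        "epoch_to_train_ms2": transfer["epoch_ms2"],
        "psm_num_to_train_rt_ccs": transfer["psm_num_to_train_rt_ccs"],
        "epoch_to_train_rt_ccs": transfer["epoch_rt_ccs"],
        "nce": mgr["default_nce"],
        "instrument": mgr["default_instrument"],
        "use_grid_nce_search": transfer["grid_nce_search"],
    }


defaults = read_manager_defaults(example_settings)
print(defaults["epoch_to_train_ms2"], defaults["instrument"])
```

Note that the settings keys for the epoch counts ('epoch_ms2', 'epoch_rt_ccs') are named differently from the attributes they populate.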
Methods:
__init__([mask_modloss, device])
load_external_models(*[, ms2_model_file, ...]) – Load external MS2/RT/CCS models.
load_installed_models([model_type]) – Load built-in MS2/CCS/RT models.
predict_all(precursor_df, *[, ...]) – Predict all items defined by predict_items, which may include rt, mobility, fragment_mz and fragment_intensity.
predict_all_mp(precursor_df, *[, ...])
predict_mobility(precursor_df, *[, batch_size]) – Predict mobility (ccs_pred and mobility_pred) in place into precursor_df.
predict_ms2(precursor_df, *[, batch_size, ...]) – Predict MS2 for the given precursor_df.
predict_rt(precursor_df, *[, batch_size]) – Predict RT ('rt_pred') in place into precursor_df.
reset_by_global_settings([reload_models])
save_models(folder) – Save MS2/RT/CCS models into a folder.
set_default_nce(df) – Alias for set_default_nce_instrument.
set_default_nce_instrument(df) – Append 'nce' and 'instrument' columns into df with self.nce and self.instrument.
train_ccs_model(psm_df) – Train/fine-tune the CCS model.
train_ms2_model(psm_df, matched_intensity_df) – Use matched_intensity_df to train/fine-tune the MS2 model.
train_rt_model(psm_df) – Train/fine-tune the RT model.
Attributes:
- __init__(mask_modloss: bool = False, device: str = 'gpu')
- Parameters:
mask_modloss (bool, optional) – Whether modloss ions are masked to zeros in the MS2 model. Modloss ions are mostly useful for the phospho MS2 prediction model. Defaults to False.
device (str, optional) – Device for the DL models, either 'gpu' ('cuda') or 'cpu'. If device=='gpu' but no GPUs are detected, it automatically switches to 'cpu'. Defaults to 'gpu'.
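The device fallback described for the device parameter can be sketched as follows. This is an illustration under stated assumptions, not peptdeep's implementation: `resolve_device` is a hypothetical helper, GPU availability is passed in as a plain boolean so that no deep learning framework is needed, and rejecting other strings with a ValueError is a choice made for this sketch only.

```python
def resolve_device(requested: str, gpu_available: bool) -> str:
    """Return the device that would be used for a requested device string.

    Only the values named in the parameter description are handled:
    'gpu' (or its alias 'cuda') and 'cpu'.
    """
    requested = requested.lower()
    if requested in ("gpu", "cuda"):
        # Fall back to the CPU when no GPU is detected.
        return "gpu" if gpu_available else "cpu"
    if requested == "cpu":
        return "cpu"
    # Assumption of this sketch: anything else is treated as an error.
    raise ValueError(f"Unsupported device: {requested!r}")


print(resolve_device("gpu", gpu_available=False))
```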
- property instrument
- load_external_models(*, ms2_model_file: str | BytesIO = '', rt_model_file: str | BytesIO = '', ccs_model_file: str | BytesIO = '')
Load external MS2/RT/CCS models.
- Parameters:
ms2_model_file (str or io.BytesIO, optional) – MS2 model file or stream. Do nothing if the value is '' or None. Defaults to ''.
rt_model_file (str or io.BytesIO, optional) – RT model file or stream. Do nothing if the value is '' or None. Defaults to ''.
ccs_model_file (str or io.BytesIO, optional) – CCS model file or stream. Do nothing if the value is '' or None. Defaults to ''.
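Each of the three arguments accepts either a file path or an in-memory stream, and an empty string or None means that model is left untouched. A minimal sketch of that dispatch, using only the standard library, is shown below; `read_model_bytes` is a hypothetical helper written for illustration and is not part of peptdeep.

```python
import io
from typing import Optional, Union

ModelSource = Union[str, io.BytesIO]


def read_model_bytes(source: Optional[ModelSource]) -> Optional[bytes]:
    """Return the raw bytes of a model source, or None if it is skipped."""
    # '' and None both mean "do nothing for this model".
    if source is None or (isinstance(source, str) and source == ""):
        return None
    # An in-memory stream is read directly.
    if isinstance(source, io.BytesIO):
        return source.getvalue()
    # Otherwise the value is treated as a path on disk.
    with open(source, "rb") as f:
        return f.read()


print(read_model_bytes(""), read_model_bytes(io.BytesIO(b"weights")))
```

Accepting a stream as well as a path is convenient when the model file has already been loaded into memory, for example from an archive, and should not be written to disk first.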
- load_installed_models(model_type: str = 'generic')
Load built-in MS2/CCS/RT models.
- Parameters:
model_type (str, optional) – Which set of installed MS2/RT/CCS models to load. It can be 'digly', 'phospho', 'HLA', or 'generic'. Defaults to 'generic'.
- predict_all(precursor_df: DataFrame, *, predict_items: list = ['rt', 'mobility', 'ms2'], frag_types: list = None, multiprocessing: bool = True, min_required_precursor_num_for_mp: int = 3000, process_num: int = 8, mp_batch_size: int = 100000) Dict[str, DataFrame]
Predict all items defined by predict_items, which may include rt, mobility, fragment_mz and fragment_intensity.
- Parameters:
precursor_df (pd.DataFrame) – Precursor dataframe containing the sequence, mods, mod_sites, charge … columns.
predict_items (list, optional) – Items ('rt', 'mobility', 'ms2') to predict. Defaults to ['rt', 'mobility', 'ms2'].
frag_types (list, optional) – Fragment types to predict. If it is None, the fragment types depend on self.ms2_model.charged_frag_types and self.ms2_model.model._mask_modloss. Defaults to None.
multiprocessing (bool, optional) – Whether to use multiprocessing if a GPU is not available. Defaults to True.
process_num (int, optional) – Number of processes to use for multiprocessing. Defaults to 8.
min_required_precursor_num_for_mp (int, optional) – Multiprocessing is not used when the number of precursors in precursor_df is lower than this value. Defaults to 3000.
mp_batch_size (int, optional) – Batch size used to split the data for multiprocessing. Defaults to 100000.
- Returns:
{'precursor_df': precursor_df}. If 'ms2' is in predict_items, it also contains:
{'fragment_mz_df': fragment_mz_df, 'fragment_intensity_df': fragment_intensity_df}
- Return type:
Dict[str, pd.DataFrame]
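The interplay of multiprocessing, min_required_precursor_num_for_mp and mp_batch_size can be illustrated with a short sketch. It follows the parameter descriptions above only; `should_use_mp` and `batch_ranges` are hypothetical helpers, GPU availability is passed in as a boolean, and the actual logic inside predict_all may differ in its details.

```python
from typing import Iterator, Tuple


def should_use_mp(
    n_precursors: int,
    multiprocessing: bool,
    gpu_available: bool,
    min_required_precursor_num_for_mp: int = 3000,
) -> bool:
    """Decide whether multiprocessing would be used for a prediction run."""
    return (
        multiprocessing
        and not gpu_available
        and n_precursors >= min_required_precursor_num_for_mp
    )


def batch_ranges(n_rows: int, mp_batch_size: int = 100000) -> Iterator[Tuple[int, int]]:
    """Yield (start, stop) row ranges of at most mp_batch_size rows each."""
    for start in range(0, n_rows, mp_batch_size):
        yield start, min(start + mp_batch_size, n_rows)


print(should_use_mp(2999, multiprocessing=True, gpu_available=False))
print(list(batch_ranges(250000)))
```

With the default values, a dataframe of 250000 precursors would be split into three batches, the last one holding the remaining 50000 rows.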
- predict_all_mp(precursor_df: DataFrame, *, predict_items: list = ['rt', 'mobility', 'ms2'], frag_types: list = None, process_num: int = 8, mp_batch_size: int = 100000)
- predict_mobility(precursor_df: DataFrame, *, batch_size: int = 1024) DataFrame
Predict mobility (ccs_pred and mobility_pred) in place into precursor_df.
- Parameters:
precursor_df (pd.DataFrame) – Precursor dataframe for CCS/mobility prediction.
batch_size (int, optional) – Batch size for prediction. Defaults to 1024.
- Returns:
df with ‘ccs_pred’ and ‘mobility_pred’ columns.
- Return type:
pd.DataFrame
- predict_ms2(precursor_df: DataFrame, *, batch_size: int = 512, reference_frag_df: DataFrame = None) DataFrame
Predict MS2 for the given precursor_df.
- Parameters:
precursor_df (pd.DataFrame) – Precursor dataframe for MS2 prediction.
batch_size (int, optional) – Batch size for prediction. Defaults to 512.
reference_frag_df (pd.DataFrame, optional) – Reference fragment dataframe that 'frag_start_idx' in precursor_df points to, if precursor_df has that column. Defaults to None.
- Returns:
Predicted fragment intensity dataframe. If precursor_df does not already have 'frag_start_idx' and 'frag_stop_idx' columns, they are inserted into precursor_df, pointing to this predicted fragment dataframe.
- Return type:
pd.DataFrame
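The 'frag_start_idx' and 'frag_stop_idx' columns mentioned in the return value act as row pointers from each precursor into one shared fragment dataframe. The sketch below shows the general idea with plain lists. It assumes the number of fragment rows per precursor is already known and that stop indices are exclusive; `assign_frag_indices` is a hypothetical helper and the example numbers are made up.

```python
from itertools import accumulate
from typing import List, Sequence, Tuple


def assign_frag_indices(rows_per_precursor: Sequence[int]) -> List[Tuple[int, int]]:
    """Return (frag_start_idx, frag_stop_idx) pairs, one per precursor."""
    # Each precursor's block ends where the running total of rows ends.
    stops = list(accumulate(rows_per_precursor))
    # Each block starts where the previous one stopped.
    starts = [0] + stops[:-1]
    return list(zip(starts, stops))


# Three precursors owning 3, 2 and 4 fragment rows (made-up numbers).
index_pairs = assign_frag_indices([3, 2, 4])
fragment_rows = [f"row{i}" for i in range(9)]
start, stop = index_pairs[1]
print(index_pairs, fragment_rows[start:stop])
```

Storing only a start and stop index per precursor keeps all fragments in a single contiguous table, so one precursor's fragments can be retrieved with a simple slice.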
- predict_rt(precursor_df: DataFrame, *, batch_size: int = 1024) DataFrame
Predict RT ('rt_pred') in place into precursor_df.
- Parameters:
precursor_df (pd.DataFrame) – Precursor dataframe for RT prediction.
batch_size (int, optional) – Batch size for prediction. Defaults to 1024.
- Returns:
df with ‘rt_pred’ and ‘rt_norm_pred’ columns.
- Return type:
pd.DataFrame
- save_models(folder: str)
Save MS2/RT/CCS models into a folder.
- Parameters:
folder (str) – Folder to save the models into.
- set_default_nce_instrument(df)
Append 'nce' and 'instrument' columns into df with self.nce and self.instrument.
- train_ccs_model(psm_df: DataFrame)
Train/fine-tune the CCS model. The fine-tuning will be skipped if self.psm_num_to_train_rt_ccs is zero.
- Parameters:
psm_df (pd.DataFrame) – Training psm_df which contains ‘ccs’ or ‘mobility’ column.
- train_ms2_model(psm_df: DataFrame, matched_intensity_df: DataFrame)
Use matched_intensity_df to train/fine-tune the MS2 model.
It will sample n=self.psm_num_to_train_ms2 PSMs into a training dataframe (tr_df) for fine-tuning.
This method will also include some important PTMs (n=self.top_n_mods_to_train) in tr_df for fine-tuning.
If self.use_grid_nce_search==True, this method will call self.ms2_model.grid_nce_search to find the best NCE and instrument.
- Parameters:
psm_df (pd.DataFrame) – PSM dataframe for fine-tuning
matched_intensity_df (pd.DataFrame) – The matched fragment intensities for psm_df.
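The sampling step described above can be sketched with the standard library. This only illustrates drawing at most n PSMs without replacement; `sample_psm_indices` is a hypothetical helper, and both the seed parameter and returning an empty list when n is zero are assumptions of this sketch rather than documented behavior of train_ms2_model.

```python
import random
from typing import List


def sample_psm_indices(n_psms: int, psm_num_to_train: int, seed: int = 0) -> List[int]:
    """Pick row indices of the PSMs that would go into the training dataframe."""
    if n_psms <= 0 or psm_num_to_train <= 0:
        return []  # nothing to train on
    # Never ask for more PSMs than are available.
    k = min(n_psms, psm_num_to_train)
    rng = random.Random(seed)  # local generator, so the draw is reproducible
    return sorted(rng.sample(range(n_psms), k))


picked = sample_psm_indices(n_psms=10, psm_num_to_train=4)
print(len(picked), picked == sample_psm_indices(10, 4))
```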
- peptdeep.pretrained_models.clear_error_modloss_intensities(fragment_mz_df, fragment_intensity_df)
- peptdeep.pretrained_models.download_models(url: str = 'https://github.com/MannLabs/alphapeptdeep/releases/download/pre-trained-models/pretrained_models.zip', overwrite=True)
- Parameters:
url (str, optional) – Remote or local path. Defaults to peptdeep.pretrained_models.model_url.
overwrite (bool, optional) – Whether to overwrite old model files. Defaults to True.
- Raises:
FileNotFoundError – If remote url is not accessible.
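Since url may be either a remote address or a local path, and an inaccessible remote url is reported as FileNotFoundError, the overall behavior can be sketched as below. This is a standalone illustration, not the code of download_models: `fetch_model_zip` and its dest parameter are hypothetical, and the way overwrite is handled here is an assumption.

```python
import os
import shutil
import urllib.request


def fetch_model_zip(url: str, dest: str, overwrite: bool = True) -> str:
    """Copy or download `url` to `dest` and return `dest`."""
    if os.path.exists(dest) and not overwrite:
        return dest  # keep the existing file
    if os.path.isfile(url):
        shutil.copy(url, dest)  # `url` is a local path
        return dest
    try:
        with urllib.request.urlopen(url, timeout=10) as response:
            data = response.read()
    except (OSError, ValueError) as e:
        # URLError, HTTPError and timeouts are OSError subclasses;
        # ValueError covers strings that are not valid URLs.
        raise FileNotFoundError(f"Could not access {url}") from e
    with open(dest, "wb") as f:
        f.write(data)
    return dest
```

Converting the different network errors into one FileNotFoundError gives callers a single exception type to handle, whether the source was remote or local.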