peptdeep.pretrained_models

The main entry point for the pretrained MS2/RT/CCS models.

Classes:

ModelManager([mask_modloss, device])

The manager class to access MS2/RT/CCS models.

Functions:

`clear_error_modloss_intensities`(...)
`count_mods`(psm_df)
`download_models`([url, overwrite])	Download the pretrained models from a remote URL or a local path.
`is_model_zip`(downloaded_zip)
`load_models`([mask_modloss])
`load_models_by_model_type_in_zip`(...[, ...])
`load_phos_models`([mask_modloss])
`psm_sampling_with_important_mods`(psm_df, ...)

class peptdeep.pretrained_models.ModelManager(mask_modloss: bool = False, device: str = 'gpu')

Bases: object

The manager class to access MS2/RT/CCS models.

ms2_model

The MS2 prediction model.

Type: peptdeep.model.ms2.pDeepModel

rt_model

The RT prediction model.

Type: peptdeep.model.rt.AlphaRTModel

ccs_model

The CCS prediction model.

Type: peptdeep.model.ccs.AlphaCCSModel

psm_num_to_train_ms2

Number of PSMs used to train the MS2 model. Defaults to global_settings['model_mgr']['transfer']['psm_num_to_train_ms2'].

Type: int

epoch_to_train_ms2

Number of epochs to train the MS2 model. Defaults to global_settings['model_mgr']['transfer']['epoch_ms2'].

Type: int

psm_num_to_train_rt_ccs

Number of PSMs used to train the RT/CCS models. Defaults to global_settings['model_mgr']['transfer']['psm_num_to_train_rt_ccs'].

Type: int

epoch_to_train_rt_ccs

Number of epochs to train the RT/CCS models. Defaults to global_settings['model_mgr']['transfer']['epoch_rt_ccs'].

Type: int

nce

Default NCE value for a precursor_df without the 'nce' column. Defaults to global_settings['model_mgr']['default_nce'].

Type: float

instrument

Default instrument type for a precursor_df without the 'instrument' column. Defaults to global_settings['model_mgr']['default_instrument'].

Type: str

use_grid_nce_search

Whether self.ms2_model uses peptdeep.model.ms2.pDeepModel.grid_nce_search() to determine the optimal NCE and instrument type. Enabling it changes the values of self.nce and self.instrument. Defaults to global_settings['model_mgr']['transfer']['grid_nce_search'].

Type: bool

Methods:

`__init__`([mask_modloss, device])	Set up the manager; mask_modloss controls whether modloss ions are masked to zeros in the MS2 model.
`load_external_models`(*[, ms2_model_file, ...])	Load external MS2/RT/CCS models.
`load_installed_models`([model_type])	Load built-in MS2/CCS/RT models.
`predict_all`(precursor_df, *[, ...])	Predict all items defined by predict_items, which may include rt, mobility, fragment_mz and fragment_intensity.
`predict_all_mp`(precursor_df, *[, ...])
`predict_mobility`(precursor_df, *[, batch_size])	Predict mobility (ccs_pred and mobility_pred) and write it into precursor_df in place.
`predict_ms2`(precursor_df, *[, batch_size, ...])	Predict MS2 for the given precursor_df.
`predict_rt`(precursor_df, *[, batch_size])	Predict RT ('rt_pred') and write it into precursor_df in place.
`reset_by_global_settings`([reload_models])
`save_models`(folder)	Save MS2/RT/CCS models into a folder.
`set_default_nce`(df)	Alias for set_default_nce_instrument.
`set_default_nce_instrument`(df)	Append 'nce' and 'instrument' columns to df using self.nce and self.instrument.
`train_ccs_model`(psm_df)	Train/fine-tune the CCS model.
`train_ms2_model`(psm_df, matched_intensity_df)	Train/fine-tune the MS2 model using matched_intensity_df.
`train_rt_model`(psm_df)	Train/fine-tune the RT model.

Attributes:

instrument

__init__(mask_modloss: bool = False, device: str = 'gpu')

Parameters:

mask_modloss (bool, optional) – Whether modloss ions are masked to zeros in the MS2 model. Modloss ions are mostly useful for the phospho MS2 prediction model. Defaults to False.
device (str, optional) – Device for the DL models, either 'gpu' ('cuda') or 'cpu'. If device=='gpu' but no GPU is detected, it automatically switches to 'cpu'. Defaults to 'gpu'.
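The two constructor options can be pictured with the plain-Python sketch below. It is not peptdeep's code: `resolve_device` and `mask_modloss_columns` are hypothetical helpers, GPU availability is passed in as a flag instead of being queried from a DL framework, and the fragment column names (`b_z1`, `b_modloss_z1`, ...) are assumed to follow peptdeep's charged fragment-type naming.

```python
from typing import Dict, List


def resolve_device(device: str, gpu_available: bool) -> str:
    """Hypothetical helper mirroring the documented device fallback.

    'gpu' and 'cuda' are treated as synonyms; a GPU request falls back
    to 'cpu' when no GPU is available.
    """
    device = device.lower()
    if device in ("gpu", "cuda"):
        return "cuda" if gpu_available else "cpu"
    return device


def mask_modloss_columns(
    intensities: Dict[str, List[float]], mask_modloss: bool
) -> Dict[str, List[float]]:
    """Hypothetical helper: zero every modloss fragment column if requested.

    `intensities` maps a charged fragment type (assumed names such as
    'b_z1' or 'b_modloss_z1') to one intensity per fragmentation site.
    """
    if not mask_modloss:
        return intensities
    return {
        frag_type: ([0.0] * len(values) if "modloss" in frag_type else values)
        for frag_type, values in intensities.items()
    }


print(resolve_device("gpu", gpu_available=False))  # cpu
print(mask_modloss_columns({"b_z1": [0.5, 0.2], "b_modloss_z1": [0.1, 0.3]}, True))
# {'b_z1': [0.5, 0.2], 'b_modloss_z1': [0.0, 0.0]}
```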

property instrument

load_external_models(*, ms2_model_file: str | BytesIO = '', rt_model_file: str | BytesIO = '', ccs_model_file: str | BytesIO = '')

Load external MS2/RT/CCS models.

Parameters:

ms2_model_file (str | io.BytesIO, optional) – MS2 model file or stream. Nothing is loaded if the value is '' or None. Defaults to ''.
rt_model_file (str | io.BytesIO, optional) – RT model file or stream. Nothing is loaded if the value is '' or None. Defaults to ''.
ccs_model_file (str | io.BytesIO, optional) – CCS model file or stream. Nothing is loaded if the value is '' or None. Defaults to ''.
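The "nothing is loaded if the value is '' or None" rule amounts to skipping empty arguments. A minimal sketch, assuming hypothetical per-model loader callables (peptdeep's actual loading code is not shown on this page); each argument may be a path string or an io.BytesIO stream.

```python
import io
from typing import Callable, Dict, List, Optional, Union

ModelSource = Optional[Union[str, io.BytesIO]]


def load_external_models_sketch(
    loaders: Dict[str, Callable[[Union[str, io.BytesIO]], None]],
    *,
    ms2_model_file: ModelSource = "",
    rt_model_file: ModelSource = "",
    ccs_model_file: ModelSource = "",
) -> List[str]:
    """Call loaders[name](source) for every source that is not '' or None.

    Returns the names of the models that were loaded, in ms2/rt/ccs order.
    """
    sources = {"ms2": ms2_model_file, "rt": rt_model_file, "ccs": ccs_model_file}
    loaded = []
    for name, source in sources.items():
        if source is None or source == "":
            continue  # documented behavior: skip '' and None
        loaders[name](source)
        loaded.append(name)
    return loaded


# Hypothetical loaders that only record what they were asked to load.
calls = []
loaders = {name: (lambda src, name=name: calls.append((name, src))) for name in ("ms2", "rt", "ccs")}
print(load_external_models_sketch(loaders, rt_model_file="rt_model.pth", ccs_model_file=None))  # ['rt']
```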

load_installed_models(model_type: str = 'generic')

Load built-in MS2/CCS/RT models.

Parameters: model_type (str, optional) – Which set of installed MS2/RT/CCS models to load: 'generic', 'phospho', 'digly', or 'HLA'. Defaults to 'generic'.

predict_all(precursor_df: DataFrame, *, predict_items: list = ['rt', 'mobility', 'ms2'], frag_types: list = None, multiprocessing: bool = True, min_required_precursor_num_for_mp: int = 3000, process_num: int = 8, mp_batch_size: int = 100000) → Dict[str, DataFrame]

Predict all items defined by predict_items, which may include rt, mobility, fragment_mz and fragment_intensity.

Parameters:

precursor_df (pd.DataFrame) – Precursor dataframe containing the sequence, mods, mod_sites, charge … columns.
predict_items (list, optional) – Items ('rt', 'mobility', 'ms2') to predict. Defaults to ['rt', 'mobility', 'ms2'].
frag_types (list, optional) – Fragment types to predict. If None, they are determined by self.ms2_model.charged_frag_types and self.ms2_model.model._mask_modloss. Defaults to None.
multiprocessing (bool, optional) – Whether to use multiprocessing when a GPU is not available. Defaults to True.
process_num (int, optional) – Number of processes used for multiprocessing. Defaults to 8.
min_required_precursor_num_for_mp (int, optional) – Multiprocessing is not used when precursor_df has fewer precursors than this value. Defaults to 3000.
mp_batch_size (int, optional) – Batch size for splitting the data for multiprocessing. Defaults to 100000.

Returns:

A dict {'precursor_df': precursor_df}. If 'ms2' is in predict_items, it also contains {'fragment_mz_df': fragment_mz_df, 'fragment_intensity_df': fragment_intensity_df}.

Return type:

Dict[str, pd.DataFrame]
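Read together, the multiprocessing, min_required_precursor_num_for_mp and mp_batch_size parameters describe when predict_all switches to multiprocess prediction and how it splits the work. The sketch below is a hedged reading of those descriptions, not peptdeep's code: `should_use_mp` and `split_into_batches` are hypothetical, and GPU availability is passed in as a flag.

```python
from typing import List, Tuple


def should_use_mp(
    n_precursors: int,
    multiprocessing: bool = True,
    gpu_available: bool = False,
    min_required_precursor_num_for_mp: int = 3000,
) -> bool:
    """Use multiprocessing only if requested, without a GPU, and for large inputs."""
    return (
        multiprocessing
        and not gpu_available
        and n_precursors >= min_required_precursor_num_for_mp
    )


def split_into_batches(n_rows: int, mp_batch_size: int = 100000) -> List[Tuple[int, int]]:
    """Return [start, stop) row ranges of at most mp_batch_size rows each."""
    return [
        (start, min(start + mp_batch_size, n_rows))
        for start in range(0, n_rows, mp_batch_size)
    ]


print(should_use_mp(250000))                      # True
print(should_use_mp(250000, gpu_available=True))  # False
print(split_into_batches(250000))
# [(0, 100000), (100000, 200000), (200000, 250000)]
```

Each batch would then be handed to one of process_num worker processes.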

predict_all_mp(precursor_df: DataFrame, *, predict_items: list = ['rt', 'mobility', 'ms2'], frag_types: list = None, process_num: int = 8, mp_batch_size: int = 100000)

predict_mobility(precursor_df: DataFrame, *, batch_size: int = 1024) → DataFrame

Predict mobility (ccs_pred and mobility_pred) and write it into precursor_df in place.

Parameters:

precursor_df (pd.DataFrame) – Precursor_df for CCS/mobility prediction
batch_size (int, optional) – Batch size for prediction. Defaults to 1024.

Returns:

df with ‘ccs_pred’ and ‘mobility_pred’ columns.

Return type:

pd.DataFrame
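For context, CCS and ion mobility are commonly related through the Mason–Schamp equation. Whether predict_mobility derives mobility_pred from ccs_pred with exactly this form, and with which constants, is an assumption here rather than something this page states:

$$
\Omega = \frac{3}{16}\,\frac{z e}{N_0}\,\sqrt{\frac{2\pi}{\mu k_B T}}\;\frac{1}{K_0},
\qquad
\mu = \frac{m_{\text{ion}}\, m_{\text{gas}}}{m_{\text{ion}} + m_{\text{gas}}}
$$

Here Ω is the CCS, z the charge state, e the elementary charge, N_0 the drift-gas number density at standard conditions, μ the reduced mass of ion and drift gas, k_B the Boltzmann constant, T the temperature, and K_0 the reduced mobility (timsTOF data usually report its inverse, 1/K_0).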

predict_ms2(precursor_df: DataFrame, *, batch_size: int = 512, reference_frag_df: DataFrame = None) → DataFrame

Predict MS2 for the given precursor_df.

Parameters:

precursor_df (pd.DataFrame) – Precursor dataframe for MS2 prediction
batch_size (int, optional) – Batch size for prediction. Defaults to 512.
reference_frag_df (pd.DataFrame, optional) – Reference fragment dataframe, used if precursor_df already has 'frag_start_idx' pointing into it. Defaults to None.

Returns:

Predicted fragment intensity dataframe. If precursor_df does not have the 'frag_start_idx' and 'frag_stop_idx' columns, they are inserted into precursor_df, pointing to rows of this predicted fragment dataframe.

Return type:

pd.DataFrame
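The frag_start_idx/frag_stop_idx convention can be illustrated as follows. The sketch assumes, as in AlphaBase-style fragment tables, that each precursor owns a contiguous block of len(sequence) - 1 fragment rows (one per backbone cleavage site); `assign_frag_indices` is a hypothetical helper working on plain lists rather than a DataFrame.

```python
from typing import List, Tuple


def assign_frag_indices(sequences: List[str]) -> List[Tuple[int, int]]:
    """Give each precursor a [frag_start_idx, frag_stop_idx) row range.

    Assumes one fragment row per cleavage site (len(sequence) - 1 rows),
    with precursors laid out back to back in the fragment dataframe.
    """
    ranges = []
    start = 0
    for seq in sequences:
        stop = start + len(seq) - 1
        ranges.append((start, stop))
        start = stop
    return ranges


print(assign_frag_indices(["PEPTIDE", "ACDK"]))
# [(0, 6), (6, 9)]
```

Rows frag_start_idx:frag_stop_idx of the fragment table then hold the predictions for that precursor.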

predict_rt(precursor_df: DataFrame, *, batch_size: int = 1024) → DataFrame

Predict RT ('rt_pred') and write it into precursor_df in place.

Parameters:

precursor_df (pd.DataFrame) – precursor_df for RT prediction
batch_size (int, optional) – Batch size for prediction. Defaults to 1024.

Returns:

df with ‘rt_pred’ and ‘rt_norm_pred’ columns.

Return type:

pd.DataFrame

reset_by_global_settings(reload_models=True)

save_models(folder: str)

Save MS2/RT/CCS models into a folder.

Parameters: folder (str) – Folder to save the models into.

set_default_nce(df)

Alias for set_default_nce_instrument.

set_default_nce_instrument(df)

Append 'nce' and 'instrument' columns to df using self.nce and self.instrument.
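A minimal sketch of this behavior, using a column-oriented dict as a stand-in for the DataFrame. It assumes, in line with the nce and instrument attribute descriptions ("for a precursor_df without the 'nce' column"), that existing columns are left untouched and only missing ones are added; the helper name and the example values are hypothetical.

```python
from typing import Dict, List


def set_default_nce_instrument_sketch(
    df: Dict[str, List[object]], nce: float, instrument: str
) -> None:
    """Add 'nce'/'instrument' columns in place, only if they are missing.

    `df` is a column-oriented dict standing in for a pandas DataFrame.
    """
    n_rows = len(next(iter(df.values()), []))
    if "nce" not in df:
        df["nce"] = [float(nce)] * n_rows
    if "instrument" not in df:
        df["instrument"] = [instrument] * n_rows


df = {"sequence": ["PEPTIDE", "ACDK"], "nce": [25.0, 28.0]}
set_default_nce_instrument_sketch(df, nce=30, instrument="Lumos")
print(df["nce"], df["instrument"])  # [25.0, 28.0] ['Lumos', 'Lumos']
```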

train_ccs_model(psm_df: DataFrame)

Train/fine-tune the CCS model. The fine-tuning is skipped if self.psm_num_to_train_rt_ccs is zero.

Parameters: psm_df (pd.DataFrame) – Training psm_df containing a 'ccs' or 'mobility' column.

train_ms2_model(psm_df: DataFrame, matched_intensity_df: DataFrame)

Train/fine-tune the MS2 model using matched_intensity_df.

It samples n=self.psm_num_to_train_ms2 PSMs into a training dataframe (tr_df) for fine-tuning.
It also adds PSMs carrying some important PTMs (n=self.top_n_mods_to_train) to tr_df.
If self.use_grid_nce_search is True, this method calls self.ms2_model.grid_nce_search to find the best NCE and instrument.

Parameters:

psm_df (pd.DataFrame) – PSM dataframe for fine-tuning
matched_intensity_df (pd.DataFrame) – The matched fragment intensities for psm_df.
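The grid_nce_search step can be pictured as an exhaustive search over candidate (NCE, instrument) pairs. The sketch below is hypothetical: the candidate grid, the `score` callable (for instance, a median similarity between predicted and matched fragment intensities), and the function name are assumptions rather than peptdeep's implementation.

```python
from itertools import product
from typing import Callable, Iterable, Optional, Tuple


def grid_nce_search_sketch(
    nce_values: Iterable[float],
    instruments: Iterable[str],
    score: Callable[[float, str], float],
) -> Tuple[float, str]:
    """Return the (nce, instrument) pair with the highest score."""
    best_pair: Optional[Tuple[float, str]] = None
    best_score = float("-inf")
    for nce, instrument in product(nce_values, instruments):
        current = score(nce, instrument)
        if current > best_score:
            best_pair, best_score = (nce, instrument), current
    if best_pair is None:
        raise ValueError("the NCE/instrument grid is empty")
    return best_pair


# Toy score: peaks at NCE 28 and slightly prefers "Lumos".
def toy_score(nce: float, instrument: str) -> float:
    return -abs(nce - 28) + (0.5 if instrument == "Lumos" else 0.0)


print(grid_nce_search_sketch(range(20, 36, 2), ["QE", "Lumos"], toy_score))  # (28, 'Lumos')
```

The winning pair is what would then overwrite self.nce and self.instrument.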

train_rt_model(psm_df: DataFrame)

Train/fine-tune the RT model. The fine-tuning is skipped if self.psm_num_to_train_rt_ccs is zero.

Parameters: psm_df (pd.DataFrame) – Training psm_df containing an 'rt_norm' column.
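If a training table only has raw retention times, an 'rt_norm' column has to be derived first. The sketch below assumes simple min-max scaling to [0, 1]; that is one common choice, not necessarily how peptdeep or its PSM readers compute rt_norm, and `min_max_rt_norm` is a hypothetical helper.

```python
from typing import List


def min_max_rt_norm(rts: List[float]) -> List[float]:
    """Scale retention times linearly so the earliest maps to 0 and the latest to 1.

    Assumption: rt_norm is a min-max scaled RT; a constant input maps to all zeros.
    """
    lo, hi = min(rts), max(rts)
    if hi == lo:
        return [0.0 for _ in rts]
    return [(rt - lo) / (hi - lo) for rt in rts]


print(min_max_rt_norm([10.0, 25.0, 40.0]))  # [0.0, 0.5, 1.0]
```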

peptdeep.pretrained_models.clear_error_modloss_intensities(fragment_mz_df, fragment_intensity_df)

peptdeep.pretrained_models.count_mods(psm_df) → DataFrame
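count_mods has no description on this page. Assuming it tallies how often each modification occurs across psm_df, and that the 'mods' column holds ';'-separated modification names as in AlphaBase (e.g. 'Oxidation@M;Phospho@S'), the idea can be sketched with collections.Counter. The function below is hypothetical and returns a Counter instead of a DataFrame.

```python
from collections import Counter
from typing import Iterable


def count_mods_sketch(mods_column: Iterable[str]) -> Counter:
    """Count modification occurrences in ';'-separated mod strings."""
    counts: Counter = Counter()
    for mods in mods_column:
        if mods:  # unmodified peptides carry an empty string
            counts.update(mods.split(";"))
    return counts


print(count_mods_sketch(["", "Oxidation@M", "Phospho@S;Oxidation@M"]).most_common())
# [('Oxidation@M', 2), ('Phospho@S', 1)]
```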

peptdeep.pretrained_models.download_models(url: str = 'https://github.com/MannLabs/alphapeptdeep/releases/download/pre-trained-models/pretrained_models.zip', overwrite=True)

Parameters:

url (str, optional) – Remote or local path. Defaults to peptdeep.pretrained_models.model_url
overwrite (bool, optional) – Overwrite old model files. Defaults to True.

Raises:

FileNotFoundError – If remote url is not accessible.
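Since url may be either a remote URL or a local path, the retrieval step can be sketched with the standard library. `fetch_model_zip` is a hypothetical helper, not peptdeep's implementation: it copies a local file, downloads http(s) URLs with urllib, and raises FileNotFoundError otherwise, echoing the documented Raises entry.

```python
import os
import shutil
import urllib.error
import urllib.request


def fetch_model_zip(url: str, dest: str, overwrite: bool = True) -> str:
    """Hypothetical helper: copy or download the model zip at `url` to `dest`.

    Local files are copied, http(s) URLs are downloaded, and anything that
    cannot be retrieved raises FileNotFoundError.
    """
    if os.path.exists(dest) and not overwrite:
        return dest  # keep the existing model files untouched
    if os.path.isfile(url):
        shutil.copyfile(url, dest)
    elif url.startswith(("http://", "https://")):
        try:
            urllib.request.urlretrieve(url, dest)
        except urllib.error.URLError as err:
            raise FileNotFoundError(f"Cannot download models from {url}") from err
    else:
        raise FileNotFoundError(f"{url} is neither a local file nor an http(s) URL")
    return dest
```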

peptdeep.pretrained_models.is_model_zip(downloaded_zip)

peptdeep.pretrained_models.load_models(mask_modloss=True)

peptdeep.pretrained_models.load_models_by_model_type_in_zip(model_type_in_zip: str, mask_modloss=True)

peptdeep.pretrained_models.load_phos_models(mask_modloss=True)

peptdeep.pretrained_models.psm_sampling_with_important_mods(psm_df, n_sample, top_n_mods=10, n_sample_each_mod=0, uniform_sampling_column=None, random_state=1337)
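psm_sampling_with_important_mods has no docstring on this page, but its signature and the train_ms2_model description suggest the idea: draw n_sample PSMs, then add PSMs carrying the top_n_mods most frequent modifications so that important PTMs are represented. The sketch below is a hypothetical reading of that idea on plain lists. It ignores uniform_sampling_column, assumes ';'-separated mod strings, and treats n_sample_each_mod=0 as "add nothing extra" (the real function may handle 0 differently).

```python
import random
from collections import Counter
from typing import List


def sample_with_important_mods(
    mods_per_psm: List[str],
    n_sample: int,
    top_n_mods: int = 10,
    n_sample_each_mod: int = 0,
    random_state: int = 1337,
) -> List[int]:
    """Return sorted row indices of the sampled PSMs.

    mods_per_psm[i] holds the ';'-separated modifications of PSM i
    ('' for unmodified PSMs).
    """
    rng = random.Random(random_state)
    n_rows = len(mods_per_psm)
    # Uniform sample over the whole table.
    chosen = set(rng.sample(range(n_rows), min(n_sample, n_rows)))

    # Rank modifications by how many PSMs carry them.
    counts = Counter(
        mod for mods in mods_per_psm if mods for mod in set(mods.split(";"))
    )
    # Add up to n_sample_each_mod carriers of each top modification.
    for mod, _ in counts.most_common(top_n_mods):
        carriers = [i for i, mods in enumerate(mods_per_psm) if mod in mods.split(";")]
        chosen.update(rng.sample(carriers, min(n_sample_each_mod, len(carriers))))
    return sorted(chosen)


psms = ["", "", "Phospho@S", "", "Oxidation@M", "Phospho@S;Oxidation@M", ""]
print(sample_with_important_mods(psms, n_sample=2, top_n_mods=1, n_sample_each_mod=2))
```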