peptdeep.pretrained_models

The main entry of pretrained MS2/RT/CCS models.

Classes:

ModelManager([mask_modloss, device])

The manager class to access MS2/RT/CCS models.

Functions:

clear_error_modloss_intensities(...)

count_mods(psm_df)

download_models([url, overwrite])

Download the pretrained models from a remote or local path.

is_model_zip(downloaded_zip)

load_models([mask_modloss])

load_models_by_model_type_in_zip(...[, ...])

load_phos_models([mask_modloss])

psm_sampling_with_important_mods(psm_df, ...)

class peptdeep.pretrained_models.ModelManager(mask_modloss: bool = False, device: str = 'gpu')

Bases: object

The manager class to access MS2/RT/CCS models.

ms2_model

The MS2 prediction model.

Type:

peptdeep.model.ms2.pDeepModel

rt_model

The RT prediction model.

Type:

peptdeep.model.rt.AlphaRTModel

ccs_model

The CCS prediction model.

Type:

peptdeep.model.ccs.AlphaCCSModel

psm_num_to_train_ms2

Number of PSMs to train the MS2 model. Defaults to global_settings[‘model_mgr’][‘transfer’][‘psm_num_to_train_ms2’].

Type:

int

epoch_to_train_ms2

Number of epochs to train the MS2 model. Defaults to global_settings[‘model_mgr’][‘transfer’][‘epoch_ms2’].

Type:

int

psm_num_to_train_rt_ccs

Number of PSMs to train the RT/CCS models. Defaults to global_settings[‘model_mgr’][‘transfer’][‘psm_num_to_train_rt_ccs’].

Type:

int

epoch_to_train_rt_ccs

Number of epochs to train the RT/CCS models. Defaults to global_settings[‘model_mgr’][‘transfer’][‘epoch_rt_ccs’].

Type:

int

nce

Default NCE value for a precursor_df without the ‘nce’ column. Defaults to global_settings[‘model_mgr’][‘default_nce’].

Type:

float

instrument

Default instrument type for a precursor_df without the ‘instrument’ column. Defaults to global_settings[‘model_mgr’][‘default_instrument’].

Type:

str

use_grid_nce_search

Whether self.ms2_model uses peptdeep.model.ms2.pDeepModel.grid_nce_search() to determine the optimal NCE and instrument type. Enabling it changes the values of self.nce and self.instrument. Defaults to global_settings[‘model_mgr’][‘transfer’][‘grid_nce_search’].

Type:

bool

Methods:

__init__([mask_modloss, device])

Initialize the manager. mask_modloss controls whether modloss ions are masked to zeros in the MS2 model; device selects the device for the DL models.

load_external_models(*[, ms2_model_file, ...])

Load external MS2/RT/CCS models.

load_installed_models([model_type])

Load built-in MS2/CCS/RT models.

predict_all(precursor_df, *[, ...])

Predict all items defined by predict_items, which may include rt, mobility, fragment_mz and fragment_intensity.

predict_all_mp(precursor_df, *[, ...])

predict_mobility(precursor_df, *[, batch_size])

Predict mobility (ccs_pred and mobility_pred) inplace into precursor_df.

predict_ms2(precursor_df, *[, batch_size, ...])

Predict MS2 for the given precursor_df.

predict_rt(precursor_df, *[, batch_size])

Predict RT ('rt_pred') inplace into precursor_df.

reset_by_global_settings([reload_models])

save_models(folder)

Save MS2/RT/CCS models into a folder.

set_default_nce(df)

Alias for set_default_nce_instrument.

set_default_nce_instrument(df)

Append 'nce' and 'instrument' columns into df with self.nce and self.instrument

train_ccs_model(psm_df)

Train/fine-tune the CCS model.

train_ms2_model(psm_df, matched_intensity_df)

Train/fine-tune the MS2 model using matched_intensity_df.

train_rt_model(psm_df)

Train/fine-tune the RT model.


__init__(mask_modloss: bool = False, device: str = 'gpu')
Parameters:
  • mask_modloss (bool, optional) – If modloss ions are masked to zeros in the MS2 model. Modloss ions are mostly useful for the phospho MS2 prediction model. Defaults to False.

  • device (str, optional) – Device for DL models, could be ‘gpu’ (‘cuda’) or ‘cpu’. If device==‘gpu’ but no GPUs are detected, it will automatically switch to ‘cpu’. Defaults to ‘gpu’.
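The device fallback described above can be sketched as a small helper. This is a hypothetical illustration of the documented rule, not peptdeep's own code: resolve_device and its gpu_available flag are assumptions (in a real setup the flag would typically come from torch.cuda.is_available()).

```python
def resolve_device(device: str, gpu_available: bool) -> str:
    """Map a requested device to the one actually used.

    Hypothetical helper: mirrors the documented rule that 'gpu' (or its
    alias 'cuda') falls back to 'cpu' when no GPU is detected.
    """
    requested = device.lower()
    if requested in ("gpu", "cuda"):
        # Only use CUDA if a GPU was actually detected.
        return "cuda" if gpu_available else "cpu"
    # Any other value (e.g. 'cpu') is passed through unchanged.
    return requested


print(resolve_device("gpu", gpu_available=False))  # cpu
```

Because the fallback happens automatically, keeping the default device='gpu' is usually safe even on CPU-only machines.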

property instrument

load_external_models(*, ms2_model_file: str | BytesIO = '', rt_model_file: str | BytesIO = '', ccs_model_file: str | BytesIO = '')

Load external MS2/RT/CCS models.

Parameters:
  • ms2_model_file (str | io.BytesIO, optional) – MS2 model file or stream. Nothing is loaded if the value is ‘’ or None. Defaults to ‘’.

  • rt_model_file (str | io.BytesIO, optional) – RT model file or stream. Nothing is loaded if the value is ‘’ or None. Defaults to ‘’.

  • ccs_model_file (str | io.BytesIO, optional) – CCS model file or stream. Nothing is loaded if the value is ‘’ or None. Defaults to ‘’.
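Since each *_model_file argument accepts either a path or an io.BytesIO stream, a model held in memory (for example, one received over a network) can be loaded without writing it back to disk. The helper below is a hypothetical sketch: load_ms2_from_file_bytes is not part of peptdeep, and model_mgr is assumed to be a ModelManager instance; only the documented ms2_model_file keyword is used.

```python
import io
from pathlib import Path


def load_ms2_from_file_bytes(model_mgr, model_path: str) -> io.BytesIO:
    """Read an MS2 model file into memory and pass it to the manager as a stream.

    Hypothetical helper; model_mgr is assumed to expose load_external_models.
    """
    stream = io.BytesIO(Path(model_path).read_bytes())
    # rt_model_file and ccs_model_file keep their '' defaults,
    # so the RT and CCS models are left untouched.
    model_mgr.load_external_models(ms2_model_file=stream)
    return stream
```

When the model already lives on disk, passing the path string directly is the simpler choice.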

load_installed_models(model_type: str = 'generic')

Load built-in MS2/CCS/RT models.

Parameters:

model_type (str, optional) – Which set of installed MS2/RT/CCS models to load: ‘digly’, ‘phospho’, ‘HLA’, or ‘generic’. Defaults to ‘generic’.

predict_all(precursor_df: DataFrame, *, predict_items: list = ['rt', 'mobility', 'ms2'], frag_types: list = None, multiprocessing: bool = True, min_required_precursor_num_for_mp: int = 3000, process_num: int = 8, mp_batch_size: int = 100000) → Dict[str, DataFrame]

Predict all items defined by predict_items, which may include rt, mobility, fragment_mz and fragment_intensity.

Parameters:
  • precursor_df (pd.DataFrame) – Precursor dataframe containing sequence, mods, mod_sites, charge … columns.

  • predict_items (list, optional) – Items (‘rt’, ‘mobility’, ‘ms2’) to predict. Defaults to [‘rt’, ‘mobility’, ‘ms2’].

  • frag_types (list, optional) – Fragment types to predict. If None, they are determined by self.ms2_model.charged_frag_types and self.ms2_model.model._mask_modloss. Defaults to None.

  • multiprocessing (bool, optional) – Whether to use multiprocessing if a GPU is not available. Defaults to True.

  • process_num (int, optional) – Number of processes used for multiprocessing. Defaults to 8.

  • min_required_precursor_num_for_mp (int, optional) – Multiprocessing is not used when the number of precursors in precursor_df is lower than this value. Defaults to 3000.

  • mp_batch_size (int, optional) – Batch size used to split the data for multiprocessing. Defaults to 100000.
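The way the multiprocessing parameters interact can be sketched as follows. This is a hypothetical planning helper written from the parameter descriptions above, not peptdeep's internal code: multiprocessing applies only without a GPU and only when there are at least min_required_precursor_num_for_mp precursors, and the rows are then cut into batches of mp_batch_size.

```python
from typing import List, Tuple


def plan_prediction_batches(
    n_precursors: int,
    *,
    gpu_available: bool,
    multiprocessing: bool = True,
    min_required_precursor_num_for_mp: int = 3000,
    mp_batch_size: int = 100000,
) -> Tuple[bool, List[Tuple[int, int]]]:
    """Return (use_mp, [(start, stop), ...]) row ranges.

    Hypothetical sketch of the documented parameter semantics.
    """
    use_mp = (
        multiprocessing
        and not gpu_available
        and n_precursors >= min_required_precursor_num_for_mp
    )
    if not use_mp:
        # A single range covering every precursor.
        return False, [(0, n_precursors)]
    # Contiguous batches of at most mp_batch_size rows each.
    batches = [
        (start, min(start + mp_batch_size, n_precursors))
        for start in range(0, n_precursors, mp_batch_size)
    ]
    return True, batches


print(plan_prediction_batches(250000, gpu_available=False))
# (True, [(0, 100000), (100000, 200000), (200000, 250000)])
```

In this sketch the resulting batches would then be distributed over process_num worker processes.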

Returns:

{‘precursor_df’: precursor_df}; if ‘ms2’ is in predict_items, the dict also contains {‘fragment_mz_df’: fragment_mz_df, ‘fragment_intensity_df’: fragment_intensity_df}.

Return type:

Dict[str, pd.DataFrame]
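A typical end-to-end call loads the installed models and then runs predict_all. The sketch below uses only the documented ModelManager, load_installed_models and predict_all calls; the example peptides and the mods/mod_sites string format (‘;’-separated "Name@Site" entries, positions counted from 1 with 0 for the N-terminus) are assumptions for illustration. peptdeep is imported inside the function so the input-building part runs without it.

```python
import pandas as pd


def make_precursor_df() -> pd.DataFrame:
    """Build a tiny precursor_df with sequence, mods, mod_sites and charge.

    The mod strings and site numbering are assumed conventions.
    """
    return pd.DataFrame(
        {
            "sequence": ["AGHCEWQMK", "PEPTIDEK"],
            "mods": ["Carbamidomethyl@C;Oxidation@M", ""],
            "mod_sites": ["4;8", ""],
            "charge": [2, 3],
        }
    )


def predict_everything(precursor_df: pd.DataFrame) -> dict:
    # Lazy import: requires peptdeep and its pretrained models to be installed.
    from peptdeep.pretrained_models import ModelManager

    model_mgr = ModelManager(mask_modloss=False, device="gpu")  # CPU fallback if no GPU
    model_mgr.load_installed_models(model_type="generic")
    return model_mgr.predict_all(
        precursor_df, predict_items=["rt", "mobility", "ms2"]
    )
```

After predict_everything(make_precursor_df()), result["precursor_df"] carries the RT and mobility predictions, while result["fragment_mz_df"] and result["fragment_intensity_df"] hold the MS2 predictions.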

predict_all_mp(precursor_df: DataFrame, *, predict_items: list = ['rt', 'mobility', 'ms2'], frag_types: list = None, process_num: int = 8, mp_batch_size: int = 100000)

predict_mobility(precursor_df: DataFrame, *, batch_size: int = 1024) → DataFrame

Predict mobility (ccs_pred and mobility_pred) inplace into precursor_df.

Parameters:
  • precursor_df (pd.DataFrame) – Precursor_df for CCS/mobility prediction

  • batch_size (int, optional) – Batch size for prediction. Defaults to 1024.

Returns:

df with ‘ccs_pred’ and ‘mobility_pred’ columns.

Return type:

pd.DataFrame

predict_ms2(precursor_df: DataFrame, *, batch_size: int = 512, reference_frag_df: DataFrame = None) → DataFrame

Predict MS2 for the given precursor_df.

Parameters:
  • precursor_df (pd.DataFrame) – Precursor dataframe for MS2 prediction

  • batch_size (int, optional) – Batch size for prediction. Defaults to 512.

  • reference_frag_df (pd.DataFrame, optional) – Reference fragment dataframe, used when precursor_df already has ‘frag_start_idx’ pointing into it. Defaults to None.

Returns:

Predicted fragment intensity dataframe. If precursor_df does not have the ‘frag_start_idx’ and ‘frag_stop_idx’ columns, they are inserted into precursor_df, pointing to this predicted fragment dataframe.

Return type:

pd.DataFrame
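The ‘frag_start_idx’/‘frag_stop_idx’ pair describes a contiguous row block of the fragment dataframe for each precursor. The sketch below assumes one fragment row per backbone cleavage site (nAA - 1 rows per precursor, stored back to back) and uses illustrative column names; add_frag_indices and fragments_of are hypothetical helpers, not peptdeep functions.

```python
import numpy as np
import pandas as pd


def add_frag_indices(precursor_df: pd.DataFrame) -> pd.DataFrame:
    """Assign contiguous fragment row ranges, one row per cleavage site.

    Assumption: each precursor gets len(sequence) - 1 fragment rows.
    """
    n_rows = precursor_df["sequence"].str.len().to_numpy() - 1
    stops = np.cumsum(n_rows)
    precursor_df["frag_start_idx"] = stops - n_rows
    precursor_df["frag_stop_idx"] = stops
    return precursor_df


def fragments_of(precursor_df: pd.DataFrame, fragment_df: pd.DataFrame, i: int) -> pd.DataFrame:
    """Return the fragment rows that belong to the i-th precursor (by position)."""
    start = int(precursor_df["frag_start_idx"].iat[i])
    stop = int(precursor_df["frag_stop_idx"].iat[i])
    return fragment_df.iloc[start:stop]


precursors = add_frag_indices(pd.DataFrame({"sequence": ["PEPTIDEK", "AGHK"]}))
# 7 + 3 = 10 fragment rows in total; the column names are illustrative only.
frag_df = pd.DataFrame({"b_z1": np.zeros(10), "y_z1": np.zeros(10)})
print(precursors[["frag_start_idx", "frag_stop_idx"]].values.tolist())  # [[0, 7], [7, 10]]
print(len(fragments_of(precursors, frag_df, 1)))  # 3
```

Storing fragments as one flat dataframe with index ranges avoids a nested structure per precursor and keeps slicing cheap.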

predict_rt(precursor_df: DataFrame, *, batch_size: int = 1024) → DataFrame

Predict RT (‘rt_pred’) inplace into precursor_df.

Parameters:
  • precursor_df (pd.DataFrame) – precursor_df for RT prediction

  • batch_size (int, optional) – Batch size for prediction. Defaults to 1024.

Returns:

df with ‘rt_pred’ and ‘rt_norm_pred’ columns.

Return type:

pd.DataFrame

reset_by_global_settings(reload_models=True)

save_models(folder: str)

Save MS2/RT/CCS models into a folder.

Parameters:

folder (str) – Folder to save the models into.

set_default_nce(df)

Alias for set_default_nce_instrument.

set_default_nce_instrument(df)

Append ‘nce’ and ‘instrument’ columns to df, filled with self.nce and self.instrument.
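Filling missing ‘nce’ and ‘instrument’ columns with manager-wide defaults can be sketched in pandas as below. This is a hypothetical re-implementation for illustration: the function name is invented, and leaving already-present columns untouched is an assumption about the real method's behavior. The values 30 and "Lumos" are example inputs only.

```python
import pandas as pd


def set_default_nce_instrument_sketch(
    df: pd.DataFrame, nce: float, instrument: str
) -> pd.DataFrame:
    """Add 'nce'/'instrument' columns in place when they are missing.

    Hypothetical sketch; existing columns are kept as they are (assumption).
    """
    if "nce" not in df.columns:
        df["nce"] = float(nce)
    if "instrument" not in df.columns:
        df["instrument"] = instrument
    return df


df = pd.DataFrame({"sequence": ["PEPTIDEK"], "charge": [2]})
set_default_nce_instrument_sketch(df, nce=30, instrument="Lumos")
print(df[["nce", "instrument"]].values.tolist())  # [[30.0, 'Lumos']]
```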

train_ccs_model(psm_df: DataFrame)

Train/fine-tune the CCS model. The fine-tuning will be skipped if self.psm_num_to_train_rt_ccs is zero.

Parameters:

psm_df (pd.DataFrame) – Training psm_df containing a ‘ccs’ or ‘mobility’ column.

train_ms2_model(psm_df: DataFrame, matched_intensity_df: DataFrame)

Train/fine-tune the MS2 model using matched_intensity_df.

  1. It samples n=self.psm_num_to_train_ms2 PSMs into a training dataframe (tr_df) for fine-tuning.

  2. It also adds PSMs carrying important PTMs (n=self.top_n_mods_to_train) into tr_df for fine-tuning.

  3. If self.use_grid_nce_search==True, it calls self.ms2_model.grid_nce_search to find the best NCE and instrument.

Parameters:
  • psm_df (pd.DataFrame) – PSM dataframe for fine-tuning

  • matched_intensity_df (pd.DataFrame) – The matched fragment intensities for psm_df.
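Steps 1 and 2 above can be illustrated with a sampling sketch: draw a random sample of PSMs, then add a few PSMs for each of the most frequent modifications so that they are represented in the training set. This is a hypothetical function in the spirit of psm_sampling_with_important_mods, not its actual implementation; the ‘;’-separated ‘mods’ format and the removal of rows picked twice are choices of this sketch.

```python
import pandas as pd


def sample_with_important_mods(
    psm_df: pd.DataFrame,
    n_sample: int,
    top_n_mods: int = 10,
    n_sample_each_mod: int = 0,
    random_state: int = 1337,
) -> pd.DataFrame:
    """Random PSM sample plus up to n_sample_each_mod PSMs per frequent mod.

    Hypothetical sketch; 'mods' holds ';'-separated names, '' = unmodified.
    """
    sampled = psm_df.sample(n=min(n_sample, len(psm_df)), random_state=random_state)
    if n_sample_each_mod <= 0:
        return sampled
    mod_lists = psm_df["mods"].str.split(";")
    # Count how often each modification occurs, ignoring unmodified PSMs.
    mod_counts = mod_lists.explode().loc[lambda s: s != ""].value_counts()
    extra = []
    for mod in mod_counts.index[:top_n_mods]:
        has_mod = mod_lists.apply(lambda ms, m=mod: m in ms)
        mod_psms = psm_df[has_mod]
        extra.append(
            mod_psms.sample(n=min(n_sample_each_mod, len(mod_psms)), random_state=random_state)
        )
    combined = pd.concat([sampled, *extra])
    # Keep a PSM only once if it was picked both randomly and for a mod.
    return combined[~combined.index.duplicated()]
```

Guaranteeing a few examples per frequent modification keeps rare-but-important PTMs from being missed by a purely random sample of a large PSM table.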

train_rt_model(psm_df: DataFrame)

Train/fine-tune the RT model. The fine-tuning will be skipped if self.psm_num_to_train_rt_ccs is zero.

Parameters:

psm_df (pd.DataFrame) – Training psm_df containing an ‘rt_norm’ column.
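If a PSM table has raw retention times but no ‘rt_norm’ column, one possible preparation step is to scale RT to [0, 1] within each run. Both the min-max scaling and the ‘rt’ and ‘raw_name’ column names are assumptions of this sketch; the normalization used to build peptdeep's own training data may differ.

```python
import pandas as pd


def add_rt_norm(
    psm_df: pd.DataFrame, rt_col: str = "rt", group_col: str = "raw_name"
) -> pd.DataFrame:
    """Add an 'rt_norm' column scaled to [0, 1] within each raw file.

    Hypothetical helper: per-run min-max scaling; runs with a single
    distinct RT value get 0.0.
    """
    grouped = psm_df.groupby(group_col)[rt_col]
    rt_min = grouped.transform("min")
    rt_range = grouped.transform("max") - rt_min
    # Avoid division by zero when every PSM in a run has the same RT.
    psm_df["rt_norm"] = (psm_df[rt_col] - rt_min) / rt_range.where(rt_range > 0, 1.0)
    return psm_df


df = add_rt_norm(pd.DataFrame({"raw_name": ["a", "a", "a", "b"], "rt": [10.0, 20.0, 30.0, 5.0]}))
print(df["rt_norm"].tolist())  # [0.0, 0.5, 1.0, 0.0]
```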

peptdeep.pretrained_models.clear_error_modloss_intensities(fragment_mz_df, fragment_intensity_df)
peptdeep.pretrained_models.count_mods(psm_df) → DataFrame
peptdeep.pretrained_models.download_models(url: str = 'https://github.com/MannLabs/alphapeptdeep/releases/download/pre-trained-models/pretrained_models.zip', overwrite=True)
Parameters:
  • url (str, optional) – Remote or local path. Defaults to peptdeep.pretrained_models.model_url

  • overwrite (bool, optional) – Overwrite old model files. Defaults to True.

Raises:

FileNotFoundError – If remote url is not accessible.

peptdeep.pretrained_models.is_model_zip(downloaded_zip)
peptdeep.pretrained_models.load_models(mask_modloss=True)
peptdeep.pretrained_models.load_models_by_model_type_in_zip(model_type_in_zip: str, mask_modloss=True)
peptdeep.pretrained_models.load_phos_models(mask_modloss=True)
peptdeep.pretrained_models.psm_sampling_with_important_mods(psm_df, n_sample, top_n_mods=10, n_sample_each_mod=0, uniform_sampling_column=None, random_state=1337)