peptdeep.pretrained_models

The main entry of pretrained MS2/RT/CCS models.

Classes:

ModelManager([mask_modloss, device])

The manager class to access MS2/RT/CCS models.

Functions:

clear_error_modloss_intensities(...)

count_mods(psm_df)

download_models([url, target_path, overwrite])

get_local_model_zip_name()

Get the local model zip file name dynamically from settings.

get_model_download_instructions()

Get the model download instructions dynamically from settings.

get_model_url()

Get the model URL dynamically from settings.

get_model_zip_file_path()

Get the full path to the model zip file dynamically from settings.

get_pretrain_dir()

Get the pretrained models directory path dynamically from settings.

is_model_zip(downloaded_zip)

load_models([mask_modloss])

load_models_by_model_type_in_zip(...[, ...])

load_phos_models([mask_modloss])

psm_sampling_with_important_mods(psm_df, ...)

class peptdeep.pretrained_models.ModelManager(mask_modloss: bool = False, device: str = 'gpu')

Bases: object

The manager class to access MS2/RT/CCS models.

ms2_model

The MS2 prediction model.

Type:

peptdeep.model.ms2.pDeepModel

rt_model

The RT prediction model.

Type:

peptdeep.model.rt.AlphaRTModel

ccs_model

The CCS prediction model.

Type:

peptdeep.model.ccs.AlphaCCSModel

psm_num_to_train_ms2

Number of PSMs to train the MS2 model. Defaults to global_settings[‘model_mgr’][‘transfer’][‘psm_num_to_train_ms2’].

Type:

int

epoch_to_train_ms2

Number of epochs to train the MS2 model. Defaults to global_settings[‘model_mgr’][‘transfer’][‘epoch_ms2’].

Type:

int

psm_num_to_train_rt_ccs

Number of PSMs to train RT/CCS model. Defaults to global_settings[‘model_mgr’][‘transfer’][‘psm_num_to_train_rt_ccs’].

Type:

int

epoch_to_train_rt_ccs

Number of epochs to train the RT/CCS model. Defaults to global_settings[‘model_mgr’][‘transfer’][‘epoch_rt_ccs’].

Type:

int

nce

Default NCE value for a precursor_df without the ‘nce’ column. Defaults to global_settings[‘model_mgr’][‘default_nce’].

Type:

float

instrument

Default instrument type for a precursor_df without the ‘instrument’ column. Defaults to global_settings[‘model_mgr’][‘default_instrument’].

Type:

str

use_grid_nce_search

Whether self.ms2_model uses peptdeep.model.ms2.pDeepModel.grid_nce_search() to determine the optimal NCE and instrument type. This will change the self.nce and self.instrument values. Defaults to global_settings[‘model_mgr’][‘transfer’][‘grid_nce_search’].

Type:

bool

Methods:

__init__([mask_modloss, device])

load_external_models(*[, ms2_model_file, ...])

Load external MS2/RT/CCS models.

load_installed_models([model_type])

Load built-in MS2/CCS/RT models.

predict_all(precursor_df, *[, ...])

Predict all items defined by predict_items, which may include rt, mobility, fragment_mz and fragment_intensity.

predict_all_mp(precursor_df, *[, ...])

predict_charge(psm_df, min_precursor_charge, ...)

Predict charge states for a given PSM dataframe by predicting the probabilities of each charge state, and including precursors with charge probabilities above the cutoff.

predict_mobility(precursor_df, *[, batch_size])

Predict mobility (ccs_pred and mobility_pred) inplace into precursor_df.

predict_ms2(precursor_df, *[, batch_size, ...])

Predict MS2 for the given precursor_df

predict_rt(precursor_df, *[, batch_size])

Predict RT ('rt_pred') inplace into precursor_df.

reinitialize_ms2_model(charged_frag_types, ...)

Reinitialize the MS2 model with new charged fragment types.

reset_by_global_settings([reload_models])

save_models(folder)

Save MS2/RT/CCS models into a folder

set_default_nce(df)

Alias for set_default_nce_instrument

set_default_nce_instrument(df)

Append 'nce' and 'instrument' columns into df with self.nce and self.instrument

train_ccs_model(psm_df)

Train/fine-tune the CCS model.

train_charge_model(psm_df)

Train/fine-tune the charge model.

train_ms2_model(psm_df, matched_intensity_df)

Using matched_intensity_df to train/fine-tune the ms2 model.

train_rt_model(psm_df)

Train/fine-tune the RT model.

Attributes:

__init__(mask_modloss: bool = False, device: str = 'gpu')
Parameters:
  • mask_modloss (bool, optional) – Whether modloss ions are masked to zeros in the MS2 model. Modloss ions are mostly useful for the phospho MS2 prediction model. Defaults to False.

  • device (str, optional) – Device for the DL models, either ‘gpu’ (‘cuda’) or ‘cpu’. If device == ‘gpu’ but no GPUs are detected, it automatically switches to ‘cpu’. Defaults to ‘gpu’.
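
The device fallback described above can be illustrated with a minimal sketch. The helper `resolve_device` and its `gpu_available` flag are hypothetical names used only for illustration; they are not part of peptdeep, and the library's own detection logic may differ.

```python
def resolve_device(requested: str, gpu_available: bool) -> str:
    """Hypothetical helper: pick a device following the documented fallback rule."""
    # 'gpu' and 'cuda' are treated as the same request, as in the parameter description.
    if requested.lower() in ("gpu", "cuda"):
        # Fall back to the CPU when no GPU is detected.
        return "cuda" if gpu_available else "cpu"
    # Any other value is treated as a CPU request in this sketch (an assumption).
    return "cpu"


print(resolve_device("gpu", gpu_available=False))  # cpu
print(resolve_device("gpu", gpu_available=True))   # cuda
```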

property instrument
load_external_models(*, ms2_model_file: str | BytesIO = '', rt_model_file: str | BytesIO = '', ccs_model_file: str | BytesIO = '', charge_model_file: str | BytesIO = '')

Load external MS2/RT/CCS models.

Parameters:
  • ms2_model_file (str | io.BytesIO, optional) – MS2 model file or stream. Does nothing if the value is ‘’ or None. Defaults to ‘’.

  • rt_model_file (str | io.BytesIO, optional) – RT model file or stream. Does nothing if the value is ‘’ or None. Defaults to ‘’.

  • ccs_model_file (str | io.BytesIO, optional) – CCS model file or stream. Does nothing if the value is ‘’ or None. Defaults to ‘’.

  • charge_model_file (str | io.BytesIO, optional) – Charge model file or stream. Does nothing if the value is ‘’ or None. Defaults to ‘’.

load_installed_models(model_type: str = 'generic')

Load built-in MS2/CCS/RT models.

Parameters:

model_type (str, optional) – To load the installed MS2/RT/CCS models or phos MS2/RT/CCS models. It could be ‘digly’, ‘phospho’, ‘HLA’, or ‘generic’. Defaults to ‘generic’.
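
As a rough usage sketch, loading the built-in models is typically followed by a prediction call. The wrapper `predict_library` below is a hypothetical helper, not part of peptdeep; it only calls `load_installed_models` and `predict_all` with the keyword arguments documented on this page, and it accepts any object that provides those two methods.

```python
def predict_library(model_mgr, precursor_df, model_type="generic"):
    """Hypothetical wrapper: load built-in models, then predict RT, mobility and MS2.

    `model_mgr` is expected to behave like a ModelManager instance.
    """
    model_mgr.load_installed_models(model_type)
    result = model_mgr.predict_all(
        precursor_df,
        predict_items=["rt", "mobility", "ms2"],
        multiprocessing=False,  # keep the sketch single-process
    )
    # The fragment dataframes are only present when 'ms2' is in predict_items.
    return (
        result["precursor_df"],
        result.get("fragment_mz_df"),
        result.get("fragment_intensity_df"),
    )


# Usage, assuming peptdeep is installed and the pretrained models are available
# (`my_precursor_df` is a placeholder for your own precursor dataframe):
#   from peptdeep.pretrained_models import ModelManager
#   precursor_df, mz_df, intensity_df = predict_library(
#       ModelManager(device="cpu"), my_precursor_df
#   )
```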

predict_all(precursor_df: DataFrame, *, predict_items: list = ['rt', 'mobility', 'ms2'], frag_types: list = None, multiprocessing: bool = True, min_required_precursor_num_for_mp: int = 3000, process_num: int = 8, mp_batch_size: int = 100000) -> Dict[str, DataFrame]

Predict all items defined by predict_items, which may include rt, mobility, fragment_mz and fragment_intensity.

Parameters:
  • precursor_df (pd.DataFrame) – Precursor dataframe contains sequence, mods, mod_sites, charge … columns.

  • predict_items (list, optional) – Items (‘rt’, ‘mobility’, ‘ms2’) to predict. Defaults to [‘rt’, ‘mobility’, ‘ms2’].

  • frag_types (list, optional) – Fragment types to predict. If it is None, it then depends on self.ms2_model.charged_frag_types and self.ms2_model.model._mask_modloss. Defaults to None.

  • multiprocessing (bool, optional) – Whether to use multiprocessing if a GPU is not available. Defaults to True.

  • process_num (int, optional) – Number of processes for multiprocessing. Defaults to 8.

  • min_required_precursor_num_for_mp (int, optional) – It will not use multiprocessing when the number of precursors in precursor_df is lower than this value. Defaults to 3000.

  • mp_batch_size (int, optional) – Splitting data into batches for multiprocessing. Defaults to 100000.

Returns:

A dictionary {‘precursor_df’: precursor_df}. If ‘ms2’ is in predict_items, it also contains {‘fragment_mz_df’: fragment_mz_df, ‘fragment_intensity_df’: fragment_intensity_df}.

Return type:

Dict[str, pd.DataFrame]
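
The interplay of multiprocessing, min_required_precursor_num_for_mp and mp_batch_size can be sketched as follows. Both helpers are hypothetical and only restate the parameter descriptions above; they are not peptdeep's implementation, and `gpu_available` is an assumed input.

```python
from typing import List, Tuple


def use_multiprocessing(
    n_precursors: int,
    gpu_available: bool,
    multiprocessing: bool = True,
    min_required_precursor_num_for_mp: int = 3000,
) -> bool:
    """Hypothetical check: multiprocessing is only considered without a GPU
    and when there are enough precursors."""
    return (
        multiprocessing
        and not gpu_available
        and n_precursors >= min_required_precursor_num_for_mp
    )


def batch_ranges(n_precursors: int, mp_batch_size: int = 100000) -> List[Tuple[int, int]]:
    """Hypothetical split of row indices into [start, stop) batches."""
    return [
        (start, min(start + mp_batch_size, n_precursors))
        for start in range(0, n_precursors, mp_batch_size)
    ]


print(use_multiprocessing(2000, gpu_available=False))  # False
print(batch_ranges(250000))  # [(0, 100000), (100000, 200000), (200000, 250000)]
```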

predict_all_mp(precursor_df: DataFrame, *, predict_items: list = ['rt', 'mobility', 'ms2'], frag_types: list = None, process_num: int = 8, mp_batch_size: int = 100000)
predict_charge(psm_df: DataFrame, min_precursor_charge: int, max_precursor_charge: int, charge_prob_cutoff: float = None) -> DataFrame

Predict charge states for a given PSM dataframe by predicting the probabilities of each charge state, and including precursors with charge probabilities above the cutoff.

Parameters:
  • psm_df (pd.DataFrame) – PSM dataframe to predict charge states.

  • min_precursor_charge (int) – Minimum precursor charge.

  • max_precursor_charge (int) – Maximum precursor charge.

  • charge_prob_cutoff (float, optional) – Charge probability cutoff for including precursors. Set to 0.0 to predict all charges in the given range, or to None to use the default value from the default_settings yaml.

Returns:

PSM dataframe with predicted charge states.

Return type:

pd.DataFrame
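
The cutoff rule can be illustrated with a small, library-independent sketch. The function `select_charges` and the probability values are made up for illustration, and treating the cutoff as inclusive (>=) is an assumption chosen so that a cutoff of 0.0 keeps every charge in the range.

```python
from typing import Dict, List


def select_charges(
    charge_probs: Dict[int, float],
    min_precursor_charge: int,
    max_precursor_charge: int,
    charge_prob_cutoff: float,
) -> List[int]:
    """Hypothetical filter: keep charge states whose probability reaches the cutoff."""
    return [
        charge
        for charge in range(min_precursor_charge, max_precursor_charge + 1)
        # Inclusive comparison is an assumption of this sketch.
        if charge_probs.get(charge, 0.0) >= charge_prob_cutoff
    ]


# Made-up probabilities for a single peptide.
probs = {1: 0.05, 2: 0.80, 3: 0.40, 4: 0.02}
print(select_charges(probs, 1, 4, 0.3))  # [2, 3]
print(select_charges(probs, 1, 4, 0.0))  # [1, 2, 3, 4]
```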

predict_mobility(precursor_df: DataFrame, *, batch_size: int = 1024) -> DataFrame

Predict mobility (ccs_pred and mobility_pred) inplace into precursor_df.

Parameters:
  • precursor_df (pd.DataFrame) – Precursor_df for CCS/mobility prediction

  • batch_size (int, optional) – Batch size for prediction. Defaults to 1024.

Returns:

df with ‘ccs_pred’ and ‘mobility_pred’ columns.

Return type:

pd.DataFrame

predict_ms2(precursor_df: DataFrame, *, batch_size: int = 512, reference_frag_df: DataFrame = None) -> DataFrame

Predict MS2 for the given precursor_df.

Parameters:
  • precursor_df (pd.DataFrame) – Precursor dataframe for MS2 prediction

  • batch_size (int, optional) – Batch size for prediction. Defaults to 512.

  • reference_frag_df (pd.DataFrame, optional) – Reference fragment dataframe, used if precursor_df has a ‘frag_start_idx’ column pointing to it. Defaults to None.

Returns:

Predicted fragment intensity dataframe. If precursor_df does not already have the ‘frag_start_idx’ and ‘frag_stop_idx’ columns, they are inserted into precursor_df, pointing to this predicted fragment dataframe.

Return type:

pd.DataFrame

predict_rt(precursor_df: DataFrame, *, batch_size: int = 1024) -> DataFrame

Predict RT (‘rt_pred’) inplace into precursor_df.

Parameters:
  • precursor_df (pd.DataFrame) – precursor_df for RT prediction

  • batch_size (int, optional) – Batch size for prediction. Defaults to 1024.

Returns:

df with ‘rt_pred’ and ‘rt_norm_pred’ columns.

Return type:

pd.DataFrame

reinitialize_ms2_model(charged_frag_types: List[str], **kwargs)

Reinitialize the MS2 model with new charged fragment types.

Parameters:
  • charged_frag_types (List[str]) – Charged fragment types for the new MS2 model.

  • kwargs (dict) – Other keyword arguments for pDeepModel.

reset_by_global_settings(reload_models=True)
save_models(folder: str)

Save MS2/RT/CCS models into a folder.

Parameters:

folder (str) – folder to save

set_default_nce(df)

Alias for set_default_nce_instrument.

set_default_nce_instrument(df)

Append ‘nce’ and ‘instrument’ columns into df with self.nce and self.instrument.

train_ccs_model(psm_df: DataFrame)

Train/fine-tune the CCS model. The fine-tuning will be skipped if self.psm_num_to_train_rt_ccs is zero.

Parameters:

psm_df (pd.DataFrame) – Training psm_df which contains ‘ccs’ or ‘mobility’ column.

train_charge_model(psm_df: DataFrame)

Train/fine-tune the charge model.

Parameters:

psm_df (pd.DataFrame) – Training psm_df which contains ‘charge’ column.

train_ms2_model(psm_df: DataFrame, matched_intensity_df: DataFrame)

Use matched_intensity_df to train/fine-tune the MS2 model.

  1. It samples n=self.psm_num_to_train_ms2 PSMs into a training dataframe (tr_df) for fine-tuning.

  2. It also adds PSMs carrying some important PTMs (n=self.top_n_mods_to_train) into tr_df for fine-tuning.

  3. If self.use_grid_nce_search==True, it calls self.ms2_model.grid_nce_search to find the best NCE and instrument.

Parameters:
  • psm_df (pd.DataFrame) – PSM dataframe for fine-tuning

  • matched_intensity_df (pd.DataFrame) – The matched fragment intensities for psm_df.
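
Steps 1 and 2 above can be pictured with a plain-Python sketch. This is not peptdeep's psm_sampling_with_important_mods; it is a hypothetical stand-in that works on a list of dicts instead of a dataframe, and it assumes that modifications are stored as a ';'-separated string under a 'mods' key. The parameter names mirror the documented signature only for readability.

```python
import random
from collections import Counter


def sample_with_important_mods(
    psms, n_sample, top_n_mods=10, n_sample_each_mod=0, random_state=1337
):
    """Hypothetical sketch: random PSM sample plus extra PSMs for frequent mods."""
    rng = random.Random(random_state)
    # Step 1: draw the base sample (capped at the number of available PSMs).
    sampled = rng.sample(psms, min(n_sample, len(psms)))
    if n_sample_each_mod > 0:
        # Step 2: count modifications and add PSMs for the most frequent ones.
        counts = Counter(m for p in psms for m in p["mods"].split(";") if m)
        for mod, _ in counts.most_common(top_n_mods):
            with_mod = [p for p in psms if mod in p["mods"].split(";")]
            sampled.extend(
                rng.sample(with_mod, min(n_sample_each_mod, len(with_mod)))
            )
    return sampled


# Made-up PSMs for illustration.
psms = [
    {"sequence": "PEPTIDEK", "mods": ""},
    {"sequence": "MPEPTIDER", "mods": "Oxidation@M"},
    {"sequence": "MSPEPTIDEK", "mods": "Oxidation@M;Phospho@S"},
    {"sequence": "ANOTHERK", "mods": ""},
]
train = sample_with_important_mods(psms, n_sample=2, n_sample_each_mod=1)
print(len(train))  # 4: two base PSMs plus one PSM for each of the two mods
```

Note that the same PSM may appear more than once in this sketch, because the per-modification samples are drawn independently of the base sample.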

train_rt_model(psm_df: DataFrame)

Train/fine-tune the RT model. The fine-tuning will be skipped if self.psm_num_to_train_rt_ccs is zero.

Parameters:

psm_df (pd.DataFrame) – Training psm_df which contains ‘rt_norm’ column.
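
The column requirements of the four train_* methods can be summarized in a small check that runs before fine-tuning. The helper `trainable_models` is hypothetical and not part of peptdeep; it only restates the requirements documented above, including the rule that RT/CCS fine-tuning is skipped when psm_num_to_train_rt_ccs is zero.

```python
def trainable_models(columns, has_matched_intensities=False, psm_num_to_train_rt_ccs=1):
    """Hypothetical pre-check: which models could be fine-tuned with this psm_df."""
    cols = set(columns)
    models = []
    if psm_num_to_train_rt_ccs > 0:
        if "rt_norm" in cols:
            models.append("rt")  # train_rt_model needs 'rt_norm'
        if cols & {"ccs", "mobility"}:
            models.append("ccs")  # train_ccs_model needs 'ccs' or 'mobility'
    if "charge" in cols:
        models.append("charge")  # train_charge_model needs 'charge'
    if has_matched_intensities:
        models.append("ms2")  # train_ms2_model needs matched_intensity_df
    return models


print(trainable_models(["sequence", "charge", "rt_norm"]))  # ['rt', 'charge']
```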

peptdeep.pretrained_models.clear_error_modloss_intensities(fragment_mz_df, fragment_intensity_df)
peptdeep.pretrained_models.count_mods(psm_df) -> DataFrame
peptdeep.pretrained_models.download_models(url: str = None, target_path: str = None, overwrite: bool = True)
Parameters:
  • url (str, optional) – Remote or local path. Defaults to None, which will take the default using get_model_url()

  • target_path (str, optional) – Target file path after download. Defaults to None, which will take the default using get_model_zip_file_path()

  • overwrite (bool, optional) – overwrite old model files. Defaults to True.

Raises:

FileNotFoundError – If remote url is not accessible.

peptdeep.pretrained_models.get_local_model_zip_name() -> str

Get the local model zip file name dynamically from settings.

peptdeep.pretrained_models.get_model_download_instructions() -> str

Get the model download instructions dynamically from settings.

peptdeep.pretrained_models.get_model_url() -> str

Get the model URL dynamically from settings.

peptdeep.pretrained_models.get_model_zip_file_path() -> str

Get the full path to the model zip file dynamically from settings.

peptdeep.pretrained_models.get_pretrain_dir() -> str

Get the pretrained models directory path dynamically from settings.

peptdeep.pretrained_models.is_model_zip(downloaded_zip)
peptdeep.pretrained_models.load_models(mask_modloss=True)
peptdeep.pretrained_models.load_models_by_model_type_in_zip(model_type_in_zip: str, mask_modloss=True)
peptdeep.pretrained_models.load_phos_models(mask_modloss=True)
peptdeep.pretrained_models.psm_sampling_with_important_mods(psm_df, n_sample, top_n_mods=10, n_sample_each_mod=0, uniform_sampling_column=None, random_state=1337)