peptdeep.pipeline_api

High-level pipeline APIs.

The default parameters of these functions are controlled by peptdeep.settings.global_settings.

Functions:

generate_library()

Generate/predict a spectral library.

get_median_pccs_for_dia_psms(psm_match, ...)

Compute median PCC between fragment intensities across scans.

import_psm_df(psm_files, psm_type)

Import PSM files of a search engine as a pd.DataFrame

match_psms()

Match the PSMs against the MS files.

transfer_learn([verbose])

Transfer learn / refine the RT/CCS(/MS2) models.

peptdeep.pipeline_api.generate_library()

Generate/predict a spectral library.

Required information in global_settings:

```python
lib_settings = global_settings['library']
output_folder = lib_settings['output_folder']  # str. Output folder of the library
lib_settings['infile_type']  # str. Input type for the library: 'fasta', 'sequence', 'peptide', or 'precursor'
lib_settings['infiles']  # list of str. Input files to generate libraries
lib_settings['output_tsv']['enabled']  # bool. Whether to output a TSV for DIA-NN/Spectronaut
```

Raises:

Exception – Any kind of exception if the pipeline fails.
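As a minimal sketch of how the settings listed above might be filled in before calling generate_library(), the helper below writes the documented keys into a nested dict. The helper name configure_library, the example paths, and the plain-dict stand-in are assumptions for illustration; in practice the dict would be peptdeep.settings.global_settings, assuming it behaves like a nested dict.

```python
def configure_library(settings, output_folder, infile_type, infiles, output_tsv=False):
    """Write the documented 'library' keys into a nested settings dict (hypothetical helper)."""
    allowed = ("fasta", "sequence", "peptide", "precursor")
    if infile_type not in allowed:
        raise ValueError(f"infile_type must be one of {allowed}, got {infile_type!r}")
    # setdefault keeps any other keys that already exist under 'library'.
    lib = settings.setdefault("library", {})
    lib["output_folder"] = output_folder
    lib["infile_type"] = infile_type
    lib["infiles"] = list(infiles)
    lib.setdefault("output_tsv", {})["enabled"] = bool(output_tsv)
    return settings


# Stand-in for peptdeep.settings.global_settings (assumption: same nesting).
demo_settings = {}
configure_library(
    demo_settings,
    output_folder="./library_out",   # placeholder path
    infile_type="fasta",
    infiles=["proteome.fasta"],      # placeholder file name
    output_tsv=True,
)

# With peptdeep installed, the same helper would be applied to the real settings:
#   from peptdeep.settings import global_settings
#   from peptdeep.pipeline_api import generate_library
#   configure_library(global_settings, ...)
#   generate_library()
```

Validating infile_type up front keeps a typo from surfacing only after the pipeline has started.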

peptdeep.pipeline_api.get_median_pccs_for_dia_psms(psm_match: PepSpecMatch_DIA, psm_df: DataFrame, fragment_mz_df: DataFrame, fragment_intensity_df: DataFrame)

Compute median PCC between fragment intensities across scans.

Parameters:
  • psm_match (PepSpecMatch_DIA) – The matcher object containing psm_df with replicated PSMs and matching parameters (max_spec_per_query, min_frag_mz).

  • psm_df (pd.DataFrame) – PSM dataframe.

  • fragment_mz_df (pd.DataFrame) – Fragment m/z values, indexed by frag_start_idx/frag_stop_idx in psm_df.

  • fragment_intensity_df (pd.DataFrame) – Matched fragment intensities from MS2 scans, same structure as fragment_mz_df.

Returns:

Median PCC values for each PSM, in the same order as psm_df.

Return type:

np.ndarray

Notes

The psm_df contains max_spec_per_query copies per peptide, each matched against a different MS2 scan. The PSMs need to be sorted by spec index before processing because peptdeep.match.psm_match.PepSpecMatch.match_ms2_multi_raw orders them by raw_name when multiple MS files are used.

See also (peptdeep.match.psm_match):
  • PepSpecMatch_DIA._prepare_matching_dfs: creates replicated PSM/fragment structure

  • PepSpecMatch_DIA._match_ms2_one_raw_numba: finds nearby MS2 scans and extracts fragments

  • PepSpecMatch.match_ms2_multi_raw: processes multiple MS files (reorders by raw_name)
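To make the idea of a median PCC across scans concrete, here is a minimal NumPy sketch for the replicated scans of a single peptide. It assumes the matched fragment intensities have already been arranged as one row per scan; the function name median_pcc_per_scan and the choice to exclude each scan's self-correlation are assumptions, and the library's internal computation may differ in detail.

```python
import numpy as np


def median_pcc_per_scan(intensities: np.ndarray) -> np.ndarray:
    """Median Pearson correlation of each scan against the other scans.

    intensities: shape (n_scans, n_fragments), one row per replicated PSM.
    Requires n_scans >= 2 and non-constant rows.
    """
    pcc = np.corrcoef(intensities)                 # (n_scans, n_scans) correlation matrix
    n = pcc.shape[0]
    # Drop the diagonal (self-correlation is always 1) and keep n-1 values per row.
    off_diag = pcc[~np.eye(n, dtype=bool)].reshape(n, n - 1)
    return np.median(off_diag, axis=1)


# Three replicated scans of one peptide with four fragments each (toy values).
scans = np.array([
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.0],   # perfectly correlated with the first scan
    [4.0, 3.0, 2.0, 1.0],   # perfectly anti-correlated with both
])
medians = median_pcc_per_scan(scans)
```

A scan that agrees with its neighbours gets a high median, while a scan that picked up interfering signal is pulled down without being dominated by a single outlier.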

peptdeep.pipeline_api.import_psm_df(psm_files: list, psm_type: str) -> DataFrame

Import PSM files of a search engine as a pd.DataFrame

Parameters:
  • psm_files (list) – List[str]. PSM file paths

  • psm_type (str) – PSM type or search engine name/type

Returns:

DataFrame that contains all PSM information

Return type:

pd.DataFrame

peptdeep.pipeline_api.match_psms() -> Tuple[DataFrame, DataFrame, DataFrame]

Match the PSMs against the MS files.

All required information is in global_settings:

```python
mgr_settings = global_settings['model_mgr']
mgr_settings['transfer']['psm_files']  # list. PSM file paths
mgr_settings['transfer']['psm_type']  # str. PSM type or search engine type
mgr_settings['transfer']['ms_files']  # list. MS files or RAW files
mgr_settings['transfer']['ms_file_type']  # str. MS file type
global_settings['model']['frag_types']  # list. Fragment types to be considered, e.g. b_z1, y_modloss_z2 ...
global_settings['model']['max_frag_charge']  # int. Max fragment charge to be considered
global_settings['peak_matching']['ms2_ppm']  # bool. Whether to use ppm as the MS2 tolerance unit
global_settings['peak_matching']['ms2_tol_value']  # float. MS2 tolerance value
```
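The two peak_matching settings together define how wide the MS2 matching window is. The sketch below shows the standard ppm-to-m/z conversion; the function name ms2_tolerance_window is hypothetical, an absolute tolerance is assumed to be expressed in m/z units (Da), and this is not peptdeep's internal matching code.

```python
def ms2_tolerance_window(frag_mz, ms2_tol_value, ms2_ppm=True):
    """Return (lower, upper) m/z bounds for matching a fragment peak."""
    # A ppm tolerance scales with the fragment m/z; otherwise the value is
    # treated as an absolute half-width (assumption: in Da).
    tol = frag_mz * ms2_tol_value * 1e-6 if ms2_ppm else ms2_tol_value
    return frag_mz - tol, frag_mz + tol


# 20 ppm at m/z 500 corresponds to a half-width of 0.01.
ppm_low, ppm_high = ms2_tolerance_window(500.0, 20.0, ms2_ppm=True)

# The same call with an absolute tolerance of 0.02.
abs_low, abs_high = ms2_tolerance_window(500.0, 0.02, ms2_ppm=False)
```

Because a ppm window widens with m/z, the same ms2_tol_value is stricter for light fragments than for heavy ones, which is why the unit flag and the value need to be set together.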

Returns:

  • pd.DataFrame – the PSM DataFrame

  • pd.DataFrame – the fragment m/z DataFrame

  • pd.DataFrame – the matched fragment intensity DataFrame

Return type:

Tuple[pd.DataFrame, pd.DataFrame, pd.DataFrame]

peptdeep.pipeline_api.transfer_learn(verbose=True)

Transfer learn / refine the RT/CCS(/MS2) models.

Required information in global_settings:

```python
mgr_settings = global_settings['model_mgr']
mgr_settings['transfer']['verbose'] = verbose  # bool
global_settings['PEPTDEEP_HOME']  # str. The folder to store all refined models. By default "~/peptdeep".
```

For transfer learning of the MS2 model, the following information is also required:

```python
mgr_settings['transfer']['psm_files']  # list. PSM file paths
mgr_settings['transfer']['psm_type']  # str. PSM type or search engine type
mgr_settings['transfer']['ms_files']  # list. MS files or RAW files
mgr_settings['transfer']['ms_file_type']  # str. MS file type
global_settings['model']['frag_types']  # list. Fragment types to be considered, e.g. b_z1, y_modloss_z2 ...
global_settings['model']['max_frag_charge']  # int. Max fragment charge to be considered
global_settings['peak_matching']['ms2_ppm']  # bool. Whether to use ppm as the MS2 tolerance unit
global_settings['peak_matching']['ms2_tol_value']  # float. MS2 tolerance value
```
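Since MS2 transfer learning depends on several settings spread across different sections, it can help to check them before starting a long run. The following is a sketch of such a check; the names REQUIRED_MS2_TRANSFER_KEYS and missing_settings are hypothetical, the settings object is assumed to behave like a nested mapping, and the demo values are placeholders.

```python
from collections.abc import Mapping

# Key paths taken from the list of required settings above.
REQUIRED_MS2_TRANSFER_KEYS = [
    ("model_mgr", "transfer", "psm_files"),
    ("model_mgr", "transfer", "psm_type"),
    ("model_mgr", "transfer", "ms_files"),
    ("model_mgr", "transfer", "ms_file_type"),
    ("model", "frag_types"),
    ("model", "max_frag_charge"),
    ("peak_matching", "ms2_ppm"),
    ("peak_matching", "ms2_tol_value"),
]


def missing_settings(settings, required=REQUIRED_MS2_TRANSFER_KEYS):
    """Return the required key paths that are absent from a nested mapping."""
    missing = []
    for path in required:
        node = settings
        for key in path:
            if not isinstance(node, Mapping) or key not in node:
                missing.append(path)
                break
            node = node[key]
    return missing


# Partially filled stand-in for global_settings (placeholder values).
demo = {
    "model_mgr": {"transfer": {"psm_files": ["psms.tsv"], "psm_type": "maxquant"}},
    "model": {"frag_types": ["b_z1", "y_z1"]},
}
still_missing = missing_settings(demo)
```

The check only verifies that each key exists; it does not validate the values themselves.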

Parameters:

verbose (bool) – Whether to print the training details. Optional, defaults to True.

Raises:

Exception – Any kind of exception if the pipeline fails.