peptdeep.pipeline_api

High-level APIs for:

transfer learning (peptdeep.pipeline_api.transfer_learn())
library prediction (peptdeep.pipeline_api.generate_library())
rescoring (peptdeep.pipeline_api.rescore())
more will be added

All default parameters of these functionalities are controlled by peptdeep.settings.global_settings.
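All pipelines read their defaults from this nested dictionary, so a run is usually customized by overriding a handful of nested keys. As a generic sketch on plain Python dicts (not peptdeep's own settings helpers), the hypothetical `deep_update` below merges overrides into a settings tree while keeping sibling defaults intact. The keys come from this page; the values are made up.

```python
def deep_update(base: dict, overrides: dict) -> dict:
    """Hypothetical helper (not part of peptdeep): merge `overrides` into `base` in place.

    Nested dicts are merged key by key, so defaults that are not overridden
    are preserved; any non-dict value replaces the default outright.
    """
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(base.get(key), dict):
            deep_update(base[key], value)
        else:
            base[key] = value
    return base


# Hypothetical stand-in for a small part of global_settings (values are made up).
settings = {
    "peak_matching": {"ms2_ppm": True, "ms2_tol_value": 20.0},
    "model": {"max_frag_charge": 2},
}
deep_update(settings, {"peak_matching": {"ms2_tol_value": 10.0}})
print(settings["peak_matching"])  # {'ms2_ppm': True, 'ms2_tol_value': 10.0}
```

A recursive merge is used instead of `dict.update` because `update` would replace the whole `peak_matching` sub-dict and silently drop `ms2_ppm`.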

Functions:

`generate_library`()	Generate/predict a spectral library.
`get_median_pccs_for_dia_psms`(psm_match, ...)	Compute median PCC between fragment intensities across scans.
`import_psm_df`(psm_files, psm_type)	Import PSM files of a search engine as a pd.DataFrame
`match_psms`()	Match the PSMs against the MS files.
`transfer_learn`([verbose])	Transfer learn / refine the RT/CCS(/MS2) models.

peptdeep.pipeline_api.generate_library()

Generate/predict a spectral library.

Required information in global_settings:

```python
from peptdeep.settings import global_settings

lib_settings = global_settings['library']
output_folder = lib_settings['output_folder']  # str. Output folder of the library
lib_settings['infile_type']  # str. Input type for the library: 'fasta', 'sequence', 'peptide', or 'precursor'
lib_settings['infiles']  # list of str. Input files used to generate the library
lib_settings['output_tsv']['enabled']  # bool. Whether to output a TSV library for DIA-NN/Spectronaut
```

Raises:

Exception – If any step of the pipeline fails.
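The allowed `infile_type` values and the expected types above can be checked before launching a long library run. The hypothetical `check_library_settings` below is a sketch on a plain dict mirroring `global_settings['library']`; it is not peptdeep's own validation, and the file paths are placeholders.

```python
# Allowed values of lib_settings['infile_type'], as documented above.
ALLOWED_INFILE_TYPES = {"fasta", "sequence", "peptide", "precursor"}


def check_library_settings(lib_settings: dict) -> list:
    """Hypothetical pre-flight check for a dict shaped like global_settings['library'].

    Returns a list of human-readable problems; an empty list means none were found.
    """
    problems = []
    if lib_settings.get("infile_type") not in ALLOWED_INFILE_TYPES:
        problems.append(f"unsupported infile_type: {lib_settings.get('infile_type')!r}")
    infiles = lib_settings.get("infiles")
    if not isinstance(infiles, list) or not infiles:
        problems.append("infiles must be a non-empty list of paths")
    if not isinstance(lib_settings.get("output_folder"), str):
        problems.append("output_folder must be a str")
    output_tsv = lib_settings.get("output_tsv") or {}
    if not isinstance(output_tsv.get("enabled"), bool):
        problems.append("output_tsv['enabled'] must be a bool")
    return problems


lib_settings = {
    "output_folder": "/path/to/library",     # placeholder path
    "infile_type": "fasta",
    "infiles": ["/path/to/proteins.fasta"],  # placeholder path
    "output_tsv": {"enabled": True},
}
print(check_library_settings(lib_settings))  # []
```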

peptdeep.pipeline_api.get_median_pccs_for_dia_psms(psm_match: PepSpecMatch_DIA, psm_df: DataFrame, fragment_mz_df: DataFrame, fragment_intensity_df: DataFrame)

Compute median PCC between fragment intensities across scans.

Parameters:

psm_match (PepSpecMatch_DIA) – The matcher object containing psm_df with replicated PSMs and matching parameters (max_spec_per_query, min_frag_mz).
psm_df (pd.DataFrame) – PSM dataframe.
fragment_mz_df (pd.DataFrame) – Fragment m/z values, indexed by frag_start_idx/frag_stop_idx in psm_df.
fragment_intensity_df (pd.DataFrame) – Matched fragment intensities from MS2 scans, same structure as fragment_mz_df.

Returns:

Median PCC values for each PSM, in the same order as psm_df.

Return type:

np.ndarray

Notes

The psm_df contains max_spec_per_query copies per peptide, each matched against a different MS2 scan. The PSMs need to be sorted by spec index before processing because peptdeep.match.psm_match.PepSpecMatch.match_ms2_multi_raw orders them by raw_name when multiple MS files are used.
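The re-sorting step can be illustrated with a toy pandas table. This sketch assumes the replicated PSM table has an integer column holding each row's spec index; the column name `spec_idx`, the helper `restore_spec_order`, and the data are hypothetical. Because `sort_values` moves whole rows, per-row pointers such as `frag_start_idx` stay attached to their PSM.

```python
import pandas as pd


def restore_spec_order(psm_df: pd.DataFrame, order_col: str = "spec_idx") -> pd.DataFrame:
    """Hypothetical helper: sort replicated PSMs back by spec index.

    `order_col` is an assumed column name. mergesort is stable, so ties keep
    their current order; reset_index renumbers rows 0..n-1 so positional
    access lines up with the new order.
    """
    return psm_df.sort_values(order_col, kind="mergesort").reset_index(drop=True)


# Toy table as it might look after multi-file matching ordered rows by raw_name.
matched = pd.DataFrame({
    "raw_name": ["run_a", "run_a", "run_b", "run_b"],
    "spec_idx": [2, 3, 0, 1],
    "frag_start_idx": [20, 30, 0, 10],
})
restored = restore_spec_order(matched)
print(restored["spec_idx"].tolist())        # [0, 1, 2, 3]
print(restored["frag_start_idx"].tolist())  # [0, 10, 20, 30]: pointers moved with their rows
```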

See also (peptdeep.match.psm_match):

PepSpecMatch_DIA._prepare_matching_dfs: creates replicated PSM/fragment structure
PepSpecMatch_DIA._match_ms2_one_raw_numba: finds nearby MS2 scans and extracts fragments
PepSpecMatch.match_ms2_multi_raw: processes multiple MS files (reorders by raw_name)
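The docstring does not spell out which intensity vectors are correlated. One plausible reading, sketched below with NumPy and not taken from the peptdeep source, treats each fragment's matched intensities over the `max_spec_per_query` neighbouring scans as an elution profile and takes the median of all pairwise PCCs between those profiles. `median_fragment_pcc` is hypothetical; fragments with constant intensity (for example, never matched) are skipped because their PCC is undefined.

```python
import numpy as np


def median_fragment_pcc(intensities: np.ndarray) -> float:
    """Median pairwise Pearson correlation between fragment elution profiles.

    Hypothetical helper (not peptdeep code). `intensities` has shape
    (n_scans, n_fragments): column j holds fragment j's matched intensity
    in each of the neighbouring MS2 scans.
    """
    intensities = np.asarray(intensities, dtype=float)
    # Skip fragments whose intensity never changes across scans:
    # their PCC is undefined (zero variance).
    varying = intensities[:, intensities.std(axis=0) > 0]
    if varying.shape[1] < 2:
        return float("nan")
    # rowvar=False: columns are variables (fragments), rows are observations (scans).
    corr = np.corrcoef(varying, rowvar=False)
    # Count each fragment pair once via the strict upper triangle.
    upper = corr[np.triu_indices_from(corr, k=1)]
    return float(np.median(upper))


# Three scans x three fragments: fragments 0 and 1 co-elute, fragment 2 does not.
example = np.array([
    [1.0, 2.0, 3.0],
    [2.0, 4.0, 2.0],
    [3.0, 6.0, 1.0],
])
print(median_fragment_pcc(example))  # pairwise PCCs are [1, -1, -1], so the median is about -1
```

To apply this per PSM, one would stack the intensity blocks of one peptide's copies (rows `frag_start_idx:frag_stop_idx` of `fragment_intensity_df`, each flattened) into an `(n_scans, n_fragments)` array; that stacking step is likewise an assumption about the data layout.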

peptdeep.pipeline_api.import_psm_df(psm_files: list, psm_type: str) → DataFrame

Import PSM files of a search engine as a pd.DataFrame

Parameters:

psm_files (list) – List[str]. PSM file paths
psm_type (str) – PSM type or search engine name/type

Returns:

DataFrame that contains all PSM information

Return type:

pd.DataFrame
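Conceptually, importing several PSM files yields one table with a row per PSM. The sketch below shows that concatenation pattern with pandas; `read_one` stands in for whatever search-engine-specific reader is chosen from `psm_type`, and both the helper `concat_psm_files` and the `source_file` column are hypothetical rather than peptdeep behaviour.

```python
from typing import Callable, List

import pandas as pd


def concat_psm_files(
    psm_files: List[str], read_one: Callable[[str], pd.DataFrame]
) -> pd.DataFrame:
    """Hypothetical helper: read each PSM file and stack the rows.

    `read_one` is a placeholder for a search-engine-specific reader; the
    `source_file` column is added only to show provenance tracking.
    """
    frames = [read_one(path).assign(source_file=path) for path in psm_files]
    if not frames:
        return pd.DataFrame()
    # ignore_index=True gives a fresh 0..n-1 index instead of repeating each file's index.
    return pd.concat(frames, ignore_index=True)


# Usage with a fake reader that returns two PSMs per file.
def fake_reader(path: str) -> pd.DataFrame:
    return pd.DataFrame({"sequence": ["PEPTIDEK", "LESLIEK"], "charge": [2, 3]})


psm_df = concat_psm_files(["run1.tsv", "run2.tsv"], fake_reader)
print(len(psm_df))  # 4
```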

peptdeep.pipeline_api.match_psms() → Tuple[DataFrame, DataFrame, DataFrame]

Match the PSMs against the MS files.

All required information is read from global_settings:

```python
from peptdeep.settings import global_settings

mgr_settings = global_settings['model_mgr']
mgr_settings['transfer']['psm_files']  # list. PSM file paths
mgr_settings['transfer']['psm_type']  # str. PSM type or search engine type
mgr_settings['transfer']['ms_files']  # list. MS files or RAW files
mgr_settings['transfer']['ms_file_type']  # str. MS file type
global_settings['model']['frag_types']  # list. Fragment types to consider, e.g. b_z1, y_modloss_z2, ...
global_settings['model']['max_frag_charge']  # int. Maximum fragment charge to consider
global_settings['peak_matching']['ms2_ppm']  # bool. Whether the MS2 tolerance is in ppm
global_settings['peak_matching']['ms2_tol_value']  # float. MS2 tolerance value
```

Returns:

pd.DataFrame – the PSM DataFrame
pd.DataFrame – the fragment m/z DataFrame
pd.DataFrame – the matched fragment intensity DataFrame

Return type:

Tuple[pd.DataFrame, pd.DataFrame, pd.DataFrame]
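The `ms2_ppm` and `ms2_tol_value` settings determine how close an observed peak must be to a theoretical fragment m/z to count as a match. The hypothetical `match_fragment_intensities` below is a simplified NumPy sketch of nearest-peak matching for one spectrum, not peptdeep's matcher: a ppm tolerance scales with the fragment m/z (`mz * tol * 1e-6`), a Da tolerance is constant, and unmatched fragments get intensity 0. The spectrum and tolerance values are made up.

```python
import numpy as np


def match_fragment_intensities(spec_mzs, spec_intens, frag_mzs, tol_value, use_ppm):
    """Hypothetical nearest-peak matcher for one centroided spectrum.

    spec_mzs must be sorted ascending. Returns, for each theoretical fragment
    m/z, the intensity of the closest peak within tolerance, or 0.0.
    """
    spec_mzs = np.asarray(spec_mzs, dtype=float)
    spec_intens = np.asarray(spec_intens, dtype=float)
    frag_mzs = np.asarray(frag_mzs, dtype=float)
    if spec_mzs.size == 0:
        return np.zeros_like(frag_mzs)
    # A ppm tolerance scales with m/z; a Da tolerance is the same for every fragment.
    tols = frag_mzs * tol_value * 1e-6 if use_ppm else np.full_like(frag_mzs, tol_value)
    # searchsorted gives the first peak >= each fragment m/z; the closest peak
    # is either that one or the one just before it.
    right = np.searchsorted(spec_mzs, frag_mzs)
    left = np.clip(right - 1, 0, spec_mzs.size - 1)
    right = np.clip(right, 0, spec_mzs.size - 1)
    closest = np.where(
        np.abs(spec_mzs[left] - frag_mzs) <= np.abs(spec_mzs[right] - frag_mzs),
        left,
        right,
    )
    within_tol = np.abs(spec_mzs[closest] - frag_mzs) <= tols
    return np.where(within_tol, spec_intens[closest], 0.0)


spec_mzs = [100.0, 200.0, 300.01]
spec_intens = [10.0, 20.0, 30.0]
frag_mzs = [100.001, 250.0, 300.0]
print(match_fragment_intensities(spec_mzs, spec_intens, frag_mzs, 20.0, True).tolist())   # [10.0, 0.0, 0.0]
print(match_fragment_intensities(spec_mzs, spec_intens, frag_mzs, 0.02, False).tolist())  # [10.0, 0.0, 30.0]
```

At 20 ppm, a fragment at m/z 300.0 allows only 0.006 Da, so the peak 0.01 Da away is rejected, whereas a fixed 0.02 Da tolerance accepts it.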

peptdeep.pipeline_api.transfer_learn(verbose=True)

Transfer learn / refine the RT/CCS(/MS2) models.

Required information in global_settings:

```python
from peptdeep.settings import global_settings

mgr_settings = global_settings['model_mgr']
mgr_settings['transfer']['verbose'] = verbose  # bool, set from the `verbose` argument
global_settings['PEPTDEEP_HOME']  # str. Folder where all refined models are stored; "~/peptdeep" by default
```

For transfer learning of the MS2 model, the following information is required:

```python
mgr_settings['transfer']['psm_files']  # list. PSM file paths
mgr_settings['transfer']['psm_type']  # str. PSM type or search engine type
mgr_settings['transfer']['ms_files']  # list. MS files or RAW files
mgr_settings['transfer']['ms_file_type']  # str. MS file type
global_settings['model']['frag_types']  # list. Fragment types to consider, e.g. b_z1, y_modloss_z2, ...
global_settings['model']['max_frag_charge']  # int. Maximum fragment charge to consider
global_settings['peak_matching']['ms2_ppm']  # bool. Whether the MS2 tolerance is in ppm
global_settings['peak_matching']['ms2_tol_value']  # float. MS2 tolerance value
```

Parameters:

verbose (bool) – Whether to print training details. Optional, default True.

Raises:

Exception – If any step of the pipeline fails.
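Before starting MS2 transfer learning, it can help to confirm that every settings path listed above is present. The hypothetical helpers below work on a plain nested dict shaped like `global_settings`; they are a sketch, not part of peptdeep. `model_home` only expands a leading `~` in `PEPTDEEP_HOME`, falling back to the documented default `~/peptdeep` when the key is absent.

```python
import os

# Settings paths listed above as required for MS2 transfer learning.
MS2_TRANSFER_PATHS = [
    ("model_mgr", "transfer", "psm_files"),
    ("model_mgr", "transfer", "psm_type"),
    ("model_mgr", "transfer", "ms_files"),
    ("model_mgr", "transfer", "ms_file_type"),
    ("model", "frag_types"),
    ("model", "max_frag_charge"),
    ("peak_matching", "ms2_ppm"),
    ("peak_matching", "ms2_tol_value"),
]


def missing_settings(settings: dict, paths=MS2_TRANSFER_PATHS) -> list:
    """Hypothetical helper: return dotted names of paths absent from a nested dict."""
    missing = []
    for path in paths:
        node = settings
        for key in path:
            if not isinstance(node, dict) or key not in node:
                missing.append(".".join(path))
                break
            node = node[key]
    return missing


def model_home(settings: dict) -> str:
    """Hypothetical helper: folder for refined models, with '~' expanded."""
    return os.path.expanduser(settings.get("PEPTDEEP_HOME", "~/peptdeep"))


partial = {"model_mgr": {"transfer": {"psm_files": ["run1.tsv"], "psm_type": "maxquant"}}}  # example values
print(len(missing_settings(partial)))  # 6
print(missing_settings(partial)[:2])   # ['model_mgr.transfer.ms_files', 'model_mgr.transfer.ms_file_type']
```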