peptdeep.pipeline_api
High-level APIs for:
- transfer learning (peptdeep.pipeline_api.transfer_learn())
- library prediction (peptdeep.pipeline_api.generate_library())
- rescoring (peptdeep.pipeline_api.rescore())
More will be added.
All default parameters of these functions are controlled by peptdeep.settings.global_settings.
Functions:
- generate_library(): Generate/predict a spectral library.
- get_median_pccs_for_dia_psms(): Compute median PCC between fragment intensities across scans.
- import_psm_df(): Import PSM files of a search engine as a pd.DataFrame.
- match_psms(): Match the PSMs against the MS files.
- transfer_learn(): Transfer learn / refine the RT/CCS(/MS2) models.
- peptdeep.pipeline_api.generate_library()
Generate/predict a spectral library.
Required information in global_settings:
```python
lib_settings = global_settings['library']
output_folder = lib_settings['output_folder']  # str. Output folder of the library
lib_settings['infile_type']  # str. Input type for the library: 'fasta', 'sequence', 'peptide', or 'precursor'
lib_settings['infiles']  # list of str. Input files used to generate libraries
lib_settings['output_tsv']['enabled']  # bool. Whether to output a TSV for DIA-NN/Spectronaut
```
- Raises:
Exception – Any kind of exception if the pipeline fails.
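As a minimal sketch of how the library settings listed above might be filled in before calling generate_library(), the helper below writes them into a plain dict and rejects an unsupported infile_type early. The helper name configure_library, the constant ALLOWED_INFILE_TYPES, and the example paths are hypothetical and not part of peptdeep; the plain dict stands in for peptdeep.settings.global_settings, which is assumed here to behave like a nested dict.

```python
# Input types named in the settings description above.
ALLOWED_INFILE_TYPES = ("fasta", "sequence", "peptide", "precursor")


def configure_library(settings, output_folder, infile_type, infiles, output_tsv=False):
    """Fill the 'library' section of a global_settings-like dict (hypothetical helper)."""
    if infile_type not in ALLOWED_INFILE_TYPES:
        raise ValueError(
            f"infile_type must be one of {ALLOWED_INFILE_TYPES}, got {infile_type!r}"
        )
    lib_settings = settings.setdefault("library", {})
    lib_settings["output_folder"] = output_folder
    lib_settings["infile_type"] = infile_type
    lib_settings["infiles"] = list(infiles)
    lib_settings.setdefault("output_tsv", {})["enabled"] = bool(output_tsv)
    return lib_settings


# Stand-in for peptdeep.settings.global_settings so the sketch runs on its own.
settings = {}
lib = configure_library(
    settings, "./library_out", "fasta", ["human.fasta"], output_tsv=True
)
```

With peptdeep installed, the same helper could presumably be given the real global_settings object, followed by a call to generate_library().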
- peptdeep.pipeline_api.get_median_pccs_for_dia_psms(psm_match: PepSpecMatch_DIA, psm_df: DataFrame, fragment_mz_df: DataFrame, fragment_intensity_df: DataFrame)
Compute median PCC between fragment intensities across scans.
- Parameters:
psm_match (PepSpecMatch_DIA) – The matcher object containing psm_df with replicated PSMs and matching parameters (max_spec_per_query, min_frag_mz).
psm_df (pd.DataFrame) – PSM dataframe.
fragment_mz_df (pd.DataFrame) – Fragment m/z values, indexed by frag_start_idx/frag_stop_idx in psm_df.
fragment_intensity_df (pd.DataFrame) – Matched fragment intensities from MS2 scans, same structure as fragment_mz_df.
- Returns:
Median PCC values for each PSM, in the same order as psm_df.
- Return type:
np.ndarray
Notes
The psm_df contains max_spec_per_query copies per peptide, each matched against a different MS2 scan. The PSMs need to be sorted by spec index before processing because peptdeep.match.psm_match.PepSpecMatch.match_ms2_multi_raw orders them by raw_name when multiple MS files are used.
- See also (peptdeep.match.psm_match):
PepSpecMatch_DIA._prepare_matching_dfs: creates replicated PSM/fragment structure
PepSpecMatch_DIA._match_ms2_one_raw_numba: finds nearby MS2 scans and extracts fragments
PepSpecMatch.match_ms2_multi_raw: processes multiple MS files (reorders by raw_name)
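To make the idea of a median PCC across scans concrete, here is a small pure-Python sketch. It assumes one plausible reading of the description: for a single peptide, the fragment intensity vectors matched in its replicated scans are compared pairwise with the Pearson correlation coefficient, and the median of those values is reported. This is an illustration only and not peptdeep's implementation; the function names and the example intensities are hypothetical.

```python
from itertools import combinations
from math import sqrt
from statistics import median


def pearson(x, y):
    """Pearson correlation of two equal-length sequences; 0.0 if either is constant."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def median_pairwise_pcc(scans):
    """Median PCC over all pairs of intensity vectors belonging to one peptide."""
    pccs = [pearson(a, b) for a, b in combinations(scans, 2)]
    return median(pccs) if pccs else 0.0


# Hypothetical matched intensities of one peptide in three neighbouring MS2 scans.
scans = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.0],  # same pattern as the first scan, scaled by 2
    [4.0, 3.0, 2.0, 1.0],  # reversed pattern
]
median_pcc = median_pairwise_pcc(scans)
```

Taking the median rather than the mean keeps a single interfered scan from dominating the score, which is presumably why a median is used here.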
- peptdeep.pipeline_api.import_psm_df(psm_files: list, psm_type: str) → DataFrame
Import PSM files of a search engine as a pd.DataFrame
- Parameters:
psm_files (list) – List[str]. PSM file paths
psm_type (str) – PSM type or search engine name/type
- Returns:
DataFrame that contains all PSM information
- Return type:
pd.DataFrame
- peptdeep.pipeline_api.match_psms() → Tuple[DataFrame, DataFrame, DataFrame]
Match the PSMs against the MS files.
All required information is in global_settings:
```python
mgr_settings = global_settings['model_mgr']
mgr_settings['transfer']['psm_files']  # list. PSM file paths
mgr_settings['transfer']['psm_type']  # str. PSM type or search engine type
mgr_settings['transfer']['ms_files']  # list. MS files or RAW files
mgr_settings['transfer']['ms_file_type']  # str. MS file type
global_settings['model']['frag_types']  # list. Fragment types to be considered, e.g. b_z1, y_modloss_z2 ...
global_settings['model']['max_frag_charge']  # int. Max fragment charge to be considered
global_settings['peak_matching']['ms2_ppm']  # bool. Whether to use ppm as the MS2 tolerance unit
global_settings['peak_matching']['ms2_tol_value']  # float. MS2 tolerance value
```
- Returns:
pd.DataFrame: the PSM DataFrame
pd.DataFrame: the fragment m/z DataFrame
pd.DataFrame: the matched fragment intensity DataFrame
- Return type:
Tuple[pd.DataFrame,pd.DataFrame,pd.DataFrame]
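The two peak_matching settings, ms2_ppm and ms2_tol_value, together define the m/z window in which a measured peak counts as a match. The sketch below shows how such a window could be computed and used to pick the closest peak. It assumes the tolerance is an absolute value in Da when ms2_ppm is False; the function names and peak values are hypothetical, and the code is not taken from peptdeep.

```python
def ms2_tolerance_window(mz, tol_value, use_ppm):
    """Return (low, high) m/z bounds around a theoretical fragment m/z."""
    # A ppm tolerance scales with m/z; otherwise tol_value is treated as Da (assumption).
    delta = mz * tol_value * 1e-6 if use_ppm else tol_value
    return mz - delta, mz + delta


def match_peak(theoretical_mz, peak_mzs, tol_value, use_ppm):
    """Return the index of the closest peak within tolerance, or -1 if there is none."""
    low, high = ms2_tolerance_window(theoretical_mz, tol_value, use_ppm)
    best_idx, best_err = -1, float("inf")
    for idx, peak_mz in enumerate(peak_mzs):
        err = abs(peak_mz - theoretical_mz)
        if low <= peak_mz <= high and err < best_err:
            best_idx, best_err = idx, err
    return best_idx


# Hypothetical centroided peaks of one MS2 spectrum.
peaks = [499.98, 500.004, 500.3]
idx_ppm = match_peak(500.0, peaks, tol_value=20.0, use_ppm=True)   # 20 ppm at m/z 500 is 0.01
idx_da = match_peak(500.0, peaks, tol_value=0.5, use_ppm=False)    # +/- 0.5 window
```

A ppm tolerance widens with m/z, which usually suits high-resolution analyzers, while a fixed absolute tolerance is more common for low-resolution data.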
- peptdeep.pipeline_api.transfer_learn(verbose=True)
Transfer learn / refine the RT/CCS(/MS2) models.
Required information in global_settings:
```python
mgr_settings = global_settings['model_mgr']
mgr_settings['transfer']['verbose'] = verbose  # bool
global_settings['PEPTDEEP_HOME']  # str. The folder to store all refined models. By default "~/peptdeep".
```
For transfer learning of the MS2 model, the following settings are also required:
```python
mgr_settings['transfer']['psm_files']  # list. PSM file paths
mgr_settings['transfer']['psm_type']  # str. PSM type or search engine type
mgr_settings['transfer']['ms_files']  # list. MS files or RAW files
mgr_settings['transfer']['ms_file_type']  # str. MS file type
global_settings['model']['frag_types']  # list. Fragment types to be considered, e.g. b_z1, y_modloss_z2 ...
global_settings['model']['max_frag_charge']  # int. Max fragment charge to be considered
global_settings['peak_matching']['ms2_ppm']  # bool. Whether to use ppm as the MS2 tolerance unit
global_settings['peak_matching']['ms2_tol_value']  # float. MS2 tolerance value
```
- Parameters:
verbose (bool) – Print the training details. Optional, default True
- Raises:
Exception – Any kind of exception if the pipeline fails.
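Because transfer_learn() reads everything from global_settings and only reports problems by raising, it can help to check the MS2-related settings up front. The sketch below walks a nested, dict-like settings object and lists any of the keys named above that are absent. The helper name, the constant, and the example values are hypothetical; a plain dict stands in for global_settings.

```python
# Key paths required for MS2 transfer learning, as listed in the settings above.
REQUIRED_MS2_TRANSFER_KEYS = [
    ("model_mgr", "transfer", "psm_files"),
    ("model_mgr", "transfer", "psm_type"),
    ("model_mgr", "transfer", "ms_files"),
    ("model_mgr", "transfer", "ms_file_type"),
    ("model", "frag_types"),
    ("model", "max_frag_charge"),
    ("peak_matching", "ms2_ppm"),
    ("peak_matching", "ms2_tol_value"),
]


def missing_ms2_transfer_keys(settings):
    """Return dotted paths of required keys that are absent (hypothetical helper)."""
    missing = []
    for path in REQUIRED_MS2_TRANSFER_KEYS:
        node = settings
        for key in path:
            if not isinstance(node, dict) or key not in node:
                missing.append(".".join(path))
                break
            node = node[key]
    return missing


# Stand-in for global_settings with only part of the configuration filled in.
settings = {
    "model_mgr": {"transfer": {"psm_files": ["psms.tsv"], "psm_type": "alphapept"}},
    "model": {"frag_types": ["b_z1", "y_z1"], "max_frag_charge": 2},
    "peak_matching": {"ms2_ppm": True},
}
missing = missing_ms2_transfer_keys(settings)
```

An empty result only means the keys exist; it does not validate their values or confirm that the listed files can be read.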