peptdeep.rescore.feature_extractor

Classes:

ScoreFeatureExtractor(model_mgr)

ScoreFeatureExtractor: Feature extractor for Percolator, using a single process.

ScoreFeatureExtractorMP(model_mgr)

Functions:

get_ms2_features(psm_df, frag_types, ...)

Extract ms2 features from the given predict_intensity_df and matched_intensity_df.

get_ms2_features_mp(args)

get_psm_scores(psm_df, predict_intensity_df, ...)

AlphaPeptDeep has a built-in score for PSMs; according to the authors, it works much better than other scores such as the X!Tandem score.

match_one_raw(psm_df_one_raw, ms2_file, ...)

Internal function

match_one_raw_mp(args)

class peptdeep.rescore.feature_extractor.ScoreFeatureExtractor(model_mgr: ModelManager)

Bases: object

ScoreFeatureExtractor: Feature extractor for Percolator with a single process.

Parameters:

model_mgr (ModelManager) – The ModelManager in peptdeep.pretrained_models.

Methods:

__init__(model_mgr)

extract_features(psm_df, ms2_file_dict, ...)

Extract features and add columns (self.score_feature_list) into psm_df

extract_mobility_features(psm_df)

extract_rt_features(psm_df)

fine_tune_models(psm_df, ms2_file_dict, ...)

Sample some raw files (n=`self.raw_num_to_tune`) from the MS2 files, extract spectrum/peak information, and then fine-tune the models.

match_ms2(psm_df, ms2_file_dict, ms2_file_type)

reset_by_global_settings()

__init__(model_mgr: ModelManager)
extract_features(psm_df: DataFrame, ms2_file_dict, ms2_file_type, frag_types: list = ['b_z1', 'b_z2', 'y_z1', 'y_z2'], ms2_ppm=True, ms2_tol=20.0) → DataFrame

Extract features and add columns (self.score_feature_list) into psm_df

Parameters:
  • psm_df (pd.DataFrame) – PSM DataFrame to extract features for

  • ms2_file_dict (dict) – MS2 file path dict: {raw_name: ms2_path}

  • ms2_file_type (str, optional) – MS2 file type, could be ‘alphapept’, ‘mgf’, or ‘raw’.

  • frag_types (list, optional) – Fragment types. Defaults to alphabase.fragment.get_charged_frag_types(['b', 'y'], 2).

  • ms2_ppm (bool, optional) – Whether the MS2 matching tolerance is given in ppm. Defaults to True.

  • ms2_tol (float, optional) – MS2 matching mass tolerance. Defaults to 20.0.

Returns:

psm_df with feature columns added

Return type:

pd.DataFrame
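The default value of frag_types shown in the signature, ['b_z1', 'b_z2', 'y_z1', 'y_z2'], is what the parameter description attributes to alphabase.fragment.get_charged_frag_types(['b', 'y'], 2). The naming scheme can be reproduced with a minimal stand-in. This is a sketch only: charged_frag_types is a hypothetical helper written for illustration and is not alphabase's implementation.

```python
def charged_frag_types(frag_types, max_charge):
    """Hypothetical stand-in: build names such as 'b_z1' for each
    fragment type and each charge state from 1 to max_charge."""
    return [
        f"{frag}_z{charge}"
        for frag in frag_types
        for charge in range(1, max_charge + 1)
    ]


default_frag_types = charged_frag_types(["b", "y"], 2)
print(default_frag_types)
```

Passing a different list, for example only the singly charged types, would restrict which fragment columns are considered during feature extraction.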

extract_mobility_features(psm_df)
extract_rt_features(psm_df)
fine_tune_models(psm_df: DataFrame, ms2_file_dict: dict, ms2_file_type: str, frag_types_to_match: str, ms2_ppm: bool, ms2_tol: float)

Sample some raw files (n=`self.raw_num_to_tune`) from the MS2 files, extract spectrum/peak information, and then fine-tune the models.

Parameters:
  • psm_df (pd.DataFrame) – PSM DataFrame

  • ms2_file_dict (dict) – {raw_name: ms2_file_path}

  • ms2_file_type (str) – MS2 file type, could be ‘alphapept’, ‘mgf’, or ‘thermo_raw’

  • frag_types_to_match (list of str) – fragment types to match, e.g. ['b_z1', 'b_z2', 'y_z1', …]

  • ms2_ppm (bool) – whether the MS2 matching tolerance is given in ppm

  • ms2_tol (float) – tolerance value for MS2 matching
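The sampling step described above can be pictured as drawing a subset of the {raw_name: ms2_file_path} dict. The sketch below assumes that raw_num_to_tune is a number of raw files; sample_raw_files and its seed argument are hypothetical and only illustrate the idea, not how peptdeep selects files internally.

```python
import random


def sample_raw_files(ms2_file_dict, raw_num_to_tune, seed=None):
    """Hypothetical helper: pick at most raw_num_to_tune entries
    from a {raw_name: ms2_file_path} dict."""
    rng = random.Random(seed)
    raw_names = sorted(ms2_file_dict)  # sort for reproducible sampling
    n = min(raw_num_to_tune, len(raw_names))
    return {name: ms2_file_dict[name] for name in rng.sample(raw_names, n)}


# Placeholder file names, for illustration only.
files = {"raw_a": "raw_a.mgf", "raw_b": "raw_b.mgf", "raw_c": "raw_c.mgf"}
subset = sample_raw_files(files, raw_num_to_tune=2, seed=0)
```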

match_ms2(psm_df: DataFrame, ms2_file_dict, ms2_file_type: str, frag_types_to_match: list = ['b_z1', 'b_z2', 'y_z1', 'y_z2'], ms2_ppm=True, ms2_tol=20)
reset_by_global_settings()
class peptdeep.rescore.feature_extractor.ScoreFeatureExtractorMP(model_mgr: ModelManager)

Bases: ScoreFeatureExtractor

Methods:

__init__(model_mgr)

ScoreFeatureExtractorMP: Feature extractor for Percolator with multiprocessing.

extract_features(psm_df, ms2_file_dict, ...)

Extract features with multiprocessing and add columns (self.score_feature_list) to psm_df.

extract_features_one_raw(df_one_raw, ...)

extract_features_one_raw_mp(args)

fine_tune_models(psm_df, ms2_file_dict, ...)

Sample some raw files (n=`self.raw_num_to_tune`) from the MS2 files, extract spectrum/peak information with multiprocessing, and then fine-tune the models.

__init__(model_mgr: ModelManager)
ScoreFeatureExtractorMP: Feature extractor for Percolator with multiprocessing.

Parameters:

model_mgr (ModelManager) – The ModelManager in peptdeep.pretrained_models.

extract_features(psm_df: DataFrame, ms2_file_dict, ms2_file_type, frag_types: list = ['b_z1', 'b_z2', 'y_z1', 'y_z2'], ms2_ppm=True, ms2_tol=20.0) → DataFrame

Extract features with multiprocessing and add columns (self.score_feature_list) to psm_df.

Parameters:
  • psm_df (pd.DataFrame) – PSM DataFrame to extract features for

  • ms2_file_dict (dict) – MS2 file path dict: {raw_name: ms2_path}

  • ms2_file_type (str, optional) – MS2 file type, could be ‘alphapept’, ‘mgf’, or ‘thermo’.

  • frag_types (list, optional) – Fragment types. Defaults to alphabase.fragment.get_charged_frag_types(['b', 'y'], 2).

  • ms2_ppm (bool, optional) – Whether the MS2 matching tolerance is given in ppm. Defaults to True.

  • ms2_tol (float, optional) – MS2 matching mass tolerance. Defaults to 20.0.

Returns:

psm_df with feature columns added

Return type:

pd.DataFrame
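The ms2_ppm and ms2_tol parameters together define the matching window: with ms2_ppm=True the tolerance scales with m/z, otherwise it is an absolute value in Da. The sketch below shows that conversion and a simple closest-peak match. It is an illustration under stated assumptions (sorted peak list, closest peak wins, unmatched fragments get 0.0); tolerance_in_da and match_peaks are hypothetical names, not peptdeep functions.

```python
from bisect import bisect_left


def tolerance_in_da(mz, tol, ppm=True):
    """Convert a tolerance to Da at the given m/z."""
    return mz * tol * 1e-6 if ppm else tol


def match_peaks(frag_mzs, peak_mzs, peak_intens, tol=20.0, ppm=True):
    """For each fragment m/z, return the intensity of the closest peak
    within tolerance, or 0.0 if none. peak_mzs must be sorted ascending."""
    matched = []
    for mz in frag_mzs:
        delta = tolerance_in_da(mz, tol, ppm)
        i = bisect_left(peak_mzs, mz)
        best = None
        # Only the neighbours around the insertion point can be closest.
        for j in (i - 1, i):
            if 0 <= j < len(peak_mzs) and abs(peak_mzs[j] - mz) <= delta:
                if best is None or abs(peak_mzs[j] - mz) < abs(peak_mzs[best] - mz):
                    best = j
        matched.append(peak_intens[best] if best is not None else 0.0)
    return matched


peaks = [100.0, 500.005, 1000.05]
intens = [10.0, 20.0, 30.0]
frags = [500.0, 1000.0, 100.0005]
ppm_matches = match_peaks(frags, peaks, intens, tol=20.0, ppm=True)
da_matches = match_peaks(frags, peaks, intens, tol=0.1, ppm=False)
```

With a 20 ppm window the fragment at m/z 1000.0 is not matched to the peak at 1000.05 (the window is 0.02 Da there), while a 0.1 Da absolute window accepts it.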

extract_features_one_raw(df_one_raw: DataFrame, ms2_file, ms2_file_type, frag_types, ms2_ppm, ms2_tol, calibrate_frag_mass_error)
extract_features_one_raw_mp(args)
fine_tune_models(psm_df, ms2_file_dict, ms2_file_type, frag_types_to_match, ms2_ppm, ms2_tol)

Sample some raw files (n=`self.raw_num_to_tune`) from the MS2 files, extract spectrum/peak information with multiprocessing, and then fine-tune the models.

Parameters:
  • psm_df (pd.DataFrame) – PSM DataFrame

  • ms2_file_dict (dict) – {raw_name: ms2_file_path}

  • ms2_file_type (str) – MS2 file type, could be ‘alphapept’, ‘mgf’, or ‘thermo_raw’

  • frag_types_to_match (list of str) – fragment types to match, e.g. ['b_z1', 'b_z2', 'y_z1', …]

  • ms2_ppm (bool) – whether the MS2 matching tolerance is given in ppm

  • ms2_tol (float) – tolerance value for MS2 matching

peptdeep.rescore.feature_extractor.get_ms2_features(psm_df, frag_types, predict_intensity_df, matched_intensity_df, matched_mass_err_df) → DataFrame

Extract ms2 features from the given predict_intensity_df and matched_intensity_df. It will add columns into psm_df:

  • cos: cosine similarity between predicted and matched fragments

  • pcc: Pearson correlation between predicted and matched fragments

  • sa: spectral angle between predicted and matched fragments

  • spc: Spearman’s rank correlation between predicted and matched fragments.

  • cos_bion: …

  • cos_yion: …

  • pcc_bion: …

  • pcc_yion: …

  • sa_bion: …

  • sa_yion: …

  • spc_bion: …

  • spc_yion: …

  • matched_frag_ratio: # matched fragments / # total b+y fragments

  • matched_bion_ratio: # matched b fragments / # total b fragments

  • matched_yion_ratio: # matched y fragments / # total y fragments

  • and more …
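The four similarity metrics and the matched-fragment ratio listed above can be sketched for a single PSM with NumPy. This is an illustration under assumptions, not peptdeep's code: the spectral angle uses the common normalized form 1 - 2*arccos(cos)/pi, a fragment counts as matched when its matched intensity is non-zero, and spectrum_similarity is a hypothetical helper. The exact definitions used by get_ms2_features may differ.

```python
import numpy as np


def _ranks(x):
    """Average ranks (ties share their mean rank), as used by Spearman."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, kind="mergesort")
    ranks = np.empty(len(x), dtype=float)
    ranks[order] = np.arange(1, len(x) + 1)
    for value in np.unique(x):
        mask = x == value
        ranks[mask] = ranks[mask].mean()
    return ranks


def _pearson(a, b):
    """Pearson correlation; returns 0.0 if either vector is constant."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return float((a * b).sum() / denom) if denom > 0 else 0.0


def spectrum_similarity(predicted, matched):
    """Compare predicted and matched fragment intensities of one PSM."""
    p = np.asarray(predicted, dtype=float)
    m = np.asarray(matched, dtype=float)
    norm = np.linalg.norm(p) * np.linalg.norm(m)
    cos = float(p @ m / norm) if norm > 0 else 0.0
    cos_clipped = min(1.0, max(-1.0, cos))  # guard arccos against rounding
    return {
        "cos": cos,
        "pcc": _pearson(p, m),
        "sa": float(1.0 - 2.0 * np.arccos(cos_clipped) / np.pi),
        "spc": _pearson(_ranks(p), _ranks(m)),
        "matched_frag_ratio": float(np.count_nonzero(m) / len(m)),
    }


features = spectrum_similarity([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 4.0])
```

The per-ion variants (for example cos_bion and cos_yion) would apply the same computation to only the b-ion or y-ion columns.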

peptdeep.rescore.feature_extractor.get_ms2_features_mp(args)
peptdeep.rescore.feature_extractor.get_psm_scores(psm_df: DataFrame, predict_intensity_df: DataFrame, matched_intensity_df: DataFrame, matched_mass_err_df: DataFrame) → DataFrame

AlphaPeptDeep has a built-in score for PSMs; according to the authors, it works much better than other scores such as the X!Tandem score.

Parameters:
  • psm_df (pd.DataFrame) – PSM DataFrame

  • predict_intensity_df (pd.DataFrame) – Predicted intensity DataFrame

  • matched_intensity_df (pd.DataFrame) – Matched intensity DataFrame

  • matched_mass_err_df (pd.DataFrame) – Matched mass error DataFrame

Returns:

psm_df with “*_score” columns appended in place

Return type:

DataFrame

peptdeep.rescore.feature_extractor.match_one_raw(psm_df_one_raw, ms2_file, ms2_file_type, frag_types_to_match, ms2_ppm, ms2_tol, calibrate_frag_mass_error)

Internal function

peptdeep.rescore.feature_extractor.match_one_raw_mp(args)
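The *_mp(args) functions in this module each take a single args argument. A common reason for this shape in Python is that multiprocessing.Pool.imap passes exactly one object per task, so a thin wrapper unpacks a tuple and calls the real worker. Whether peptdeep's wrappers do exactly this is an assumption; the sketch below uses hypothetical names (count_peaks, count_peaks_mp, run_all) and a toy worker in place of per-raw-file matching.

```python
from multiprocessing import Pool


def count_peaks(raw_name, peak_mzs, min_mz):
    """Toy per-file worker: count peaks at or above min_mz."""
    return raw_name, sum(1 for mz in peak_mzs if mz >= min_mz)


def count_peaks_mp(args):
    """Single-argument wrapper so the worker can be used with Pool.imap."""
    return count_peaks(*args)


def run_all(tasks, n_workers=1):
    """Run one task per raw file, serially or in a process pool."""
    if n_workers <= 1:
        return list(map(count_peaks_mp, tasks))
    with Pool(n_workers) as pool:
        # imap preserves task order in its results.
        return list(pool.imap(count_peaks_mp, tasks))


tasks = [("raw_a", [100.0, 200.0, 300.0], 150.0), ("raw_b", [50.0], 150.0)]
results = run_all(tasks)
```

Calling run_all(tasks, n_workers=2) would distribute the same tasks over two processes; on platforms that spawn new processes, that call should sit under an if __name__ == "__main__": guard.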