peptdeep.rescore.feature_extractor
Classes:
ScoreFeatureExtractor: Feature extractor for percolator
Functions:
get_ms2_features – Extract ms2 features from the given predict_intensity_df and matched_intensity_df.
get_psm_scores – AlphaPeptDeep has a built-in score for PSMs; it works much better than other scores such as X!Tandem
Internal function
- class peptdeep.rescore.feature_extractor.ScoreFeatureExtractor(model_mgr: ModelManager)
Bases: object
ScoreFeatureExtractor: Feature extractor for percolator with a single process.
- Parameters:
model_mgr (ModelManager) – The ModelManager in peptdeep.pretrained_models.
Methods:
__init__(model_mgr)
extract_features(psm_df, ms2_file_dict, ...) – Extract features and add columns (self.score_feature_list) into psm_df
extract_mobility_features(psm_df)
extract_rt_features(psm_df)
fine_tune_models(psm_df, ms2_file_dict, ...) – Sample some (n=`self.raw_num_to_tune`) from ms2 files, and extract spectrum/peak information, and then fine-tune the models.
match_ms2(psm_df, ms2_file_dict, ms2_file_type)
- __init__(model_mgr: ModelManager)
- extract_features(psm_df: DataFrame, ms2_file_dict, ms2_file_type, frag_types: list = ['b_z1', 'b_z2', 'y_z1', 'y_z2'], ms2_ppm=True, ms2_tol=20.0) → DataFrame
Extract features and add columns (self.score_feature_list) into psm_df
- Parameters:
psm_df (pd.DataFrame) – PSM DataFrame to extract features from
ms2_file_dict (dict) – MS2 file path dict: {raw_name: ms2_path}
ms2_file_type (str) – MS2 file type, could be ‘alphapept’, ‘mgf’, or ‘raw’.
frag_types (list, optional) – fragment types. Defaults to alphabase.fragment.get_charged_frag_types(['b','y'], 2).
ms2_ppm (bool, optional) – Whether the MS2 matching tolerance is given in ppm. Defaults to True.
ms2_tol (float, optional) – MS2 matching mass tolerance. Defaults to 20.0.
- Returns:
psm_df with feature columns added
- Return type:
pd.DataFrame
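As a rough illustration of how `ms2_ppm` and `ms2_tol` might interact during fragment matching, the sketch below converts the tolerance into an absolute m/z window. The helpers `mass_tolerance` and `is_match` are hypothetical names introduced here for illustration; they are not part of peptdeep, and the library's actual matching code may differ.

```python
def mass_tolerance(mz: float, ms2_tol: float = 20.0, ms2_ppm: bool = True) -> float:
    """Return the absolute matching window (in m/z units) around `mz`.

    Hypothetical helper, not a peptdeep function.
    """
    if ms2_ppm:
        # A ppm tolerance scales with the fragment m/z.
        return mz * ms2_tol * 1e-6
    # Otherwise the tolerance is taken as an absolute value.
    return ms2_tol


def is_match(theoretical_mz: float, observed_mz: float,
             ms2_tol: float = 20.0, ms2_ppm: bool = True) -> bool:
    """Check whether an observed peak lies within the tolerance window."""
    window = mass_tolerance(theoretical_mz, ms2_tol, ms2_ppm)
    return abs(observed_mz - theoretical_mz) <= window
```

With the defaults (20 ppm), a fragment at m/z 500 gets a window of about 0.01, so an observed peak at 500.005 would match while one at 500.02 would not.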
- fine_tune_models(psm_df: DataFrame, ms2_file_dict: dict, ms2_file_type: str, frag_types_to_match: str, ms2_ppm: bool, ms2_tol: float)
Sample some (n=`self.raw_num_to_tune`) from ms2 files, and extract spectrum/peak information, and then fine-tune the models.
- Parameters:
psm_df (pd.DataFrame) – PSM DataFrame
ms2_file_dict (dict) – MS2 file path dict: {raw_name: ms2_file_path}
ms2_file_type (str) – MS2 file type, could be ‘alphapept’, ‘mgf’, or ‘thermo_raw’
frag_types_to_match (str) – fragment types to match: ['b_z1','b_z2','y_z1',…]
ms2_ppm (bool) – whether the MS2 matching tolerance is in ppm
ms2_tol (float) – tolerance value for MS2 matching
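The `{raw_name: ms2_file_path}` mapping can be assembled from a list of file paths. The sketch below assumes that the raw name is the file name without its extension; that convention, the helper name `build_ms2_file_dict`, and the example paths are illustrative assumptions, so the keys should be checked against the raw names actually used in `psm_df`.

```python
from pathlib import Path


def build_ms2_file_dict(ms2_paths):
    """Map each raw name to its MS2 file path.

    Assumption for this sketch: raw name == file name without extension.
    """
    return {Path(p).stem: str(p) for p in ms2_paths}


# Hypothetical example paths.
ms2_file_dict = build_ms2_file_dict(["data/run_A.mgf", "data/run_B.mgf"])
```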
- class peptdeep.rescore.feature_extractor.ScoreFeatureExtractorMP(model_mgr: ModelManager)
Bases: ScoreFeatureExtractor
Methods:
__init__(model_mgr) – ScoreFeatureExtractorMP: Feature extractor for percolator
extract_features(psm_df, ms2_file_dict, ...) – Extract (multiprocessing) features and add columns (self.score_feature_list) into psm_df.
extract_features_one_raw(df_one_raw, ...)
fine_tune_models(psm_df, ms2_file_dict, ...) – Sample some (n=`self.raw_num_to_tune`) from ms2 files, and extract (MP) spectrum/peak information, and then fine-tune the models.
- __init__(model_mgr: ModelManager)
ScoreFeatureExtractorMP: Feature extractor for percolator with multiprocessing.
- Parameters:
model_mgr (ModelManager) – The ModelManager in peptdeep.pretrained_models.
- extract_features(psm_df: DataFrame, ms2_file_dict, ms2_file_type, frag_types: list = ['b_z1', 'b_z2', 'y_z1', 'y_z2'], ms2_ppm=True, ms2_tol=20.0) → DataFrame
Extract (multiprocessing) features and add columns (self.score_feature_list) into psm_df.
- Parameters:
psm_df (pd.DataFrame) – PSM DataFrame to extract features from
ms2_file_dict (dict) – MS2 file path dict: {raw_name: ms2_path}
ms2_file_type (str) – MS2 file type, could be ‘alphapept’, ‘mgf’, or ‘thermo’.
frag_types (list, optional) – fragment types. Defaults to alphabase.fragment.get_charged_frag_types(['b','y'], 2).
ms2_ppm (bool, optional) – Whether the MS2 matching tolerance is given in ppm. Defaults to True.
ms2_tol (float, optional) – MS2 matching mass tolerance. Defaults to 20.0.
- Returns:
psm_df with feature columns added
- Return type:
pd.DataFrame
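The default `frag_types` follow a `<ion type>_z<charge>` naming pattern. The sketch below reproduces that pattern in plain Python to show how `['b','y']` with a maximum charge of 2 expands to the default list in the signature. `charged_frag_types` is a hypothetical stand-in written for this example, not the alphabase function itself.

```python
def charged_frag_types(frag_types, max_charge):
    """Expand ion types into charged fragment type names.

    Hypothetical re-implementation of the naming pattern only.
    """
    return [
        f"{frag}_z{charge}"
        for frag in frag_types
        for charge in range(1, max_charge + 1)
    ]


print(charged_frag_types(["b", "y"], 2))
# ['b_z1', 'b_z2', 'y_z1', 'y_z2']
```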
- extract_features_one_raw(df_one_raw: DataFrame, ms2_file, ms2_file_type, frag_types, ms2_ppm, ms2_tol, calibrate_frag_mass_error)
- fine_tune_models(psm_df, ms2_file_dict, ms2_file_type, frag_types_to_match, ms2_ppm, ms2_tol)
Sample some (n=`self.raw_num_to_tune`) from ms2 files, and extract (MP) spectrum/peak information, and then fine-tune the models.
- Parameters:
psm_df (pd.DataFrame) – PSM DataFrame
ms2_file_dict (dict) – MS2 file path dict: {raw_name: ms2_file_path}
ms2_file_type (str) – MS2 file type, could be ‘alphapept’, ‘mgf’, or ‘thermo_raw’
frag_types_to_match (str) – fragment types to match: ['b_z1','b_z2','y_z1',…]
ms2_ppm (bool) – whether the MS2 matching tolerance is in ppm
ms2_tol (float) – tolerance value for MS2 matching
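The sampling step described for `fine_tune_models` can be pictured as drawing a subset of entries from `ms2_file_dict`. The sketch below is only an illustration under that reading: `sample_raw_files` and its `seed` argument are hypothetical, and how peptdeep actually selects files (or spectra within them) is not specified here.

```python
import random


def sample_raw_files(ms2_file_dict, raw_num_to_tune, seed=None):
    """Pick up to `raw_num_to_tune` raw files at random.

    Hypothetical helper; not the peptdeep implementation.
    """
    rng = random.Random(seed)
    raw_names = sorted(ms2_file_dict)  # stable order before sampling
    n = min(raw_num_to_tune, len(raw_names))
    return {name: ms2_file_dict[name] for name in rng.sample(raw_names, n)}
```

Capping `n` at the number of available files avoids an error when fewer raw files exist than requested.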
- peptdeep.rescore.feature_extractor.get_ms2_features(psm_df, frag_types, predict_intensity_df, matched_intensity_df, matched_mass_err_df) → DataFrame
Extract ms2 features from the given predict_intensity_df and matched_intensity_df. It will add columns into psm_df:
cos: cosine similarity between predicted and matched fragments
pcc: Pearson correlation between predicted and matched fragments
sa: spectral angle between predicted and matched fragments
spc: Spearman’s rank correlation between predicted and matched fragments.
cos_bion: …
cos_yion: …
pcc_bion: …
pcc_yion: …
sa_bion: …
sa_yion: …
spc_bion: …
spc_yion: …
matched_frag_ratio: # matched fragments / # total b+y fragments
matched_bion_ratio: # matched b fragments / # total b fragments
matched_yion_ratio: # matched y fragments / # total y fragments
and more …
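To make the similarity features above concrete, here is a minimal pure-Python sketch of `cos`, `pcc`, `sa`, `spc`, and a matched-fragment ratio for a single PSM, given two equal-length intensity lists. It rests on assumptions that are not confirmed by this page: the spectral angle uses the common normalized form 1 − 2·arccos(cos)/π, and a fragment counts as matched when its matched intensity is greater than zero. All function names are hypothetical and the peptdeep implementation may differ.

```python
import math


def cosine(pred, matched):
    """Cosine similarity between two intensity vectors."""
    dot = sum(p * m for p, m in zip(pred, matched))
    norm = math.sqrt(sum(p * p for p in pred)) * math.sqrt(sum(m * m for m in matched))
    return dot / norm if norm > 0 else 0.0


def pearson(pred, matched):
    """Pearson correlation, computed as the cosine of mean-centered vectors."""
    n = len(pred)
    mean_p, mean_m = sum(pred) / n, sum(matched) / n
    return cosine([p - mean_p for p in pred], [m - mean_m for m in matched])


def spectral_angle(pred, matched):
    """Normalized spectral angle (assumed form): 1 - 2*arccos(cos)/pi."""
    cos = max(-1.0, min(1.0, cosine(pred, matched)))  # guard against rounding
    return 1.0 - 2.0 * math.acos(cos) / math.pi


def ranks(values):
    """Ranks starting at 1, with ties given their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(pred, matched):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(pred), ranks(matched))


def matched_ratio(matched):
    """Fraction of fragments with a matched intensity above zero (assumption)."""
    return sum(1 for m in matched if m > 0) / len(matched)
```

The per-ion-type variants (for example `cos_bion`) would apply the same functions to only the b or only the y columns of the two intensity tables.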
- peptdeep.rescore.feature_extractor.get_psm_scores(psm_df: DataFrame, predict_intensity_df: DataFrame, matched_intensity_df: DataFrame, matched_mass_err_df: DataFrame) → DataFrame
AlphaPeptDeep has a built-in score for PSMs; it works much better than other scores such as X!Tandem
- Parameters:
psm_df (pd.DataFrame) – PSM DataFrame
predict_intensity_df (pd.DataFrame) – Predict intensity DataFrame
matched_intensity_df (pd.DataFrame) – Matched intensity DataFrame
matched_mass_err_df (pd.DataFrame) – Matched mass error DataFrame
- Returns:
psm_df with “*_score” columns appended in place
- Return type:
DataFrame