peptdeep.model.featurize
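Functions:
| get_ascii_indices(seq_array) | Convert peptide sequences into ASCII code array. |
| get_batch_aa_indices(seq_array) | Convert peptide sequences into AA ID array. |
| get_batch_mod_feature(batch_df) | |
| parse_mod_feature(nAA, mod_names, mod_sites) | Get modification feature of a given peptide (len=nAA). |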
- peptdeep.model.featurize.get_ascii_indices(seq_array: List | ndarray) -> ndarray
Convert peptide sequences into ASCII code array. The values are from 0 to 127. A zero is padded at both the N- and C-term of each sequence.
- Parameters:
seq_array (Union[List,np.ndarray]) – list or 1-D array of sequences.
- Returns:
2-D np.int32 array with the shape (len(seq_array), max seq length+2). Sequences shorter than the max seq length are zero-padded in the remaining positions.
- Return type:
np.ndarray
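The padding behaviour described above can be illustrated with a minimal NumPy sketch. This is not the library's own code: `ascii_indices_sketch` is a hypothetical name, and the approach (viewing a fixed-width unicode array as int32) is just one way to get the documented output shape.

```python
import numpy as np

def ascii_indices_sketch(seq_array):
    """Hypothetical re-implementation of the documented behaviour."""
    # A fixed-width unicode array: NumPy null-pads the shorter strings.
    arr = np.array(seq_array, dtype="U")
    # Each character is stored as 4 bytes (UCS-4), so an int32 view
    # exposes one code point per character; nulls become zeros.
    codes = arr.view(np.int32).reshape(len(arr), -1)
    # Add one zero column on each side for the N-term and C-term.
    return np.pad(codes, ((0, 0), (1, 1)))

example = ascii_indices_sketch(["AB", "ACD"])
print(example.shape)  # (2, 5), i.e. (len(seq_array), max seq length + 2)
```

Here `"AB"` is one residue shorter than `"ACD"`, so its row ends with an extra zero in addition to the C-term padding.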
- peptdeep.model.featurize.get_batch_aa_indices(seq_array: List | ndarray) -> ndarray
Convert peptide sequences into AA ID array. ID=0 is reserved for masking, so ID of ‘A’ is 1, ID of ‘B’ is 2, …, ID of ‘Z’ is 26 (maximum). A zero is padded at both the N- and C-term of each sequence.
- Parameters:
seq_array (Union[List,np.ndarray]) – list or 1-D array of sequences with the same length
- Returns:
2-D np.int32 array with the shape (len(seq_array), len(seq_array[0])+2). A zero is padded at the N- and C-term of each sequence, so the second dimension (axis 1) is len(seq_array[0])+2.
- Return type:
np.ndarray
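The ID mapping (‘A’ is 1 through ‘Z’ is 26, with 0 reserved) can be sketched as follows. The function name `aa_indices_sketch` is hypothetical and the code only mirrors the documented behaviour; it is not taken from the library.

```python
import numpy as np

def aa_indices_sketch(seq_array):
    """Hypothetical illustration of the AA ID encoding for equal-length sequences."""
    seqs = list(seq_array)
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("all sequences must have the same length")
    # Map 'A'..'Z' to 1..26 so that 0 stays free for masking/padding.
    ids = np.array(
        [[ord(ch) - ord("A") + 1 for ch in s] for s in seqs],
        dtype=np.int32,
    )
    # One zero column on each side for the N-term and C-term.
    return np.pad(ids, ((0, 0), (1, 1)))

ids = aa_indices_sketch(["ACZ", "BBA"])
print(ids.shape)  # (2, 5), i.e. (len(seq_array), len(seq_array[0]) + 2)
```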
- peptdeep.model.featurize.get_batch_mod_feature(batch_df: DataFrame) -> ndarray
- Parameters:
batch_df (pd.DataFrame) – dataframe with ‘sequence’, ‘mods’, ‘mod_sites’ and ‘nAA’ columns. All sequence lengths must be the same, meaning that nAA values must be equal.
- Returns:
3-D tensor with shape (batch_size, nAA+2, mod_feature_size)
- Return type:
np.ndarray
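To make the returned shape concrete, here is a rough sketch under several assumptions that the text above does not confirm: the ‘mods’ and ‘mod_sites’ columns are taken to be `;`-separated strings, `TOY_MOD_FEATURES` is a hypothetical two-dimensional feature table with placeholder modification names, and features for modifications sharing a site are summed. The real feature vectors and their size come from the library.

```python
import numpy as np
import pandas as pd

# Hypothetical toy feature table; names and vectors are placeholders.
TOY_MOD_FEATURES = {"modA": [1.0, 0.0], "modB": [0.0, 1.0]}
TOY_FEATURE_SIZE = 2

def batch_mod_feature_sketch(batch_df):
    """Stack per-peptide modification features into one 3-D array."""
    nAA_values = batch_df["nAA"].unique()
    if len(nAA_values) != 1:
        raise ValueError("all peptides in a batch must share the same nAA")
    nAA = int(nAA_values[0])
    out = np.zeros((len(batch_df), nAA + 2, TOY_FEATURE_SIZE))
    rows = zip(batch_df["mods"], batch_df["mod_sites"])
    for i, (mods, sites) in enumerate(rows):
        if not mods:  # unmodified peptide: leave the row as zeros
            continue
        # Assumption: ';'-separated names and integer sites.
        for name, site in zip(mods.split(";"), sites.split(";")):
            out[i, int(site)] += TOY_MOD_FEATURES[name]
    return out

df = pd.DataFrame({
    "sequence": ["ACDK", "MKPR"],
    "mods": ["modA", "modA;modB"],
    "mod_sites": ["0", "1;-1"],
    "nAA": [4, 4],
})
print(batch_mod_feature_sketch(df).shape)  # (2, 6, 2)
```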
- peptdeep.model.featurize.parse_mod_feature(nAA: int, mod_names: List[str], mod_sites: List[int]) -> ndarray
Get modification feature of a given peptide (len=nAA). Note that site=0 is for peptide N-term modification, site=-1 is for peptide C-term modification, and 1<=site<=nAA is for residue modifications on the peptide.
- Parameters:
nAA (int) – the length of the peptide sequence
mod_names (List[str]) – the modification names
mod_sites (List[int]) – the modification sites corresponding to mod_names on the peptide
- Returns:
2-D feature array with shape (nAA+2,mod_feature_size)
- Return type:
np.ndarray
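The site convention maps naturally onto row indices of an array with nAA+2 rows: site 0 is the first row, site -1 is the last row, and sites 1..nAA are the residue rows in between. The sketch below illustrates this with a hypothetical three-dimensional feature table (`TOY_MOD_FEATURES`) and placeholder modification names; summing features that share a site is an assumption of this sketch, not a documented behaviour.

```python
import numpy as np

# Hypothetical toy feature table; names and vectors are placeholders.
TOY_MOD_FEATURES = {
    "modA": [1.0, 0.0, 0.0],
    "modB": [0.0, 1.0, 0.0],
    "modC": [0.0, 0.0, 1.0],
}
TOY_FEATURE_SIZE = 3

def mod_feature_sketch(nAA, mod_names, mod_sites):
    """Place each modification's feature vector at its site row."""
    feature = np.zeros((nAA + 2, TOY_FEATURE_SIZE))
    for name, site in zip(mod_names, mod_sites):
        # site 0 -> row 0 (N-term); site -1 -> row nAA+1 (C-term),
        # via Python's negative indexing; 1..nAA -> residue rows.
        feature[site] += TOY_MOD_FEATURES[name]
    return feature

feat = mod_feature_sketch(4, ["modA", "modB", "modC"], [0, 2, -1])
print(feat.shape)  # (6, 3), i.e. (nAA + 2, feature size)
```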