peptdeep.model.model_interface

Classes:

ModelInterface([device, fixed_sequence_len, ...])

Provides standardized methods to interact with ML models.

Functions:

`append_nAA_column_if_missing`(precursor_df)	Append a column containing the number of amino acids (nAA) if it is missing.
`get_cosine_schedule_with_warmup`(optimizer, ...)	Create a schedule with a learning rate that increases linearly from 0 to the initial lr set in the optimizer during a warmup period, then decreases from that initial lr to 0 following the values of the cosine function.

class peptdeep.model.model_interface.ModelInterface(device: str = 'gpu', fixed_sequence_len: int = 0, min_pred_value: float = 0.0, **kwargs)

Bases: object

Provides standardized methods to interact with ML models. To use it, subclass it and override the abstract (i.e. not implemented) methods.

Methods:

`__init__`([device, fixed_sequence_len, ...])	Initialize the interface; device is one of 'get_available', 'cpu', 'mps', 'gpu' (or 'cuda').
`build`(model_class, **kwargs)	Builds the model by specifying the PyTorch module, the parameters, the device, the loss function ...
`build_from_py_codes`(model_code_file_or_zip)	Build the model based on a Python file.
`get_parameter_num`()	Get total number of parameters in model.
`load`(model_file[, model_path_in_zip])	Load a model specified in a zip file, a text file or a file stream.
`predict`(precursor_df, *[, batch_size, verbose])	The model predicts the properties based on the inputs it has been trained for.
`predict_mp`(precursor_df, *[, batch_size, ...])	Predict with multiprocessing if no GPUs are available.
`save`(filename)	Save the model state, the constants used, the code defining the model and the model parameters.
`set_bert_trainable`([bert_layer_name, ...])
`set_device`([device_type, device_ids])	Set the device (e.g. gpu (cuda), mps, cpu, ...) to be used for the model.
`set_layer_trainable`([layer_names, trainable])
`set_lr`(lr)	Set the learning rate.
`train`(precursor_df, *[, batch_size, epoch, ...])	Train the model according to specifications.
`train_with_warmup`(precursor_df, *[, ...])	Train the model according to specifications.

Attributes:

`device`	Read-only
`device_ids`	Read-only
`device_type`	Read-only
`fixed_sequence_len`	Controls how variable-length sequences are batched for training and inference.
`min_pred_value`	The predicted values cannot be smaller than this value.
`target_column_to_predict`
`target_column_to_train`

__init__(device: str = 'gpu', fixed_sequence_len: int = 0, min_pred_value: float = 0.0, **kwargs)

Parameters:

device (str, optional) – Device type, one of ‘get_available’, ‘cpu’, ‘mps’, ‘gpu’ (or ‘cuda’). By default ‘gpu’.
fixed_sequence_len (int, optional) – See fixed_sequence_len, defaults to 0.
min_pred_value (float, optional) – See min_pred_value, defaults to 0.0.

build(model_class: Module, **kwargs): Builds the model by specifying the PyTorch module, the parameters, the device, the loss function …

build_from_py_codes(model_code_file_or_zip: str, code_file_in_zip: str = None, include_model_params_yaml: bool = True, **kwargs): Build the model based on a Python file. The file must contain a PyTorch model implemented as ‘class Model(…’.

property device: device: Read-only

property device_ids: list: Read-only

property device_type: str: Read-only

property fixed_sequence_len: int

This attribute controls how variable-length sequences are batched for training and inference:

if the value is 0, sequences are grouped by length (nAA), so each batch used for training or inference contains only sequences with the same nAA.
if the value is > 0, all sequence tensors are zero-padded to this fixed length.
if the value is < 0, the sequences in each batch are zero-padded to the maximum length within that batch.
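A minimal NumPy sketch of the three batching modes, assuming sequences are already encoded as 1-D integer arrays and that 0 is the padding value. The helpers `pad_to` and `make_batches` are hypothetical illustrations, not part of peptdeep's API.

```python
from collections import defaultdict
from typing import List

import numpy as np


def pad_to(seqs: List[np.ndarray], length: int) -> np.ndarray:
    """Right-pad each 1-D integer array with zeros to `length`."""
    out = np.zeros((len(seqs), length), dtype=np.int64)
    for i, s in enumerate(seqs):
        # Raises ValueError if a sequence is longer than `length`.
        out[i, : len(s)] = s
    return out


def make_batches(
    seqs: List[np.ndarray], batch_size: int, fixed_sequence_len: int
) -> List[np.ndarray]:
    """Hypothetical illustration of the three fixed_sequence_len modes."""
    batches = []
    if fixed_sequence_len == 0:
        # Group by sequence length (nAA): no padding is needed inside a batch.
        by_len = defaultdict(list)
        for s in seqs:
            by_len[len(s)].append(s)
        for _, group in sorted(by_len.items()):
            for start in range(0, len(group), batch_size):
                batches.append(np.stack(group[start : start + batch_size]))
    else:
        for start in range(0, len(seqs), batch_size):
            chunk = seqs[start : start + batch_size]
            if fixed_sequence_len > 0:
                # Every batch is padded to the same global length.
                length = fixed_sequence_len
            else:
                # Each batch is padded only to its own longest sequence.
                length = max(len(s) for s in chunk)
            batches.append(pad_to(chunk, length))
    return batches
```

Grouping by nAA (0) avoids padding entirely but can yield many small batches when lengths vary widely; the padding modes keep batch sizes uniform at the cost of some computation on zeros.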

get_parameter_num(): Get total number of parameters in model.

load(model_file: Tuple[str, IO], model_path_in_zip: str = None, **kwargs): Load a model specified in a zip file, a text file or a file stream.

property min_pred_value: float: The predicted values cannot be smaller than this value.
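One way to enforce such a floor on raw model outputs is an element-wise maximum. This is a sketch assuming predictions are held in a NumPy array; the helper name `apply_min_pred_value` is hypothetical and peptdeep's internal mechanism may differ.

```python
import numpy as np


def apply_min_pred_value(predictions: np.ndarray, min_pred_value: float = 0.0) -> np.ndarray:
    """Clamp predictions so that no value falls below min_pred_value."""
    # np.maximum broadcasts the scalar floor over every element.
    return np.maximum(predictions, min_pred_value)
```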

predict(precursor_df: DataFrame, *, batch_size: int = 1024, verbose: bool = False, **kwargs) → DataFrame: Predict the properties the model has been trained for from the inputs in precursor_df. Returns the output as a pandas DataFrame.

predict_mp(precursor_df: DataFrame, *, batch_size: int = 1024, mp_batch_size: int = 100000, process_num: int = 16, **kwargs) → DataFrame: Predict with multiprocessing if no GPUs are available. Note that this multiprocessing method only works for models that predict values within (in place of) the precursor_df.

save(filename: str): Save the model state, the constants used, the code defining the model and the model parameters.

set_bert_trainable(bert_layer_name='hidden_nn', bert_layer_idxes=[1, 2], trainable=True)

set_device(device_type: str = 'gpu', device_ids: list = [])

Set the device (e.g. gpu (cuda), mps, cpu, …) to be used for the model.

Parameters:

device_type (str, optional) – Device type, see peptdeep.utils.torch_device_dict. If device_type is ‘get_available’, available devices are detected with peptdeep.utils.get_available_device(). By default ‘gpu’.
device_ids (list, optional) – List of int. Device ids for cuda/gpu (e.g. [1,3] for cuda:1,3). By default []
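To illustrate how device_type and device_ids relate to PyTorch-style device strings, here is a hypothetical helper. It is not peptdeep's implementation: the alias table is an assumption modeled on the accepted values listed above (peptdeep.utils.torch_device_dict is the authoritative mapping), and ‘get_available’ is omitted because it requires probing the hardware.

```python
from typing import List, Optional, Tuple

# Assumed alias table; see peptdeep.utils.torch_device_dict for the real mapping.
DEVICE_ALIASES = {"gpu": "cuda", "cuda": "cuda", "mps": "mps", "cpu": "cpu"}


def resolve_device(
    device_type: str = "gpu", device_ids: Optional[List[int]] = None
) -> Tuple[str, List[str]]:
    """Return the primary device string and the list of all devices to use."""
    torch_type = DEVICE_ALIASES[device_type.lower()]
    ids = device_ids or []
    if torch_type == "cuda" and ids:
        # e.g. [1, 3] -> primary "cuda:1", all ["cuda:1", "cuda:3"]
        devices = [f"cuda:{i}" for i in ids]
        return devices[0], devices
    # Without explicit ids (or for non-CUDA backends) use the default device.
    return torch_type, [torch_type]
```

The returned strings are in the form accepted by `torch.device(...)`. The sketch takes `None` instead of a mutable `[]` default for device_ids, which is the usual Python idiom.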

set_layer_trainable(layer_names=[], trainable=True)

set_lr(lr: float): Set the learning rate.

property target_column_to_predict: str

property target_column_to_train: str

train(precursor_df: DataFrame, *, batch_size=1024, epoch=10, warmup_epoch: int = 0, lr=0.0001, verbose=False, verbose_each_epoch=False, **kwargs): Train the model according to specifications.

train_with_warmup(precursor_df: DataFrame, *, batch_size=1024, epoch=10, warmup_epoch=5, lr=0.0001, verbose=False, verbose_each_epoch=False, **kwargs): Train the model according to specifications. Includes a warmup phase, using a learning-rate schedule that increases linearly and then decreases following a cosine curve.

peptdeep.model.model_interface.append_nAA_column_if_missing(precursor_df): Append a column containing the number of amino acids (nAA) if it is missing.
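A minimal pandas sketch of this behavior, assuming the peptide sequences are stored in a `sequence` column (an assumption) and the count goes into the `nAA` column used elsewhere on this page. The name `append_nAA_sketch` is hypothetical, chosen to avoid confusion with the library function.

```python
import pandas as pd


def append_nAA_sketch(precursor_df: pd.DataFrame) -> pd.DataFrame:
    """Add an 'nAA' column (sequence length) only if it is not already present."""
    if "nAA" not in precursor_df.columns:
        # Assumes plain one-letter sequences, so string length == residue count.
        precursor_df["nAA"] = precursor_df["sequence"].str.len()
    # The frame is modified in place and also returned for convenience.
    return precursor_df
```

An existing `nAA` column is left untouched, so calling the function repeatedly is harmless.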

peptdeep.model.model_interface.get_cosine_schedule_with_warmup(optimizer, num_warmup_steps: int, num_training_steps: int, num_cycles: float = 0.5, last_epoch: int = -1)

Create a schedule with a learning rate that increases linearly from 0 to the initial lr set in the optimizer during a warmup period, then decreases from that initial lr to 0 following the values of the cosine function.

Parameters:

optimizer (torch.optim.Optimizer) – The optimizer for which to schedule the learning rate.
num_warmup_steps (int) – The number of steps for the warmup phase.
num_training_steps (int) – The total number of training steps.
num_cycles (float, optional, defaults to 0.5) – The number of waves in the cosine schedule (the default is to just decrease from the max value to 0 following a half-cosine).
last_epoch (int, optional, defaults to -1) – The index of the last epoch when resuming training.

Returns:

torch.optim.lr_scheduler.LambdaLR with the appropriate schedule.
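The per-step multiplier that such a LambdaLR applies to the initial lr can be sketched in plain Python. This follows the widely used formulation from Hugging Face transformers, which the description above closely matches; treat it as an assumption about peptdeep's exact code. The function name `cosine_with_warmup_factor` is hypothetical.

```python
import math


def cosine_with_warmup_factor(
    current_step: int,
    num_warmup_steps: int,
    num_training_steps: int,
    num_cycles: float = 0.5,
) -> float:
    """Learning-rate multiplier in [0, 1] for a given optimizer step."""
    if current_step < num_warmup_steps:
        # Linear warmup: the multiplier goes from 0 toward 1.
        return current_step / max(1, num_warmup_steps)
    # Fraction of the post-warmup phase completed, from 0 to 1.
    progress = (current_step - num_warmup_steps) / max(
        1, num_training_steps - num_warmup_steps
    )
    # With num_cycles=0.5 this is a half-cosine from 1 down to 0.
    return max(0.0, 0.5 * (1.0 + math.cos(math.pi * num_cycles * 2.0 * progress)))
```

With PyTorch, the factor can be wrapped as `torch.optim.lr_scheduler.LambdaLR(optimizer, lambda step: cosine_with_warmup_factor(step, num_warmup_steps, num_training_steps))`, calling `scheduler.step()` once per training step.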