peptdeep.model.model_interface

Classes:

CallbackHandler()

A CallbackHandler class that can be used to add callbacks to the training process for both epoch-level and batch-level events.

LR_SchedulerInterface(optimizer, **kwargs)

ModelInterface([device, fixed_sequence_len, ...])

Provides standardized methods to interact with ML models.

WarmupLR_Scheduler(optimizer, ...[, ...])

A learning rate scheduler that includes a warmup phase and then a cosine annealing phase.

Functions:

append_nAA_column_if_missing(precursor_df)

Append a column containing the number of amino acids (nAA) if it is missing.

class peptdeep.model.model_interface.CallbackHandler

Bases: object

A CallbackHandler class that can be used to add callbacks to the training process for both epoch-level and batch-level events. To have more control over the training process, you can create a subclass of this class and override the methods you need.

Methods:

batch_callback(batch, batch_loss)

This method will be called at the end of each batch.

epoch_callback(epoch, epoch_loss)

This method will be called at the end of each epoch.

batch_callback(batch: int, batch_loss: float)

This method will be called at the end of each batch.

Parameters:
  • batch (int) – The current batch number.

  • batch_loss (float) – The loss value of the current batch.

epoch_callback(epoch: int, epoch_loss: float) → bool

This method will be called at the end of each epoch. The callback can also stop the training by returning False. If the return value is None or True, the training continues.

Parameters:
  • epoch (int) – The current epoch number.

  • epoch_loss (float) – The loss value of the current epoch.

Returns:

continue_training – If False, the training will stop.

Return type:

bool
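As an illustration of the return-value contract described above, the following is a minimal sketch of an early-stopping handler. The local `CallbackHandler` class is only a stand-in that mirrors the documented method signatures so the example runs without peptdeep installed; `EarlyStoppingHandler`, `patience`, `min_delta`, and `run_epochs` are hypothetical names, not part of the library.

```python
class CallbackHandler:
    """Stand-in mirroring the documented interface (not the peptdeep class)."""

    def batch_callback(self, batch: int, batch_loss: float) -> None:
        pass

    def epoch_callback(self, epoch: int, epoch_loss: float) -> bool:
        return True


class EarlyStoppingHandler(CallbackHandler):
    """Hypothetical subclass: stop once the epoch loss stops improving."""

    def __init__(self, patience: int = 2, min_delta: float = 0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best_loss = float("inf")
        self.bad_epochs = 0

    def epoch_callback(self, epoch: int, epoch_loss: float) -> bool:
        if epoch_loss < self.best_loss - self.min_delta:
            # The loss improved: remember it and reset the counter.
            self.best_loss = epoch_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        # Returning False asks the training loop to stop.
        return self.bad_epochs < self.patience


def run_epochs(handler: CallbackHandler, losses) -> int:
    """Toy loop standing in for training; returns the number of epochs run."""
    epochs_run = 0
    for epoch, loss in enumerate(losses):
        epochs_run += 1
        if handler.epoch_callback(epoch, loss) is False:
            break
    return epochs_run
```

With a real model, such a handler would presumably be subclassed from the library's CallbackHandler and registered through set_callback_handler.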

class peptdeep.model.model_interface.LR_SchedulerInterface(optimizer: Optimizer, **kwargs)

Bases: object

Methods:

__init__(optimizer, **kwargs)

get_last_lr()

Get the last learning rate.

step(epoch, loss)

This method must be implemented in the sub-class.

__init__(optimizer: Optimizer, **kwargs)

get_last_lr() → List[float]

Get the last learning rate.

Returns:

The last learning rate.

Return type:

List[float]

step(epoch: int, loss: float)

This method must be implemented in the subclass. It is called to get the learning rate for the next epoch. The scheduler used here does not need the loss value; the parameter is kept so that loss-driven schedulers such as ReduceLROnPlateau can be supported.

Parameters:
  • epoch (int) – The current epoch number.

  • loss (float) – The loss value of the current epoch.
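To show why `step` receives the loss, here is a minimal sketch of a loss-driven scheduler in the spirit of ReduceLROnPlateau. The local `LR_SchedulerInterface` is a stand-in mirroring the documented methods, and `FakeOptimizer` only imitates the `param_groups` attribute of a PyTorch optimizer so the example runs without torch; `PlateauLRScheduler`, `factor`, and `patience` are hypothetical. The constructor accepts optimizer, num_warmup_steps, and num_training_steps because ModelInterface is documented to pass these when it instantiates the class.

```python
from typing import List


class LR_SchedulerInterface:
    """Stand-in mirroring the documented interface (not the peptdeep class)."""

    def step(self, epoch: int, loss: float) -> None:
        raise NotImplementedError

    def get_last_lr(self) -> List[float]:
        raise NotImplementedError


class PlateauLRScheduler(LR_SchedulerInterface):
    """Hypothetical scheduler: shrink the lr when the loss stops improving."""

    def __init__(self, optimizer, num_warmup_steps: int = 0,
                 num_training_steps: int = 0, factor: float = 0.5,
                 patience: int = 1):
        # num_warmup_steps and num_training_steps are accepted for
        # compatibility with the documented constructor but unused here.
        self.optimizer = optimizer
        self.factor = factor
        self.patience = patience
        self.best_loss = float("inf")
        self.bad_epochs = 0

    def step(self, epoch: int, loss: float) -> None:
        if loss < self.best_loss:
            self.best_loss = loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        if self.bad_epochs > self.patience:
            # Scale the learning rate of every parameter group.
            for group in self.optimizer.param_groups:
                group["lr"] *= self.factor
            self.bad_epochs = 0

    def get_last_lr(self) -> List[float]:
        return [group["lr"] for group in self.optimizer.param_groups]


class FakeOptimizer:
    """Imitates only the param_groups attribute of a torch optimizer."""

    def __init__(self, lr: float):
        self.param_groups = [{"lr": lr}]
```

In this sketch the learning rate is halved after more than `patience` consecutive epochs without improvement; the thresholds are illustrative only.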

class peptdeep.model.model_interface.ModelInterface(device: str = 'gpu', fixed_sequence_len: int = 0, min_pred_value: float = 0.0, **kwargs)

Bases: object

Provides standardized methods to interact with ML models. Subclass it and override the abstract (i.e. not implemented) methods.

Methods:

__init__([device, fixed_sequence_len, ...])

build(model_class, **kwargs)

Builds the model by specifying the PyTorch module, the parameters, the device, the loss function ...

build_from_py_codes(model_code_file_or_zip)

Build the model based on a python file.

get_parameter_num()

Get total number of parameters in model.

load(model_file[, model_path_in_zip])

Load a model specified in a zip file, a text file or a file stream.

predict(precursor_df, *[, batch_size, verbose])

The model predicts the properties based on the inputs it has been trained for.

predict_mp(precursor_df, *[, batch_size, ...])

Predict with multiprocessing if no GPUs are available.

save(filename)

Save the model state, the constants used, the code defining the model and the model parameters.

set_bert_trainable([bert_layer_name, ...])

set_callback_handler(callback_handler)

Set the callback handler.

set_device([device_type, device_ids])

Set the device (e.g. gpu (cuda), mps, cpu, ...) to be used for the model.

set_layer_trainable([layer_names, trainable])

set_lr(lr)

Set the learning rate.

set_lr_scheduler_class(lr_scheduler_class)

Set the learning rate scheduler class.

train(precursor_df, *[, batch_size, epoch, ...])

Train the model according to specifications.

train_with_warmup(precursor_df, *[, ...])

Train the model according to specifications.

Attributes:

device

Read-only

device_ids

Read-only

device_type

Read-only

fixed_sequence_len

This attribute controls how to train and infer for variable-length sequences:

min_pred_value

The predicted values cannot be smaller than this value.

target_column_to_predict

target_column_to_train

__init__(device: str = 'gpu', fixed_sequence_len: int = 0, min_pred_value: float = 0.0, **kwargs)

Parameters:
  • device (str, optional) – Device type, one of ‘get_available’, ‘cpu’, ‘mps’, ‘gpu’ (or ‘cuda’). Defaults to ‘gpu’.

  • fixed_sequence_len (int, optional) – See fixed_sequence_len, defaults to 0.

  • min_pred_value (float, optional) – See min_pred_value, defaults to 0.0.

build(model_class: Module, **kwargs)

Builds the model by specifying the PyTorch module, the parameters, the device, the loss function …

build_from_py_codes(model_code_file_or_zip: str, code_file_in_zip: str = None, include_model_params_yaml: bool = True, **kwargs)

Build the model from a Python file. The file must contain a PyTorch model implemented as ‘class Model(…’

property device: device

Read-only

property device_ids: list

Read-only

property device_type: str

Read-only

property fixed_sequence_len: int

This attribute controls how to train and infer for variable-length sequences:

  • If the value is 0, all sequence tensors are grouped by nAA, and each batch trains or infers on sequences with the same nAA.

  • If the value is > 0, all sequence tensors are padded with zeros to the fixed length.

  • If the value is < 0, the sequence tensors in each batch are padded with zeros to the maximum length in that batch.
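The three modes can be pictured with a small pure-Python sketch that batches integer-encoded sequences. It does not use peptdeep; `group_by_length`, `pad_to_length`, and `make_batches` are hypothetical helpers. Raising an error for a sequence longer than the fixed length is an assumption of this sketch, since the behavior for over-long sequences is not stated here.

```python
from collections import defaultdict
from typing import Dict, List

Seq = List[int]


def group_by_length(seqs: List[Seq]) -> Dict[int, List[Seq]]:
    """Group sequences by their length (nAA in the documentation)."""
    groups: Dict[int, List[Seq]] = defaultdict(list)
    for seq in seqs:
        groups[len(seq)].append(seq)
    return dict(groups)


def pad_to_length(seqs: List[Seq], length: int) -> List[Seq]:
    """Right-pad every sequence with zeros up to `length`."""
    if any(len(seq) > length for seq in seqs):
        raise ValueError("sequence longer than target length")  # sketch assumption
    return [seq + [0] * (length - len(seq)) for seq in seqs]


def make_batches(seqs: List[Seq], fixed_sequence_len: int,
                 batch_size: int) -> List[List[Seq]]:
    batches: List[List[Seq]] = []
    if fixed_sequence_len == 0:
        # Mode 0: every batch holds sequences of one length, no padding.
        for _, group in sorted(group_by_length(seqs).items()):
            for i in range(0, len(group), batch_size):
                batches.append(group[i:i + batch_size])
        return batches
    for i in range(0, len(seqs), batch_size):
        chunk = seqs[i:i + batch_size]
        if fixed_sequence_len > 0:
            # Mode > 0: pad to the same fixed length everywhere.
            target = fixed_sequence_len
        else:
            # Mode < 0: pad to the longest sequence in this batch.
            target = max(len(seq) for seq in chunk)
        batches.append(pad_to_length(chunk, target))
    return batches
```

Grouping by length avoids padding entirely but yields batches of uneven size, while the two padding modes keep the input order at the cost of extra zeros.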

get_parameter_num()

Get total number of parameters in model.

load(model_file: Tuple[str, IO], model_path_in_zip: str = None, **kwargs)

Load a model specified in a zip file, a text file or a file stream.

property min_pred_value: float

The predicted values cannot be smaller than this value.

predict(precursor_df: DataFrame, *, batch_size: int = 1024, verbose: bool = False, **kwargs) → DataFrame

The model predicts the properties based on the inputs it has been trained for. Returns the output as a pandas DataFrame.

predict_mp(precursor_df: DataFrame, *, batch_size: int = 1024, mp_batch_size: int = 100000, process_num: int = 16, **kwargs) → DataFrame

Predict with multiprocessing if no GPUs are available. Note that this multiprocessing method only works for models that predict values in place within the precursor_df.

save(filename: str)

Save the model state, the constants used, the code defining the model and the model parameters.

set_bert_trainable(bert_layer_name='hidden_nn', bert_layer_idxes=[1, 2], trainable=True)

set_callback_handler(callback_handler: CallbackHandler) → None

Set the callback handler. It has to be a subclass of CallbackHandler.

set_device(device_type: str = 'gpu', device_ids: list = [])

Set the device (e.g. gpu (cuda), mps, cpu, …) to be used for the model.

Parameters:
  • device_type (str, optional) – Device type, see peptdeep.utils.torch_device_dict. If device_type == ‘get_available’, the available devices are checked using peptdeep.utils.get_available_device(). Defaults to ‘gpu’.

  • device_ids (list, optional) – List of int. Device ids for cuda/gpu (e.g. [1,3] for cuda:1,3). By default []

set_layer_trainable(layer_names=[], trainable=True)

set_lr(lr: float)

Set the learning rate.

set_lr_scheduler_class(lr_scheduler_class: LR_SchedulerInterface) → None

Set the learning rate scheduler class. The user must pass a class that is a subclass of LR_SchedulerInterface, because the current implementation creates an instance of it within this class.

Parameters:

lr_scheduler_class (LR_SchedulerInterface) – The learning rate scheduler class. Since we create an instance of it within this class, the ModelInterface needs the class to take the arguments optimizer, num_warmup_steps, num_training_steps

property target_column_to_predict: str

property target_column_to_train: str

train(precursor_df: DataFrame, *, batch_size=1024, epoch=10, warmup_epoch: int = 0, lr=0.0001, verbose=False, verbose_each_epoch=False, **kwargs)

Train the model according to specifications.

train_with_warmup(precursor_df: DataFrame, *, batch_size=1024, epoch=10, warmup_epoch=5, lr=0.0001, verbose=False, verbose_each_epoch=False, **kwargs)

Train the model according to specifications. Includes a warmup phase in which the learning rate increases linearly, followed by a cosine decrease.

class peptdeep.model.model_interface.WarmupLR_Scheduler(optimizer: Optimizer, num_warmup_steps: int, num_training_steps: int, num_cycles: float = 0.5, last_epoch: int = -1)

Bases: LR_SchedulerInterface

A learning rate scheduler that includes a warmup phase and then a cosine annealing phase.

Methods:

__init__(optimizer, num_warmup_steps, ...[, ...])

get_cosine_schedule_with_warmup(optimizer, ...)

Create a schedule with a learning rate that decreases following the values of the cosine function between the initial lr set in the optimizer to 0, after a warmup period during which it increases linearly between 0 and the initial lr set in the optimizer.

get_last_lr()

Get the last learning rate.

step([epoch, loss])

Get the learning rate for the next epoch.

__init__(optimizer: Optimizer, num_warmup_steps: int, num_training_steps: int, num_cycles: float = 0.5, last_epoch: int = -1)

get_cosine_schedule_with_warmup(optimizer, num_warmup_steps: int, num_training_steps: int, num_cycles: float = 0.5, last_epoch: int = -1)

Create a schedule with a learning rate that decreases following the values of the cosine function between the initial lr set in the optimizer to 0, after a warmup period during which it increases linearly between 0 and the initial lr set in the optimizer.

Parameters:
  • optimizer ([~torch.optim.Optimizer]) – The optimizer for which to schedule the learning rate.

  • num_warmup_steps (int) – The number of steps for the warmup phase.

  • num_training_steps (int) – The total number of training steps.

  • num_cycles (float, optional, defaults to 0.5) – The number of waves in the cosine schedule (the default is to just decrease from the max value to 0 following a half-cosine).

  • last_epoch (int, optional, defaults to -1) – The index of the last epoch when resuming training.

Returns:

torch.optim.lr_scheduler.LambdaLR with the appropriate schedule.
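A LambdaLR scales the optimizer's initial learning rate by a per-step multiplier. The sketch below computes such a multiplier for the schedule described above, assuming the commonly used linear-warmup and cosine-decay formula; the exact expression inside peptdeep may differ, and `cosine_with_warmup_multiplier` is a hypothetical name.

```python
import math


def cosine_with_warmup_multiplier(step: int, num_warmup_steps: int,
                                  num_training_steps: int,
                                  num_cycles: float = 0.5) -> float:
    """Return the factor applied to the initial lr at a given step."""
    if step < num_warmup_steps:
        # Warmup: rise linearly from 0 to 1.
        return step / max(1, num_warmup_steps)
    # Fraction of the post-warmup phase that has been completed.
    progress = (step - num_warmup_steps) / max(
        1, num_training_steps - num_warmup_steps
    )
    # Cosine decay; num_cycles=0.5 gives a single half-cosine from 1 to 0.
    return max(0.0, 0.5 * (1.0 + math.cos(math.pi * 2.0 * num_cycles * progress)))


# Multipliers for a 20-step run with 5 warmup steps.
schedule = [cosine_with_warmup_multiplier(s, 5, 20) for s in range(21)]
```

Multiplying each entry of `schedule` by the initial learning rate gives the learning rate at that step under these assumptions.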

get_last_lr() → List[float]

Get the last learning rate.

Returns:

The last learning rate.

Return type:

List[float]

step(epoch: int = None, loss=None)

Get the learning rate for the next epoch.

Parameters:

epoch (int (Deprecated)) – The current epoch number.

peptdeep.model.model_interface.append_nAA_column_if_missing(precursor_df)

Append a column containing the number of amino acids (nAA) if it is missing.