peptdeep.model.model_interface

Classes:

ModelInterface([device, fixed_sequence_len, ...])

Provides standardized methods to interact with ml models.

Functions:

append_nAA_column_if_missing(precursor_df)

Append a column containing the number of Amino Acids

get_cosine_schedule_with_warmup(optimizer, ...)

Create a schedule with a learning rate that decreases following the values of the cosine function from the initial lr set in the optimizer to 0, after a warmup period during which it increases linearly from 0 to the initial lr set in the optimizer.

class peptdeep.model.model_interface.ModelInterface(device: str = 'gpu', fixed_sequence_len: int = 0, min_pred_value: float = 0.0, **kwargs)

Bases: object

Provides standardized methods to interact with ML models. Inherit from this class and override the abstract (i.e. not implemented) methods.

Methods:

__init__([device, fixed_sequence_len, ...])

Initialize the interface. device is the device type, one of 'get_available', 'cpu', 'mps', 'gpu' (or 'cuda').

build(model_class, **kwargs)

Builds the model by specifying the PyTorch module, the parameters, the device, the loss function ...

build_from_py_codes(model_code_file_or_zip)

Build the model based on a python file.

get_parameter_num()

Get total number of parameters in model.

load(model_file[, model_path_in_zip])

Load a model specified in a zip file, a text file or a file stream.

predict(precursor_df, *[, batch_size, verbose])

The model predicts the properties based on the inputs it has been trained for.

predict_mp(precursor_df, *[, batch_size, ...])

Predict with multiprocessing if no GPUs are available.

save(filename)

Save the model state, the constants used, the code defining the model and the model parameters.

set_bert_trainable([bert_layer_name, ...])

set_device([device_type, device_ids])

Set the device (e.g. gpu (cuda), mps, cpu, ...) to be used for the model.

set_layer_trainable([layer_names, trainable])

set_lr(lr)

Set learning rate

train(precursor_df, *[, batch_size, epoch, ...])

Train the model according to specifications.

train_with_warmup(precursor_df, *[, ...])

Train the model according to specifications.

Attributes:

device

Read-only

device_ids

Read-only

device_type

Read-only

fixed_sequence_len

This attribute controls how to train and infer for variable-length sequences:

min_pred_value

The predicted values cannot be smaller than this value.

target_column_to_predict

target_column_to_train

__init__(device: str = 'gpu', fixed_sequence_len: int = 0, min_pred_value: float = 0.0, **kwargs)
Parameters:
  • device (str, optional) – device type in ‘get_available’, ‘cpu’, ‘mps’, ‘gpu’ (or ‘cuda’), by default ‘gpu’

  • fixed_sequence_len (int, optional) – See fixed_sequence_len, defaults to 0.

  • min_pred_value (float, optional) – See min_pred_value, defaults to 0.0.

build(model_class: Module, **kwargs)

Builds the model by specifying the PyTorch module, the parameters, the device, the loss function …

build_from_py_codes(model_code_file_or_zip: str, code_file_in_zip: str = None, include_model_params_yaml: bool = True, **kwargs)

Build the model based on a Python file. The file must contain a PyTorch model implemented as ‘class Model(…’

property device: device

Read-only

property device_ids: list

Read-only

property device_type: str

Read-only

property fixed_sequence_len: int

This attribute controls how to train and infer for variable-length sequences:

  • if the value is 0: sequence tensors are grouped by nAA, so each training or inference batch contains only sequences with the same nAA.

  • if the value is > 0: all sequence tensors are zero-padded to this fixed length.

  • if the value is < 0: within each batch, sequence tensors are zero-padded to the maximum length in that batch.
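The three modes above can be sketched in plain Python. This is a minimal illustration and not peptdeep's implementation: make_batches is a hypothetical helper, sequences are plain lists of integer tokens rather than tensors, and truncating sequences longer than the fixed length is an assumption made only so that every batch stays rectangular.

```python
from collections import defaultdict


def make_batches(seq_tokens, fixed_sequence_len=0, batch_size=2, pad=0):
    """Group integer-encoded sequences into batches (hypothetical helper)."""
    if fixed_sequence_len == 0:
        # Mode 0: group by length (nAA), so no padding is needed at all.
        by_len = defaultdict(list)
        for seq in seq_tokens:
            by_len[len(seq)].append(list(seq))
        batches = []
        for n in sorted(by_len):
            group = by_len[n]
            for i in range(0, len(group), batch_size):
                batches.append(group[i:i + batch_size])
        return batches

    batches = []
    for i in range(0, len(seq_tokens), batch_size):
        chunk = seq_tokens[i:i + batch_size]
        if fixed_sequence_len > 0:
            # Mode > 0: every sequence is padded to the same fixed length.
            target = fixed_sequence_len
        else:
            # Mode < 0: pad only up to the longest sequence in this batch.
            target = max(len(s) for s in chunk)
        # Assumption: sequences longer than the target are truncated.
        batches.append(
            [list(s)[:target] + [pad] * (target - len(s)) for s in chunk]
        )
    return batches


seqs = [[1, 2, 3], [4, 5], [6, 7, 8], [9]]
grouped = make_batches(seqs, fixed_sequence_len=0)
fixed = make_batches(seqs, fixed_sequence_len=4)
per_batch = make_batches(seqs, fixed_sequence_len=-1)
```

Grouping by nAA avoids padding entirely but yields batches of uneven size, while a fixed length gives uniform shapes at the cost of padded positions.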

get_parameter_num()

Get total number of parameters in model.

load(model_file: Tuple[str, IO], model_path_in_zip: str = None, **kwargs)

Load a model specified in a zip file, a text file or a file stream.

property min_pred_value: float

The predicted values cannot be smaller than this value.
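As a rough illustration of this lower bound, the sketch below clamps a list of predicted values. clip_predictions is a hypothetical name, and a plain list stands in for whatever array type the model actually produces.

```python
def clip_predictions(values, min_pred_value=0.0):
    """Raise every value below min_pred_value up to min_pred_value."""
    return [max(v, min_pred_value) for v in values]


clipped = clip_predictions([-0.5, 0.0, 1.2], min_pred_value=0.0)
```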

predict(precursor_df: DataFrame, *, batch_size: int = 1024, verbose: bool = False, **kwargs) -> DataFrame

The model predicts the properties based on the inputs it has been trained for. Returns the output as a pandas DataFrame.

predict_mp(precursor_df: DataFrame, *, batch_size: int = 1024, mp_batch_size: int = 100000, process_num: int = 16, **kwargs) -> DataFrame

Predict with multiprocessing if no GPUs are available. Note that this multiprocessing method only works for models that write their predicted values in place into precursor_df.
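One plausible reading of mp_batch_size is that the input table is cut into row ranges of at most that many rows, and each range is handed to a worker process. The sketch below shows only that chunking step. iter_chunks is a hypothetical helper, and how peptdeep actually distributes work across process_num workers is not stated in this page.

```python
def iter_chunks(n_rows, mp_batch_size=100000):
    """Yield (start, stop) row ranges of at most mp_batch_size rows."""
    for start in range(0, n_rows, mp_batch_size):
        yield start, min(start + mp_batch_size, n_rows)


# A table of 250,000 precursors splits into three ranges.
ranges = list(iter_chunks(250000, mp_batch_size=100000))
```

Because each worker sees only its own slice, the per-chunk results can be concatenated back into one table only if every prediction is stored as columns on the rows themselves, which matches the restriction noted above.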

save(filename: str)

Save the model state, the constants used, the code defining the model and the model parameters.

set_bert_trainable(bert_layer_name='hidden_nn', bert_layer_idxes=[1, 2], trainable=True)

set_device(device_type: str = 'gpu', device_ids: list = [])

Set the device (e.g. gpu (cuda), mps, cpu, …) to be used for the model.

Parameters:
  • device_type (str, optional) – Device type, see peptdeep.utils.torch_device_dict. If device_type == 'get_available', available devices are checked using peptdeep.utils.get_available_device(). By default ‘gpu’

  • device_ids (list, optional) – List of int. Device ids for cuda/gpu (e.g. [1,3] for cuda:1,3). By default []

set_layer_trainable(layer_names=[], trainable=True)

set_lr(lr: float)

Set learning rate

property target_column_to_predict: str

property target_column_to_train: str

train(precursor_df: DataFrame, *, batch_size=1024, epoch=10, warmup_epoch: int = 0, lr=0.0001, verbose=False, verbose_each_epoch=False, **kwargs)

Train the model according to specifications.

train_with_warmup(precursor_df: DataFrame, *, batch_size=1024, epoch=10, warmup_epoch=5, lr=0.0001, verbose=False, verbose_each_epoch=False, **kwargs)

Train the model according to specifications. Includes a warmup phase with a linearly increasing lr, followed by a cosine decrease, for lr scheduling.

peptdeep.model.model_interface.append_nAA_column_if_missing(precursor_df)

Append a column containing the number of Amino Acids
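The idea can be sketched without pandas, using a dict of column lists as a stand-in for precursor_df. The function name append_naa_if_missing and the 'sequence' column name are assumptions made for illustration; the real function operates on a DataFrame and may do more than this.

```python
def append_naa_if_missing(table):
    """Add an 'nAA' column (sequence length) to a dict-of-lists table if absent."""
    if "nAA" not in table:
        # Assumption: peptide sequences live in a column named 'sequence'.
        table["nAA"] = [len(seq) for seq in table["sequence"]]
    return table


fresh = append_naa_if_missing({"sequence": ["PEPTIDE", "ACDK"]})
existing = append_naa_if_missing({"sequence": ["AAA"], "nAA": [99]})
```

An existing nAA column is left untouched, so calling the helper repeatedly is harmless.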

peptdeep.model.model_interface.get_cosine_schedule_with_warmup(optimizer, num_warmup_steps: int, num_training_steps: int, num_cycles: float = 0.5, last_epoch: int = -1)

Create a schedule with a learning rate that decreases following the values of the cosine function from the initial lr set in the optimizer to 0, after a warmup period during which it increases linearly from 0 to the initial lr set in the optimizer.

Parameters:
  • optimizer (torch.optim.Optimizer) – The optimizer for which to schedule the learning rate.

  • num_warmup_steps (int) – The number of steps for the warmup phase.

  • num_training_steps (int) – The total number of training steps.

  • num_cycles (float, optional, defaults to 0.5) – The number of waves in the cosine schedule (the default is to just decrease from the max value to 0 following a half-cosine).

  • last_epoch (int, optional, defaults to -1) – The index of the last epoch when resuming training.

Returns:

torch.optim.lr_scheduler.LambdaLR with the appropriate schedule.
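A LambdaLR scheduler multiplies the optimizer's initial lr by a factor computed from the current step. The sketch below writes out a factor function matching the description above, using the commonly used warmup-plus-cosine formula. That this module uses exactly the same expression is an assumption, and cosine_warmup_factor is a hypothetical name.

```python
import math


def cosine_warmup_factor(step, num_warmup_steps, num_training_steps, num_cycles=0.5):
    """Multiplier applied to the initial lr at a given step."""
    if step < num_warmup_steps:
        # Linear warmup from 0 to 1.
        return step / max(1, num_warmup_steps)
    # Fraction of the post-warmup phase that has elapsed, in [0, 1].
    progress = (step - num_warmup_steps) / max(1, num_training_steps - num_warmup_steps)
    # Cosine decay; with num_cycles=0.5 this is a single half-cosine from 1 to 0.
    return max(0.0, 0.5 * (1.0 + math.cos(math.pi * 2.0 * num_cycles * progress)))


# Effective lr over 10 steps with 5 warmup steps and an initial lr of 1e-4.
base_lr = 1e-4
lrs = [base_lr * cosine_warmup_factor(s, 5, 10) for s in range(11)]
```

The factor rises linearly to 1 at the end of warmup and then falls back to 0 at num_training_steps, so the lr peaks at the value set in the optimizer.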