peptdeep.model.model_interface
Classes:
CallbackHandler – A CallbackHandler class that can be used to add callbacks to the training process for both epoch-level and batch-level events.
LR_SchedulerInterface
ModelInterface – Provides standardized methods to interact with ML models.
WarmupLR_Scheduler – A learning rate scheduler that includes a warmup phase and then a cosine annealing phase.
Functions:
Append a column containing the number of amino acids.
- class peptdeep.model.model_interface.CallbackHandler
Bases: object
A CallbackHandler class that can be used to add callbacks to the training process for both epoch-level and batch-level events. To have more control over the training process, you can create a subclass of this class and override the methods you need.
Methods:
batch_callback(batch, batch_loss) – This method will be called at the end of each batch.
epoch_callback(epoch, epoch_loss) – This method will be called at the end of each epoch.
- batch_callback(batch: int, batch_loss: float)
This method will be called at the end of each batch.
- Parameters:
batch (int) – The current batch number.
batch_loss (float) – The loss value of the current batch.
- epoch_callback(epoch: int, epoch_loss: float) → bool
This method will be called at the end of each epoch. The callback can also be used to stop the training by returning False. If the return value is None or True, training continues.
- Parameters:
epoch (int) – The current epoch number.
epoch_loss (float) – The loss value of the current epoch.
- Returns:
continue_training – If False, the training will stop.
- Return type:
bool
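As a sketch of how the two hooks might be overridden, the hypothetical subclass below records batch losses and asks training to stop once the epoch loss has not improved for a few epochs. The class name EarlyStoppingHandler, the patience parameter and the stand-in base class (used only when peptdeep is not installed) are assumptions made for illustration; only the method names and the return-value convention come from the description above.

```python
try:
    from peptdeep.model.model_interface import CallbackHandler
except ImportError:
    # Stand-in with the documented method names, used only so that
    # this sketch runs without peptdeep installed.
    class CallbackHandler:
        def batch_callback(self, batch: int, batch_loss: float):
            pass

        def epoch_callback(self, epoch: int, epoch_loss: float) -> bool:
            return True


class EarlyStoppingHandler(CallbackHandler):
    """Hypothetical handler: stop when the epoch loss stops improving."""

    def __init__(self, patience: int = 3):
        super().__init__()
        self.patience = patience          # tolerated epochs without improvement
        self.best_loss = float("inf")
        self.stale_epochs = 0
        self.batch_losses = []

    def batch_callback(self, batch: int, batch_loss: float):
        # Called at the end of each batch; here we only record the loss.
        self.batch_losses.append(batch_loss)

    def epoch_callback(self, epoch: int, epoch_loss: float) -> bool:
        # Returning False asks the training loop to stop.
        if epoch_loss < self.best_loss:
            self.best_loss = epoch_loss
            self.stale_epochs = 0
        else:
            self.stale_epochs += 1
        return self.stale_epochs < self.patience
```

A handler like this would presumably be attached through set_callback_handler() on a ModelInterface instance before training starts.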
- class peptdeep.model.model_interface.LR_SchedulerInterface(optimizer: Optimizer, **kwargs)
Bases: object
Methods:
__init__(optimizer, **kwargs)
get_last_lr() – Get the last learning rate.
step(epoch, loss) – This method must be implemented in the sub-class.
- get_last_lr() → List[float]
Get the last learning rate.
- Returns:
The last learning rate.
- Return type:
List[float]
- step(epoch: int, loss: float)
This method must be implemented in the sub-class. It will be called to get the learning rate for the next epoch. The scheduler provided in this module does not need the loss value; the parameter is kept to support loss-driven schedulers such as ReduceLROnPlateau.
- Parameters:
epoch (int) – The current epoch number.
loss (float) – The loss value of the current epoch.
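To illustrate why step() receives the loss, here is a minimal sketch of a loss-driven scheduler that halves the learning rate when the loss stops improving. HalveOnPlateauScheduler, its patience parameter and the stand-in base class are hypothetical; the only assumption about the optimizer is that it exposes param_groups, as torch optimizers do. The constructor accepts num_warmup_steps and num_training_steps because set_lr_scheduler_class documents those arguments, but this sketch does not use them.

```python
from typing import List

try:
    from peptdeep.model.model_interface import LR_SchedulerInterface
except ImportError:
    # Stand-in base class so the sketch runs without peptdeep installed.
    class LR_SchedulerInterface:
        pass


class HalveOnPlateauScheduler(LR_SchedulerInterface):
    """Hypothetical scheduler: halve the lr when the loss stops improving."""

    def __init__(self, optimizer, num_warmup_steps: int = 0,
                 num_training_steps: int = 0, patience: int = 2, **kwargs):
        # The base __init__ is deliberately not called; the warmup and
        # training step counts are accepted but unused in this sketch.
        self.optimizer = optimizer
        self.patience = patience
        self.best_loss = float("inf")
        self.stale_epochs = 0

    def step(self, epoch: int, loss: float):
        if loss < self.best_loss:
            self.best_loss = loss
            self.stale_epochs = 0
            return
        self.stale_epochs += 1
        if self.stale_epochs >= self.patience:
            # param_groups is the list of per-group option dicts
            # found on torch optimizers.
            for group in self.optimizer.param_groups:
                group["lr"] *= 0.5
            self.stale_epochs = 0

    def get_last_lr(self) -> List[float]:
        return [group["lr"] for group in self.optimizer.param_groups]
```

Note that the class itself, not an instance, would be handed to set_lr_scheduler_class().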
- class peptdeep.model.model_interface.ModelInterface(device: str = 'gpu', fixed_sequence_len: int = 0, min_pred_value: float = 0.0, **kwargs)
Bases: object
Provides standardized methods to interact with ML models. Inherit from this class and override the abstract (i.e. not implemented) methods.
Methods:
__init__([device, fixed_sequence_len, ...])
build(model_class, **kwargs) – Builds the model by specifying the PyTorch module, the parameters, the device, the loss function ...
build_from_py_codes(model_code_file_or_zip) – Build the model based on a Python file.
Get total number of parameters in model.
load(model_file[, model_path_in_zip]) – Load a model specified in a zip file, a text file or a file stream.
predict(precursor_df, *[, batch_size, verbose]) – The model predicts the properties based on the inputs it has been trained for.
predict_mp(precursor_df, *[, batch_size, ...]) – Predict with multiprocessing if no GPUs are available.
save(filename) – Save the model state, the constants used, the code defining the model and the model parameters.
set_bert_trainable([bert_layer_name, ...])
set_callback_handler(callback_handler) – Set the callback handler.
set_device([device_type, device_ids]) – Set the device (e.g. gpu (cuda), mps, cpu, ...) to be used for the model.
set_layer_trainable([layer_names, trainable])
set_lr(lr) – Set the learning rate.
set_lr_scheduler_class(lr_scheduler_class) – Set the learning rate scheduler class.
train(precursor_df, *[, batch_size, epoch, ...]) – Train the model according to specifications.
train_with_warmup(precursor_df, *[, ...]) – Train the model according to specifications.
Attributes:
device – Read-only
device_ids – Read-only
device_type – Read-only
fixed_sequence_len – This attribute controls how to train and infer for variable-length sequences.
min_pred_value – The predicted values cannot be smaller than this value.
- __init__(device: str = 'gpu', fixed_sequence_len: int = 0, min_pred_value: float = 0.0, **kwargs)
- Parameters:
device (str, optional) – Device type, one of ‘get_available’, ‘cpu’, ‘mps’, ‘gpu’ (or ‘cuda’). By default ‘gpu’.
fixed_sequence_len (int, optional) – See fixed_sequence_len, defaults to 0.
min_pred_value (float, optional) – See min_pred_value, defaults to 0.0.
- build(model_class: Module, **kwargs)
Builds the model by specifying the PyTorch module, the parameters, the device, the loss function …
- build_from_py_codes(model_code_file_or_zip: str, code_file_in_zip: str = None, include_model_params_yaml: bool = True, **kwargs)
Build the model based on a Python file. The file must contain a PyTorch model implemented as ‘class Model(…’
- property device: device
Read-only
- property device_ids: list
Read-only
- property device_type: str
Read-only
- property fixed_sequence_len: int
This attribute controls how to train and infer for variable-length sequences:
if the value is 0: all sequence tensors are grouped by nAA (the number of amino acids), and each batch trains or infers on sequences with the same nAA.
if the value is > 0: all sequence tensors are padded with zeros to the fixed length.
if the value is < 0: in each batch, sequences are padded with zeros to the maximum length in the batch.
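The three modes can be pictured with a small plain-Python sketch. This is only an illustration of the batching idea on lists of integer-encoded residues, not peptdeep's implementation; the helper names pad and make_batches, the batch_size default and the choice to leave over-long sequences unchanged are all assumptions.

```python
from collections import defaultdict
from typing import Dict, List


def pad(seq: List[int], length: int) -> List[int]:
    """Right-pad an encoded sequence with zeros up to `length`.

    Sequences already longer than `length` are returned unchanged
    (an assumption of this sketch).
    """
    return seq + [0] * (length - len(seq))


def make_batches(seqs: List[List[int]], fixed_sequence_len: int,
                 batch_size: int = 2) -> List[List[List[int]]]:
    if fixed_sequence_len == 0:
        # Group by length (nAA) so that no batch needs padding.
        groups: Dict[int, List[List[int]]] = defaultdict(list)
        for seq in seqs:
            groups[len(seq)].append(seq)
        return [
            group[i:i + batch_size]
            for _, group in sorted(groups.items())
            for i in range(0, len(group), batch_size)
        ]
    chunks = [seqs[i:i + batch_size] for i in range(0, len(seqs), batch_size)]
    if fixed_sequence_len > 0:
        # Pad every sequence to the same fixed length.
        return [[pad(s, fixed_sequence_len) for s in c] for c in chunks]
    # Negative value: pad to the longest sequence within each batch.
    return [[pad(s, max(len(x) for x in c)) for s in c] for c in chunks]
```

Grouping by length avoids padding entirely at the cost of uneven batch sizes, whereas the padded modes keep batches uniform but spend computation on zero positions.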
- load(model_file: Tuple[str, IO], model_path_in_zip: str = None, **kwargs)
Load a model specified in a zip file, a text file or a file stream.
- property min_pred_value: float
The predicted values cannot be smaller than this value.
- predict(precursor_df: DataFrame, *, batch_size: int = 1024, verbose: bool = False, **kwargs) → DataFrame
The model predicts the properties based on the inputs it has been trained for. Returns the output as a pandas DataFrame.
- predict_mp(precursor_df: DataFrame, *, batch_size: int = 1024, mp_batch_size: int = 100000, process_num: int = 16, **kwargs) → DataFrame
Predict with multiprocessing if no GPUs are available. Note that this multiprocessing method only works for models that write their predicted values into the precursor_df in place.
- save(filename: str)
Save the model state, the constants used, the code defining the model and the model parameters.
- set_bert_trainable(bert_layer_name='hidden_nn', bert_layer_idxes=[1, 2], trainable=True)
- set_callback_handler(callback_handler: CallbackHandler) → None
Set the callback handler. It has to be a subclass of CallbackHandler.
- set_device(device_type: str = 'gpu', device_ids: list = [])
Set the device (e.g. gpu (cuda), mps, cpu, …) to be used for the model.
- Parameters:
device_type (str, optional) – Device type, see peptdeep.utils.torch_device_dict. If device_type == ‘get_available’, available devices are checked using peptdeep.utils.get_available_device(). By default ‘gpu’.
device_ids (list, optional) – List of int. Device ids for cuda/gpu (e.g. [1,3] for cuda:1,3). By default [].
- set_lr_scheduler_class(lr_scheduler_class: LR_SchedulerInterface) → None
Set the learning rate scheduler class. The argument must be a class (not an instance) that subclasses LR_SchedulerInterface, because ModelInterface creates an instance of it internally.
- Parameters:
lr_scheduler_class (LR_SchedulerInterface) – The learning rate scheduler class. Because ModelInterface instantiates it internally, the class constructor must accept the arguments optimizer, num_warmup_steps and num_training_steps.
- property target_column_to_predict: str
- property target_column_to_train: str
- train(precursor_df: DataFrame, *, batch_size=1024, epoch=10, warmup_epoch: int = 0, lr=0.0001, verbose=False, verbose_each_epoch=False, **kwargs)
Train the model according to specifications.
- train_with_warmup(precursor_df: DataFrame, *, batch_size=1024, epoch=10, warmup_epoch=5, lr=0.0001, verbose=False, verbose_each_epoch=False, **kwargs)
Train the model according to specifications. Includes a warmup phase in which the learning rate increases linearly, followed by a cosine decrease.
- class peptdeep.model.model_interface.WarmupLR_Scheduler(optimizer: Optimizer, num_warmup_steps: int, num_training_steps: int, num_cycles: float = 0.5, last_epoch: int = -1)
Bases: LR_SchedulerInterface
A learning rate scheduler that includes a warmup phase and then a cosine annealing phase.
Methods:
__init__(optimizer, num_warmup_steps, ...[, ...])
get_cosine_schedule_with_warmup(optimizer, ...) – Create a schedule with a learning rate that decreases following the values of the cosine function from the initial lr set in the optimizer to 0, after a warmup period during which it increases linearly from 0 to the initial lr set in the optimizer.
get_last_lr() – Get the last learning rate.
step([epoch, loss]) – Get the learning rate for the next epoch.
- __init__(optimizer: Optimizer, num_warmup_steps: int, num_training_steps: int, num_cycles: float = 0.5, last_epoch: int = -1)
- get_cosine_schedule_with_warmup(optimizer, num_warmup_steps: int, num_training_steps: int, num_cycles: float = 0.5, last_epoch: int = -1)
Create a schedule with a learning rate that decreases following the values of the cosine function from the initial lr set in the optimizer to 0, after a warmup period during which it increases linearly from 0 to the initial lr set in the optimizer.
- Parameters:
optimizer (torch.optim.Optimizer) – The optimizer for which to schedule the learning rate.
num_warmup_steps (int) – The number of steps for the warmup phase.
num_training_steps (int) – The total number of training steps.
num_cycles (float, optional, defaults to 0.5) – The number of waves in the cosine schedule (the default is to just decrease from the max value to 0 following a half-cosine).
last_epoch (int, optional, defaults to -1) – The index of the last epoch when resuming training.
- Returns:
torch.optim.lr_scheduler.LambdaLR with the appropriate schedule.
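The shape of this schedule can be sketched as a pure function that returns the factor by which the optimizer's initial lr is multiplied at a given step. This assumes the commonly used linear-warmup plus cosine formula matching the description above; the exact expression used inside peptdeep is not shown on this page, and the function name cosine_with_warmup_factor is hypothetical.

```python
import math


def cosine_with_warmup_factor(step: int, num_warmup_steps: int,
                              num_training_steps: int,
                              num_cycles: float = 0.5) -> float:
    """Multiplier applied to the optimizer's initial lr at `step`."""
    if step < num_warmup_steps:
        # Linear warmup from 0 towards 1.
        return step / max(1, num_warmup_steps)
    # Fraction of the post-warmup phase completed, from 0 to 1.
    progress = (step - num_warmup_steps) / max(
        1, num_training_steps - num_warmup_steps
    )
    # Cosine decay; with num_cycles=0.5 this is one half-cosine down to 0.
    return max(0.0, 0.5 * (1.0 + math.cos(math.pi * 2.0 * num_cycles * progress)))
```

A function of this form, with the step counts bound in advance, is the kind of callable that torch.optim.lr_scheduler.LambdaLR accepts as its lr_lambda argument.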