peptdeep.model.model_interface
Classes:
ModelInterface | Provides standardized methods to interact with ML models.
Functions:
append_nAA_column_if_missing | Append a column containing the number of amino acids.
get_cosine_schedule_with_warmup | Create a schedule with a learning rate that decreases following the cosine function from the initial lr set in the optimizer to 0, after a warmup period during which it increases linearly from 0 to the initial lr set in the optimizer.
- class peptdeep.model.model_interface.ModelInterface(device: str = 'gpu', fixed_sequence_len: int = 0, min_pred_value: float = 0.0, **kwargs)
Bases: object
Provides standardized methods to interact with ML models. To use it, inherit from this class and override the abstract (i.e. not implemented) methods.
Methods:
__init__([device, fixed_sequence_len, ...]) | Parameter device: device type, one of 'get_available', 'cpu', 'mps', 'gpu' (or 'cuda').
build(model_class, **kwargs) | Builds the model by specifying the PyTorch module, the parameters, the device, the loss function ...
build_from_py_codes(model_code_file_or_zip) | Build the model based on a Python file.
Get total number of parameters in model.
load(model_file[, model_path_in_zip]) | Load a model specified in a zip file, a text file or a file stream.
predict(precursor_df, *[, batch_size, verbose]) | The model predicts the properties based on the inputs it has been trained for.
predict_mp(precursor_df, *[, batch_size, ...]) | Predict with multiprocessing if no GPUs are available.
save(filename) | Save the model state, the constants used, the code defining the model and the model parameters.
set_bert_trainable([bert_layer_name, ...])
set_device([device_type, device_ids]) | Set the device (e.g. gpu (cuda), mps, cpu, ...) to be used for the model.
set_layer_trainable([layer_names, trainable])
set_lr(lr) | Set the learning rate.
train(precursor_df, *[, batch_size, epoch, ...]) | Train the model according to specifications.
train_with_warmup(precursor_df, *[, ...]) | Train the model according to specifications.
Attributes:
device | Read-only
device_ids | Read-only
device_type | Read-only
fixed_sequence_len | This attribute controls how to train and infer for variable-length sequences.
min_pred_value | The predicted values cannot be smaller than this value.
- __init__(device: str = 'gpu', fixed_sequence_len: int = 0, min_pred_value: float = 0.0, **kwargs)
- Parameters:
device (str, optional) – Device type, one of ‘get_available’, ‘cpu’, ‘mps’, ‘gpu’ (or ‘cuda’). By default ‘gpu’.
fixed_sequence_len (int, optional) – See fixed_sequence_len. Defaults to 0.
min_pred_value (float, optional) – See min_pred_value. Defaults to 0.0.
- build(model_class: Module, **kwargs)
Builds the model by specifying the PyTorch module, the parameters, the device, the loss function …
- build_from_py_codes(model_code_file_or_zip: str, code_file_in_zip: str = None, include_model_params_yaml: bool = True, **kwargs)
Build the model based on a Python file. The file must contain a PyTorch model implemented as ‘class Model(…’
- property device: device
Read-only
- property device_ids: list
Read-only
- property device_type: str
Read-only
- property fixed_sequence_len: int
This attribute controls how to train and infer for variable-length sequences:
if the value is 0: all sequence tensors are grouped by nAA, and each batch trains/infers on sequences with the same nAA.
if the value is > 0: all sequence tensors are padded with zeros to the fixed length.
if the value is < 0: in each batch, sequence tensors are padded with zeros to the maximum length in that batch.
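The three modes can be illustrated with a minimal, library-free sketch. It is not peptdeep's implementation: the helper name make_batches, the use of plain integer lists in place of tensors, and the batch_size default are all assumptions made for illustration. It also assumes no sequence is longer than a positive fixed_sequence_len.

```python
from collections import defaultdict


def make_batches(encoded_seqs, fixed_sequence_len=0, batch_size=2):
    """Hypothetical helper: batch integer-encoded sequences under the three modes."""
    if fixed_sequence_len == 0:
        # Mode 0: group by length (nAA) so every batch holds equal-length
        # sequences and no padding is needed.
        by_len = defaultdict(list)
        for seq in encoded_seqs:
            by_len[len(seq)].append(list(seq))
        batches = []
        for n_aa in sorted(by_len):
            group = by_len[n_aa]
            for i in range(0, len(group), batch_size):
                batches.append(group[i:i + batch_size])
        return batches

    batches = []
    for i in range(0, len(encoded_seqs), batch_size):
        chunk = [list(s) for s in encoded_seqs[i:i + batch_size]]
        if fixed_sequence_len > 0:
            # Mode > 0: pad every sequence with zeros to the fixed length.
            target = fixed_sequence_len
        else:
            # Mode < 0: pad with zeros to the longest sequence in this batch.
            target = max(len(s) for s in chunk)
        batches.append([s + [0] * (target - len(s)) for s in chunk])
    return batches
```

Grouping by nAA avoids padding entirely but ties batch composition to sequence length, whereas the two padding modes keep the input order at the cost of extra zeros.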
- load(model_file: Tuple[str, IO], model_path_in_zip: str = None, **kwargs)
Load a model specified in a zip file, a text file or a file stream.
- property min_pred_value: float
The predicted values cannot be smaller than this value.
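One plausible way to enforce such a lower bound is to clip the raw predictions. The sketch below only illustrates the constraint; the function name clip_predictions is hypothetical, and how peptdeep applies the bound internally is not stated here.

```python
def clip_predictions(values, min_pred_value=0.0):
    """Hypothetical helper: raise any prediction below the bound up to the bound."""
    return [max(v, min_pred_value) for v in values]
```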
- predict(precursor_df: DataFrame, *, batch_size: int = 1024, verbose: bool = False, **kwargs) → DataFrame
The model predicts the properties based on the inputs it has been trained for. Returns the output as a pandas DataFrame.
- predict_mp(precursor_df: DataFrame, *, batch_size: int = 1024, mp_batch_size: int = 100000, process_num: int = 16, **kwargs) → DataFrame
Predict with multiprocessing if no GPUs are available. Note that this multiprocessing method only works for models that predict values in place within the precursor_df.
- save(filename: str)
Save the model state, the constants used, the code defining the model and the model parameters.
- set_bert_trainable(bert_layer_name='hidden_nn', bert_layer_idxes=[1, 2], trainable=True)
- set_device(device_type: str = 'gpu', device_ids: list = [])
Set the device (e.g. gpu (cuda), mps, cpu, …) to be used for the model.
- Parameters:
device_type (str, optional) – Device type, see peptdeep.utils.torch_device_dict. If device_type==‘get_available’, available devices are checked using peptdeep.utils.get_available_device(). By default ‘gpu’.
device_ids (list, optional) – List of int. Device ids for cuda/gpu (e.g. [1,3] for cuda:1,3). By default [].
- property target_column_to_predict: str
- property target_column_to_train: str
- train(precursor_df: DataFrame, *, batch_size=1024, epoch=10, warmup_epoch: int = 0, lr=0.0001, verbose=False, verbose_each_epoch=False, **kwargs)
Train the model according to specifications.
- train_with_warmup(precursor_df: DataFrame, *, batch_size=1024, epoch=10, warmup_epoch=5, lr=0.0001, verbose=False, verbose_each_epoch=False, **kwargs)
Train the model according to specifications. Includes a warmup phase in which the lr increases linearly, followed by a cosine decrease of the lr.
- peptdeep.model.model_interface.append_nAA_column_if_missing(precursor_df)
Append a column containing the number of amino acids.
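The behavior described above can be sketched with pandas as follows. This is an illustration and not the library's source: the function name append_naa_sketch is hypothetical, and the column names "sequence" (input peptide strings) and "nAA" (output) are assumptions.

```python
import pandas as pd


def append_naa_sketch(precursor_df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: add an amino-acid count column only if it is missing."""
    # Assumption: peptide strings live in a "sequence" column and the
    # amino-acid count is stored in an "nAA" column.
    if "nAA" not in precursor_df.columns:
        precursor_df["nAA"] = precursor_df["sequence"].str.len()
    return precursor_df
```

An existing "nAA" column is left untouched, so calling the helper repeatedly is harmless.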
- peptdeep.model.model_interface.get_cosine_schedule_with_warmup(optimizer, num_warmup_steps: int, num_training_steps: int, num_cycles: float = 0.5, last_epoch: int = -1)
Create a schedule with a learning rate that decreases following the cosine function from the initial lr set in the optimizer to 0, after a warmup period during which it increases linearly from 0 to the initial lr set in the optimizer.
- Parameters:
optimizer (torch.optim.Optimizer) – The optimizer for which to schedule the learning rate.
num_warmup_steps (int) – The number of steps for the warmup phase.
num_training_steps (int) – The total number of training steps.
num_cycles (float, optional, defaults to 0.5) – The number of waves in the cosine schedule (the default is to just decrease from the max value to 0 following a half-cosine).
last_epoch (int, optional, defaults to -1) – The index of the last epoch when resuming training.
- Returns:
torch.optim.lr_scheduler.LambdaLR with the appropriate schedule.
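The shape of this schedule can be sketched as a plain function that returns the factor by which the initial lr is multiplied at a given step. This assumes the commonly used warmup-plus-cosine formulation; the function name lr_multiplier is hypothetical, and the exact expression used inside peptdeep is not shown in this reference.

```python
import math


def lr_multiplier(step, num_warmup_steps, num_training_steps, num_cycles=0.5):
    """Hypothetical helper: lr scaling factor at a given training step."""
    if step < num_warmup_steps:
        # Warmup: the factor rises linearly from 0 to 1.
        return step / max(1, num_warmup_steps)
    # Fraction of the post-warmup phase completed so far, in [0, 1].
    progress = (step - num_warmup_steps) / max(
        1, num_training_steps - num_warmup_steps
    )
    # Cosine decay: with num_cycles=0.5 this is a half-cosine from 1 to 0.
    return max(0.0, 0.5 * (1.0 + math.cos(math.pi * 2.0 * num_cycles * progress)))
```

A function of this form could be passed as the lr_lambda argument of torch.optim.lr_scheduler.LambdaLR, for example via functools.partial to bind the step counts, which matches the return type documented above.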