R6 class for the creation and training of Funnel transformers.
This class has the following methods:
create: creates a new transformer based on Funnel.
train: trains and fine-tunes a Funnel model.
New models can be created using the .AIFEFunnelTransformer$create method.
The model is created with separate_cls = TRUE, truncate_seq = TRUE, and pool_q_only = TRUE.
To train the model, pass the directory of the model to the method .AIFEFunnelTransformer$train.
Pre-trained models which can be fine-tuned with this method are available at https://huggingface.co/.
Training of the model makes use of dynamic masking, i.e. the tokens to mask are re-sampled in every epoch instead of being fixed once before training. Usage sketches are given after the parameter descriptions of the create and train methods below.
Super class: aifeducation::.AIFEBaseTransformer -> .AIFEFunnelTransformer
Inherited methods:
aifeducation::.AIFEBaseTransformer$set_SFC_calculate_vocab()
aifeducation::.AIFEBaseTransformer$set_SFC_check_max_pos_emb()
aifeducation::.AIFEBaseTransformer$set_SFC_create_final_tokenizer()
aifeducation::.AIFEBaseTransformer$set_SFC_create_tokenizer_draft()
aifeducation::.AIFEBaseTransformer$set_SFC_create_transformer_model()
aifeducation::.AIFEBaseTransformer$set_SFC_save_tokenizer_draft()
aifeducation::.AIFEBaseTransformer$set_SFT_create_data_collator()
aifeducation::.AIFEBaseTransformer$set_SFT_cuda_empty_cache()
aifeducation::.AIFEBaseTransformer$set_SFT_load_existing_model()
aifeducation::.AIFEBaseTransformer$set_model_param()
aifeducation::.AIFEBaseTransformer$set_model_temp()
aifeducation::.AIFEBaseTransformer$set_required_SFC()
aifeducation::.AIFEBaseTransformer$set_title()
new()
Creates a new transformer based on Funnel
and sets the title.
.AIFEFunnelTransformer$new()
This method returns nothing.
create()
This method creates a transformer configuration based on the Funnel transformer base architecture and a vocabulary based on WordPiece using the python libraries transformers and tokenizers.
This method adds the following 'dependent' parameters to the base class's inherited params list:
vocab_do_lower_case
target_hidden_size
block_sizes
num_decoder_layers
pooling_type
activation_dropout
.AIFEFunnelTransformer$create(
ml_framework = "pytorch",
model_dir,
text_dataset,
vocab_size = 30522,
vocab_do_lower_case = FALSE,
max_position_embeddings = 512,
hidden_size = 768,
target_hidden_size = 64,
block_sizes = c(4, 4, 4),
num_attention_heads = 12,
intermediate_size = 3072,
num_decoder_layers = 2,
pooling_type = "mean",
hidden_act = "gelu",
hidden_dropout_prob = 0.1,
attention_probs_dropout_prob = 0.1,
activation_dropout = 0,
sustain_track = TRUE,
sustain_iso_code = NULL,
sustain_region = NULL,
sustain_interval = 15,
trace = TRUE,
pytorch_safetensors = TRUE,
log_dir = NULL,
log_write_interval = 2
)
ml_framework
string
Framework to use for training and inference.
ml_framework = "tensorflow"
: for 'tensorflow'.
ml_framework = "pytorch"
: for 'pytorch'.
model_dir
string
Path to the directory where the model should be saved.
text_dataset
Object of class LargeDataSetForText.
vocab_size
int
Size of the vocabulary.
vocab_do_lower_case
bool
TRUE if all words/tokens should be lower case.
max_position_embeddings
int
Number of maximum position embeddings. This parameter also determines the maximum length of a sequence which
can be processed with the model.
hidden_size
int
Number of neurons in each layer.
target_hidden_size
int
Number of neurons in the final layer. This parameter determines the dimensionality of the resulting text
embedding.
block_sizes
vector of int
Determines the number of blocks and the number of layers within each block: the length of the vector is the number of blocks and each entry is the number of layers of the corresponding block.
num_attention_heads
int
Number of attention heads.
intermediate_size
int
Number of neurons in the intermediate (feed-forward) layer of each transformer block.
num_decoder_layers
int
Number of decoding layers.
pooling_type
string
Type of pooling.
"mean"
for pooling with mean.
"max"
for pooling with maximum values.
hidden_act
string
Name of the activation function.
hidden_dropout_prob
double
Ratio of dropout.
attention_probs_dropout_prob
double
Ratio of dropout for attention probabilities.
activation_dropout
double
Dropout probability between the layers of the feed-forward blocks.
sustain_track
bool
If TRUE, energy consumption is tracked during training via the python library codecarbon.
sustain_iso_code
string
ISO code (Alpha-3-Code) for the country. This variable must be set if sustainability should be tracked. A
list can be found on Wikipedia: https://en.wikipedia.org/wiki/List_of_ISO_3166_country_codes.
sustain_region
string
Region within a country. Only available for USA and Canada. See the documentation of codecarbon for more
information https://mlco2.github.io/codecarbon/parameters.html.
sustain_interval
integer
Interval in seconds for measuring power usage.
trace
bool
TRUE if information about the progress should be printed to the console.
pytorch_safetensors
bool
Only relevant for pytorch models.
TRUE: a 'pytorch' model is saved in safetensors format.
FALSE (or if 'safetensors' is not available): the model is saved in the standard pytorch format (.bin).
log_dir
string
Path to the directory where the log files should be saved.
log_write_interval
int
Time in seconds determining the interval in which the logger should try to update the log files. Only relevant if log_dir is not NULL.
This method does not return an object. Instead, it saves the configuration and vocabulary of the new model to disk.
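A minimal usage sketch for create(). The object text_data is a placeholder for a previously prepared LargeDataSetForText (see its documentation), the directory path is illustrative, and instantiation via $new() follows standard R6 conventions:

library(aifeducation)

# 'text_data' is assumed to be a prepared LargeDataSetForText object.
transformer <- .AIFEFunnelTransformer$new()
transformer$create(
  ml_framework = "pytorch",
  model_dir = "models/funnel_new",  # configuration and vocabulary are saved here
  text_dataset = text_data,
  vocab_size = 30522,
  hidden_size = 768,
  target_hidden_size = 64,          # dimensionality of the final text embedding
  block_sizes = c(4, 4, 4),         # three blocks with four layers each
  num_attention_heads = 12,
  sustain_track = FALSE             # set to TRUE and supply sustain_iso_code to track energy use
)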
train()
This method can be used to train or fine-tune a transformer based on the Funnel Transformer architecture with the help of the python libraries transformers, datasets, and tokenizers.
.AIFEFunnelTransformer$train(
ml_framework = "pytorch",
output_dir,
model_dir_path,
text_dataset,
p_mask = 0.15,
whole_word = TRUE,
val_size = 0.1,
n_epoch = 1,
batch_size = 12,
chunk_size = 250,
full_sequences_only = FALSE,
min_seq_len = 50,
learning_rate = 0.003,
n_workers = 1,
multi_process = FALSE,
sustain_track = TRUE,
sustain_iso_code = NULL,
sustain_region = NULL,
sustain_interval = 15,
trace = TRUE,
keras_trace = 1,
pytorch_trace = 1,
pytorch_safetensors = TRUE,
log_dir = NULL,
log_write_interval = 2
)
ml_framework
string
Framework to use for training and inference.
ml_framework = "tensorflow"
: for 'tensorflow'.
ml_framework = "pytorch"
: for 'pytorch'.
output_dir
string
Path to the directory where the final model should be saved. If the directory does not exist, it will be
created.
model_dir_path
string
Path to the directory where the original model is stored.
text_dataset
Object of class LargeDataSetForText.
p_mask
double
Ratio that determines the number of words/tokens used for masking.
whole_word
bool
TRUE: whole word masking should be applied.
FALSE: token masking is used.
val_size
double
Ratio that determines the amount of token chunks used for validation.
n_epoch
int
Number of epochs for training.
batch_size
int
Size of batches.
chunk_size
int
Size of every chunk for training.
full_sequences_only
bool
TRUE for using only chunks with a sequence length equal to chunk_size.
min_seq_len
int
Only relevant if full_sequences_only = FALSE. Value determines the minimal sequence length included in the training process.
learning_rate
double
Learning rate for the Adam optimizer.
n_workers
int
Number of workers. Only relevant if ml_framework = "tensorflow".
multi_process
bool
TRUE if multiple processes should be activated. Only relevant if ml_framework = "tensorflow".
sustain_track
bool
If TRUE, energy consumption is tracked during training via the python library codecarbon.
sustain_iso_code
string
ISO code (Alpha-3-Code) for the country. This variable must be set if sustainability should be tracked. A
list can be found on Wikipedia: https://en.wikipedia.org/wiki/List_of_ISO_3166_country_codes.
sustain_region
string
Region within a country. Only available for USA and Canada. See the documentation of codecarbon for more
information https://mlco2.github.io/codecarbon/parameters.html.
sustain_interval
integer
Interval in seconds for measuring power usage.
trace
bool
TRUE if information about the progress should be printed to the console.
keras_trace
int
keras_trace = 0: does not print any information about the training process from keras on the console.
keras_trace = 1: prints a progress bar.
keras_trace = 2: prints one line of information for every epoch.
Only relevant if ml_framework = "tensorflow".
pytorch_trace
int
pytorch_trace = 0: does not print any information about the training process from pytorch on the console.
pytorch_trace = 1: prints a progress bar.
pytorch_safetensors
bool
Only relevant for pytorch models.
TRUE: a 'pytorch' model is saved in safetensors format.
FALSE (or if 'safetensors' is not available): the model is saved in the standard pytorch format (.bin).
log_dir
string
Path to the directory where the log files should be saved.
log_write_interval
int
Time in seconds determining the interval in which the logger should try to update the log files. Only relevant if log_dir is not NULL.
This method does not return an object. Instead, the trained or fine-tuned model is saved to disk.
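A usage sketch for train(), continuing the create() example above. The directories are placeholders and text_data again stands for a prepared LargeDataSetForText object:

transformer$train(
  ml_framework = "pytorch",
  output_dir = "models/funnel_trained",  # final model is saved here
  model_dir_path = "models/funnel_new",  # directory written by create()
  text_dataset = text_data,
  p_mask = 0.15,                         # ratio of tokens considered for masking
  whole_word = TRUE,                     # mask whole words instead of single tokens
  val_size = 0.1,                        # ratio of token chunks held out for validation
  n_epoch = 2,
  batch_size = 12,
  chunk_size = 250,
  sustain_track = FALSE                  # set to TRUE and supply sustain_iso_code to track energy use
)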
clone()
The objects of this class are cloneable with this method.
.AIFEFunnelTransformer$clone(deep = FALSE)
deep
Whether to make a deep clone.
Dai, Z., Lai, G., Yang, Y. & Le, Q. V. (2020). Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing. doi:10.48550/arXiv.2006.03236
See also: Hugging Face documentation (https://huggingface.co/docs/transformers/model_doc/funnel).
Other Transformers for developers:
.AIFEBaseTransformer,
.AIFEBertTransformer,
.AIFEDebertaTransformer,
.AIFELongformerTransformer,
.AIFEMpnetTransformer,
.AIFERobertaTransformer,
.AIFETrObj