vocabulary_size: Number of tokens in vocabulary
Description
Used in textrecipes::step_tokenize_sentencepiece() and textrecipes::step_tokenize_bpe().
Usage
vocabulary_size(range = c(1000L, 32000L), trans = NULL)
Arguments
- range
A two-element vector holding the defaults for the smallest and
largest possible values, respectively. If a transformation is specified,
these values should be in the transformed units.
- trans
A trans object from the scales package, such as scales::transform_log10()
or scales::transform_reciprocal(). If not provided, the default is used,
which matches the units used in range. If no transformation, NULL.
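Examples
The following is a minimal sketch of how the parameter object might be created and inspected. It assumes the dials package is attached and that the installed scales version provides transform_log10(). The helpers value_seq() and grid_regular() come from dials. The custom range values are arbitrary and chosen only for illustration.

```r
library(dials)

# Default parameter object: range 1000 to 32000, no transformation.
vocab <- vocabulary_size()
vocab

# A narrower range (illustrative values, not a recommendation).
vocab_small <- vocabulary_size(range = c(500L, 8000L))

# Four candidate values spread across the range.
value_seq(vocab_small, n = 4)

# A regular tuning grid with three levels.
grid_regular(vocab_small, levels = 3)

# With a transformation, `range` is given in transformed units:
# log10(1000) = 3 and log10(10000) = 4.
vocab_log <- vocabulary_size(
  range = c(3, 4),
  trans = scales::transform_log10()
)
value_seq(vocab_log, n = 3)
```

In a tuning workflow, an object like vocab_small would typically be passed to a grid function for the vocabulary_size argument of one of the textrecipes tokenization steps listed in the Description.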