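These functions mask a set of utterances or one or more sources by replacing every character that matches a regular expression with a masking character (by default, every alphanumeric character is replaced with "X").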
mask_source(
  input,
  output = NULL,
  proportionToMask = 1,
  preventOverwriting = rock::opts$get(preventOverwriting),
  encoding = rock::opts$get(encoding),
  rlWarn = rock::opts$get(rlWarn),
  maskRegex = "[[:alnum:]]",
  maskChar = "X",
  perl = TRUE,
  silent = rock::opts$get(silent)
)

mask_sources(
  input,
  output,
  proportionToMask = 1,
  outputPrefix = "",
  outputSuffix = "_masked",
  maskRegex = "[[:alnum:]]",
  maskChar = "X",
  perl = TRUE,
  recursive = TRUE,
  filenameRegex = ".*",
  filenameReplacement = c("_PRIVATE_", "_public_"),
  preventOverwriting = rock::opts$get(preventOverwriting),
  encoding = rock::opts$get(encoding),
  silent = rock::opts$get(silent)
)

mask_utterances(
  input,
  proportionToMask = 1,
  maskRegex = "[[:alnum:]]",
  maskChar = "X",
  perl = TRUE
)
Value: a character vector for mask_utterances and mask_source, or a list of character vectors for mask_sources.
input: For mask_utterances, a character vector where each element is one utterance; for mask_source, either a character vector containing the text of the relevant source or a path to a file that contains the source text; for mask_sources, a path to a directory that contains the sources to mask.
output: For mask_source, if not NULL, this is the name (and path) of the file in which to save the processed source; if it is NULL, the result is returned visibly. For mask_sources, output is mandatory and is the path to the directory in which to store the processed sources. This directory is created, with a warning, if it does not exist. As an exception, if output is "same", every file is written to the directory it was read from.
proportionToMask: The proportion of utterances to mask, from 0 (none) to 1 (all).
preventOverwriting: Whether to prevent overwriting existing output files.
encoding: The encoding of the source(s).
rlWarn: Whether to let readLines() warn, e.g. if files do not end with a newline character.
maskRegex: A regular expression (regex) specifying the characters to mask (i.e. to replace with the masking character).
maskChar: The character with which to replace each character to mask.
perl: Whether maskRegex is a Perl-compatible regular expression.
silent: Whether to suppress the warning about not editing the cleaned source.
outputPrefix, outputSuffix: The prefix and suffix to add to the filenames when writing the processed files to disk.
recursive: Whether to also search all subdirectories (TRUE) or only the specified directory (FALSE).
filenameRegex: A regular expression to match against located files; only files matching this regular expression are processed.
filenameReplacement: A character vector with two elements that are used, respectively, as the pattern and replacement arguments of the gsub() function. In other words, the first element is a regular expression to search for in every processed filename, and the second element is the text that replaces any matches. Set to NULL to not perform any replacement on the output filename.
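To make the interplay of proportionToMask, maskRegex, maskChar and perl more concrete, here is a minimal base-R sketch. The helper mask_sketch() is hypothetical and not part of rock; it assumes that the utterances to mask are drawn at random and that the number of utterances to mask is rounded to a whole number, either of which rock may handle differently. The last lines show how the two elements of filenameReplacement act as the pattern and replacement arguments of gsub().

```r
# Hypothetical helper, NOT part of rock: masks a proportion of utterances by
# replacing every character that matches maskRegex with maskChar.
mask_sketch <- function(x,
                        proportionToMask = 1,
                        maskRegex = "[[:alnum:]]",
                        maskChar = "X",
                        perl = TRUE) {
  # Assumption: the number of utterances to mask is rounded
  n_to_mask <- round(length(x) * proportionToMask)
  # Assumption: the utterances to mask are chosen at random;
  # sample.int() avoids sample()'s special case for a single number
  to_mask <- sample.int(length(x), n_to_mask)
  x[to_mask] <- gsub(maskRegex, maskChar, x[to_mask], perl = perl)
  x
}

utterances <- c("Lorem ipsum dolor", "sit amet 42", "consectetur")

# With proportionToMask = 1, every utterance is masked:
# returns c("XXXXX XXXXX XXXXX", "XXX XXXX XX", "XXXXXXXXXXX")
mask_sketch(utterances)

# filenameReplacement = c("_PRIVATE_", "_public_") used as gsub()'s
# pattern and replacement; returns "interview_public_03.rock"
filenameReplacement <- c("_PRIVATE_", "_public_")
gsub(filenameReplacement[1], filenameReplacement[2], "interview_PRIVATE_03.rock")
```

Because only characters matching maskRegex are replaced, spaces and punctuation survive, so the masked text keeps the length and word structure of the original.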
### Mask text but not the codes
rock::mask_utterances(
  paste0(
    "Lorem ipsum dolor sit amet, consectetur adipiscing ",
    "elit. [[expAttitude_expectation_73dnt5z1>earplugsFeelUnpleasant]]"
  )
)
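As the comment in the example indicates, the text is masked while the code in double square brackets is left intact. For illustration only, a similar effect can be approximated in base R with a single PCRE pattern. The helper mask_keep_codes_sketch() below is hypothetical, is not how rock implements this, and assumes codes are written as [[...]] without a closing square bracket inside them.

```r
# Hypothetical helper, NOT rock's implementation: masks alphanumeric
# characters but leaves [[...]] codes untouched.
mask_keep_codes_sketch <- function(x, maskChar = "X") {
  # First alternative: a complete [[...]] code. (*SKIP)(*F) makes that match
  # fail and resume scanning after the code, so codes are never replaced.
  # Second alternative: any alphanumeric character, which is masked.
  pattern <- "\\[\\[[^\\]]*\\]\\](*SKIP)(*F)|[[:alnum:]]"
  gsub(pattern, maskChar, x, perl = TRUE)
}

masked <- mask_keep_codes_sketch(
  paste0(
    "Lorem ipsum dolor sit amet, consectetur adipiscing ",
    "elit. [[expAttitude_expectation_73dnt5z1>earplugsFeelUnpleasant]]"
  )
)
# masked is "XXXXX XXXXX XXXXX XXX XXXX, XXXXXXXXXXX XXXXXXXXXX XXXX. "
# followed by the unchanged code
```

The backtracking verbs (*SKIP) and (*F) are PCRE features, so this pattern only works with perl = TRUE.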