Once you have a ruleset definition (either recommended or your own), you call this operation to evaluate the ruleset against a data source (Glue table). The evaluation computes results which you can retrieve with the get_data_quality_result API.
See https://www.paws-r-sdk.com/docs/glue_start_data_quality_ruleset_evaluation_run/ for full documentation.
glue_start_data_quality_ruleset_evaluation_run(
  DataSource,
  Role,
  NumberOfWorkers = NULL,
  Timeout = NULL,
  ClientToken = NULL,
  AdditionalRunOptions = NULL,
  RulesetNames,
  AdditionalDataSources = NULL
)
DataSource: [required] The data source (Glue table) associated with this run.
Role: [required] An IAM role supplied to encrypt the results of the run.
NumberOfWorkers: The number of G.1X workers to be used in the run. The default is 5.
Timeout: The timeout for a run in minutes. This is the maximum time that a run can consume resources before it is terminated and enters TIMEOUT status. The default is 2,880 minutes (48 hours).
ClientToken: Used for idempotency. Set it to a random ID (such as a UUID) to avoid creating or starting multiple instances of the same resource.
AdditionalRunOptions: Additional run options you can specify for an evaluation run.
RulesetNames: [required] A list of ruleset names.
AdditionalDataSources: A map of reference strings to additional data sources you can specify for an evaluation run.
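
The following is a minimal sketch of how the arguments above might be assembled and passed to the operation through a paws Glue client. The helper build_dq_run_args, the database, table, role ARN, and ruleset names, and the RUN_GLUE_EXAMPLE environment variable are all hypothetical placeholders introduced for illustration; the nested GlueTable shape of DataSource is assumed from the Glue API and should be checked against the full documentation linked above. The request is built as a plain list first so it can be inspected without contacting AWS.

```r
# Hypothetical helper: collects the request arguments in a plain list.
# Nothing here talks to AWS; all values passed in are placeholders.
build_dq_run_args <- function(database, table, role_arn, rulesets,
                              workers = 5L, timeout_minutes = 2880L) {
  stopifnot(length(rulesets) >= 1L)
  list(
    # Assumed shape: a Glue table identified by database and table name.
    DataSource = list(
      GlueTable = list(DatabaseName = database, TableName = table)
    ),
    Role = role_arn,
    NumberOfWorkers = workers,
    Timeout = timeout_minutes,
    # Random 32-character hex string used as an idempotency token.
    # This is a base-R stand-in for a UUID, not a standards-compliant UUID.
    ClientToken = paste(sample(c(0:9, letters[1:6]), 32, replace = TRUE),
                        collapse = ""),
    RulesetNames = as.list(rulesets)
  )
}

run_args <- build_dq_run_args(
  database = "example_db",
  table    = "example_table",
  role_arn = "arn:aws:iam::123456789012:role/ExampleGlueDataQualityRole",
  rulesets = c("example_ruleset")
)

# Only contact AWS when explicitly requested (hypothetical opt-in variable)
# and when the paws package is installed with valid credentials configured.
if (identical(Sys.getenv("RUN_GLUE_EXAMPLE"), "true") &&
    requireNamespace("paws", quietly = TRUE)) {
  svc <- paws::glue()
  run <- do.call(svc$start_data_quality_ruleset_evaluation_run, run_args)
  # Check the run status once; a real script would poll until it finishes.
  status <- svc$get_data_quality_ruleset_evaluation_run(RunId = run$RunId)
  print(status$Status)
}
```

Reusing the same ClientToken when retrying a failed request is what makes the call idempotent, so in practice the token would be generated once and stored rather than regenerated on each attempt. Once the run has finished, the result IDs reported for the run can be passed to get_data_quality_result to fetch the computed results.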