Creates an Autopilot job, also referred to as an Autopilot experiment or AutoML job V2.
See https://www.paws-r-sdk.com/docs/sagemaker_create_auto_ml_job_v2/ for full documentation.
sagemaker_create_auto_ml_job_v2(
  AutoMLJobName,
  AutoMLJobInputDataConfig,
  OutputDataConfig,
  AutoMLProblemTypeConfig,
  RoleArn,
  Tags = NULL,
  SecurityConfig = NULL,
  AutoMLJobObjective = NULL,
  ModelDeployConfig = NULL,
  DataSplitConfig = NULL,
  AutoMLComputeConfig = NULL
)
[required] Identifies an Autopilot job. The name must be unique to your account and is case insensitive.
[required] An array of channel objects describing the input data and their location. Each channel is a named input source. This is similar to the InputDataConfig attribute in the create_auto_ml_job input parameters. The supported formats depend on the problem type:
For tabular problem types: S3Prefix, ManifestFile.
For image classification: S3Prefix, ManifestFile, AugmentedManifestFile.
For text classification: S3Prefix.
For time-series forecasting: S3Prefix.
For text generation (LLMs fine-tuning): S3Prefix.
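As an illustration of the channel array described above, the following is a minimal sketch of building two channels as nested R lists. The field names (ChannelType, ContentType, DataSource, S3DataSource, S3DataType, S3Uri) follow the AutoMLJobChannel structure of the underlying AWS API as best understood here and should be checked against the full documentation. The helper make_channel and the bucket paths are hypothetical.

```r
# Hypothetical helper that builds one channel object as a nested list.
# Field names are assumed from the AWS AutoMLJobChannel structure.
make_channel <- function(s3_uri,
                         channel_type = "training",
                         content_type = "text/csv;header=present",
                         s3_data_type = "S3Prefix") {
  list(
    ChannelType = channel_type,
    ContentType = content_type,
    DataSource = list(
      S3DataSource = list(
        S3DataType = s3_data_type,  # e.g. "S3Prefix" or "ManifestFile"
        S3Uri = s3_uri
      )
    )
  )
}

# Placeholder S3 locations; replace with your own bucket and prefixes.
input_data_config <- list(
  make_channel("s3://example-bucket/autopilot/train/"),
  make_channel("s3://example-bucket/autopilot/validation/",
               channel_type = "validation")
)

str(input_data_config, max.level = 2)
```

Wrapping the channel in a helper keeps the S3DataType choice in one place, which matters because the accepted values differ by problem type.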
[required] Provides information about encryption and the Amazon S3 output path needed to store artifacts from an AutoML job.
[required] Defines the configuration settings of one of the supported problem types.
[required] The ARN of the role that is used to access the data.
An array of key-value pairs. You can use tags to categorize your Amazon Web Services resources in different ways, such as by purpose, owner, or environment. For more information, see Tagging Amazon Web Services Resources. Tag keys must be unique per resource.
The security configuration for traffic encryption or Amazon VPC settings.
Specifies a metric to minimize or maximize as the objective of a job. If not specified, the default objective metric depends on the problem type. For the list of default values per problem type, see AutoMLJobObjective.
For tabular problem types: You must either provide both the AutoMLJobObjective and the type of supervised learning problem in AutoMLProblemTypeConfig (TabularJobConfig.ProblemType), or neither of them.
For text generation problem types (LLMs fine-tuning): Fine-tuning language models in Autopilot does not require setting the AutoMLJobObjective field. Autopilot fine-tunes LLMs without requiring multiple candidates to be trained and evaluated. Instead, using your dataset, Autopilot directly fine-tunes your target model to enhance a default objective metric, the cross-entropy loss. After fine-tuning a language model, you can evaluate the quality of its generated text using different metrics. For a list of the available metrics, see Metrics for fine-tuning LLMs in Autopilot.
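The rule for tabular problem types can be checked locally before submitting a job. The sketch below assumes the objective is passed as a list with a MetricName element and that the problem type lives at TabularJobConfig$ProblemType, as named in the text. The helper tabular_objective_ok is hypothetical and not part of paws.

```r
# Hypothetical pre-flight check for the tabular rule: the objective and
# TabularJobConfig$ProblemType must be supplied together or both omitted.
tabular_objective_ok <- function(objective = NULL,
                                 problem_type_config = list()) {
  tabular <- problem_type_config[["TabularJobConfig"]]
  if (is.null(tabular)) {
    return(TRUE)  # the rule only concerns tabular jobs
  }
  has_objective <- !is.null(objective[["MetricName"]])
  has_problem_type <- !is.null(tabular[["ProblemType"]])
  has_objective == has_problem_type
}

# Both provided: consistent.
tabular_objective_ok(
  list(MetricName = "F1"),
  list(TabularJobConfig = list(TargetAttributeName = "label",
                               ProblemType = "BinaryClassification"))
)

# Objective without a problem type: inconsistent.
tabular_objective_ok(
  list(MetricName = "F1"),
  list(TabularJobConfig = list(TargetAttributeName = "label"))
)
```

The check uses `[[` rather than `$` so that list element names are matched exactly instead of by prefix.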
Specifies how to generate the endpoint name for an automatic one-click Autopilot model deployment.
This structure specifies how to split the data into train and validation datasets.
The validation and training datasets must contain the same headers. For jobs created by calling create_auto_ml_job, the validation dataset must be less than 2 GB in size.
This attribute must not be set for the time-series forecasting problem type, as Autopilot automatically splits the input dataset into training and validation sets.
Specifies the compute configuration for the AutoML job V2.
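Putting the arguments together, a request might be assembled as in the sketch below. It assumes the paws client is created with paws::sagemaker() and exposes the operation as create_auto_ml_job_v2. The nested field names (S3OutputPath, TargetAttributeName, MetricName, ValidationFraction) are taken from the AWS API as best understood here and should be verified against the full documentation. The job name, bucket, role ARN, and the submit_job wrapper are placeholders. The call itself is wrapped in a function that is not invoked here, since it needs AWS credentials.

```r
# Placeholder values throughout; nothing here contacts AWS.
job_args <- list(
  AutoMLJobName = "example-autopilot-job",
  AutoMLJobInputDataConfig = list(
    list(
      ChannelType = "training",
      ContentType = "text/csv;header=present",
      DataSource = list(
        S3DataSource = list(
          S3DataType = "S3Prefix",
          S3Uri = "s3://example-bucket/autopilot/train/"
        )
      )
    )
  ),
  OutputDataConfig = list(
    S3OutputPath = "s3://example-bucket/autopilot/output/"
  ),
  AutoMLProblemTypeConfig = list(
    TabularJobConfig = list(
      TargetAttributeName = "label",
      ProblemType = "BinaryClassification"
    )
  ),
  RoleArn = "arn:aws:iam::123456789012:role/ExampleAutopilotRole",
  # Objective and ProblemType are supplied together, per the tabular rule.
  AutoMLJobObjective = list(MetricName = "F1"),
  DataSplitConfig = list(ValidationFraction = 0.2)
)

# Confirm that every required argument from the usage section is present.
required <- c("AutoMLJobName", "AutoMLJobInputDataConfig",
              "OutputDataConfig", "AutoMLProblemTypeConfig", "RoleArn")
stopifnot(all(required %in% names(job_args)))

# Hypothetical wrapper; call it only with valid AWS credentials configured.
submit_job <- function(args) {
  svc <- paws::sagemaker()
  do.call(svc$create_auto_ml_job_v2, args)
}
```

Keeping the arguments in a named list makes it easy to validate or log the request before calling submit_job(job_args).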