
paws.machine.learning (version 0.9.0)

machinelearning_create_data_source_from_redshift: Creates a DataSource from a database hosted on an Amazon Redshift cluster

Description

Creates a DataSource from a database hosted on an Amazon Redshift cluster. A DataSource references data that can be used in create_ml_model, create_evaluation, or create_batch_prediction operations.

See https://www.paws-r-sdk.com/docs/machinelearning_create_data_source_from_redshift/ for full documentation.

Usage

machinelearning_create_data_source_from_redshift(
  DataSourceId,
  DataSourceName = NULL,
  DataSpec,
  RoleARN,
  ComputeStatistics = NULL
)
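A minimal sketch of a complete request. It assumes the usual paws client pattern, in which paws.machine.learning::machinelearning() returns a client whose create_data_source_from_redshift method accepts the arguments listed under Usage. Every ID, cluster, database, bucket, role, and the REDSHIFT_PASSWORD environment variable below is hypothetical. The Username/Password fields inside DatabaseCredentials follow the AWS RedshiftDatabaseCredentials structure; confirm them against the full documentation linked above. Sending the request needs AWS credentials and a reachable cluster, so the call is only wrapped in a function here, not executed.

```r
# All values below are hypothetical placeholders; substitute your own resources.
redshift_args <- list(
  DataSourceId   = "ds-redshift-example-001",
  DataSourceName = "Customer churn observations (Redshift)",
  DataSpec = list(
    DatabaseInformation = list(
      DatabaseName      = "analytics",
      ClusterIdentifier = "example-cluster"
    ),
    DatabaseCredentials = list(
      Username = "ml_reader",
      # Read the password from the environment instead of hard-coding it.
      Password = Sys.getenv("REDSHIFT_PASSWORD")
    ),
    SelectSqlQuery    = "SELECT * FROM churn_observations",
    S3StagingLocation = "s3://example-bucket/redshift-staging/",
    DataSchemaUri     = "s3://example-bucket/schemas/churn.schema"
  ),
  RoleARN           = "arn:aws:iam::123456789012:role/ExampleAmazonMLRedshiftRole",
  ComputeStatistics = TRUE
)

# Defined but not called: invoking it sends a real request to Amazon ML.
create_redshift_data_source <- function(args) {
  svc <- paws.machine.learning::machinelearning()
  do.call(svc$create_data_source_from_redshift, args)
}
```

Keeping the arguments in one named list makes it easy to inspect or validate the request before it is sent.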

Arguments

DataSourceId

[required] A user-supplied ID that uniquely identifies the DataSource.

DataSourceName

A user-supplied name or description of the DataSource.

DataSpec

[required] The data specification of an Amazon Redshift DataSource:

  • DatabaseInformation - Identifies the Amazon Redshift database:

    • DatabaseName - The name of the Amazon Redshift database.

    • ClusterIdentifier - The unique ID for the Amazon Redshift cluster.

  • DatabaseCredentials - The AWS Identity and Access Management (IAM) credentials that are used to connect to the Amazon Redshift database.

  • SelectSqlQuery - The query that is used to retrieve the observation data for the DataSource.

  • S3StagingLocation - The Amazon Simple Storage Service (Amazon S3) location for staging Amazon Redshift data. The data retrieved from Amazon Redshift using the SelectSqlQuery query is stored in this location.

  • DataSchemaUri - The Amazon S3 location of the DataSchema.

  • DataSchema - A JSON string representing the schema. This is not required if DataSchemaUri is specified.

  • DataRearrangement - A JSON string that represents the splitting and rearrangement requirements for the DataSource.

    Sample - "{\"splitting\":{\"percentBegin\":10,\"percentEnd\":60}}"
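The DataRearrangement sample is easy to mistype because of the escaped quotes. A minimal sketch of a hypothetical helper, make_data_rearrangement(), that builds the same splitting string from two whole-number percentages; it covers only the percentBegin/percentEnd form shown in the sample.

```r
# Hypothetical helper: builds a DataRearrangement JSON string that keeps the
# slice of the observation data between percent_begin and percent_end.
make_data_rearrangement <- function(percent_begin, percent_end) {
  stopifnot(
    is.numeric(percent_begin), is.numeric(percent_end),
    percent_begin == round(percent_begin), percent_end == round(percent_end),
    percent_begin >= 0, percent_end <= 100, percent_begin < percent_end
  )
  sprintf(
    '{"splitting":{"percentBegin":%d,"percentEnd":%d}}',
    as.integer(percent_begin), as.integer(percent_end)
  )
}

train_split <- make_data_rearrangement(10, 60)
cat(train_split, "\n")
# {"splitting":{"percentBegin":10,"percentEnd":60}}
```

The result can be passed as DataSpec$DataRearrangement.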

RoleARN

[required] The fully specified Amazon Resource Name (ARN) of an IAM role. Amazon ML assumes this role on behalf of the user to create the following:

  • A security group to allow Amazon ML to execute the SelectSqlQuery query on an Amazon Redshift cluster

  • An Amazon S3 bucket policy to grant Amazon ML read/write permissions on the S3StagingLocation
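RoleARN must be a full ARN, not just a role name. A minimal sketch of a hypothetical pre-flight check, looks_like_role_arn(), that tests only the general shape arn:<partition>:iam::<12-digit account ID>:role/<path/name>; it does not verify that the role exists or that it grants the permissions listed above.

```r
# Hypothetical shape check for an IAM role ARN. It only inspects the string;
# it cannot tell whether the role exists or what it is allowed to do.
looks_like_role_arn <- function(arn) {
  is.character(arn) && length(arn) == 1 &&
    grepl("^arn:aws[a-z-]*:iam::[0-9]{12}:role/[A-Za-z0-9+=,.@_/-]+$", arn)
}

looks_like_role_arn("arn:aws:iam::123456789012:role/ExampleAmazonMLRedshiftRole")  # TRUE
looks_like_role_arn("ExampleAmazonMLRedshiftRole")  # FALSE: a role name, not an ARN
```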

ComputeStatistics

Whether to compute statistics for the DataSource. The statistics are generated from the observation data that the DataSource references, and Amazon ML uses them internally during MLModel training. This parameter must be set to TRUE if the DataSource will be used for MLModel training.
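Two rules on this page are easy to miss when a request is assembled by hand: ComputeStatistics must be TRUE for a DataSource that will be used for MLModel training, and DataSchema may be omitted only when DataSchemaUri is supplied (read here as "supply one of the two"). A minimal sketch of a hypothetical checker, check_redshift_request(), that returns a character vector of problems for an argument list shaped like the Usage section; it is not part of paws.

```r
# Hypothetical pre-flight check; returns character(0) when no problems are found.
check_redshift_request <- function(args, for_training) {
  problems <- character(0)
  # [[ ]] is used instead of $ so that names are matched exactly, not partially.
  if (for_training && !isTRUE(args[["ComputeStatistics"]])) {
    problems <- c(problems, "ComputeStatistics must be TRUE for a training DataSource")
  }
  spec <- args[["DataSpec"]]
  if (is.null(spec[["DataSchema"]]) && is.null(spec[["DataSchemaUri"]])) {
    problems <- c(problems, "supply either DataSpec$DataSchema or DataSpec$DataSchemaUri")
  }
  problems
}

# ComputeStatistics is left at its default (NULL) here.
request <- list(
  DataSourceId = "ds-redshift-example-001",
  DataSpec     = list(DataSchemaUri = "s3://example-bucket/schemas/churn.schema"),
  RoleARN      = "arn:aws:iam::123456789012:role/ExampleAmazonMLRedshiftRole"
)
check_redshift_request(request, for_training = TRUE)
# [1] "ComputeStatistics must be TRUE for a training DataSource"

request$ComputeStatistics <- TRUE
check_redshift_request(request, for_training = TRUE)
# character(0)
```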