
Azure Machine Learning SDK for R (preview)

Data scientists and AI developers use the Azure Machine Learning SDK for R to build and run machine learning workflows with Azure Machine Learning.

The Azure Machine Learning SDK for R uses the reticulate package to bind to Azure Machine Learning's Python SDK. By binding directly to Python, the SDK gives you access to the core objects and methods implemented in the Python SDK from any R environment you choose.
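As a quick illustration of this binding, the package exports an `azureml` object that is a reticulate handle to the Python SDK, so Python-side attributes without a dedicated R wrapper remain reachable. This is a sketch that assumes the Python SDK has already been installed via `install_azureml()`; `VERSION` is an attribute of the underlying Python `azureml.core` module, not of the R wrapper:

```r
library(azuremlsdk)

# `azureml` is a reticulate binding to the Python SDK; use `$` to walk into it.
# The Python SDK exposes its version string as azureml.core.VERSION:
azureml$core$VERSION
```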

Main capabilities of the SDK include:

  • Manage cloud resources for monitoring, logging, and organizing your machine learning experiments.
  • Train models using cloud resources, including GPU-accelerated model training.
  • Deploy your models as web services on Azure Container Instances (ACI) and Azure Kubernetes Service (AKS).

See the package website at https://azure.github.io/azureml-sdk-for-r for complete documentation.

Key Features and Roadmap

:heavy_check_mark: feature available :arrows_counterclockwise: in progress :clipboard: planned

| Features | Description | Status |
| --- | --- | --- |
| Workspace | The `Workspace` class is a foundational resource in the cloud that you use to experiment, train, and deploy machine learning models. | :heavy_check_mark: |
| Compute | Cloud resources where you can train your machine learning models. | :heavy_check_mark: |
| Data Plane Resources | `Datastore`, which stores connection information to an Azure storage service, and `DataReference`, which describes how and where data should be made available in a run. | :heavy_check_mark: |
| Experiment | A foundational cloud resource that represents a collection of trials (individual model runs). | :heavy_check_mark: |
| Run | A `Run` object represents a single trial of an experiment, and is the object that you use to monitor the asynchronous execution of a trial, store the output of the trial, analyze results, and access generated artifacts. You use `Run` inside your experimentation code to log metrics and artifacts to the Run History service. | :heavy_check_mark: |
| Estimator | A generic estimator to train data using any supplied training script. | :heavy_check_mark: |
| HyperDrive | HyperDrive automates the process of running hyperparameter sweeps for an `Experiment`. | :heavy_check_mark: |
| Model | Cloud representations of machine learning models that help you transfer models between local development environments and the `Workspace` object in the cloud. | :heavy_check_mark: |
| Webservice | Models can be packaged into container images that include the runtime environment and dependencies. Models must be built into an image before you deploy them as a web service. `Webservice` is the abstract parent class for creating and deploying web services for your models. | :heavy_check_mark: |
| Dataset | An Azure Machine Learning `Dataset` allows you to explore, transform, and manage your data for various scenarios such as model training and pipeline creation. When you are ready to use the data for training, you can save the Dataset to your Azure ML workspace to get versioning and reproducibility capabilities. | :heavy_check_mark: |

Installation

Install Conda if not already installed. Choose Python 3.5 or later.

# Install Azure ML SDK from CRAN
install.packages("azuremlsdk")

# Or the development version from GitHub
install.packages("remotes")
remotes::install_github('https://github.com/Azure/azureml-sdk-for-r', build_vignettes = TRUE)

# Then, use `install_azureml()` to install the Azure ML Python SDK that the R package binds to.
azuremlsdk::install_azureml()

Now, you're ready to get started!

For a more detailed walk-through of the installation process, advanced options, and troubleshooting, see our Installation Guide.

Getting Started

To begin running experiments with Azure Machine Learning, you must establish a connection to your Azure Machine Learning workspace.

  1. If you don't already have a workspace created, you can create one by doing:

    # If you haven't already set up a resource group, set `create_resource_group = TRUE`
    # and set `resource_group` to your desired resource group name to create the
    # resource group in the same step.
    new_ws <- create_workspace(name = <workspace_name>,
                               subscription_id = <subscription_id>,
                               resource_group = <resource_group_name>,
                               location = <location>,
                               create_resource_group = FALSE)

    After the workspace is created, you can save its details to a configuration file on the local machine.

    write_workspace_config(new_ws)
  2. If you have an existing workspace associated with your subscription, you can retrieve it from the server by doing:

    existing_ws <- get_workspace(name = <workspace_name>,
                                 subscription_id = <subscription_id>,
                                 resource_group = <resource_group_name>)

    Or, if you have the workspace config.json file on your local machine, you can load the workspace by doing:

    loaded_ws <- load_workspace_from_config()

Once you've accessed your workspace, you can begin running and tracking your own experiments with Azure Machine Learning SDK for R.
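For instance, here is a minimal run-submission sketch. The experiment name, the `train.R` script, and the compute target name are hypothetical placeholders, and it assumes a workspace `config.json` was written with `write_workspace_config()`:

```r
library(azuremlsdk)

# Load the workspace from a previously saved config.json.
ws <- load_workspace_from_config()
exp <- experiment(ws, name = "my-first-experiment")

# An estimator bundles the training script with the compute it should run on.
est <- estimator(source_directory = ".",
                 entry_script = "train.R",       # placeholder training script
                 compute_target = "my-cluster")  # placeholder compute target name

run <- submit_experiment(exp, est)
wait_for_run_completion(run, show_output = TRUE)

# Retrieve metrics that train.R logged via log_metric_to_run().
get_run_metrics(run)
```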

Take a look at our code samples and end-to-end vignettes for examples of what's possible with the SDK!

Contribute

We welcome contributions from the community. If you would like to contribute to the repository, please refer to the contribution guide.

Code of Conduct

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.

Package Details

  • Version: 1.10.0
  • License: MIT + file LICENSE
  • Monthly Downloads: 371
  • Maintainer: Diondra Peck
  • Last Published: September 22nd, 2020

Functions in azuremlsdk (1.10.0)

  • cancel_run: Cancel a run
  • aci_webservice_deployment_config: Create a deployment config for deploying an ACI web service
  • bandit_policy: Define a Bandit policy for early termination of HyperDrive runs
  • complete_run: Mark a run as completed
  • attach_aks_compute: Attach an existing AKS cluster to a workspace
  • aks_webservice_deployment_config: Create a deployment config for deploying an AKS web service
  • container_registry: Specify Azure Container Registry details
  • choice: Specify a discrete set of options to sample from
  • bayesian_parameter_sampling: Define Bayesian sampling over a hyperparameter search space
  • azureml: Access functions/modules in azureml that are not exposed through the exported R functions
  • cran_package: Specify a CRAN package to install in an environment
  • create_tabular_dataset_from_delimited_files: Create an unregistered, in-memory Dataset from delimited files
  • create_tabular_dataset_from_json_lines_files: Create a TabularDataset to represent tabular data in JSON Lines files (http://jsonlines.org/)
  • create_file_dataset_from_files: Create a FileDataset to represent file streams
  • create_child_runs: Create one or many child runs
  • convert_to_dataset_with_parquet_files: Convert the current dataset into a FileDataset containing Parquet files
  • convert_to_dataset_with_csv_files: Convert the current dataset into a FileDataset containing CSV files
  • data_path: Represent a path to data in a datastore
  • create_workspace: Create a new Azure Machine Learning workspace
  • create_aks_compute: Create an AksCompute cluster
  • create_tabular_dataset_from_parquet_files: Create an unregistered, in-memory Dataset from Parquet files
  • create_tabular_dataset_from_sql_query: Create a TabularDataset to represent tabular data in SQL databases
  • create_aml_compute: Create an AmlCompute cluster
  • data_type_bool: Configure conversion to bool
  • data_type_datetime: Configure conversion to datetime
  • delete_workspace: Delete a workspace
  • download_files_from_run: Download files from a run
  • deploy_model: Deploy a web service from registered model(s)
  • download_from_datastore: Download data from a datastore to the local file system
  • delete_secrets: Delete secrets from a keyvault
  • drop_columns_from_dataset: Drop the specified columns from the dataset
  • delete_webservice: Delete a web service from a given workspace
  • estimator: Create an estimator
  • get_best_run_by_primary_metric: Return the best-performing run among all completed runs
  • get_child_run_hyperparameters: Get the hyperparameters for all child runs
  • data_type_double: Configure conversion to 53-bit double
  • get_dataset_by_name: Get a registered Dataset from the workspace by its registration name
  • get_compute: Get an existing compute cluster
  • get_child_runs_sorted_by_primary_metric: Get the child runs sorted in descending order by best primary metric
  • create_child_run: Create a child run
  • data_type_string: Configure conversion to string
  • define_timestamp_columns_for_dataset: Define timestamp columns for the dataset
  • dataset_consumption_config: Represent how to deliver the dataset to a compute target
  • get_datastore: Get an existing datastore
  • get_runs_in_experiment: Return a generator of the runs for an experiment
  • get_run_metrics: Get the metrics logged to a run
  • delete_compute: Delete a cluster
  • experiment: Create an Azure Machine Learning experiment
  • get_input_dataset_from_run: Return the named list of input datasets
  • get_webservice_logs: Retrieve the logs for a web service
  • data_type_long: Configure conversion to 64-bit integer
  • filter_dataset_after_time: Filter a TabularDataset with timestamp columns after a specified start time
  • delete_local_webservice: Delete a local web service from the local machine
  • detach_aks_compute: Detach an AksCompute cluster from its associated workspace
  • download_from_file_dataset: Download file streams defined by the dataset as local files
  • get_child_run_metrics: Get the metrics from all child runs
  • download_model: Download a model to the local file system
  • download_file_from_run: Download a file from a run
  • delete_model: Delete a model from its associated workspace
  • filter_dataset_before_time: Filter a TabularDataset with timestamp columns before a specified end time
  • get_secrets: Get secrets from a keyvault
  • get_model: Get a registered model
  • filter_dataset_between_time: Filter a TabularDataset between a specified start and end time
  • generate_new_webservice_key: Regenerate one of a web service's keys
  • get_aks_compute_credentials: Get the credentials for an AksCompute cluster
  • get_current_run: Get the context object for a run
  • get_webservice_token: Retrieve the auth token for a web service
  • local_webservice_deployment_config: Create a deployment config for deploying a local web service
  • invoke_webservice: Call a web service with the provided input
  • keep_columns_from_dataset: Keep the specified columns and drop all others from the dataset
  • log_accuracy_table_to_run: Log an accuracy table metric to a run
  • get_run: Get an experiment run
  • get_run_details: Get the details of a run
  • get_dataset_by_id: Get a Dataset by ID
  • get_secrets_from_run: Get secrets from the keyvault associated with a run's workspace
  • hyperdrive_config: Create a configuration for a HyperDrive run
  • get_child_runs: Get all children for the current run selected by specified filters
  • generate_entry_script: Generate the control script for the experiment
  • filter_dataset_from_recent_time: Filter a TabularDataset to contain only the specified duration of recent data
  • get_default_datastore: Get the default datastore for a workspace
  • inference_config: Create an inference configuration for model deployments
  • get_environment: Get an existing environment
  • get_run_file_names: List the files that are stored in association with a run
  • get_default_keyvault: Get the default keyvault for a workspace
  • get_run_details_with_logs: Get the details of a run along with the log files' contents
  • github_package: Specify a GitHub package to install in an environment
  • get_model_package_container_registry: Get the Azure container registry that a packaged model uses
  • list_nodes_in_aml_compute: Get the details (e.g., IP address, port) of all the compute nodes in the compute target
  • get_file_dataset_paths: Get a list of file paths for each file stream defined by the dataset
  • get_model_package_creation_logs: Get the model package creation logs
  • list_secrets: List the secrets in a keyvault
  • log_row_to_run: Log a row metric to a run
  • package_model: Create a model package that packages all the assets needed to host a model as a web service
  • load_dataset_into_data_frame: Load all records from the dataset into a data frame
  • get_webservice_keys: Retrieve auth keys for a web service
  • get_webservice: Get a deployed web service
  • load_workspace_from_config: Load workspace configuration details from a config file
  • get_workspace: Get an existing workspace
  • install_azureml: Install the azureml SDK package
  • interactive_login_authentication: Manage authentication and acquire an authorization token in interactive login workflows
  • get_workspace_details: Get the details of a workspace
  • list_supported_vm_sizes: List the supported VM sizes in a region
  • grid_parameter_sampling: Define grid sampling over a hyperparameter search space
  • log_predictions_to_run: Log a predictions metric to a run
  • list_workspaces: List all workspaces that the user has access to in a subscription ID
  • log_residuals_to_run: Log a residuals metric to a run
  • log_confusion_matrix_to_run: Log a confusion matrix metric to a run
  • log_table_to_run: Log a table metric to a run
  • mount_file_dataset: Create a context manager for mounting file streams defined by the dataset as local files
  • lognormal: Specify a normal distribution of the form exp(normal(mu, sigma))
  • primary_metric_goal: Define supported metric goals for hyperparameter tuning
  • log_image_to_run: Log an image metric to a run
  • normal: Specify a real value that is normally distributed with mean mu and standard deviation sigma
  • quniform: Specify a uniform distribution of the form round(uniform(min_value, max_value) / q) * q
  • loguniform: Specify a log uniform distribution
  • register_dataset: Register a Dataset in the workspace
  • r_environment: Create an environment
  • register_do_azureml_parallel: Register AmlCompute as a parallel backend with the foreach package
  • set_secrets: Add secrets to a keyvault
  • log_list_to_run: Log a vector metric value to a run
  • log_metric_to_run: Log a metric to a run
  • qloguniform: Specify a distribution of the form round(exp(uniform(min_value, max_value)) / q) * q
  • promote_headers_behavior: Define options for how column headers are processed when reading data from files to create a dataset
  • take_from_dataset: Take a sample of file streams from the top of the dataset by the specified count
  • take_sample_from_dataset: Take a random sample of file streams in the dataset, approximately by the probability specified
  • wait_for_run_completion: Wait for the completion of a run
  • write_workspace_config: Write out the workspace configuration details to a config file
  • skip_from_dataset: Skip file streams from the top of the dataset by the specified count
  • median_stopping_policy: Define a median stopping policy for early termination of HyperDrive runs
  • merge_results: Combine the results from the parallel training
  • pull_model_package_image: Pull the Docker image from a ModelPackage to your local Docker environment
  • random_split_dataset: Split file streams in the dataset into two parts, randomly and approximately by the percentage specified
  • qlognormal: Specify a distribution of the form round(exp(normal(mu, sigma)) / q) * q
  • register_azure_blob_container_datastore: Register an Azure blob container as a datastore
  • randint: Specify a set of random integers in the range [0, upper)
  • plot_run_details: Generate a table of run details
  • random_parameter_sampling: Define random sampling over a hyperparameter search space
  • register_environment: Register an environment in the workspace
  • register_azure_postgre_sql_datastore: Initialize a new Azure PostgreSQL datastore
  • submit_child_run: Submit an experiment and return the active child run
  • register_azure_sql_database_datastore: Initialize a new Azure SQL database datastore
  • truncation_selection_policy: Define a truncation selection policy for early termination of HyperDrive runs
  • submit_experiment: Submit an experiment and return the active created run
  • register_azure_data_lake_gen2_datastore: Initialize a new Azure Data Lake Gen2 datastore
  • service_principal_authentication: Manage authentication using a service principal instead of a user identity
  • register_model_from_run: Register a model for operationalization
  • qnormal: Specify a distribution of the form round(normal(mu, sigma) / q) * q
  • register_azure_file_share_datastore: Register an Azure file share as a datastore
  • register_model: Register a model to a given workspace
  • split_tasks: Split the job into parallel tasks
  • update_aci_webservice: Update a deployed ACI web service
  • reload_local_webservice_assets: Reload a local web service's entry script and dependencies
  • start_logging_run: Create an interactive logging run
  • update_aks_webservice: Update a deployed AKS web service
  • set_default_datastore: Set the default datastore for a workspace
  • unregister_all_dataset_versions: Unregister all versions under the registration name of this dataset from the workspace
  • unregister_datastore: Unregister a datastore from its associated workspace
  • upload_folder_to_run: Upload a folder to a run
  • resource_configuration: Initialize the ResourceConfiguration
  • uniform: Specify a uniform distribution of options to sample from
  • upload_to_datastore: Upload a local directory to the Azure storage a datastore points to
  • save_model_package_files: Save a Dockerfile and dependencies from a ModelPackage to your local file system
  • upload_files_to_datastore: Upload files to the Azure storage a datastore points to
  • update_local_webservice: Update a local web service
  • update_aml_compute: Update scale settings for an AmlCompute cluster
  • view_run_details: Initialize the run details widget
  • upload_files_to_run: Upload files to a run
  • wait_for_deployment: Wait for a web service to finish deploying
  • wait_for_model_package_creation: Wait for a model package to finish creating
  • wait_for_provisioning_completion: Wait for a cluster to finish provisioning
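Tying several of the functions above together, here is a hedged deployment sketch. The model path, model name, service name, scoring script, and environment name are all placeholders, and it assumes a workspace `config.json` is present locally:

```r
library(azuremlsdk)

ws <- load_workspace_from_config()

# Register a locally saved model file with the workspace.
model <- register_model(ws,
                        model_path = "outputs/model.rds",  # placeholder path
                        model_name = "my-model")           # placeholder name

# Pair a scoring script with an R environment for inference.
inf_config <- inference_config(entry_script = "score.R",   # placeholder script
                               environment = r_environment("deploy-env"))

# Deploy to Azure Container Instances with 1 CPU core and 1 GB of memory.
deploy_config <- aci_webservice_deployment_config(cpu_cores = 1, memory_gb = 1)

service <- deploy_model(ws, "my-service", list(model), inf_config, deploy_config)
wait_for_deployment(service, show_output = TRUE)
```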