
deeplearning

Create and train deep neural networks of the ReLU type with SGD and batch normalization

About

The deeplearning package implements deep neural networks in R. It uses Rectified Linear Unit (ReLU) activation functions as its building blocks and trains networks with stochastic gradient descent, using batch normalization to speed up training and promote regularization. Networks with this architecture and training method are state of the art and have even surpassed human-level performance on the ImageNet classification task (He et al., 2015). The deeplearning package is inspired by another R package, darch, which implements layer-wise Restricted Boltzmann Machine pretraining and dropout, and it uses darch's DArch class as its default class.
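For reference, batch normalization standardizes each unit's pre-activations over a mini-batch and then rescales them with two learned parameters, gamma and beta (Ioffe and Szegedy, 2015). A minimal base R sketch of the idea (an illustration only, not the package's own batch_normalization function):

# standardize a vector of pre-activations x over a mini-batch,
# then rescale with the learned parameters gamma and beta;
# eps avoids division by zero, and gamma is the factor whose
# learning rate learn_rate_gamma controls during training
batch_norm_sketch <- function(x, gamma = 1, beta = 0, eps = 1e-8) {
  x_hat <- (x - mean(x)) / sqrt(var(x) + eps)
  gamma * x_hat + beta
}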

Installation

Install deeplearning from CRAN

install.packages("deeplearning")

Or install it from GitHub

devtools::install_github("rz1988/deeplearning")

Use deeplearning

The deeplearning package is designed to be easy and fun to use. It takes only two steps to run your first neural network.

In step one, you create a new neural network. You specify the structure of the network, namely the number of layers, the number of neurons in each layer, and the types of activation functions. The default activation for the hidden layers is the rectified linear unit function, but you can also use other activations, such as the sigmoid function, or write your own.

In step two, you train the neural network with a training input and a training target. There are a number of other training parameters; for guidance on choosing them, please refer to https://github.com/rz1988/deeplearning. A minimal sketch of the two steps is shown below.
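A toy version of both steps (this sketch assumes the validation arguments of train_dnn are optional; the full examples below pass them explicitly):

library(deeplearning)

# a toy dataset: 100 observations of 2 input variables
x <- matrix(runif(200), 100, 2)
y <- rowSums(x)

# step one: create a network with 2 inputs, one hidden layer of 10 neurons, and 1 output
dnn <- new_dnn(c(2, 10, 1))

# step two: train it
dnn <- train_dnn(dnn, x, y, num_epochs = 50)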

Examples

Train a neural network for regression

input <- matrix(runif(1000), 500, 2)
input_valid <- matrix(runif(100), 50, 2)
target <- rowSums(input + input^2)
target_valid <- rowSums(input_valid + input_valid^2)


# create a new deep neural network for regression
dnn_regression <- new_dnn(
  c(2, 50, 50, 20, 1), # the layer structure of the deep neural network:
                       # the first element is the number of input variables,
                       # the last element is the number of output variables
  hidden_layer_default = rectified_linear_unit_function, # for hidden layers, use rectified_linear_unit_function
  output_layer_default = linearUnitDerivative # for regression, use linearUnitDerivative
)

dnn_regression <- train_dnn(
                     dnn_regression,

                     # training data
                     input, # input variable for training
                     target, # target variable for training
                     input_valid, # input variable for validation
                     target_valid, # target variable for validation

                     # training parameters
                     learn_rate_weight = exp(-8) * 10, # learning rate for the weights; higher if using dropout
                     learn_rate_bias = exp(-8) * 10, # learning rate for the biases; higher if using dropout
                     learn_rate_gamma = exp(-8) * 10, # learning rate for the gamma factor used in batch normalization
                     batch_size = 10, # number of observations in a batch during training; higher for faster training, lower for faster convergence
                     batch_normalization = TRUE, # logical value; TRUE to use batch normalization
                     dropout_input = 0.2, # dropout ratio for the input layer
                     dropout_hidden = 0.5, # dropout ratio for the hidden layers
                     momentum_initial = 0.6, # initial momentum in stochastic gradient descent training
                     momentum_final = 0.9, # final momentum in stochastic gradient descent training
                     momentum_switch = 100, # epoch after which the momentum switches from initial to final
                     num_epochs = 300, # number of training epochs

                     # Error function
                     error_function = meanSquareErr, # error function to minimize during training; for regression, use meanSquareErr
                     report_classification_error = FALSE # whether to print the classification error during training
)

# the prediction by dnn_regression on the training data
pred <- predict(dnn_regression)

# calculate the R-squared of the prediction
rsq(dnn_regression)

# calculate the R-squared of the prediction on the validation data
rsq(dnn_regression, input = input_valid, target = target_valid)
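The fit can also be checked with an error measure computed directly in base R, for example the root-mean-square error of the fitted values:

# root-mean-square error of the prediction on the training data
sqrt(mean((pred - target)^2))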

Train a neural network for classification

input <- matrix(runif(1000), 500, 2)
input_valid <- matrix(runif(100), 50, 2)
target <- (cos(rowSums(input + input^2)) > 0.5) * 1
target_valid <- (cos(rowSums(input_valid + input_valid^2)) > 0.5) * 1

# create a new deep neural network for classification
dnn_classification <- new_dnn(
  c(2, 50, 50, 20, 1),  # The layer structure of the deep neural network.
                        # The first element is the number of input variables.
                        # The last element is the number of output variables.
  hidden_layer_default = rectified_linear_unit_function, # for hidden layers, use rectified_linear_unit_function
  output_layer_default = sigmoidUnitDerivative # for classification, use sigmoidUnitDerivative function
)

dnn_classification <- train_dnn(
  dnn_classification,

  # training data
  input, # input variable for training
  target, # target variable for training
  input_valid, # input variable for validation
  target_valid, # target variable for validation

  # training parameters
  learn_rate_weight = exp(-8) * 10, # learning rate for the weights; higher if using dropout
  learn_rate_bias = exp(-8) * 10, # learning rate for the biases; higher if using dropout
  learn_rate_gamma = exp(-8) * 10, # learning rate for the gamma factor used in batch normalization
  batch_size = 10, # number of observations in a batch during training; higher for faster training, lower for faster convergence
  batch_normalization = TRUE, # logical value; TRUE to use batch normalization
  dropout_input = 0.2, # dropout ratio for the input layer
  dropout_hidden = 0.5, # dropout ratio for the hidden layers
  momentum_initial = 0.6, # initial momentum in stochastic gradient descent training
  momentum_final = 0.9, # final momentum in stochastic gradient descent training
  momentum_switch = 100, # epoch after which the momentum switches from initial to final
  num_epochs = 100, # number of training epochs

  # Error function
  error_function = crossEntropyErr, # error function to minimize during training; for classification, use crossEntropyErr
  report_classification_error = TRUE # whether to print the classification error during training
)

# the prediction by dnn_classification: predicted probabilities on the training data
pred <- predict(dnn_classification)

hist(pred)
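The predicted probabilities can be turned into class labels with a simple base R threshold, for example:

# threshold the predicted probabilities at 0.5 to get class labels
pred_class <- as.integer(pred > 0.5)

# fraction of training observations misclassified
mean(pred_class != target)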

# calculate the accuracy ratio of the prediction
AR(dnn_classification)

# calculate the accuracy ratio of the prediction on the validation data
AR(dnn_classification, input = input_valid, target = target_valid)

# plot the weights of a given layer of the network
# this function can draw a heatmap, a surface, or a histogram
print_weight(dnn_regression, 1, type = "heatmap")

print_weight(dnn_regression, 2, type = "surface")

print_weight(dnn_regression, 3, type = "histogram")

References

Nitish Srivastava, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov, 2014, "Dropout: A Simple Way to Prevent Neural Networks from Overfitting", Journal of Machine Learning Research 15, 1929-1958.

Sergey Ioffe and Christian Szegedy, 2015, "Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift", Proceedings of the 32nd International Conference on Machine Learning, Lille, France.

Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun, 2015, "Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification", arXiv preprint.

Xavier Glorot, Antoine Bordes, and Yoshua Bengio, 2011, "Deep Sparse Rectifier Neural Networks", Proceedings of the 14th International Conference on Artificial Intelligence and Statistics, 315-323.

Martin Drees, 2013, "Implementierung und Analyse von tiefen Architekturen in R" (Implementation and Analysis of Deep Architectures in R; in German), Master's thesis, Fachhochschule Dortmund.

Johannes Rueckert, 2015, "Extending the Darch library for deep architectures", Project thesis, Fachhochschule Dortmund. URL: saviola.de


Functions in deeplearning (0.1.0)

convert_categorical

Data preprocessing function that converts a categorical input to a continuous input or vectorizes it
batch_normalization_differential

Function that calculates the differentials in batch normalization mode
classification_error

Calculates the classification error
rectified_linear_unit_function

Rectified Linear Unit Function
rsq

Utility function that calculates the R-squared of a regression model, which measures the goodness of fit of the model (see the sketch after this list)
verticalize

Creates a matrix by repeating a row vector N times
batch_normalization

Batch Normalization Function that normalizes the input before applying non-linearity
crossEntropyErr

Calculates the cross entropy error
AR.numeric

Calculates the Accuracy Ratio of a given set of probabilities
reset_population_mu_sigma

Resets the mu and sigmas of a darch instance to 0 and 1
AR.default

Calculates the Accuracy Ratio of a given set of probabilities
AR

Calculates the Accuracy Ratio of a classifier
generateDropoutMasksForDarch

Generates dropout masks for dnn
calcualte_population_mu_sigma

Calculates the mu and sigmas of a darch instance
backpropagate_delta_bn

Calculates the delta functions using backpropagation
rsq.lm

Utility function that calculates the R-squared of a linear model
generateDropoutMask

Generates the dropout mask for the deep neural network
train_dnn

Train a deep neural network
run_dnn

Execution function that runs the network in batch normalization mode
finetune_SGD_bn

Updates a deep neural network's parameters using stochastic gradient descent method and batch normalization
matMult

Calculates the outer product of two matrices
new_dnn

Creates a new instance of the DArch class
print_weight

Prints the weights of a deep neural network as a heatmap, surface, or histogram
applyDropoutMask

Applies the given dropout mask to the given data row-wise.
rsq.DArch

Utility function that calculates the R-squared of a DArch instance
meanSquareErr

Calculates the mean squared error
AR.DArch

Calculates the Accuracy Ratio of a given set of probabilities
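For reference, the R-squared reported by the rsq utilities above is the standard goodness-of-fit measure. A base R sketch of the usual definition (not necessarily the package's exact implementation):

# R-squared: one minus the ratio of residual to total sum of squares
rsq_sketch <- function(pred, target) {
  1 - sum((target - pred)^2) / sum((target - mean(target))^2)
}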