Fit a deep neural network with optional pre-training and one of various fine-tuning algorithms.
darch(x, ...)

# S3 method for default
darch(x, y, layers = 10, ..., autosave = F,
autosave.epochs = round(darch.numEpochs/20),
autosave.dir = "./darch.autosave", autosave.trim = F, bp.learnRate = 1,
bp.learnRateScale = 1, bootstrap = F, bootstrap.unique = T,
bootstrap.num = 0, cg.length = 2, cg.switchLayers = 1, darch = NULL,
darch.batchSize = 1, darch.dither = F, darch.dropout = 0,
darch.dropout.dropConnect = F, darch.dropout.momentMatching = 0,
darch.dropout.oneMaskPerEpoch = F, darch.elu.alpha = 1,
darch.errorFunction = if (darch.isClass) crossEntropyError else mseError,
darch.finalMomentum = 0.9, darch.fineTuneFunction = backpropagation,
darch.initialMomentum = 0.5, darch.isClass = T,
darch.maxout.poolSize = 2, darch.maxout.unitFunction = linearUnit,
darch.momentumRampLength = 1, darch.nesterovMomentum = T,
darch.numEpochs = 100, darch.returnBestModel = T,
darch.returnBestModel.validationErrorFactor = 1 - exp(-1),
darch.stopClassErr = -Inf, darch.stopErr = -Inf,
darch.stopValidClassErr = -Inf, darch.stopValidErr = -Inf,
darch.trainLayers = T, darch.unitFunction = sigmoidUnit,
darch.weightDecay = 0,
darch.weightUpdateFunction = weightDecayWeightUpdate, dataSet = NULL,
dataSetValid = NULL,
generateWeightsFunction = generateWeightsGlorotUniform, gputools = F,
gputools.deviceId = 0, logLevel = NULL, normalizeWeights = F,
normalizeWeightsBound = 15, paramsList = list(),
preProc.factorToNumeric = F, preProc.factorToNumeric.targets = F,
preProc.fullRank = T, preProc.fullRank.targets = F,
preProc.orderedToFactor.targets = T, preProc.params = F,
preProc.targets = F, rbm.allData = F, rbm.batchSize = 1,
rbm.consecutive = T, rbm.errorFunction = mseError,
rbm.finalMomentum = 0.9, rbm.initialMomentum = 0.5, rbm.lastLayer = 0,
rbm.learnRate = 1, rbm.learnRateScale = 1, rbm.momentumRampLength = 1,
rbm.numCD = 1, rbm.numEpochs = 0, rbm.unitFunction = sigmoidUnitRbm,
rbm.updateFunction = rbmUpdate, rbm.weightDecay = 2e-04, retainData = F,
rprop.decFact = 0.5, rprop.incFact = 1.2, rprop.initDelta = 1/80,
rprop.maxDelta = 50, rprop.method = "iRprop+", rprop.minDelta = 1e-06,
seed = NULL, shuffleTrainData = T, weights.max = 0.1,
weights.mean = 0, weights.min = -0.1, weights.sd = 0.01,
xValid = NULL, yValid = NULL)
# S3 method for formula
darch(x, data, layers, ..., xValid = NULL, dataSet = NULL,
dataSetValid = NULL, logLevel = NULL, paramsList = list(),
darch = NULL)
# S3 method for DataSet
darch(x, ...)
x: Input data matrix or data.frame (darch.default), formula (darch.formula), or DataSet (darch.DataSet).
...: Additional parameters.
y: Target data matrix or data.frame, if x is an input data matrix or data.frame.
layers: Vector containing one integer for the number of neurons of each layer. Defaults to c(a, 10, b), where a is the number of columns in the training data and b is the number of columns in the targets. If this has length 1, it is used as the number of neurons in the hidden layer, not as the number of layers!
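For illustration (a sketch not taken from this page; the layer sizes are arbitrary), the layers argument for the iris data, which has four input columns and three target classes, could be given as:

## Illustrative sketch; layer sizes are arbitrary.
library(darch)
data(iris)
## 4 inputs, one hidden layer with 20 neurons, 3 outputs
model <- darch(Species ~ ., iris, layers = c(4, 20, 3), darch.numEpochs = 20)
## layers = 20 would instead request a single hidden layer of 20 neurons,
## with input and output sizes derived from the data.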
autosave.epochs: After how many epochs auto-saving should happen, by default after every 5% of the total number of epochs. If this value is smaller than 1, the network will only be saved once when the fine-tuning is done.
autosave.dir: Directory for the autosave files; the file names will be, e.g., autosave_010.net for the DArch instance after 10 epochs.
autosave.trim: Whether to trim the network before saving it. This will remove the dataset and the layer weights, resulting in a network that is no longer usable for predictions or training. Useful when only statistics and settings need to be stored.
bp.learnRate: Learning rates for backpropagation; the length is either one or the same as the number of weight matrices when using different learning rates for each layer.
bp.learnRateScale: The learn rate is multiplied by this value after each epoch.
bootstrap: Logical indicating whether to use bootstrapping to create a training and validation data set from the given training data.
bootstrap.unique: Logical indicating whether to take only unique samples for the training (TRUE, default) or take all drawn samples (FALSE), which results in a bigger training set with duplicates. Note: This is ignored if bootstrap.num is greater than 0.
bootstrap.num: If this is greater than 0, bootstrapping will draw this number of training samples without replacement.
cg.length: Number of line searches to perform during conjugate gradient fine-tuning.
cg.switchLayers: Indicates when to train the full network instead of only the upper two layers.
darch.batchSize: Batch size, i.e. the number of training samples that are presented to the network before weight updates are performed, for fine-tuning.
darch.dither: Whether to apply dither to numeric columns in the training input data.
darch.dropout: Dropout rates. If this is a vector, it will be treated as the dropout rates for each individual layer. If one element is missing, the input dropout will be set to 0. When enabling darch.dropout.dropConnect, this vector needs an additional element (one element per weight matrix between two layers, as opposed to one element per layer excluding the last layer).
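For illustration (a sketch with made-up rates), a per-layer dropout specification for a three-layer network contains one rate per layer except the output layer, the first being the input dropout:

## Illustrative sketch; dropout rates are made up.
library(darch)
data(iris)
model <- darch(Species ~ ., iris, layers = c(4, 10, 3),
  darch.dropout = c(0.1, 0.5), # 10% input dropout, 50% dropout on the hidden layer
  darch.numEpochs = 20)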
darch.dropout.dropConnect: Whether to use DropConnect instead of dropout for the hidden layers. Will use darch.dropout as the dropout rates.
darch.dropout.momentMatching: How many iterations to perform during moment matching for dropout inference, 0 to disable moment matching.
darch.dropout.oneMaskPerEpoch: Whether to generate a new mask for each batch (FALSE, default) or for each epoch (TRUE).
darch.elu.alpha: Alpha parameter for the exponential linear unit function. See exponentialLinearUnit.
darch.errorFunction: Error function during fine-tuning. Possible error functions include mseError, rmseError, and crossEntropyError.
darch.finalMomentum: Final momentum during fine-tuning.
darch.fineTuneFunction: Fine-tuning function. Possible values include backpropagation (default), rpropagation, minimizeClassifier, and minimizeAutoencoder (unsupervised).
darch.initialMomentum: Initial momentum during fine-tuning.
darch.isClass: Whether output should be treated as class labels during fine-tuning and classification rates should be printed.
darch.maxout.poolSize: Pool size for maxout units, when using the maxout activation function. See maxoutUnit.
darch.maxout.unitFunction: Inner unit function used by maxout. See darch.unitFunction for possible unit functions.
darch.momentumRampLength: After how many epochs, relative to the overall number of epochs trained, should the momentum reach darch.finalMomentum? A value of 1 indicates that darch.finalMomentum should be reached in the final epoch; a value of 0.5 indicates that darch.finalMomentum should be reached after half of the training is complete. Note that this will lead to bumps in the momentum ramp if training is resumed with the same parameters for darch.initialMomentum and darch.finalMomentum. Set darch.momentumRampLength to 0 to avoid this problem when resuming training.
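To make the ramp concrete, here is an illustrative helper (momentum_at_epoch is a hypothetical function, not part of darch, and it assumes a linear ramp, which is not stated explicitly above):

## Illustrative sketch only; assumes a linear momentum ramp.
momentum_at_epoch <- function(epoch, numEpochs, initial = 0.5, final = 0.9,
  rampLength = 1) {
  rampEpochs <- rampLength * numEpochs
  if (rampEpochs <= 0 || epoch >= rampEpochs) return(final)
  initial + (final - initial) * epoch / rampEpochs
}
momentum_at_epoch(25, numEpochs = 100, rampLength = 0.5) # 0.7, half-way up the ramp
momentum_at_epoch(50, numEpochs = 100, rampLength = 0.5) # 0.9, ramp finished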
darch.nesterovMomentum: Whether to use Nesterov Accelerated Momentum (NAG) for gradient-descent-based fine-tuning algorithms.
darch.numEpochs: Number of epochs of fine-tuning.
darch.returnBestModel: Logical indicating whether to return the best model at the end of training, instead of the last.
darch.returnBestModel.validationErrorFactor: When evaluating models with validation data, how highly should the validation error be valued compared to the training error? This is a value between 0 and 1. By default, this value is 1 - exp(-1). The training error factor and the validation error factor always add to 1, so if you pass 1 here, the training error will be ignored, and if you pass 0 here, the validation error will be ignored.
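As a sketch of how the two factors interact (the weighted sum below is an interpretation based on the two factors adding to 1, and the error values are made up):

## Illustrative sketch; error values are made up.
validationErrorFactor <- 1 - exp(-1) # the default, roughly 0.632
trainError <- 0.10
validError <- 0.20
modelScore <- (1 - validationErrorFactor) * trainError +
  validationErrorFactor * validError
modelScore # lower scores would indicate a better model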
darch.stopClassErr: When the classification error is lower than or equal to this value, training is stopped (0..100).
darch.stopErr: When the value of the error function is lower than or equal to this value, training is stopped.
darch.stopValidClassErr: When the classification error on the validation data is lower than or equal to this value, training is stopped (0..100).
darch.stopValidErr: When the value of the error function on the validation data is lower than or equal to this value, training is stopped.
darch.trainLayers: Either TRUE to train all layers, or a mask containing TRUE for all layers that should be trained and FALSE for all layers that should not be trained (no entry for the input layer).
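For illustration (a sketch with arbitrary layer sizes), a mask for a network with two hidden layers has three entries, one per layer except the input layer:

## Illustrative sketch; layer sizes are arbitrary.
library(darch)
data(iris)
model <- darch(Species ~ ., iris, layers = c(4, 10, 10, 3),
  ## keep the first hidden layer fixed, train the remaining layers
  darch.trainLayers = c(FALSE, TRUE, TRUE),
  darch.numEpochs = 20)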
darch.unitFunction: Layer function or vector of layer functions of length (number of layers - 1). Note that the first entry signifies the layer function between layers 1 and 2, i.e. the output of layer 2. Layer 1 does not have a layer function, since the input values are used directly. Possible unit functions include linearUnit, sigmoidUnit, tanhUnit, rectifiedLinearUnit, softplusUnit, softmaxUnit, and maxoutUnit.
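For illustration (a sketch; the unit function choices are arbitrary), unit functions can be mixed per layer; for three layers there are two entries, the second defining the output of the final layer:

## Illustrative sketch; the unit function choices are arbitrary.
library(darch)
data(iris)
model <- darch(Species ~ ., iris, layers = c(4, 10, 3),
  ## one entry per layer transition: hidden layer, then output layer
  darch.unitFunction = c(rectifiedLinearUnit, softmaxUnit),
  darch.numEpochs = 20)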
darch.weightDecay: Weight decay factor, defaults to 0. All weights will be multiplied by (1 - darch.weightDecay) prior to each weight update.
darch.weightUpdateFunction: Weight update function or vector of weight update functions, very similar to darch.unitFunction. Possible weight update functions include weightDecayWeightUpdate and maxoutWeightUpdate. Note that maxoutWeightUpdate must be used on the layer after the maxout activation function!
generateWeightsFunction: Weight generation function or vector of weight generation functions of length (number of layers - 1). Possible weight generation functions include generateWeightsUniform (default), generateWeightsNormal, generateWeightsGlorotNormal, generateWeightsGlorotUniform, generateWeightsHeNormal, and generateWeightsHeUniform.
gputools: Logical indicating whether to use gputools for matrix multiplication, if available.
gputools.deviceId: Integer specifying the device to use for GPU matrix multiplication. See chooseGpu.
logLevel: futile.logger log level. Uses the currently set log level by default, which is futile.logger::flog.info if it was not changed. Other available levels include, from least to most verbose: FATAL, ERROR, WARN, DEBUG, and TRACE.
normalizeWeights: Logical indicating whether to normalize weights (L2 norm = normalizeWeightsBound).
normalizeWeightsBound: Upper bound on the L2 norm of incoming weight vectors. Used only if normalizeWeights is TRUE.
paramsList: List of parameters; can include, and overwrites, the parameters listed above. Primarily for convenience or for use in scripts.
preProc.factorToNumeric: Whether all factors should be converted to numeric.
preProc.factorToNumeric.targets: Whether all factors should be converted to numeric in the target data.
preProc.fullRank: Whether to use full rank encoding. See preProcess for details.
preProc.fullRank.targets: Whether to use full rank encoding for target data. See preProcess for details.
preProc.orderedToFactor.targets: Whether ordered factors in the target data should be converted to unordered factors. Note: Ordered factors are converted to numeric by dummyVars and are no longer usable for classification tasks.
preProc.params: List of parameters to pass to the preProcess function for the input data, or FALSE to disable input data pre-processing.
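For illustration (a sketch; the exact list contents are an assumption about what preProcess accepts), centering and scaling of the input data could be requested as:

## Illustrative sketch; assumes caret is installed and that preProcess
## accepts these parameters.
library(darch)
data(iris)
model <- darch(Species ~ ., iris,
  preProc.params = list(method = c("center", "scale")),
  darch.numEpochs = 20)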
preProc.targets: Whether target data is to be centered and scaled. Unlike preProc.params, this is just a logical turning pre-processing for target data on or off, since this pre-processing has to be reverted when predicting new data. Most useful for regression tasks. Note: This will skew the raw network error.
rbm.allData: Logical indicating whether to use training and validation data for pre-training. Note: This also applies when using bootstrapping.
rbm.batchSize: Pre-training batch size.
rbm.consecutive: Logical indicating whether to train the RBMs one at a time for rbm.numEpochs epochs (TRUE, default) or to alternatingly train each RBM for one epoch at a time (FALSE).
rbm.finalMomentum: Final momentum during pre-training.
rbm.initialMomentum: Initial momentum during pre-training.
rbm.lastLayer: Numeric indicating at which layer to stop the pre-training. Possible values include 0, meaning that all layers are trained; positive integers, meaning to stop training after the RBM where rbm.lastLayer forms the visible layer; and negative integers, meaning to stop the training rbm.lastLayer RBMs from the top RBM.
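For illustration (a sketch with arbitrary layer sizes and epoch counts), a negative value leaves the given number of top RBMs untrained:

## Illustrative sketch; layer sizes and epoch counts are arbitrary.
library(darch)
data(iris)
model <- darch(Species ~ ., iris, layers = c(4, 100, 50, 3),
  rbm.numEpochs = 5,  # enable pre-training
  rbm.lastLayer = -1, # leave the top RBM untrained
  darch.numEpochs = 20)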
rbm.learnRate: Learning rate during pre-training.
rbm.learnRateScale: The learn rates will be multiplied by this value after each epoch.
rbm.momentumRampLength: After how many epochs, relative to rbm.numEpochs, should the momentum reach rbm.finalMomentum? A value of 1 indicates that rbm.finalMomentum should be reached in the final epoch; a value of 0.5 indicates that rbm.finalMomentum should be reached after half of the training is complete.
rbm.numCD: Number of full steps for which contrastive divergence is performed. Increasing this will slow training down considerably.
rbm.numEpochs: Number of pre-training epochs. Note: When passing a value other than 0 here and also passing an existing DArch instance via the darch parameter, the weights of the network will be completely reset! Pre-training is essentially a form of advanced weight initialization and it makes no sense to perform pre-training on a previously trained network.
rbm.unitFunction: Unit function during pre-training. Possible functions include sigmoidUnitRbm (default), tanhUnitRbm, and linearUnitRbm.
rbm.updateFunction: Update function during pre-training. Currently, darch only provides rbmUpdate.
rbm.weightDecay: Pre-training weight decay. Weights will be multiplied by (1 - rbm.weightDecay) prior to each weight update.
rprop.decFact: Decreasing factor for the training. Default is 0.5.
rprop.incFact: Increasing factor for the training. Default is 1.2.
rprop.initDelta: Initialisation value for the update. Default is 0.0125.
rprop.maxDelta: Upper bound for step size. Default is 50.
rprop.method: The method for the training. Default is "iRprop+".
rprop.minDelta: Lower bound for step size. Default is 0.000001.
seed: Allows the specification of a seed which will be set via set.seed. Used in the context of darchBench.
shuffleTrainData: Logical indicating whether to shuffle training data before each epoch.
weights.max: max parameter to the runif function.
weights.mean: mean parameter to the rnorm function.
weights.min: min parameter to the runif function.
weights.sd: sd parameter to the rnorm function.
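For illustration (a sketch with arbitrary values), weights.min and weights.max parameterize the runif-based generators, while weights.mean and weights.sd parameterize the rnorm-based ones:

## Illustrative sketch; the values are arbitrary.
library(darch)
data(iris)
model <- darch(Species ~ ., iris,
  generateWeightsFunction = generateWeightsNormal,
  weights.mean = 0, weights.sd = 0.01, # passed on to rnorm
  darch.numEpochs = 20)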
xValid: Validation input data matrix or data.frame.
yValid: Validation target data matrix or data.frame, if x is a data matrix or data.frame.
data: data.frame containing the dataset, if x is a formula.
The darch package implements Deep Architecture Networks and restricted Boltzmann machines.
The creation of this package is motivated by the papers from G. Hinton et al. from 2006 (see references for details) and by the MATLAB source code developed in this context. This package provides the ability to generate deep architecture networks (darch) like the deep belief networks from Hinton et al. The deep architectures can then be trained with the contrastive divergence method. After this pre-training they can be fine-tuned with several learning methods such as backpropagation, resilient backpropagation, and conjugate gradients, as well as more recent techniques like dropout and maxout.
See https://github.com/maddin79/darch for further information, documentation, and releases.
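As a compact sketch of this workflow (parameter values are arbitrary), a network can first be pre-trained with contrastive divergence and then fine-tuned with backpropagation:

## Illustrative sketch of pre-training followed by fine-tuning; values are arbitrary.
library(darch)
data(iris)
model <- darch(Species ~ ., iris, layers = c(4, 20, 3),
  rbm.numEpochs = 10,                       # contrastive divergence pre-training
  darch.fineTuneFunction = backpropagation, # fine-tuning algorithm
  darch.numEpochs = 100)
predictions <- predict(model, newdata = iris, type = "class")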
Package: darch
Type: Package
Version: 0.10.0
Date: 2015-11-12
License: GPL-2 or later
LazyLoad: yes
Hinton, G. E., S. Osindero, Y. W. Teh (2006). "A fast learning algorithm for deep belief nets". Neural Computation 18(7), pp. 1527-1554. DOI: 10.1162/neco.2006.18.7.1527.
Hinton, G. E., R. R. Salakhutdinov (2006). "Reducing the dimensionality of data with neural networks". Science 313(5786), pp. 504-507. DOI: 10.1126/science.1127647.
Hinton, Geoffrey E. et al. (2012). "Improving neural networks by preventing co-adaptation of feature detectors". CoRR abs/1207.0580. URL: http://arxiv.org/abs/1207.0580.
Goodfellow, Ian J. et al. (2013). "Maxout Networks". In: Proceedings of the 30th International Conference on Machine Learning, ICML 2013, Atlanta, GA, USA, 16-21 June 2013, pp. 1319-1327. URL: http://jmlr.org/proceedings/papers/v28/goodfellow13.html.
Drees, Martin (2013). "Implementation and Analysis of Deep Architectures in R" ("Implementierung und Analyse von tiefen Architekturen in R", in German). Master's thesis. Fachhochschule Dortmund.
Rueckert, Johannes (2015). "Extending the Darch library for deep architectures". Project thesis. Fachhochschule Dortmund. URL: http://static.saviola.de/publications/rueckert_2015.pdf.
Other darch interface functions: darchBench, darchTest, plot.DArch, predict.DArch, print.DArch.
# NOT RUN {
data(iris)
model <- darch(Species ~ ., iris)
print(model)
predictions <- predict(model, newdata = iris, type = "class")
cat(paste("Incorrect classifications:", sum(predictions != iris[,5])))
trainData <- matrix(c(0,0,0,1,1,0,1,1), ncol = 2, byrow = TRUE)
trainTargets <- matrix(c(0,1,1,0), nrow = 4)
model2 <- darch(trainData, trainTargets, layers = c(2, 10, 1),
darch.numEpochs = 500, darch.stopClassErr = 0, retainData = T)
e <- darchTest(model2)
cat(paste0("Incorrect classifications on all examples: ", e[3], " (",
e[2], "%)\n"))
plot(model2)
# }
# NOT RUN {
#
# More examples can be found at
# https://github.com/maddin79/darch/tree/v0.12.0/examples
# }