AutoEncoder-class: AutoEncoder

Description

An S4 Class implementing an Autoencoder

Slots

fun: A function that does the embedding and returns a dimRedResult object.

stdpars: The standard parameters for the function.

General usage

Dimensionality reduction methods are S4 classes that can be used in two ways. They can be used directly, in which case they have to be initialized and a full list of parameters has to be passed to the @fun() slot. Alternatively, the method name can be passed to the embed() function and parameters can be given via ..., in which case missing parameters are replaced by those in @stdpars.
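The fallback to @stdpars can be illustrated with a minimal base-R sketch. merge_pars() is a hypothetical helper, not a dimRed function. The values in stdpars are illustrative and are not the package defaults. The sketch only shows the idea: supplied parameters override the standard ones, and missing ones keep their standard value.

```r
## Hypothetical sketch: fill in parameters the user did not supply.
## The values below are illustrative only, NOT the dimRed defaults.
stdpars <- list(
  ndim          = 2,
  n_hidden      = c(10, 2, 10),
  activation    = c("tanh", "lin", "tanh"),
  weight_decay  = 0.001,
  learning_rate = 0.15,
  batchsize     = NA,
  n_steps       = 500
)

merge_pars <- function(user_pars, stdpars) {
  # entries in user_pars override those in stdpars;
  # anything missing from user_pars keeps its standard value
  modifyList(stdpars, user_pars)
}

pars <- merge_pars(list(n_steps = 100, learning_rate = 0.05), stdpars)
pars$n_steps   # 100 (user value)
pars$n_hidden  # 10 2 10 (standard value)
```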

Parameters

AutoEncoder can take the following parameters:

ndim: The number of dimensions for reduction.
n_hidden: The number of neurons in the hidden layers. The length specifies the number of layers. The length must be odd, and the middle number must equal ndim.
activation: The activation functions for the layers, each one of "tanh", "sigmoid", "relu" or "elu". Any other value is silently ignored, and the layer then has no activation function.
weight_decay: The coefficient for weight decay; set to 0 if no weight decay is desired.
learning_rate: The learning rate for gradient descent.
graph: Optional: a list of components that define the autoencoder in tensorflow, see details.
keras_graph: Optional: a list of keras layers that define the encoder and decoder. If specified, all other topology-related parameters are ignored; see details.
batchsize: If NA, all data are used for training; otherwise only a random subset of size batchsize is used.
n_steps: The number of training steps.
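The activation rule above can be illustrated by mapping each supported name to an element-wise R function, with a fallback to the identity for anything else. activation_fun() is hypothetical; in dimRed the activations are tensorflow operations. The ELU shown assumes the usual scale of 1.

```r
## Hypothetical sketch: supported activation names map to functions;
## any other name (e.g. "lin") falls back to the identity (no activation).
activation_fun <- function(name) {
  switch(name,
    tanh    = tanh,
    sigmoid = function(x) 1 / (1 + exp(-x)),
    relu    = function(x) pmax(x, 0),
    elu     = function(x) ifelse(x > 0, x, exp(x) - 1),
    function(x) x  # default: identity
  )
}

x <- c(-1, 0, 2)
activation_fun("relu")(x)  # 0 0 2
activation_fun("lin")(x)   # -1 0 2 (unknown name, so no activation)
```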

Further training a model

If the model did not converge in the first training phase, or training with different data is desired, the dimRedResult object may be passed as the autoencoder parameter. In this case all topology-related parameters are ignored.

Using Keras layers

The encoder and decoder can be specified as lists of keras layers. keras_graph must then be a list with two entries. The first, encoder, is a LIST of keras layers (WITHOUT the layer_input) that are chained in order to form the encoder. The second, decoder, is defined in the same way. The output of the decoder must have the same number of dimensions as the input data.
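The chaining can be sketched in base R by letting plain functions stand in for keras layers. dense(), chain() and keras_graph_like are hypothetical names used only for illustration. With real keras the list entries would be keras layer objects instead. The sketch shows two things: the layers are applied in list order, and the decoder output must have as many columns as the input.

```r
## Hypothetical base-R stand-in for a keras dense layer: a function mapping
## an n x d_in matrix to an n x d_out matrix.
dense <- function(d_in, d_out, act = identity) {
  W <- matrix(rnorm(d_in * d_out, sd = 0.1), d_in, d_out)
  b <- rep(0, d_out)
  function(x) act(sweep(x %*% W, 2, b, "+"))
}

set.seed(1)
keras_graph_like <- list(
  encoder = list(dense(3, 10, tanh), dense(10, 2)),  # 3 -> 10 -> 2
  decoder = list(dense(2, 10, tanh), dense(10, 3))   # 2 -> 10 -> 3
)

# apply a list of layers in order: output of layer i feeds layer i + 1
chain <- function(layers, x) Reduce(function(h, layer) layer(h), layers, x)

X <- matrix(rnorm(5 * 3), 5, 3)
codes <- chain(keras_graph_like$encoder, X)
recon <- chain(keras_graph_like$decoder, codes)
dim(codes)  # 5 2
dim(recon)  # 5 3, same number of columns as X
```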

Using Tensorflow

The model can also be defined entirely in tensorflow. In this case graph must be a list with the following entries:

encoder: A tensor that defines the encoder.
decoder: A tensor that defines the decoder.
network: A tensor that defines the reconstruction (encoder + decoder).
loss: A tensor that calculates the loss (network + loss function).
in_data: A placeholder that points to the data input of the network AND the encoder.
in_decoder: A placeholder that points to the input of the decoder.
session: A tensorflow Session object that holds the values of the tensors.
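The following hypothetical base-R helper checks that a graph list contains every entry listed above. check_graph() is not part of dimRed. The NULL placeholders stand in for real tensorflow tensors and a Session.

```r
## Hypothetical helper (not part of dimRed): verify the graph list structure.
required_graph_entries <- c("encoder", "decoder", "network", "loss",
                            "in_data", "in_decoder", "session")

check_graph <- function(graph) {
  missing_entries <- setdiff(required_graph_entries, names(graph))
  if (length(missing_entries) > 0) {
    stop("graph is missing: ", paste(missing_entries, collapse = ", "))
  }
  invisible(TRUE)
}

# placeholders instead of real tensors and a Session
g <- setNames(vector("list", length(required_graph_entries)),
              required_graph_entries)
check_graph(g)                          # passes silently
try(check_graph(list(encoder = NULL)))  # error listing the missing entries
```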

Implementation

Uses tensorflow as a backend; for details and problems relating to tensorflow, see https://tensorflow.rstudio.com.

Details

There are several ways to specify an autoencoder. The simplest is to pass the number of neurons per layer in n_hidden. This must be a symmetric vector of integers of odd length whose middle number equals ndim, e.g. c(10, 2, 10) for ndim = 2. For every layer an activation function can be specified with activation.
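The constraints on n_hidden can be expressed as a small check. valid_n_hidden() is a hypothetical helper written for illustration, not a dimRed function.

```r
## Hypothetical helper: does n_hidden satisfy the constraints above?
valid_n_hidden <- function(n_hidden, ndim) {
  n <- length(n_hidden)
  n %% 2 == 1 &&                       # odd number of layers
    all(n_hidden == rev(n_hidden)) &&  # symmetric
    n_hidden[(n + 1) / 2] == ndim      # middle (bottleneck) layer equals ndim
}

valid_n_hidden(c(10, 2, 10), ndim = 2)        # TRUE
valid_n_hidden(c(10, 5, 2, 5, 10), ndim = 2)  # TRUE
valid_n_hidden(c(10, 2, 8), ndim = 2)         # FALSE: not symmetric
valid_n_hidden(c(10, 2), ndim = 2)            # FALSE: even length
```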

For regularization weight decay can be specified by setting weight_decay > 0.

Currently only a gradient descent optimizer is used; the learning rate can be specified with learning_rate. If batchsize is not NA, the learner operates on random batches of that size. The number of training steps is specified with n_steps.
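The sketch below shows the roles of learning_rate, weight_decay, batchsize and n_steps in base R. It trains a linear autoencoder (no activation functions) by plain gradient descent. The objective is the squared reconstruction error averaged over rows, plus an L2 weight-decay penalty. This is not the dimRed implementation, which builds the network in tensorflow. train_linear_ae() and its init argument are hypothetical. Passing a previous fit as init continues training from those weights, which loosely mirrors passing a dimRedResult back for further training.

```r
## Hypothetical sketch (not the dimRed implementation): linear autoencoder
## X -> X %*% W1 (encoder) -> %*% W2 (decoder), trained by gradient descent.
train_linear_ae <- function(X, ndim, learning_rate = 0.01, weight_decay = 0,
                            batchsize = NA, n_steps = 500, init = NULL) {
  d <- ncol(X)
  if (is.null(init)) {
    # fresh start: small random weights
    W1 <- matrix(rnorm(d * ndim, sd = 0.1), d, ndim)
    W2 <- matrix(rnorm(ndim * d, sd = 0.1), ndim, d)
  } else {
    # continued training: start from the weights of a previous fit
    W1 <- init$W1
    W2 <- init$W2
  }
  loss <- numeric(n_steps)
  for (step in seq_len(n_steps)) {
    # all rows if batchsize is NA, otherwise a random subset of batchsize rows
    Xb <- if (is.na(batchsize)) X else X[sample(nrow(X), batchsize), , drop = FALSE]
    n <- nrow(Xb)
    H <- Xb %*% W1          # codes (n x ndim)
    E <- H %*% W2 - Xb      # reconstruction error (n x d)
    # squared error per row, averaged over rows, plus L2 weight decay
    loss[step] <- sum(E^2) / n + weight_decay * (sum(W1^2) + sum(W2^2))
    # gradients of that loss with respect to W2 and W1
    gW2 <- 2 * crossprod(H, E) / n + 2 * weight_decay * W2
    gW1 <- 2 * crossprod(Xb, E %*% t(W2)) / n + 2 * weight_decay * W1
    W1 <- W1 - learning_rate * gW1
    W2 <- W2 - learning_rate * gW2
  }
  list(W1 = W1, W2 = W2, loss = loss)
}

set.seed(42)
X <- cbind(rnorm(200), rnorm(200), 0.1 * rnorm(200))
fit <- train_linear_ae(X, ndim = 2, learning_rate = 0.05, weight_decay = 1e-4,
                       batchsize = 50, n_steps = 300)
## continue training on all rows, starting from the previous weights
fit2 <- train_linear_ae(X, ndim = 2, learning_rate = 0.05, weight_decay = 1e-4,
                        n_steps = 100, init = fit)
codes <- X %*% fit2$W1  # 200 x 2 embedding
```

Setting weight_decay = 0 removes the penalty terms from both the loss and the gradients. Setting batchsize = NA makes every step use the full data.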

Examples


## requires a working tensorflow installation (see Implementation)
library(dimRed)

dat <- loadDataSet("3D S Curve")

## use the S4 class directly:
autoenc <- AutoEncoder()
emb <- autoenc@fun(dat, autoenc@stdpars)

## simpler, use embed():
emb2 <- embed(dat, "AutoEncoder")

plot(emb, type = "2vars")

## train on a random 10% of the points, then embed the remaining points
samp <- sample(nrow(dat), floor(nrow(dat) / 10))
embsamp <- autoenc@fun(dat[samp], autoenc@stdpars)
embother <- embsamp@apply(dat[-samp])
plot(embsamp, type = "2vars")
points(embother@data)