Laurae (version 0.0.0.9001)

tsne_grid: t-SNE grid search function

Description

This function runs a grid search over a range of t-SNE perplexity values, trying several seeds for each value. Verbose output is always on and cannot be disabled through an argument. If you need the function without it, remove the verbose messages from the source and rebuild the package.

Usage

tsne_grid(data, output_dims, input_dims = ncol(data),
  perplexity_range = c(1, min(floor((nrow(data) - 1)/3)), 1000), tries = 10,
  iterations = 10000, theta = 0, check_duplicates = FALSE, pca = FALSE,
  is_distance = FALSE)
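
The default perplexity_range in the signature ties the upper bound to the number of observations and caps it at 1000. A minimal base R sketch of that computation follows, assuming the intended result is a two-element (min, max) vector. `default_perplexity_range` is a hypothetical helper written for illustration and is not part of Laurae.

```r
# Hypothetical helper (not part of Laurae): compute a (min, max) perplexity
# range for a data set with n observations, capping the maximum at `cap`.
default_perplexity_range <- function(n, cap = 1000) {
  stopifnot(n >= 4)  # floor((n - 1) / 3) must be at least 1
  c(1, min(floor((n - 1) / 3), cap))
}

default_perplexity_range(150)    # 150 observations: c(1, 49)
default_perplexity_range(10000)  # large data: the maximum is capped, c(1, 1000)
```

Rtsne rejects a perplexity above (nrow(data) - 1) / 3, so the floor term is also the largest value that can be searched for a given data set.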

Arguments

data
The data.frame input into t-SNE
output_dims
How many dimensions to output? (computation time increases exponentially with this value)
input_dims
How many input dimensions to use? (defaults to ncol(data)) - when using pca, set this to a value below the default
perplexity_range
What perplexity interval to search? (should be formatted as (min, max)) - defaults to c(1, min(floor((nrow(data)-1)/3)), 1000), with the maximum capped at 1000. To grid search a seed for a fixed perplexity value, use min = max as inputs. The best pragmatic perplexity for the lowest loss is typically floor((nrow(data)-1)/3). Avoid very high perplexity (like 1000) on large data (like 10000 observations): tree creation scales quadratically (or even worse) and may effectively never finish.
tries
How many seeds to test per perplexity value? (computation time increases linearly with this value)
iterations
How many iterations to perform per t-SNE run? (computation time increases approximately linearly with this value)
theta
Use exact t-SNE (theta = 0) or Barnes-Hut t-SNE (theta in the ]0, 1] interval)?
check_duplicates
Should t-SNE check for duplicates? (contrary to common belief, t-SNE works perfectly well when identical observations exist)
pca
Should a PCA (Principal Component Analysis) be performed? (note: it is performed every iteration, so it is computationally intensive and should be avoided - if you need PCA, compute it beforehand and pass the PCA output as data)
is_distance
Is the input a distance matrix? (assumes a square data.frame that is symmetric about its diagonal)
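
The pca argument recommends supplying precomputed PCA output rather than enabling PCA inside the search. A minimal sketch of that preparation with base R's prcomp is shown here. The iris data is only a stand-in, and keeping 3 components is an arbitrary choice made for illustration.

```r
# Compute the PCA once, outside the grid search, and keep the scores.
x <- as.matrix(iris[, 1:4])  # stand-in numeric data (assumption)
pca_fit <- prcomp(x, center = TRUE, scale. = FALSE)

# Keep the first 3 principal components as the t-SNE input (arbitrary choice).
pca_scores <- as.data.frame(pca_fit$x[, 1:3])

dim(pca_scores)  # 150 rows, 3 columns
```

`pca_scores` could then be passed as data with pca = FALSE, so the decomposition is not repeated.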

Value

A list with the elements returned by Rtsne for the best t-SNE run (the one with the lowest loss at a specific iteration)
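
To make the selection rule concrete, here is a rough sketch of a seed and perplexity grid search that keeps the run with the lowest loss. It is not the Laurae implementation. `grid_search_tsne` and the added fields `grid_perplexity`, `seed` and `loss` are hypothetical names, and taking the last entry of `itercosts` as the loss is an assumption about which iteration is compared.

```r
# Sketch of a seed x perplexity grid search; NOT the Laurae implementation.
# `fit` is any function(perplexity) returning a list with an `itercosts`
# vector, as Rtsne::Rtsne does.
grid_search_tsne <- function(fit, perplexities, seeds) {
  best <- NULL
  best_loss <- Inf
  for (p in perplexities) {
    for (s in seeds) {
      set.seed(s)                       # one seed per try
      model <- fit(p)
      loss <- tail(model$itercosts, 1)  # assumption: last recorded cost
      if (loss < best_loss) {
        best_loss <- loss
        best <- c(model, list(grid_perplexity = p, seed = s, loss = loss))
      }
    }
  }
  best
}

# Usage with Rtsne, run only if the package is installed.
if (requireNamespace("Rtsne", quietly = TRUE)) {
  x <- as.matrix(iris[, 1:4])  # stand-in data (assumption)
  best <- grid_search_tsne(
    fit = function(p) Rtsne::Rtsne(x, dims = 2, perplexity = p, theta = 0,
                                   pca = FALSE, check_duplicates = FALSE,
                                   max_iter = 250),
    perplexities = c(10, 30),
    seeds = 1:2
  )
  best$grid_perplexity  # perplexity of the lowest-loss run
}
```

Passing the fitting step as a function keeps the search loop independent of Rtsne, which makes it easy to try on a cheap stand-in before committing to long runs.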

Examples

#tsne_model <- tsne_grid(data = initial_diag, output_dims = 3,
#perplexity_range = c(floor((ncol(initial_diag)-1)/3), floor((ncol(initial_diag)-1)/3)),
#tries = 100, iterations = 10000, theta = 0.0, check_duplicates = FALSE,
#pca = FALSE, is_distance = TRUE)
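
The example refers to an object named initial_diag without showing how it is built. One possible way to prepare such an input is sketched below, assuming it is a square distance matrix stored as a data.frame. The iris data and the Euclidean distance are stand-ins chosen for illustration.

```r
# Build a distance-matrix input comparable to `initial_diag` in the example
# (assumption: the source does not show how `initial_diag` is created).
x <- as.matrix(iris[, 1:4])
initial_diag <- as.data.frame(as.matrix(dist(x)))

# Fixed perplexity: min = max, so only the seed is searched.
p <- floor((ncol(initial_diag) - 1) / 3)
perplexity_range <- c(p, p)

# With Laurae installed, the call would then look like this (kept commented
# out, as in the example, because it can run for a long time):
# tsne_model <- Laurae::tsne_grid(data = initial_diag, output_dims = 3,
#   perplexity_range = perplexity_range, tries = 100, iterations = 10000,
#   theta = 0, check_duplicates = FALSE, pca = FALSE, is_distance = TRUE)
```

For a square distance matrix, ncol and nrow are equal, so either can be used in the perplexity formula.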