Usage
tsne_grid(data, output_dims, input_dims = ncol(data),
perplexity_range = c(1, min(floor((nrow(data) - 1)/3), 1000)), tries = 10,
iterations = 10000, theta = 0, check_duplicates = FALSE, pca = FALSE,
is_distance = FALSE)
Arguments
data
The data.frame input into t-SNE
output_dims
How many dimensions to output? (computation time increases exponentially with this value)
input_dims
How many input dimensions to use? Defaults to ncol(data). When using pca, this should be set to a value below the default.
perplexity_range
What hyperparameter interval to search? Should be formatted as (min, max); defaults to c(1, min(floor((nrow(data) - 1)/3), 1000)). To grid search over seeds for a fixed perplexity value, use min = max. The best pragmatic perplexity for the lowest loss is typically floor((nrow(data) - 1)/3). Avoid very high perplexity (like 1000) on large data (like 10000 observations): you might end up with a never-ending tree creation, scaling quadratically (or even worse). By default, the maximum is capped at 1000.
tries
How many seeds to test per perplexity value? (computation time increases linearly with this value)
iterations
How many iterations are performed per t-SNE run? (computation time increases approximately linearly with this value)
theta
Use exact t-SNE (0) or Barnes-Hut t-SNE (a value in the interval (0, 1])?
check_duplicates
Should t-SNE check for duplicates? (contrary to common belief, t-SNE works perfectly well in the presence of identical observations)
pca
Should a PCA (Principal Component Analysis) be performed? (note: it is performed every iteration, so it is computationally intensive and should be avoided; if you need PCA, compute it once beforehand and pass the PCA output as data instead of the raw data)
is_distance
Is the input a distance matrix? (assumes the input data.frame is symmetric about its diagonal)
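Examples
A minimal sketch of two points from the argument descriptions: how the default perplexity interval is derived from the number of rows, and how a PCA can be computed once up front rather than with pca = TRUE. It uses the built-in iris data and stats::prcomp. The choice of three components is an arbitrary assumption, and the tsne_grid call is left commented out because it is illustrative only: it assumes the package providing tsne_grid is attached, and the return value is not described in this section.

```r
# Illustrative data: 150 observations, 4 numeric columns (built-in iris data).
data <- iris[, 1:4]

# Default perplexity interval, formatted as (min, max), with the max capped at 1000.
perplexity_range <- c(1, min(floor((nrow(data) - 1) / 3), 1000))
# For 150 rows: floor(149 / 3) = 49, so the interval is c(1, 49).

# Compute the PCA once with stats::prcomp instead of setting pca = TRUE.
pca_fit <- prcomp(data, center = TRUE, scale. = FALSE)

# Keep the first three principal component scores (arbitrary choice for illustration).
pca_scores <- as.data.frame(pca_fit$x[, 1:3])

# Hypothetical call, assuming tsne_grid is available in the session.
# Argument names are taken from the Usage section above.
# result <- tsne_grid(pca_scores, output_dims = 2,
#                     perplexity_range = perplexity_range,
#                     tries = 10, iterations = 1000,
#                     pca = FALSE)
```

To search over seeds at a single perplexity value, the same interval can be collapsed, for example perplexity_range = c(49, 49).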