Below, implementations of t-SNE in various languages are available for download. Some of these implementations were developed by me, and some by other contributors. For the standard t-SNE method, implementations in Matlab, C++, CUDA, Python, Torch, R, Julia, and JavaScript are available. In addition, we provide a Matlab implementation of parametric t-SNE (described here). Finally, we provide a Barnes-Hut implementation of t-SNE (described here), which is the fastest t-SNE implementation to date, and which scales much better to big data sets.

You are free to use, modify, or redistribute this software in any way you want, but only for non-commercial purposes. The use of the software is at your own risk; the authors are not responsible for any damage resulting from errors in the software.

NOTE: t-SNE is now built-in functionality in Matlab and in SPSS!

Matlab implementation (user guide) All platforms

CUDA implementation (by David, Roshan, and Forrest; see paper) All platforms

Python implementation All platforms

Go implementation (by Daniel Salvadori) All platforms

Torch implementation All platforms

Julia implementation (by Leif Jonsson) All platforms

Java implementation (by Leif Jonsson) All platforms

R implementation (by Justin) All platforms

JavaScript implementation (by Andrej; online demonstration) All platforms

Parametric t-SNE (outdated; see here) All platforms

Barnes-Hut t-SNE (C++, Matlab, Python, Torch, and R wrappers; see here) All platforms / Github

MNIST Dataset Matlab file


Some results of our experiments with t-SNE are available for download below. In the plots of the Netflix dataset and the words dataset, the third dimension is encoded by means of a color encoding (similar words/movies are close together and have the same color). Most of the ‘errors’ in the embeddings (such as in the 20 newsgroups) are actually due to ‘errors’ in the features t-SNE was applied on. In many of these examples, the embeddings have a 1-NN error that is comparable to that of the original high-dimensional features.

MNIST dataset (in 2D) JPG

MNIST dataset (in 3D) MOV

Olivetti faces dataset (in 2D) JPG

COIL-20 dataset (in 2D) JPG

Netflix dataset (in 3D) on Russ’s RBM features JPG

Words dataset (in 3D) on Andriy’s semantic features JPG

20 Newsgroups dataset (in 2D) on Simon’s discLDA features JPG

Reuters dataset (in 2D) landmark t-SNE using semantic hashing JPG

NIPS dataset (in 2D) on co-authorship data (1988-2003) JPG

NORB dataset (in 2D) by Vinod JPG

Words (in 2D) by Joseph on features learned by Ronan and Jason PNG

CalTech-101 on SIFT bag-of-words features JPG

S&P 500 by Steve on information about daily returns on company stock PNG

Interactive map of scientific journals on data by Nees-Jan and Ludo, using VOSviewer Java 1.6

Relation between World Economic Forum councils Link

ImageNet by Andrej on Caffe convolutional net features Link

Multiple maps visualizations Link

Allen Brain data Link

You may right-click on the images and select “Show image in new tab” to see a larger version of each image.

You may also be interested in these blog posts describing applications of t-SNE by Andrej Karpathy, Paul Mineiro, Alexander Fabisch, Justin Donaldson, Henry Tan, and Cyrille Rossant.


I can’t figure out the file format for the binary implementations of t-SNE?

The format is described in the User’s guide. You might also want to have a look at the Matlab or Python wrapper code: it contains code that writes the data file and reads the results file, and it can be ported fairly easily to other languages. Please note that the file format is binary (so don’t try to write or read text!), and that it does not contain any spaces, separators, newlines, or the like.

How can I assess the quality of the visualizations that t-SNE constructed?

Preferably, just look at them! Notice that t-SNE does not retain distances but probabilities, so measuring some error between the Euclidean distances in high-D and low-D is useless. However, if you use the same data and perplexity, you can compare the Kullback-Leibler divergences that t-SNE reports. It is perfectly fine to run t-SNE ten times, and select the solution with the lowest KL divergence.
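As a rough illustration of comparing runs, the sketch below computes the Kullback-Leibler divergence between one fixed input distribution P and the map distributions Q of several runs, then keeps the run with the lowest value. The function names and the toy numbers are assumptions made for illustration only; in practice you would simply read off the KL divergence that your t-SNE implementation reports.

```python
import math

def kl_divergence(p, q):
    """KL(P || Q) for two flattened probability distributions of equal length."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def best_run(p, runs):
    """Return (index, kl) of the run whose Q has the lowest KL divergence to P."""
    scores = [kl_divergence(p, q) for q in runs]
    best = min(range(len(runs)), key=scores.__getitem__)
    return best, scores[best]

# Toy numbers only: one fixed P (same data, same perplexity) and the Q
# distributions of three hypothetical runs with different random starts.
p = [0.5, 0.3, 0.2]
runs = [
    [0.2, 0.3, 0.5],
    [0.45, 0.35, 0.2],
    [0.3, 0.3, 0.4],
]
index, kl = best_run(p, runs)
print("best run:", index, "KL:", kl)
```

The comparison is only meaningful when P is identical across runs, which is why the data and the perplexity have to stay the same.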

How should I set the perplexity in t-SNE?

The performance of t-SNE is fairly robust under different settings of the perplexity. The most appropriate value depends on the density of your data. Loosely speaking, one could say that a larger / denser dataset requires a larger perplexity. Typical values for the perplexity range between 5 and 50.

What is perplexity anyway?

Perplexity is a measure for information that is defined as 2 to the power of the Shannon entropy. The perplexity of a fair die with k sides is equal to k. In t-SNE, the perplexity may be viewed as a knob that sets the number of effective nearest neighbors. It is comparable with the number of nearest neighbors k that is employed in many manifold learners.
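The definition can be checked numerically. The following is a minimal sketch (the function name and the loaded-die probabilities are made up for illustration) that computes 2 to the power of the Shannon entropy in bits:

```python
import math

def perplexity(probs):
    """Return 2 ** H(P), where H is the Shannon entropy measured in bits."""
    entropy = -sum(p * math.log2(p) for p in probs if p > 0)
    return 2.0 ** entropy

fair_die = [1 / 6] * 6
# Hypothetical loaded die: one face dominates, so fewer outcomes are "effective".
loaded_die = [0.9, 0.02, 0.02, 0.02, 0.02, 0.02]

print(perplexity(fair_die))    # approximately 6
print(perplexity(loaded_die))  # well below 6
```

The loaded die shows the "effective number of outcomes" reading: the more the probability mass concentrates on a few outcomes, the lower the perplexity.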

Every time I run t-SNE, I get a (slightly) different result?

In contrast to, e.g., PCA, t-SNE has a non-convex objective function. The objective function is minimized using a gradient descent optimization that is initiated randomly. As a result, it is possible that different runs give you different solutions. Notice that it is perfectly fine to run t-SNE a number of times (with the same data and parameters), and to select the visualization with the lowest value of the objective function as your final visualization.

When I run t-SNE, I get a strange ‘ball’ with uniformly distributed points?

This usually indicates you set your perplexity way too high. All points now want to be equidistant. The result you got is the closest you can get to equidistant points in two dimensions. If lowering the perplexity doesn’t help, you might have run into the problem described in the next question. Similar effects may also occur when you use highly non-metric similarities as input.

When I run t-SNE, it reports a very low error but the results look crappy?

Presumably, your data contains some very large numbers, causing the binary search for the correct perplexity to fail. At the beginning of the optimization, t-SNE then reports a minimum, mean, and maximum value for sigma of 1. This is a sign that something went wrong! Just divide your data or distances by a big number, and try again.
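To see why large numbers break the search, here is a simplified sketch of a binary search over the Gaussian precision beta = 1 / (2 * sigma^2) for a single point. It is an illustration written for this page, not the code of any of the implementations above; the function names, the tolerance, and the toy distances are assumptions.

```python
import math

def conditional_probs(sq_dists, beta):
    """Gaussian neighbor probabilities for one point from squared distances."""
    weights = [math.exp(-d * beta) for d in sq_dists]
    total = sum(weights)  # becomes 0.0 when every exp() underflows
    return [w / total for w in weights]

def perplexity(probs):
    entropy = -sum(p * math.log2(p) for p in probs if p > 0)
    return 2.0 ** entropy

def find_beta(sq_dists, target, tol=1e-5, max_steps=200):
    """Bisect on beta until the perplexity matches the target."""
    lo, hi, beta = 0.0, math.inf, 1.0
    for _ in range(max_steps):
        perp = perplexity(conditional_probs(sq_dists, beta))
        if abs(perp - target) < tol:
            break
        if perp > target:  # too flat: narrow the Gaussian
            lo = beta
            beta = beta * 2.0 if hi == math.inf else (beta + hi) / 2.0
        else:              # too peaked: widen the Gaussian
            hi = beta
            beta = (beta + lo) / 2.0
    return beta

def rescale(values):
    """Divide by the largest magnitude so that all values lie in [-1, 1]."""
    biggest = max(abs(v) for v in values)
    return [v / biggest for v in values]

# Hypothetical squared distances with very large magnitudes. Calling
# conditional_probs(raw, 1.0) divides by zero in this sketch, because every
# exp(-d) underflows to 0.0.
raw = [1e6 * k for k in range(1, 11)]
scaled = rescale(raw)
beta = find_beta(scaled, target=3.0)
sigma = math.sqrt(1.0 / (2.0 * beta))
print(beta, sigma, perplexity(conditional_probs(scaled, beta)))
```

After rescaling, the search converges to a sensible beta; on the raw values the very first evaluation already fails, which mirrors the symptom described above.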

I tried everything you said, but t-SNE still doesn’t seem to work very well?

Maybe there is something weird in your data. As a sanity check, try running PCA on your data to reduce it to two dimensions. If this also gives bad results, then maybe there is not very much nice structure in your data in the first place. If PCA works well but t-SNE doesn’t, I am fairly sure you did something wrong. Just check your code again until you find the bug! If nothing works, feel free to drop me a line.

Can I use a pairwise Euclidean distance matrix as input into t-SNE?

Yes you can! Download the Matlab implementation, and use your pairwise Euclidean distance matrix as input into the tsne_d.m function.
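If you only have raw points and need the matrix itself, a minimal sketch of building a symmetric pairwise Euclidean distance matrix looks like this. The function name and the sample points are illustrative; whether tsne_d.m expects plain or squared distances is not stated here, so check the user guide before exporting the matrix to Matlab.

```python
import math

def pairwise_distances(points):
    """Return the symmetric matrix of Euclidean distances between all points."""
    n = len(points)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(points[i], points[j])
            dist[i][j] = dist[j][i] = d  # fill both triangles at once
    return dist

# Three made-up 2D points lying on a line.
points = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
D = pairwise_distances(points)
for row in D:
    print(row)
```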

Can I use a pairwise similarity matrix as input into t-SNE?

Yes you can! For instance, we successfully applied t-SNE on a dataset of word association data. Download the Matlab implementation, make sure the diagonal of the pairwise similarity matrix contains only zeros, symmetrize the pairwise similarity matrix, and normalize it to sum up to one. You can now use the result as input into the tsne_p.m function.
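The three preparation steps can be sketched as follows. This is an illustration in Python rather than Matlab; the function name is hypothetical, and symmetrizing by averaging the matrix with its transpose is one common choice, assumed here because the text does not prescribe a particular method.

```python
def prepare_similarities(sim):
    """Zero the diagonal, symmetrize, and normalize a square similarity matrix."""
    n = len(sim)
    # Step 1: self-similarities are not used, so set the diagonal to zero.
    p = [[0.0 if i == j else float(sim[i][j]) for j in range(n)] for i in range(n)]
    # Step 2: symmetrize by averaging with the transpose (an assumed choice).
    p = [[(p[i][j] + p[j][i]) / 2.0 for j in range(n)] for i in range(n)]
    # Step 3: normalize so that all entries together sum to one.
    total = sum(sum(row) for row in p)
    if total <= 0:
        raise ValueError("similarity matrix has no positive off-diagonal mass")
    return [[v / total for v in row] for row in p]

# Made-up, asymmetric similarities with a non-zero diagonal.
sim = [
    [1.0, 2.0, 0.0],
    [4.0, 1.0, 1.0],
    [2.0, 3.0, 1.0],
]
P = prepare_similarities(sim)
for row in P:
    print(row)
```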

Can I use t-SNE to embed data in more than two dimensions?

Well, yes you can, but there is a catch. The key characteristic of t-SNE is that it solves a problem known as the crowding problem. The extent to which this problem occurs depends on the ratio between the intrinsic data dimensionality and the embedding dimensionality. So, if you embed in, say, thirty dimensions, the crowding problem is less severe than when you embed in two dimensions. As a result, it often works better if you increase the degrees of freedom of the t-distribution when embedding into thirty dimensions (or if you try to embed intrinsically very low-dimensional data such as the Swiss roll). More details about this are described in the AI-STATS paper.
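To make "degrees of freedom" concrete, the sketch below evaluates an unnormalized Student-t similarity kernel for different values of alpha. It assumes the usual Student-t form (1 + d^2 / alpha) ^ (-(alpha + 1) / 2), where alpha = 1 gives the 1 / (1 + d^2) kernel of standard t-SNE; the function name and the sample distances are illustrative, and the exact parameterization in a given implementation may differ.

```python
def student_t_kernel(sq_dist, alpha=1.0):
    """Unnormalized Student-t similarity with alpha degrees of freedom (assumed form)."""
    return (1.0 + sq_dist / alpha) ** (-(alpha + 1.0) / 2.0)

# Compare how fast the similarity decays with squared map distance.
for sq_dist in (1.0, 9.0, 100.0):
    heavy = student_t_kernel(sq_dist, alpha=1.0)   # standard t-SNE tail
    light = student_t_kernel(sq_dist, alpha=10.0)  # more degrees of freedom
    print(sq_dist, heavy, light)
```

With more degrees of freedom the tail becomes lighter, so distant map points contribute less similarity, which is the adjustment suggested when crowding is less of a concern.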

Why doesn’t t-SNE work as well as LLE or Isomap on the Swiss roll data?

When embedding the Swiss roll data, the crowding problem does not apply. So you may have to use a lighter-tailed t-distribution to embed the Swiss roll successfully (see above). But frankly… who cares about Swiss rolls when you can embed complex real-world data nicely?

Once I have a t-SNE map, how can I embed incoming test points in that map?

t-SNE learns a non-parametric mapping, which means that it does not learn an explicit function that maps data from the input space to the map. Therefore, it is not possible to embed test points in an existing map (although you could re-run t-SNE on the full dataset). A potential approach to deal with this would be to train a multivariate regressor to predict the map location from the input data. Alternatively, you could also make such a regressor minimize the t-SNE loss directly, which is what I did in this paper.
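As one very simple stand-in for such a regressor, the sketch below places a new point at the average map location of its k nearest training points. This is a k-nearest-neighbor regressor chosen for brevity, not the parametric approach from the paper; the function name, the training data, and the map coordinates are all made up for illustration.

```python
import math

def knn_embed(train_x, train_y, query, k=3):
    """Average the map locations of the k training points nearest to the query."""
    order = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))
    nearest = order[:k]
    dims = len(train_y[0])
    return [sum(train_y[i][d] for i in nearest) / len(nearest) for d in range(dims)]

# Hypothetical training inputs (3D) and their 2D locations in an existing map.
train_x = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (5.0, 5.0, 5.0)]
train_y = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (4.0, 4.0)]

print(knn_embed(train_x, train_y, (0.1, 0.1, 0.0), k=3))
print(knn_embed(train_x, train_y, (5.0, 5.0, 4.9), k=1))
```

Any multivariate regressor could take the place of the neighbor average; the point is only that the mapping is learned after the fact from pairs of input data and map locations.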

