# t-distributed Stochastic Neighbor Embedding says "what"


## What is t-SNE

t-SNE, or t-distributed Stochastic Neighbor Embedding, is a dimensionality reduction technique that projects data onto a lower-dimensional space (typically 2 or 3 dimensions). It uses a normal distribution to derive similarities in the original space and a t-distribution in the projected space. The perplexity parameter controls the scale of the normal distribution used when comparing similarities.

The algorithm preserves clustering but distorts the original distances (as any dimensionality reduction technique would).

### Perplexity

Perplexity can be intuitively thought of as:

- the expected density around each point;
- (loosely) how to balance attention between local and global aspects of your data;
- a guess at the number of close neighbors each point has.
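To make the effect tangible, here is a small sketch (not from the original text) that runs scikit-learn's `TSNE` on the same synthetic data with two different perplexity values; the data and parameter values are illustrative assumptions.

```python
# Illustrative only: compare t-SNE embeddings at two perplexity values.
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
# Two well-separated Gaussian blobs in 10 dimensions
X = np.vstack([rng.normal(0, 1, (50, 10)), rng.normal(8, 1, (50, 10))])

for perplexity in (5, 30):
    emb = TSNE(n_components=2, perplexity=perplexity,
               random_state=0).fit_transform(X)
    # Low perplexity emphasizes local structure; higher perplexity
    # balances in more of the global layout.
    print(perplexity, emb.shape)
```

Plotting the two embeddings side by side is the quickest way to build intuition for this parameter.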

### Normal to t

The reason for first using a normal distribution and then a t-distribution is to avoid clumping the points in the projected space: the t-distribution has heavier tails, which gives the lower-similarity points a bit more slack.
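The heavier tails are easy to verify numerically. The snippet below (my own illustration, using SciPy) compares the density of a standard normal with a Student t-distribution with one degree of freedom, the kernel t-SNE uses in the projected space:

```python
# Compare tail densities: Student t (1 degree of freedom) vs. standard normal.
from scipy.stats import norm, t

for x in (2.0, 4.0, 6.0):
    print(x, norm.pdf(x), t.pdf(x, df=1))
# At every x the t density exceeds the normal density, and the gap
# widens further out in the tail.
```

This extra tail mass means moderately dissimilar points can sit further apart in the projection without paying a large cost.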

### The algorithm

It proceeds as follows:

1. Pick a point of interest and compute an unscaled similarity between it and every other point, using a normal distribution whose scale is set by the perplexity.
2. Do the same from every other point back to the selected point.
3. Iterate over all points.
4. For each point, scale its similarities so they add up to one (divide by the sum of the unscaled similarity scores).
5. Average the similarity scores for each pair of points (in and out directions).
6. Collect the results into a matrix of similarity scores.
7. Project the points randomly onto the desired number of latent dimensions.
8. Repeat steps 1-6 on the projection from step 7, but this time with a t-distribution, to derive a second similarity matrix.
9. Make the matrix from step 8 converge to the matrix from step 6 in tiny steps with a small learning rate, moving higher-similarity points closer together and lower-similarity points further apart.
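The two similarity matrices (steps 1-6 and step 8) can be sketched in a few lines of NumPy. This is a simplification I wrote for illustration: real t-SNE finds a per-point Gaussian scale from the perplexity via binary search, while here a single fixed `sigma` stands in for it.

```python
# Sketch of the two affinity matrices described above. Names (sigma, P, Q)
# are mine, and the fixed sigma replaces the perplexity-driven search.
import numpy as np

def gaussian_affinities(X, sigma=1.0):
    """High-dimensional similarities: Gaussian kernel, rows scaled to
    sum to one, then symmetrized by averaging the in/out scores."""
    d2 = np.square(X[:, None, :] - X[None, :, :]).sum(-1)
    P = np.exp(-d2 / (2 * sigma**2))
    np.fill_diagonal(P, 0.0)            # a point is not its own neighbor
    P /= P.sum(axis=1, keepdims=True)   # step 4: rows sum to one
    return (P + P.T) / (2 * len(X))     # step 5: average both directions

def t_affinities(Y):
    """Low-dimensional similarities: Student t kernel (1 degree of
    freedom), normalized over all pairs."""
    d2 = np.square(Y[:, None, :] - Y[None, :, :]).sum(-1)
    Q = 1.0 / (1.0 + d2)
    np.fill_diagonal(Q, 0.0)
    return Q / Q.sum()

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))   # original data
Y = rng.normal(size=(20, 2))   # step 7: random initial projection
P, Q = gaussian_affinities(X), t_affinities(Y)
```

Step 9 then amounts to gradient descent that nudges the points `Y` so that `Q` approaches `P` (in the real algorithm, by minimizing the KL divergence between them).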

## Example

A simple t-SNE example (supply a `df` with a `text` column) with TF-IDF sentence vectors. It can be repeated with fastText embeddings, or in Orange Data Mining for quick PoCs. The example below creates a nice plotly chart with hover text wrapped at 20 characters.