Distance Visualizations

In data science, one often encounters pairwise similarity or distance data (such as the distances among leaves in a phylogeny). Ideally, one could lay out the objects as points on a 2D page so that any hidden patterns become visible.

D3 Force-Layout

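D3 Force-Layout is a physics simulation that positions the nodes of a graph through repulsion and attraction. Nodes repel one another, while links act like springs that pull linked nodes toward a target distance, which can be set from the similarity (distance) matrix. The simulation iterates until the node positions settle.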
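To make the repulsion-attraction idea concrete, here is a minimal Python sketch of a spring layout driven by a distance matrix. It is not D3's implementation or API. D3 combines separate many-body and link forces with velocity decay, whereas this sketch assumes that every pair of nodes is joined by a spring whose rest length is the target distance. A spring that is too short pushes its nodes apart, and one that is too long pulls them together. The function name `force_layout` and its `iterations`, `step`, and `seed` parameters are hypothetical choices made for illustration.

```python
import math
import random


def force_layout(dist, iterations=1000, step=0.05, seed=0):
    """Place len(dist) nodes in 2D so their distances approach dist[i][j].

    Hypothetical helper, not D3's API: every pair of nodes is joined by a
    spring whose rest length is the target distance dist[i][j].
    """
    rng = random.Random(seed)
    n = len(dist)
    # Random starting positions in the square [-1, 1] x [-1, 1].
    pos = [[rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)] for _ in range(n)]
    for _ in range(iterations):
        forces = [[0.0, 0.0] for _ in range(n)]
        for i in range(n):
            for j in range(i + 1, n):
                dx = pos[j][0] - pos[i][0]
                dy = pos[j][1] - pos[i][1]
                d = math.hypot(dx, dy) or 1e-9  # avoid dividing by zero
                # f > 0: spring too long, pull i and j together (attraction).
                # f < 0: spring too short, push them apart (repulsion).
                f = (d - dist[i][j]) / d
                forces[i][0] += f * dx
                forces[i][1] += f * dy
                forces[j][0] -= f * dx
                forces[j][1] -= f * dy
        # Move every node a small step along its net force.
        for i in range(n):
            pos[i][0] += step * forces[i][0]
            pos[i][1] += step * forces[i][1]
    return pos
```

For example, `force_layout([[0, 1, 1], [1, 0, 1], [1, 1, 0]])` should settle into an approximately equilateral triangle with unit sides. Each step moves along the negative gradient of the squared error between layout distances and targets, so this is essentially stress minimization. A distance matrix that cannot be embedded exactly in 2D is only approximated.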

t-SNE

t-SNE (t-distributed stochastic neighbor embedding) casts the layout as a probabilistic optimization that is solved by gradient descent. A pairwise probability matrix $P$ is computed from the high-dimensional data, where each entry $P_{j|i}$ reflects the similarity of objects $x_i$ and $x_j$. A second, analogous pairwise probability matrix $Q$ is computed from their low-dimensional counterparts $y_i$ and $y_j$. The optimization moves the points $y_i$ to minimize the Kullback-Leibler divergence between the high-dimensional and low-dimensional distributions, $KL(P \parallel Q)$.
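The procedure above can be sketched in plain Python. This is a simplified illustration, not a reference implementation. It assumes a single fixed Gaussian width `sigma` for every point, whereas real t-SNE tunes a per-point $\sigma_i$ to match a user-chosen perplexity. It also uses plain gradient descent without the momentum and early exaggeration that practical implementations add. Otherwise it follows the standard formulation: $P_{j|i}$ comes from a Gaussian kernel and is symmetrized to $p_{ij} = (P_{j|i} + P_{i|j})/2n$; $q_{ij} \propto (1 + \lVert y_i - y_j \rVert^2)^{-1}$ is a Student-t kernel with one degree of freedom; and the gradient is $\partial KL / \partial y_i = 4 \sum_j (p_{ij} - q_{ij})(y_i - y_j)(1 + \lVert y_i - y_j \rVert^2)^{-1}$. The function names and default values are hypothetical.

```python
import math
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((u - v) ** 2 for u, v in zip(a, b))


def joint_p(X, sigma=1.0):
    """Symmetrized high-dimensional affinities p_ij.

    Assumption: one fixed Gaussian width `sigma` for all points, instead of
    the per-point widths that real t-SNE finds via a perplexity search.
    """
    n = len(X)
    cond = []
    for i in range(n):
        # Unnormalized Gaussian similarities to every other point (P_{j|i} numerators).
        w = [0.0 if j == i else math.exp(-sq_dist(X[i], X[j]) / (2 * sigma ** 2))
             for j in range(n)]
        total = sum(w)
        cond.append([wj / total for wj in w])
    # p_ij = (P_{j|i} + P_{i|j}) / 2n, so the whole matrix sums to 1.
    return [[(cond[i][j] + cond[j][i]) / (2 * n) for j in range(n)] for i in range(n)]


def joint_q(Y):
    """Low-dimensional Student-t kernels and the normalized q_ij."""
    n = len(Y)
    num = [[0.0 if i == j else 1.0 / (1.0 + sq_dist(Y[i], Y[j])) for j in range(n)]
           for i in range(n)]
    z = sum(sum(row) for row in num)
    return num, [[v / z for v in row] for row in num]


def kl(P, Q):
    """KL(P || Q), skipping zero entries of P (including the diagonal)."""
    return sum(p * math.log(p / q)
               for prow, qrow in zip(P, Q) for p, q in zip(prow, qrow) if p > 0)


def tsne(X, dim=2, iterations=500, lr=2.0, sigma=1.0, seed=0):
    """Plain gradient descent on KL(P || Q); returns the embedding Y."""
    rng = random.Random(seed)
    n = len(X)
    P = joint_p(X, sigma)
    # Small random initialization of the low-dimensional points.
    Y = [[rng.gauss(0.0, 1e-2) for _ in range(dim)] for _ in range(n)]
    for _ in range(iterations):
        num, Q = joint_q(Y)
        grads = []
        for i in range(n):
            g = [0.0] * dim
            for j in range(n):
                # dC/dy_i = 4 * sum_j (p_ij - q_ij) (y_i - y_j) / (1 + |y_i - y_j|^2)
                c = 4.0 * (P[i][j] - Q[i][j]) * num[i][j]
                for d in range(dim):
                    g[d] += c * (Y[i][d] - Y[j][d])
            grads.append(g)
        for i in range(n):
            for d in range(dim):
                Y[i][d] -= lr * grads[i][d]
    return Y
```

With two tight, well-separated clusters of 3D points as input, the returned 2D layout should keep each cluster compact and far from the other, and `kl(joint_p(X), joint_q(Y)[1])` reports the final cost. The pure-Python loops are $O(n^2)$ per iteration and only suit toy inputs.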