Sparsity as Robustness

word2vec was an important breakthrough in NLP. For the first time, dense, distributed representations could be trained in an unsupervised way, leveraging enormous amounts of text that had never been utilized before. It was described as sriracha sauce: improve the state of the art on almost any NLP task by sprinkling in a few word vectors. Finally, the curse of dimensionality becomes a blessing of dimensionality: an exponential number of concepts can be expressed in a fixed-size vector. In this embedding space, even analogies like $\text{Paris} - \text{France} + \text{Italy} = \text{Rome}$ emerge from vectors with as few as 300 dimensions.
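As a minimal sketch of that analogy arithmetic (assuming the gensim library and its downloadable 300-dimensional Google News vectors; any pretrained word2vec model would do):

```python
import gensim.downloader as api

# Assumes gensim and its pretrained "word2vec-google-news-300" model
# (a large one-time download); this is illustrative, not the original
# word2vec training pipeline.
model = api.load("word2vec-google-news-300")

# Vector arithmetic: Paris - France + Italy should land near Rome.
result = model.most_similar(positive=["Paris", "Italy"], negative=["France"], topn=1)
print(result)  # expect "Rome" to rank at or near the top
```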

Jeff Hawkins, on the Bio Eats World podcast, argues that distributed representations, while powerful, introduce fragility: concepts can't be modified independently. Neural nets are typically frozen before being deployed to production; the brain, by contrast, continues to learn while it processes information. Sparse representations enable that kind of robustness where dense ones do not: new information can be added without changing existing information. And sparse representations remain expressive: $\binom{200{,}000}{200}$ still yields an astronomical number of possible representations.
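A toy sketch of both claims (an illustrative model assumed here, not Hawkins' actual HTM code): the capacity count is exact combinatorics, and storing patterns as sets of active indices shows that writing a new pattern sets new bits without ever clearing the bits of an existing one.

```python
import math
import random

# Capacity of a sparse binary code: choose 200 active bits out of 200,000.
n, k = 200_000, 200
capacity = math.comb(n, k)
print(f"possible representations: ~10^{len(str(capacity)) - 1}")

# Robustness: store two sparse patterns as sets of active indices.
# Adding pattern B to memory leaves every bit of pattern A untouched.
pattern_a = set(random.sample(range(n), k))
pattern_b = set(random.sample(range(n), k))
memory = pattern_a | pattern_b           # union storage: no bits of A are cleared

assert pattern_a <= memory               # A is fully preserved after adding B
overlap_a = len(pattern_a & memory)      # perfect match: all k bits
random_probe = set(random.sample(range(n), k))
overlap_r = len(random_probe & memory)   # near zero by chance (~0.4 bits expected)
print(overlap_a, overlap_r)
```

The union property is what carries the robustness argument: because patterns are so sparse, they barely collide, so each stored pattern can still be recognized by its overlap while random probes match almost nothing.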

It may be worth revisiting sparse representations in “next gen” AI approaches.