Public Library of Science
pone.0231189.g008.tif (184.96 kB)

Word embeddings map words in a corpus of text to vector space.

posted on 2020-04-21, 20:26 authored by David Rozado

Linear combinations of dimensions in vector space correlate with the semantic and syntactic roles of the words in the corpus. For illustrative purposes, dimension d1 in the figure has a high positive correlation with living beings. A properly tuned word embedding model maps words with similar semantic or syntactic roles to adjacent regions in vector space. This property can be visualized through dimensionality reduction techniques such as t-SNE or PCA (see upper right quadrant of the figure). Cultural concepts are also apparent in vector space as consistent offsets between the vector representations of words sharing a particular relationship. For instance, in the bottom right of the figure, the dotted vector represents a gender regularity running from masculinity to femininity.
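The offset regularity described above can be sketched with toy vectors. The 4-dimensional "embeddings" below are hand-crafted assumptions for illustration (real embeddings are learned from a corpus and typically have hundreds of dimensions), but they show the mechanics: the gender offset is roughly constant across word pairs, so vector arithmetic can solve analogies.

```python
import numpy as np

# Hand-crafted toy embeddings (illustrative only, not learned from data).
# Dimensions loosely encode: [royalty, masculinity, femininity, living-being]
emb = {
    "king":  np.array([0.9, 0.8, 0.1, 0.7]),
    "queen": np.array([0.9, 0.1, 0.8, 0.7]),
    "man":   np.array([0.1, 0.8, 0.1, 0.7]),
    "woman": np.array([0.1, 0.1, 0.8, 0.7]),
}

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# The gender offset is (approximately) the same for both word pairs:
offset_royal = emb["queen"] - emb["king"]
offset_common = emb["woman"] - emb["man"]
print(np.allclose(offset_royal, offset_common))  # True

# Analogy via vector arithmetic: king - man + woman lands nearest to queen.
target = emb["king"] - emb["man"] + emb["woman"]
best = max(emb, key=lambda w: cosine(emb[w], target))
print(best)  # queen
```

In trained embeddings such as word2vec or GloVe the offsets are only approximately constant, so the nearest-neighbor lookup (excluding the query words) is what recovers the analogy.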