Latent Semantic Analysis and related methods for extracting distributed word meaning representations from corpora are appealing not only from an applied perspective: they also have properties that make them attractive to linguists and cognitive scientists as models of how humans acquire and use their semantic competence. They support a fuzzy notion of semantic similarity that is well suited to characterizing intuitions about word meaning. They are induced from naturally occurring data without explicit supervision. Finally, they learn multi-purpose word meaning representations that can be used to address a wide array of lexical semantic tasks, from solving analogy problems to measuring the plausibility of a verb argument.
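As a minimal sketch of the LSA-style pipeline mentioned above (the toy vocabulary, counts, and dimensionality below are purely illustrative, not from the talk): truncated SVD of a word-by-document count matrix yields dense word vectors whose cosine similarity reflects distributional similarity.

```python
import numpy as np

# Toy word-by-document co-occurrence counts (rows: words, cols: documents).
# All names and counts are illustrative.
words = ["cat", "dog", "car", "truck"]
X = np.array([
    [4, 3, 0, 0],   # cat
    [3, 4, 0, 0],   # dog
    [0, 0, 4, 2],   # car
    [0, 0, 2, 4],   # truck
], dtype=float)

# LSA: truncated SVD of the count matrix yields dense word vectors.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
k = 2
W = U[:, :k] * s[:k]          # k-dimensional word representations

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

print(cosine(W[0], W[1]))  # cat vs dog: high (shared contexts)
print(cosine(W[0], W[2]))  # cat vs car: near zero (disjoint contexts)
```

The fuzzy, graded similarity scores this produces are exactly the kind of quantity the abstract argues matches intuitions about word meaning.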
Still, the Holy Grail of computational semantics is to capture the meaning of full sentences, not just words. And, largely due to the recent renaissance of neural network methods, the last few years have indeed seen a flurry of proposals on how to derive distributed representations of sentence meaning.
In my talk, I will list some desiderata for such models that would make them more interesting from a linguistic/cognitive perspective and might enable novel, more ambitious applications. Specifically, I will focus on the following three points.
1) We should aim at learning general-purpose representations and composition operations that can be applied to different semantic tasks with just a modicum of task-specific supervision.
2) Everybody knows that, to account for the infinite number of possible sentences, we need to derive their representations compositionally from those of their parts. However, "compositional" means different things to different researchers. I will highlight some general properties of genuine linguistic compositionality that all models should be able to account for.
3) Finally, I will sketch ideas for sentence meaning evaluation benchmarks that should not be far from the grasp of current models, but are challenging enough that addressing them would bring about non-incremental advancements in the field.
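To make point 2 concrete with one property that distinguishes notions of "compositional" (a hedged illustration with random vectors standing in for real embeddings): the simplest composition operator, vector addition, is order-insensitive, so it assigns identical representations to sentences that differ only in word order.

```python
import numpy as np

rng = np.random.default_rng(0)
# Illustrative random word vectors; any pretrained embeddings would do.
vocab = {w: rng.standard_normal(50) for w in ["dog", "bites", "man"]}

def compose_additive(words):
    # Simplest composition operator: sum of the word vectors.
    return sum(vocab[w] for w in words)

s1 = compose_additive(["dog", "bites", "man"])
s2 = compose_additive(["man", "bites", "dog"])
# Additive composition is order-insensitive: both sentences get the
# same representation, although their meanings clearly differ.
print(np.allclose(s1, s2))  # True
```

Sensitivity to constituent order and structure is one of the general properties of genuine linguistic compositionality that a model should account for.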
Compositional structures abound in NLP, ranging from bilexical relations and entities in context to sequences and trees. The focus of this talk is on latent-variable compositional models for structured objects that: (1) induce a latent n-dimensional representation for each element of the structure; and (2) learn operators for composing such elements into structures.
I will present a framework based on formulating the learning problem as low-rank matrix learning. The main ingredient of the framework is what we call the Hankel matrix: this collects the necessary statistics of our model, and its factorization mimics the compositional nature of the model. We use Hankel matrices to reduce the problem of learning compositional models to a problem of estimating low-rank Hankel matrices. I will illustrate some convex formulations for different applications, from classification of entities in context, to learning sequence taggers, to unsupervised induction of context-free grammars.
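The low-rank property of the Hankel matrix can be sketched on a toy example (the automaton and all numbers below are illustrative assumptions, not the talk's formulation): for a function computed by a weighted automaton with n states, the Hankel matrix H[p, s] = f(ps) over prefixes p and suffixes s has rank at most n, which is what makes low-rank estimation a sensible learning objective.

```python
import numpy as np

# Toy 2-state weighted automaton over the alphabet {a, b}; its function f
# has a Hankel matrix of rank at most 2 (the number of states).
alpha = np.array([1.0, 0.0])
beta  = np.array([0.3, 0.7])
A = {"a": np.array([[0.5, 0.2], [0.0, 0.4]]),
     "b": np.array([[0.1, 0.3], [0.6, 0.2]])}

def f(s):
    # f(x1...xn) = alpha^T A_{x1} ... A_{xn} beta
    v = alpha
    for c in s:
        v = v @ A[c]
    return float(v @ beta)

# Hankel matrix H[p, s] = f(p + s) over a basis of short prefixes/suffixes.
basis = ["", "a", "b", "aa", "ab", "ba", "bb"]
H = np.array([[f(p + s) for s in basis] for p in basis])

# Its factorization mirrors the compositional structure: rank <= #states.
sv = np.linalg.svd(H, compute_uv=False)
print(int(np.sum(sv > 1e-8)))  # numerical rank: 2
```

In the learning direction, one only observes noisy entries of H and recovers the model by estimating a low-rank completion, which admits the convex formulations the talk describes.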
Distributed representations of human language content and structure had a brief boom in the 1980s, but the boom quickly faded, and the past 20 years have been dominated by continued use of categorical representations of language, albeit often with probabilities or weights over the elements of those categorical representations. However, the last five years have seen a resurgence, with highly successful use of distributed vector space representations, often in the context of "neural" or "deep learning" models. One great success has been distributed word representations, and I will look at some of our recent work and that of others on better understanding word representations and how they can be thought of as global matrix factorizations, which brings them much closer to the traditional literature. But we need more than just word representations: we need to understand the larger linguistic units that are made out of words, a problem that has received much less attention. I will discuss the use of distributed representations in tree-structured recursive neural network models, showing how they can provide sophisticated linguistic models of semantic similarity, sentiment, syntactic parse structure, and logical entailment.
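The core step of the tree-structured recursive neural network models mentioned above can be sketched as follows (the dimensionality, parse, and untrained random parameters are illustrative assumptions; in practice the composition matrix and word vectors are learned from task supervision): a single operator merges two child vectors into a parent vector of the same dimensionality, applied bottom-up along a parse tree.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 4
# Untrained, illustrative parameters; in real models W, b and the word
# vectors are learned end-to-end from task supervision.
W = rng.standard_normal((d, 2 * d)) * 0.1
b = np.zeros(d)
vocab = {w: rng.standard_normal(d) for w in ["the", "movie", "was", "great"]}

def compose(left, right):
    # One recursive-neural-network step: merge two child vectors into a
    # parent vector of the same dimensionality d.
    return np.tanh(W @ np.concatenate([left, right]) + b)

# Compose bottom-up along a toy parse: ((the movie) (was great))
np_phrase = compose(vocab["the"], vocab["movie"])
vp_phrase = compose(vocab["was"], vocab["great"])
sentence  = compose(np_phrase, vp_phrase)
print(sentence.shape)  # (4,)
```

Because every node, word or phrase, lives in the same vector space, the same downstream predictors (for sentiment, similarity, entailment) can be applied at any level of the tree.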