NLP seminar

We usually meet on Tuesdays at 11:00 in the ITL Meeting room on the top floor. A BibTeX file with the paper entries is available in case you need to cite any of the papers.

Summer 2015

Phrase meaning representation (Jun 23)

Machine translation (May 12)

Phrase similarity (May 19)

Topic models (Jun 2)

  • Rajarshi Das, Manzil Zaheer, Chris Dyer Gaussian LDA for Topic Models with Word Embeddings. 2015

Word meaning representation (Jun 23)

Spring 2015

Similarity, association and relatedness (Jan 21)

Conceptual spaces (Jan 28)

Meaning representation (Feb 3)

Meaning representation II (Feb 10)

Discourse compositionality (Feb 17)

Language modelling (Feb 24)

Meaning representation (Mar 3)

Machine learning (Mar 10)

Autumn 2014: Natural language modeling: from n-grams to vectors

There has been a trend of moving away from predefined categorical features, such as context words in distributional semantics, n-grams in language modeling, and bags of words in information retrieval, towards features that are learned from data. The goal of the reading seminar is to go through this evolution using language modeling and vector-space word representation as the running example.

Towards the end, several papers sharing the same topic are suggested for each meeting. We don't need to read all of them; instead, we choose the ones we like.

Introduction to n-gram models (Oct 6)

"Foundations of statistical natural language processing", chapter 6 up to section 6.2.3. In addition, we can have a look at how parameters are estimated with maximum likelihood estimation in a Bernoulli experiment. The following books give the necessary background in language processing and statistics; they are available in the library.

  • Christopher D Manning, Hinrich Schütze Foundations of statistical natural language processing. 1999
  • Christopher M Bishop Pattern recognition and machine learning. 2006
  • James H Martin, Daniel Jurafsky Speech and language processing. 2000
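To make the maximum likelihood estimation part concrete: in a Bernoulli experiment the MLE of the success probability is simply the observed fraction of successes. A minimal sketch (the function name and data are our own, not from the books above):

```python
# MLE for a Bernoulli experiment: the estimate of the success probability p
# that maximises the likelihood of the data is the fraction of successes.

def bernoulli_mle(outcomes):
    """Estimate p from a list of 0/1 outcomes."""
    return sum(outcomes) / len(outcomes)

# Ten coin flips, seven heads: the MLE of p is 0.7.
flips = [1, 1, 0, 1, 1, 0, 1, 1, 0, 1]
print(bernoulli_mle(flips))  # 0.7
```

The same principle gives relative-frequency estimates for n-gram probabilities: counts divided by the total.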

A class-based language model (Oct 13)

Matthew leads a discussion on model validation. Stephen introduces the class-based n-gram model of Brown et al.
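In a class-based bigram model in the spirit of Brown et al., each word belongs to a class and the bigram probability factors as P(w | w_prev) = P(w | class(w)) × P(class(w) | class(w_prev)). A toy sketch (the classes, counts, and probabilities below are made up for illustration, not taken from the paper):

```python
# Class-based bigram model: word probabilities factor through word classes,
# so statistics are shared among words of the same class.
# All numbers here are toy assumptions.

word_class = {"dogs": "ANIMAL", "cats": "ANIMAL", "chase": "VERB"}

# P(word | its class): relative frequency of the word within its class.
p_word_given_class = {"dogs": 0.5, "cats": 0.5, "chase": 1.0}

# P(class | previous class): class-level bigram probabilities.
p_class_transition = {("ANIMAL", "VERB"): 0.8, ("VERB", "ANIMAL"): 0.9}

def class_bigram_prob(prev_word, word):
    c_prev, c = word_class[prev_word], word_class[word]
    return p_word_given_class[word] * p_class_transition.get((c_prev, c), 0.0)

print(class_bigram_prob("chase", "cats"))  # 0.9 * 0.5 = 0.45
```

Because "dogs" and "cats" sit in the same class, evidence about one transfers to the other, which is the model's answer to sparse word-level counts.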

A backing-off language model (Oct 20)

Focus on the problem of data sparsity: unseen n-grams are assigned zero probabilities. Discuss smoothing techniques.
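As a warm-up for the smoothing discussion, the simplest technique is add-one (Laplace) smoothing: every count is incremented by one before normalising, so unseen n-grams get a small non-zero probability. A sketch with toy counts (the corpus statistics below are invented for illustration):

```python
# Add-one (Laplace) smoothing for bigrams:
# P(w | w_prev) = (count(w_prev, w) + 1) / (count(w_prev) + V),
# so an unseen bigram receives probability 1 / (count(w_prev) + V) instead of 0.

from collections import Counter

bigram_counts = Counter({("dogs", "chase"): 3, ("chase", "cats"): 2})
unigram_counts = Counter({"dogs": 3, "chase": 5, "cats": 2})
vocab_size = 3  # toy vocabulary: dogs, chase, cats

def laplace_bigram(prev_word, word):
    return (bigram_counts[(prev_word, word)] + 1) / (unigram_counts[prev_word] + vocab_size)

print(laplace_bigram("chase", "cats"))   # seen bigram:   (2 + 1) / (5 + 3) = 0.375
print(laplace_bigram("cats", "chase"))   # unseen bigram: (0 + 1) / (2 + 3) = 0.2
```

Back-off models go a step further: instead of a uniform correction, they fall back to lower-order n-gram estimates when a higher-order count is missing.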

A log-linear language model (Nov 03)

Go through the derivation of a log-linear language model.
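The core of the derivation is that a log-linear model scores each candidate word with a weighted sum of features and normalises the scores with a softmax. A minimal sketch of that normalisation step (the scores below stand in for θ · f(h, w) and are made up):

```python
# Log-linear language model: P(w | h) = exp(theta . f(h, w)) / Z(h),
# where Z(h) sums exp-scores over the whole vocabulary (the partition function).

import math

def log_linear_probs(scores):
    """Turn raw feature scores into a normalised distribution over words."""
    z = sum(math.exp(s) for s in scores.values())  # partition function Z(h)
    return {w: math.exp(s) / z for w, s in scores.items()}

# Toy scores for the next word after some history, e.g. "dogs chase".
scores = {"cats": 2.0, "cars": 0.5, "ideas": -1.0}
probs = log_linear_probs(scores)
print(max(probs, key=probs.get))  # "cats"
```

Note that Z(h) ranges over the full vocabulary; this is exactly the cost that the later sessions on scaling up neural models try to avoid.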

A neural language model (Nov 10)

Discuss how neural networks approach the problem of data sparsity. Suppose the trigram dogs chase cats occurs often in the training corpus, and that cats and kittens share a lot of context there. Even if dogs chase kittens is never seen in the corpus, the model will assign it a probability close to that of dogs chase cats, because similar words are expected to have similar feature vectors.
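The generalisation step can be sketched numerically: if a model scores words by a dot product between a context vector and word embeddings, words with similar embeddings get similar scores. The vectors below are invented purely for illustration, not learned from any corpus:

```python
# Why shared feature vectors help with unseen n-grams: "cats" and "kittens"
# have nearby embeddings, so any score computed from those embeddings is
# nearly the same for both, even if one n-gram was never observed.

embedding = {
    "cats":    [0.9, 0.1],
    "kittens": [0.85, 0.15],  # close to "cats"
    "ideas":   [-0.8, 0.6],   # far from both
}
context = [1.0, 0.0]  # a stand-in for the hidden representation of "dogs chase"

def score(word):
    return sum(c * e for c, e in zip(context, embedding[word]))

print(score("cats"), score("kittens"), score("ideas"))
```

"cats" and "kittens" receive almost identical scores, while "ideas" scores far lower; after softmax normalisation the same closeness holds for the probabilities.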

Scaling up a neural language model

It is expensive to train a neural model because of the normalisation term in the softmax. Discuss possible solutions: hierarchical softmax and negative sampling.
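A quick back-of-the-envelope comparison shows why these tricks matter: the softmax denominator touches every word in the vocabulary, so each training step costs O(V), while a hierarchical softmax only walks a binary tree over the vocabulary, costing O(log₂ V) (negative sampling similarly replaces the full sum with a handful of sampled terms):

```python
# Operation counts per training step, full softmax vs hierarchical softmax,
# for a vocabulary of one million words.

import math

vocab_size = 1_000_000
full_softmax_ops = vocab_size                         # one term per word in Z
hierarchical_ops = math.ceil(math.log2(vocab_size))   # one decision per tree level

print(full_softmax_ops, hierarchical_ops)  # 1000000 vs 20
```

A factor of roughly 50,000 per step is what makes training on million-word vocabularies feasible at all.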


Mikolov et al. scaled up neural networks to vocabularies of millions of words.

Deeper than the deep, or understanding word2vec