School of Computer Science and Communication

GSLT course in Clustering (Level 2) - Description

Fall semester 2008

This course is intended for students in language technology who want to know how text clustering works, what it can be used for, and how to use clustering tools in language technology applications.

Course objectives

After the course, the students should be able to
  • use clustering tools to cluster data in language technology applications,
  • explain how clustering techniques can be used in different applications such as information retrieval and hypothesis generation,
  • explain how different clustering techniques work and how they differ,
  • describe how a clustering can be evaluated and why it is so hard to evaluate clusterings.

Contents

  • Clustering vs. categorization
  • Algorithms for clustering: k-means, agglomerative, spectral, etc.
  • Distance measures
  • Representation of texts
  • Evaluation of clustering
  • Applications of clustering: text mining, information retrieval, social network analysis, hypothesis generation
  • Clustering tools

Structure

The core of this course is individual work consisting of reading the course material, completing practical assignments and carrying out and presenting a project on an individually selected application of clustering.

Prerequisites

Basic knowledge of natural language processing, corresponding to the GSLT level 1 course of the same name.

Language

The official language of GSLT courses is English. Depending on the composition of the group at any given lecture or seminar, Swedish may be used instead, as long as all participants are comfortable with this.

Teachers


Published by: Viggo Kann <viggo@nada.kth.se>
Updated 2008-08-25