This course will be given in period 3, 2002, starting Jan 25, 2002. Lectures are on Fridays 13:15-15:00 in room 1537.

NEW BOOK: Today (Dec 3, 2001) I received a copy of "Principles of Data Mining", MIT Press 2001, by David J. Hand, Heikki Mannila and Padhraic Smyth. It will be used as the course book, and the supplementary material will be reduced correspondingly.

Preliminary information:

The first part of the course material can be obtained from
Course Package
Reading Assignments & Lecture plan

Homework

Send me an email if you want to participate!

Tentative reading list:

A: General Methodology:

  1. Keiichi Noe: Philosophical aspects of Discovery Science

B: Bayesian methods:

  1. E. T. Jaynes: Probability Theory: The Logic of Science, Ch 1, 2, 4, 5, 24.
  2. S. Arnborg: A survey of Bayesian Data Mining - Part I.

C: Markov Chain Monte Carlo

  1. Niclas Bergman: Recursive Bayesian Estimation, PhD Thesis, Linköping University.

D: Time series and prediction:

  1. Ch 1 of: Time Series Prediction: Forecasting the Future and Understanding the Past. Weigend, A. S., and N. A. Gershenfeld (Eds.) (1994) Santa Fe Institute Studies in the Sciences of Complexity XV. (Proceedings of the NATO Advanced Research Workshop on Comparative Time Series Analysis, Santa Fe, NM, May 1992.) Reading, MA: Addison-Wesley.

E: Stochastic Complexity and Classification (unsupervised and supervised)

  1. (J. J. Oliver and D. J. Hand, Introduction to Minimum Encoding Inference, [TR 4-94] Dept. Stats. Open Univ. and also TR 94/205 Dept. Comp. Sci. Monash Univ. )
  2. G. I. Webb: Further Experimental Evidence against the Utility of Occam's Razor,
    Journal of Artificial Intelligence Research 4 (1996), 397-417.
  3. (J. J. Oliver and R. A. Baxter, MML and Bayesianism: Similarities and Differences, [TR 94/206] )

F: Causality.

  1. Glymour, Cooper: Computation, Causation, and Discovery (1999), Ch 2-3.
  2. (N. Friedman, K. Murphy, S. Russell: Learning the structure of dynamic probabilistic networks, Uncertainty in Artificial Intelligence, 1998.)

G: Support vector technology, applications in genomics.

  1. "Theory of SV Machines", CSD-TR-96-17, Royal Holloway, University of London, Egham, UK, 1996.

H: Visualisation of non-geometrical data.

  1. (Buja et al. (1996). Interactive High-Dimensional Data Visualization. Journal of Computational and Graphical Statistics, Vol. 5, No. 1.)

Lectures & Reading

Jan 25: Course overview, planning discussion
Hand, Mannila, Smyth: Ch 1-3; Arnborg: Survey of Bayesian Data Mining; Jaynes
Feb 1, 12:15-13:45: Inference
Feb 13, 9:15-11:00: Hand, Mannila, Smyth: Ch 4, 8, 9; Cheeseman, Stutz: Autoclass. Do you have data for graphical modeling? Then take a look at B-course below.
Feb 15: No lecture (individual advising)
Feb 22: Graphical and other models
Bergman's thesis, Ch 3, Ch 6.
Weigend and Gershenfeld: Time series prediction; HMS Ch 4, 5.
March 1: Time series, dynamic models and estimation/prediction.
March 8: Estimation and particle filtering
March 15: EM, Autoclass classification
March 22: Support vector techniques.

Some resources:

The Nada /misc directory has the AFS address /afs/nada.kth.se/misc