Compared to the 1997 course, this one will be more structured,
with ready-made homework assignments and a compulsory reading list.
However, there is still room for individual adaptations of the
examination part of the course, in addition to the standard procedures.
As last time, the first part is a grounding in Bayesian data analysis,
but with new and more accessible reading material.
The lectures will be of two kinds. Foundational areas such as
Bayesian analysis, unsupervised classification and MDL
will be covered in seminar lectures by me and invited speakers.
Topics in areas of interest to students will be presented by students and/or
invited speakers and discussed in class.
U. M. Fayyad, G. Piatetsky-Shapiro, P. Smyth and R. Uthurusamy (Eds.):
Advances in Knowledge Discovery and Data Mining.
AAAI Press, Menlo Park, CA, 1996. ISBN 0-262-56097-6
Handouts and articles (see below)
Tentative reading list:
A: General Methodology:
In Fayyad et al: Ch 1, 2, 4, 14, 23
Keiichi Noe: Philosophical aspects of Discovery Science
B: Bayesian methods:
In Fayyad et al: Ch 3, 11, 13, 20
E. T. Jaynes: Probability Theory: The Logic of Science, Ch
S. Arnborg: Bayesian Data Mining - Part I.
C: Markov Chain Monte Carlo
Gilks, Richardson, Spiegelhalter: Introducing Markov Chain Monte Carlo.
Spiegelhalter, Best, Gilks, Inskip: Hepatitis B: A case study in MCMC.
R. M. Neal (1998): Markov chain sampling methods for Dirichlet process mixture models,
Technical Report No. 9815, Dept. of Statistics, University of Toronto.
P. J. Green: Markov Chain Monte Carlo in Image Analysis.
R. G. Aykroyd (May 1997): Bayesian Estimation for Homogeneous and
Inhomogeneous Gaussian Random Fields. University of Leeds.
Mannila, Toivonen, Korhola and Olander:
Learning, mining or modeling? A case study from paleoecology.
In Discovery Science, LNCS 1532.
D: Stochastic Complexity and Classification (unsupervised and
In Fayyad et al: Ch 6, 7, 19
J. Rissanen: Stochastic Complexity (with discussion),
J. R. Statist. Soc. B 49(3) (1987), pp. 223-239 and
G. I. Webb: Further Experimental Evidence against the Utility of
Journal of AI Research 4 (1996) 397-417.
Heinonen, Mannila: Attribute oriented induction and conceptual
University of Helsinki Dept Computer Science, report C-1996-2.
Gyllenberg, Koski, Verlaan: Classification of binary vectors by
stochastic complexity, Journal of Multivariate Analysis 63 (1997) 47-72.
Ch 1 of: Time Series Prediction: Forecasting the Future and Understanding the
Past. Weigend, A. S., and N. A. Gershenfeld (Eds.) (1994) Santa Fe
Institute Studies in the Sciences of Complexity XV. (Proceedings of the
NATO Advanced Research Workshop on Comparative Time Series Analysis,
Santa Fe, NM, May 1992.) Reading, MA: Addison-Wesley.
P. Vitanyi, M. Li: On Prediction by Data Compression,
in LNCS ?
F: Spatial applications:
To be defined
Buja et al. (1996): Interactive
High-Dimensional Data Visualization. Journal of Computational and Graphical
Statistics, Vol. 5, No. 1.
Bishop and Tipping: A hierarchical latent variable model
for data visualisation.
IEEE PAMI, March 1998, vol. 20(3), pp. 281-293.
Video Lectures from the ASA library
Xgobi: Dynamic Graphics for Data Analysis
Missing Data in Interactive High-Dimensional Visualization
Grand Tour and Projection Pursuit
Exploring Time Series Using Interactive Graphics
Spatial CDF Estimation & Visualization with Applications to Forest
Dynamic Graphics in a GIS: Analyzing and Exploring Multivariate Data
Homework assignments and presentation or project.
The examination is on an individual basis, but everyone has to
submit a list of the material read, together with a commentary.
This commentary is also an important input to course evaluation.
The project could involve data used in your research project
(in consultation with your project leader).