Course Book

Order from MIT Press (can also be ordered from bokus or amazon).

Compared to the 1997 course, this one will be more structured, with ready-made homework assignments and a compulsory reading list. However, besides the standard procedures, there is still room for individual adaptations of the examination part of the course. As last time, the first part is a grounding in Bayesian data analysis, but with new and more accessible reading material.

The lectures will be of two kinds. Foundational areas, such as the basis of Bayesian analysis, unsupervised classification and MDL, will be covered in seminar lectures by me and invited speakers. Applications in areas of interest to students will be presented by students and/or invited speakers and discussed in class.

- U. M. Fayyad, G. Piatetsky-Shapiro, P. Smyth, R. Uthurusamy (Eds.): Advances in Knowledge Discovery and Data Mining. AAAI Press, Menlo Park, CA, 1996. ISBN 0-262-56097-6
- Handouts and articles (see below)

### Tentative reading list:

**A: General Methodology:**

- In Fayyad et al.: Ch 1, 2, 4, 14, 23
- Keiichi Noe: Philosophical Aspects of Discovery Science

**B: Bayesian methods:**

- In Fayyad et al.: Ch 3, 11, 13, 20
- E. T. Jaynes: Probability Theory: The Logic of Science, Ch 1, 2, 4, 5.
- S. Arnborg: Bayesian Data Mining - Part I.

**C: Markov Chain Monte Carlo:**

- Gilks, Richardson, Spiegelhalter: Introducing Markov Chain Monte Carlo.
- Spiegelhalter, Best, Gilks, Inskip: Hepatitis B: A case study in MCMC.
- R. M. Neal (1998): Markov chain sampling methods for Dirichlet process mixture models. Technical Report No. 9815, Dept. of Statistics, University of Toronto.
- P. J. Green: Markov Chain Monte Carlo in Image Analysis.
- R. G. Aykroyd (May 1997): Bayesian Estimation for Homogeneous and Inhomogeneous Gaussian Random Fields. University of Leeds.
- Mannila, Toivonen, Korhola and Olander: Learning, mining or modeling? A case study from Paleoecology. In Discovery Science, LNCS 1532.

**D: Stochastic Complexity and Classification (unsupervised and supervised):**

- In Fayyad et al.: Ch 6, 7, 19
- J. Rissanen: Stochastic Complexity (with discussion). J. R. Statist. Soc. B 49(3) (1987), pp 223-239 and 252-265.
- G. I. Webb: Further Experimental Evidence against the Utility of Occam's Razor. Journal of AI Research 4 (1996), pp 397-417.
- Heinonen, Mannila: Attribute-oriented induction and conceptual clustering. University of Helsinki, Dept. of Computer Science, report C-1996-2.
- Gyllenberg, Koski, Verlaan: Classification of binary vectors by stochastic complexity. Journal of Multivariate Analysis 63 (1997), pp 47-72.

**E: Time series and prediction:**

- In Fayyad et al.: Ch 9, 22
- G. L. Bretthorst: Bayesian Spectrum Analysis and Parameter Estimation. LNS 48, Springer Verlag, Ch 3-5.
- Ch 1 of: Time Series Prediction: Forecasting the Future and Understanding the Past. A. S. Weigend and N. A. Gershenfeld (Eds.) (1994). Santa Fe Institute Studies in the Sciences of Complexity XV. (Proceedings of the NATO Advanced Research Workshop on Comparative Time Series Analysis, Santa Fe, NM, May 1992.) Reading, MA: Addison-Wesley.
- P. Vitanyi, Ming Li: On Prediction by Data Compression, in LNCS ?

**F: Spatial applications:**

- To be defined

**G: Visualization:**

- Buja et al. (1996): Interactive High-Dimensional Data Visualization. Journal of Computational and Graphical Statistics, Vol 5, No. 1.
- J. H. Friedman: Exploratory projection pursuit. JASA 82 (1987), pp 249-266.
- Bishop and Tipping: A hierarchical latent variable model for data visualisation. IEEE PAMI, March 1998, vol 20(3), pp 281-293.
- Video Lectures from the ASA library:
  - Xgobi: Dynamic Graphics for Data Analysis
  - Missing Data in Interactive High-Dimensional Visualization
  - Grand Tour and Projection Pursuit
  - Exploring Time Series Using Interactive Graphics
  - Spatial CDF Estimation & Visualization with Applications to Forest Health Monitoring
  - Dynamic Graphics in a GIS: Analyzing and Exploring Multivariate Data

### Schedule: see news file!

### Examination.

Homework assignments and a presentation or project. The examination is on an individual basis, but everyone has to write down a list of the material read, together with a commentary. This commentary is also an important input to the course evaluation.

The project could involve data used in your research project (check with your project leader).

### WWW pointers

To be defined.

### Related Courses Elsewhere

SUNY Albany

KDDM at RPI

Spatial Statistics at Wisconsin

Spatial Statistics

Statistics Refresher

Data Mining Course

INDEX6522

S98 Course of Study: Data Warehousing and Mining

Short Courses: Data Mining Techniques and Applications