Reading directives (in progress!) for
Neural Networks: A Comprehensive Foundation
by Simon Haykin (Second edition, Prentice Hall, London, 1999)
Rating key: 1 = Important; 2 = Partly important; 3 = For orientation.
1. Introduction
   1.1   What is a neural network?   2
   1.2   Human brain   2
   1.3   Models of a neuron   1
   1.4   Neural networks viewed as directed graphs   3
   1.5   Feedback   3
   1.6   Network architectures   2
   1.7   Knowledge representation   2
   1.8   Artificial intelligence and neural networks   3
   1.9   Historical notes   3
2. Learning processes
   2.1   Introduction   2
   2.2   Error-correction learning   2
   2.3   Memory-based learning   3
   2.4   Hebbian learning   2
   2.5   Competitive learning   2
   2.6   Boltzmann learning   3
   2.7   The credit-assignment problem   2
   2.8   Learning with a teacher   2
   2.9   Learning without a teacher   2
   2.10  Learning tasks   1
   2.11  Memory   1
   2.12  Adaptation   2
   2.13  Statistical nature of the learning process   3
   2.14  Statistical learning theory   2
   2.15  Probably approximately correct learning   3
   2.16  Summary and discussion   3
3. Single-layer perceptrons
   3.1   Introduction   3
   3.2   Adaptive filtering problem   2
   3.3   Unconstrained optimization techniques   2
   3.4   Linear least-squares filters   2
   3.5   Least-mean-squares algorithm   1
   3.6   Learning curves   2
   3.7   Learning-rate annealing techniques   2
   3.8   Perceptron   1
   3.9   Perceptron convergence theorem   2
   3.10  Relation between the perceptron and Bayes classifier for a Gaussian environment   3
   3.11  Summary and discussion   3
4. Multilayer perceptrons
   4.1   Introduction   2
   4.2   Some preliminaries   1
   4.3   Backpropagation algorithm   1
   4.4   Summary of the backpropagation algorithm   1
   4.5   XOR problem   1
   4.6   Heuristics for making the backpropagation algorithm ...   2
   4.7   Output representation and decision rule   2
   4.8   Computer experiment   2
   4.9   Feature detection   2
   4.10  Backpropagation and differentiation   3
   4.11  Hessian matrix   3
   4.12  Generalization   1
   4.13  Approximations of functions   2
   4.14  Cross-validation   1
   4.15  Network-pruning techniques   2
   4.16  Virtues and limitations of backpropagation learning   1
   4.17  Accelerated convergence of backpropagation learning   3
   4.18  Supervised learning viewed as an optimization problem   3
   4.19  Convolution networks   2
   4.20  Summary and discussion   2
5. Radial-basis function networks
   5.1   Introduction   3
   5.2   Cover's theorem on the separability of patterns   2
   5.3   Interpolation problem   1
   5.4   Supervised learning as an ill-posed hypersurface ...   3
   5.5   Regularization theory   3
   5.6   Regularization networks   1
   5.7   Generalized radial-basis function networks   1
   5.8   The XOR problem (revisited)   2
   5.9   Estimation of the regularization parameter   3
   5.10  Approximation properties of RBF networks   2
   5.11  Comparison of RBF networks and multilayer perceptrons   2
   5.12  Kernel regression and its relation to RBF networks   3
   5.13  Learning strategies   1
   5.14  Computer experiment   2
   5.15  Summary and discussion   3
6. Support vector machines
   6.1   Introduction   3
   6.2   Optimal hyperplane for linearly separable patterns   3
   6.3   Optimal hyperplane for nonseparable patterns   3
   6.4   How to build a support vector machine for pattern recognition   3
   6.5   Example: XOR problem (revisited)   3
   6.6   Computer experiment   3
   6.7   ε-insensitive loss function   3
   6.8   Support vector machines for nonlinear regression   3
   6.9   Summary and discussion   3
7. Committee machines
   7.1   Introduction   2
   7.2   Ensemble averaging   2
   7.3   Computer experiment I   3
   7.4   Boosting   2
   7.5   Computer experiment II   3
   7.6   Associative Gaussian mixture model   3
   7.7   Hierarchical mixture of experts model   3
   7.8   Model selection using a standard decision tree   3
   7.9   A priori and a posteriori probabilities   3
   7.10  Maximum likelihood estimation   3
   7.11  Learning strategies for the HME model   3
   7.12  EM algorithm   3
   7.13  Application of the EM algorithm to the HME model   3
   7.14  Summary and discussion   3
8. Principal component analysis
   8.1   Introduction   2
   8.2   Some intuitive principles of self-organization   3
   8.3   Principal component analysis   2
   8.4   Hebbian-based maximum eigenfilter   2
   8.5   Hebbian-based principal components analysis   2
   8.6   Computer experiment: Image coding   3
   8.7   Adaptive principal components analysis using ...   2
   8.8   Two classes of PCA algorithms   2
   8.9   Batch and adaptive methods of computation   3
   8.10  Kernel-based principal components analysis?   3
   8.11  Summary and discussion   3
9. Self-organizing maps
   9.1   Introduction   2
   9.2   Two basic feature-mapping models   1
   9.3   Self-organizing map   2
   9.4   Summary of the SOM algorithm   1
   9.5   Properties of the feature map   2
   9.6   Computer simulation   3
   9.7   Learning vector quantization   1
   9.8   Computer experiment: adaptive pattern classification   2
   9.9   Hierarchical vector quantization   3
   9.10  Contextual maps   2
   9.11  Summary and discussion   2
10. Information-theoretic models
   10.1   Introduction
   10.2   Entropy
   10.3   Maximum entropy principle
   10.4   Mutual information
   10.5   Kullback-Leibler divergence
   10.6   Mutual information as an objective function to be ...
   10.7   Maximum mutual information principle
   10.8   Infomax and redundancy reduction
   10.9   Spatially coherent features
   10.10  Spatially incoherent features
   10.11  Independent component analysis
   10.12  Computer experiment
   10.13  Maximum likelihood estimation
   10.14  Maximum entropy method
   10.15  Summary and discussion
11. Stochastic machines and their approximates rooted in statistical mechanics
   11.1   Introduction   3
   11.2   Statistical mechanics   3
   11.3   Markov chains   3
   11.4   Metropolis algorithm   2
   11.5   Simulated annealing   1
   11.6   Gibbs sampling   2
   11.7   Boltzmann machine   1
   11.8   Sigmoid belief networks   2
   11.9   Helmholtz machine   3
   11.10  Mean-field theory   2
   11.11  Deterministic Boltzmann machine   3
   11.12  Deterministic sigmoid belief networks   3
   11.13  Deterministic annealing   3
   11.14  Summary and discussion   3
12. Neurodynamic programming
   12.1   Introduction
   12.2   Markovian decision processes
   12.3   Bellman's optimality criterion
   12.4   Policy iteration
   12.5   Value iteration
   12.6   Neurodynamic programming
   12.7   Approximate policy iteration
   12.8   Q-learning
   12.9   Computer experiment
   12.10  Summary and discussion
13. Temporal processing using feedforward networks
   13.1   Introduction
   13.2   Short-term memory structures
   13.3   Network architectures for temporal processing
   13.4   Focused time lagged feedforward networks
   13.5   Computer experiment
   13.6   Universal myopic mapping theorem
   13.7   Spatio-temporal models of a neuron
   13.8   Distributed time lagged feedforward networks
   13.9   Temporal backpropagation algorithm
   13.10  Summary and discussion
14. Neurodynamics
   14.1   Introduction   3
   14.2   Dynamical systems   2
   14.3   Stability of equilibrium states   2
   14.4   Attractors   3
   14.5   Neurodynamical models   2
   14.6   Manipulation of attractors as a recurrent network par.   2
   14.7   Hopfield models   1
   14.8   Computer experiment I   3
   14.9   Cohen-Grossberg theorem   1
   14.10  Brain-state-in-a-box model   2
   14.11  Computer experiment II   3
   14.12  Strange attractors and chaos   3
   14.13  Dynamic reconstruction of a chaotic process   3
   14.14  Computer experiment III   3
   14.15  Summary and discussion   3
15. Dynamically driven recurrent networks
   15.1   Introduction
   15.2   Recurrent network architectures
   15.3   State-space model
   15.4   Nonlinear autoregressive with exogenous inputs model
   15.5   Computational power of recurrent networks
   15.6   Learning algorithms
   15.7   Backpropagation through time
   15.8   Real-time recurrent learning
   15.9   Kalman filters
   15.10  Decoupled extended Kalman filters
   15.11  Computer experiment
   15.12  Vanishing gradients in recurrent networks
   15.13  System identification
   15.14  Model-reference adaptive control
   15.15  Summary and discussion
Appendices A-D