Reading directives (in progress!) for

Neural Networks: A Comprehensive Foundation
by Simon Haykin (Second Edition, Prentice Hall, London, 1999)

1 = Important; 2 = Partly important; 3 = For orientation

1. Introduction
1.1  What is a neural network?  2
1.2  Human brain  2
1.3  Models of a neuron  1
1.4  Neural networks viewed as directed graphs  3
1.5  Feedback  3
1.6  Network architectures  2
1.7  Knowledge representation  2
1.8  Artificial intelligence and neural networks  3
1.9  Historical notes  3

2. Learning process
2.1  Introduction  2
2.2  Error-correction learning  2
2.3  Memory-based learning  3
2.4  Hebbian learning  2
2.5  Competitive learning  2
2.6  Boltzmann learning  3
2.7  The credit-assignment problem  2
2.8  Learning with a teacher  2
2.9  Learning without a teacher  2
2.10  Learning tasks  1
2.11  Memory  1
2.12  Adaptation  2
2.13  Statistical nature of the learning process  3
2.14  Statistical learning theory  2
2.15  Probably approximately correct learning  3
2.16  Summary and discussion  3

3. Single-layer perceptrons
3.1  Introduction  3
3.2  Adaptive filtering problem  2
3.3  Unconstrained optimization techniques  2
3.4  Linear least-squares filters  2
3.5  Least-mean-squares algorithm  1
3.6  Learning curves  2
3.7  Learning rate annealing techniques  2
3.8  Perceptron  1
3.9  Perceptron convergence theorem  2
3.10  Relation between the perceptron and Bayes classifier for a Gaussian environment  3
3.11  Summary and discussion  3

4. Multilayer perceptrons
4.1  Introduction  2
4.2  Some preliminaries  1
4.3  Back-propagation algorithm  1
4.4  Summary of the back-propagation algorithm  1
4.5  XOR problem  1
4.6  Heuristics for making the back-propagation algorithm ...  2
4.7  Output representation and decision rule  2
4.8  Computer experiment  2
4.9  Feature detection  2
4.10  Back-propagation and differentiation  3
4.11  Hessian matrix  3
4.12  Generalization  1
4.13  Approximations of functions  2
4.14  Cross-validation  1
4.15  Network-pruning techniques  2
4.16  Virtues and limitations of back-propagation learning  1
4.17  Accelerated convergence of back-propagation learning  3
4.18  Supervised learning viewed as an optimization problem  3
4.19  Convolution networks  2
4.20  Summary and discussion  2

5. Radial-basis function networks
5.1  Introduction  3
5.2  Cover's theorem on the separability of patterns  2
5.3  Interpolation problem  1
5.4  Supervised learning as an ill-posed hypersurface ...  3
5.5  Regularization theory  3
5.6  Regularization networks  1
5.7  Generalized radial-basis function networks  1
5.8  The XOR problem (revisited)  2
5.9  Estimation of the regularization parameter  3
5.10  Approximation properties of RBF networks  2
5.11  Comparison of RBF networks and multilayer perceptrons  2
5.12  Kernel regression and its relation to RBF networks  3
5.13  Learning strategies  1
5.14  Computer experiment  2
5.15  Summary and discussion  3

6. Support vector machines
6.1  Introduction  3
6.2  Optimal hyperplane for linearly separable patterns  3
6.3  Optimal hyperplane for nonseparable patterns  3
6.4  How to build a support vector machine for pattern recognition  3
6.5  Example: XOR problem (revisited)  3
6.6  Computer experiment  3
6.7  ε-insensitive loss function  3
6.8  Support vector machines for nonlinear regression  3
6.9  Summary and discussion  3

7. Committee machines
7.1  Introduction  2
7.2  Ensemble averaging  2
7.3  Computer experiment I  3
7.4  Boosting  2
7.5  Computer experiment II  3
7.6  Associative Gaussian mixture model  3
7.7  Hierarchical mixture of experts model  3
7.8  Model selection using a standard decision tree  3
7.9  A priori and a posteriori probabilities  3
7.10  Maximum likelihood estimation  3
7.11  Learning strategies for the HME model  3
7.12  EM algorithm  3
7.13  Application of the EM algorithm to the HME model  3
7.14  Summary and discussion  3

8. Principal component analysis
8.1  Introduction  2
8.2  Some intuitive principles of self-organization  3
8.3  Principal component analysis  2
8.4  Hebbian-based maximum eigenfilter  2
8.5  Hebbian-based principal components analysis  2
8.6  Computer experiment: Image coding  3
8.7  Adaptive principal components analysis using ...  2
8.8  Two classes of PCA algorithms  2
8.9  Batch and adaptive methods of computation  3
8.10  Kernel-based principal components analysis  3
8.11  Summary and discussion  3

9. Self-organizing maps
9.1  Introduction  2
9.2  Two basic feature-mapping models  1
9.3  Self-organizing map  2
9.4  Summary of the SOM algorithm  1
9.5  Properties of the feature map  2
9.6  Computer simulation  3
9.7  Learning vector quantization  1
9.8  Computer experiment: adaptive pattern classification  2
9.9  Hierarchical vector quantization  3
9.10  Contextual maps  2
9.11  Summary and discussion  2

10. Information-theoretic models (ratings pending)
10.1  Introduction
10.2  Entropy
10.3  Maximum entropy principle
10.4  Mutual information
10.5  Kullback-Leibler divergence
10.6  Mutual information as an objective function to be ...
10.7  Maximum mutual information principle
10.8  Infomax and redundancy reduction
10.9  Spatially coherent features
10.10  Spatially incoherent features
10.11  Independent component analysis
10.12  Computer experiment
10.13  Maximum likelihood estimation
10.14  Maximum entropy method
10.15  Summary and discussion

11. Stochastic machines and their approximates rooted in statistical physics
11.1  Introduction  3
11.2  Statistical mechanics  3
11.3  Markov chains  3
11.4  Metropolis algorithm  2
11.5  Simulated annealing  1
11.6  Gibbs sampling  2
11.7  Boltzmann machine  1
11.8  Sigmoid belief networks  2
11.9  Helmholtz machine  3
11.10  Mean-field theory  2
11.11  Deterministic Boltzmann machine  3
11.12  Deterministic sigmoid belief networks  3
11.13  Deterministic annealing  3
11.14  Summary and discussion  3

12. Neurodynamic programming (ratings pending)
12.1  Introduction
12.2  Markovian decision processes
12.3  Bellman's optimality criterion
12.4  Policy iteration
12.5  Value iteration
12.6  Neurodynamic programming
12.7  Approximate policy iteration
12.8  Q-learning
12.9  Computer experiment
12.10  Summary and discussion

13. Temporal processing using feedforward networks (ratings pending)
13.1  Introduction
13.2  Short-term memory structures
13.3  Network architectures for temporal processing
13.4  Focused time lagged feedforward networks
13.5  Computer experiment
13.6  Universal myopic mapping theorem
13.7  Spatio-temporal models of a neuron
13.8  Distributed time lagged feedforward networks
13.9  Temporal back-propagation algorithm
13.10  Summary and discussion

14. Neurodynamics
14.1  Introduction  3
14.2  Dynamical systems  2
14.3  Stability of equilibrium states  2
14.4  Attractors  3
14.5  Neurodynamical models  2
14.6  Manipulation of attractors as a recurrent network par.  2
14.7  Hopfield models  1
14.8  Computer experiment I  3
14.9  Cohen-Grossberg theorem  1
14.10  Brain-state-in-a-box model  2
14.11  Computer experiment II  3
14.12  Strange attractors and chaos  3
14.13  Dynamic reconstruction of a chaotic process  3
14.14  Computer experiment III  3
14.15  Summary and discussion  3

15. Dynamically driven recurrent networks (ratings pending)
15.1  Introduction
15.2  Recurrent network architectures
15.3  State-space model
15.4  Nonlinear autoregressive with exogenous input model
15.5  Computational power of recurrent networks
15.6  Learning algorithms
15.7  Back-propagation through time
15.8  Real-time recurrent learning
15.9  Kalman filters
15.10  Decoupled extended Kalman filters
15.11  Computer experiment
15.12  Vanishing gradients in recurrent networks
15.13  System identification
15.14  Model-reference adaptive control
15.15  Summary and discussion

Appendices A–D
