Homework 1: Probabilities and sequences
(a,b,c)
The homework directory contains Matlab data sets A, B, and C.
These are time series of 0s and 1s, stored as
Matlab vectors. Investigate the three data sets
using suitable Bayesian analyses. State clearly which models you consider,
and display your results in an easily understandable (e.g., graphical) form.
We recommend working in Matlab, but the sequences are easy to obtain
from the save files (last 10000 bytes, following a header) if you prefer
some other software. Analyze each of the data sets A, B, C against each of
the hypotheses (a), (b), (c) below. You can reduce the number of
combinations you test by using suitable visualisations of the data.
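
For those not working in Matlab, the following Python sketch shows one way the sequences might be pulled out of a save file. It assumes that each sample is stored as a single byte with value 0 or 1 in the last 10000 bytes of the file; the function name `load_bits` and any file name passed to it are illustrative only. If the check on the byte values fails, the storage assumption is wrong and the file should be inspected by hand.

```python
from pathlib import Path


def load_bits(path, n=10000):
    """Return the last n bytes of a save file as a list of 0/1 integers.

    Assumption: one byte per sample, values 0 or 1, stored after a header.
    """
    raw = Path(path).read_bytes()
    if len(raw) < n:
        raise ValueError("file is shorter than the requested sequence length")
    bits = list(raw[-n:])  # iterating over bytes yields integers
    if any(b not in (0, 1) for b in bits):
        raise ValueError("found bytes other than 0/1; check the storage format")
    return bits
```

A quick sanity check after loading is to print the fraction of ones and plot a moving average, which already hints at which hypotheses are worth testing.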

a) We assume that the data were generated by flipping a coin. Investigate whether
this coin was fair (probability 1/2 for 1) or biased (assuming some
noninformative prior for the probability). Compare these two possibilities
with (b) and (c).
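
As an illustration of the comparison in (a), here is a minimal Python sketch of the posterior probability that the coin is fair. It assumes a uniform Beta(1,1) prior on the bias under the "biased" hypothesis and equal prior probability for the two hypotheses; both choices, and all function names, are assumptions made for this example. For a specific sequence with k ones in n tosses, the evidence is (1/2)^n for the fair coin and B(k+1, n-k+1) for the biased coin.

```python
from math import exp, lgamma, log


def log_beta(x, y):
    """Logarithm of the Beta function B(x, y)."""
    return lgamma(x) + lgamma(y) - lgamma(x + y)


def log_evidence_fair(n, k):
    """Log probability of a specific sequence of n tosses under p = 1/2."""
    return n * log(0.5)


def log_evidence_biased(n, k, a=1.0, b=1.0):
    """Log marginal likelihood of the sequence under a Beta(a, b) prior on p."""
    return log_beta(a + k, b + n - k) - log_beta(a, b)


def posterior_prob_fair(seq, prior_fair=0.5):
    """Posterior probability of the fair-coin hypothesis for a 0/1 sequence."""
    n, k = len(seq), sum(seq)
    lf = log_evidence_fair(n, k) + log(prior_fair)
    lb = log_evidence_biased(n, k) + log(1.0 - prior_fair)
    m = max(lf, lb)  # subtract the maximum to avoid underflow for long sequences
    return exp(lf - m) / (exp(lf - m) + exp(lb - m))
```

Working with log evidences matters here: for 10000 tosses the raw probabilities underflow double precision.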

b) We suspect that the data were generated by independent flipping of a
possibly biased coin, but at one point the coin was replaced with one giving
a different probability of heads. Under this assumption, characterize
the point in the sequence where the change occurred.
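
One possible way to characterize the change point in (b) is its posterior distribution. The Python sketch below assumes exactly one change, independent uniform priors on the two coin biases, and a uniform prior on the change position; these are assumptions for the example, not requirements of the exercise. With t tosses before the change, the posterior is proportional to B(k1+1, t-k1+1) B(k2+1, n-t-k2+1), where k1 and k2 count the ones before and after the change.

```python
from math import exp, lgamma


def log_beta(x, y):
    """Logarithm of the Beta function B(x, y)."""
    return lgamma(x) + lgamma(y) - lgamma(x + y)


def changepoint_posterior(seq):
    """Posterior over the number of tosses t made before the single change.

    Returns a list post where post[i] is the posterior probability of t = i + 1,
    for t = 1, ..., n - 1.
    """
    n = len(seq)
    if n < 2:
        raise ValueError("need at least two tosses to place a change point")
    total = sum(seq)
    log_post = []
    k1 = 0  # number of ones among the first t tosses
    for t in range(1, n):
        k1 += seq[t - 1]
        k2 = total - k1
        log_post.append(
            log_beta(k1 + 1, t - k1 + 1) + log_beta(k2 + 1, n - t - k2 + 1)
        )
    m = max(log_post)  # normalize in a numerically stable way
    weights = [exp(v - m) for v in log_post]
    z = sum(weights)
    return [w / z for w in weights]
```

Plotting the returned list against t gives a direct picture of how sharply the data locate the change; summing the unnormalized terms also yields the evidence for hypothesis (b), which can be compared with the evidences in (a).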

c) We know that the data were generated by flipping two biased coins.
Before each toss the coin was replaced with a certain probability.
What can you say about the coins and the probability of switching
coin?
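
For (c), any Bayesian treatment needs the likelihood of the sequence given the two biases and the switching probability. The Python sketch below computes it with a scaled forward recursion. It assumes that "replaced" means switching to the other of the two coins, that a switch can also happen before the first toss, and that both coins are equally likely at the start; the names `p0`, `p1` and `pc` are chosen for this example.

```python
from math import log


def log_likelihood(seq, p0, p1, pc):
    """Log-likelihood of a 0/1 sequence under the two-coin switching model.

    p0, p1: probabilities of a 1 for the two coins (strictly between 0 and 1).
    pc: probability of switching to the other coin before each toss.
    """
    p = (p0, p1)
    alpha = [0.5, 0.5]  # assumed prior over which coin is in use
    ll = 0.0
    for x in seq:
        # Switching step: propagate the coin distribution one toss ahead.
        pred = [
            alpha[0] * (1 - pc) + alpha[1] * pc,
            alpha[1] * (1 - pc) + alpha[0] * pc,
        ]
        # Observation step: weight each coin by the probability of the outcome.
        joint = [pred[s] * (p[s] if x == 1 else 1 - p[s]) for s in (0, 1)]
        norm = joint[0] + joint[1]
        ll += log(norm)  # accumulate the log of the scaling factor
        alpha = [j / norm for j in joint]
    return ll
```

Evaluating this function on a grid over (p0, p1, pc) with a flat prior gives a rough picture of the posterior. Under these assumptions the likelihood is unchanged when p0 and p1 are swapped, so the posterior is expected to be symmetric in the two coins.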

d) Define a change detector for sequences. Its input is a 0/1 sequence, and at
each stage it knows the plausibility of a probability change having occurred
in the last d stages, as a function of d (if there is a true change,
its confidence should grow with increasing d, since its input is random).
Hint: It is nice and clean to use the Chapman-Kolmogorov equation, with suitable
assumptions.
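
One possible reading of (d), offered as a sketch rather than the intended solution, is a run-length filter in the style of Bayesian online change-point detection (Adams and MacKay). It assumes that before each toss a change happens with a fixed probability `hazard`, and that after a change the new bias is drawn from a uniform prior; both are assumptions of this example. The prediction step, which pushes the run-length distribution through the change/no-change transition, plays the role of the Chapman-Kolmogorov step. The class and method names are hypothetical.

```python
class ChangeDetector:
    """Online posterior over the length of the current constant-bias segment."""

    def __init__(self, hazard=0.01):
        self.hazard = hazard  # assumed prior probability of a change before each toss
        self.weights = []     # weights[i]: P(current segment = last i + 1 tosses | data)
        self.ones = []        # ones[i]: number of 1s among the last i + 1 tosses

    def update(self, x):
        """Process one observation x (0 or 1)."""
        if self.weights:
            # Prediction: a change resets the segment, otherwise it continues.
            pred = [self.hazard] + [(1 - self.hazard) * w for w in self.weights]
            ones = [0] + self.ones
        else:
            pred, ones = [1.0], [0]  # the first toss starts the first segment
        post = []
        for i, (w, k) in enumerate(zip(pred, ones)):
            # Predictive probability of a 1 after k ones in i tosses, uniform prior.
            p_one = (k + 1) / (i + 2)
            post.append(w * (p_one if x == 1 else 1 - p_one))
        z = sum(post)
        self.weights = [w / z for w in post]
        self.ones = [k + x for k in ones]

    def prob_change_within(self, d):
        """Posterior probability that a change occurred within the last d tosses."""
        d = min(d, len(self.weights) - 1)  # exclude the "no change so far" case
        return sum(self.weights[:d])
```

By construction `prob_change_within(d)` is nondecreasing in d. The cost per update grows with the number of tosses seen, so for the full 10000-toss sequences it may be worth dropping run lengths whose weight has become negligible.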

Can you say what the most plausible generation mechanism is for
each data set, A, B and C?
The Hidden Markov Model is clearly related to the
coin-switching model. Can you see what the difference is?
Would you expect to obtain similar probabilities from the two models?

Hint: One of the hypotheses suggests an application of the Baum-Welch
algorithm. Unfortunately, Baum-Welch is essentially a local-search
maximum likelihood (ML) method: it gives no proper indication of the true
uncertainty of the suggested answer, and for these data it seems to get stuck
in a completely inappropriate local maximum of the likelihood.
It is probably possible to solve these problems by using
an MCMC simulator of the latent variable describing the (unknown)
coin used at each toss. Matlab code for this is provided in
mcccoin.m. When the change probability
pc is set to zero, it never changes the initial assignment of coins.
If you use this code, please inspect it and make reasonable checks that
it works as you expect (e.g., by testing it on shorter sequences
where the 'answer' is obvious).
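
For such checks it helps to have short sequences whose generating coins are known. The Python sketch below simulates the coin-switching model and returns both the hidden coin assignment and the tosses. It does not reproduce mcccoin.m; the function name, the seeding, and the random choice of the first coin are assumptions of this example.

```python
import random


def simulate_switching(n, p0, p1, pc, seed=0):
    """Simulate n tosses of the two-coin switching model.

    Before each toss the coin in use is swapped with probability pc.
    Returns (states, tosses): the coin index used and the 0/1 outcome per toss.
    """
    rng = random.Random(seed)  # local generator so results are reproducible
    p = (p0, p1)
    state = rng.randrange(2)   # assumed: the first coin is chosen at random
    states, tosses = [], []
    for _ in range(n):
        if rng.random() < pc:
            state = 1 - state
        states.append(state)
        tosses.append(1 if rng.random() < p[state] else 0)
    return states, tosses
```

With very different biases (say 0.05 and 0.95) and a small switching probability, the true coin assignment is nearly readable from the tosses themselves, so a sampler's output can be compared against `states` directly.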
Similar code for MCMC computation of the posterior of the HMM
parameters can be found in the Matlab file
hmmsim.m.

A shorter, well-known data set is the recording of Old Faithful
geyser eruptions. The data are recorded with 0
for a strong eruption and 1 for a short one.