### Homework 1: Probabilities and sequences

(a,b,c) The homework directory contains the Matlab data sets A, B, and C. These are time series of 0s and 1s, stored as Matlab vectors. Investigate the three data sets using suitable Bayesian analyses. State clearly which models you consider, and display your results in an easily understandable (e.g., graphical) form. We recommend working in Matlab, but if you prefer other software, the sequences are easy to extract from the save files (the last 10000 bytes, following a header). Analyze each of the data sets A, B, and C against each of the hypotheses (a), (b), and (c) below. Suitable visualisations of the data can reduce the number of combinations you need to test.
• a) We assume that the data were generated by flipping a coin. Investigate whether this coin was fair (probability 1/2 for 1) or biased (assuming some non-informative prior for the probability). Compare these two possibilities with (b) and (c).
• b) We suspect that the data were generated by independent flips of a possibly biased coin, but that at some point the coin was replaced with one giving a different probability of heads. Under this assumption, characterize the point in the sequence where the change occurred.
• c) We know that the data were generated by flipping two biased coins. Before each toss, the coin was replaced with a certain probability. What can you say about the coins and the probability of switching coins?
• d) Define a change detector for sequences. Its input is a 0-1 sequence, and at each stage it knows the plausibility of a probability change having occurred in the last d stages, as a function of d (if there is a true change, its confidence should grow with increasing d, since its input is random). Hint: It is nice and clean to use the Chapman-Kolmogorov equation, with suitable assumptions.
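As a starting point for hypotheses (a) and (b), the following is a minimal Python sketch, not a reference solution. It assumes a uniform Beta(1, 1) prior on the probability of a 1 for every biased coin and a uniform prior on the change position; these priors and all function names are illustrative assumptions, not part of the assignment. Under these assumptions the marginal likelihood of a sequence with given counts of ones and zeros is a ratio of Beta functions, which gives both the Bayes factor of biased versus fair and the posterior over the change point.

```python
from math import lgamma, log, exp


def log_beta(a, b):
    """Logarithm of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def log_marginal_biased(ones, zeros, a=1.0, b=1.0):
    """Log marginal likelihood of one specific 0-1 sequence with the given
    counts, under a Beta(a, b) prior on P(1) (assumed prior, default uniform)."""
    return log_beta(a + ones, b + zeros) - log_beta(a, b)


def log_bayes_factor_biased_vs_fair(seq):
    """log [ P(seq | biased coin) / P(seq | fair coin) ]; positive favours biased."""
    ones = sum(seq)
    zeros = len(seq) - ones
    return log_marginal_biased(ones, zeros) - len(seq) * log(0.5)


def changepoint_posterior(seq):
    """Posterior over k, the number of tosses made with the first coin,
    for k = 1, ..., n-1. Element i of the result corresponds to k = i + 1.
    Assumes a uniform prior on k and independent uniform priors on both coins."""
    n = len(seq)
    total_ones = sum(seq)
    log_w = []
    ones_before = 0
    for k in range(1, n):
        ones_before += seq[k - 1]
        zeros_before = k - ones_before
        ones_after = total_ones - ones_before
        zeros_after = (n - k) - ones_after
        log_w.append(log_marginal_biased(ones_before, zeros_before)
                     + log_marginal_biased(ones_after, zeros_after))
    # Normalise in a numerically stable way.
    m = max(log_w)
    w = [exp(v - m) for v in log_w]
    s = sum(w)
    return [v / s for v in w]


if __name__ == "__main__":
    toy = [0] * 30 + [1] * 30          # obvious change after 30 tosses
    post = changepoint_posterior(toy)
    k_best = post.index(max(post)) + 1
    print("most probable change after toss", k_best)
    print("log Bayes factor (biased vs fair):", log_bayes_factor_biased_vs_fair(toy))
```

Plotting the returned posterior against k is one easy way to display the result for (b). Note that the toy sequence has equally many 0s and 1s, so the single-coin Bayes factor from (a) does not favour a biased coin even though the change-point structure is obvious.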
Can you say what the most plausible generation mechanism is for each data set, A, B, and C? The Hidden Markov Model clearly has a relation to the coin-switching model. Can you see what the difference is? Would you expect to draw similar probabilities from the two models?

Hint: One of the hypotheses suggests an application of the Baum-Welch algorithm. Unfortunately, Baum-Welch is essentially a local-search maximum-likelihood (ML) method: it gives no appropriate indication of the true uncertainty of the suggested answer, and for these data it seems to get stuck in a completely inappropriate local maximum of the likelihood. It is probably possible to solve these problems by using an MCMC simulator of the latent variable describing the (unknown) coin used at each toss. Matlab code for this is in mcccoin.m. When the change probability pc is set to zero, it never changes the initial assignment of coins. If you use this code, please inspect it and make reasonable checks that it works as you expect (e.g., by testing it on shorter sequences where the 'answer' is obvious). Similar code for MCMC computation of the posterior parameters of the HMM can be found in the Matlab file hmmsim.m.

Some shorter, well-known data are the recordings of Old Faithful geyser eruptions. The data are recorded with 0 for a strong eruption and 1 for a short one.
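When checking a sampler on short sequences, as suggested above, it can help to have an exact likelihood to compare against. The following Python sketch computes the log-likelihood of a 0-1 sequence under a two-coin switching model with the scaled forward recursion. It is independent of mcccoin.m and hmmsim.m and does not reproduce their internals. The function name, the argument layout, the use of a single switching probability pc in both directions, and the uniform prior on the first coin are all assumptions made for illustration.

```python
from math import log


def forward_log_likelihood(seq, p, pc, init=(0.5, 0.5)):
    """Log-likelihood of a non-empty 0-1 sequence under a two-coin model.

    p[i]  : probability that coin i gives a 1 (assumed strictly between 0 and 1)
    pc    : probability of replacing the coin before a toss (assumed symmetric)
    init  : prior probabilities of the coin used for the first toss (assumed)
    """
    def emit(i, x):
        return p[i] if x == 1 else 1.0 - p[i]

    # alpha[i] is proportional to P(current coin = i, tosses seen so far).
    alpha = [init[i] * emit(i, seq[0]) for i in (0, 1)]
    loglik = 0.0
    for t, x in enumerate(seq):
        if t > 0:
            # Propagate through the coin-replacement step, then weight by the toss.
            pred = [alpha[0] * (1.0 - pc) + alpha[1] * pc,
                    alpha[0] * pc + alpha[1] * (1.0 - pc)]
            alpha = [pred[i] * emit(i, x) for i in (0, 1)]
        # Rescale to avoid underflow on long sequences, accumulating the log scale.
        scale = alpha[0] + alpha[1]
        loglik += log(scale)
        alpha = [a / scale for a in alpha]
    return loglik


if __name__ == "__main__":
    seq = [1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1]
    for pc in (0.0, 0.1, 0.5):
        print(pc, forward_log_likelihood(seq, p=(0.9, 0.1), pc=pc))
```

Two hand-checkable cases make useful sanity tests: with pc = 0 the likelihood is an equal mixture of the two single-coin likelihoods, and with pc = 0.5 and p = (0.9, 0.1) every toss is effectively a fair coin flip.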