Homework 1: Probabilities and sequences

(a,b,c) The homework directory contains Matlab data sets A, B, and C. These are time series of 0s and 1s, stored as Matlab vectors. Investigate the three data sets using suitable Bayesian analyses. State clearly which models you consider, and display your results in an easily understandable (e.g., graphical) form. You are recommended to work in Matlab, but the sequences are easy to obtain from the save files (the last 10000 bytes, following a header) if you prefer some other software.

Analyze each of the data sets A, B, C against each of the hypotheses (a), (b), (c) below. You can reduce the number of combinations you test by using suitable visualisations of the data. Can you say what the most plausible generation mechanism is for each data set, A, B and C?

The Hidden Markov Model clearly has a relation to the coin-switching model. Can you see what the difference is? Would you expect to draw similar probabilities from the two models?

Hint: One of the hypotheses suggests an application of the Baum-Welch algorithm. Unfortunately, Baum-Welch is essentially a local-search maximum-likelihood (ML) method: it gives no appropriate indication of the true uncertainty of the suggested answer, and for these data it seems to get stuck in a completely inappropriate local maximum. It is probably possible to solve these problems by using an MCMC simulator of the latent variable describing the (unknown) coin used at each toss. A Matlab code for this is mcccoin.m. When the change probability pc is set to zero, it never changes the initial assignment of coins. If you use this code, please inspect it and make reasonable checks that it works as you expect (e.g., by testing it on shorter sequences where the 'answer' is obvious). A similar code for MCMC computation of the posterior parameters of the HMM model can be found in the Matlab file hmmsim.m.

A shorter, well-known data set is the recording of Old Faithful geyser eruptions. The data are recorded with 0 for a strong eruption and 1 for a short one.
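
For those not working in Matlab, the remark about the last 10000 bytes of the save files can be sketched in Python as follows. This is only a minimal sketch and not part of the course material. It assumes that each toss is stored as one byte (either the raw values 0/1 or the characters '0'/'1'), which should be checked against the actual files. The function names load_sequence, block_frequencies and log_evidence_single_coin are hypothetical helpers. The last one computes the evidence for a single coin with unknown bias under a uniform prior, a baseline model chosen here for illustration and not necessarily one of the listed hypotheses.

```python
import math
from pathlib import Path

SEQ_BYTES = 10000  # from the assignment text: the data are the last 10000 bytes


def load_sequence(path, n=SEQ_BYTES):
    """Return the last n bytes of a save file as a list of 0/1 integers.

    Assumption (not stated in the assignment): one byte per toss, stored
    either as raw values 0/1 or as the ASCII characters '0'/'1'.
    """
    tail = Path(path).read_bytes()[-n:]
    values = set(tail)
    if values <= {0, 1}:
        return list(tail)
    if values <= {ord("0"), ord("1")}:
        return [b - ord("0") for b in tail]
    raise ValueError(f"unexpected byte values: {sorted(values)[:10]}")


def block_frequencies(seq, block=500):
    """Fraction of ones in consecutive non-overlapping blocks.

    A crude visualisation aid: a stationary coin gives roughly constant
    values, while switching or drift shows up as jumps or trends.
    """
    chunks = (seq[i:i + block] for i in range(0, len(seq), block))
    return [sum(chunk) / len(chunk) for chunk in chunks]


def log_evidence_single_coin(seq):
    """Log marginal likelihood for one coin with a uniform prior on its bias.

    With k ones in n tosses: P(D) = k! (n-k)! / (n+1)!
    This is an illustrative baseline, not a model taken from the assignment.
    """
    n, k = len(seq), sum(seq)
    return math.lgamma(k + 1) + math.lgamma(n - k + 1) - math.lgamma(n + 2)
```

Plotting block_frequencies(load_sequence(path)) for each data set gives a quick view of whether the frequency of ones drifts or switches along the sequence, which may help rule out some combinations of model and data set before a full analysis. Comparing log_evidence_single_coin(seq) with len(seq) * math.log(0.5), the log likelihood under a fair coin, gives a first log Bayes factor to check any Matlab results against.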