### Homework 2: Real Data

• The homework directory contains a data set res, which is a data matrix for some students. The first column gives birthyear, the second sex and the third is the score on an exam. The problem is to find dependencies between variables. Visualize the data set using (e.g.) Xgobi and write down your feelings about the data. The score zero is actually a missing data value (walk-over). Would you say that it is missing at random? Make a strict model comparison problem suitable for these questions and check in on the data. Report your results in the form of Bayes factors.
• A web-crawler tries to keep an up-to-date index of a number of web-pages. Since it does not know that a page was updated until it loads it, it wants to estimate the probability that it has been changed, and then it looks pages up in order of decreasing update probability. The data it can use is a historical log of updates, a set of time intervals during which the page was updated at least once and a set of time intervals during which the page was not updated. All these intervals are disjoint. Assume that the update process is stationary (this is necessary, since we only know the lengths of the intervals in the update history, not their relative positions on the time axis). How would you estimate the probability that the page has changed since it was last retrieved t days ago? What is the probability that the page was updated during the last 5 days, given the update history, for two pages with update histories in data sets weblog1(matlab save set), (text version) and weblog2(matlab save set), (text version)?? How can you check that your modeling assumptions are reasonable? Are they?
• The data set Brain.ras is a part of a byte/voxel raster file depicting the contents of a brain, as seen with an MR camera and reconstructed by some proprietary process designed for easy visualization. The intensity of a voxel is supposed to show tissue type. Some voxels contain more than one tissue type, and have a weighted average of the two intensities.

• Examine the data set in Matlab and report what it looks like. The matlab code his.m is a skeleton for reading (and displaying) the voxels.
• Try to detect classes in the voxel values. How many can you detect using some Bayesian model?