Homework 2: Real Data
The homework directory
contains a data set
res, which is a data matrix for some students.
The first column gives birthyear, the second sex and the third
is the score on an exam. The problem is to find dependencies between
variables. Visualize the data set using (e.g.) Xgobi and write down
your feelings about the data. The score zero is actually a missing data
value (walk-over). Would you say that it is missing at random?
Make a strict model comparison problem suitable for these questions and
check in on the data. Report your results in the form of Bayes factors.
A web-crawler tries to keep an up-to-date index of a number of web-pages.
Since it does not know that a page was updated until it loads it,
it wants to estimate the probability that it has been changed, and
then it looks pages up in order of decreasing update probability.
The data it can use is a historical log of updates, a set of
time intervals during which the page was updated at least once
and a set of time intervals during which the page was not updated.
All these intervals are disjoint. Assume that the update process is
stationary (this is necessary, since we only know the lengths
of the intervals in the update history, not their relative positions
on the time axis).
How would you estimate the probability
that the page has changed since it was last retrieved t days ago?
What is the probability that the page was updated during the last 5 days,
given the update history, for two pages with update histories in data sets
weblog1(matlab save set),
(text version) and
weblog2(matlab save set),
How can you check that your modeling assumptions are reasonable?
The data set
is a part of a byte/voxel raster file depicting the
contents of a brain, as seen with an MR camera and reconstructed by some
proprietary process designed for easy visualization. The intensity of
a voxel is supposed to show tissue type. Some voxels contain more than
one tissue type, and have a weighted average of the two intensities.
- Examine the data set in Matlab and report what it looks like.
The matlab code his.m is a skeleton for reading (and displaying) the voxels.
- Try to detect classes in the voxel values. How many can you
detect using some Bayesian model?