Things you need to know to do Assignment 3:

1) You should avoid actually computing the likelihood of the training data for given parameter values. Likelihoods can get very large or very small, causing floating-point overflow or underflow. Instead, you should compute the log likelihood, which is just the sum of the logs of the probability densities of the training cases. The probability density for a single training case might also be too large or too small to represent as a floating-point number, but for this assignment, in which there are only two variables, you should be OK computing the probability density of a case and then taking the log of it. (See the first sketch after this list.)

2) In R, you can use the dnorm function to compute Gaussian probability densities, though of course just using the formula is also pretty easy. (dnorm also has a log=TRUE option that returns the log of the density directly.)

3) You can generate a uniform (0,1) random number in R with "runif(1)". In Matlab, "rand" gets you one. You set the seed with "set.seed(s)" in R or with "rand('seed',s)" in Matlab.

4) To generate p1,...,pK uniformly over the region where they are non-negative and sum to one, you can generate e1,...,eK from exponential distributions with mean one, and then set pi equal to ei divided by the sum of all the e's. (See the second sketch after this list.)

5) You can generate n random exponentially-distributed values with "rexp(n)" in R. You can also easily generate one by taking minus the log of a uniform (0,1) random variable. (Actually, rather than computing -log(u), it's better to compute -log(1-u), since that avoids the small chance of taking the log of zero.)

6) You need to sample N parameter vectors from a set of M parameter vectors drawn from the prior, using probabilities proportional to the likelihood. Of course, you should do this by sampling indexes of these vectors in some matrix, with the indexes being integers from 1 to M. In R, you can sample a random integer from 1 to M, with the probability of integer i being wi, with the function call "sample(1:M,1,prob=w)", where w is a vector of the M probabilities (sample normalizes w for you, so the w's need only be proportional to the desired probabilities). You can sample n of them independently with the function call "sample(1:M,n,prob=w,replace=TRUE)". Sampling according to the probabilities in w can also be done by hand: first normalize the w's to sum to one, then compute the sums ci = w1 + ... + wi, then sample a uniform (0,1) random value, u, and finally find the smallest i such that u is less than ci.

7) When converting the log likelihoods you have computed into probabilities for sampling from the M parameter vectors, you should not, of course, start by just exponentiating the log likelihoods; that would defeat the whole point of avoiding floating-point overflow/underflow! Instead, you should first find the maximum log likelihood and subtract that maximum from all the log likelihood values. That doesn't change the relative likelihoods. You can then safely exponentiate, since there can't be any overflows (the largest shifted value exponentiates to one), and though there could be some underflows, they won't all underflow. (See the third sketch after this list.)
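First sketch (points 1 and 2): computing the log likelihood on the log scale. The model below is purely hypothetical, just for illustration: the two variables are taken to be independently Gaussian given the parameters, with means m1, m2 and standard deviations s1, s2. The actual model for the assignment may well differ, but the pattern of summing per-case log densities is the same.

    # Log likelihood of training cases (x1[i], x2[i]) under an assumed
    # model of two independent Gaussians.  Each case's log density is
    # the sum of two Gaussian log densities; the log likelihood of the
    # whole training set is the sum over cases.

    log_lik <- function (x1, x2, m1, m2, s1, s2)
    {
      sum (dnorm(x1,m1,s1,log=TRUE) + dnorm(x2,m2,s2,log=TRUE))
    }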
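Second sketch (points 4 and 5): drawing p1,...,pK uniformly over the region where they are non-negative and sum to one. (The function name rsimplex is just made up for this note.)

    # Generate K values that are non-negative and sum to one, uniformly
    # over that region, by normalizing K mean-one exponential draws.

    rsimplex <- function (K)
    {
      e <- rexp(K)    # -log(1-runif(K)) would also work, as in point 5
      e / sum(e)
    }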
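Third sketch (points 6 and 7 combined): going from a vector of M log likelihoods to N sampled indexes. (The function name resample_indexes is made up; ll is assumed to be the vector of log likelihoods for the M parameter vectors drawn from the prior.)

    # Sample N indexes from 1:M with probabilities proportional to the
    # likelihoods, working from log likelihoods to avoid overflow/underflow.

    resample_indexes <- function (ll, N)
    {
      M <- length(ll)
      w <- exp (ll - max(ll))   # subtract the max first, so exp can't overflow
      sample (1:M, N, prob=w, replace=TRUE)   # sample normalizes w itself
    }

The sampled indexes can then be used to pick out the corresponding rows (or columns) of whatever matrix holds the M parameter vectors.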