THE BLAG.

1: Research has shown that many things about human perception are Bayesian. Humans accomplish many tasks the way a Bayesian system would: that is, the mistakes humans make are similar, qualitatively and quantitatively, to the mistakes the system makes. By looking at the priors and the likelihoods of these Bayesian models, we can learn something about the assumptions made by the brain. We also get to answer the question "how does the brain work." But here's a dumb question: why is the brain Bayesian? Is there a special reason for it? Is the Bayesian nature of our brain encoded in our DNA? And if so, why?

The answer, I claim, is "obvious". For any problem where we have some inputs X and we try to produce an output Y that minimizes some loss, the optimal predictor is equivalent to a very specific Bayesian model. If we assume that every day is i.i.d., then the data can be described by the world's distribution P(X,Y). That which we wish to infer has a prior P(Y), and it is corrupted by a certain likelihood P(X|Y) to produce X. In this setting, the posterior distribution P(Y|X) contains all the information X has about Y. This is "Bayesian". And here is our argument: since the optimal system is Bayesian, by continuity, nearly-optimal systems are nearly-Bayesian. In particular, if the "X" variable consists of images and audio (eyes and ears), then our system will do the right Bayesian thing regarding the fusion of the sources. That's why it makes sense for our DNA to build Bayesian brains. But since any optimal system is Bayesian, the brain could obtain its "Bayesian capabilities" by plain loss minimization. Specifically, if we observe X and wish to infer Y, and we have lots of training data (X,Y) from the true distribution P, and we learn a really good function X->Y, then this function will exhibit all the right Bayesian properties. We could even talk about its likelihood and posterior. (A toy sketch of this claim appears after section 2 below.)

2: When should a learning system be Bayesian? There is a very simple answer to that question. If there is lots of data, there is little benefit to a Bayesian model -- a traditional parametric model will do just as well, with less work. A Bayesian model shines when we have a very small dataset (say, 20 training cases) and quite concrete intuitions about how this data is produced. If our intuitions are correct, the resulting Bayesian model will make very good use of this scarce data. Think of medical trials, where volunteers aren't plentiful. But there is one large-data regime where Bayesian models will likely be used in one way or another: tasks involving huge numbers of classes will necessarily have very many sparse classes, which will require powerful generalization using something more Bayesian. Object recognition is like that: some object categories are plain sparse, and nothing can be done about it. Collaborative filtering is another example of such a situation: any model will remain uncertain about rare users and rare movies, and similarly about the sparse categories in large image collections.
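Here is the sketch promised in section 1. It is a minimal toy illustration, not a claim about any real system, and every choice in it is mine (Gaussian prior, Gaussian noise, the constants): when Y ~ N(0,1) and X is Y plus noise, the squared-loss-optimal predictor of Y from X is the posterior mean E[Y|X] = X/(1+sigma^2), and plain least squares on samples from P(X,Y) recovers essentially that function, exhibiting "the right Bayesian properties" without ever mentioning Bayes.

```python
# Toy sketch (numpy only): a predictor trained by plain squared-loss
# minimization on samples from P(X, Y) recovers the Bayesian posterior
# mean E[Y | X]. All distributions and constants are arbitrary choices.
import numpy as np

rng = np.random.default_rng(0)
n, sigma2 = 100_000, 0.5          # training cases; likelihood noise variance

# The world's distribution P(X, Y): prior Y ~ N(0, 1), likelihood X | Y ~ N(Y, sigma2).
y = rng.standard_normal(n)
x = y + np.sqrt(sigma2) * rng.standard_normal(n)

# Minimize squared loss over linear predictors y_hat = w * x
# (ordinary least squares; no intercept needed since everything is zero-mean).
w = (x @ y) / (x @ x)

# For this Gaussian model the true posterior mean is linear: E[Y | X=x] = x / (1 + sigma2).
print(f"learned slope:         {w:.4f}")
print(f"posterior-mean slope:  {1 / (1 + sigma2):.4f}")   # ~0.6667
```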
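And a toy sketch of the small-data point in section 2, with made-up numbers: a "medical trial" with only 20 volunteers, 3 of whom respond to treatment. The Beta prior below stands in for our "concrete intuitions" about how the data is produced; it is an invented assumption, not a recommendation, and scipy is used only for the Beta distribution.

```python
# Beta-Binomial toy: with 20 training cases, the prior does real work.
from scipy import stats

successes, n = 3, 20
a, b = 2.0, 8.0                    # assumed prior Beta(2, 8): we expect low response rates

# Maximum likelihood: the raw frequency, with no notion of uncertainty.
mle = successes / n                # 0.15

# Bayesian posterior: Beta(a + successes, b + failures).
post = stats.beta(a + successes, b + (n - successes))
lo, hi = post.interval(0.95)

print(f"MLE estimate:          {mle:.3f}")
print(f"posterior mean:        {post.mean():.3f}")
print(f"95% credible interval: [{lo:.3f}, {hi:.3f}]")
```

The MLE reports a number and nothing else; the posterior reports where the truth plausibly lies, which is what we actually want from 20 cases.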
3: What do we mean when we say that a problem is "hard"? We say that computer vision is hard. Calculating the partition function in large, interesting models is also hard. Rumor has it that proving P is not NP is hard as well (but I don't really know -- never tried). The game of Go is also hard. What do we really mean by these claims? There is only one meaning: many people have tried and failed so far. Vision is hard only because a good vision system does not exist yet. We often say that Go is hard "because of its large state space", but that's only part of its hardness. The size of the state space is relevant at all only because our best approaches brute-force their way, to a large extent, to find a good move, and such approaches don't like large state spaces. Yet there are other games with even larger state spaces (for example, strategy computer games), and that alone doesn't make them difficult. There is one imprecision in many learning papers: computing the partition function in a powerful model is hard not because it is written as a sum of exponentially many terms, but because no efficient algorithm for computing it is known (and quite possibly none exists). In some models, this exponential sum can be calculated using algorithmic trickery, but most models don't have this luck. Indeed, we really don't care if something is a sum of exponentially many terms if there is an algorithm that can get the answer without evaluating most of them.
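To make that last point concrete, here is a standard textbook example of the trickery, sketched minimally: the Ising chain. Its partition function is a sum over 2**n spin configurations, yet the transfer-matrix recursion (a form of dynamic programming) gets the exact answer in O(n). The coupling J and size n below are arbitrary.

```python
# Partition function of an Ising chain: exponentially many terms,
# but a linear-time algorithm exists (the transfer-matrix method).
import itertools
import numpy as np

def brute_force_Z(n, J):
    """Sum exp(J * sum_i s_i * s_{i+1}) over all 2**n spin configurations."""
    return sum(
        np.exp(J * sum(s[i] * s[i + 1] for i in range(n - 1)))
        for s in itertools.product([-1, 1], repeat=n)
    )

def transfer_matrix_Z(n, J):
    """The same sum, computed in O(n) matrix-vector products."""
    T = np.array([[np.exp(J), np.exp(-J)],
                  [np.exp(-J), np.exp(J)]])
    v = np.ones(2)                 # sum over the last spin
    for _ in range(n - 1):
        v = T @ v                  # absorb one more spin into the sum
    return v.sum()                 # sum over the first spin

n, J = 16, 0.4
print(brute_force_Z(n, J))         # 65,536 terms; already sluggish, hopeless for n = 100
print(transfer_matrix_Z(n, J))     # identical answer, 15 tiny matrix products
```

The two functions agree to machine precision. The exponential sum was never the obstacle; the absence of such a recursion in most models is.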
4: Life's error rate. Knowing, or predicting, the consequences of our actions is something we cannot do (we aren't smart enough; and if we were, we'd be living in a world populated with much smarter entities, so predicting the consequences of our actions would be as impossible as it is now). That's why experience is so valuable. Experience is the equivalent of knowing the future before it happens, although admittedly in a limited domain. An experienced person knows how things turn out in similar situations, and this knowledge allows them to make much better decisions. For example, after starting 2 businesses, starting the third one is probably going to be easier due to the smaller uncertainty. Uncertainty is really scary, because as long as our brain believes there's a non-negligible chance of complete disaster, it's "irrational" to act, unless we are not afraid of complete disasters. So experience is useful, and it wouldn't hurt to get it as quickly as possible. An easy way to ensure that we constantly grow our experience is to keep making mistakes. By constantly making mistakes, a person can be sure that they are doing things they are not absolutely certain of. That's how we push our envelopes.

5: The meaning of life. I claim to completely know the answer to this one (it's simple, too), but it won't fit in the margin.

6: Personality. We all know where babies come from, but what about older people? How can it be that adults often completely fail to understand and sympathize with their teenage children? The only explanation is that they've become different persons. Sure, the persons are similar, but they are different enough: they have different experiences, different desires, different ambitions, different tastes. Perhaps their personalities, too, are different. So although an adult and their former teenage self have many things in common, their differences make them legitimately different persons. The answer, then, is that adults "grow out" of teenagers, who slowly vanish into nonexistence. In particular, this implies that even if we lived forever, our current self would be a temporary phenomenon that would slowly fade into a different being with different views and tastes. And if we really lived forever, then, since the "set of distinct personalities" is essentially finite, we would eventually "be" the same personality any (every) number of times!

7: Writing about stuff and having a diary. There is a claim that writing helps us understand a subject: if we are confused about a topic, we should write a clear essay about it, and chances are we'll become less confused. It turns out a similar thing is true of papers. Writing them forces us to understand what it really is that we are doing, and which additional experiments still need to be done, in case we have some uncertainty about it.