Under Construction.
By repeating the above procedure at each level of the hierarchy, we end up with what is called a tree-structured graphical model. Each node in the graphical model holds a conditional probability distribution: the probability of occurrence of its children given the parent. At the very top of the model are the object concepts. At the very bottom is the input. Figure ** shows the schematic of a graphical model.
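To make the structure concrete, here is a minimal sketch of a two-level tree in which every child stores a conditional probability table given its parent. The node names, state names, and all numbers are hypothetical, chosen only for illustration; they are not taken from any experiment.

```python
# Hypothetical two-level tree: one root (object concept) with two children.
# All names and probabilities are illustrative assumptions.

# Prior over the root states, P(root).
prior = {"object_A": 0.5, "object_B": 0.5}

# cpt[child][parent_state][child_state] = P(child_state | parent_state)
cpt = {
    "left_part": {
        "object_A": {"edge": 0.8, "corner": 0.2},
        "object_B": {"edge": 0.3, "corner": 0.7},
    },
    "right_part": {
        "object_A": {"edge": 0.6, "corner": 0.4},
        "object_B": {"edge": 0.1, "corner": 0.9},
    },
}


def joint_probability(root_state, child_states):
    """P(root, children) = P(root) * product over children of P(child | root)."""
    p = prior[root_state]
    for child, state in child_states.items():
        p *= cpt[child][root_state][state]
    return p
```

Because the model is a tree, the joint probability of any full assignment factors into one local term per node, which is what makes local computation possible.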
Given a graphical model, we can perform inference. This means that, given any input, we can find the most likely explanation of that input in our model. More formally, given any set of observations E = .., we can find the maximum a posteriori (MAP) explanation: the combination of all internal variable states that best explains the observations. This is given by
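In standard notation, the MAP explanation can be written as follows. The symbol X for the set of internal variables and x for one joint assignment of their states are assumptions introduced here, not notation from the text above.

```latex
x^{*} \;=\; \arg\max_{x} P(X = x \mid E) \;=\; \arg\max_{x} P(X = x,\, E)
```

The second equality holds because P(E) does not depend on x, so maximizing the posterior and maximizing the joint probability select the same assignment.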
Remember that this is what we wanted to do when we started out: given an image, we wanted to identify its cause. The graphical model representation allows us to do this at each level of the hierarchy. Although this is a global computation, it can be carried out using local message passing. The algorithm, derived by Judea Pearl, is well known in the machine learning community as Bayesian Belief Propagation.
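The idea of finding a global explanation through local messages can be sketched in a few lines. The code below is a minimal illustration of the max-product variant of belief propagation on a small tree, not the implementation used in the experiments. The tree shape, the node names (`obj`, `partA`, `px1`, and so on), the binary states, and every probability are hypothetical.

```python
# Hypothetical tree: obj -> partA, partB; partA -> px1, px2; partB -> px3.
children = {
    "obj": ["partA", "partB"],
    "partA": ["px1", "px2"],
    "partB": ["px3"],
    "px1": [], "px2": [], "px3": [],
}
states = {node: [0, 1] for node in children}  # every node is binary here

prior = {0: 0.6, 1: 0.4}  # P(obj), illustrative numbers

# cpt[child][parent_state][child_state] = P(child_state | parent_state)
cpt = {
    "partA": {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}},
    "partB": {0: {0: 0.7, 1: 0.3}, 1: {0: 0.1, 1: 0.9}},
    "px1":   {0: {0: 0.8, 1: 0.2}, 1: {0: 0.3, 1: 0.7}},
    "px2":   {0: {0: 0.6, 1: 0.4}, 1: {0: 0.1, 1: 0.9}},
    "px3":   {0: {0: 0.9, 1: 0.1}, 1: {0: 0.4, 1: 0.6}},
}


def upward(node, evidence, back):
    """Return msg[s]: the best score of the subtree below `node` given node = s.

    Each node only combines messages from its own children (local computation).
    `back[(child, s)]` records the child's best state when the parent is in s.
    """
    # Local evidence term: 0 if the state contradicts an observation, else 1.
    msg = {s: (0.0 if node in evidence and evidence[node] != s else 1.0)
           for s in states[node]}
    for c in children[node]:
        child_msg = upward(c, evidence, back)
        for s in states[node]:
            scores = {t: cpt[c][s][t] * child_msg[t] for t in states[c]}
            best = max(scores, key=scores.get)
            back[(c, s)] = best
            msg[s] *= scores[best]
    return msg


def map_explanation(root, evidence):
    """Return (assignment, score): the MAP states and their joint probability."""
    back = {}
    msg = upward(root, evidence, back)
    root_scores = {s: prior[s] * msg[s] for s in states[root]}
    assignment = {root: max(root_scores, key=root_scores.get)}
    # Downward pass: follow the recorded back-pointers from the root to the leaves.
    stack = [root]
    while stack:
        node = stack.pop()
        for c in children[node]:
            assignment[c] = back[(c, assignment[node])]
            stack.append(c)
    return assignment, root_scores[assignment[root]]


assignment, score = map_explanation("obj", {"px1": 1, "px2": 1, "px3": 1})
```

With all three leaves observed in state 1, the upward pass sends one message per edge and the downward pass reads off the best state of every internal node. No node ever needs to see the whole model at once.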
Where do the object concepts (the category labels) at the highest level come from? My experiments indicate that objects can be learned at the highest level by repeating the above method. However, not all objects need to be learned in such an unsupervised fashion. Many instances of human learning happen with a supervisor. In such cases, the context for a high-level object could come from another modality, for example hearing or touch.