In this version of the system, we train Level 2 in a supervised fashion: we tell each Level 2 region the category label of the image that it is currently watching. You can think of this supervision as coming from a single Level 3 region. The procedure used to obtain the conditional probabilities for Level 1 is then repeated for Level 2.
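As a rough illustration of what supervised estimation at Level 2 could look like, the sketch below counts how often each Level 2 input pattern occurs together with each category label and normalizes the counts into conditional probabilities P(pattern | label). It assumes the conditional probabilities are obtained by simple co-occurrence counting, which this section does not spell out; the function name `estimate_conditionals`, the integer pattern indices, and the toy data are hypothetical and chosen only for the example.

```python
import numpy as np

def estimate_conditionals(patterns, labels, n_patterns, n_labels):
    """Estimate P(pattern | label) from co-occurrence counts.

    Assumption: each training step yields one integer pattern index seen by
    a Level 2 region and one integer category label supplied as supervision.
    """
    counts = np.zeros((n_patterns, n_labels), dtype=float)
    for p, c in zip(patterns, labels):
        counts[p, c] += 1.0  # pattern p was observed while label c was active

    # Normalize each label's column so it sums to 1.
    col_sums = counts.sum(axis=0, keepdims=True)
    safe_sums = np.where(col_sums > 0, col_sums, 1.0)  # guard against unseen labels
    return counts / safe_sums

# Toy data (hypothetical): 3 possible patterns, 2 image categories.
patterns = [0, 0, 1, 2, 2, 2]
labels = [0, 0, 0, 1, 1, 1]
cpt = estimate_conditionals(patterns, labels, n_patterns=3, n_labels=2)
print(cpt)
```

Each column of `cpt` is the distribution over Level 2 patterns for one category. The same counting scheme could be reused at Level 1 with a different source of labels, which is one way to read the statement that the procedure is repeated.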