CS228: Probabilistic Methods in AI
Winter 2008
Weekly Quiz


Overall Score Statistics:
Mean | Median | Mode | Lowest | Highest
80.35% | 80% | 80% (15) | 50% | 100%

Score Frequency Table
Frequency of Scores
Score | Number of Students
100% | 10
90% | 12
80% | 15
70% | 12
60% | 6
50% | 2
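
The overall statistics can be re-derived from the frequency table. A minimal sketch, assuming each row means exactly that many students received exactly that score; the variable names are illustrative only:

```python
from statistics import median, mode

# Score (percent) -> number of students, copied from the frequency table.
freq = {100: 10, 90: 12, 80: 15, 70: 12, 60: 6, 50: 2}

# Expand the table into one entry per student.
scores = [score for score, count in freq.items() for _ in range(count)]

n_students = len(scores)
mean_score = sum(scores) / n_students
median_score = median(scores)
mode_score = mode(scores)

print(n_students, round(mean_score, 2), median_score, mode_score,
      min(scores), max(scores))
```

Under that assumption the table covers 57 students, and the computed mean, median, mode, lowest and highest scores agree with the reported statistics.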

Detailed Question/Answer Statistics:
Question 1: Which of the following is NOT true about constraint-based approaches to learning graph structures?
Answers: 1. They test for dependencies and independencies 2. They are robust to failures in individual independence tests 3. They try to find an equivalence class of networks that best explains the found dependencies and independencies
Type | Answer | Responses | Percent
Correct | They are robust to failures in individual independence tests | 54 | 95%
Distractor | They test for dependencies and independencies | 0 | 0%
Distractor | They try to find an equivalence class of networks that best explains the found dependencies and independencies | 3 | 5%
Question 2: Which of the following terms could be penalized in a structure prior that satisfies structure modularity? A) the number of loops in the graph. B) the number of independent parameters in the graph. C) the number of v-structures in the graph. D) the minimum induced width of the graph.
Answers: 1. all of them 2. A) and B) only 3. A) B) and C) only 4. B) and C) only
Type | Answer | Responses | Percent
Correct | B) and C) only | 42 | 74%
Distractor | all of them | 5 | 9%
Distractor | A) and B) only | 3 | 5%
Distractor | A) B) and C) only | 7 | 12%
Question 3: Learning Bayesian Network parameters with missing data (partially observed instances) is more difficult for which of the following reasons?
Answers: 1. We require more training data, because we must throw out all incomplete instances. 2. We lose local decomposition, whereby each CPD can be estimated independently. 3. While there is still always a single optimal value for the parameters, it can only be found using an iterative method.
Type | Answer | Responses | Percent
Correct | We lose local decomposition, whereby each CPD can be estimated independently. | 56 | 98%
Distractor | We require more training data, because we must throw out all incomplete instances. | 0 | 0%
Distractor | While there is still always a single optimal value for the parameters, it can only be found using an iterative method. | 1 | 2%
Question 4: In a Bayesian Network with partially observed training data, computing the likelihood of observed data for a given set of parameters:
Answers: 1. Requires probabilistic inference, AS IN the case of fully observed data. 2. Cannot be achieved by probabilistic inference, while it CAN in the case of fully observed data. 3. Requires probabilistic inference, while it DOES NOT in the case of fully observed data.
Type | Answer | Responses | Percent
Correct | Requires probabilistic inference, while it DOES NOT in the case of fully observed data. | 44 | 77%
Distractor | Requires probabilistic inference, AS IN the case of fully observed data. | 13 | 23%
Distractor | Cannot be achieved by probabilistic inference, while it CAN in the case of fully observed data. | 0 | 0%
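
The contrast in Question 4 can be illustrated on a small network. This is only a sketch: it assumes a binary chain X -> Y -> Z with made-up CPD values, and it uses brute-force enumeration as a stand-in for a real inference algorithm. For a fully observed instance the likelihood is a single product of CPD entries; for a partially observed instance the unobserved variable must be summed out.

```python
from itertools import product

# Hypothetical CPDs for binary X -> Y -> Z (values are illustrative only).
p_x = {0: 0.6, 1: 0.4}
p_y_given_x = {0: {0: 0.7, 1: 0.3}, 1: {0: 0.2, 1: 0.8}}  # p_y_given_x[x][y]
p_z_given_y = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.5, 1: 0.5}}  # p_z_given_y[y][z]


def joint(x, y, z):
    """Probability of one complete assignment: a plain product of CPD entries."""
    return p_x[x] * p_y_given_x[x][y] * p_z_given_y[y][z]


def likelihood(instance):
    """Likelihood of (x, y, z), where None marks an unobserved value.

    Unobserved variables are marginalized by enumerating their values,
    which is the inference step that complete data does not need.
    """
    domains = [(v,) if v is not None else (0, 1) for v in instance]
    return sum(joint(*assignment) for assignment in product(*domains))


full = likelihood((1, 1, 0))        # one term: 0.4 * 0.8 * 0.5
partial = likelihood((1, None, 0))  # two terms: sum over y in {0, 1}
print(full, partial)
```

Enumeration is exponential in the number of unobserved variables, so in larger networks this sum is where an actual inference procedure would be used.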
Question 5: Models with hidden (always unobserved) variables:
Answers: 1. Are never identifiable. 2. Are always identifiable. 3. Are sometimes identifiable. 4. Are the only models that might be non-identifiable.
Type | Answer | Responses | Percent
Correct | Are never identifiable. | 51 | 89%
Distractor | Are always identifiable. | 0 | 0%
Distractor | Are sometimes identifiable. | 6 | 11%
Distractor | Are the only models that might be non-identifiable. | 0 | 0%
Question 6: Suppose we are given data generated from the Bayesian network $X \rightarrow Y  \rightarrow Z$. Under what circumstances can we perform a Bayesian estimate of the parameters $\theta_X$, $\theta_{Y \mid X}$, and $\theta_{Z \mid Y}$ independently?
Answers: 1. The data is completely observed. 2. The parameter prior factors as $P(\theta_X) P(\theta_{Y \mid X}) P(\theta_{Z \mid Y})$. 3. Both (A) and (B) must hold. 4. Either (A) or (B) must hold.
Type | Answer | Responses | Percent
Correct | Both (A) and (B) must hold. | 31 | 54%
Distractor | The data is completely observed. | 6 | 11%
Distractor | The parameter prior factors as $P(\theta_X) P(\theta_{Y \mid X}) P(\theta_{Z \mid Y})$. | 1 | 2%
Distractor | Either (A) or (B) must hold. | 19 | 33%
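
When both conditions in Question 6 hold, each parameter's posterior depends only on its own local counts. A minimal sketch, assuming binary variables, a made-up complete data set, and an independent Beta(1, 1) prior on every parameter; none of these specifics come from the quiz:

```python
from collections import Counter

# Hypothetical complete data for binary X -> Y -> Z: tuples (x, y, z).
data = [(0, 0, 0), (0, 1, 1), (1, 1, 0), (1, 1, 1), (0, 0, 0), (1, 0, 0)]

ALPHA = 1.0  # assumed Beta(ALPHA, ALPHA) prior, independent per parameter


def beta_posterior_mean(n_one, n_zero, alpha=ALPHA):
    """Posterior mean of P(value = 1) under a Beta(alpha, alpha) prior."""
    return (n_one + alpha) / (n_one + n_zero + 2 * alpha)


# Each family is estimated from its own sufficient statistics only.
x_counts = Counter(x for x, _, _ in data)
xy_counts = Counter((x, y) for x, y, _ in data)
yz_counts = Counter((y, z) for _, y, z in data)

theta_x = beta_posterior_mean(x_counts[1], x_counts[0])
theta_y_given_x = {
    x: beta_posterior_mean(xy_counts[(x, 1)], xy_counts[(x, 0)]) for x in (0, 1)
}
theta_z_given_y = {
    y: beta_posterior_mean(yz_counts[(y, 1)], yz_counts[(y, 0)]) for y in (0, 1)
}
print(theta_x, theta_y_given_x, theta_z_given_y)
```

If some values of Y were missing, the counts for the Y and Z families would no longer be available separately, and the three estimates could not be computed in isolation like this.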
Question 7: Given the network G (shown below), and the data instances shown below, how do I compute the expected sufficient statistic for a particular value of the parameters?

Answers: 1. 2. 3. 4.
Type | Answer | Responses | Percent
Correct | [image: ../Images/Quiz6/Q4-A2.jpg] | 40 | 70%
Distractor | [image: ../Images/Quiz6/Q4-A1.jpg] | 0 | 0%
Distractor | [image: ../Images/Quiz6/Q4-A3.jpg] | 5 | 9%
Distractor | [image: ../Images/Quiz6/Q4-A4.jpg] | 12 | 21%
Question 8: EM Algorithm: E-step. We use the posterior probability when we compute our expected sufficient statistics rather than the prior because:
Answers: 1. The posterior is usually easier to compute. 2. We cannot compute coherent expected sufficient statistics using the prior. 3. The posterior takes the observed data into account, in addition to the current parameter estimates.
Type | Answer | Responses | Percent
Correct | The posterior takes the observed data into account, in addition to the current parameter estimates. | 54 | 95%
Distractor | The posterior is usually easier to compute. | 1 | 2%
Distractor | We cannot compute coherent expected sufficient statistics using the prior. | 2 | 4%
Question 9: What is the difference between hard-assignment EM and soft-assignment EM for Bayesian Networks?
Answers: 1. (a) The objective of hard-assignment EM involves both learned parameters and the learned assignment that completes the data $D^+$. 2. (b) In the E-step, hard-assignment EM optimizes the values of the hidden variables, while soft-assignment EM optimizes a distribution over the hidden variables. 3. (c) Hard-assignment EM converges to a global maximum of its objective function, which is different from the objective for soft-assignment EM. 4. (a) and (b) 5. (b) and (c)
Type | Answer | Responses | Percent
Correct | (a) and (b) | 44 | 77%
Distractor | (a) The objective of hard-assignment EM involves both learned parameters and the learned assignment that completes the data $D^+$. | 5 | 9%
Distractor | (b) In the E-step, hard-assignment EM optimizes the values of the hidden variables, while soft-assignment EM optimizes a distribution over the hidden variables. | 4 | 7%
Distractor | (c) Hard-assignment EM converges to a global maximum of its objective function, which is different from the objective for soft-assignment EM. | 2 | 4%
Distractor | (b) and (c) | 2 | 4%
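
The two E-steps contrasted in Questions 8 and 9 can be sketched on a toy model. The example assumes a single hidden binary variable H with one observed binary child X, and made-up current parameter values; it is an illustration, not the course's reference implementation. Soft assignment accumulates the posterior P(H | x) as fractional counts, while hard assignment completes each instance with the most probable value of H.

```python
# Hypothetical current parameters for hidden H -> observed X (both binary).
p_h = {0: 0.5, 1: 0.5}
p_x_given_h = {0: {0: 0.8, 1: 0.2}, 1: {0: 0.3, 1: 0.7}}  # p_x_given_h[h][x]

observations = [1, 1, 0, 1]  # made-up observed values of X


def posterior_h(x):
    """P(H | X = x) under the current parameters, via Bayes' rule."""
    unnorm = {h: p_h[h] * p_x_given_h[h][x] for h in (0, 1)}
    total = sum(unnorm.values())
    return {h: value / total for h, value in unnorm.items()}


soft_counts = {0: 0.0, 1: 0.0}  # expected sufficient statistics for H
hard_counts = {0: 0, 1: 0}      # counts after completing each instance

for x in observations:
    post = posterior_h(x)
    for h in (0, 1):
        soft_counts[h] += post[h]              # fractional count
    hard_counts[max(post, key=post.get)] += 1  # single best completion

print(soft_counts, hard_counts)
```

Both sets of counts sum to the number of instances; they differ in whether an instance contributes to one value of H or is split across both in proportion to the posterior.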
Question 10: Starting with the network shown below, if we remove the hidden class variable H but still capture the dependencies in the network, the number of independent parameters for the model will be
Answers: 1. Exponentially larger than the original model 2. Polynomially larger than the original model 3. The same as the original model
Type | Answer | Responses | Percent
Correct | Exponentially larger than the original model | 42 | 74%
Distractor | Polynomially larger than the original model | 8 | 14%
Distractor | The same as the original model | 7 | 12%
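
The network for Question 10 is not reproduced in this report. As an illustration only, the sketch below assumes a naive-Bayes-style structure in which the hidden class H is the sole parent of n observed variables; the function names and default cardinalities are hypothetical. Removing H while keeping the dependencies it induced generally forces the observed variables to be fully connected, which needs as many independent parameters as a full joint table.

```python
def params_with_hidden_class(n_children, h_card=2, x_card=2):
    """Independent parameters for P(H) plus one CPD P(X_i | H) per child."""
    return (h_card - 1) + n_children * h_card * (x_card - 1)


def params_without_hidden_class(n_children, x_card=2):
    """Independent parameters of a full joint table over the children."""
    return x_card ** n_children - 1


for n in (3, 10, 20):
    print(n, params_with_hidden_class(n), params_without_hidden_class(n))
```

Under these assumptions the count grows linearly in n with the hidden variable and exponentially in n without it.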



