CS228: Probabilistic Methods in AI, Winter 2008, Weekly Quiz
| Overall Score Statistics: | ||||
|---|---|---|---|---|
| Mean | Median | Mode | Lowest | Highest |
| 80.35% | 80% | 80% (15) | 50% | 100% |
| Frequency of Scores | |
|---|---|
| Score | Number of Students |
| 100% | 10 |
| 90% | 12 |
| 80% | 15 |
| 70% | 12 |
| 60% | 6 |
| 50% | 2 |
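
The summary statistics above can be re-derived from the frequency table. The following is a minimal Python sketch; the variable names are illustrative and not part of the course page.

```python
from statistics import mean, median, mode

# Score (percent) -> number of students, copied from the frequency table.
freq = {100: 10, 90: 12, 80: 15, 70: 12, 60: 6, 50: 2}

# Expand the table into one entry per student.
scores = [score for score, count in freq.items() for _ in range(count)]

n_students = len(scores)
avg = mean(scores)        # 4580 / 57
mid = median(scores)
most_common = mode(scores)

print(n_students, round(avg, 2), mid, most_common, min(scores), max(scores))
```

Under these counts the table covers 57 students, and the mean rounds to the 80.35% reported above.
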
| Question 1: Which of the following is NOT true about constraint-based approaches to learning graph structures? Answers: 1. They test for dependencies and independencies 2. They are robust to failures in individual independence tests 3. They try to find an equivalence class of networks that best explains the dependencies and independencies found | |||
|---|---|---|---|
| Type | Answer | Responses | Percent |
| Correct: | They are robust to failures in individual independence tests | 54 | 95% |
| Distractor: | They test for dependencies and independencies | 0 | 0% |
| Distractor: | They try to find an equivalence class of networks that best explains the dependencies and independencies found | 3 | 5% |
| Question 2: Which of the following terms could be penalized in a structure prior that satisfies structure modularity? A) the number of loops in the graph. B) the number of independent parameters in the graph. C) the number of v-structures in the graph. D) the minimum induced width of the graph. Answers: 1. all of them 2. A) and B) only 3. A) B) and C) only 4. B) and C) only | |||
|---|---|---|---|
| Type | Answer | Responses | Percent |
| Correct: | B) and C) only | 42 | 74% |
| Distractor: | all of them | 5 | 9% |
| Distractor: | A) and B) only | 3 | 5% |
| Distractor: | A) B) and C) only | 7 | 12% |
| Question 3: Learning Bayesian Network parameters with missing data (partially observed instances) is more difficult for which of the following reasons? Answers: 1. We require more training data, because we must throw out all incomplete instances. 2. We lose local decomposition, whereby each CPD can be estimated independently. 3. While there is still always a single optimal value for the parameters, it can only be found using an iterative method. | |||
|---|---|---|---|
| Type | Answer | Responses | Percent |
| Correct: | We lose local decomposition, whereby each CPD can be estimated independently. | 56 | 98% |
| Distractor: | We require more training data, because we must throw out all incomplete instances. | 0 | 0% |
| Distractor: | While there is still always a single optimal value for the parameters, it can only be found using an iterative method. | 1 | 2% |
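
To illustrate the loss of local decomposition, here is a minimal Python sketch. It assumes a hypothetical two-node network X → Y with binary variables and made-up parameter values; none of these come from the quiz itself. With complete data the log-likelihood is a sum of one term per CPD, whereas with X missing it becomes the log of a sum that couples both CPDs and requires marginalizing over X.

```python
import math

# Hypothetical parameters for a two-node network X -> Y (both binary).
p_x = [0.7, 0.3]                   # P(X = x)
p_y_given_x = [[0.9, 0.1],         # P(Y = y | X = 0)
               [0.2, 0.8]]         # P(Y = y | X = 1)

def loglik_complete(x, y):
    """Fully observed instance: one log term per CPD, so the CPDs decouple."""
    return math.log(p_x[x]) + math.log(p_y_given_x[x][y])

def loglik_x_missing(y):
    """X unobserved: marginalize over X (a small inference problem).
    The log of a sum does not split into separate per-CPD terms."""
    return math.log(sum(p_x[x] * p_y_given_x[x][y] for x in range(2)))

print(loglik_complete(0, 0))   # log(0.7) + log(0.9)
print(loglik_x_missing(1))     # log(0.7 * 0.1 + 0.3 * 0.8)
```
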
| Question 4: In a Bayesian Network with partially observed training data, computing the likelihood of observed data for a given set of parameters: Answers: 1. Requires probabilistic inference, AS IN the case of fully observed data. 2. Cannot be achieved by probabilistic inference, while it CAN in the case of fully observed data. 3. Requires probabilistic inference, while it DOES NOT in the case of fully observed data. | |||
|---|---|---|---|
| Type | Answer | Responses | Percent |
| Correct: | Requires probabilistic inference, while it DOES NOT in the case of fully observed data. | 44 | 77% |
| Distractor: | Requires probabilistic inference, AS IN the case of fully observed data. | 13 | 23% |
| Distractor: | Cannot be achieved by probabilistic inference, while it CAN in the case of fully observed data. | 0 | 0% |
| Question 5: Models with hidden (always unobserved) variables: Answers: 1. Are never identifiable. 2. Are always identifiable. 3. Are sometimes identifiable. 4. Are the only models that might be non-identifiable. | |||
|---|---|---|---|
| Type | Answer | Responses | Percent |
| Correct: | Are never identifiable. | 51 | 89% |
| Distractor: | Are always identifiable. | 0 | 0% |
| Distractor: | Are sometimes identifiable. | 6 | 11% |
| Distractor: | Are the only models that might be non-identifiable. | 0 | 0% |
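
One standard source of non-identifiability with a hidden variable is relabeling: permuting the values of the hidden variable yields different parameters but the same distribution over the observed variables. The sketch below assumes a hypothetical network H → X with binary H hidden and made-up parameter values; it is an illustration, not the quiz's own example.

```python
import math

def marginal_x(p_h, p_x_given_h):
    """P(X = x) = sum over h of P(H = h) * P(X = x | H = h)."""
    n_x = len(p_x_given_h[0])
    return [sum(p_h[h] * p_x_given_h[h][x] for h in range(len(p_h)))
            for x in range(n_x)]

# Hypothetical parameters (assumed for illustration only).
p_h = [0.4, 0.6]
p_x_given_h = [[0.9, 0.1],
               [0.3, 0.7]]

# Same model with the two values of H swapped everywhere.
p_h_swapped = [0.6, 0.4]
p_x_given_h_swapped = [[0.3, 0.7],
                       [0.9, 0.1]]

original = marginal_x(p_h, p_x_given_h)
swapped = marginal_x(p_h_swapped, p_x_given_h_swapped)
print(original, swapped)
```

Since H is never observed, the data depend on the parameters only through this marginal, so the likelihood cannot distinguish the two parameter settings.
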
| Question 6: Suppose we are given data generated from the Bayesian network Answers: 1. The data is completely observed. 2. The parameter prior factors as $P(\theta_X) P(\theta_{Y \mid X}) P(\theta_{Z \mid Y})$. | |||
|---|---|---|---|
| Type | Answer | Responses | Percent |
| Correct: | Both (A) and (B) must hold. | 31 | 54% |
| Distractor: | The data is completely observed. | 6 | 11% |
| Distractor: | The parameter prior factors as $P(\theta_X) P(\theta_{Y \mid X}) P(\theta_{Z \mid Y})$. | 1 | 2% |
| Distractor: | Either (A) or (B) must hold. | 19 | 33% |
| Question 7: Given the network G (shown below), and the data instances shown below, how do I compute the expected sufficient statistic for a particular value of the parameters? Answers: options 1 through 4 are images. | |||
|---|---|---|---|
| Type | Answer | Responses | Percent |
| Correct: | [image: Q4-A2.jpg] | 40 | 70% |
| Distractor: | [image: Q4-A1.jpg] | 0 | 0% |
| Distractor: | [image: Q4-A3.jpg] | 5 | 9% |
| Distractor: | [image: Q4-A4.jpg] | 12 | 21% |
| Question 8: EM Algorithm: E-step. We use the posterior probability when we compute our expected sufficient statistics rather than the prior because: Answers: 1. The posterior is usually easier to compute. 2. We cannot compute coherent expected sufficient statistics using the prior. 3. The posterior takes the observed data into account, in addition to the current parameter estimates. | |||
|---|---|---|---|
| Type | Answer | Responses | Percent |
| Correct: | The posterior takes the observed data into account, in addition to the current parameter estimates. | 54 | 95% |
| Distractor: | The posterior is usually easier to compute. | 1 | 2% |
| Distractor: | We cannot compute coherent expected sufficient statistics using the prior. | 2 | 4% |
| Question 9: What is the difference between hard-assignment EM and soft-assignment EM for Bayesian Networks? Answers: 1. (a) The objective of hard-assignment EM involves both learned parameters and the learned assignment that completes the data $D^+$ 2. (b) In the E-step, hard-assignment EM optimizes the values of the hidden variables, while soft-assignment EM optimizes a distribution over the hidden variables 3. (c) Hard-assignment EM converges to a global maximum of its objective function, which is different from the objective for soft-assignment EM. 4. (a) and (b) 5. (b) and (c) | |||
|---|---|---|---|
| Type | Answer | Responses | Percent |
| Correct: | (a) and (b) | 44 | 77% |
| Distractor: | (a) The objective of hard-assignment EM involves both learned parameters and the learned assignment that completes the data $D^+$ | 5 | 9% |
| Distractor: | (b) In the E-step, hard-assignment EM optimizes the values of the hidden variables, while soft-assignment EM optimizes a distribution over the hidden variables | 4 | 7% |
| Distractor: | (c) Hard-assignment EM converges to a global maximum of its objective function, which is different from the objective for soft-assignment EM. | 2 | 4% |
| Distractor: | (b) and (c) | 2 | 4% |
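
The contrast between the two E-steps can be sketched in a few lines of Python. The example assumes a hypothetical network H → X with binary H hidden, made-up current parameters, and three observed values of X; all names and numbers are illustrative. Soft-assignment EM accumulates the posterior P(H | x) as fractional counts, while hard-assignment EM completes each instance with the single most probable value of H.

```python
# Hypothetical current parameter estimates for H -> X (both binary).
p_h = [0.5, 0.5]                   # P(H = h)
p_x_given_h = [[0.8, 0.2],         # P(X = x | H = 0)
               [0.3, 0.7]]         # P(X = x | H = 1)
data = [0, 0, 1]                   # observed values of X; H is never observed

def posterior_h(x):
    """P(H | X = x) under the current parameters (uses the observed x)."""
    joint = [p_h[h] * p_x_given_h[h][x] for h in range(2)]
    z = sum(joint)
    return [j / z for j in joint]

def soft_counts(instances):
    """Soft assignment: expected counts of H, summing posteriors."""
    counts = [0.0, 0.0]
    for x in instances:
        post = posterior_h(x)
        for h in range(2):
            counts[h] += post[h]
    return counts

def hard_counts(instances):
    """Hard assignment: count the most probable value of H per instance."""
    counts = [0, 0]
    for x in instances:
        post = posterior_h(x)
        counts[post.index(max(post))] += 1
    return counts

soft = soft_counts(data)
hard = hard_counts(data)
print(soft, hard)
```

Using the prior `p_h` instead of `posterior_h(x)` would give the same counts for every instance regardless of the observed x, which is the point of Question 8.
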
| Question 10: Starting with the network shown below, if we remove the hidden class variable H but still capture the dependencies in the network, the number of independent parameters for the model will be: Answers: 1. Exponentially larger than the original model 2. Polynomially larger than the original model 3. The same as the original model | |||
|---|---|---|---|
| Type | Answer | Responses | Percent |
| Correct: | Exponentially larger than the original model | 42 | 74% |
| Distractor: | Polynomially larger than the original model | 8 | 14% |
| Distractor: | The same as the original model | 7 | 12% |
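
The network for Question 10 is not reproduced here, so the following Python sketch rests on an assumption: a naive Bayes structure in which a hidden class H with k values is the sole parent of n binary observed variables. Under that assumption, removing H while keeping every dependency among the observed variables in general requires a fully connected model over them.

```python
def params_with_hidden(n, k=2):
    """Assumed naive Bayes model: (k - 1) parameters for P(H),
    plus one independent parameter per (child, value of H) pair."""
    return (k - 1) + n * k

def params_without_hidden(n):
    """Fully connected model over n binary variables: a full joint table."""
    return 2 ** n - 1

for n in (5, 10, 20):
    print(n, params_with_hidden(n), params_without_hidden(n))
```

The first count grows linearly in n while the second grows as 2^n, which matches the "exponentially larger" answer.
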