(a) i. (1 point)
       P(Theft = true) = 0.0012
       P(Theft = true | RiskAversion = 1) = 0.0036
    ii. (1 point)
       P(Theft = true | RiskAversion = 1, GoodStudent = 1) = 0.0042
    iii. (1 point)
       RiskAversion -- Age -- GoodStudent -- SocioEcon -- AntiTheft -- Theft
       RiskAversion -- Age -- GoodStudent -- SocioEcon -- HomeBase -- Theft
       RiskAversion -- Age -- GoodStudent -- SocioEcon -- VehicleYear -- CarValue -- Theft
       RiskAversion -- Age -- GoodStudent -- SocioEcon -- MakeModel -- CarValue -- Theft

(b) (2 points) Number of entries in the full joint distribution:
       Insurance: 26,091,926,323,200 (2.6092e+13)
       TAS1: 2.5792e+134
       TAS2: 6.7448e+143

(c) (3 points)
    Insurance:
       Clique tree: Clique 16 (variables 3, 4, 5, 8, 9, 10, 17, 24) and Clique 17 (variables 3, 4, 5, 7, 8, 9, 17, 24) each have 38,400 entries.
       Cluster graph: Cluster 17 (variables 5, 9, 11, 17) has 200 entries.
    TAS1:
       Clique tree: 4374 entries, in many cliques containing 7 Segment variables and one Detection variable (e.g., Seg 1-7, Det 47).
       Cluster graph: 18 entries.
    TAS2:
       Clique tree: 13122 entries, in many cliques containing 8 Segment variables and one Detection variable (e.g., Seg 1-8, Det 37).
       Cluster graph: 18 entries.

(d) (3 points) You can't reuse the same clique tree or cluster graph, because the networks are defined over different variables and have different sets of edges.

(e) (4 points) You would use approximate inference when the clique potentials are too large to fit in memory, or when you don't need an exact answer and approximate inference converges quickly and consistently.

(f) i. (3 points) No difference in either the final marginals or the running time.
    ii. (3 points) No difference in the final marginals (approximate inference run on a clique tree is exact), but the running time may increase.
    i. (4 points) May result in different final marginals and a different running time.
    ii. (4 points) The messages do not converge at the same rate. In our implementation, message 29->1 converges after just one iteration, 27->54 converges next, and 35->6 converges last.

(g) i. (4 points) Our implementation gives:
       Maximum difference: 0.0858
       Average difference: 0.0115

(h) (4 points) The detection results are slightly worse when the correct segmentation is observed, though the ROC curve is similar. Incorrectly observing Segment 2 to be "road" causes false positives to appear on the roof that was mislabeled, so precision drops while recall stays the same.

(i) (4 points) When both the relations and the segmentation labels are observed, all the detections are independent, since there are no edges or active v-structures between them. Therefore, the posterior over T_i given the segmentations and the relations is:

       P(T_i = t | R, S) = P(S) P(T_i = t) P(R' | S, T_i = t) / sum_y [ P(S) P(T_i = y) P(R' | S, T_i = y) ]

    where the sum is over all y in Val(T_i), R' is the set of R nodes that have T_i as a parent, P(S) is the prior over S (it cancels because it appears in both the numerator and the denominator), and P(T_i = y) is the prior probability that T_i equals y. The likelihood term factorizes over the relations in R':

       P(R' | S, T_i = y) = product over R_j in R' of P(R_j | S, T_i = y)

    This inference therefore amounts to computing |Val(T_i)| unnormalized values and normalizing them.

(j) (2 points) Inference does not help segmentation. The posteriors are more confident about roads being roads, but they are also very confident about roofs being trees, which is incorrect.
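The table sizes in (b) and (c) are products of the cardinalities of the variables in each scope. The sketch below reproduces the TAS clique-tree numbers under an assumption inferred from the arithmetic rather than stated above: each Segment variable has 3 values and each Detection variable is binary, so 2 * 3^7 = 4374 and 2 * 3^8 = 13122.

```python
from math import prod

def table_size(cardinalities):
    """Entries in a clique/factor table: the product of its variables' cardinalities."""
    return prod(cardinalities)

# Assumed cardinalities (inferred from the reported sizes, not given in the source).
SEGMENT_CARD = 3
DETECTION_CARD = 2

tas1_clique = [SEGMENT_CARD] * 7 + [DETECTION_CARD]  # e.g. Seg 1-7, Det 47
tas2_clique = [SEGMENT_CARD] * 8 + [DETECTION_CARD]  # e.g. Seg 1-8, Det 37

print(table_size(tas1_clique))  # 4374
print(table_size(tas2_clique))  # 13122
```

The same function gives the full joint size in (b) when passed the cardinalities of every variable in a network.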
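The maximum and average differences reported in (g) compare two sets of marginals, presumably exact versus approximate. A minimal sketch, assuming each set is a dict from a variable name to a list of probabilities (a hypothetical layout) and that the average is taken over all individual entries; the source does not say exactly how its average was computed.

```python
def marginal_differences(marginals_a, marginals_b):
    """Return (max, mean) absolute entry-wise difference between two sets of marginals.

    Each argument maps a variable name to a list of probabilities over its values
    (hypothetical layout chosen for this sketch).
    """
    diffs = []
    for var, p in marginals_a.items():
        q = marginals_b[var]
        if len(p) != len(q):
            raise ValueError(f"cardinality mismatch for {var}")
        diffs.extend(abs(x - y) for x, y in zip(p, q))
    return max(diffs), sum(diffs) / len(diffs)

# Made-up marginals for illustration only.
exact = {"A": [0.7, 0.3], "B": [0.2, 0.5, 0.3]}
approx = {"A": [0.6, 0.4], "B": [0.2, 0.45, 0.35]}
max_diff, avg_diff = marginal_differences(exact, approx)
```

Averaging per variable first and then across variables would give a different number; which convention matches the reported 0.0115 is not stated.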
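The posterior in (i) can be computed directly: multiply the prior P(T_i = y) by the likelihood of each observed relation R_j in R', then normalize over Val(T_i). A minimal sketch, assuming the conditional probabilities P(R_j = observed | S, T_i = y) have already been looked up with S fixed at its observed labels; the function name, the dict layout, and the numbers are hypothetical.

```python
def detection_posterior(prior, relation_likelihoods):
    """Posterior P(T_i | R, S) for a single detection variable T_i.

    prior: dict mapping each y in Val(T_i) to P(T_i = y).
    relation_likelihoods: one dict per R_j in R', mapping y to
        P(R_j = observed value | S, T_i = y) with S fixed at its observation.
    P(S) is left out because it cancels during normalization.
    """
    unnormalized = {}
    for y, p_y in prior.items():
        score = p_y
        for lik in relation_likelihoods:
            score *= lik[y]  # P(R' | S, T_i = y) is the product over R_j
        unnormalized[y] = score
    z = sum(unnormalized.values())
    if z == 0:
        raise ValueError("evidence has zero probability for every value of T_i")
    return {y: s / z for y, s in unnormalized.items()}

# Made-up binary example: prior and two relation likelihoods.
prior = {0: 0.9, 1: 0.1}
liks = [{0: 0.2, 1: 0.8}, {0: 0.5, 1: 0.6}]
post = detection_posterior(prior, liks)
```

Because each detection is independent given R and S, this loop over |Val(T_i)| values can be run separately for every T_i.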