+++++++++++++++++++++++++++++++++++++++++++++ Elementary Counting and Probability reference -- taken from App. C of CLRS +++++++++++++++++++++++++++++++++++++++++++++ I. Counting (a) Rule of Sum - set of items we want to count - can be expressed as aunion of disjoint sets or - can be expressed as the Cartesian product of sets or - ... ------------ rule of sum: The number of ways to choose an element from 1 of 2 ------------ disjoint sets is the sum of the cardinalities of the sets If ( (A INTERSECT B) == {} ) then |A UNION B| = |A| + |B| (b) Rule of Product The number of ways to choose an _ordered pair_ is: (# of ways to choose 1st element) * (# of ways to choose 2nd element) |A x B| = |A| * |B| (c) Strings: a string over a finite set S is a _sequence_ of elements of S k-string: string of length k substring s' of string s is an ordered sequence of consecutive elements of s k-substring of s: a substring of s of length k k-string over a set S: there are |S|^k strings of length k - e.g. the # of binary k-strings is 2^k To construct a k-string over an n-site, - n ways to choose 1st element - n ways to choose 2nd element - ... - n ways to choose kth element So there are n^k k-strings over an n-set. (d) PERMUTATIONs: a permutation of a finite set S is an ordered sequence of all the elements of S - e.g. if S = {a,b,c}, there are 6 permutations of it - There are n! permutations of a set of n elements - 1st element chosen from n - 2nd element chosen from n-1 - 3rd element chosen from n-2 - ... - nth element chosen from 1 (n) * (n-1) * (n-2) * ... * (1) = n! - a k-permutation of S is an ordered sequence of k elements of S -- no element appears more than once - there are n!/(n-k)! k-permutations of n - e.g. if S = {a,b,c,d}, there are 4!/(4-2)! 2-permutations of it ab,ac,ad,ba,bc,bd,ca,cb,cd,da,db,dc (e) COMBINATIONs a k-combination of an n-set S is a k-subset of S (and since it's a k-subSET, order is irrelevant) - there are: n!/(k! * (n-k)!) k-combinations of an n-set (f) Binomial Coefficients: used with combinations; e.g. "n choose k" - note that: "n choose k" == "n choose (n - k)" (x + y)^n = SUM (from k=0 to n) of "n choose k" * x^k * y^(n-k) - special case is when x = y = 1; then have: 2^n = SUM(from k=0 to n) of "n choose k" -- what we are doing here is counting the binary n-strings by the number of 1s they contain. Let's say, n=4 and k=3; so all binary strings are: 0000 1000 0001 1001 0010 1010 0011 1011 0100 1100 0101 1101 0110 1110 0111 1111 So: "4 choose 3" = 4! / ((3!) * (1!)) = 4 So our four strings are: 0111, 1011, 1101, 1110 And note that the ZERO can appear in four places: the 1st, 2nd, 3rd, or 4th... yielding our four possible satisfying strings. (g) Binomial bounds - sometimes need to bound size of a binomial coefficient (1) for 1 <= k <= 1 Lower bound: "n choose k" >= (n/k)^k Upper bound: "n choose k" <= (en/k)^k (2) for 0 <= k <= n Upper bound: "n choose k" <= n^n / (k^k * (n-k)^(n-k)) II. Probability - define probability in terms of a sample space S - S's elements are "elementary events" - each elementary event is a possible outcome of an experiment If the experiment is flipping two (distinguishable) coins, then S = {HH, HT, TH, TT} An event is a subset of the sample space S. - i.e. the event of getting one head and one tail is {HT,TH} The event S is called the "certain event" and the event {} is called the "null event." Two events A, B are mutually exclusive if (A INTERSECT B) = {} - by definition, all elementary events are mutually exclusive (a) AXIOMS of Probability - A probability distribution on a sample space S is a mapping from events of S to real numbers such that the following axioms are satisfied: (1) Pr[A] >= 0 for any event A (2) Pr[S] = 1 (3) Pr[A UNION B] = Pr[A] + Pr[B] - for any two mutually exclusive events, A and B For any sequence of events, A_1, A_2, ..., that are pairwise mutually exclusive, Pr{ A_1 UNION A_2 UNION ... } = SUM (over all i) Pr[A_i] The null event, {}, has probability 0. If A is a subset of B, then Pr[A] <= Pr[B] !A = the complement of A = S - A Pr[!A] = 1 - Pr[A] For any two events A and B, Pr[A UNION B] = Pr[A] + Pr[B] - Pr[A INTERSECT B] <= Pr[A] + Pr[B] (b) Discrete probability distributions A probability distribution is discrete if it is defined over a countably infinite or finite sample space. Let S be the sample space. Then for any event A, Pr[A] = SUM (over all s in A) Pr[s] - where "s" is an elementary event - and we're only concerned with the s values that satisfy A - e.g. for getting one head and one tail, the values of s would be: HT and TH since that's the subset represented by S - elementary events are by definition MUT-EX, recall. - If S is finite and every elementary event s in S has probability, Pr[s] = 1 / |S| then we have a uniform (probability) distribution So if the sample space S is the outcome of flipping a fair coin n times, then we have: S = {H,T}^n - each elementary event is a string of length n over {H,T} - and each occurs w/probability 1/2^n So the probability that we get exactly k heads and (n-k) tails is: "n choose k" / 2^n (c) Continuous uniform probability distribution - defined over a closed interval of the reals, [a,b], a < b - this includes uncountably many events so cannot assign an equal probability to each b/c too many of them! - and so the probability of any one point within that interval is considered 0; e.g. for [x,x], Pr[x,x] = 0. - so want to associate a probability with some of the subsets of S - for any closed interval, [c,d], where a <= c <= d <= b, the continuous, uniform probability distribution defines the probability of the event [c,d] to be: Pr[[c,d]] = (d - c) / (b - a) - if we just want to consider (c,d) and not point c nor point d, then we have: [c,d] = [c,c] UNION (c,d) UNION [d,d] and since Pr[c,c] = Pr[d,d] = Pr[x,x] = 0 then Pr[[c,d]] = Pr[(c,d)] = (d - c) / (b - a) - the set of events here is any subset of [a,b] that can be obtained by a finite or countable union of open and closed intervals. (d) conditional probability and independence - usually we have some partial knowledge of an event - this should help us to get more accurate probabilities for that event - e.g. if we know a friend has flipped two fair coins and ALSO that she got at least one head, then we can arrive at a more accurate estimate of whether she got 2 heads THAN if we hadn't known about the one. - the conditional probability of an event A given that another event B occurred is: Pr[A | B] = Pr[A INTERSECT B] / Pr[B] - we're only interested in the portion of the sample space that includes the event B (that's why Pr[B] is in the denominator rather than the Pr[S]) -- whenever Pr[B] != 0 - and we're only interested in the events in which A and B occurred since events where only A occurred are no longer relevant since we know FOR SURE that B happened. - this explains the numerator - "the probability of A given B" - Pr[got both heads | got at least one head] = 1/4 / 3/4 = 1/3 - Two events are said to be INDEPENDENT if Pr[A INTERSECT B] = Pr[A] * Pr[B] - which is the same as saying: Pr[A | B] = Pr[A] - a collection of events, A_1, A_2, ..., A_n, is said to be pairwise independent if Pr[A_i INTERSECT A_j] = Pr[A_i] * Pr[A_j] for all 1 <= i < j <= n - a collection of events are mutually independent if every k-subset, A_i1, A_i2, ..., A_ik, of the collection (where 2 <= k <= n and 1 <= i1 < i2 < ... < ik <= n) satisfies: Pr[A_i1 INTERSECT A_i2 INTERSECT ... INTERSECT A_ik] = Pr[A_i1] * Pr[A_i2] * ... * Pr[A_ik] - e.g. flipping two fair coins A_1 = first coin is heads A_2 = second coin is heads A_3 = first and second coin are different Pr[A_1] = 1/2 Pr[A_2] = 1/2 Pr[A_3] = 1/2 Pr[A_1 INTERSECT A_2] = 1/4 Pr[A_1 INTERSECT A_3] = 1/4 Pr[A_2 INTERSECT A_3] = 1/4 Pr[A_1 INTERSECT A_2 INTERSECT A_3] = 0 - the events are pairwise independent - but not mutually independent SINCE Pr[A_1 INTERSECT A_2 INTERSECT A_3] != Pr[A_1] * Pr[A_2] * Pr[A_3] (e) Bayes's Theorem Pr[A INTERSECT B] = Pr[B] * Pr[A | B] = Pr[A] * Pr[B | A] Pr[A | B] = Pr[A] * Pr[B | A] / Pr[B] <---| This is Bayes's theorem B = (B INTERSECT A) UNION (B INTERSECT !A) (B INTERSECT A) INTERSECT (B INTERSECT !A) = {} Pr[B] = Pr[B INTERSECT A] + Pr[B INTERSECT !A] = Pr[A] * Pr[B | A] + Pr[!A] * Pr[B | !A] E.g. imagine we have a fair coin and a biased coin that always comes up heads. - so we pick one of those two physically indistinguishable coins and flip it two times and it comes up heads both times; --> what's the likelihood the coin chosen was the biased one? A = "biased coin is chosen" B = "coin comes up heads both times" want to find: Pr[A | B] Pr[A] = 1/2 = Pr[!A] Pr[B | A] = Pr[A INTERSECT B] / Pr[B] = 1/1 = 1 Pr[B | !A] = Pr[!A INTERSECT B] / Pr[B] = 1/4/1 = 1/4 Pr[A | B] = 1/2 * 1 / ( (1/2 * 1) + (1/2 * 1/4) ) = 4/5 III. Discrete Random Variables A discrete random variable X is a function from a finite or countably infinite sample space to the real numbers. It associates a real # with each possible outcome of an experiement. For a RV X and a real number x, we define the event X = x to be, [s in S : X(s) = x] - the case where the RV takes on the value x The function: f(x) = Pr[X=x] is the probability density function Pr[X=x] <= 0 and the SUM (over all x) Pr[X=x] = 1 - e.g. rolling two, six-sided dice; 36 elementary events in this space; assume uniform probability distribution Define X as the max of the two values shown on the dice. Pr[X=3] = 5/36 b/c for these rolls, (1,3),(2,3),(3,3),(3,1),(3,2), X = 3 is the max value Pr[X=2] = 3/36 (1,2),(2,2),(2,1) Pr[X=1] = 1/36 (1,1) Pr[X=4] = 7/36 (1,4),(2,4),(3,4),(4,4),(4,1),(4,2),(4,3) Pr[X=5] = 9/36 (1,5),(2,5),(3,5),(4,5),(5,5),(5,1),... Pr[X=6] = 11/36 (1,6),(2,6),(3,6),(4,6),(5,6),(6,6) JOINT probability density function; if X and Y are RVs, then f(x,y) = Pr[X=x and Y=y] Pr[Y=y] = SUM (over all x) Pr[X=x and Y=y] -- for fixed y Pr[X=x] = SUM (over all y) Pr[X=x and Y=y] -- for fixed x Pr[X=x | Y=y] = Pr[X=x and Y=y] / Pr[Y=y] We define two RVs X and Y to be independent if for all x and y, the events X=x and Y=y are independent; equivalently, if for all x and y, we have: Pr[X=x and Y=y] = Pr[X=x] * Pr[Y=y] (a) Expected value of an RV == average == mean E[X] = SUM (over all x) x * Pr[X=x] e.g. game where flip 2 fair coins; get +3 for every head and -2 for every tail. E[X] = Pr[HH]*6 + Pr[HT]*1 + Pr[TH]*1 + Pr[TT]*-4 = 6/4 + 1/4 + 1/4 -1 = 1 The expectation of the sum of two RVs is the sum of their expectations. E[X + Y] = E[X] + E[Y] -- we call this "linearity of expectation" -- holds whether X and Y are independent or not. If X is any RV, then g(x) defines a new RV, g(X) E[g(X)] = SUM (over all x) g(x) * Pr[X=x] If g(x) = ax then E[aX] = a * E[X] E[aX + Y] = a*E[X] + E[Y] -- expectations are linear When two RVs X and Y are independent then E[XY] = SUM (over all x) SUM (over all y) x * y * Pr[X=x and Y=y] = SUM (over all x) SUM (over all y) x * y * Pr[X=x] * Pr[Y=y] = (SUM (over all x) x * Pr[X=x]) * (SUM (over all y) Pr[Y=y]) = E[X] * E[Y] When n RVs, X_1, X_2, ..., X_n, are mutually independent, then E[X_1 X_2 ... X_n] = E[X_1] * E[X_2] * ... * E[X_n] If a RV takes on values from the set of natural numbers, N, then E[X] = SUM (over all i from 0 to inf) i * Pr[X=i] = SUM (over all i from 0 to inf) i * (Pr[X>=i] - Pr[X>=i+1) = SUM (over all i from 0 to inf) Pr[X>=i] (b) Variance and Standard deviation variance, intuitively: how far from the mean are a RV's values likely to be? The variance of an RV X with mean E[X] is: Var[X] = E[(X - E[X])^2] = E[X^2 - 2*X*E[X] + E^2[X]] = E[X^2] - 2*E[X*E[X]] + E[E^2[X]] = E[X^2] - 2*E^2[X] + E^2[X] = E[X^2] - E^2[X] Var[a*X] = a^2 * Var[X] When X & Y are independent random variables, Var[X+Y] = Var[X] + Var[Y] - this is true for n RVs if they are pairwise independent The standard deviation of an RV X is the positive square root of the Var[X] IV. Geometric & binomial distributions A coin flip is an instance of a Bernoulli trial: only has two outcomes - success with probability p - failure with probability 1-p = q - when we talk of Bernoulli trialS, the trials are mutually independent and each has the same success probability unless otherwise noted. - 2 distributions arise from the Bernoulli trials (1) the geometric distribution - how many trials occur before we have a success? Let RV X be the # of trials we need to have in order to obtain success Pr[X=k] = q^(k-1) * p for k >= 1 X has values: [1,2,...] -- q^(k-1) == we have (k-1) failures before we have success -- p^1 == our first success A probability distribution expressing this is a geometric distribution. E[X] = SUM (from k = 1 to inf) k * q^(k-1) * p = p/q * SUM (from k = 1 to inf) k * q^k = p/q * q/(1-q)^2 -- see page 1061, CLRS = 1/p -- b/c p = 1-q So on average it takes 1/p trials before we obtain a success. Var[X] = q/p^2 E.g. p = 2/9; q = 7/9; E[X] = 1/(2/9) = 4.5 Var[X] = (7/9) / (2/9)*(2/9) ~= 16 (2) the binomial distribution - how many successes occur in n trials? Let RV X be the # of successes in n trials; so X has values in [0,1,...,n] and k = 0, ..., n Pr[X=k] = "n choose k" * p^k * q^(n-k) - of n trials, choose k successes - take probability of success, raise it to that power - take probability of failure, raise it to difference b/n that power & n A probability distribution expressing this is a binomial distribution. E[X] = SUM (from k = 0 to n) k * Pr[X=k] = SUM (from k=1 to n) k * "n choose k" * p^k * q^(n-k) = np Var[X] = npq