# 1 Probability Theory

## 1.1 Set Theory

Definition 1.1.1 The set, $S$, of all possible outcomes of a particular experiment is called the sample space for the experiment.

Definition 1.1.2 An event is any collection of possible outcomes of an experiment, that is, any subset of $S$ (including $S$ itself).

Definition 1.1.5 Two events $A$ and $B$ are disjoint if $A \cap B = \emptyset$. The events $A_1, A_2, \dots$ are pairwise disjoint if $A_i \cap A_j = \emptyset$ for all $i \neq j$.

Definition 1.1.6 If $A_1, A_2, \dots$ are pairwise disjoint and $\cup_{i=1}^ {\infty}A_i = S$, then the collection $A_1, A_2, \dots$ forms a partition of $S$.

## 1.2 Basics of Probability Theory

Definition 1.2.1 A collection of subsets of $S$ is called a sigma algebra (or Borel field), denoted by $B$, if it satisfies the following three properties:

• $\emptyset \in B$.
• If $A \in B$, then $A^c \in B$.
• If $A_1, A_2, \dots \in B$, then $\cup_{i=1}^{\infty}A_i \in B$.
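As a quick sanity check, the power set of a finite sample space always satisfies all three properties. A minimal Python sketch (the two-outcome coin space is an illustrative choice, not from the text):

```python
from itertools import chain, combinations

# Illustrative two-outcome sample space; the power set of any finite S
# is always a sigma algebra on S.
S = frozenset({"H", "T"})

def power_set(s):
    """All subsets of s, as frozensets."""
    items = list(s)
    return {frozenset(c) for c in chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))}

B = power_set(S)

# Property 1: the empty set belongs to B.
assert frozenset() in B
# Property 2: B is closed under complementation.
assert all(S - A in B for A in B)
# Property 3: B is closed under unions (for a finite collection,
# countable unions reduce to finite ones).
assert all((A1 | A2) in B for A1 in B for A2 in B)
```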

Definition 1.2.4 Given a sample space $S$ and an associated sigma algebra $B$, a probability function is a function $P$ with domain $B$ that satisfies

• $P(A) \ge 0$ for all $A \in B$.
• $P(S) = 1$.
• If $A_1, A_2, \dots \in B$ are pairwise disjoint, then $P(\cup_{i=1}^{\infty}A_i)= \sum_{i=1}^{\infty}P(A_i)$.
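These axioms can be verified directly for a small example. A hypothetical sketch with a fair six-sided die and $P(A) = |A|/6$ (the die and the particular events are illustrative):

```python
from fractions import Fraction

# Illustrative fair die: S = {1,...,6}, B = all subsets, P(A) = |A|/6.
# Checking the three Kolmogorov axioms directly with exact arithmetic.
S = frozenset(range(1, 7))

def P(A):
    assert A <= S  # A must be an event, i.e. a subset of S
    return Fraction(len(A), 6)

A1 = frozenset({1, 2})
A2 = frozenset({5})

assert all(P(frozenset({s})) >= 0 for s in S)  # nonnegativity
assert P(S) == 1                               # total probability is 1
assert A1 & A2 == frozenset()                  # A1 and A2 are disjoint
assert P(A1 | A2) == P(A1) + P(A2)             # additivity on disjoint events
```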

# 2 Transformations and Expectations

## 2.1 Distribution of Functions of a Random Variable

Theorem 2.1.4 Let $X$ have pdf $f_X(x)$ and let $Y = g(X)$, where $g$ is a monotone function. Let $\mathcal{X}$ and $\mathcal{Y}$ be defined by (2.1.7). Suppose that $f_X(x)$ is continuous on $\mathcal{X}$ and that $g^{-1}(y)$ has a continuous derivative on $\mathcal{Y}$. Then the pdf of $Y$ is given by

$$f_Y(y) = \begin{cases} f_X(g^{-1}(y))\left|\frac{d}{dy}g^{-1}(y)\right| & y \in \mathcal{Y} \\ 0 & \text{otherwise.} \end{cases}$$
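A Monte Carlo sketch of the resulting formula $f_Y(y) = f_X(g^{-1}(y))\left|\frac{d}{dy}g^{-1}(y)\right|$: with $X \sim \text{Uniform}(0,1)$ and the decreasing map $g(x) = -\ln x$ (an illustrative choice, not from the text), the formula predicts $f_Y(y) = e^{-y}$, i.e. $Y \sim \text{Exp}(1)$.

```python
import math, random

# X ~ Uniform(0,1), g(x) = -ln(x) is strictly decreasing with
# g^{-1}(y) = e^{-y}, so the theorem predicts
# f_Y(y) = f_X(e^{-y}) * e^{-y} = e^{-y}, i.e. Y ~ Exp(1).
random.seed(0)
n = 200_000
ys = [-math.log(1.0 - random.random()) for _ in range(n)]  # 1-U avoids log(0)

# Empirical density of the simulated Y on the bin [0.5, 1.0),
# versus the bin-averaged value of the predicted pdf e^{-y}.
lo, hi = 0.5, 1.0
empirical = sum(lo <= y < hi for y in ys) / (n * (hi - lo))
predicted = (math.exp(-lo) - math.exp(-hi)) / (hi - lo)
assert abs(empirical - predicted) < 0.01
```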

## 2.3 Moments and Moment Generating Functions

Definition 2.3.6 Let $X$ be a random variable with cdf $F_X$. The moment generating function (mgf) of $X$ (or $F_X$), denoted by $M_X(t)$, is $M_X(t) = \mathrm{E}e^{tX}$, provided that the expectation exists for $t$ in some neighborhood of $0$.

Theorem 2.3.7 If $X$ has mgf $M_X(t)$, then $\mathrm{E}X^n = M_X^{(n)}(0) = \frac{d^n}{dt^n}M_X(t)\Big|_{t=0}$.
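The identity $\mathrm{E}X^n = M_X^{(n)}(0)$ can be checked numerically. A sketch for $X \sim \text{Exp}(1)$, whose mgf $M_X(t) = 1/(1-t)$ (for $t < 1$) has $M'(0) = \mathrm{E}X = 1$ and $M''(0) = \mathrm{E}X^2 = 2$; finite differences stand in for the derivatives:

```python
# For X ~ Exp(1), M_X(t) = 1/(1 - t) for t < 1; its derivatives at 0
# should recover the moments E X = 1 and E X^2 = 2.
def M(t):
    return 1.0 / (1.0 - t)

h = 1e-5
first = (M(h) - M(-h)) / (2 * h)             # central difference ~ M'(0)
second = (M(h) - 2 * M(0.0) + M(-h)) / h**2  # second difference ~ M''(0)

assert abs(first - 1.0) < 1e-6   # E X = 1
assert abs(second - 2.0) < 1e-3  # E X^2 = 2
```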

## 2.4 Differentiating Under an Integral Sign

Theorem 2.4.1 (Leibniz’s Rule) If $f(x,\theta)$, $a(\theta)$, and $b(\theta)$ are differentiable with respect to $\theta$, then

$$\frac{d}{d\theta}\int_{a(\theta)}^{b(\theta)}f(x,\theta)\,dx=f(b(\theta),\theta)\frac{d}{d\theta}b(\theta) - f(a(\theta),\theta)\frac{d}{d\theta}a(\theta)+\int_{a(\theta)}^{b(\theta)}\frac{\partial}{\partial\theta}f(x,\theta)\,dx$$


Notice that if $a(\theta)$ and $b(\theta)$ are constant, we have a special case of Leibniz’s Rule:

$$\frac{d}{d\theta}\int_{a}^{b}f(x,\theta)\,dx=\int_{a}^{b}\frac{\partial}{\partial\theta}f(x,\theta)\,dx$$
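A numerical sketch of Leibniz’s rule with the illustrative choices $f(x,\theta) = e^{-\theta x}$, $a(\theta) = 0$, $b(\theta) = \theta$ (written as `t` in the code), where the integral has the closed form $(1 - e^{-\theta^2})/\theta$:

```python
import math

# I(t) = \int_0^t e^{-tx} dx = (1 - e^{-t^2}) / t in closed form.
def I(t):
    return (1.0 - math.exp(-t * t)) / t

t = 1.0
h = 1e-5
lhs = (I(t + h) - I(t - h)) / (2 * h)  # d/dt of the integral, by central difference

# Right-hand side of Leibniz's rule: boundary terms plus the integral
# of df/dt = -x e^{-tx}, evaluated with a fine trapezoid rule.
n = 100_000
xs = [t * k / n for k in range(n + 1)]
vals = [-x * math.exp(-t * x) for x in xs]
integral = (t / n) * (sum(vals) - 0.5 * (vals[0] + vals[-1]))
rhs = math.exp(-t * t) * 1.0 - math.exp(-t * 0.0) * 0.0 + integral

assert abs(lhs - rhs) < 1e-6
```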

# 3 Common Families of Distributions

## 3.2 Discrete Distributions

Discrete Uniform Distribution

Hypergeometric Distribution

Binomial Distribution

Poisson Distribution

Negative Binomial Distribution

Geometric Distribution

## 3.3 Continuous Distributions

Uniform Distribution

Gamma Distribution

Normal Distribution

Beta Distribution

Cauchy Distribution

Lognormal Distribution

Exponential Distribution

Double Exponential Distribution

## 3.4 Exponential Families

A family of pdfs or pmfs is called an exponential family if it can be expressed as

$$f(x|\theta) = h(x)c(\theta)\exp\left(\sum_{i=1}^{k} w_i(\theta)t_i(x)\right),$$

where $h(x) \ge 0$ and $t_1(x), \dots, t_k(x)$ are real-valued functions of the observation $x$, and $c(\theta) \ge 0$ and $w_1(\theta), \dots, w_k(\theta)$ are real-valued functions of the parameter $\theta$.
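For instance, the Binomial$(n, p)$ pmf is an exponential family with $h(x) = \binom{n}{x}$, $c(p) = (1-p)^n$, $w(p) = \log(p/(1-p))$, and $t(x) = x$. A sketch verifying that the two forms agree (the specific $n$ and $p$ are illustrative):

```python
import math

# Binomial(n, p) written directly and in exponential-family form
# h(x) c(p) exp(w(p) t(x)); the two must agree on every x.
n, p = 10, 0.3

def binom_pmf(x):
    return math.comb(n, x) * p**x * (1 - p)**(n - x)

def exp_family_form(x):
    h = math.comb(n, x)            # h(x) = C(n, x)
    c = (1 - p)**n                 # c(p) = (1-p)^n
    w = math.log(p / (1 - p))      # w(p) = logit(p)
    return h * c * math.exp(w * x) # t(x) = x

assert all(abs(binom_pmf(x) - exp_family_form(x)) < 1e-12 for x in range(n + 1))
```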

## 3.6 Inequalities and Identities

### 3.6.1 Probability Inequalities

Theorem 3.6.1 (Chebychev’s Inequality) Let $X$ be a random variable and let $g(x)$ be a nonnegative function. Then, for any $r > 0$,

$$P(g(X) \ge r) \le \frac{\mathrm{E}g(X)}{r}.$$
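A Monte Carlo sketch of the general inequality $P(g(X) \ge r) \le \mathrm{E}g(X)/r$, with the illustrative choices $X \sim \text{Uniform}(-1,1)$ and $g(x) = x^2$, so $\mathrm{E}g(X) = 1/3$:

```python
import random

# Check P(g(X) >= r) <= E g(X) / r at several r, with
# X ~ Uniform(-1, 1) and g(x) = x^2 (so E g(X) = 1/3).
random.seed(1)
n = 100_000
xs = [random.uniform(-1.0, 1.0) for _ in range(n)]

Eg = sum(x * x for x in xs) / n  # Monte Carlo estimate of E g(X)
for r in (0.1, 0.25, 0.5, 0.9):
    prob = sum(x * x >= r for x in xs) / n
    assert prob <= Eg / r + 1e-9  # the bound holds at every r tried
```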

# 4 Multiple Random Variables

The marginal pmf of $X$ is obtained by summing the joint pmf over the values of $Y$: $f_X(x) = \sum_{y} f_{X,Y}(x,y)$.
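A sketch computing a marginal pmf from a small joint pmf table by summing over $y$ (the table values are hypothetical):

```python
# A hypothetical joint pmf on a 2x3 grid; the marginal pmf of X is
# f_X(x) = sum_y f(x, y).
joint = {
    (0, 0): 0.10, (0, 1): 0.20, (0, 2): 0.10,
    (1, 0): 0.05, (1, 1): 0.25, (1, 2): 0.30,
}

f_X = {}
for (x, y), prob in joint.items():
    f_X[x] = f_X.get(x, 0.0) + prob  # sum the joint over y

assert abs(f_X[0] - 0.40) < 1e-12
assert abs(f_X[1] - 0.60) < 1e-12
assert abs(sum(f_X.values()) - 1.0) < 1e-12  # marginal is a valid pmf
```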

## 4.4 Hierarchical Models and Mixture Distributions

Theorem 4.4.3 If $X$ and $Y$ are any two random variables, then $\mathrm{E}X = \mathrm{E}(\mathrm{E}(X|Y))$, provided that the expectations exist.

Theorem 4.4.7 For any two random variables $X$ and $Y$, $\mathrm{Var}\,X = \mathrm{E}(\mathrm{Var}(X|Y)) + \mathrm{Var}(\mathrm{E}(X|Y))$, provided that the expectations exist.
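Both identities, $\mathrm{E}X = \mathrm{E}(\mathrm{E}(X|Y))$ and $\mathrm{Var}\,X = \mathrm{E}(\mathrm{Var}(X|Y)) + \mathrm{Var}(\mathrm{E}(X|Y))$, can be checked by simulation. A sketch with the illustrative hierarchy $Y \sim \text{Uniform}\{1,\dots,6\}$ and $X\,|\,Y \sim \text{Binomial}(Y, p)$, for which the identities give $\mathrm{E}X = 3.5p$ and $\mathrm{Var}\,X = 3.5p(1-p) + p^2 \cdot 35/12$:

```python
import random

# Hierarchy: Y ~ Uniform{1,...,6}, X | Y ~ Binomial(Y, p).
# E(X|Y) = Yp and Var(X|Y) = Yp(1-p), so the two identities give
# E X = 3.5 p  and  Var X = 3.5 p (1-p) + p^2 * (35/12).
random.seed(2)
p = 0.4
n = 200_000

xs = []
for _ in range(n):
    y = random.randint(1, 6)                               # draw Y
    xs.append(sum(random.random() < p for _ in range(y)))  # Binomial(y, p) draw

def mean(v):
    return sum(v) / len(v)

def var(v):
    m = mean(v)
    return sum((t - m) ** 2 for t in v) / len(v)

assert abs(mean(xs) - 3.5 * p) < 0.02
assert abs(var(xs) - (3.5 * p * (1 - p) + p * p * 35 / 12)) < 0.03
```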

## 4.5 Covariance and Correlation

covariance: $\mathrm{Cov}(X,Y) = \mathrm{E}((X - \mu_X)(Y - \mu_Y))$

correlation: $\rho_{XY} = \mathrm{Cov}(X,Y)/(\sigma_X \sigma_Y)$

Theorem 4.5.3 For any random variables $X$ and $Y$, $\mathrm{Cov}(X,Y) = \mathrm{E}XY - \mu_X\mu_Y$.

Theorem 4.5.5 If $X$ and $Y$ are independent random variables, then $\mathrm{Cov}(X,Y) = 0$ and $\rho_{XY} = 0$.

Theorem 4.5.6 If $X$ and $Y$ are any two random variables and $a$ and $b$ are any two constants, then $\mathrm{Var}(aX + bY) = a^2\,\mathrm{Var}\,X + b^2\,\mathrm{Var}\,Y + 2ab\,\mathrm{Cov}(X,Y)$.

Theorem 4.5.7 For any random variables $X$ and $Y$,

• $-1 \le \rho_{XY} \le 1$.
• $\rho_{XY}^2 = 1$ if and only if there exist numbers $a \neq 0$ and $b$ such that $P(Y = aX + b) = 1$. If $\rho_{XY}=1$, then $a > 0$, and if $\rho_{XY} = -1$, then $a < 0$.
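A sketch of the boundary case of the second bullet: when $Y = aX + b$ exactly with $a < 0$, the sample correlation is $-1$ up to rounding (the coefficients are illustrative):

```python
import random

# Y = aX + b with a < 0 forces the correlation to -1 (up to rounding),
# regardless of the distribution of X.
random.seed(3)
xs = [random.gauss(0.0, 1.0) for _ in range(10_000)]
a, b = -2.0, 5.0
ys = [a * x + b for x in xs]

def mean(v):
    return sum(v) / len(v)

mx, my = mean(xs), mean(ys)
cov = mean([(x - mx) * (y - my) for x, y in zip(xs, ys)])
sx = mean([(x - mx) ** 2 for x in xs]) ** 0.5
sy = mean([(y - my) ** 2 for y in ys]) ** 0.5
rho = cov / (sx * sy)
assert abs(rho - (-1.0)) < 1e-9
```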

## 4.7 Inequalities

### 4.7.1 Numerical Inequalities

Lemma 4.7.1 Let $a$ and $b$ be any positive numbers, and let $p$ and $q$ be any positive numbers (necessarily greater than 1) satisfying

$$\frac{1}{p} + \frac{1}{q} = 1.$$

Then

$$\frac{1}{p}a^p + \frac{1}{q}b^q \ge ab,$$

with equality if and only if $a^p = b^q$.
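A grid-based sketch of the lemma, $\frac{1}{p}a^p + \frac{1}{q}b^q \ge ab$ when $\frac{1}{p} + \frac{1}{q} = 1$, including the equality case $a^p = b^q$ (the exponent $p = 3$ and the grid are illustrative choices):

```python
# Young's inequality: with 1/p + 1/q = 1 (p, q > 1),
# (1/p) a^p + (1/q) b^q >= a b, with equality iff a^p = b^q.
p = 3.0
q = p / (p - 1.0)  # conjugate exponent, so 1/p + 1/q = 1
assert abs(1 / p + 1 / q - 1.0) < 1e-12

# Inequality on a grid of positive a, b.
for i in range(1, 50):
    for j in range(1, 50):
        a, b = i / 10.0, j / 10.0
        assert a ** p / p + b ** q / q >= a * b - 1e-12

# Equality case: choose b so that b^q = a^p.
a = 1.7
b = (a ** p) ** (1.0 / q)
assert abs(a ** p / p + b ** q / q - a * b) < 1e-9
```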

Theorem 4.7.2 (Hölder’s Inequality) Let $X$ and $Y$ be any two random variables, and let $p$ and $q$ satisfy $\frac{1}{p} + \frac{1}{q} = 1$. Then

$$|\mathrm{E}XY| \le \mathrm{E}|XY| \le (\mathrm{E}|X|^p)^{1/p}(\mathrm{E}|Y|^q)^{1/q}.$$

Theorem 4.7.5 (Minkowski’s Inequality) Let $X$ and $Y$ be any two random variables. Then for $1 \le p < \infty$,

$$[\mathrm{E}|X+Y|^p]^{1/p} \le [\mathrm{E}|X|^p]^{1/p} + [\mathrm{E}|Y|^p]^{1/p}.$$
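Both Hölder’s inequality, $\mathrm{E}|XY| \le (\mathrm{E}|X|^p)^{1/p}(\mathrm{E}|Y|^q)^{1/q}$, and Minkowski’s inequality, $[\mathrm{E}|X+Y|^p]^{1/p} \le [\mathrm{E}|X|^p]^{1/p} + [\mathrm{E}|Y|^p]^{1/p}$, hold for the empirical distribution of a finite sample, so they can be checked exactly (up to rounding) on simulated data. A sketch with $p = q = 2$ for Hölder (the Cauchy-Schwarz case) and $p = 2$ for Minkowski (the sample construction is illustrative):

```python
import random

# Empirical expectations over a simulated sample; both inequalities
# hold exactly for the empirical distribution of (X, Y).
random.seed(4)
n = 10_000
xs = [random.gauss(0.0, 1.0) for _ in range(n)]
ys = [0.3 * x + 0.5 * random.gauss(0.0, 1.0) for x in xs]  # correlated with xs

def E(v):
    return sum(v) / len(v)

# Holder with p = q = 2: E|XY| <= (E|X|^2)^(1/2) (E|Y|^2)^(1/2)
lhs_h = E([abs(x * y) for x, y in zip(xs, ys)])
rhs_h = E([x * x for x in xs]) ** 0.5 * E([y * y for y in ys]) ** 0.5
assert lhs_h <= rhs_h + 1e-12

# Minkowski with p = 2: (E|X+Y|^2)^(1/2) <= (E|X|^2)^(1/2) + (E|Y|^2)^(1/2)
lhs_m = E([(x + y) ** 2 for x, y in zip(xs, ys)]) ** 0.5
rhs_m = E([x * x for x in xs]) ** 0.5 + E([y * y for y in ys]) ** 0.5
assert lhs_m <= rhs_m + 1e-12
```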

### 4.7.2 Functional Inequalities

Theorem 4.7.7 (Jensen’s Inequality) For any random variable $X$, if $g(x)$ is a convex function, then $\mathrm{E}g(X) \ge g(\mathrm{E}X)$.
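A Monte Carlo sketch of $\mathrm{E}g(X) \ge g(\mathrm{E}X)$ with the convex function $g(x) = x^2$, where the Jensen gap $\mathrm{E}X^2 - (\mathrm{E}X)^2$ is exactly $\mathrm{Var}\,X$; the choice $X \sim \text{Uniform}(0,2)$, with $\mathrm{Var}\,X = 1/3$, is illustrative:

```python
import random

# Jensen with g(x) = x^2: E[X^2] >= (E X)^2, and the gap is Var X,
# which equals (2-0)^2 / 12 = 1/3 for X ~ Uniform(0, 2).
random.seed(5)
xs = [random.uniform(0.0, 2.0) for _ in range(50_000)]

EX = sum(xs) / len(xs)
Eg = sum(x * x for x in xs) / len(xs)

assert Eg >= EX ** 2                       # Jensen (sample variance >= 0)
assert abs((Eg - EX ** 2) - 1 / 3) < 0.02  # gap ~ Var X = 1/3
```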

Theorem 4.7.9 (Covariance Inequality) Let $X$ be any random variable and $g(x)$ and $h(x)$ any functions such that $\mathrm{E}g(X)$, $\mathrm{E}h(X)$, and $\mathrm{E}(g(X)h(X))$ exist.

• If $g(x)$ is a nondecreasing function and $h(x)$ is a nonincreasing function, then $\mathrm{Cov}(g(X), h(X)) \le 0$.

• If $g(x)$ and $h(x)$ are either both nondecreasing or both nonincreasing, then $\mathrm{Cov}(g(X), h(X)) \ge 0$.
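Since sample covariance is covariance under the empirical distribution, the inequality can be checked exactly on simulated data. A sketch with the illustrative choices $g(x) = x^3$ (nondecreasing), $h(x) = -x$ (nonincreasing), and $k(x) = x + 2$ (nondecreasing):

```python
import random

# Opposite monotonicity gives nonpositive covariance; matching
# monotonicity gives nonnegative covariance. Both hold for the
# empirical distribution, so only rounding slack is needed.
random.seed(6)
xs = [random.gauss(0.0, 1.0) for _ in range(20_000)]

def cov(u, v):
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / len(u)

g = [x ** 3 for x in xs]  # nondecreasing g(x) = x^3
h = [-x for x in xs]      # nonincreasing h(x) = -x
k = [x + 2 for x in xs]   # nondecreasing k(x) = x + 2

assert cov(g, h) <= 1e-12   # opposite monotonicity -> Cov <= 0
assert cov(g, k) >= -1e-12  # same monotonicity -> Cov >= 0
```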