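---
layout: post
title: "Statistical Inference 5~8"
date: 2016-01-11 11:51:00
categories: Math
---

# Statistical Inference 5~8

# 5 Properties of a Random Sample

## 5.1 Basic Concepts of Random Samples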

## 5.2 Sums of Random Variables from a Random Sample

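Lemma 5.2.5: Let $X_1, \dots, X_n$ be a random sample from a population and let $g(x)$ be a function such that $\mathrm{E}g(X_1)$ and $\mathrm{Var}\,g(X_1)$ exist. Then $\mathrm{E}(\sum_{i=1}^n g(X_i)) = n(\mathrm{E}g(X_1))$ and $\mathrm{Var}(\sum_{i=1}^n g(X_i)) = n(\mathrm{Var}\,g(X_1))$.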

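Theorem 5.2.6: Let $X_1, \dots, X_n$ be a random sample from a population with mean $\mu$ and variance $\sigma^2 < \infty$, and let $\bar{X} = (1/n)\sum_{i=1}^n X_i$ and $S^2 = [1/(n-1)]\sum_{i=1}^n (X_i - \bar{X})^2$. Then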

• $\mathrm{E}\bar{X} = \mu$,
• $\mathrm{Var}\bar{X} = \sigma^2/n$,
• $\mathrm{E}S^2 = \sigma^2$.
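The three identities can be checked by simulation. This is a minimal sketch that assumes a normal population purely for convenience (the theorem itself only needs a finite variance); `simulate_sample_moments` is a hypothetical helper name, and $\mu = 2$, $\sigma = 3$, $n = 10$ are arbitrary choices.

```python
import random
import statistics

def simulate_sample_moments(mu, sigma, n, reps, seed=0):
    """Monte Carlo check of Theorem 5.2.6 using normal samples (an assumed population)."""
    rng = random.Random(seed)
    means, variances = [], []
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        means.append(statistics.fmean(sample))
        variances.append(statistics.variance(sample))  # S^2 with divisor n - 1
    # Average of Xbar, spread of Xbar, and average of S^2 across replications
    return statistics.fmean(means), statistics.pvariance(means), statistics.fmean(variances)

mean_xbar, var_xbar, mean_s2 = simulate_sample_moments(mu=2.0, sigma=3.0, n=10, reps=20_000)
print(mean_xbar, var_xbar, mean_s2)  # should be close to mu = 2, sigma^2/n = 0.9, sigma^2 = 9
```

Using the divisor $n - 1$ in `statistics.variance` is what makes $\mathrm{E}S^2 = \sigma^2$; `statistics.pvariance` (divisor $n$) would be biased low.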

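Theorem 5.2.7: Let $X_1, \dots, X_n$ be a random sample from a population with mgf $M_X(t)$. Then the mgf of the sample mean is $M_{\bar{X}}(t) = [M_X(t/n)]^n$.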

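Example 5.2.8: Let $X_1, \dots, X_n$ be a random sample from a $n(\mu, \sigma^2)$ population. Then the mgf of the sample mean is

$$M_{\bar{X}}(t) = \left[\exp\left(\mu\frac{t}{n} + \frac{\sigma^2 (t/n)^2}{2}\right)\right]^n = \exp\left(\mu t + \frac{(\sigma^2/n)t^2}{2}\right).$$

Thus, $\bar{X}$ has a $n(\mu, \sigma^2/n)$ distribution.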

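Another simple example is given by a $gamma(\alpha, \beta)$ random sample. Here, too, the distribution of the sample mean is easy to derive. The mgf of the sample mean is

$$M_{\bar{X}}(t) = \left[\left(\frac{1}{1 - \beta(t/n)}\right)^{\alpha}\right]^n = \left(\frac{1}{1 - (\beta/n)t}\right)^{n\alpha},$$

which we recognize as the mgf of a $gamma(n\alpha, \beta/n)$ distribution; this is the distribution of $\bar{X}$.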
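Theorem 5.2.7 can be checked numerically for both examples by evaluating $[M_X(t/n)]^n$ and the claimed mgf of $\bar{X}$ at a single point. A small sketch, with the gamma distribution in its scale parametrization (mgf $(1 - \beta t)^{-\alpha}$ for $t < 1/\beta$); the parameter values are arbitrary.

```python
import math

def mgf_normal(t, mu, var):
    # mgf of n(mu, var): exp(mu*t + var*t^2/2)
    return math.exp(mu * t + var * t * t / 2)

def mgf_gamma(t, alpha, beta):
    # mgf of gamma(alpha, beta) with scale beta, valid for t < 1/beta
    return (1 - beta * t) ** (-alpha)

def mgf_of_mean(mgf, t, n):
    # Theorem 5.2.7: M_{Xbar}(t) = [M_X(t/n)]^n
    return mgf(t / n) ** n

n, t = 8, 0.3
mu, var = 1.5, 4.0
alpha, beta = 2.0, 0.5
normal_lhs = mgf_of_mean(lambda s: mgf_normal(s, mu, var), t, n)
normal_rhs = mgf_normal(t, mu, var / n)             # n(mu, sigma^2/n)
gamma_lhs = mgf_of_mean(lambda s: mgf_gamma(s, alpha, beta), t, n)
gamma_rhs = mgf_gamma(t, n * alpha, beta / n)       # gamma(n*alpha, beta/n)
print(math.isclose(normal_lhs, normal_rhs), math.isclose(gamma_lhs, gamma_rhs))  # True True
```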

Theorem 5.2.9 If $X$ and $Y$ are independent continuous random variables with pdfs $f_X(x)$ and $f_Y(y)$, then the pdf of $Z = X + Y$ is

$$f_Z(z) = \int_{-\infty}^{\infty} f_X(w) f_Y(z - w)\, dw.$$
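The convolution integral can be approximated with a simple midpoint rule. As an assumed example, two independent uniform(0, 1) variables should give the triangular density ($z$ on $[0, 1]$, $2 - z$ on $[1, 2]$); `convolve_density` and `uniform01` are hypothetical helper names.

```python
def convolve_density(f_x, f_y, z, lower, upper, steps=20_000):
    """Midpoint-rule approximation of f_Z(z) = ∫ f_X(w) f_Y(z - w) dw over [lower, upper]."""
    h = (upper - lower) / steps
    total = 0.0
    for k in range(steps):
        w = lower + (k + 0.5) * h
        total += f_x(w) * f_y(z - w)
    return total * h

def uniform01(x):
    return 1.0 if 0.0 <= x <= 1.0 else 0.0

for z in (0.5, 1.0, 1.5):
    print(z, round(convolve_density(uniform01, uniform01, z, 0.0, 1.0), 3))
```

The integration range $[0, 1]$ is the support of $f_X$; outside it the integrand is zero, so nothing is lost by truncating there.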

Example 5.2.10 (Sum of Cauchy random variables)
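In this example the convolution formula shows that if $U$ and $V$ are independent Cauchy(0, $\sigma$) and Cauchy(0, $\tau$), then $U + V$ is Cauchy(0, $\sigma + \tau$). Consequently the sample mean of $n$ iid Cauchy(0, 1) variables is again Cauchy(0, 1): averaging does not concentrate it. A quick simulation sketch (sample size 25, 20,000 replications, and the seed are arbitrary choices) checks that the quartiles of simulated means stay near $\pm 1$, the quartiles of Cauchy(0, 1).

```python
import math
import random
import statistics

def cauchy_sample_mean(n, rng):
    # Inverse-cdf draw: tan(pi * (U - 1/2)) is Cauchy(0, 1) when U is uniform(0, 1)
    return statistics.fmean(math.tan(math.pi * (rng.random() - 0.5)) for _ in range(n))

rng = random.Random(1)
means = [cauchy_sample_mean(25, rng) for _ in range(20_000)]
q1, _, q3 = statistics.quantiles(means, n=4)
print(round(q1, 2), round(q3, 2))  # close to -1 and 1, as for a single Cauchy(0, 1) draw
```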

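Theorem 5.2.11 Suppose $X_1, \dots, X_n$ is a random sample from a pdf or pmf $f(x|\theta)$, where

$$f(x|\theta) = h(x)c(\theta)\exp\left(\sum_{i=1}^k w_i(\theta)t_i(x)\right)$$

is a member of an exponential family. Define statistics $T_1, \dots, T_k$ by

$$T_i(X_1, \dots, X_n) = \sum_{j=1}^n t_i(X_j), \quad i = 1, \dots, k.$$

If the set $\{(w_1(\theta), w_2(\theta), \dots, w_k(\theta)) : \theta \in \Theta\}$ contains an open subset of $R^k$, then the distribution of $(T_1, \dots, T_k)$ is an exponential family of the form

$$f_T(u_1, \dots, u_k|\theta) = H(u_1, \dots, u_k)[c(\theta)]^n\exp\left(\sum_{i=1}^k w_i(\theta)u_i\right).$$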

Example 5.2.12 (Sum of Bernoulli random variables)
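The sum of $n$ iid Bernoulli($p$) variables is binomial($n, p$). The discrete analogue of the convolution formula, $P(A + B = s) = \sum_k P(A = k)P(B = s - k)$, lets this be checked with exact rational arithmetic. A sketch with arbitrary $p = 1/3$ and $n = 5$; `pmf_of_sum` is a hypothetical helper.

```python
from fractions import Fraction
from math import comb

def pmf_of_sum(pmf_a, pmf_b):
    """Discrete convolution of two independent pmfs given as {value: probability} dicts."""
    out = {}
    for a, pa in pmf_a.items():
        for b, pb in pmf_b.items():
            out[a + b] = out.get(a + b, 0) + pa * pb
    return out

p, n = Fraction(1, 3), 5
bernoulli = {0: 1 - p, 1: p}
total = {0: Fraction(1)}  # pmf of the empty sum (point mass at 0)
for _ in range(n):
    total = pmf_of_sum(total, bernoulli)

binomial = {k: comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)}
print(total == binomial)  # True
```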

## 5.3 Sampling from the Normal Distribution

### 5.3.1 Properties of the Sample Mean and Variance

Theorem 5.3.1: Let $X_1, \dots, X_n$ be a random sample from a $n(\mu, \sigma^2)$ distribution, and let $\bar{X} = (1/n)\sum_{i=1}^n X_i$ and $S^2 = [1/(n-1)]\sum_{i=1}^n (X_i - \bar{X})^2$. Then

• $\bar{X}$ and $S^2$ are independent random variables,
• $\bar{X}$ has a $n(\mu, \sigma^2/n)$ distribution,
• $(n-1)S^2/\sigma^2$ has a chi-squared distribution with $n - 1$ degrees of freedom.
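A simulation sketch of the theorem: for $n = 6$, $(n-1)S^2/\sigma^2$ should have the $\chi^2_5$ mean 5 and variance 10, and its sample correlation with $\bar{X}$ should be near 0. A near-zero correlation is consistent with, but does not by itself establish, independence. The parameter values and seed are arbitrary.

```python
import random
import statistics

def chi2_and_independence(mu, sigma, n, reps, seed=2):
    rng = random.Random(seed)
    xbars, scaled = [], []
    for _ in range(reps):
        x = [rng.gauss(mu, sigma) for _ in range(n)]
        xbars.append(statistics.fmean(x))
        scaled.append((n - 1) * statistics.variance(x) / sigma**2)  # (n-1)S^2/sigma^2
    # Sample correlation between Xbar and (n-1)S^2/sigma^2, computed by hand
    mx, ms = statistics.fmean(xbars), statistics.fmean(scaled)
    cov = statistics.fmean((a - mx) * (b - ms) for a, b in zip(xbars, scaled))
    corr = cov / (statistics.pstdev(xbars) * statistics.pstdev(scaled))
    return statistics.fmean(scaled), statistics.pvariance(scaled), corr

mean_q, var_q, corr = chi2_and_independence(mu=0.0, sigma=2.0, n=6, reps=20_000)
print(round(mean_q, 2), round(var_q, 2), round(corr, 3))  # near 5, 10 and 0
```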

Lemma 5.3.2:

• If $Z$ is a $n(0,1)$ random variable, then $Z^2 \thicksim \chi_1^2$.
• If $X_1, \dots, X_n$ are independent and $X_i \thicksim \chi_{p_i}^2$, then $X_1 + \dots + X_n \thicksim \chi_{p_1 + \dots + p_n}^2$.

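## 5.4 Order Statistics

Theorem 5.4.3: Let $X_1, \dots, X_n$ be a random sample from a discrete distribution with pmf $f_X(x_i) = p_i$, where $x_1 < x_2 < \cdots$ are the possible values of $X$ in ascending order. Define $P_0 = 0$ and $P_i = p_1 + \cdots + p_i$, and let $X_{(1)}, \dots, X_{(n)}$ denote the order statistics from the sample. Then

$$P(X_{(j)} \le x_i) = \sum_{k=j}^n \binom{n}{k} P_i^k (1 - P_i)^{n-k}$$

and

$$P(X_{(j)} = x_i) = \sum_{k=j}^n \binom{n}{k}\left[P_i^k (1 - P_i)^{n-k} - P_{i-1}^k (1 - P_{i-1})^{n-k}\right].$$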
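The first formula of Theorem 5.4.3 can be verified exactly by brute force: enumerate every possible sample, sort it, and add up the probability that the $j$-th smallest value is at most $x_i$. A sketch with a small, arbitrarily chosen pmf; indices in the code are zero-based, so `P_i` is $p_1 + \cdots + p_{i+1}$ in the notation above.

```python
from fractions import Fraction
from itertools import product
from math import comb

values = [1, 2, 3]                                        # support x_1 < x_2 < x_3
probs = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]  # arbitrary example pmf
n, j = 4, 2                                               # sample size, order statistic index

def cdf_order_stat_brute(i):
    """P(X_(j) <= values[i]) by enumerating all n-tuples of support indices."""
    total = Fraction(0)
    for sample in product(range(len(values)), repeat=n):
        weight = Fraction(1)
        for idx in sample:
            weight *= probs[idx]
        if sorted(sample)[j - 1] <= i:   # j-th smallest value is at most values[i]
            total += weight
    return total

def cdf_order_stat_formula(i):
    """Theorem 5.4.3: sum_{k=j}^n C(n, k) P_i^k (1 - P_i)^(n - k)."""
    P_i = sum(probs[: i + 1])
    return sum(comb(n, k) * P_i**k * (1 - P_i) ** (n - k) for k in range(j, n + 1))

matches = [cdf_order_stat_brute(i) == cdf_order_stat_formula(i) for i in range(len(values))]
print(all(matches))  # True
```

The formula works because $X_{(j)} \le x_i$ exactly when at least $j$ of the $n$ observations are $\le x_i$, and that count is binomial($n, P_i$).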

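Theorem 5.4.4: Let $X_{(1)}, \dots, X_{(n)}$ denote the order statistics of a random sample $X_1, \dots, X_n$ from a continuous population with cdf $F_X(x)$ and pdf $f_X(x)$. Then the pdf of $X_{(j)}$ is

$$f_{X_{(j)}}(x) = \frac{n!}{(j-1)!(n-j)!}f_X(x)[F_X(x)]^{j-1}[1 - F_X(x)]^{n-j}.$$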
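For a uniform(0, 1) population, Theorem 5.4.4 gives $X_{(j)} \sim beta(j, n - j + 1)$, whose mean is $j/(n+1)$. A sketch that integrates the order-statistic density numerically to confirm it has total mass 1 and that mean; `integrate` is a plain midpoint rule and $n = 7$, $j = 3$ are arbitrary.

```python
from math import factorial

def order_stat_pdf(x, j, n, pdf, cdf):
    """Theorem 5.4.4: density of the j-th order statistic from a continuous population."""
    const = factorial(n) / (factorial(j - 1) * factorial(n - j))
    return const * pdf(x) * cdf(x) ** (j - 1) * (1 - cdf(x)) ** (n - j)

def uniform_pdf(x):
    return 1.0

def uniform_cdf(x):
    return x

def integrate(g, a, b, steps=10_000):
    # Midpoint rule on [a, b]
    h = (b - a) / steps
    return h * sum(g(a + (k + 0.5) * h) for k in range(steps))

n, j = 7, 3
mass = integrate(lambda x: order_stat_pdf(x, j, n, uniform_pdf, uniform_cdf), 0.0, 1.0)
mean = integrate(lambda x: x * order_stat_pdf(x, j, n, uniform_pdf, uniform_cdf), 0.0, 1.0)
print(mass, mean, j / (n + 1))  # mass about 1, mean about j/(n+1) = 0.375
```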

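Theorem 5.4.6: With the notation of Theorem 5.4.4, the joint pdf of $X_{(i)}$ and $X_{(j)}$, $1 \le i < j \le n$, is

$$f_{X_{(i)}, X_{(j)}}(u, v) = \frac{n!}{(i-1)!(j-1-i)!(n-j)!}f_X(u)f_X(v)[F_X(u)]^{i-1}[F_X(v) - F_X(u)]^{j-1-i}[1 - F_X(v)]^{n-j}$$

for $-\infty < u < v < \infty$.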

## 5.5 Convergence Concepts

### 5.5.1 Convergence in Probability

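Definition 5.5.1: A sequence of random variables $X_1, X_2, \dots$ converges in probability to a random variable $X$ if, for every $\epsilon > 0$,

$$\lim_{n \to \infty} P(|X_n - X| \ge \epsilon) = 0$$

or, equivalently,

$$\lim_{n \to \infty} P(|X_n - X| < \epsilon) = 1.$$

Theorem 5.5.2 (Weak Law of Large Numbers): Let $X_1, X_2, \dots$ be iid random variables with $\mathrm{E}X_i = \mu$ and $\mathrm{Var}X_i = \sigma^2 < \infty$, and define $\bar{X}_n = (1/n)\sum_{i=1}^n X_i$. Then, for every $\epsilon > 0$,

$$\lim_{n \to \infty} P(|\bar{X}_n - \mu| < \epsilon) = 1;$$

that is, $\bar{X}_n$ converges in probability to $\mu$.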
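The Weak Law can be watched numerically by estimating $P(|\bar{X}_n - \mu| < \epsilon)$ for increasing $n$. A sketch assuming exponential(1) data (so $\mu = 1$) and $\epsilon = 0.1$; both are illustrative choices.

```python
import random
import statistics

def prob_within(eps, n, reps, seed=3):
    """Monte Carlo estimate of P(|Xbar_n - mu| < eps) for exponential(1) data (mu = 1)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        xbar = statistics.fmean(rng.expovariate(1.0) for _ in range(n))
        if abs(xbar - 1.0) < eps:
            hits += 1
    return hits / reps

estimates = {n: prob_within(eps=0.1, n=n, reps=1_000) for n in (10, 100, 1_000)}
print(estimates)  # the estimated probability climbs toward 1 as n grows
```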

### 5.5.2 Almost Sure Convergence

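Definition 5.5.6: A sequence of random variables $X_1, X_2, \dots$ converges almost surely to a random variable $X$ if, for every $\epsilon > 0$,

$$P\left(\lim_{n \to \infty} |X_n - X| < \epsilon\right) = 1.$$

Theorem 5.5.9 (Strong Law of Large Numbers): Let $X_1, X_2, \dots$ be iid random variables with $\mathrm{E}X_i = \mu$ and $\mathrm{Var}X_i = \sigma^2 < \infty$, and define $\bar{X}_n = (1/n)\sum_{i=1}^n X_i$. Then, for every $\epsilon > 0$,

$$P\left(\lim_{n \to \infty} |\bar{X}_n - \mu| < \epsilon\right) = 1;$$

that is, $\bar{X}_n$ converges almost surely to $\mu$.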

### 5.5.3 Convergence in Distribution

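Definition 5.5.10: A sequence of random variables $X_1, X_2, \dots$ converges in distribution to a random variable $X$ if

$$\lim_{n \to \infty} F_{X_n}(x) = F_X(x)$$

at all points $x$ where $F_X(x)$ is continuous.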

Theorem 5.5.14 (Central Limit Theorem): Let $X_1, X_2, \dots$ be a sequence of iid random variables whose mgfs exist in a neighborhood of 0 (that is, $M_{X_i}(t)$ exists for $|t| < h$, for some positive $h$). Let $\mathrm{E}X_i = \mu$ and $\mathrm{Var}X_i = \sigma^2 > 0$. (Both $\mu$ and $\sigma^2$ are finite since the mgf exists.) Define $\bar{X}_n = (1/n)\sum_{i=1}^n X_i$. Let $G_n(x)$ denote the cdf of $\sqrt{n}(\bar{X}_n-\mu)/\sigma$. Then, for any $x$, $-\infty < x < \infty$,

$$\lim_{n \to \infty} G_n(x) = \int_{-\infty}^x \frac{1}{\sqrt{2\pi}}e^{-y^2/2}\,dy;$$

that is, $\sqrt{n}(\bar{X}_n-\mu)/\sigma$ has a limiting standard normal distribution.
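A Monte Carlo sketch of the CLT: estimate $G_n(1)$ for the standardized mean and compare it with $\Phi(1)$. It assumes exponential(1) data ($\mu = \sigma = 1$, mgf finite for $t < 1$), whose skewness makes convergence slower than for symmetric data; $n = 50$ and the seed are arbitrary.

```python
import math
import random
import statistics
from statistics import NormalDist

def standardized_mean_cdf(x, n, reps, seed=4):
    """Monte Carlo estimate of G_n(x) = P(sqrt(n)(Xbar_n - mu)/sigma <= x) for exponential(1) data."""
    rng = random.Random(seed)
    mu, sigma = 1.0, 1.0
    count = 0
    for _ in range(reps):
        xbar = statistics.fmean(rng.expovariate(1.0) for _ in range(n))
        if math.sqrt(n) * (xbar - mu) / sigma <= x:
            count += 1
    return count / reps

g_at_one = standardized_mean_cdf(1.0, n=50, reps=10_000)
print(round(g_at_one, 3), round(NormalDist().cdf(1.0), 3))  # both near 0.841
```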

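### 5.5.4 The Delta Method

## 5.6 Generating a Random Sample

### 5.6.1 Direct Methods

### 5.6.2 Indirect Methods

### 5.6.3 The Accept/Reject Algorithm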

### 5.6.(4) The MCMC methods

Gibbs Sampler
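The notes only name the Gibbs sampler, so here is a minimal sketch under an assumed target: a standard bivariate normal with correlation $\rho$. Each sweep draws $X$ from its full conditional $n(\rho y, 1 - \rho^2)$ and then $Y$ from $n(\rho x, 1 - \rho^2)$; the function name, burn-in length, and $\rho = 0.6$ are illustrative choices.

```python
import math
import random

def gibbs_bivariate_normal(rho, iters, burn_in=1_000, seed=5):
    """Gibbs sampler for a standard bivariate normal with correlation rho (assumed target)."""
    rng = random.Random(seed)
    sd = math.sqrt(1 - rho**2)        # conditional standard deviation
    x = y = 0.0
    draws = []
    for t in range(burn_in + iters):
        x = rng.gauss(rho * y, sd)    # X | Y = y ~ n(rho*y, 1 - rho^2)
        y = rng.gauss(rho * x, sd)    # Y | X = x ~ n(rho*x, 1 - rho^2)
        if t >= burn_in:
            draws.append((x, y))
    return draws

draws = gibbs_bivariate_normal(rho=0.6, iters=20_000)
xs, ys = zip(*draws)
m = len(draws)
mx, my = sum(xs) / m, sum(ys) / m
cov = sum((a - mx) * (b - my) for a, b in draws) / m
corr = cov / math.sqrt(sum((a - mx) ** 2 for a in xs) / m * sum((b - my) ** 2 for b in ys) / m)
print(round(corr, 2))  # should be close to rho = 0.6
```

Successive draws are correlated, so the chain needs more iterations than independent sampling for the same accuracy.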

Metropolis Algorithm
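A minimal random-walk Metropolis sketch, assuming a standard normal target known only up to its normalizing constant, $\log \pi(x) = -x^2/2$. Because the normal proposal is symmetric, a move to $x'$ is accepted with probability $\min(1, \pi(x')/\pi(x))$. The step size 2.4, burn-in of 5,000, and the name `metropolis` are hypothetical tuning and naming choices.

```python
import math
import random

def metropolis(log_target, x0, step, iters, seed=6):
    """Random-walk Metropolis with proposal x' = x + step * Z, Z ~ n(0, 1)."""
    rng = random.Random(seed)
    x, chain, accepted = x0, [], 0
    for _ in range(iters):
        proposal = x + step * rng.gauss(0.0, 1.0)  # symmetric proposal
        log_ratio = log_target(proposal) - log_target(x)
        # Accept with probability min(1, pi(proposal)/pi(x)), computed on the log scale
        if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
            x, accepted = proposal, accepted + 1
        chain.append(x)                            # a rejected move repeats the current state
    return chain, accepted / iters

chain, rate = metropolis(lambda x: -0.5 * x * x, x0=0.0, step=2.4, iters=50_000)
kept = chain[5_000:]
mean = sum(kept) / len(kept)
var = sum((v - mean) ** 2 for v in kept) / len(kept)
print(round(mean, 2), round(var, 2), round(rate, 2))  # mean near 0, variance near 1
```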

# 6 Principles of Data Reduction

## 6.1 Introduction

Three principles of data reduction:

• The Sufficiency Principle
• The Likelihood Principle
• The Equivariance Principle

## 6.2 The Sufficiency Principle

### 6.2.1 Sufficient Statistics

Definition 6.2.1: A statistic $T(X)$ is a sufficient statistic for $\theta$ if the conditional distribution of the sample $X$ given the value of $T(X)$ does not depend on $\theta$.

Theorem 6.2.2: If $p(x | \theta)$ is the joint pdf or pmf of $X$ and $q(t | \theta)$ is the pdf or pmf of $T(X)$, then $T(X)$ is a sufficient statistic for $\theta$ if, for every $x$ in the sample space, the ratio $p(x | \theta)/q(T(x) | \theta)$ is constant as a function of $\theta$.

Theorem 6.2.6 (Factorization Theorem): Let $f(x | \theta)$ denote the joint pdf or pmf of a sample $X$. A statistic $T(X)$ is a sufficient statistic for $\theta$ if and only if there exist functions $g(t | \theta)$ and $h(x)$ such that, for all sample points $x$ and all parameter points $\theta$,

$$f(x | \theta) = g(T(x) | \theta)h(x).$$
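For an iid Bernoulli($\theta$) sample, $f(x|\theta) = \theta^t(1-\theta)^{n-t}$ with $t = \sum x_i$, which factors as $g(t|\theta)h(x)$ with $h(x) = 1$, so $T = \sum X_i$ is sufficient. The sketch below checks the equivalent criterion of Theorem 6.2.2 with exact fractions: the ratio $p(x|\theta)/q(T(x)|\theta)$ equals $1/\binom{n}{t}$ for every $\theta$. The Bernoulli model and the $\theta$ values tried are illustrative assumptions.

```python
from fractions import Fraction
from itertools import product
from math import comb

def joint_pmf(x, theta):
    # p(x | theta) for an iid Bernoulli(theta) sample: theta^t (1 - theta)^(n - t), t = sum(x)
    t = sum(x)
    return theta**t * (1 - theta) ** (len(x) - t)

def pmf_of_T(t, n, theta):
    # q(t | theta): T = sum(X_i) is binomial(n, theta)
    return comb(n, t) * theta**t * (1 - theta) ** (n - t)

n = 4
for theta in (Fraction(1, 5), Fraction(1, 2), Fraction(7, 8)):
    ratios = {x: joint_pmf(x, theta) / pmf_of_T(sum(x), n, theta) for x in product((0, 1), repeat=n)}
    # Every ratio equals 1 / C(n, t), which does not involve theta
    assert all(r == Fraction(1, comb(n, sum(x))) for x, r in ratios.items())
print("p(x|theta)/q(T(x)|theta) does not depend on theta")
```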

### 6.2.2 Minimal Sufficient Statistics

### 6.2.3 Ancillary Statistics

Definition 6.2.16: A statistic $S(X)$ whose distribution does not depend on the parameter $\theta$ is called an ancillary statistic.

### 6.2.4 Sufficient, Ancillary, and Complete Statistics

Example 6.2.20 (Ancillary precision)

Theorem 6.2.24 (Basu’s Theorem): If $T(X)$ is a complete and minimal sufficient statistic, then $T(X)$ is independent of every ancillary statistic.

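Theorem 6.2.25 (Complete statistic in the exponential family): Let $X_1, \dots, X_n$ be iid observations from an exponential family with pdf or pmf of the form

$$f(x|\theta) = h(x)c(\theta)\exp\left(\sum_{j=1}^k w(\theta_j)t_j(x)\right),$$

where $\theta = (\theta_1, \dots, \theta_k)$. Then the statistic

$$T(X) = \left(\sum_{i=1}^n t_1(X_i), \sum_{i=1}^n t_2(X_i), \dots, \sum_{i=1}^n t_k(X_i)\right)$$

is complete as long as the parameter space $\Theta$ contains an open set in $R^k$.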

## 6.3 The Likelihood Principle

## 6.4 The Equivariance Principle

# 7 Point Estimation

## 7.1 Introduction

This chapter is divided into two parts. The first part deals with methods for finding estimators, and the second part deals with evaluating these estimators.

Definition 7.1.1 A point estimator is any function $W(X_1, \dots, X_n)$ of a sample; that is, any statistic is a point estimator.

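## 7.2 Methods of Finding Estimators

### 7.2.1 Method of Moments

### 7.2.2 Maximum Likelihood Estimators

### 7.2.3 Bayes Estimators

### 7.2.4 The EM Algorithm

## 7.3 Methods of Evaluating Estimators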

### 7.3.1 Mean Squared Error

Definition 7.3.1 The mean squared error of an estimator $W$ of a parameter $\theta$ is the function of $\theta$ defined by $\mathrm{E}_\theta(W - \theta)^2$.
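Since $\mathrm{E}_\theta(W - \theta)^2 = \mathrm{Var}_\theta W + (\mathrm{E}_\theta W - \theta)^2$, a biased estimator can have smaller MSE than an unbiased one. For a normal sample, the unbiased $S^2$ has MSE $2\sigma^4/(n-1)$, while $\hat\sigma^2 = \frac{n-1}{n}S^2$ has MSE $\frac{2n-1}{n^2}\sigma^4$, which is smaller. A simulation sketch comparing both to those formulas ($\sigma^2 = 4$, $n = 10$, and the seed are arbitrary):

```python
import random
import statistics

def mse(estimates, target):
    # Monte Carlo estimate of E(W - theta)^2
    return statistics.fmean((e - target) ** 2 for e in estimates)

def compare_variance_estimators(sigma2, n, reps, seed=7):
    rng = random.Random(seed)
    s2, mle = [], []
    for _ in range(reps):
        x = [rng.gauss(0.0, sigma2 ** 0.5) for _ in range(n)]
        s2.append(statistics.variance(x))    # divisor n - 1 (unbiased)
        mle.append(statistics.pvariance(x))  # divisor n (biased, the normal MLE)
    return mse(s2, sigma2), mse(mle, sigma2)

sigma2, n = 4.0, 10
sim_s2, sim_mle = compare_variance_estimators(sigma2, n, reps=20_000)
exact_s2 = 2 * sigma2**2 / (n - 1)          # about 3.56
exact_mle = (2 * n - 1) * sigma2**2 / n**2  # 3.04
print(round(sim_s2, 2), round(exact_s2, 2), round(sim_mle, 2), round(exact_mle, 2))
```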

### 7.3.2 Best Unbiased Estimators

Definition 7.3.7 An estimator $W^*$ is a best unbiased estimator of $\tau(\theta)$ if it satisfies $\mathrm{E}_\theta W^* = \tau(\theta)$ for all $\theta$ and, for any other estimator $W$ with $\mathrm{E}_\theta W = \tau(\theta)$, we have $\mathrm{Var}_\theta W^* \le \mathrm{Var}_\theta W$ for all $\theta$. $W^*$ is also called a uniform minimum variance unbiased estimator (UMVUE) of $\tau(\theta)$.

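Theorem 7.3.9 (Cramer-Rao Inequality): Let $X_1, \dots, X_n$ be a sample with joint pdf $f(x | \theta)$, and let $W(X) = W(X_1, \dots, X_n)$ be any estimator satisfying

$$\frac{d}{d\theta}\mathrm{E}_\theta W(X) = \int \frac{\partial}{\partial\theta}\left[W(x)f(x|\theta)\right] dx$$

and

$$\mathrm{Var}_\theta W(X) < \infty.$$

Then

$$\mathrm{Var}_\theta(W(X)) \ge \frac{\left(\frac{d}{d\theta}\mathrm{E}_\theta W(X)\right)^2}{\mathrm{E}_\theta\left(\left(\frac{\partial}{\partial\theta}\log f(X|\theta)\right)^2\right)}.$$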

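Corollary 7.3.10 (Cramer-Rao Inequality, iid case) If the assumptions of Theorem 7.3.9 are satisfied and, additionally, $X_1, \dots, X_n$ are iid with pdf $f(x|\theta)$, then

$$\mathrm{Var}_\theta W(X) \ge \frac{\left(\frac{d}{d\theta}\mathrm{E}_\theta W(X)\right)^2}{n\,\mathrm{E}_\theta\left(\left(\frac{\partial}{\partial\theta}\log f(X|\theta)\right)^2\right)}.$$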

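Lemma 7.3.11 If $f(x|\theta)$ satisfies

$$\frac{d}{d\theta}\mathrm{E}_\theta\left(\frac{\partial}{\partial\theta}\log f(X|\theta)\right) = \int \frac{\partial}{\partial\theta}\left[\left(\frac{\partial}{\partial\theta}\log f(x|\theta)\right)f(x|\theta)\right] dx$$

(true for an exponential family), then

$$\mathrm{E}_\theta\left(\left(\frac{\partial}{\partial\theta}\log f(X|\theta)\right)^2\right) = -\mathrm{E}_\theta\left(\frac{\partial^2}{\partial\theta^2}\log f(X|\theta)\right).$$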
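For a single Poisson($\lambda$) observation the score is $x/\lambda - 1$ and the second derivative of $\log f$ is $-x/\lambda^2$, so both sides of Lemma 7.3.11 equal $1/\lambda$. The sketch sums both expectations over a truncated support, then forms the iid Cramer-Rao bound $\lambda/n$ for unbiased estimators of $\lambda$; the Poisson model, $\lambda = 3$, $n = 20$, and the truncation point are assumptions.

```python
import math

def poisson_pmf(x, lam):
    # Computed on the log scale to avoid overflow in lam**x and x!
    return math.exp(-lam + x * math.log(lam) - math.lgamma(x + 1))

def fisher_information_two_ways(lam, x_max=150):
    """Return E[(d/dlam log f)^2] and -E[d^2/dlam^2 log f] for one Poisson(lam) observation."""
    score_sq = neg_second = 0.0
    for x in range(x_max + 1):   # support truncated at x_max; the tail is negligible here
        p = poisson_pmf(x, lam)
        score = x / lam - 1.0    # d/dlam log f(x|lam)
        second = -x / lam**2     # d^2/dlam^2 log f(x|lam)
        score_sq += p * score**2
        neg_second -= p * second
    return score_sq, neg_second

lam, n = 3.0, 20
info_score, info_curv = fisher_information_two_ways(lam)
crlb = 1.0 / (n * info_score)    # Corollary 7.3.10 with tau(lam) = lam, so the numerator is 1
print(round(info_score, 6), round(info_curv, 6), round(crlb, 6))  # 0.333333 0.333333 0.15
```

Since $\mathrm{Var}\,\bar{X} = \lambda/n$, the sample mean attains this bound, in line with Corollary 7.3.15: the score of the full sample is $\frac{n}{\lambda}(\bar{x} - \lambda)$, which has the form $a(\lambda)[W(x) - \tau(\lambda)]$.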

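Corollary 7.3.15 (Attainment) Let $X_1, \dots, X_n$ be iid $f(x|\theta)$, where $f(x|\theta)$ satisfies the conditions of the Cramer-Rao Theorem. Let $L(\theta|x) = \prod_{i=1}^n f(x_i|\theta)$ denote the likelihood function. If $W(X) = W(X_1, \dots, X_n)$ is any unbiased estimator of $\tau(\theta)$, then $W(X)$ attains the Cramer-Rao Lower Bound if and only if

$$a(\theta)[W(x) - \tau(\theta)] = \frac{\partial}{\partial\theta}\log L(\theta|x)$$

for some function $a(\theta)$.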

### 7.3.3 Sufficiency and Unbiasedness

Theorem 7.3.17 (Rao-Blackwell) Let W be any unbiased estimator of $\tau(\theta)$, and let T be a sufficient statistic for $\theta$. Define $\phi(T) = \mathrm{E}(W|T)$. Then $\mathrm{E}_\theta \phi(T) = \tau(\theta)$ and $\mathrm{Var}_\theta \phi(T) \le \mathrm{Var}_\theta W$ for all $\theta$; that is, $\phi(T)$ is a uniformly better unbiased estimator of $\tau(\theta)$.
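A simulation sketch of Rao-Blackwellization, assuming an iid Bernoulli($p$) sample: $W = X_1$ is unbiased for $p$, $T = \sum X_i$ is sufficient, and by symmetry $\phi(T) = \mathrm{E}(X_1 | T) = T/n$. Both should average to $p$, but the variance drops from $p(1-p)$ to $p(1-p)/n$. The values $p = 0.3$, $n = 10$ and the seed are arbitrary.

```python
import random
import statistics

def rao_blackwell_demo(p, n, reps, seed=8):
    rng = random.Random(seed)
    w_vals, phi_vals = [], []
    for _ in range(reps):
        x = [1 if rng.random() < p else 0 for _ in range(n)]
        w_vals.append(x[0])          # W = X_1: unbiased for p but crude
        phi_vals.append(sum(x) / n)  # phi(T) = E(X_1 | T) = T/n with T = sum(X_i)
    return (statistics.fmean(w_vals), statistics.pvariance(w_vals),
            statistics.fmean(phi_vals), statistics.pvariance(phi_vals))

mean_w, var_w, mean_phi, var_phi = rao_blackwell_demo(p=0.3, n=10, reps=20_000)
print(round(mean_w, 3), round(var_w, 3), round(mean_phi, 3), round(var_phi, 4))
# theory: both means 0.3; Var W = p(1-p) = 0.21, Var phi(T) = p(1-p)/n = 0.021
```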

Theorem 7.3.19 If W is a best unbiased estimator of $\tau(\theta)$, then W is unique.

Theorem 7.3.20 If $\mathrm{E}_\theta W = \tau(\theta)$, W is the best unbiased estimator of $\tau(\theta)$ if and only if W is uncorrelated with all unbiased estimators of 0.

Theorem 7.3.23 Let T be a complete sufficient statistic for a parameter $\theta$, and let $\phi(T)$ be any estimator based only on T. Then $\phi(T)$ is the unique best unbiased estimator of its expected value.

### 7.3.4 Loss Function Optimality

# 8 Hypothesis Testing

## 8.1 Introduction

Definition 8.1.1 A hypothesis is a statement about a population parameter.

Definition 8.1.2 The two complementary hypotheses in a hypothesis testing problem are called the null hypothesis and the alternative hypothesis. They are denoted by $H_0$ and $H_1$, respectively.

Definition 8.1.3 A hypothesis testing procedure or hypothesis test is a rule that specifies:

• For which sample values the decision is made to accept $H_0$ as true.
• For which sample values $H_0$ is rejected and $H_1$ is accepted as true.

The subset of the sample space for which $H_0$ will be rejected is called the rejection region or critical region. The complement of the rejection region is called the acceptance region.

## 8.2 Methods of Finding Tests

### 8.2.1 Likelihood Ratio Tests

Definition 8.2.1 The likelihood ratio test statistic for testing $H_0 : \theta \in \Theta_0$ versus $H_1 : \theta \in \Theta_0^c$ is

$$\lambda(x) = \frac{\sup_{\Theta_0} L(\theta|x)}{\sup_{\Theta} L(\theta|x)}.$$

A likelihood ratio test (LRT) is any test that has a rejection region of the form $\{x : \lambda(x) \le c\}$, where $c$ is any number satisfying $0 \le c \le 1$.

Example 8.2.2 (Normal LRT) Let $X_1, \dots, X_n$ be a random sample from a $n(\theta, 1)$ population. Consider testing $H_0 : \theta = \theta_0$ versus $H_1 : \theta \neq \theta_0$.
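Here the unrestricted MLE is $\bar{x}$, and the identity $\sum (x_i - \theta_0)^2 = \sum (x_i - \bar{x})^2 + n(\bar{x} - \theta_0)^2$ gives $\lambda(x) = \exp[-n(\bar{x} - \theta_0)^2/2]$, so $\{\lambda(x) \le c\}$ is the same as $\{|\bar{x} - \theta_0| \ge \sqrt{-2(\log c)/n}\}$. A sketch that computes $\lambda(x)$ directly from the likelihood and compares it with the closed form; the simulated data ($\theta = 0.4$, $n = 15$) are an arbitrary illustration.

```python
import math
import random

def normal_loglik(theta, xs):
    # log L(theta | x) for n(theta, 1) data, dropping the constant -(n/2) log(2 pi)
    return -0.5 * sum((x - theta) ** 2 for x in xs)

def lrt_statistic(xs, theta0):
    """lambda(x) = L(theta0 | x) / sup_theta L(theta | x); the sup is attained at xbar."""
    xbar = sum(xs) / len(xs)
    return math.exp(normal_loglik(theta0, xs) - normal_loglik(xbar, xs))

rng = random.Random(9)
xs = [rng.gauss(0.4, 1.0) for _ in range(15)]
theta0 = 0.0
n, xbar = len(xs), sum(xs) / len(xs)
print(lrt_statistic(xs, theta0), math.exp(-n * (xbar - theta0) ** 2 / 2))  # the two values agree
```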

### 8.2.2 Bayesian Tests

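prior distribution -> posterior distribution: the test is based on the posterior, for example accepting $H_0$ if $P(\theta \in \Theta_0 | x) \ge P(\theta \in \Theta_0^c | x)$ and rejecting it otherwise.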

### 8.2.3 Union-Intersection and Intersection-Union Tests

Example 8.2.9 (Acceptance sampling)

## 8.3 Methods of Evaluating Tests

### 8.3.1 Error Probabilities and the Power Function

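Type I Error and Type II Error

|             | Accept $H_0$     | Reject $H_0$     |
|-------------|------------------|------------------|
| Truth $H_0$ | Correct decision | Type I Error     |
| Truth $H_1$ | Type II Error    | Correct decision |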

Definition 8.3.1 The power function of a hypothesis test with rejection region $R$ is the function of $\theta$ defined by $\beta(\theta) = P_\theta(X \in R)$.

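Example 8.3.3 (Normal power function) Let $X_1, \dots, X_n$ be a random sample from a $n(\theta, \sigma^2)$ population, $\sigma^2$ known. An LRT of $H_0 : \theta \le \theta_0$ versus $H_1 : \theta > \theta_0$ is a test that rejects $H_0$ if $(\bar{X} - \theta_0) / (\sigma / \sqrt{n}) > c$. The constant $c$ can be any positive number. The power function of this test is

$$\beta(\theta) = P_\theta\left(\frac{\bar{X} - \theta_0}{\sigma/\sqrt{n}} > c\right) = P_\theta\left(\frac{\bar{X} - \theta}{\sigma/\sqrt{n}} > c + \frac{\theta_0 - \theta}{\sigma/\sqrt{n}}\right) = P\left(Z > c + \frac{\theta_0 - \theta}{\sigma/\sqrt{n}}\right),$$

where $Z$ is a standard normal random variable.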

Example 8.3.4 (Continuation of Example 8.3.3)

Suppose that we want a maximum Type I Error probability of $0.1$ and a maximum Type II Error probability of $0.2$ if $\theta \ge \theta_0 + \sigma$. How should $c$ and $n$ be chosen to achieve these goals?

Goals: $\beta(\theta_0) = 0.1$ and $\beta(\theta_0 + \sigma) = 0.8$. Because $\beta(\theta)$ is increasing in $\theta$, these two conditions control both error probabilities.

• By choosing $c = 1.28$, we achieve $\beta(\theta_0) = P(Z > 1.28) \approx 0.1$.
• Since $\beta(\theta_0 + \sigma) = P(Z > 1.28 - \sqrt{n})$, we need $1.28 - \sqrt{n} = -0.84$, that is, $n = 4.49$. Rounding up to $n = 5$ gives $\beta(\theta_0 + \sigma) = P(Z > 1.28 - \sqrt{5}) \approx 0.83 \ge 0.8$.
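The same calculation can be done with exact normal quantiles. A sketch using Python's `statistics.NormalDist`; `power` is a hypothetical helper implementing the power function of Example 8.3.3, and $\theta_0 = 0$, $\sigma = 1$ are arbitrary since only $(\theta - \theta_0)/\sigma$ matters.

```python
import math
from statistics import NormalDist

Z = NormalDist()  # standard normal

def power(theta, theta0, sigma, n, c):
    """beta(theta) = P(Z > c + (theta0 - theta) / (sigma / sqrt(n)))."""
    return 1 - Z.cdf(c + (theta0 - theta) / (sigma / math.sqrt(n)))

alpha, target_power = 0.1, 0.8
c = Z.inv_cdf(1 - alpha)                              # about 1.2816
n_exact = (c - Z.inv_cdf(1 - target_power)) ** 2      # solve c - sqrt(n) = z with P(Z > z) = 0.8
n = math.ceil(n_exact)                                # sample sizes must be integers
theta0, sigma = 0.0, 1.0
print(round(c, 4), round(n_exact, 2), n)
print(round(power(theta0, theta0, sigma, n, c), 3), round(power(theta0 + sigma, theta0, sigma, n, c), 3))
```

With unrounded quantiles $n$ comes out slightly above 4.49, but it still rounds up to 5.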

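Definition 8.3.5 For $0 \le \alpha \le 1$, a test with power function $\beta(\theta)$ is a size $\alpha$ test if $\sup_{\theta \in \Theta_0} \beta(\theta) = \alpha$.

Definition 8.3.6 For $0 \le \alpha \le 1$, a test with power function $\beta(\theta)$ is a level $\alpha$ test if $\sup_{\theta \in \Theta_0} \beta(\theta) \le \alpha$.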

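Definition 8.3.9 A test with power function $\beta(\theta)$ is unbiased if $\beta(\theta') \ge \beta(\theta'')$ for every $\theta' \in \Theta_0^c$ and $\theta'' \in \Theta_0$.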

### 8.3.2 Most Powerful Tests

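Definition 8.3.11 Let $\mathcal{C}$ be a class of tests for testing $H_0 : \theta \in \Theta_0$ versus $H_1 : \theta \in \Theta_0^c$. A test in class $\mathcal{C}$, with power function $\beta(\theta)$, is a uniformly most powerful (UMP) class $\mathcal{C}$ test if $\beta(\theta) \ge \beta'(\theta)$ for every $\theta \in \Theta_0^c$ and every $\beta'(\theta)$ that is a power function of a test in class $\mathcal{C}$.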

Theorem 8.3.12 (Neyman-Pearson Lemma) Consider testing $H_0 : \theta = \theta_0$ versus $H_1 : \theta = \theta_1$, where the pdf or pmf corresponding to $\theta_i$ is $f(x|\theta_i)$, $i = 0, 1$, using a test with rejection region $R$ that satisfies

$$x \in R \ \text{ if } \ f(x|\theta_1) > k f(x|\theta_0)$$

and

$$x \in R^c \ \text{ if } \ f(x|\theta_1) < k f(x|\theta_0),$$

for some $k \ge 0$, and $\alpha = P_{\theta_0}(X \in R).$

Then,

• (Sufficiency) Any test that satisfies these conditions is a UMP level $\alpha$ test.
• (Necessity) If there exists a test satisfying these conditions with $k > 0$, then every UMP level $\alpha$ test is a size $\alpha$ test and every UMP level $\alpha$ test satisfies the two rejection-region conditions above, except perhaps on a set $A$ satisfying $P_{\theta_0}(X \in A) = P_{\theta_1}(X \in A) = 0$.

Example 8.3.14 (UMP binomial test) Let $X \sim binomial(2, \theta)$. We want to test $H_0 : \theta = \frac{1}{2}$ versus $H_1 : \theta = \frac{3}{4}$.
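The Neyman-Pearson rejection regions here come from the likelihood ratios $f(x|3/4)/f(x|1/2)$, which are $1/4$, $3/4$, $9/4$ for $x = 0, 1, 2$. Taking $3/4 < k < 9/4$ gives $R = \{2\}$ with size $1/4$, and $1/4 < k < 3/4$ gives $R = \{1, 2\}$ with size $3/4$; each is UMP at its level. A sketch with exact fractions; `np_test` is a hypothetical helper and the $k$ values are chosen to avoid ties with the ratios.

```python
from fractions import Fraction
from math import comb

def binom_pmf(x, m, theta):
    return comb(m, x) * theta**x * (1 - theta) ** (m - x)

theta0, theta1, m = Fraction(1, 2), Fraction(3, 4), 2
ratios = {x: binom_pmf(x, m, theta1) / binom_pmf(x, m, theta0) for x in range(m + 1)}
print(ratios)  # {0: Fraction(1, 4), 1: Fraction(3, 4), 2: Fraction(9, 4)}

def np_test(k):
    """Rejection region R = {x : f(x|theta1) > k f(x|theta0)} and its size alpha = P_theta0(X in R)."""
    region = [x for x in range(m + 1) if ratios[x] > k]
    return region, sum((binom_pmf(x, m, theta0) for x in region), Fraction(0))

for k in (Fraction(1, 8), Fraction(1, 2), Fraction(1), Fraction(3)):
    print(k, *np_test(k))
```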

Example 8.3.15 (UMP normal test)

### 8.3.4 p-Values

Definition 8.3.26 A p-value $p(X)$ is a test statistic satisfying $0 \le p(x) \le 1$ for every sample point $x$. Small values of $p(X)$ give evidence that $H_1$ is true. A p-value is valid if, for every $\theta \in \Theta_0$ and every $0 \le \alpha \le 1$,

$$P_\theta(p(X) \le \alpha) \le \alpha.$$
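Validity can be checked by simulation in the one-sided normal setting of Example 8.3.3, assuming the p-value $p(x) = P(Z \ge \sqrt{n}(\bar{x} - \theta_0)/\sigma)$. At $\theta = \theta_0$ the rejection rate $P_\theta(p(X) \le \alpha)$ should be about $\alpha$; at $\theta < \theta_0$ it should be smaller. The helper names, $n = 10$, $\theta = -0.5$, and the seed are illustrative choices.

```python
import math
import random
from statistics import NormalDist

def p_value(xs, theta0, sigma):
    """p(x) = P(Z >= sqrt(n)(xbar - theta0)/sigma) for H0: theta <= theta0 vs H1: theta > theta0."""
    n = len(xs)
    z = math.sqrt(n) * (sum(xs) / n - theta0) / sigma
    return 1 - NormalDist().cdf(z)

def rejection_rate(alpha, theta, theta0=0.0, sigma=1.0, n=10, reps=10_000, seed=10):
    # Monte Carlo estimate of P_theta(p(X) <= alpha)
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        if p_value([rng.gauss(theta, sigma) for _ in range(n)], theta0, sigma) <= alpha:
            hits += 1
    return hits / reps

results = {alpha: (rejection_rate(alpha, theta=0.0), rejection_rate(alpha, theta=-0.5))
           for alpha in (0.01, 0.05, 0.1)}
print(results)  # first entry near alpha, second entry well below alpha
```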