# Statistical Inference 5~8

Date: 2016-01-11 11:51:00 | Categories: Math


# 5 Properties of a Random Sample

## 5.1 Basic Concepts of Random Samples

## 5.2 Sums of Random Variables from a Random Sample

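**Lemma 5.2.5**: Let $X_1, \dots, X_n$ be a random sample from a population and let $g(x)$ be a function such that $\mathrm{E}g(X_1)$ and $\mathrm{Var}\,g(X_1)$ exist. Then
$$\mathrm{E}\left(\sum_{i=1}^n g(X_i)\right) = n\,\mathrm{E}g(X_1)$$
and
$$\mathrm{Var}\left(\sum_{i=1}^n g(X_i)\right) = n\,\mathrm{Var}\,g(X_1).$$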

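**Theorem 5.2.6**: Let $X_1, \dots, X_n$ be a random sample from a population with mean $\mu$ and variance $\sigma^2 < \infty$, and let $\bar{X}$ and $S^2$ be the sample mean and sample variance. Then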

- $\mathrm{E}\bar{X} = \mu$,
- $\mathrm{Var}\bar{X} = \sigma^2/n$,
- $\mathrm{E}S^2 = \sigma^2$.
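The three identities of Theorem 5.2.6 can be checked exactly on a small case. This is a minimal sketch that assumes a hypothetical three-point population: values 0, 1, 4 with probabilities 1/2, 1/3, 1/6, so $\mu = 1$ and $\sigma^2 = 2$. It enumerates every sample of size $n$ using exact rational arithmetic. The names `POP`, `population_moments`, and `sample_moments` are illustrative only.

```python
from fractions import Fraction
from itertools import product

# Hypothetical discrete population: value -> exact probability.
POP = {0: Fraction(1, 2), 1: Fraction(1, 3), 4: Fraction(1, 6)}

def population_moments(pop):
    """Return (mu, sigma^2) of the population."""
    mu = sum(x * p for x, p in pop.items())
    var = sum((x - mu) ** 2 * p for x, p in pop.items())
    return mu, var

def sample_moments(pop, n):
    """Exact E[Xbar], Var[Xbar], E[S^2] by enumerating all n-tuples of iid draws."""
    e_mean = e_mean_sq = e_s2 = Fraction(0)
    for sample in product(pop, repeat=n):
        prob = Fraction(1)
        for x in sample:
            prob *= pop[x]
        xbar = Fraction(sum(sample), n)
        s2 = sum((x - xbar) ** 2 for x in sample) / (n - 1)  # divisor n - 1
        e_mean += prob * xbar
        e_mean_sq += prob * xbar ** 2
        e_s2 += prob * s2
    return e_mean, e_mean_sq - e_mean ** 2, e_s2

mu, var = population_moments(POP)
print(mu, var)                 # 1 2
print(sample_moments(POP, 3))  # E Xbar = mu, Var Xbar = sigma^2/3, E S^2 = sigma^2
```

Because the arithmetic is exact, the comparison involves no simulation noise. Replacing the $n-1$ divisor with $n$ would make $\mathrm{E}S^2$ come out as $(n-1)\sigma^2/n$ instead.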

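**Theorem 5.2.7**: Let $X_1, \dots, X_n$ be a random sample from a population with mgf $M_X(t)$. Then the mgf of the sample mean is
$$M_{\bar{X}}(t) = \left[M_X(t/n)\right]^n.$$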

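**Example 5.2.8**: Let $X_1, \dots, X_n$ be a random sample from a $n(\mu, \sigma^2)$ population. Then the mgf of the sample mean is
$$M_{\bar{X}}(t) = \left[\exp\left(\mu\frac{t}{n} + \frac{\sigma^2 (t/n)^2}{2}\right)\right]^n = \exp\left(\mu t + \frac{(\sigma^2/n) t^2}{2}\right).$$

Thus, $\bar{X}$ has a $n(\mu, \sigma^2/n)$ distribution.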

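Another simple example is a $gamma(\alpha, \beta)$ random sample. Here too the distribution of the sample mean is easy to derive. The mgf of the sample mean is
$$M_{\bar{X}}(t) = \left[\left(\frac{1}{1 - \beta(t/n)}\right)^\alpha\right]^n = \left(\frac{1}{1 - (\beta/n)t}\right)^{n\alpha},$$

which we recognize as the mgf of a $gamma(n\alpha, \beta/n)$ distribution. This is therefore the distribution of $\bar{X}$.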
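The gamma result can be spot-checked by simulation. This sketch assumes illustrative values $\alpha = 2$, $\beta = 3$, $n = 5$. It relies on Python's `random.gammavariate(alpha, beta)`, which uses the same scale parameterization as $gamma(\alpha, \beta)$ here. The check compares only the first two moments of $\bar{X}$ with those of $gamma(n\alpha, \beta/n)$, not the whole distribution. `simulate_gamma_means` is a hypothetical helper.

```python
import random
import statistics

def simulate_gamma_means(alpha, beta, n, reps, seed=0):
    """Sample means of n iid gamma(alpha, beta) draws (beta is the scale), repeated reps times."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.gammavariate(alpha, beta) for _ in range(n))
            for _ in range(reps)]

alpha, beta, n = 2.0, 3.0, 5
means = simulate_gamma_means(alpha, beta, n, reps=20000)
# gamma(n*alpha, beta/n) has mean (n*alpha)(beta/n) = alpha*beta
# and variance (n*alpha)(beta/n)**2 = alpha*beta**2/n.
print(statistics.fmean(means), alpha * beta)
print(statistics.variance(means), alpha * beta ** 2 / n)
```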

**Theorem 5.2.9** If $X$ and $Y$ are independent continuous random variables with pdfs $f_X(x)$ and $f_Y(y)$, then the pdf of $Z = X + Y$ is
$$f_Z(z) = \int_{-\infty}^{\infty} f_X(w) f_Y(z - w)\,dw.$$
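The convolution formula $f_Z(z) = \int f_X(w) f_Y(z - w)\,dw$ can be evaluated numerically. This sketch assumes the example of two independent uniform(0, 1) variables, whose sum has the triangular density $f_Z(z) = z$ on $[0, 1]$ and $2 - z$ on $[1, 2]$. It approximates the integral with a simple midpoint rule over the support of $f_X$. Both helper names are illustrative.

```python
def uniform_pdf(x):
    """pdf of uniform(0, 1)."""
    return 1.0 if 0.0 <= x <= 1.0 else 0.0

def convolve_pdfs(f_x, f_y, z, lo, hi, steps=10_000):
    """Midpoint-rule approximation of f_Z(z) = integral of f_X(w) f_Y(z - w) dw over [lo, hi]."""
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        w = lo + (i + 0.5) * h
        total += f_x(w) * f_y(z - w)
    return total * h

for z in (0.25, 0.5, 1.0, 1.5):
    # [0, 1] is the support of f_X, so integrating there covers all nonzero terms.
    print(z, convolve_pdfs(uniform_pdf, uniform_pdf, z, 0.0, 1.0))
```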

**Example 5.2.10 (Sum of Cauchy random variables)**
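The conclusion of this example is that the mean of $n$ iid Cauchy(0, 1) variables is again Cauchy(0, 1), so averaging does not concentrate the distribution. The sketch below assumes standard Cauchy draws generated by the inverse-cdf method. It tracks $P(|\bar{Z}| \le 1)$, which equals $1/2$ for a standard Cauchy and should stay near $1/2$ for every $n$. The helper names are illustrative.

```python
import math
import random

def cauchy(rng):
    # Inverse-cdf method: tan(pi * (U - 1/2)) is standard Cauchy for U ~ uniform(0, 1).
    return math.tan(math.pi * (rng.random() - 0.5))

def frac_within_one(n, reps=10000, seed=1):
    """Fraction of simulated means of n standard Cauchy draws with |mean| <= 1."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        mean = sum(cauchy(rng) for _ in range(n)) / n
        hits += (abs(mean) <= 1.0)
    return hits / reps

for n in (1, 10, 100):
    print(n, frac_within_one(n))  # stays near 0.5 instead of growing toward 1
```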

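**Theorem 5.2.11** Suppose $X_1, \dots, X_n$ is a random sample from a pdf or pmf $f(x|\theta)$, where
$$f(x|\theta) = h(x)c(\theta)\exp\left(\sum_{i=1}^k w_i(\theta) t_i(x)\right)$$
is a member of an exponential family. Define statistics $T_1, \dots, T_k$ by
$$T_i(X_1, \dots, X_n) = \sum_{j=1}^n t_i(X_j), \quad i = 1, \dots, k.$$
If the set $\{(w_1(\theta), w_2(\theta), \dots, w_k(\theta)) : \theta \in \Theta\}$ contains an open subset of $R^k$, then the distribution of $(T_1, \dots, T_k)$ is an exponential family of the form
$$f_T(u_1, \dots, u_k|\theta) = H(u_1, \dots, u_k)\left[c(\theta)\right]^n \exp\left(\sum_{i=1}^k w_i(\theta) u_i\right).$$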

**Example 5.2.12 (Sum of Bernoulli random variables)**
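The Bernoulli($p$) pmf $p^x(1-p)^{1-x} = (1-p)\exp\left(x \log\frac{p}{1-p}\right)$ is an exponential family with $t(x) = x$. By Theorem 5.2.11, $T = \sum X_i$ therefore has an exponential-family distribution, namely binomial($n$, $p$). This minimal sketch builds the pmf of the sum by repeated discrete convolution and compares it exactly with the binomial pmf. The value $p = 3/10$ and the helper names are assumptions for illustration.

```python
from fractions import Fraction
from math import comb

def pmf_of_sum(p, n):
    """pmf of X_1 + ... + X_n for iid Bernoulli(p), by repeated discrete convolution."""
    pmf = [Fraction(1)]  # empty sum: P(0) = 1
    for _ in range(n):
        new = [Fraction(0)] * (len(pmf) + 1)
        for k, q in enumerate(pmf):
            new[k] += q * (1 - p)  # next X_i = 0
            new[k + 1] += q * p    # next X_i = 1
        pmf = new
    return pmf

def binomial_pmf(p, n):
    return [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]

p = Fraction(3, 10)
print(pmf_of_sum(p, 6) == binomial_pmf(p, 6))  # True
```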

## 5.3 Sampling from the Normal Distribution

### 5.3.1 Properties of the Sample Mean and Variance

**Theorem 5.3.1**: Let $X_1, \dots, X_n$ be a random sample from a $n(\mu, \sigma^2)$ distribution, and let $\bar{X}$ and $S^2$ be the sample mean and sample variance. Then

- $\bar{X}$ and $S^2$ are independent random variables,
- $\bar{X}$ has a $n(\mu, \sigma^2/n)$ distribution,
- $(n-1)S^2/\sigma^2$ has a chi squared distribution with $n - 1$ degrees of freedom.

**Lemma 5.3.2**:

- If $Z$ is a $n(0,1)$ random variable, then $Z^2 \thicksim \chi_1^2$.
- If $X_1, \dots, X_n$ are independent and $X_i \thicksim \chi_{p_i}^2$, then $X_1 + \dots + X_n \thicksim \chi_{p_1 + \dots + p_n}^2$.
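The third part of Theorem 5.3.1 can be spot-checked by simulation. A $\chi^2_p$ variable has mean $p$ and variance $2p$, so $(n-1)S^2/\sigma^2$ should have mean $n-1$ and variance $2(n-1)$. This sketch assumes illustrative values $\mu = 10$, $\sigma = 2$, $n = 6$. It checks only these two moments, not the full chi-squared shape or the independence of $\bar{X}$ and $S^2$.

```python
import random
import statistics

def scaled_sample_variances(mu, sigma, n, reps, seed=2):
    """Simulated values of (n - 1) S^2 / sigma^2 for n(mu, sigma^2) samples of size n."""
    rng = random.Random(seed)
    out = []
    for _ in range(reps):
        xs = [rng.gauss(mu, sigma) for _ in range(n)]
        out.append((n - 1) * statistics.variance(xs) / sigma**2)  # variance() divides by n - 1
    return out

n = 6
vals = scaled_sample_variances(mu=10.0, sigma=2.0, n=n, reps=20000)
# chi-squared with n - 1 = 5 degrees of freedom: mean 5, variance 10.
print(statistics.fmean(vals), statistics.variance(vals))
```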

## 5.4 Order Statistics

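**Theorem 5.4.3**: Let $X_1, \dots, X_n$ be a random sample from a discrete distribution with pmf $f_X(x_i) = p_i$, where $x_1 < x_2 < \cdots$ are the possible values of $X$ in ascending order. Define $P_0 = 0$, $P_1 = p_1$, $P_2 = p_1 + p_2$, $\dots$, $P_i = p_1 + p_2 + \dots + p_i$, $\dots$. Let $X_{(1)}, \dots, X_{(n)}$ denote the order statistics from the sample. Then
$$P(X_{(j)} \le x_i) = \sum_{k=j}^n \binom{n}{k} P_i^k (1 - P_i)^{n-k}$$
and
$$P(X_{(j)} = x_i) = \sum_{k=j}^n \binom{n}{k} \left[P_i^k (1 - P_i)^{n-k} - P_{i-1}^k (1 - P_{i-1})^{n-k}\right].$$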

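**Theorem 5.4.4**: Let $X_{(1)}, \dots, X_{(n)}$ denote the order statistics of a random sample $X_1, \dots, X_n$ from a continuous population with cdf $F_X(x)$ and pdf $f_X(x)$. Then the pdf of $X_{(j)}$ is
$$f_{X_{(j)}}(x) = \frac{n!}{(j-1)!\,(n-j)!} f_X(x) \left[F_X(x)\right]^{j-1} \left[1 - F_X(x)\right]^{n-j}.$$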
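As a sanity check of the order-statistic density $\frac{n!}{(j-1)!(n-j)!} f_X(x) F_X(x)^{j-1} [1 - F_X(x)]^{n-j}$, take a uniform(0, 1) population. The density then becomes the beta($j$, $n-j+1$) density, which integrates to 1 and has mean $j/(n+1)$. This sketch assumes $n = 5$, $j = 2$ and uses a midpoint-rule integrator. All helper names are illustrative.

```python
from math import factorial

def order_stat_pdf(x, j, n, pdf, cdf):
    """pdf of the j-th order statistic X_(j) of n iid draws from a continuous population."""
    const = factorial(n) / (factorial(j - 1) * factorial(n - j))
    return const * pdf(x) * cdf(x) ** (j - 1) * (1 - cdf(x)) ** (n - j)

def unif_pdf(x):
    return 1.0 if 0.0 <= x <= 1.0 else 0.0

def unif_cdf(x):
    return min(max(x, 0.0), 1.0)

def integrate(f, lo, hi, steps=20_000):
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    h = (hi - lo) / steps
    return h * sum(f(lo + (i + 0.5) * h) for i in range(steps))

n, j = 5, 2

def pdf_j(x):
    return order_stat_pdf(x, j, n, unif_pdf, unif_cdf)

total = integrate(pdf_j, 0.0, 1.0)
mean = integrate(lambda x: x * pdf_j(x), 0.0, 1.0)
print(total, mean)  # approximately 1 and j/(n + 1) = 1/3
```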

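**Theorem 5.4.6**: Let $X_{(1)}, \dots, X_{(n)}$ denote the order statistics of a random sample $X_1, \dots, X_n$ from a continuous population with cdf $F_X(x)$ and pdf $f_X(x)$. Then the joint pdf of $X_{(i)}$ and $X_{(j)}$, $1 \le i < j \le n$, is
$$f_{X_{(i)}, X_{(j)}}(u, v) = \frac{n!}{(i-1)!\,(j-1-i)!\,(n-j)!} f_X(u) f_X(v) \left[F_X(u)\right]^{i-1} \left[F_X(v) - F_X(u)\right]^{j-1-i} \left[1 - F_X(v)\right]^{n-j}$$
for $-\infty < u < v < \infty$.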

## 5.5 Convergence Concepts

### 5.5.1 Convergence in Probability

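**Definition 5.5.1**: A sequence of random variables $X_1, X_2, \dots$ converges in probability to a random variable $X$ if, for every $\epsilon > 0$,
$$\lim_{n \to \infty} P(|X_n - X| \ge \epsilon) = 0$$

or, equivalently,
$$\lim_{n \to \infty} P(|X_n - X| < \epsilon) = 1.$$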

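**Theorem 5.5.2 (Weak Law of Large Numbers)**: Let $X_1, X_2, \dots$ be iid random variables with $\mathrm{E}X_i = \mu$ and $\mathrm{Var}X_i = \sigma^2 < \infty$. Define $\bar{X}_n = (1/n)\sum_{i=1}^n X_i$. Then, for every $\epsilon > 0$,
$$\lim_{n \to \infty} P(|\bar{X}_n - \mu| < \epsilon) = 1;$$
that is, $\bar{X}_n$ converges in probability to $\mu$.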

### 5.5.2 Almost Sure Convergence

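**Definition 5.5.6**: A sequence of random variables $X_1, X_2, \dots$ converges almost surely to a random variable $X$ if, for every $\epsilon > 0$,
$$P\left(\lim_{n \to \infty} |X_n - X| < \epsilon\right) = 1.$$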

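**Theorem 5.5.9 (Strong Law of Large Numbers)**: Let $X_1, X_2, \dots$ be iid random variables with $\mathrm{E}X_i = \mu$ and $\mathrm{Var}X_i = \sigma^2 < \infty$, and define $\bar{X}_n = (1/n)\sum_{i=1}^n X_i$. Then, for every $\epsilon > 0$,
$$P\left(\lim_{n \to \infty} |\bar{X}_n - \mu| < \epsilon\right) = 1;$$
that is, $\bar{X}_n$ converges almost surely to $\mu$.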

### 5.5.3 Convergence in Distribution

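**Definition 5.5.10**: A sequence of random variables $X_1, X_2, \dots$ converges in distribution to a random variable $X$ if
$$\lim_{n \to \infty} F_{X_n}(x) = F_X(x)$$
at all points $x$ where $F_X(x)$ is continuous.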

**Theorem 5.5.14 (Central Limit Theorem)**: Let $X_1, X_2, \dots$ be a sequence of iid random variables whose mgfs exist in a neighborhood of 0 (that is, $M_{X_i}(t)$ exists for $|t| < h$, for some positive $h$). Let $\mathrm{E}X_i = \mu$ and $\mathrm{Var}X_i = \sigma^2 > 0$. (Both $\mu$ and $\sigma^2$ are finite since the mgf exists.) Define $\bar{X}_n = (1/n)\sum_{i=1}^n X_i$. Let $G_n(x)$ denote the cdf of $\sqrt{n}(\bar{X}_n-\mu)/\sigma$. Then, for any $x$, $-\infty < x < \infty$,
$$\lim_{n \to \infty} G_n(x) = \int_{-\infty}^x \frac{1}{\sqrt{2\pi}} e^{-y^2/2}\,dy;$$
that is, $\sqrt{n}(\bar{X}_n-\mu)/\sigma$ has a limiting standard normal distribution.
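A Monte Carlo sketch of the CLT follows. It assumes exponential(1) data, for which $\mu = \sigma = 1$ and the mgf exists for $|t| < 1$, so the theorem's hypotheses hold. The code estimates $G_n(1)$ and compares it with $\Phi(1)$ from `statistics.NormalDist`. The function name `clt_cdf` is illustrative, and the estimates carry simulation error of roughly $\pm 0.005$.

```python
import math
import random
from statistics import NormalDist

def clt_cdf(n, x, reps=10000, seed=3):
    """Monte Carlo estimate of G_n(x) = P(sqrt(n)(Xbar_n - mu)/sigma <= x) for exponential(1) data."""
    rng = random.Random(seed)
    mu = sigma = 1.0  # exponential(1) has mean 1 and variance 1
    hits = 0
    for _ in range(reps):
        xbar = sum(rng.expovariate(1.0) for _ in range(n)) / n
        hits += (math.sqrt(n) * (xbar - mu) / sigma <= x)
    return hits / reps

phi = NormalDist().cdf  # standard normal cdf
for n in (2, 30, 200):
    print(n, clt_cdf(n, 1.0), phi(1.0))
```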

### 5.5.4 The Delta Method

## 5.6 Generating a Random Sample

### 5.6.1 Direct Methods

### 5.6.2 Indirect Methods

### 5.6.3 The Accept/Reject Algorithm

### 5.6.(4) The MCMC methods

**Gibbs Sampler**

**Metropolis Algorithm**

# 6 Principles of Data Reduction

## 6.1 Introduction

Three principles of data reduction:

- The Sufficiency Principle
- The Likelihood Principle
- The Equivariance Principle

## 6.2 The Sufficiency Principle

### 6.2.1 Sufficient Statistics

**Definition 6.2.1**: A statistic $T(X)$ is a sufficient statistic for $\theta$ if the conditional distribution of the sample $X$ given the value of $T(X)$ does not depend on $\theta$.

**Theorem 6.2.2**: If $p(x | \theta)$ is the joint pdf or pmf of $X$ and $q(t | \theta)$ is the pdf or pmf of $T(X)$, then $T(X)$ is a sufficient statistic for $\theta$ if, for every $x$ in the sample space, the ratio $p(x | \theta)/q(T(x) | \theta)$ is constant as a function of $\theta$.
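Here is a concrete instance of the ratio criterion. For iid Bernoulli($\theta$) observations with $T(X) = \sum X_i \sim$ binomial($n$, $\theta$), the ratio $p(x|\theta)/q(T(x)|\theta)$ equals $1/\binom{n}{t}$ and does not involve $\theta$. This sketch checks that exactly for every $x \in \{0, 1\}^4$ at a few assumed values of $\theta$. The helper names are illustrative.

```python
from fractions import Fraction
from itertools import product
from math import comb

def joint_pmf(x, theta):
    # p(x|theta) for iid Bernoulli(theta) observations x = (x_1, ..., x_n)
    t = sum(x)
    return theta**t * (1 - theta)**(len(x) - t)

def pmf_of_T(t, n, theta):
    # q(t|theta) for T = X_1 + ... + X_n, which is binomial(n, theta)
    return comb(n, t) * theta**t * (1 - theta)**(n - t)

def ratio_is_constant_in_theta(n, thetas):
    """True if p(x|theta)/q(T(x)|theta) takes one value across thetas, for every sample x."""
    for x in product((0, 1), repeat=n):
        ratios = {joint_pmf(x, th) / pmf_of_T(sum(x), n, th) for th in thetas}
        if len(ratios) != 1:
            return False
    return True

print(ratio_is_constant_in_theta(4, [Fraction(1, 5), Fraction(1, 2), Fraction(9, 10)]))  # True
```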

**Theorem 6.2.6 (Factorization Theorem)**: Let $f(x | \theta)$ denote the joint pdf or pmf of a sample $X$. A statistic $T(X)$ is a sufficient statistic for $\theta$ if and only if there exist functions $g(t | \theta)$ and $h(x)$ such that, for all sample points $x$ and all parameter points $\theta$,
$$f(x | \theta) = g(T(x) | \theta)\,h(x).$$

### 6.2.2 Minimal Sufficient Statistics

### 6.2.3 Ancillary Statistics

**Definition 6.2.16**: A statistic $S(X)$ whose distribution does not depend on the parameter $\theta$ is called an ancillary statistic.

### 6.2.4 Sufficient, Ancillary, and Complete Statistics

**Example 6.2.20 (Ancillary precision)**

**Theorem 6.2.24 (Basu’s Theorem)**: If $T(X)$ is a complete and minimal sufficient statistic, then $T(X)$ is independent of every ancillary statistic.

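**Theorem 6.2.25 (Complete statistic in the exponential family)**: Let $X_1, \dots, X_n$ be iid observations from an exponential family with pdf or pmf of the form
$$f(x | \theta) = h(x)c(\theta)\exp\left(\sum_{j=1}^k w(\theta_j) t_j(x)\right),$$
where $\theta = (\theta_1, \dots, \theta_k)$. Then the statistic
$$T(X) = \left(\sum_{i=1}^n t_1(X_i), \sum_{i=1}^n t_2(X_i), \dots, \sum_{i=1}^n t_k(X_i)\right)$$
is complete as long as the parameter space $\Theta$ contains an open set in $R^k$.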

## 6.3 The Likelihood Principle

## 6.4 The Equivariance Principle

# 7 Point Estimation

## 7.1 Introduction

This chapter is divided into two parts. The first part deals with methods for finding estimators, and the second part deals with evaluating these estimators.

**Definition 7.1.1** A point estimator is any function $W(X_1, \dots, X_n)$ of a sample; that is, any statistic is a point estimator.

## 7.2 Methods of Finding Estimators

### 7.2.1 Method of Moments

### 7.2.2 Maximum Likelihood Estimators

### 7.2.3 Bayes Estimators

### 7.2.4 The EM Algorithm

## 7.3 Methods of Evaluating Estimators

### 7.3.1 Mean Squared Error

**Definition 7.3.1** The mean squared error of an estimator $W$ of a parameter $\theta$ is the function of $\theta$ defined by $\mathrm{E}_\theta(W - \theta)^2$.

### 7.3.2 Best Unbiased Estimators

**Definition 7.3.7** An estimator $W^*$ is a best unbiased estimator of $\tau(\theta)$ if it satisfies $\mathrm{E}_\theta W^* = \tau(\theta)$ for all $\theta$ and, for any other estimator $W$ with $\mathrm{E}_\theta W = \tau(\theta)$, we have $\mathrm{Var}_\theta W^* \le \mathrm{Var}_\theta W$ for all $\theta$. $W^*$ is also called a uniform minimum variance unbiased estimator (UMVUE) of $\tau(\theta)$.

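**Theorem 7.3.9 (Cramér-Rao Inequality)**: Let $X_1, \dots, X_n$ be a sample with pdf $f(x | \theta)$, and let $W(X) = W(X_1, \dots, X_n)$ be any estimator satisfying
$$\frac{d}{d\theta}\mathrm{E}_\theta W(X) = \int_{\mathcal{X}} \frac{\partial}{\partial\theta}\left[W(x) f(x | \theta)\right] dx$$
and
$$\mathrm{Var}_\theta W(X) < \infty.$$
Then
$$\mathrm{Var}_\theta\left(W(X)\right) \ge \frac{\left(\frac{d}{d\theta}\mathrm{E}_\theta W(X)\right)^2}{\mathrm{E}_\theta\left(\left(\frac{\partial}{\partial\theta}\log f(X | \theta)\right)^2\right)}.$$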

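**Corollary 7.3.10 (Cramér-Rao Inequality, iid case)** If the assumptions of Theorem 7.3.9 are satisfied and, additionally, if $X_1, \dots, X_n$ are iid with pdf $f(x|\theta)$, then
$$\mathrm{Var}_\theta\left(W(X)\right) \ge \frac{\left(\frac{d}{d\theta}\mathrm{E}_\theta W(X)\right)^2}{n\,\mathrm{E}_\theta\left(\left(\frac{\partial}{\partial\theta}\log f(X | \theta)\right)^2\right)}.$$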

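**Lemma 7.3.11** If $f(x|\theta)$ satisfies
$$\frac{d}{d\theta}\mathrm{E}_\theta\left(\frac{\partial}{\partial\theta}\log f(X|\theta)\right) = \int \frac{\partial}{\partial\theta}\left[\left(\frac{\partial}{\partial\theta}\log f(x|\theta)\right) f(x|\theta)\right] dx$$
(true for an exponential family), then
$$\mathrm{E}_\theta\left(\left(\frac{\partial}{\partial\theta}\log f(X|\theta)\right)^2\right) = -\mathrm{E}_\theta\left(\frac{\partial^2}{\partial\theta^2}\log f(X|\theta)\right).$$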
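For Bernoulli($p$), which is an exponential family, both sides of the information identity equal $1/(p(1-p))$. The iid Cramér-Rao bound for unbiased estimators of $p$ is then $p(1-p)/n$, which is exactly $\mathrm{Var}\,\bar{X}$. The sample mean therefore attains the bound. This minimal sketch verifies both facts with exact fractions at the assumed values $p = 1/3$, $n = 10$. The helper names are illustrative.

```python
from fractions import Fraction

def score(x, p):
    # d/dp log f(x|p) for the Bernoulli pmf f(x|p) = p^x (1 - p)^(1 - x)
    return Fraction(x) / p - Fraction(1 - x) / (1 - p)

def second_derivative(x, p):
    # d^2/dp^2 log f(x|p)
    return -Fraction(x) / p**2 - Fraction(1 - x) / (1 - p)**2

def expect(g, p):
    # expectation of g(X, p) over X ~ Bernoulli(p)
    return (1 - p) * g(0, p) + p * g(1, p)

p, n = Fraction(1, 3), 10
info_sq = expect(lambda x, q: score(x, q) ** 2, p)  # E(score^2)
info_curv = -expect(second_derivative, p)           # -E(second derivative)
crlb = 1 / (n * info_sq)                            # bound for unbiased estimators of p
var_xbar = p * (1 - p) / n                          # variance of the sample mean
print(info_sq == info_curv, crlb == var_xbar)       # True True
```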

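**Corollary 7.3.15 (Attainment)** Let $X_1, \dots, X_n$ be iid $f(x|\theta)$, where $f(x|\theta)$ satisfies the conditions of the Cramér-Rao Theorem. Let $L(\theta|x)=\prod_{i=1}^n f(x_i|\theta)$ denote the likelihood function. If $W(X) = W(X_1, \dots, X_n)$ is any unbiased estimator of $\tau(\theta)$, then $W(X)$ attains the Cramér-Rao Lower Bound if and only if
$$a(\theta)\left[W(x) - \tau(\theta)\right] = \frac{\partial}{\partial\theta}\log L(\theta|x)$$
for some function $a(\theta)$.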

### 7.3.3 Sufficiency and Unbiasedness

**Theorem 7.3.17 (Rao-Blackwell)** Let $W$ be any unbiased estimator of $\tau(\theta)$, and let $T$ be a sufficient statistic for $\theta$. Define $\phi(T) = \mathrm{E}(W|T)$. Then $\mathrm{E}_\theta \phi(T) = \tau(\theta)$ and $\mathrm{Var}_\theta \phi(T) \le \mathrm{Var}_\theta W$ for all $\theta$; that is, $\phi(T)$ is a uniformly better unbiased estimator of $\tau(\theta)$.
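The following is a small Rao-Blackwell computation under assumed values. The data are iid Bernoulli($p$) with $n = 4$ and $p = 1/4$. The crude unbiased estimator is $W = X_1$, and the sufficient statistic is $T = \sum X_i$. Conditioning gives $\phi(t) = \mathrm{E}(X_1 | T = t) = t/n$, which does not depend on $p$. Its variance $p(1-p)/n$ is smaller than $\mathrm{Var}\,W = p(1-p)$. The sketch enumerates all samples exactly, and its function names are illustrative.

```python
from fractions import Fraction
from itertools import product

def prob(x, p):
    # joint pmf of iid Bernoulli(p) observations x = (x_1, ..., x_n)
    t = sum(x)
    return p**t * (1 - p)**(len(x) - t)

def conditional_mean_x1(n, p):
    """phi(t) = E(X_1 | T = t) with T = X_1 + ... + X_n, by direct enumeration."""
    samples = list(product((0, 1), repeat=n))
    phi = {}
    for t in range(n + 1):
        group = [x for x in samples if sum(x) == t]
        mass = sum(prob(x, p) for x in group)
        phi[t] = sum(x[0] * prob(x, p) for x in group) / mass
    return phi

def mean_var(stat, n, p):
    """Exact mean and variance of stat(X) over all samples of size n."""
    samples = list(product((0, 1), repeat=n))
    m = sum(stat(x) * prob(x, p) for x in samples)
    v = sum((stat(x) - m) ** 2 * prob(x, p) for x in samples)
    return m, v

n, p = 4, Fraction(1, 4)
phi = conditional_mean_x1(n, p)
w_mean, w_var = mean_var(lambda x: x[0], n, p)             # W = X_1
phi_mean, phi_var = mean_var(lambda x: phi[sum(x)], n, p)  # phi(T)
print(phi)                                # phi(t) = t/4
print(w_mean, w_var, phi_mean, phi_var)   # 1/4 3/16 1/4 3/64
```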

**Theorem 7.3.19** If W is a best unbiased estimator of $\tau(\theta)$, then W is unique.

**Theorem 7.3.20** If $\mathrm{E}_\theta W = \tau(\theta)$, W is the best unbiased estimator of $\tau(\theta)$ if and only if W is uncorrelated with all unbiased estimators of 0.

**Theorem 7.3.23** Let T be a complete sufficient statistic for a parameter $\theta$, and let $\phi(T)$ be any estimator based only on T. Then $\phi(T)$ is the unique best unbiased estimator of its expected value.

### 7.3.4 Loss Function Optimality

# 8 Hypothesis Testing

## 8.1 Introduction

**Definition 8.1.1** A hypothesis is a statement about a population parameter.

**Definition 8.1.2** The two complementary hypotheses in a hypothesis testing problem are called the null hypothesis and the alternative hypothesis. They are denoted by $H_0$ and $H_1$, respectively.

**Definition 8.1.3** A hypothesis testing procedure or hypothesis test is a rule that specifies:

- For which sample values the decision is made to accept $H_0$ as true.
- For which sample values $H_0$ is rejected and $H_1$ is accepted as true.

The subset of the sample space for which $H_0$ will be rejected is called the *rejection region* or *critical region*. The complement of the rejection region is called the *acceptance region*.

## 8.2 Methods of Finding Tests

### 8.2.1 Likelihood Ratio Tests

**Definition 8.2.1** The likelihood ratio test statistic for testing $H_0 : \theta \in \Theta_0$ versus $H_1 : \theta \in \Theta_0^c$ is
$$\lambda(x) = \frac{\sup_{\Theta_0} L(\theta | x)}{\sup_{\Theta} L(\theta | x)}.$$

A likelihood ratio test (LRT) is any test that has a rejection region of the form $\{x : \lambda(x) \le c\}$, where $c$ is any number satisfying $0 \le c \le 1$.

**Example 8.2.2 (Normal LRT)** Let $X_1, \dots, X_n$ be a random sample from a $n(\theta, 1)$ population. Consider testing $H_0 : \theta = \theta_0$ versus $H_1 : \theta \neq \theta_0$.
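Here $\Theta_0 = \{\theta_0\}$, and the unrestricted MLE is $\bar{x}$. Since $\sum(x_i - \theta_0)^2 - \sum(x_i - \bar{x})^2 = n(\bar{x} - \theta_0)^2$, the LRT statistic reduces to $\lambda(x) = \exp[-n(\bar{x} - \theta_0)^2/2]$. The region $\{x : \lambda(x) \le c\}$ is then the same as $\{x : |\bar{x} - \theta_0| \ge \sqrt{-2\log c / n}\}$. This sketch checks the closed form against the raw likelihood ratio on a made-up sample. The sample values and function names are assumptions for illustration.

```python
import math

def normal_loglik(theta, xs):
    # log L(theta | x) for a n(theta, 1) sample
    return sum(-0.5 * math.log(2 * math.pi) - 0.5 * (x - theta) ** 2 for x in xs)

def lrt_statistic(xs, theta0):
    # sup over Theta_0 = {theta0} divided by sup over all theta (attained at xbar)
    xbar = sum(xs) / len(xs)
    return math.exp(normal_loglik(theta0, xs) - normal_loglik(xbar, xs))

def lrt_closed_form(xs, theta0):
    n, xbar = len(xs), sum(xs) / len(xs)
    return math.exp(-n * (xbar - theta0) ** 2 / 2)

def lrt_rejects(xs, theta0, c):
    # reject H0 when lambda(x) <= c
    return lrt_closed_form(xs, theta0) <= c

xs = [0.3, -1.2, 0.8, 1.5, 0.1]  # made-up data, xbar = 0.3
print(lrt_statistic(xs, 0.0), lrt_closed_form(xs, 0.0))
```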

### 8.2.2 Bayesian Tests

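Prior distribution $\to$ posterior distribution: a Bayesian test is based on the posterior probabilities $P(\theta \in \Theta_0 | x)$ and $P(\theta \in \Theta_0^c | x)$, for example rejecting $H_0$ if $P(\theta \in \Theta_0^c | x) > P(\theta \in \Theta_0 | x)$.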

### 8.2.3 Union-Intersection and Intersection-Union Tests

**Example 8.2.9 (Acceptance sampling)**

## 8.3 Methods of Evaluating Tests

### 8.3.1 Error Probabilities and the Power Function

Type I and Type II errors:

| | Accept $H_0$ | Reject $H_0$ |
|---|---|---|
| Truth $H_0$ | Correct decision | Type I Error |
| Truth $H_1$ | Type II Error | Correct decision |

**Definition 8.3.1** The power function of a hypothesis test with rejection region $R$ is the function of $\theta$ defined by $\beta(\theta) = P_\theta(X \in R)$.

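**Example 8.3.3 (Normal power function)** Let $X_1, \dots, X_n$ be a random sample from a $n(\theta, \sigma^2)$ population, $\sigma^2$ known. An LRT of $H_0 : \theta \le \theta_0$ versus $H_1 : \theta > \theta_0$ is a test that rejects $H_0$ if $(\bar{X} - \theta_0) / (\sigma / \sqrt{n}) > c$. The constant $c$ can be any positive number. The power function of this test is
$$\beta(\theta) = P_\theta\left(\frac{\bar{X} - \theta_0}{\sigma/\sqrt{n}} > c\right) = P_\theta\left(\frac{\bar{X} - \theta}{\sigma/\sqrt{n}} > c + \frac{\theta_0 - \theta}{\sigma/\sqrt{n}}\right) = P\left(Z > c + \frac{\theta_0 - \theta}{\sigma/\sqrt{n}}\right),$$
where $Z$ is a standard normal random variable.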

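**Example 8.3.4 (Continuation of Example 8.3.3)**

Suppose that we want a maximum Type I Error probability of $0.1$ and a maximum Type II Error probability of $0.2$ if $\theta \ge \theta_0 + \sigma$. How should $c$ and $n$ be chosen to achieve these goals?

Goals: $\beta(\theta_0) = 0.1$ and $\beta(\theta_0 + \sigma) \ge 0.8$. Because $\beta(\theta)$ is increasing in $\theta$, these two conditions cover all $\theta \le \theta_0$ and all $\theta \ge \theta_0 + \sigma$.

- By choosing $c = 1.28$, we achieve $\beta(\theta_0) = P(Z > 1.28) \approx 0.1$.
- At $\theta = \theta_0 + \sigma$ the power is $\beta(\theta_0 + \sigma) = P(Z > 1.28 - \sqrt{n})$, which is at least $0.8$ when $1.28 - \sqrt{n} \le -0.84$, that is, $n \ge 4.49$. Choosing $n = 5$ gives $\beta(\theta_0 + \sigma) \approx 0.83$.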
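The two design steps can be automated with `statistics.NormalDist`. This sketch uses the power function $\beta(\theta) = P(Z > c + (\theta_0 - \theta)/(\sigma/\sqrt{n}))$. It picks $c$ so that $\beta(\theta_0) = \alpha$, then finds the smallest integer $n$ with $\beta(\theta_0 + \text{shift}\cdot\sigma) \ge 1 - \text{type2}$. The function `design_test` and its parameter names are hypothetical.

```python
import math
from statistics import NormalDist

Z = NormalDist()  # standard normal n(0, 1)

def design_test(alpha=0.1, type2=0.2, shift=1.0):
    """Cutoff c with beta(theta0) = alpha, and the smallest n with
    beta(theta0 + shift * sigma) >= 1 - type2, for the one-sided normal test."""
    c = Z.inv_cdf(1 - alpha)  # P(Z > c) = alpha
    z_low = Z.inv_cdf(type2)  # P(Z > z_low) = 1 - type2
    # beta(theta0 + shift*sigma) = P(Z > c - shift*sqrt(n)) >= 1 - type2
    # holds exactly when c - shift*sqrt(n) <= z_low.
    n = math.ceil(((c - z_low) / shift) ** 2)
    power = 1 - Z.cdf(c - shift * math.sqrt(n))
    return c, n, power

c, n, power = design_test()
print(round(c, 4), n, round(power, 3))
```

With the defaults this reproduces $c \approx 1.28$ and $n = 5$. Using exact quantiles instead of the rounded values $1.28$ and $0.84$ does not change the chosen $n$ here.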

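**Definition 8.3.5** For $0 \le \alpha \le 1$, a test with power function $\beta(\theta)$ is a size $\alpha$ test if $\sup_{\theta \in \Theta_0} \beta(\theta) = \alpha$.

**Definition 8.3.6** For $0 \le \alpha \le 1$, a test with power function $\beta(\theta)$ is a level $\alpha$ test if $\sup_{\theta \in \Theta_0} \beta(\theta) \le \alpha$.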

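**Definition 8.3.9** A test with power function $\beta(\theta)$ is unbiased if $\beta(\theta') \ge \beta(\theta'')$ for every $\theta' \in \Theta_0^c$ and $\theta'' \in \Theta_0$.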

### 8.3.2 Most Powerful Tests

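**Definition 8.3.11** Let $\mathcal{C}$ be a class of tests for testing $H_0 : \theta \in \Theta_0$ versus $H_1 : \theta \in \Theta_0^c$. A test in class $\mathcal{C}$, with power function $\beta(\theta)$, is a uniformly most powerful (UMP) class $\mathcal{C}$ test if $\beta(\theta) \ge \beta'(\theta)$ for every $\theta \in \Theta_0^c$ and every $\beta'(\theta)$ that is a power function of a test in class $\mathcal{C}$.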

**Theorem 8.3.12 (Neyman-Pearson Lemma)** Consider testing $H_0 : \theta = \theta_0$ versus $H_1 : \theta = \theta_1$, where the pdf or pmf corresponding to $\theta_i$ is $f(x | \theta_i)$, $i = 0, 1$, using a test with rejection region $R$ that satisfies
$$x \in R \text{ if } f(x | \theta_1) > k f(x | \theta_0) \quad \text{and} \quad x \in R^c \text{ if } f(x | \theta_1) < k f(x | \theta_0)$$
for some $k \ge 0$, and
$$\alpha = P_{\theta_0}(X \in R).$$

Then,

- (Sufficiency) Any test that satisfies these two conditions is a UMP level $\alpha$ test.
- (Necessity) If there exists a test satisfying these two conditions with $k > 0$, then every UMP level $\alpha$ test is a size $\alpha$ test (it satisfies the second condition), and every UMP level $\alpha$ test satisfies the first condition except perhaps on a set $A$ satisfying $P_{\theta_0}(X \in A) = P_{\theta_1}(X \in A) = 0$.

**Example 8.3.14 (UMP binomial test)** Let $X \sim binomial(2, \theta)$. We want to test $H_0 : \theta = \frac{1}{2}$ versus $H_1 : \theta = \frac{3}{4}$.
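The Neyman-Pearson construction for this example can be enumerated directly. The likelihood ratios $f(x|3/4)/f(x|1/2)$ for $x = 0, 1, 2$ are $1/4$, $3/4$, $9/4$. As $k$ moves between these values, the rejection region $\{x : f(x|\theta_1) > k f(x|\theta_0)\}$ takes the sizes $1$, $3/4$, $1/4$, $0$. This sketch assumes points with a ratio exactly equal to $k$ go into the acceptance region, which the lemma leaves arbitrary. The helper names are illustrative.

```python
from fractions import Fraction
from math import comb

def binom_pmf(x, n, theta):
    return comb(n, x) * theta**x * (1 - theta)**(n - x)

n, theta0, theta1 = 2, Fraction(1, 2), Fraction(3, 4)
# likelihood ratios f(x|theta1) / f(x|theta0) for each sample point x
ratios = {x: binom_pmf(x, n, theta1) / binom_pmf(x, n, theta0) for x in range(n + 1)}
print(ratios)

def np_test(k):
    """Rejection region {x : f(x|theta1) > k f(x|theta0)} and its size under theta0."""
    region = [x for x in range(n + 1) if ratios[x] > k]
    size = sum(binom_pmf(x, n, theta0) for x in region)
    return region, size

for k in (Fraction(1, 8), Fraction(1, 2), Fraction(1), Fraction(3)):
    print(k, np_test(k))
```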

**Example 8.3.15 (UMP normal test)**

### 8.3.3 Sizes of Union-Intersection and Intersection-Union Tests

### 8.3.4 p-Values

**Definition 8.3.26** A p-value $p(X)$ is a test statistic satisfying $0 \le p(x) \le 1$ for every sample point $x$. Small values of $p(X)$ give evidence that $H_1$ is true. A p-value is valid if, for every $\theta \in \Theta_0$ and every $0 \le \alpha \le 1$,
$$P_\theta(p(X) \le \alpha) \le \alpha.$$
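Validity, $P_\theta(p(X) \le \alpha) \le \alpha$ for every $\theta \in \Theta_0$, can be checked by simulation for a concrete p-value. This sketch assumes the one-sided normal setting of Example 8.3.3 with $\sigma$ known, $H_0 : \theta \le \theta_0$, and the p-value $p(x) = 1 - \Phi(\sqrt{n}(\bar{x} - \theta_0)/\sigma)$. At $\theta = \theta_0$ the rejection rate should be close to $\alpha$, and for $\theta < \theta_0$ it should be smaller. The values $\theta_0 = 0$, $\sigma = 1$, $n = 10$ and the helper names are illustrative.

```python
import math
import random
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal cdf

def p_value(xs, theta0, sigma):
    """One-sided p-value for H0: theta <= theta0 in the n(theta, sigma^2) model, sigma known."""
    z = math.sqrt(len(xs)) * (sum(xs) / len(xs) - theta0) / sigma
    return 1 - PHI(z)

def rejection_rate(theta, alpha, theta0=0.0, sigma=1.0, n=10, reps=20000, seed=4):
    """Monte Carlo estimate of P_theta(p(X) <= alpha)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        xs = [rng.gauss(theta, sigma) for _ in range(n)]
        hits += (p_value(xs, theta0, sigma) <= alpha)
    return hits / reps

for theta in (-0.5, 0.0):
    print(theta, rejection_rate(theta, alpha=0.05))
```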