Statistical Inference 5~8

5 Properties of a Random Sample

5.1 Basic Concepts of Random Samples

5.2 Sums of Random Variables from a Random Sample

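Lemma 5.2.5: Let $X_1, \dots, X_n$ be a random sample from a population and let $g(x)$ be a function such that $\mathrm{E}g(X_1)$ and $\mathrm{Var}\,g(X_1)$ exist. Then

$$\mathrm{E}\left(\sum_{i=1}^n g(X_i)\right) = n\,\mathrm{E}g(X_1)$$

and

$$\mathrm{Var}\left(\sum_{i=1}^n g(X_i)\right) = n\,\mathrm{Var}\,g(X_1).$$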

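Theorem 5.2.6: Let $X_1, \dots, X_n$ be a random sample from a population with mean $\mu$ and variance $\sigma^2 < \infty$. Then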

  • $\mathrm{E}\bar{X} = \mu$,
  • $\mathrm{Var}\bar{X} = \sigma^2/n$,
  • $\mathrm{E}S^2 = \sigma^2$.
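The three identities above can be checked numerically. The following is a minimal simulation sketch, not part of the original notes: the normal population, the values of `mu`, `sigma`, `n`, `reps`, and the helper name `simulate_sample_stats` are all assumptions chosen for illustration.

```python
import random
import statistics

def simulate_sample_stats(n, reps, mu=2.0, sigma=3.0, seed=0):
    """Draw `reps` samples of size n from n(mu, sigma^2).

    Returns the list of sample means and the list of sample variances.
    """
    rng = random.Random(seed)
    means, variances = [], []
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        means.append(statistics.fmean(sample))
        variances.append(statistics.variance(sample))  # divides by n - 1
    return means, variances

means, variances = simulate_sample_stats(n=10, reps=20000)
mean_of_means = statistics.fmean(means)          # estimates E(Xbar), near mu = 2
var_of_means = statistics.pvariance(means)       # estimates Var(Xbar), near sigma^2 / n = 0.9
mean_of_variances = statistics.fmean(variances)  # estimates E(S^2), near sigma^2 = 9
print(mean_of_means, var_of_means, mean_of_variances)
```

The printed values should only be close to $\mu$, $\sigma^2/n$, and $\sigma^2$, since they are Monte Carlo estimates.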

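Theorem 5.2.7: Let $X_1, \dots, X_n$ be a random sample from a population with mgf $M_X(t)$. Then the mgf of the sample mean is

$$M_{\bar{X}}(t) = [M_X(t/n)]^n.$$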

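Example 5.2.8: Let $X_1, \dots, X_n$ be a random sample from a $n(\mu, \sigma^2)$ population. Then the mgf of the sample mean is

$$M_{\bar{X}}(t) = \left[\exp\left(\mu\frac{t}{n} + \frac{\sigma^2 (t/n)^2}{2}\right)\right]^n = \exp\left(\mu t + \frac{(\sigma^2/n)t^2}{2}\right).$$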

Thus, $\bar{X}$ has a $n(\mu, \sigma^2/n)$ distribution.

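Another simple example is given by a $gamma(\alpha, \beta)$ random sample. Here, we can also easily derive the distribution of the sample mean. The mgf of the sample mean is

$$M_{\bar{X}}(t) = \left[\left(\frac{1}{1 - \beta(t/n)}\right)^{\alpha}\right]^n = \left(\frac{1}{1 - (\beta/n)t}\right)^{n\alpha},$$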

which we recognize as the mgf of a $gamma(n\alpha, \beta/n)$, the distribution of $\bar{X}$.

Theorem 5.2.9 If $X$ and $Y$ are independent continuous random variables with pdfs $f_X(x)$ and $f_Y(y)$, then the pdf of $Z = X + Y$ is

$$f_Z(z) = \int_{-\infty}^{\infty} f_X(w) f_Y(z - w)\,dw.$$

Example 5.2.10 (Sum of Cauchy random variables)

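Theorem 5.2.11 Suppose $X_1, \dots, X_n$ is a random sample from a pdf or pmf \(f(x|\theta)\), where

$$f(x|\theta) = h(x)c(\theta)\exp\left(\sum_{i=1}^k w_i(\theta)t_i(x)\right)$$

is a member of an exponential family. Define statistics $T_1, \dots, T_k$ by

$$T_i(X_1, \dots, X_n) = \sum_{j=1}^n t_i(X_j), \quad i = 1, \dots, k.$$

If the set $\{(w_1(\theta), w_2(\theta), \dots, w_k(\theta)), \theta \in \Theta\}$ contains an open subset of $R^k$, then the distribution of $(T_1, \dots, T_k)$ is an exponential family of the form

$$f_T(u_1, \dots, u_k|\theta) = H(u_1, \dots, u_k)[c(\theta)]^n\exp\left(\sum_{i=1}^k w_i(\theta)u_i\right).$$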

Example 5.2.12 (Sum of Bernoulli random variables)

5.3 Sampling from the Normal Distribution

5.3.1 Properties of the Sample Mean and Variance

Theorem 5.3.1: Let $X_1, \dots, X_n$ be a random sample from a $n(\mu, \sigma^2)$ distribution, and let $\bar{X} = (1/n)\sum_{i=1}^n X_i$ and $S^2 = [1/(n-1)]\sum_{i=1}^n (X_i - \bar{X})^2$. Then

  • $\bar{X}$ and $S^2$ are independent random variables,
  • $\bar{X}$ has a $n(\mu, \sigma^2/n)$ distribution,
  • $(n-1)S^2/\sigma^2$ has a chi squared distribution with $n - 1$ degrees of freedom.

Lemma 5.3.2:

  • If $Z$ is a $n(0,1)$ random variable, then $Z^2 \thicksim \chi_1^2$.
  • If $X_1, \dots, X_n$ are independent and $X_i \thicksim \chi_{p_i}^2$, then $X_1 + \dots + X_n \thicksim \chi_{p_1 + \dots + p_n}^2$.

5.4 Order Statistics

Theorem 5.4.3:

and

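Theorem 5.4.4: Let $X_{(1)}, \dots, X_{(n)}$ denote the order statistics of a random sample $X_1, \dots, X_n$ from a continuous population with cdf $F_X(x)$ and pdf $f_X(x)$. Then the pdf of $X_{(j)}$ is

$$f_{X_{(j)}}(x) = \frac{n!}{(j-1)!(n-j)!} f_X(x)[F_X(x)]^{j-1}[1 - F_X(x)]^{n-j}.$$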

Theorem 5.4.6:

5.5 Convergence Concepts

5.5.1 Convergence in Probability

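Definition 5.5.1: A sequence of random variables $X_1, X_2, \dots$ converges in probability to a random variable $X$ if, for every $\epsilon > 0$,

$$\lim_{n \to \infty} P(|X_n - X| \ge \epsilon) = 0$$

or, equivalently,

$$\lim_{n \to \infty} P(|X_n - X| < \epsilon) = 1.$$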

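Theorem 5.5.2 (Weak Law of Large Numbers): Let $X_1, X_2, \dots$ be iid random variables with $\mathrm{E}X_i = \mu$ and $\mathrm{Var}X_i = \sigma^2 < \infty$. Define $\bar{X}_n = (1/n)\sum_{i=1}^n X_i$. Then, for every $\epsilon > 0$,

$$\lim_{n \to \infty} P(|\bar{X}_n - \mu| < \epsilon) = 1;$$

that is, $\bar{X}_n$ converges in probability to $\mu$.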

5.5.2 Almost Sure Convergence

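Definition 5.5.6: A sequence of random variables $X_1, X_2, \dots$ converges almost surely to a random variable $X$ if, for every $\epsilon > 0$,

$$P\left(\lim_{n \to \infty} |X_n - X| < \epsilon\right) = 1.$$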

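Theorem 5.5.9 (Strong Law of Large Numbers): Let $X_1, X_2, \dots$ be iid random variables with $\mathrm{E}X_i = \mu$ and $\mathrm{Var}X_i = \sigma^2 < \infty$, and define $\bar{X}_n = (1/n)\sum_{i=1}^n X_i$. Then, for every $\epsilon > 0$,

$$P\left(\lim_{n \to \infty} |\bar{X}_n - \mu| < \epsilon\right) = 1;$$

that is, $\bar{X}_n$ converges almost surely to $\mu$.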

5.5.3 Convergence in Distribution

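Definition 5.5.10: A sequence of random variables $X_1, X_2, \dots$ converges in distribution to a random variable $X$ if

$$\lim_{n \to \infty} F_{X_n}(x) = F_X(x)$$

at all points $x$ where $F_X(x)$ is continuous.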

Theorem 5.5.14 (Central Limit Theorem): Let $X_1, X_2, \dots$ be a sequence of iid random variables whose mgfs exist in a neighborhood of 0 (that is, $M_{X_i}(t)$ exists for \(|t| < h\), for some positive $h$). Let $\mathrm{E}X_i = \mu$ and $\mathrm{Var}X_i = \sigma^2 > 0$. (Both $\mu$ and $\sigma^2$ are finite since the mgf exists.) Define $\bar{X}_n = (1/n)\sum_{i=1}^n X_i$. Let $G_n(x)$ denote the cdf of $\sqrt{n}(\bar{X}_n-\mu)/\sigma$. Then, for any $x$, $-\infty < x < \infty$,

$$\lim_{n \to \infty} G_n(x) = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi}} e^{-y^2/2}\,dy;$$

that is, $\sqrt{n}(\bar{X}_n-\mu)/\sigma$ has a limiting standard normal distribution.
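A small simulation can make the Central Limit Theorem concrete. The sketch below is an illustration only: the exponential(1) population (mean 1, standard deviation 1), the sample size `n=200`, the number of replications, and the helper names `standardized_means` and `empirical_cdf` are assumptions, not part of the notes. It compares the empirical cdf of $\sqrt{n}(\bar{X}_n-\mu)/\sigma$ with the standard normal cdf at a few points.

```python
import random
from statistics import NormalDist, fmean

def standardized_means(n, reps, seed=0):
    """Return `reps` draws of sqrt(n) * (Xbar_n - mu) / sigma for iid exponential(1) data."""
    rng = random.Random(seed)
    mu, sigma = 1.0, 1.0  # mean and standard deviation of exponential(1)
    draws = []
    for _ in range(reps):
        xbar = fmean(rng.expovariate(1.0) for _ in range(n))
        draws.append((n ** 0.5) * (xbar - mu) / sigma)
    return draws

def empirical_cdf(values, x):
    """Fraction of simulated values that are <= x."""
    return sum(v <= x for v in values) / len(values)

draws = standardized_means(n=200, reps=5000)
for x in (-1.0, 0.0, 1.0):
    # Empirical G_n(x) next to the standard normal cdf at x.
    print(x, empirical_cdf(draws, x), NormalDist().cdf(x))
```

The two columns should agree only approximately, both because $n$ is finite and because the empirical cdf is itself a Monte Carlo estimate.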

5.5.4 The Delta Method

5.6 Generating a Random Sample

5.6.1 Direct Methods

5.6.2 Indirect Methods

5.6.3 The Accept/Reject Algorithm

5.6.(4) The MCMC methods

Gibbs Sampler

Metropolis Algorithm

6 Principles of Data Reduction

6.1 Introduction

Three principles of data reduction:

  • The Sufficiency Principle
  • The Likelihood Principle
  • The Equivariance Principle

6.2 The Sufficiency Principle

6.2.1 Sufficient Statistics

Definition 6.2.1: A statistic $T(X)$ is a sufficient statistic for $\theta$ if the conditional distribution of the sample $X$ given the value of $T(X)$ does not depend on $\theta$.

Theorem 6.2.2: If \(p(x | \theta)\) is the joint pdf or pmf of $X$ and \(q(t | \theta)\) is the pdf or pmf of $T(X)$, then $T(X)$ is a sufficient statistic for $\theta$ if, for every $x$ in the sample space, the ratio \(p(x | \theta)/q(T(x) | \theta)\) is constant as a function of $\theta$.

Theorem 6.2.6 (Factorization Theorem): Let \(f(x | \theta)\) denote the joint pdf or pmf of a sample $X$. A statistic $T(X)$ is a sufficient statistic for $\theta$ if and only if there exist functions \(g(t | \theta)\) and $h(x)$ such that, for all sample points $x$ and all parameter points $\theta$,

$$f(x|\theta) = g(T(x)|\theta)h(x).$$
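As a worked illustration of the factorization criterion (the Bernoulli sample is an assumed example, not taken from the notes), let $X_1, \dots, X_n$ be iid Bernoulli($\theta$). The joint pmf splits into a factor that depends on the data only through $T(x) = \sum_{i=1}^n x_i$ and a factor free of $\theta$:

$$f(x|\theta) = \prod_{i=1}^n \theta^{x_i}(1-\theta)^{1-x_i} = \underbrace{\theta^{\sum_i x_i}(1-\theta)^{n-\sum_i x_i}}_{g(T(x)|\theta)} \cdot \underbrace{1}_{h(x)}.$$

Under this assumed model, $T(X) = \sum_{i=1}^n X_i$ is therefore a sufficient statistic for $\theta$.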

6.2.2 Minimal Sufficient Statistics

6.2.3 Ancillary Statistics

Definition 6.2.16: A statistic $S(X)$ whose distribution does not depend on the parameter $\theta$ is called an ancillary statistic.

6.2.4 Sufficient, Ancillary, and Complete Statistics

Example 6.2.20 (Ancillary precision)

Theorem 6.2.24 (Basu’s Theorem): If $T(X)$ is a complete and minimal sufficient statistic, then $T(X)$ is independent of every ancillary statistic.

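Theorem 6.2.25 (Complete statistic in the exponential family): Let $X_1, \dots, X_n$ be iid observations from an exponential family with pdf or pmf of the form

$$f(x|\theta) = h(x)c(\theta)\exp\left(\sum_{j=1}^k w(\theta_j)t_j(x)\right),$$

where $\theta = (\theta_1, \dots, \theta_k)$. Then the statistic

$$T(X) = \left(\sum_{i=1}^n t_1(X_i), \sum_{i=1}^n t_2(X_i), \dots, \sum_{i=1}^n t_k(X_i)\right)$$

is complete as long as the parameter space $\Theta$ contains an open set in $R^k$.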

6.3 The Likelihood Principle

6.4 The Equivariance Principle

7 Point Estimation

7.1 Introduction

This chapter is divided into two parts. The first part deals with methods for finding estimators, and the second part deals with evaluating these estimators.

Definition 7.1.1 A point estimator is any function $W(X_1, \dots, X_n)$ of a sample; that is, any statistic is a point estimator.

7.2 Methods of Finding Estimators

7.2.1 Method of Moments

7.2.2 Maximum Likelihood Estimators

7.2.3 Bayes Estimators

7.2.4 The EM Algorithm

7.3 Methods of Evaluating Estimators

7.3.1 Mean Squared Error

Definition 7.3.1 The mean squared error of an estimator $W$ of a parameter $\theta$ is the function of $\theta$ defined by $\mathrm{E}_\theta(W - \theta)^2$.
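A standard way to read this definition is to split the MSE into a variance part and a bias part. Writing $\mathrm{Bias}_\theta W = \mathrm{E}_\theta W - \theta$ (notation introduced here for the illustration), add and subtract $\mathrm{E}_\theta W$ inside the square:

$$\mathrm{E}_\theta(W - \theta)^2 = \mathrm{E}_\theta(W - \mathrm{E}_\theta W)^2 + 2(\mathrm{E}_\theta W - \theta)\,\mathrm{E}_\theta(W - \mathrm{E}_\theta W) + (\mathrm{E}_\theta W - \theta)^2 = \mathrm{Var}_\theta W + (\mathrm{Bias}_\theta W)^2,$$

where the cross term vanishes because $\mathrm{E}_\theta(W - \mathrm{E}_\theta W) = 0$. For an unbiased estimator the MSE is just the variance.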

7.3.2 Best Unbiased Estimators

Definition 7.3.7 An estimator $W^*$ is a best unbiased estimator of $\tau(\theta)$ if it satisfies $\mathrm{E}_\theta W^* = \tau(\theta)$ for all $\theta$ and, for any other estimator $W$ with $\mathrm{E}_\theta W = \tau(\theta)$, we have $\mathrm{Var}_\theta W^* \le \mathrm{Var}_\theta W$ for all $\theta$. $W^*$ is also called a uniform minimum variance unbiased estimator (UMVUE) of $\tau(\theta)$.

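Theorem 7.3.9 (Cramér-Rao Inequality): Let $X_1, \dots, X_n$ be a sample with pdf \(f(x | \theta)\), and let $W(X) = W(X_1, \dots, X_n)$ be any estimator satisfying

$$\frac{d}{d\theta}\mathrm{E}_\theta W(X) = \int \frac{\partial}{\partial\theta}\left[W(x)f(x|\theta)\right]dx$$

and

$$\mathrm{Var}_\theta W(X) < \infty.$$

Then

$$\mathrm{Var}_\theta W(X) \ge \frac{\left(\frac{d}{d\theta}\mathrm{E}_\theta W(X)\right)^2}{\mathrm{E}_\theta\left(\left(\frac{\partial}{\partial\theta}\log f(X|\theta)\right)^2\right)}.$$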

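Corollary 7.3.10 (Cramér-Rao Inequality, iid case) If the assumptions of Theorem 7.3.9 are satisfied and, additionally, if $X_1, \dots, X_n$ are iid with pdf \(f(x|\theta)\), then

$$\mathrm{Var}_\theta W(X) \ge \frac{\left(\frac{d}{d\theta}\mathrm{E}_\theta W(X)\right)^2}{n\,\mathrm{E}_\theta\left(\left(\frac{\partial}{\partial\theta}\log f(X|\theta)\right)^2\right)},$$

where the expectation in the denominator is taken with respect to a single observation $X$.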

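Lemma 7.3.11 If \(f(x|\theta)\) satisfies

$$\frac{d}{d\theta}\mathrm{E}_\theta\left(\frac{\partial}{\partial\theta}\log f(X|\theta)\right) = \int \frac{\partial}{\partial\theta}\left[\left(\frac{\partial}{\partial\theta}\log f(x|\theta)\right)f(x|\theta)\right]dx$$

(true for an exponential family), then

$$\mathrm{E}_\theta\left(\left(\frac{\partial}{\partial\theta}\log f(X|\theta)\right)^2\right) = -\mathrm{E}_\theta\left(\frac{\partial^2}{\partial\theta^2}\log f(X|\theta)\right).$$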

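Corollary 7.3.15 (Attainment) Let $X_1, \dots, X_n$ be iid \(f(x|\theta)\), where \(f(x|\theta)\) satisfies the conditions of the Cramér-Rao Theorem. Let \(L(\theta|x)=\prod_{i=1}^n f(x_i|\theta)\) denote the likelihood function. If $W(X) = W(X_1, \dots, X_n)$ is any unbiased estimator of $\tau(\theta)$, then $W(X)$ attains the Cramér-Rao Lower Bound if and only if

$$a(\theta)\left[W(x) - \tau(\theta)\right] = \frac{\partial}{\partial\theta}\log L(\theta|x)$$

for some function $a(\theta)$.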

7.3.3 Sufficiency and Unbiasedness

Theorem 7.3.17 (Rao-Blackwell) Let W be any unbiased estimator of $\tau(\theta)$, and let T be a sufficient statistic for $\theta$. Define \(\phi(T) = \mathrm{E}(W|T)\). Then $\mathrm{E}_\theta \phi(T) = \tau(\theta)$ and $\mathrm{Var}_\theta \phi(T) \le \mathrm{Var}_\theta W$ for all $\theta$; that is, $\phi(T)$ is a uniformly better estimator of $\tau(\theta)$.
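The variance reduction in the Rao-Blackwell theorem can be seen in a small simulation. This is a sketch under assumed settings, not an example from the notes: the data are iid Bernoulli($p$), the crude unbiased estimator is $W = X_1$, the sufficient statistic is $T = \sum_i X_i$, and conditioning gives $\phi(T) = \mathrm{E}(X_1|T) = T/n$. The function name `rao_blackwell_demo` and all parameter values are hypothetical.

```python
import random
from statistics import fmean, pvariance

def rao_blackwell_demo(n=10, p=0.3, reps=20000, seed=0):
    """Simulate W = X_1 and phi(T) = T / n for iid Bernoulli(p) samples of size n."""
    rng = random.Random(seed)
    w_values, phi_values = [], []
    for _ in range(reps):
        sample = [1 if rng.random() < p else 0 for _ in range(n)]
        w_values.append(sample[0])          # W = X_1, unbiased for p
        phi_values.append(sum(sample) / n)  # phi(T) = E(X_1 | T) = T / n
    return w_values, phi_values

w_values, phi_values = rao_blackwell_demo()
# Both estimators should average close to p = 0.3 ...
print(fmean(w_values), fmean(phi_values))
# ... but the variance drops from about p(1 - p) to about p(1 - p) / n.
print(pvariance(w_values), pvariance(phi_values))
```

Both estimators are unbiased for $p$, so the comparison isolates the variance claim of the theorem.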

Theorem 7.3.19 If W is a best unbiased estimator of $\tau(\theta)$, then W is unique.

Theorem 7.3.20 If $\mathrm{E}_\theta W = \tau(\theta)$, W is the best unbiased estimator of $\tau(\theta)$ if and only if W is uncorrelated with all unbiased estimators of 0.

Theorem 7.3.23 Let T be a complete sufficient statistic for a parameter $\theta$, and let $\phi(T)$ be any estimator based only on T. Then $\phi(T)$ is the unique best unbiased estimator of its expected value.

7.3.4 Loss Function Optimality


layout: post title: “Statistical Inference 5~8” date: 2016-01-11 11:51:00 categories: Math —

8 Hypothesis Testing

8.1 Introduction

Definition 8.1.1 A hypothesis is a statement about a population parameter.

Definition 8.1.2 The two complementary hypotheses in a hypothesis testing problem are called the null hypothesis and the alternative hypothesis. They are denoted by $H_0$ and $H_1$, respectively.

Definition 8.1.3 A hypothesis testing procedure or hypothesis test is a rule that specifies:

  • For which sample values the decision is made to accept $H_0$ as true.
  • For which sample values $H_0$ is rejected and $H_1$ is accepted as true.

The subset of the sample space for which $H_0$ will be rejected is called the rejection region or critical region. The complement of the rejection region is called the acceptance region.

8.2 Methods of Finding Tests

8.2.1 Likelihood Ratio Tests

Definition 8.2.1 The likelihood ratio test statistic for testing $H_0 : \theta \in \Theta_0$ versus $H_1 : \theta \in \Theta_0^c$ is

$$\lambda(x) = \frac{\sup_{\Theta_0} L(\theta|x)}{\sup_{\Theta} L(\theta|x)}.$$

A likelihood ratio test (LRT) is any test that has a rejection region of the form $\{x : \lambda(x) \le c\}$, where $c$ is any number satisfying $0 \le c \le 1$.

Example 8.2.2 (Normal LRT) Let $X_1, \dots, X_n$ be a random sample from a $n(\theta, 1)$ population. Consider testing $H_0 : \theta = \theta_0$ versus $H_1 : \theta \neq \theta_0$.
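A sketch of how this example works out, filling in the algebra with the usual steps: the likelihood ratio statistic is the likelihood maximized under $H_0$ divided by the likelihood maximized over all $\theta$. Here the null contains the single point $\theta_0$ and the unrestricted maximum is attained at the sample mean $\bar{x}$, so

$$\lambda(x) = \frac{\exp\left[-\sum_{i=1}^n (x_i - \theta_0)^2/2\right]}{\exp\left[-\sum_{i=1}^n (x_i - \bar{x})^2/2\right]} = \exp\left[-\frac{n(\bar{x} - \theta_0)^2}{2}\right],$$

using $\sum_{i=1}^n (x_i - \theta_0)^2 = \sum_{i=1}^n (x_i - \bar{x})^2 + n(\bar{x} - \theta_0)^2$. For a cutoff $0 < c \le 1$, the rejection region $\{x : \lambda(x) \le c\}$ can then be written as

$$\left\{x : |\bar{x} - \theta_0| \ge \sqrt{-2(\log c)/n}\right\},$$

so the LRT rejects $H_0$ when the sample mean is far from $\theta_0$.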

8.2.2 Bayesian Tests

prior distribution -> posterior distribution

8.2.3 Union-Intersection and Intersection-Union Tests

Example 8.2.9 (Acceptance sampling)

8.3 Methods of Evaluating Tests

8.3.1 Error Probabilities and the Power Function

Type I and Type II Errors

| | Accept $H_0$ | Reject $H_0$ |
|---|---|---|
| Truth $H_0$ | Correct | Type I Error |
| Truth $H_1$ | Type II Error | Correct |

Definition 8.3.1 The power function of a hypothesis test with rejection region $R$ is the function of $\theta$ defined by $\beta(\theta) = P_\theta(X \in R)$.

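Example 8.3.3 (Normal power function) Let $X_1, \dots, X_n$ be a random sample from a $n(\theta, \sigma^2)$ population, $\sigma^2$ known. An LRT of $H_0 : \theta \le \theta_0$ versus $H_1 : \theta > \theta_0$ is a test that rejects $H_0$ if $(\bar{X} - \theta_0) / (\sigma / \sqrt{n}) > c$. The constant $c$ can be any positive number. The power function of this test is

$$\beta(\theta) = P_\theta\left(\frac{\bar{X} - \theta_0}{\sigma/\sqrt{n}} > c\right) = P_\theta\left(\frac{\bar{X} - \theta}{\sigma/\sqrt{n}} > c + \frac{\theta_0 - \theta}{\sigma/\sqrt{n}}\right) = P\left(Z > c + \frac{\theta_0 - \theta}{\sigma/\sqrt{n}}\right),$$

where $Z$ is a standard normal random variable.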

Example 8.3.4 (Continuation of Example 8.3.3)

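Suppose that we want a maximum Type I Error probability of $0.1$ and a maximum Type II Error probability of $0.2$ if $\theta \ge \theta_0 + \sigma$. How should $c$ and $n$ be chosen to achieve these goals?

Goals: $\beta(\theta_0) = 0.1$ and $\beta(\theta_0 + \sigma) = 0.8$. Because $\beta(\theta)$ is increasing in $\theta$, meeting these two conditions meets both error requirements.

  • By choosing $c = 1.28$, we achieve $\beta(\theta_0) = P(Z > 1.28) = 0.1$.
  • We need $\beta(\theta_0 + \sigma) = P(Z > 1.28 - \sqrt{n}) = 0.8$. Since $P(Z > -0.84) = 0.8$, this requires $1.28 - \sqrt{n} = -0.84$, that is, $n = 4.49$. Because $n$ must be an integer, choosing $n = 5$ gives a power slightly above $0.8$ at $\theta_0 + \sigma$.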
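The choice of $c$ and $n$ can be reproduced numerically. The sketch below uses only the Python standard library; the helper name `power` and the variable names are assumptions made for this illustration, and the shift is measured in units of $\sigma$ so that $\theta = \theta_0 + \sigma$ corresponds to a shift of 1.

```python
import math
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution

def power(shift_in_sigmas, c, n):
    """beta(theta) = P(Z > c - sqrt(n) * (theta - theta_0) / sigma)."""
    return 1.0 - Z.cdf(c - math.sqrt(n) * shift_in_sigmas)

c = Z.inv_cdf(0.9)             # upper 0.1 cutoff, about 1.28, so beta(theta_0) = 0.1
z_target = Z.inv_cdf(0.2)      # about -0.84, since P(Z > z_target) = 0.8
n_exact = (c - z_target) ** 2  # solves c - sqrt(n) = z_target
n = math.ceil(n_exact)         # smallest integer sample size meeting the power goal

print(c, n_exact, n)
print(power(0.0, c, n), power(1.0, c, n))
```

With unrounded quantiles the exact solution differs slightly from the hand calculation with $1.28$ and $0.84$, but it rounds up to the same integer sample size.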

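Definition 8.3.5 For $0 \le \alpha \le 1$, a test with power function $\beta(\theta)$ is a size $\alpha$ test if $\sup_{\theta \in \Theta_0} \beta(\theta) = \alpha$.

Definition 8.3.6 For $0 \le \alpha \le 1$, a test with power function $\beta(\theta)$ is a level $\alpha$ test if $\sup_{\theta \in \Theta_0} \beta(\theta) \le \alpha$.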

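Definition 8.3.9 A test with power function $\beta(\theta)$ is unbiased if $\beta(\theta') \ge \beta(\theta'')$ for every $\theta' \in \Theta_0^c$ and $\theta'' \in \Theta_0$.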

8.3.2 Most Powerful Tests

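Definition 8.3.11 Let $\mathcal{C}$ be a class of tests for testing $H_0 : \theta \in \Theta_0$ versus $H_1 : \theta \in \Theta_0^c$. A test in class $\mathcal{C}$, with power function $\beta(\theta)$, is a uniformly most powerful (UMP) class $\mathcal{C}$ test if $\beta(\theta) \ge \beta'(\theta)$ for every $\theta \in \Theta_0^c$ and every $\beta'(\theta)$ that is a power function of a test in class $\mathcal{C}$.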

Theorem 8.3.12 (Neyman-Pearson Lemma) Consider testing $H_0 : \theta = \theta_0$ versus $H_1 : \theta = \theta_1$, where the pdf or pmf corresponding to $\theta_i$ is \(f(x|\theta_i)\), $i = 0, 1$, using a test with rejection region $R$ that satisfies

$$x \in R \quad \text{if} \quad f(x|\theta_1) > k f(x|\theta_0)$$

and

$$x \in R^c \quad \text{if} \quad f(x|\theta_1) < k f(x|\theta_0)$$

for some $k \ge 0$, and

$$\alpha = P_{\theta_0}(X \in R).$$

Then,

  • (Sufficiency) Any test that satisfies the two inequalities and the size condition above is a UMP level $\alpha$ test.
  • (Necessity) If there exists a test satisfying these conditions with $k > 0$, then every UMP level $\alpha$ test is a size $\alpha$ test (it satisfies the size condition) and every UMP level $\alpha$ test satisfies the two inequalities except perhaps on a set $A$ satisfying $P_{\theta_0}(X \in A) = P_{\theta_1}(X \in A) = 0$.

Example 8.3.14 (UMP binomial test) Let $X \sim binomial(2, \theta)$. We want to test $H_0 : \theta = \frac{1}{2}$ versus $H_1 : \theta = \frac{3}{4}$.
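For this example, the Neyman-Pearson lemma compares the pmf under $\theta = \frac{3}{4}$ with the pmf under $\theta = \frac{1}{2}$ at each possible value of $X$. The sketch below computes those ratios exactly with rational arithmetic; the helper name `binomial_pmf` and the particular rejection region $\{2\}$ are choices made for this illustration.

```python
from fractions import Fraction
from math import comb

def binomial_pmf(x, n, theta):
    """P(X = x) for X ~ binomial(n, theta), exact when theta is a Fraction."""
    return comb(n, x) * theta ** x * (1 - theta) ** (n - x)

theta0, theta1 = Fraction(1, 2), Fraction(3, 4)

# Likelihood ratio f(x | theta1) / f(x | theta0) for x = 0, 1, 2.
ratios = {x: binomial_pmf(x, 2, theta1) / binomial_pmf(x, 2, theta0) for x in range(3)}
print(ratios)

# The ratios increase in x, so any k strictly between the ratios at x = 1
# and x = 2 gives the rejection region R = {2}.
size = binomial_pmf(2, 2, theta0)   # P(X = 2) under H_0
power = binomial_pmf(2, 2, theta1)  # P(X = 2) under H_1
print(size, power)
```

Other choices of $k$ give other rejection regions (for example, rejecting for $x = 1$ or $2$), each of which is UMP at its own level.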

Example 8.3.15 (UMP normal test)

8.3.3 Sizes of Union-Intersection and Intersection-Union Tests

8.3.4 p-Values

Definition 8.3.26 A p-value $p(X)$ is a test statistic satisfying $0 \le p(x) \le 1$ for every sample point $x$. Small values of $p(X)$ give evidence that $H_1$ is true. A p-value is valid if, for every $\theta \in \Theta_0$ and every $0 \le \alpha \le 1$,

$$P_\theta(p(X) \le \alpha) \le \alpha.$$

8.3.5 Loss Function Optimality