This is part of probstat.

Since it is very hard to obtain complete information about a population, we usually end up with a much smaller collection of sample data.  A question arises: how confident can we be that a conclusion drawn from the collected sample is correct, rather than an artifact of chance?

This section tries to answer this question.

== A motivating example ==

Your friend gives you a coin and claims that this coin is special.  (It is unclear what is special about this coin.)

You want to prove it, so you toss the coin 20 times.

If you get 10 heads, do you believe your friend that the coin is special?

If you get 12 heads, do you believe your friend that the coin is special?

How about 15 heads?  How about 18 heads?  How about 20 heads?

Let's consider each case.

'''10:''' What is the probability that a normal coin turns up at least 10 heads in 20 tosses?  About 58%.  So this does not show anything special about the coin.

'''12:''' What is the probability that a normal coin turns up at least 12 heads in 20 tosses?  About 25%.  So this coin might be a bit special?

'''15:''' What is the probability that a normal coin turns up at least 15 heads in 20 tosses?  About 2%.  Either this coin is special or I am very lucky.

'''18:''' What is the probability that a normal coin turns up at least 18 heads in 20 tosses?  About 0.02%.  Either this coin is special or I am extremely lucky.

'''20:''' What is the probability that a normal coin turns up at least 20 heads in 20 tosses?  About 1 in a million.  I should definitely believe that this coin is special.  (These tail probabilities are computed in the sketch below.)
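
The probabilities above can be checked directly from the binomial distribution.  Here is a minimal sketch, assuming Python with SciPy is available; <code>binom.sf(k - 1, n, p)</code> gives <math>P\{X \geq k\}</math> for <math>X \sim Binomial(n, p)</math>.

<pre>
from scipy.stats import binom

n, p = 20, 0.5                       # 20 tosses of a fair coin
for k in (10, 12, 15, 18, 20):
    # binom.sf(k - 1, n, p) = P(X >= k), the upper-tail probability
    print(k, binom.sf(k - 1, n, p))
# 10: 0.588..., 12: 0.251..., 15: 0.0207..., 18: 0.000201..., 20: about 9.5e-07
</pre>
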
Let's go back to our reasoning in the coin example.

We want to reject some belief, namely that the coin is normal.  In this case, the normality is in the probability of turning up heads.  So the hypothesis that we want to test (or reject) is the following:

: '''H<sub>0</sub>''': "the probability that the coin turns up heads is 0.5".

If the experimental result contradicts this hypothesis, we can reject it.  Note, however, that it is impossible to completely contradict this hypothesis: even a result of 1000 heads in 1000 coin tosses does not contradict it, because there is a non-zero probability of obtaining that result.  Therefore, we are happy with a result which is "unlikely" enough.  The degree of "unlikeliness" determines our confidence in rejecting the hypothesis.

Consider this criterion:

: We shall reject <math>H_0</math> if, after tossing the coin 20 times, we get at least 18 heads.

We know that if the hypothesis <math>H_0</math> is true, the probability that we reject it is at most 0.02%.  Therefore, if we do reject it, it is extremely unlikely that this happened merely by chance.  The probability that we reject <math>H_0</math> when it is actually true is the '''significance level''' of the test; in this case, the significance level of the test is <math>\alpha = 0.0002</math>.  (Note that when the significance level is very small, rejecting <math>H_0</math> is very significant.)
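
As a sketch (again assuming SciPy), the rejection criterion above and its significance level can be written down directly:

<pre>
from scipy.stats import binom

def reject_h0(num_heads, threshold=18):
    """Reject H0 (the coin is fair) if at least `threshold` heads appear in 20 tosses."""
    return num_heads >= threshold

# Significance level: the probability of rejecting H0 when the coin really is fair.
alpha = binom.sf(17, 20, 0.5)        # P(X >= 18) for X ~ Binomial(20, 0.5)
print(alpha)                         # about 0.0002
</pre>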
  
 
== The null hypothesis ==
: ''See also the [http://en.wikipedia.org/wiki/Null_hypothesis wikipedia article].''

When we perform hypothesis testing, we usually start with a hypothesis that describes a "normal" situation, usually referred to as the '''null hypothesis'''.  This hypothesis is there so that we can ''accept'' or ''reject'' it with experimental data.

In the previous example, the null hypothesis specifies that the head probability of the coin is 1/2.  Let's consider another example.  Suppose that we know that, on average, students get 80 points on the final exam for the probability class.  This semester, we try something different: we add another review section each week, and we would like to test whether the review sections improve the test scores.  Let <math>\mu</math> denote the average score of students taking the review sections.  Our null hypothesis is

: '''H<sub>0</sub>:''' <math>\mu \leq 80</math>,

which says that the review sections do not improve the score.

After we set up the null hypothesis, we create a criterion for accepting or rejecting it.  For example, we may say that we shall reject the null hypothesis (and conclude that the review sections do help) if the average score <math>\bar{X}</math> of <math>n</math> students is greater than 90, as in the sketch below.
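
A minimal sketch of this decision rule, assuming Python with NumPy; the scores below are made-up illustrative data:

<pre>
import numpy as np

# Hypothetical final-exam scores of n students who attended the review sections.
scores = np.array([85, 92, 88, 95, 90, 87, 93, 91])

# Criterion from the text: reject H0 (mu <= 80) when the sample mean exceeds 90.
x_bar = scores.mean()
print(x_bar, "reject H0" if x_bar > 90 else "fail to reject H0")
</pre>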
  
 
=== Error types ===
: ''See also the [http://en.wikipedia.org/wiki/Type_I_and_type_II_errors wikipedia article on Type I and II errors].''

For any criterion for testing a null hypothesis, there is usually a chance that we make a mistake.  There are two types of errors:

* '''Type I error''' --- occurs when the hypothesis is true, but we reject it.
* '''Type II error''' --- occurs when the hypothesis is false, but we accept it.

The table below lists all four cases.  (Shamelessly taken from [http://en.wikipedia.org/wiki/Type_I_and_type_II_errors].)

{| class="wikitable"
!
! Null hypothesis (''H''<sub>0</sub>) is true
! Null hypothesis (''H''<sub>0</sub>) is false
|-
! Reject null hypothesis
| align="center"| Type&nbsp;I error<br />False positive
| align="center"| Correct outcome<br />True positive
|-
! Fail to reject null hypothesis
| align="center"| Correct outcome<br />True negative
| align="center"| Type&nbsp;II error<br />False negative
|}

When we perform statistical hypothesis testing, we would like to check whether the hypothesis is consistent with the observed data.  Therefore, we shall only reject the hypothesis if the data is highly inconsistent with it, i.e., we shall reject <math>H_0</math> if the observed data would be very improbable under the assumption that <math>H_0</math> is true.  More specifically, we want the test to reject <math>H_0</math>, when <math>H_0</math> is true, with probability at most some small value <math>\alpha</math>.  Common values for <math>\alpha</math> are 0.1 (10%), 0.05 (5%), 0.01 (1%), or even 0.005 (0.5%).  The value <math>\alpha</math> is called the level of significance of the test.  Note that this value <math>\alpha</math> is also the type I error rate of the test.
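
To make the connection between <math>\alpha</math> and the type I error rate concrete, here is a small simulation sketch (assuming Python with NumPy) that estimates how often the earlier "at least 18 heads out of 20" criterion rejects <math>H_0</math> when the coin really is fair:

<pre>
import numpy as np

rng = np.random.default_rng(0)
n_experiments = 2_000_000

# Simulate many experiments in which H0 is true (a fair coin tossed 20 times each).
heads = rng.binomial(n=20, p=0.5, size=n_experiments)

# Fraction of experiments in which the ">= 18 heads" criterion rejects H0.
type_I_rate = np.mean(heads >= 18)
print(type_I_rate)      # should be close to alpha = 0.0002
</pre>
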
'''EX1''': Suppose that you know that the average height of Kasetsart University students is 170 cm with a variance of 150.  You look at students of the faculty of engineering and think that maybe these students do not share the average height of KU students.  Let <math>\mu</math> denote the mean of the heights of engineering students.  We assume that the heights are normally distributed and that the variance is the same as that of KU students' heights, i.e., <math>\sigma^2 = 150</math>.  Now, our null hypothesis is:

: '''H<sub>0</sub>:''' &nbsp;&nbsp;&nbsp; <math>\mu = 170</math>.

We shall take a sample of size 10.  Let's design a test criterion with level of significance <math>\alpha=0.01</math>.

Our test will consider the sample mean <math>\bar{X}=(X_1+\cdots+X_{10})/10</math> and will reject <math>H_0</math> if <math>\bar{X}</math> is far from 170.  We will have to figure out how far from 170 the sample mean needs to be.

First recall that the sample mean of a normal population is normally distributed with mean <math>\mu</math> and variance <math>\sigma^2/10=15</math>.  Therefore, the statistic

<center>
<math>
\frac{\bar{X} - \mu}{\sqrt{15}}
</math>
</center>

is unit normal.  Let's refer to this statistic as <math>Z = (\bar{X} -\mu)/\sqrt{15}</math>.

If we look at the standard normal table, we find out that

<center>
<math>P\{|Z| > 2.58\} < 0.01.</math>
</center>

After some calculation, we find that if we reject <math>H_0</math> when

<center>
<math>
|\bar{X} - 170| > 2.58\cdot \sqrt{15} = 9.992,
</math>
</center>

our test will have the level of significance of 0.01 as required.

'''Note:''' In this case, the population for which we test <math>H_0</math> is the engineering students, not all KU students.
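
A sketch of this test in code, assuming Python with NumPy and SciPy; the height data below is made up purely for illustration:

<pre>
import numpy as np
from scipy.stats import norm

# Hypothetical sample of 10 engineering students' heights (cm).
heights = np.array([172.0, 168.5, 175.2, 180.1, 169.3,
                    177.8, 171.4, 165.9, 174.6, 179.0])

mu0, sigma2, n = 170.0, 150.0, len(heights)

# Test statistic: Z = (X_bar - mu0) / sqrt(sigma^2 / n), unit normal under H0.
z = (heights.mean() - mu0) / np.sqrt(sigma2 / n)

# Two-sided rejection threshold for alpha = 0.01.
z_crit = norm.ppf(1 - 0.01 / 2)      # about 2.576
print(z, z_crit, "reject H0" if abs(z) > z_crit else "fail to reject H0")
</pre>
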
=== Concerning the type II errors ===

From the previous example, if the mean is actually <math>\mu = 170.001</math>, the hypothesis is incorrect, but it will be extremely hard to reject <math>H_0</math>.  Therefore, in this case, the type II error will be very high.  From this example, we can see that the type II error depends on how far the actual parameter is from the one in the null hypothesis.

'''EX2:''' Consider the average height example.  If the actual mean is <math>\mu=190</math> and we use the same test criterion, what is the type II error rate?

We incorrectly accept <math>H_0</math> if <math>|\bar{X} - 170| \leq 9.992</math>.  To simplify our analysis, let's approximate the error by assuming that we accept the null hypothesis when <math>\bar{X} \leq 179.992</math>.  (This causes only a very small error in our probability calculation.)

Since the population mean is <math>\mu = 190</math> and the population variance is <math>\sigma^2=150</math>, the probability that this happens is

<center>
<math>
\begin{array}{rcl}
P\{\bar{X} \leq 179.992\}
&=& P\{\bar{X}-190 \leq -10.008\} \\
&=& P\{(\bar{X}-190)/(\sigma/\sqrt{n}) \leq -10.008/(\sigma/\sqrt{n})\} \\
&=& P\{(\bar{X}-190)/(\sigma/\sqrt{n}) \leq -2.584 \}. \\
\end{array}
</math>
</center>

This is approximately 0.5 - 0.495 = 0.005, because <math>(\bar{X}-190)/(\sigma/\sqrt{n})</math> is unit normal.  (See the table [https://en.wikipedia.org/wiki/Standard_normal_table here].)  So when the true mean is 190, the type II error rate of the test is only about 0.5%.
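
The same number can be obtained directly from the normal distribution; a sketch assuming SciPy:

<pre>
import numpy as np
from scipy.stats import norm

mu_true, sigma2, n = 190.0, 150.0, 10
threshold = 179.992                     # approximate acceptance boundary from above

# Type II error: P(X_bar <= 179.992) when X_bar ~ Normal(190, 150/10).
beta = norm.cdf(threshold, loc=mu_true, scale=np.sqrt(sigma2 / n))
print(beta)                             # about 0.0049
</pre>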
  
 
== Tests concerning the mean of a normal population ==
In this section, we discuss how to design a test for the mean of a normal population.  That is, we want to test the null hypothesis

<center>
<math>H_0: \mu = \mu_0,</math>
</center>

where <math>\mu_0</math> is some specified constant.  We usually test <math>H_0</math> against the alternative hypothesis <math>H_1:\mu\neq \mu_0</math>.

=== When <math>\sigma^2</math> is known ===

When the variance <math>\sigma^2</math> of the population is known, we can design a test with a specified level of significance using the same calculation as in the previous example.

: ''Details will be added later.''
=== When <math>\sigma^2</math> is unknown ===

The same reasoning we used when deriving the confidence interval for the sample mean when the population variance <math>\sigma^2</math> is unknown also works here.  Recall that if we take a sample <math>X_1,X_2,\ldots,X_n</math> of size <math>n</math>, we can compute the statistics

<center>
<math>\bar{X} = \frac{X_1+X_2+\cdots+X_n}{n},</math>
</center>

and

<center>
<math>S^2 = \frac{\sum_{i=1}^n (X_i - \bar{X})^2}{n-1}.</math>
</center>

Then, we have that

<center>
<math>\frac{\bar{X}-\mu}{S/\sqrt{n}} \sim t_{n-1}.</math>
</center>

Given this complete description of the distribution of the statistic, we can design a hypothesis testing procedure as required.

'''EX3''': Suppose that you know that the average height of Kasetsart University students is 170 cm.  You look at students of the faculty of engineering and think that maybe these students do not share the average height of KU students.  Let <math>\mu</math> denote the mean of the heights of engineering students.  We shall take a sample of size 10.  Let's design a test criterion with level of significance <math>\alpha=0.01</math> for the following hypothesis:

: '''H<sub>0</sub>:''' &nbsp;&nbsp;&nbsp; <math>\mu = 170</math>.

Let

<center>
<math>T = \frac{\bar{X}-\mu}{S/\sqrt{n}}.</math>
</center>

Since the sample size is 10, we have that <math>T\sim t_{9}</math>.  Therefore, we shall look at the ''t''-distribution table for 9 degrees of freedom.  (See [https://en.wikipedia.org/wiki/Student%27s_t-distribution].)  We have that

<center>
<math>P\{|T| > 3.250\} < 0.01.</math>
</center>

This implies that we shall reject <math>H_0</math> when

<center>
<math>\left|\frac{\bar{X}-170}{S/\sqrt{n}}\right| > 3.250.</math>
</center>
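
A sketch of this ''t''-test, assuming SciPy and reusing the same hypothetical height sample as in the EX1 sketch:

<pre>
import numpy as np
from scipy.stats import t

# Hypothetical heights (cm) of 10 engineering students.
heights = np.array([172.0, 168.5, 175.2, 180.1, 169.3,
                    177.8, 171.4, 165.9, 174.6, 179.0])

mu0, n = 170.0, len(heights)
x_bar = heights.mean()
s = heights.std(ddof=1)                  # sample standard deviation

T = (x_bar - mu0) / (s / np.sqrt(n))
t_crit = t.ppf(1 - 0.01 / 2, df=n - 1)   # about 3.250 for alpha = 0.01, 9 d.o.f.
print(T, t_crit, "reject H0" if abs(T) > t_crit else "fail to reject H0")
</pre>

SciPy's <code>scipy.stats.ttest_1samp(heights, 170)</code> performs the same test and reports a ''p''-value instead of a fixed threshold.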
  
 
== Testing for the equivalence of the means ==
In this part, we have two populations.  The first population is normal with mean <math>\mu_x</math> and variance <math>\sigma_x^2</math>, and the second population is also normal, with mean <math>\mu_y</math> and variance <math>\sigma_y^2</math>.  We want to test

: '''H<sub>0</sub>:''' &nbsp;&nbsp;&nbsp; <math>\mu_x = \mu_y</math>.

=== When the variances of both populations are known ===

In this section, we assume that the variances <math>\sigma_x^2</math> and <math>\sigma_y^2</math> are known.

Suppose that we take samples <math>X_1,X_2,\ldots,X_n</math> from the first population and <math>Y_1,Y_2,\ldots,Y_m</math> from the second population.  We shall compute the sample means <math>\bar{X}</math> and <math>\bar{Y}</math>.

Note that if <math>\mu_x = \mu_y</math>, then <math>\mu_x - \mu_y = 0</math>.  Therefore we can write the hypothesis as

: '''H<sub>0</sub>:''' &nbsp;&nbsp;&nbsp; <math>\mu_x - \mu_y = 0</math>.

This representation of the hypothesis suggests that we should reject it when

<center>
<math>|\bar{X} - \bar{Y}|</math>
</center>

is large.

Recall that

<center>
<math>\bar{X} - \bar{Y} \sim Normal\left(\mu_x - \mu_y,\ \frac{\sigma_x^2}{n}+ \frac{\sigma_y^2}{m}\right).</math>
</center>

Therefore, if <math>H_0</math> is true, the random variable

<center>
<math>\frac{\bar{X} - \bar{Y}}{\sqrt{\frac{\sigma_x^2}{n}+ \frac{\sigma_y^2}{m}}}</math>
</center>

is unit normal.  Hence, a criterion for the hypothesis test can be calculated using the standard normal table.
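
A sketch of this two-sample test, assuming SciPy; the function name and the data below are illustrative, not a standard library routine:

<pre>
import numpy as np
from scipy.stats import norm

def two_sample_z_test(x, y, var_x, var_y, alpha=0.05):
    """Two-sided test of H0: mu_x == mu_y when both population variances are known."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    z = (x.mean() - y.mean()) / np.sqrt(var_x / len(x) + var_y / len(y))
    z_crit = norm.ppf(1 - alpha / 2)
    return z, abs(z) > z_crit            # (test statistic, reject H0?)

# Hypothetical samples, with known variance 150 for both populations.
x = [172.0, 168.5, 175.2, 180.1, 169.3, 177.8, 171.4, 165.9, 174.6, 179.0]
y = [166.2, 170.8, 163.5, 168.9, 172.1, 165.4, 169.7, 171.3]
print(two_sample_z_test(x, y, var_x=150, var_y=150, alpha=0.01))
</pre>
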
=== When the variances are not known ===

: ''To be added later...''
