T Confidence Intervals

We previously discussed confidence intervals of the form \(Estimate \pm ZQ \times StdErr_{Est}\), where \(ZQ\) is a standard normal quantile.

We’ll now look at methods for small samples, which will use \(Estimate \pm TQ \times StdErr_{Est}\), where \(TQ\) is a T-distribution quantile.

Gosset’s t distribution

The t distribution was invented by William Gosset, who published under the pseudonym ‘Student’. Hence “Student’s t-distribution”.

It has thicker tails than the normal distribution, and is indexed by degrees of freedom rather than by a mean and variance.

It assumes the underlying data are IID Gaussian with the result that

\[ \frac{ \bar{X} - \mu }{ S / \sqrt{n} } \]

follows Gosset’s t distribution with \(n-1\) degrees of freedom.

If we standardised with the true \(\sigma\), this statistic would be standard normal. When we replace \(\sigma\) with \(S\), the sample standard deviation, it turns into a t-distribution. As \(n\) increases the distinction becomes irrelevant, but with small \(n\) the difference can be large. If you use the standard normal with small sample sizes you get confidence intervals that are too narrow.

The interval is \(\bar{X} \pm t_{n-1} S / \sqrt{n}\), where \(t_{n-1}\) is the relevant quantile of the t-distribution with \(n-1\) degrees of freedom.
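As a rough sketch of this formula, the interval can be computed by hand with qt() and compared against the normal-quantile version. The sample here is hypothetical (10 simulated standard normal draws, with an arbitrary seed) and is only for illustration.

```r
# Hypothetical sample for illustration: 10 standard normal draws
set.seed(42)
x <- rnorm(10)
n <- length(x)

# Estimated standard error of the mean, using the sample standard deviation S
std_err <- sd(x) / sqrt(n)

# 95% intervals using the t quantile (n - 1 df) and the standard normal quantile
t_interval <- mean(x) + c(-1, 1) * qt(0.975, df = n - 1) * std_err
z_interval <- mean(x) + c(-1, 1) * qnorm(0.975) * std_err

t_interval
z_interval  # narrower than the t interval, because qnorm(0.975) < qt(0.975, 9)
```

The hand-computed t interval should agree with the `conf.int` element returned by t.test(x).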

We take a look at the t-distribution with 3 degrees of freedom versus a standard normal:

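library(tidyverse)

# Densities of the standard normal (z) and the t-distribution with 3 df (t)
tibble(
    x = seq(-4, 4, 0.01),
    z = dnorm(x),
    t = dt(x, 3)
) %>%
    # Reshape to long format so each distribution is drawn as its own line
    gather('dist', 'y', c('z', 't')) %>%
    ggplot() +
    geom_line(aes(x, y, colour = dist))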

We can see that the tails of the t-distribution are much fatter.

Notes

The t interval technically assumes the data are IID normal, though it is robust to this assumption.
It works well whenever the distribution is roughly symmetric and mound shaped.
Paired observations are often analysed using the t interval by taking differences.
- For example measuring something once, then measuring it again a day later.
For large degrees of freedom, the t quantiles become the same as standard normal quantiles.
- This suggests always using the t interval.
For skewed distributions, the spirit of the t interval assumptions is violated.
- Consider taking logs or using a different summary like the median.
For discrete data, other intervals, such as those based on the Poisson distribution, are available.
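To illustrate the note about large degrees of freedom, the sketch below compares the 97.5% t quantile with the corresponding standard normal quantile. The particular degrees of freedom are arbitrary choices for illustration.

```r
# Arbitrary degrees of freedom, chosen only to show the trend
dfs <- c(2, 5, 15, 30, 100, 1000)

# 97.5% quantiles, as used for a 95% two-sided interval
t_quantiles <- qt(0.975, df = dfs)
z_quantile <- qnorm(0.975)

# The gap between the t and normal quantiles shrinks as df grows
data.frame(
    df = dfs,
    t_quantile = t_quantiles,
    difference = t_quantiles - z_quantile
)
```

Because the t quantile is always at least as large as the normal one, using the t interval costs little for large samples and protects against intervals that are too narrow for small ones.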

Independent Group t Confidence Intervals

Suppose we want to compare the mean blood pressure between two groups in a randomised trial: those who received the treatment and those who received a placebo.
We cannot use the paired t test because groups are independent and may have different sample sizes.

The confidence interval is:

\[ \bar{Y} - \bar{X} \pm t_{n_x + n_y-2,1-\alpha/2} S_p \bigg(\frac{1}{n_x} + \frac{1}{n_y}\bigg) ^ {1/2} \]

The mean of group \(Y\) minus the mean of group \(X\).
Plus or minus the relevant t quantile times the standard error.
- The degrees of freedom are \(n_x + n_y - 2\), where \(n_x\) and \(n_y\) are the numbers of observations in each group.
The entire \(S_p\) section, \(S_p (1/n_x + 1/n_y)^{1/2}\), is the standard error of the difference.
- Notice that as the number of observations grows, \(n_x\) and \(n_y\) get large and so this entire section gets smaller.

\(S_p^2\) is the pooled variance:

\[ S^2_p = \frac{ (n_x-1)S^2_x + (n_y-1)S^2_y }{ n_x + n_y -2 } \]

This applies if we’re willing to assume that the variance is the same in each group, which is reasonable given they’re randomised. However, the sample sizes may be different, so we need to weight the variances.

Remember this interval assumes the same variance in both groups.
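A minimal sketch of the pooled-variance interval, computed term by term from the formulas above. The two groups are simulated, hypothetical blood pressure values with made-up means and a common standard deviation, not real trial data.

```r
# Hypothetical data: simulated blood pressures with a common standard deviation
set.seed(42)
x <- rnorm(12, mean = 120, sd = 10)  # e.g. placebo group
y <- rnorm(15, mean = 125, sd = 10)  # e.g. treatment group
n_x <- length(x)
n_y <- length(y)

# Pooled variance: the group variances weighted by their degrees of freedom
s2_p <- ((n_x - 1) * var(x) + (n_y - 1) * var(y)) / (n_x + n_y - 2)

# Standard error of the difference in means
std_err <- sqrt(s2_p) * sqrt(1 / n_x + 1 / n_y)

# 95% interval for mean(y) - mean(x), using n_x + n_y - 2 degrees of freedom
pooled_interval <- mean(y) - mean(x) +
    c(-1, 1) * qt(0.975, df = n_x + n_y - 2) * std_err
pooled_interval
```

This should match the interval from t.test(y, x, var.equal = TRUE); note the argument order, since t.test() reports the first sample's mean minus the second's.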

Two Group Unequal Variance Test

If the \(x\) observations and the \(y\) observations are IID normal, potentially with different means and different variances, the relevant normalised statistic does not follow a t-distribution.

When there are unequal variances between the groups, Welch’s t-test should be used.

The confidence interval is:

\[ \bar{Y} - \bar{X} \pm t_{df} \times \bigg( \frac{ S^2_x }{ n_x } + \frac{ S^2_y }{ n_y } \bigg)^{1/2} \]

The distribution of the normalised statistic can be approximated by a t-distribution, using a rather elaborate formula for the degrees of freedom:

\[ df = \frac{ (S^2_x/n_x + S^2_y/n_y)^2 }{ \bigg(\frac{S^2_x}{n_x}\bigg)^2 / (n_x - 1) + \bigg(\frac{S^2_y}{n_y}\bigg)^2 / (n_y - 1) } \]

Shifting around, the t-statistic is:

\[ t = \frac{ \bar{Y} - \bar{X} }{ \sqrt{ \frac{ S_x^2 }{ n_x } + \frac{ S_y^2 }{ n_y } } } \]

When in doubt, use the unequal variance interval.
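The degrees of freedom formula is easier to follow when computed step by step. The sketch below uses two simulated, hypothetical groups with deliberately different standard deviations.

```r
# Hypothetical data: two simulated groups with different standard deviations
set.seed(42)
x <- rnorm(12, mean = 120, sd = 5)
y <- rnorm(15, mean = 125, sd = 20)
n_x <- length(x)
n_y <- length(y)

# Per-group variance of the sample mean: S^2 / n
v_x <- var(x) / n_x
v_y <- var(y) / n_y

# Approximate (Welch) degrees of freedom; usually fractional
welch_df <- (v_x + v_y)^2 / (v_x^2 / (n_x - 1) + v_y^2 / (n_y - 1))

# 95% interval for mean(y) - mean(x) without assuming equal variances
welch_interval <- mean(y) - mean(x) +
    c(-1, 1) * qt(0.975, df = welch_df) * sqrt(v_x + v_y)

welch_df
welch_interval
```

Both values should agree with the `parameter` and `conf.int` elements of t.test(y, x), which performs the unequal variance test by default.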

Other Kinds of Data

For binomial data there are lots of ways to compare two groups:

Relative risk, risk difference, odds ratio.
Chi-squared tests, normal approximations, exact tests.

For count data, there are also chi-squared tests and exact tests.

Hypothesis Testing

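Hypothesis testing starts with the null hypothesis, which represents the status quo and is usually labeled \(H_0\). The null hypothesis is assumed to be true and statistical evidence is required to reject it.

Four scenarios:

\(H_0\) is true and we accept it: correctly accept the null
\(H_0\) is true and we reject it: Type I error (false positive)
\(H_0\) is false and we reject it: correctly reject the null
\(H_0\) is false and we accept it: Type II error (false negative)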

Consider a court of law, where the null hypothesis is that the defendant is innocent. We require a standard on the available evidence to reject the null hypothesis and convict.

If we set a low standard, we increase the percentage of innocent people convicted (type I errors). However, we would also increase the percentage of guilty people convicted (correctly rejecting the null).

If we set a high standard, we increase the percentage of innocent people let free (correctly accepting the null), while we would also increase the percentage of guilty people let free (type II errors).

Rejection

We reject the null hypothesis if \(\bar{X}\) is larger than some constant, \(C\). \(C\) is chosen so that the probability of a type I error, \(\alpha\), is 0.05. \(\alpha\) is the type I error rate: the probability of rejecting the null hypothesis when in fact it is correct.

The standard error of the mean is \(\sigma/\sqrt{n}\). Under \(H_0\), \(\bar{X} \sim N(\mu_0, \sigma^2/n)\). We want to choose \(C\) so that \(P(\bar{X} \gt C; H_0)\) is 5%.

The 95th percentile of a standard normal distribution is 1.645, so \(C = \mu_0 + 1.645 \times \sigma/\sqrt{n}\). With a standard error of 1, for example, \(C = \mu_0 + 1 \times 1.645\).

In general we don’t convert \(C\) back to the original scale. We can reject based on the Z-score alone, which is how many standard errors the sample mean is above the hypothesised mean.

Formally, we reject whenever:

\[ \frac{ \sqrt{n}(\bar{X} - \mu_0) }{ s } \gt Z_{1-\alpha} \]
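One way to check that this rule has the intended type I error rate is to simulate many samples for which \(H_0\) is true and count how often the rule rejects. The values of `mu_0`, `sigma`, `n` and the number of simulations below are hypothetical, chosen only for illustration.

```r
# Hypothetical settings: the null hypothesis is true in every simulated sample
set.seed(42)
mu_0 <- 30
sigma <- 10
n <- 100
alpha <- 0.05

# One-sided critical value from the standard normal
z_crit <- qnorm(1 - alpha)

# For each simulated sample, apply the rejection rule from the formula above
rejects <- replicate(10000, {
    x <- rnorm(n, mean = mu_0, sd = sigma)
    sqrt(n) * (mean(x) - mu_0) / sd(x) > z_crit
})

# Proportion of false rejections; should be close to alpha
mean(rejects)
```

Because the standard deviation is estimated from the data, the simulated rate is expected to sit slightly above 5%; the gap shrinks as `n` grows.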

T Tests

When \(n\) is small (e.g. 16) the test statistic, which measures how many estimated standard errors the sample mean is from the hypothesised mean, follows a T-distribution with \(n-1\) (e.g. 15) degrees of freedom.

We then calculate the 95th percentile of the T-distribution rather than the normal (qt(0.95, 15)).

The two-sided test rejects the null if the test statistic is too big or too small. It is often used even if it doesn’t make sense in the scientific setting.

In order to get the 5% probability of rejecting the null, we put 2.5% in each tail. We thus use qt(0.975, 15).
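A short sketch of the two critical values for \(n = 16\), with a hypothetical test statistic of 2.0 (not taken from any data) to show how the one-sided and two-sided decisions can differ.

```r
n <- 16

# Critical values for a 5% test with n - 1 = 15 degrees of freedom
one_sided_crit <- qt(0.95, df = n - 1)   # all 5% in the upper tail
two_sided_crit <- qt(0.975, df = n - 1)  # 2.5% in each tail

# Hypothetical test statistic, for illustration only
t_stat <- 2.0

reject_one_sided <- t_stat > one_sided_crit
reject_two_sided <- abs(t_stat) > two_sided_crit

c(one_sided_crit = one_sided_crit, two_sided_crit = two_sided_crit)
c(reject_one_sided = reject_one_sided, reject_two_sided = reject_two_sided)
```

With these numbers the one-sided test rejects while the two-sided test does not, since the two-sided critical value is larger.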

Single Group T-Test

Consider a sample from a single group. In our case it’s 10 random normal observations. Let our null hypothesis be that the true mean is 0. We run t.test() on it, and then on a second sample whose true mean is 2:

set.seed(1)

t.test(rnorm(10))

## 
##  One Sample t-test
## 
## data:  rnorm(10)
## t = 0.53557, df = 9, p-value = 0.6052
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
##  -0.4261948  0.6906003
## sample estimates:
## mean of x 
## 0.1322028

t.test(rnorm(10, mean = 2))

## 
##  One Sample t-test
## 
## data:  rnorm(10, mean = 2)
## t = 6.6493, df = 9, p-value = 9.382e-05
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
##  1.48376 3.01393
## sample estimates:
## mean of x 
##  2.248845

The t-statistic is the ratio of the departure of the estimated value from its hypothesised value to the standard error.

\[ t_{\hat{\beta}} = \frac{ \hat{\beta} - \beta_0 }{ SE(\hat{\beta} ) } \]

where \(\beta_0 = 0\) in our case.

In the first test the t-statistic is small (0.54, with a p-value of 0.61), so we cannot reject the null hypothesis.

In the second the t-statistic is much larger (6.65, with a p-value below 0.001), so we can be confident in rejecting the null hypothesis.
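The ratio can be computed by hand to see what t.test() is doing. This sketch reuses the same seed as the first test, so it should reproduce that t-statistic and p-value; the variable names are only illustrative.

```r
# Same seed and sample size as the first test
set.seed(1)
x <- rnorm(10)
n <- length(x)

# Estimate minus hypothesised value (0), divided by the estimated standard error
t_stat <- (mean(x) - 0) / (sd(x) / sqrt(n))

# Two-sided p-value from the t-distribution with n - 1 degrees of freedom
p_value <- 2 * pt(abs(t_stat), df = n - 1, lower.tail = FALSE)

t_stat
p_value
```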

Paired T-Test

The paired t-test is the same as the single group test, except that it is applied to the differences between the paired observations.

Note below that the outcomes are the same:

sample_a <- rnorm(10)
sample_b <- rnorm(10, mean = 10)

t.test(sample_a, sample_b, paired = TRUE)

## 
##  Paired t-test
## 
## data:  sample_a and sample_b
## t = -27.034, df = 9, p-value = 6.279e-10
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -11.112461  -9.396346
## sample estimates:
## mean of the differences 
##                -10.2544

t.test(sample_a - sample_b)

## 
##  One Sample t-test
## 
## data:  sample_a - sample_b
## t = -27.034, df = 9, p-value = 6.279e-10
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
##  -11.112461  -9.396346
## sample estimates:
## mean of x 
##  -10.2544

Two Group Intervals

We know how to do two group T tests, since we have covered independent group T intervals. The rejection rules are the same, except we’re testing \(H_0 : \mu_1 = \mu_2\).

We can do tests where the variance is the same or different in both groups. By default t.test() does an unequal variance Welch test.

set.seed(1)
group_a <- rnorm(10)
group_b <- rnorm(10, mean = 3)
group_c <- rnorm(10, mean = 3, sd = 4)

t.test(group_a, group_b, var.equal = TRUE)

## 
##  Two Sample t-test
## 
## data:  group_a and group_b
## t = -7.4434, df = 18, p-value = 6.737e-07
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -3.996318 -2.236966
## sample estimates:
## mean of x mean of y 
## 0.1322028 3.2488450

t.test(group_a, group_c)

## 
##  Welch Two Sample t-test
## 
## data:  group_a and group_c
## t = -1.8911, df = 9.7493, p-value = 0.08865
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -5.0915810  0.4253725
## sample estimates:
## mean of x mean of y 
## 0.1322028 2.4653070

Notice the fractional degrees of freedom in the second test, and that the t-statistic is small in magnitude because the variability in the second group adds ‘noise’.

P-Values

P-values are a convenient way to communicate the results of a hypothesis test. Formally the p-value is the probability of getting data as or more extreme than the observed data in favor of the alternative. The calculation is done assuming the null is true.

Approach:

Define the hypothetical distribution of a data summary (statistic) when “nothing is going on” - the null hypothesis.
Calculate the summary/statistic with the data we have - test statistic.
Compare what we calculated to our hypothetical distribution and see if the value is “extreme” - p-value.

If the P-value is small, then either \(H_0\) is true and we have observed a rare event, or \(H_0\) is false.

Example

We get a t-statistic of 2.5 with 15 degrees of freedom, testing \(H_0 : \mu = \mu_0\) versus \(H_a : \mu \gt \mu_0\).

What’s the probability of getting a T-statistic at least as large as 2.5?

pt(q = 2.5, df = 15, lower.tail = FALSE)

## [1] 0.0122529

We see the probability is about 1.2%.

For a two-sided hypothesis test, where the critical region is on both tails of the distribution, double the smaller of the two one-sided p-values to take both tails into account. Software that reports a p-value is most often performing a two-sided test.
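Continuing the example (a t-statistic of 2.5 with 15 degrees of freedom), the doubling rule can be sketched as follows; the variable names are only illustrative.

```r
t_stat <- 2.5
dof <- 15

# The two one-sided p-values: lower tail and upper tail
p_lower <- pt(t_stat, df = dof)
p_upper <- pt(t_stat, df = dof, lower.tail = FALSE)

# Two-sided p-value: double the smaller of the two one-sided values
p_two_sided <- 2 * min(p_lower, p_upper)
p_two_sided
```

Here the two-sided p-value is twice the 1.2% found above, so roughly 2.5%.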

Poisson Example

Suppose that a hospital has an infection rate of 10 infections per 100 person-days at risk (a rate of 0.1) during the last monitoring period.

Assume that an infection rate of 0.05 is an important benchmark. Given the model, could the observed rate being larger than 0.05 be attributed to chance?

Under \(H_0 : \lambda = 0.05\), the expected number of infections is \(\lambda_0 \times 100 = 5\).
Consider \(H_a : \lambda \gt 0.05\).

# The 9 is (10 - 1) so that we're getting the upper tail or "10 or more"
ppois(9, 5, lower.tail = FALSE)

## [1] 0.03182806
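The same calculation can also be sketched with poisson.test() from base R's stats package, which takes the event count, the time base and the hypothesised rate directly. This is an alternative route to the same p-value, not something the notes above rely on.

```r
# 10 infections over 100 person-days, tested against a rate of 0.05
result <- poisson.test(x = 10, T = 100, r = 0.05, alternative = "greater")

# One-sided p-value; should agree with the ppois() calculation
result$p.value
```

The p-value of about 3.2% is below 5%, so at that level an observed rate this far above the 0.05 benchmark is unlikely to be attributable to chance alone.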

Course 6 - Statistical Inference - Week 3 - Notes

Greg Foletta

2019-12-30