STATISTICS Course

Confidence interval

In the likelihood function example, we got $\hat\theta = 0.6$ while the real value was $0.7$. Most of the time, instead of giving only a point estimate, we give either a confidence interval or both a confidence interval and a point estimate.

Since you usually work on a sample rather than on the whole population, reporting the sample size is also good practice.


Theory

  • Tolerance interval

An interval, stated with a confidence level, that contains a proportion $\alpha$ of the population (often $\alpha = 68\%/95\%/99\%$).

  • Confidence interval

An interval, stated with a confidence level, that is meant to contain the unknown parameter $\theta$.

  • Confidence level

Often $90\%/95\%/99\%$. It means that if we repeated the sampling procedure many times, about $90\%$ (or $95\%$, $99\%$) of the intervals built this way would contain the true value; the simulation below illustrates this.
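To make this interpretation concrete, here is a minimal R sketch (the seed, the sample size, and the number of repetitions are arbitrary choices) that repeatedly draws Bernoulli samples, builds the asymptotic 95% interval for the proportion, and measures how often the interval actually contains the true value:

set.seed(1)                    # arbitrary seed, for reproducibility
p <- 0.7                       # true value, normally unknown
n <- 100                       # sample size per experiment
covered <- replicate(10000, {
  x <- rbinom(n = n, size = 1, prob = p)
  phat <- mean(x)
  half <- qnorm(0.975) * sqrt(phat * (1 - phat) / n)
  phat - half <= p && p <= phat + half
})
mean(covered)                  # should land close to the nominal 0.95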


Normal distribution confidence interval

According to the Central limit theorem (Théorème de la limite centrale), the standardized mean of a sequence of i.i.d. random variables converges in distribution to a normal distribution as $n \to +\infty$. Because of this, you will end up working with the normal distribution a lot, so you must remember the 95% confidence interval.

  • $[\mu-\sigma, \mu+\sigma]$ contains about $68\%$ (roughly $\frac{2}{3}$) of our data
  • $[\mu-1.96\sigma, \mu+1.96\sigma]$ contains about $95\%$ of our data (both fractions are checked numerically below)
  • if the distribution is symmetric, the mean equals the median
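As a quick sanity check of those two fractions, here is a small sketch using simulated standard-normal draws (the sample size of one million is an arbitrary choice):

z <- rnorm(1e6)        # one million standard-normal draws
mean(abs(z) < 1)       # about 0.68, i.e. roughly two thirds
mean(abs(z) < 1.96)    # about 0.95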

Fact

Let's say you got a sample generated by (as it was in our example)

x <- rbinom(n = 10, size = 1, prob = 0.7)

We could generate $x$ many times and evaluate the MLE on each sample; averaged over many tries, the estimates approach the real value (see the sketch below). But since you are not the one generating the sample in real life, you need to quantify how much error you are making when you assume that your estimator is correct.
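Here is a minimal sketch of that repeated experiment; for a Bernoulli sample, the MLE of the success probability is simply the sample mean, and the seed and number of repetitions are arbitrary choices:

set.seed(1)    # arbitrary seed, for reproducibility
# for a Bernoulli sample, the MLE of prob is the sample mean
mle <- replicate(10000, mean(rbinom(n = 10, size = 1, prob = 0.7)))
mean(mle)      # close to the true value 0.7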

Here is an example of estimating the success probability for 10 successes out of 20 tries. You can already guess that the point estimate is $0.5$.

library(binom)  # install.packages("binom") if needed
binom.confint(x = 10, n = 20, method = "all")
# method  x  n mean     lower     upper
# 1  agresti-coull 10 20  0.5 0.2992980 0.7007020
# 2     asymptotic 10 20  0.5 0.2808694 0.7191306
# 3          bayes 10 20  0.5 0.2933765 0.7066235
# 4        cloglog 10 20  0.5 0.2713278 0.6918925
# 5          exact 10 20  0.5 0.2719578 0.7280422
# 6          logit 10 20  0.5 0.2938989 0.7061011
# 7         probit 10 20  0.5 0.2914070 0.7085930
# 8        profile 10 20  0.5 0.2910141 0.7089859
# 9            lrt 10 20  0.5 0.2909826 0.7090174
# 10     prop.test 10 20  0.5 0.2992980 0.7007020
# 11        wilson 10 20  0.5 0.2992980 0.7007020
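For instance, the asymptotic row is the classic Wald interval $\hat p \pm z_{0.975}\sqrt{\hat p(1-\hat p)/n}$; here is a short sketch reproducing its bounds by hand:

phat <- 10 / 20                # point estimate
n <- 20
phat + c(-1, 1) * qnorm(0.975) * sqrt(phat * (1 - phat) / n)
# [1] 0.2808694 0.7191306      # matches the "asymptotic" row above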