Confidence Interval



So we have a sample of n observations from the population, and the sample mean is:

$$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$$


This serves as a simple 'POINT ESTIMATE' of the population mean. As the sample size increases, the sample mean gets closer to the population mean, as shown in the previous post.
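To make the point estimate concrete, here is a minimal sketch that draws samples of growing size and prints their means. The population (normal, mean 20, standard deviation 5) and the names `POP_MEAN`, `POP_SD` and `sample_mean` are assumptions made up for this illustration.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

# Hypothetical population: normal with mean 20 and standard deviation 5.
POP_MEAN, POP_SD = 20.0, 5.0

def sample_mean(n):
    """Draw n values from the assumed population and return their mean."""
    return statistics.fmean(random.gauss(POP_MEAN, POP_SD) for _ in range(n))

# The point estimate should drift toward POP_MEAN as n grows.
for n in (10, 1_000, 100_000):
    print(n, round(sample_mean(n), 3))
```

With small n the estimate can be noticeably off, which is why a point estimate alone is not enough.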

We know that the mean of a very large sample is approximately equal to the population mean. With a finite sample the two rarely match exactly, so instead we say that the population mean lies within "Sample Mean ± some multiple of the Standard Error".

The largest error we expect in the sample mean, at a given level of confidence, is called the "Margin of Error".

The probability associated with the claim is called "Confidence Level".

So if I say the population mean lies in [x, y] with 95% confidence, it means the range [x, y] is the confidence interval and 95% is my confidence level.

For instance:

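Say the sample mean $\bar{x}$ is 20 and the margin of error is 2. Then:

$$P(\bar{x} - 2 < \mu < \bar{x} + 2) = 95\%$$

$$P(20 - 2 < \mu < 20 + 2) = 95\%$$

(Derived from the empirical rule of the normal distribution: about 95% of values lie within two standard deviations of the mean.)

$$P(18 < \mu < 22) = 95\%$$

The confidence interval is [18, 22].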

A confidence interval describes the amount of uncertainty associated with the sample estimate of a population parameter.
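One way to read the 95% is as a long-run success rate: if we repeat the sampling many times, roughly 95% of the intervals we build should contain the population mean. The simulation below is a rough sketch of that idea. It assumes a normal population with a known standard deviation and uses the margin of error 1.96 · σ/√n; the parameter values and the helper name `interval_covers_mu` are invented for the example.

```python
import random
import statistics

random.seed(1)  # fixed seed so the run is repeatable

MU, SIGMA = 20.0, 5.0   # assumed "true" population parameters
N, TRIALS = 25, 2_000   # sample size and number of repeated samples
Z_95 = 1.96             # Z value for a 95% confidence level

def interval_covers_mu():
    """Draw one sample, build its 95% interval, report whether MU is inside."""
    sample = [random.gauss(MU, SIGMA) for _ in range(N)]
    x_bar = statistics.fmean(sample)
    margin = Z_95 * SIGMA / N ** 0.5
    return x_bar - margin < MU < x_bar + margin

coverage = sum(interval_covers_mu() for _ in range(TRIALS)) / TRIALS
print(f"fraction of intervals containing mu: {coverage:.3f}")
```

The printed fraction should land close to 0.95, though it will wobble a little from run to run if the seed is changed.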



Confidence Level | Z
90%              | ± 1.65
95%              | ± 1.96
99%              | ± 2.58
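The Z values need not be memorised. As a small sketch, they can be recovered from the standard normal distribution with Python's standard library. The function name `z_critical` is just a label chosen here.

```python
from statistics import NormalDist

def z_critical(confidence_level):
    """Two-sided critical value z such that P(-z < Z < z) = confidence_level."""
    tail = (1 - confidence_level) / 2          # probability left in each tail
    return NormalDist().inv_cdf(1 - tail)      # inverse CDF of the standard normal

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%}  +/- {z_critical(level):.3f}")
```

The 90% value comes out at about 1.645, which is commonly rounded to 1.65.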
 




Calculating the Confidence Interval:

Given known σ:

If we already know the population standard deviation σ, we can compute the confidence interval for any confidence level from the sample mean, the sample size n, and the Z value for that level:

$$\bar{x} \pm Z \cdot \frac{\sigma}{\sqrt{n}}$$
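Here is a minimal sketch of the known-σ case, taking the interval to be the sample mean plus or minus Z · σ/√n. The data values, the choice σ = 2.0, and the function name `z_confidence_interval` are all made up for illustration.

```python
from math import sqrt
from statistics import NormalDist, fmean

def z_confidence_interval(sample, sigma, confidence_level=0.95):
    """Return (lower, upper) for the mean, given a known population sigma."""
    n = len(sample)
    x_bar = fmean(sample)
    # Two-sided Z value for the requested confidence level.
    z = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    margin = z * sigma / sqrt(n)   # margin of error
    return x_bar - margin, x_bar + margin

# Made-up measurements; sigma = 2.0 is assumed known for illustration.
data = [19.1, 21.4, 20.3, 18.7, 22.0, 19.8, 20.9, 20.2]
low, high = z_confidence_interval(data, sigma=2.0)
print(f"95% CI: [{low:.2f}, {high:.2f}]")
```

Raising `confidence_level` to 0.99 widens the interval, since a larger Z value is needed to be more sure.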

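Given unknown σ:

We use the t-distribution in this case, with the sample standard deviation s in place of σ, and the formula looks like:

$$\bar{x} \pm t_{n-1} \cdot \frac{s}{\sqrt{n}}$$

where $t_{n-1}$ is the critical value of the t-distribution with n - 1 degrees of freedom for the chosen confidence level.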
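A rough sketch of the unknown-σ case follows, using the sample standard deviation s and a t critical value with n - 1 degrees of freedom. Python's standard library has no inverse t-distribution, so this sketch assumes the caller looks the critical value up in a t-table. If SciPy is installed, `scipy.stats.t.ppf(0.975, df=n - 1)` is one way to get it. The data and the function name `t_confidence_interval` are invented for the example.

```python
from math import sqrt
from statistics import fmean, stdev

def t_confidence_interval(sample, t_critical):
    """Return (lower, upper) for the mean when sigma is unknown.

    t_critical is the two-sided critical value for n - 1 degrees of freedom,
    supplied by the caller (for example from a t-table).
    """
    n = len(sample)
    x_bar = fmean(sample)
    s = stdev(sample)                    # sample standard deviation (n - 1 denominator)
    margin = t_critical * s / sqrt(n)    # margin of error
    return x_bar - margin, x_bar + margin

# Made-up data with n = 10, so 9 degrees of freedom.
data = [19.1, 21.4, 20.3, 18.7, 22.0, 19.8, 20.9, 20.2, 21.1, 19.5]
T_95_DF9 = 2.262  # 95% two-sided t value for 9 degrees of freedom, from a t-table
low, high = t_confidence_interval(data, T_95_DF9)
print(f"95% CI: [{low:.2f}, {high:.2f}]")
```

For the same data the t value is larger than the 1.96 used with Z, so the interval is a bit wider. That is the price of estimating σ from the sample.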


Calculating Confidence Intervals using Empirical Bootstrapping:

Bootstrapping uses sampling with replacement to calculate confidence intervals for any statistic (mean, median, and so on). It has two stages:
1. Sampling with replacement
2. Calculating the Confidence Interval

In the first stage, we repeatedly resample the observed sample with replacement and compute the desired statistic for each resample. The more resamples, the better the results. The computed statistics are then sorted for later processing.

In the final stage, depending on the confidence level we are interested in, the sorted values at the corresponding positions are chosen as the lower and upper bounds of the confidence interval.

For instance, say there are 1000 resamples and the confidence level we're interested in is 95%. Then 2.5% is cut from each tail: the lower bound is the 25th sorted value and the upper bound is the 975th sorted value.
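The two stages can be sketched in a few lines of Python. This follows the percentile-style procedure described above (sort the resampled statistics, then read off the 25th and 975th values for 95% with 1000 resamples). The function name `bootstrap_ci`, its parameters, and the data are assumptions chosen for illustration.

```python
import random
import statistics

def bootstrap_ci(sample, statistic=statistics.fmean, n_resamples=1000,
                 confidence_level=0.95, seed=None):
    """Percentile-style bootstrap confidence interval for any statistic."""
    rng = random.Random(seed)
    n = len(sample)
    # Stage 1: resample with replacement, compute the statistic, then sort.
    stats = sorted(
        statistic(rng.choices(sample, k=n)) for _ in range(n_resamples)
    )
    # Stage 2: pick the sorted values at the tail positions
    # (25th and 975th for 95% confidence with 1000 resamples).
    tail = (1 - confidence_level) / 2
    lower_pos = max(1, round(n_resamples * tail))
    upper_pos = round(n_resamples * (1 - tail))
    return stats[lower_pos - 1], stats[upper_pos - 1]   # 1-based position to 0-based index

# Made-up data for illustration.
data = [19.1, 21.4, 20.3, 18.7, 22.0, 19.8, 20.9, 20.2, 21.1, 19.5]
low, high = bootstrap_ci(data, seed=42)
print(f"95% bootstrap CI for the mean: [{low:.2f}, {high:.2f}]")
```

Passing `statistic=statistics.median` gives an interval for the median instead, which is the main appeal of bootstrapping: no formula for the standard error is needed.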
