Hypothesis testing

27 January 2025

more practice at a skill (like batting) ⟹ more sample size, more data
there’s no perfect system, there’s always a chance of error.
hypothesis:
- a statement, belief, idea, opinion or a product
- yet to be backed by evidence
- may or may not be supported by our data.
because the actual truth or the population is unknown, we need hypothesis testing.
we always need two different, conflicting statements (analogy: a court case cannot run with only one lawyer)
- Null hypothesis: age-old belief or idea (an assumption to be challenged)
- Alternative hypothesis: new belief (researcher’s hypothesis)
first task is to correctly identify these two before proceeding.
- example 1:
  - safe/null: the accused is innocent
  - alternative: accused is guilty
- example 2:
  - safe/null: students will score good marks
  - alternative/negative: students will not score good marks
Hypothesis testing is not done for the null hypothesis.

| Decision \ Truth | Accused is innocent | Accused is guilty |
| --- | --- | --- |
| Accused is innocent | ✅ | Type II: slightly less serious error |
| Accused is guilty | Type I: more serious error | ✅ |

null hypothesis $H_{0}\colon$ old drug is better
alternative hypothesis $H_{a}\colon$ new drug is better

| Decision \ Truth | Old is better | New is better |
| --- | --- | --- |
| Old is better | ✅ | Type II: slightly less serious error |
| New is better | Type I: more serious error | ✅ |

not a lot of harm if we continue with the old one by mistake.
if the old one was better, but by mistake it gets rejected ⟶ a much more serious problem.
alternatively, you can figure out null and alternative hypotheses using the decision table (by identifying type 1 and 2 errors)

Level and Power

How to quantify these errors/uncertainties?
- size of a test = P(type 1 error)
- $\alpha=$ maximum tolerable probability of a Type I error (guarding wickets)
  - confidence level = $100(1-\alpha)\%$
- $P(\text{type 2 error}) = \beta$
- $\text{Power} = P(\text{rejecting } H_{0} \mid H_{0} \text{ is false} ) = 1 - \beta$ (runs)
  - new drug: main objective is to find out if the new is doing better
We want to keep both $\alpha$ and $\beta$ low, but for a fixed sample size, decreasing $\alpha$ increases $\beta$
- we have to limit $\alpha$: put a cap on $\alpha$
to decrease $\alpha$ without raising $\beta$, or to increase power at a fixed $\alpha$ ⟹ sample size has to increase
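
The trade-off between $\alpha$, $\beta$ and sample size can be checked numerically. The sketch below assumes a right-tailed z-test with $\sigma$ treated as known. The function name `power_right_tailed_z` and the values of `mu0`, `mu1` and `sigma` are hypothetical, chosen only for illustration and not taken from the class.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal N(0, 1)


def power_right_tailed_z(mu0, mu1, sigma, n, alpha):
    """Power of a right-tailed z-test (H0: mu <= mu0 vs Ha: mu > mu0)
    when the true mean is mu1 and sigma is treated as known."""
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha)   # critical value
    shift = (mu1 - mu0) / (sigma / sqrt(n))   # distance of the truth from mu0, in standard errors
    beta = STD_NORMAL.cdf(z_alpha - shift)    # P(type 2 error)
    return 1 - beta


# Hypothetical numbers, for illustration only.
mu0, mu1, sigma = 96, 100, 18

for alpha in (0.10, 0.05, 0.01):
    for n in (25, 100):
        power = power_right_tailed_z(mu0, mu1, sigma, n, alpha)
        print(f"alpha={alpha:.2f}  n={n:3d}  beta={1 - power:.3f}  power={power:.3f}")
```

For a fixed `n`, the printed `beta` grows as `alpha` shrinks; moving from `n=25` to `n=100` raises the power at every `alpha`.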

Next class: quantifying errors

Examples

note: hypothesis testing is done for a single sample

Bottling machine: A bottling machine is to be tested for accuracy of the amount it fills in 2-liter bottles. Set up the hypotheses required to test this.
- $\textcolor{var(--text-error)}{H_{0}}:$ average amount of water filled $=\pu{2L}$
- $\textcolor{var(--color-red)}{ H_{a}: \overline{H_{0}}}$, i.e. average amount of water filled $\neq \pu{ 2L }$.
- Variability is measured by $\sigma,\sigma^{2},s,s^{2}$
- most (all?) bottles are not filled exactly up-to 2 liters ⟶ 2 liters on average + good precision $s^{2}$ (narrow interval).
- suppose average of sample $=\pu{ 2.3L }$ ⟶ nothing uncertain about this (population parameter is uncertain).
- we make guesses about the population using sample data. The sample statistic is known exactly, so hypotheses are never stated about $\bar{x}$: $\begin{align} \textcolor{var(--color-red)}{ \xcancel{ H_{0}: \bar{x}=2L }} \\[5pt] \textcolor{var(--color-red)}{ \xcancel{ H_{a}: \bar{x} \neq 2L }} \end{align}$
Oil prices: During the sharp increase in gasoline prices in the summer of 2006, oil companies claimed that the average price of unleaded gasoline with a minimum octane rating of 89 in the Midwest was not more than \$3.75. Test this claim.
- Is this the null or the alternative?
- there’s been a sharp increase ⟶ oil companies “claim” $\mu\leq 3.75$
- this is a new claim ⟹ alternative hypothesis.
- $H_{a}: \mu\leq 3.75$ ⟶ one-sided test
- $H_{0}: \mu>3.75$ ⟶ left-tailed test
- sometimes we look at the status quo, some other times we look at common belief to find null and alternative hypothesis.
Problems in pdf
The Great Indian Sports Company
- $n = 100$
- $\bar{x}=98.65$
- $s=17.678$
- good situation and bad situation
- $\mu : \text{ mean time for all possible products}$
- $H_{0}: \mu = 96$ ⟶ wrong
- $H_{0}: \mu\leq96 = \mu_{0}$ ⟶ cost increases, but production does not stop
- $H_{a}: \mu>96$
- sample size is high, so $\bar{x} \sim \mathcal{N} \left( \mu, \frac{s^{2}}{n} \right)$

\[\begin{align} H_{0}&: \bar{x} \sim \mathcal{N} \left( \mu_{0}, \frac{s^{2}}{n} \right) \\ z_{\text{test-stat}}&=z_{\text{obs}}=\left( \frac{\bar{x}-\mu_{0}}{\frac{s}{\sqrt{ n }}} \right) \sim \mathcal{N} \left( 0, 1 \right) \end{align}\]

ipad

$\alpha=$ level of significance ⟶ maximum amount of tolerable type 1 error

$100(1-\alpha)\%$ = confidence level

$\begin{align} z_{obs} > z_{\alpha} \to\text{reject } H_{0} \\ z_{obs} < z_{\alpha} \to\text{accept } H_{0} \end{align}$

rejection region is on the right of the critical value ⟶ right tailed tests
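$z_{\text{obs}} = \frac{98.65-96}{17.678/\sqrt{ 100 }} \approx 1.50$
$z_{\alpha} =$ norm.s.inv(1-0.05) $\approx 1.645$ ⟹ $z_{\text{obs}} < z_{\alpha}$, so accept $H_{0}$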
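
The right-tailed z-test for The Great Indian Sports Company can be reproduced as a small sketch using only the Python standard library. The sample values and $\alpha = 0.05$ come from the notes; the variable names are my own.

```python
from math import sqrt
from statistics import NormalDist

n, x_bar, s = 100, 98.65, 17.678   # sample values from the notes
mu0, alpha = 96, 0.05              # null value and level of significance

# Test statistic: (x_bar - mu0) / (s / sqrt(n))
z_obs = (x_bar - mu0) / (s / sqrt(n))

# Critical value, the counterpart of Excel's norm.s.inv(1 - 0.05)
z_alpha = NormalDist().inv_cdf(1 - alpha)

# Right-tailed rule: reject H0 only if z_obs lies to the right of z_alpha
reject_h0 = z_obs > z_alpha
print(f"z_obs = {z_obs:.3f}, z_alpha = {z_alpha:.3f}, reject H0: {reject_h0}")
```

Here `z_obs` falls short of `z_alpha`, so the observed value is not in the rejection region.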

29 January 2025

Iceberg

$n = 25$
$\bar{x}=56.27$
$s=7.8$
t-test
Assumption: the underlying population distribution is normal.
$H_{0}:$ turnover is good ⟶ mean waiting time $\geq 58$
$H_{a}:$ turnover is not good ⟶ mean waiting time $<58 \pu{ minutes }$
When is the turnover good?

$H_{0}: \mu\geq 58=\mu_{0}$

$\mu_{0}$ is called the null value.

\[H_{a}:\mu<58\]

\[t_{\text{test-stat}} = \frac{\bar{x}-\mu_{0}}{se(\bar{x})} = \frac{\bar{x}-\mu_{0}}{s/{\sqrt{ n }}} \sim t_{n-1}\]

$t_{\text{obs}} = -1.11$
$t_{\text{calc}}=$
Like normal distribution, $t$ is also symmetric.
ipad
$t_{1-\alpha}=-t_{\alpha}=$ t.inv(0.05, 24) $\approx -1.71$
$t_{\text{obs}} = -1.11 > -1.71$ ⟹ null hypothesis is favoured.
left-tailed test
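
A minimal sketch of the Iceberg calculation follows. The Python standard library has no t-distribution quantile function, so the critical value is hard-coded from the spreadsheet call in the notes, t.inv(0.05, 24), which is roughly $-1.711$. The variable names are assumptions of mine.

```python
from math import sqrt

n, x_bar, s = 25, 56.27, 7.8   # sample values from the notes
mu0 = 58                       # null value

# Left-tail critical value for alpha = 0.05 and n - 1 = 24 degrees of freedom,
# copied from t.inv(0.05, 24) rather than computed here.
t_crit = -1.711

# Test statistic: (x_bar - mu0) / (s / sqrt(n)), compared against t with n - 1 df
t_obs = (x_bar - mu0) / (s / sqrt(n))

# Left-tailed rule: reject H0 only if t_obs lies to the left of the critical value
reject_h0 = t_obs < t_crit
print(f"t_obs = {t_obs:.3f}, t_crit = {t_crit}, reject H0: {reject_h0}")
```

If SciPy is installed, `scipy.stats.t.ppf(0.05, 24)` should return the same critical value without hard-coding it.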

Decision rule

Alternative Hypothesis can look like one of these:

$H_{a}: \mu<\mu_{0}$ : left-tailed test
reject $H_{0}$ if $t_{\text{obs}} <$ critical value $= t_{1-\alpha}$
$H_{a}: \mu>\mu_{0}$ : right-tailed test
reject $H_{0}$ if $t_{\text{obs}} >$ critical value $= t_{\alpha}$
$H_{a}: \mu\neq\mu_{0}$ : two-sided test: next class.
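
The decision rules can be collected into one helper. This is only a sketch: the function name `reject_null` and the labels `"less"`, `"greater"` and `"two-sided"` are hypothetical. The two-sided branch uses the rule $\lvert \text{stat} \rvert >$ critical value, which these notes state later for the proportion problem.

```python
def reject_null(stat, crit, alternative):
    """Return True if H0 is rejected.

    stat        : observed test statistic (t_obs or z_obs)
    crit        : positive upper-tail critical value
                  (t_alpha for one-sided tests, the alpha/2 value for two-sided)
    alternative : "less" (Ha: mu < mu0), "greater" (Ha: mu > mu0)
                  or "two-sided" (Ha: mu != mu0)
    """
    if alternative == "less":        # left-tailed: rejection region is left of -crit
        return stat < -crit
    if alternative == "greater":     # right-tailed: rejection region is right of +crit
        return stat > crit
    if alternative == "two-sided":   # rejection region is in both tails
        return abs(stat) > crit
    raise ValueError(f"unknown alternative: {alternative!r}")


# Iceberg example from the notes: t_obs = -1.11, critical value about 1.711
print(reject_null(-1.11, 1.711, "less"))
```

By symmetry $t_{1-\alpha} = -t_{\alpha}$, which is why the helper takes only the positive critical value and flips its sign for the left-tailed case.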

Important: Concluding statement. In the light of the evidence provided to us, it seems that the null hypothesis should be accepted for 5% level of significance.

conclusion may not be sacrosanct.
“it seems that” ⟶ we are correct only 95% of the time.
“accepted for 5% level of significance” ⟶ acknowledging the possibility of error.

In the light of the evidence provided to us, it seems that the turnover time will be good for 5% level of significance.

30 January 2025

Problems on one-sample proportion

Houston department store problem.

sample size = 80
sample proportion, $\hat{p} = 12/80 = 0.15$ (population proportion $p$ is, of course, unknown)
$H_{0}:p=6\%=p_{0}$ (null value) vs. $H_{a}: p \neq p_{0}$ (from part c)

\[\begin{align} \mathrm{E}\left[ \hat{p} \right] &= p \\[5pt] \mathrm{var}\left(\hat{p} \right) &= \frac{p(1-p)}{n} \\[20pt] \text{CLT: if } n \hat{p} > 5 \text{, then } \hat{p} &\sim \mathcal{N} \left( p, \frac{p(1-p)}{n} \right) \end{align}\]

\[\begin{align} \text{Under } H_{0}&: p=p_{0} \\ \text{Check}&: n\hat{p}=80\times \frac{12}{80} = 12>5 \implies \text{CLT applies} \\ \hat{p}&\sim \mathcal{N} \left( p, \frac{p(1-p)}{n} \right) \\ \frac{\hat{p}-p}{\sqrt{ \frac{p(1-p)}{n} }} &\sim \mathcal{N} \left( 0, 1 \right) \end{align}\]

\[z_{\text{test-stat}} = \frac{\hat{p}-p_{0}}{\sqrt{ \frac{p_{0}(1-p_{0})}{n} }} \sim \mathcal{N} \left( 0, 1 \right)\]

$z_{\text{obs}}=3.3895960972$

ipad
$z_{\alpha/2}=1.96$ ⟹ reject
Reject $H_{0}$ if $z_{\text{obs}}>z_{\alpha/2}$ or $z_{\text{obs}}<-z_{\alpha/2}$
Reject $H_{0}$ if $\lvert z_{\text{obs}} \rvert > z_{\alpha/2}$
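
The two-sided proportion test for the Houston store can be sketched with the standard library alone. The data and $\alpha = 0.05$ are from the notes; the variable names are assumptions of mine.

```python
from math import sqrt
from statistics import NormalDist

n, returned = 80, 12        # sample size and number of returned items
p_hat = returned / n        # sample proportion
p0, alpha = 0.06, 0.05      # null value and level of significance

# Standard error evaluated at the null value p0, as in the test statistic
se_null = sqrt(p0 * (1 - p0) / n)
z_obs = (p_hat - p0) / se_null

# Two-sided critical value z_{alpha/2}
z_crit = NormalDist().inv_cdf(1 - alpha / 2)

# Two-sided rule: reject H0 if |z_obs| exceeds z_{alpha/2}
reject_h0 = abs(z_obs) > z_crit
print(f"z_obs = {z_obs:.4f}, z_crit = {z_crit:.2f}, reject H0: {reject_h0}")
```

The standard error uses $p_{0}$ rather than $\hat{p}$ because the statistic is computed under the assumption that $H_{0}$ is true.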

Given the sample data provided to us, it seems that the proportion of items returned from the Houston store is significantly different from the national figures at the 5% level of significance (or 95% level of confidence).

95% confidence interval for $p$.
Confidence interval = $\hat{p}\pm z_{\alpha/2}\sqrt{ \hat{p}(1-\hat{p})/n }$
[0.072, 0.228]
Does not contain $0.06$ ⟹ reject $H_{0}$

If your $100(1-\alpha)\%$ CI contains the null value, then accept $H_{0}$ at the $100\alpha\%$ level of significance; otherwise reject $H_{0}$.
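
The confidence-interval version of the same decision can be sketched as follows, assuming the interval $\hat{p}\pm z_{\alpha/2}\sqrt{ \hat{p}(1-\hat{p})/n }$ given above. The variable names are my own.

```python
from math import sqrt
from statistics import NormalDist

n, p_hat = 80, 12 / 80      # sample size and sample proportion from the notes
p0, alpha = 0.06, 0.05      # null value and level of significance

z_crit = NormalDist().inv_cdf(1 - alpha / 2)

# The interval uses p_hat in the standard error, unlike the test statistic, which uses p0
margin = z_crit * sqrt(p_hat * (1 - p_hat) / n)
lower, upper = p_hat - margin, p_hat + margin

# Decision: accept H0 if the null value lies inside the interval, otherwise reject
contains_null = lower <= p0 <= upper
print(f"95% CI = [{lower:.3f}, {upper:.3f}], contains p0: {contains_null}")
```

The interval and the test use slightly different standard errors ($\hat{p}$ versus $p_{0}$), so for proportions the two decisions can occasionally disagree near the boundary. Here they agree.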