Summary.


Presentation on the topic: "Summary." Presentation transcript:

1 summary

2 population of students that did not attend the musical lesson: parameters $\mu_0$, $\sigma_0$ are known
population of students that did attend the musical lesson: parameters $\mu$, $\sigma$ are unknown
sample: the statistic $\bar{x}$ is known

3 Z-test
Test statistic: $Z = \dfrac{\bar{x} - \mu_0}{\sigma_0 / \sqrt{n}}$
We use the Z-test if we know the population mean $\mu_0$ and the population sd $\sigma_0$.

4 Formulate the test statistic
population of students that did not attend the musical lesson: $\mu_0$ known, $\sigma_0$ unknown
assumption: $\sigma_0 = \sigma$
population of students that did attend the musical lesson: $\mu$, $\sigma$ unknown
sample: $\bar{x}$, $s$
one sample t-test: $t = \dfrac{\bar{x} - \mu_0}{s / \sqrt{n}}$

5 Z-test vs. t-test
Use Z-test if
you know the standard deviation of the population, or
you know only the sample standard deviation $s$ AND you have a large sample size (traditionally over 30). In that case you assume that the population standard deviation is the same as the sample standard deviation.
Use t-test if
you don't know the population standard deviation (you know only the sample standard deviation $s$) and have a relatively small sample size.
Tip: If you know only the sample standard deviation, always use t-test.

6 Summary of t-tests
one-sample test (jednovýběrový test): you test $H_0: \mu = \mu_0$
two-sample test (dvouvýběrový test): you test $H_0: \mu_1 - \mu_2 = 0$
two-sample tests, dependent samples: paired t-test (párový test)
two-sample tests, independent samples: equal variances ($\sigma_1 \sim \sigma_2$) or unequal variances ($\sigma_1 \neq \sigma_2$)

7 t-test assumptions
equality of variances – how to test it?
data normality – how to test it?
nonparametric tests

8 QQ-plot

9 new stuff

10 t-test, one sample, one sided ($\mu < 1000$)
A manufacturer guarantees that its light bulbs have an average lifetime of 1000 hours. To check whether the part of production manufactured and shipped in the given period matches this claim, the quality control department randomly selected 50 bulbs from a prepared delivery and found a mean lifetime of 950 hours with a standard deviation of 100 hours. Can the observed difference in lifetime be attributed to chance, or is it a sign of poor production quality?
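The light-bulb problem can be worked through with the one-sample t formula from the earlier slides. The following is a minimal sketch, not part of the original slides; the function name `one_sample_t` is hypothetical, and the significance level $\alpha = 0.05$ with its tabulated critical value is an assumption, since the slide does not state one.

```python
import math

def one_sample_t(sample_mean, mu0, s, n):
    """Return (t, df) for a one-sample t-test computed from summary statistics."""
    standard_error = s / math.sqrt(n)
    t = (sample_mean - mu0) / standard_error
    return t, n - 1

# Summary statistics from the light-bulb problem.
t_stat, df = one_sample_t(sample_mean=950, mu0=1000, s=100, n=50)
print(f"t({df}) = {t_stat:.2f}")

# ASSUMPTION: alpha = 0.05 is not given in the slides. 1.677 is the tabulated
# one-sided 5% critical value of the t distribution with 49 degrees of freedom.
T_CRIT = 1.677
reject_h0 = t_stat < -T_CRIT  # lower-tailed test, H1: mu < 1000
print("reject H0" if reject_h0 else "do not reject H0")
```

Under these assumptions the statistic falls well below the critical value, so the shortfall is hard to attribute to chance alone.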

11 paired t-test, dependent, one sided
In the town of Zpiťákov, a study of alcohol consumption was carried out: 8 citizens were selected at random and their average monthly alcohol intake was recorded. Some time later, two people in the town died of liver cirrhosis (other residents of Zpiťákov, not the ones included in the study). To assess whether this event reduced consumption in the town, the results of the earlier study were used, and the monthly consumption of the same 8 citizens was measured again after the deaths. Decide whether the two deaths reduced consumption.

12 Z-test, one sided ($\mu < 67$)
The average weight of women in the Czech Republic aged [...] years is 67 kg with a standard deviation of 4 kg. The weight of 10 randomly selected female students of VŠCHT was measured; its mean is 65.4 kg with a standard deviation of 3.2 kg. Does prolonged sitting in boring lectures lead to weight loss in female students?
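Because the population standard deviation (4 kg) is known here, the Z formula applies and the sample standard deviation of 3.2 kg is not used. A rough sketch of the calculation, with hypothetical helper names `z_statistic` and `normal_cdf`; the p-value is obtained from the standard normal CDF written via the error function:

```python
import math

def z_statistic(sample_mean, mu0, sigma0, n):
    """Z = (x_bar - mu0) / (sigma0 / sqrt(n))."""
    return (sample_mean - mu0) / (sigma0 / math.sqrt(n))

def normal_cdf(z):
    """Standard normal CDF expressed through the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Numbers from the student-weight problem.
z = z_statistic(sample_mean=65.4, mu0=67.0, sigma0=4.0, n=10)
p_lower = normal_cdf(z)  # one-sided p-value for H1: mu < 67
print(f"Z = {z:.3f}, one-sided p = {p_lower:.3f}")
```

The one-sided p-value comes out at roughly 0.10, so at the conventional 5% level (an assumed threshold, not stated on the slide) the data would not support a decrease in weight.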

13 t-test, two sample, independent, two sided ($\mu_1 \neq \mu_2$)
We compare the amount of organic matter in the wastewater of two paper mills. Based on several random measurements in these mills, we are to decide whether the mills differ in the amount of waste matter. In the first mill, 20 measurements were taken, with mean 14.9 and standard deviation [...]; [...] measurements from the second mill showed a mean of 22.[...] and a standard deviation of 7.4.

14 t-test, one sample, one sided ($\mu > 8.72$)
An entrepreneur started producing sewing-machine needles. He will succeed on the market only if his needles last longer than the competition's. From the trade press he learned that the lifetime of competing needles is 8.72 million stitches. As a trial he produced 395 needles and measured the lifetime of each of them. The sample mean and standard deviation are 8.92 and [...]. Should the entrepreneur launch full production?

15 Report statistical results
Descriptive statistics mean, s.d.

16 Report statistical results
Inferential statistics – hypothesis test
kind of test (e.g., one-sample t-test)
the actual value of the test statistic (e.g., the value of t)
d.f.
p-value
if applicable, give a direction of test (e.g., one-tailed or two-tailed)
$\alpha$ level!
APA style for reporting results of our hypothesis test: t(df) = X.XX, p = X.XX, direction
e.g. t(24) = -2.50, p = 0.01, one-tailed

17 Report statistical results
Inferential statistics – confidence intervals
confidence level (e.g., 95%)
lower limit
upper limit
CI on what (e.g., on a mean)?
APA style: Confidence interval on the mean difference; 95% CI = (4,6)

18 anova

19 A problem You're comparing three brands of beer.

20 A problem
You buy four different bottles from each brewery. They have the following prices.
Primátor   Kocour   Matuška
15         39       65
12         45       45
14         48       32
11         60       38
Which of these brands have significantly different prices?
Primátor and Kocour
Primátor and Matuška
Kocour and Matuška
No significant difference between any of these three.

21 Beer brands – a boxplot

22 t-test We can do three t-tests to show statistically whether there is a significant difference between these brands. How many t-tests would you need to compare four samples? 6. To compare 10 samples you would need 45 t-tests! This is a lot. We don't want to do a million t-tests. In this lesson you'll learn a simpler method. It's called Analysis of variance (Analýza rozptylu), ANOVA.
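The counts 3, 6 and 45 are the number of unordered pairs that can be formed from $k$ samples, $\binom{k}{2} = k(k-1)/2$. A short sketch (the function name `n_pairwise_tests` is hypothetical):

```python
import math

def n_pairwise_tests(k):
    """Number of two-sample t-tests needed to compare every pair of k samples."""
    return math.comb(k, 2)

for k in (3, 4, 10):
    print(k, "samples ->", n_pairwise_tests(k), "t-tests")
```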

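23 Multiple comparisons problem
And there is another, more serious problem with many t-tests. It is called the multiple comparisons problem: every extra test is another chance of a false positive, so the chance of at least one false alarm grows with the number of tests.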
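To get a feeling for the size of the problem, one can compute the probability of at least one false positive across several tests. This sketch assumes the tests are independent and that each uses $\alpha = 0.05$; both are simplifying assumptions that the slides do not state, and the function name `familywise_error_rate` is hypothetical.

```python
def familywise_error_rate(n_tests, alpha=0.05):
    """P(at least one false positive) for n_tests independent tests at level alpha."""
    return 1.0 - (1.0 - alpha) ** n_tests

# 3, 6 and 45 are the pairwise test counts for 3, 4 and 10 samples.
for n_tests in (3, 6, 45):
    print(n_tests, "tests ->", round(familywise_error_rate(n_tests), 3))
```

With three tests the error rate is already about 14%, and with 45 tests a false alarm is almost guaranteed.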

24 We don't want to do a million t-tests.
But we can use the same ideas that underlie t-tests to compare three or more samples. In the t-test, the general form of the t-statistic was $t = \dfrac{\bar{x}_1 - \bar{x}_2}{SE}$. To compare three or more samples we do something similar: we have some kind of measure of distance between means in the numerator, and some kind of error in the denominator.
$\dfrac{\text{difference/variability between means}}{\text{error}}$

25 Numerator
How can we compare three or more samples?
Use the maximum distance between any two sample means.
Use the average deviation of each sample mean from the total mean.
Find the distance each sample mean is from each of the other sample means.
Find the average squared deviation of each sample mean from the total mean.
Find the average squared deviation of each value in each sample from the total mean.

26 Grand mean
It is called the grand (total) mean, $\bar{x}_G$ (celkový průměr).
Let's say we have four samples A, B, C and D with means $\bar{x}_A$, $\bar{x}_B$, $\bar{x}_C$ and $\bar{x}_D$. Will the grand mean be $\dfrac{\bar{x}_A + \bar{x}_B + \bar{x}_C + \bar{x}_D}{4}$, the mean of sample means?
Always
Sometimes
Never
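Since the grand mean is the mean of all data points pooled together, it coincides with the mean of sample means when the samples have equal sizes, and generally not otherwise. A small sketch with made-up numbers (the data and the function names `grand_mean` and `mean_of_means` are hypothetical illustrations, not from the slides):

```python
from statistics import mean

def grand_mean(samples):
    """Mean of all data points pooled together."""
    return mean(x for sample in samples for x in sample)

def mean_of_means(samples):
    """Unweighted mean of the individual sample means."""
    return mean(mean(sample) for sample in samples)

# Hypothetical data: two samples of equal size, then two of unequal size.
equal_sizes = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
unequal_sizes = [[1.0, 2.0, 3.0], [10.0]]

print(grand_mean(equal_sizes), mean_of_means(equal_sizes))
print(grand_mean(unequal_sizes), mean_of_means(unequal_sizes))
```

In the unequal case the single value 10 counts as much as the three other values combined in the mean of means, which pulls it away from the grand mean.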

27 Between-group variability
What conclusions can we draw from the deviation of each sample mean from the grand mean?
The greater the distance between sample means, the less likely population means will differ significantly.
The smaller the distance between sample means, the less likely population means will differ significantly.
The greater the distance between sample means, the more likely population means will differ significantly.
The smaller the distance between sample means, the more likely population means will differ significantly.
This is called between-group variability (variabilita mezi skupinami). And this is what we're trying to measure.

28 Denominator
$\dfrac{\text{difference}}{\text{error}}$
Just as we analyzed the variability of the sample or samples to create a standard error, we need to consider the variability of each individual sample when comparing three or more.

29 How variability impacts the difference in means

30 Within-group variability
What does this say about comparing three or more samples? Check all that apply.
The greater the variability of each individual sample, the less likely population means will differ significantly.
The smaller the variability of each individual sample, the less likely population means will differ significantly.
The greater the variability of each individual sample, the more likely population means will differ significantly.
The smaller the variability of each individual sample, the more likely population means will differ significantly.
This is called within-group variability (variabilita v rámci skupin).

31 ANOVA If we compare samples we simply extend the idea of the t-test.
We can compare samples to each other by comparing how far each sample mean is from the grand mean (between group variability). But we also want to look at the variability of each sample because this impacts whether or not the samples are significantly different (within group variability). ANOVA can compare as many means as you want just with one test.

32 Hypothesis
Let's compare three samples with ANOVA. Just try to guess what the hypotheses will be.
$H_0: \mu_1 = \mu_2 = \mu_3$
$H_1: \mu_1 \neq \mu_2 \neq \mu_3$
$H_0: \mu_1 \neq \mu_2 \neq \mu_3$
$H_1: \mu_1 = \mu_2 \neq \mu_3$
$H_1$: at least one pair of samples is significantly different
Follow-up multiple comparison steps – see which means are different from each other.
$\dfrac{\text{between-group variability}}{\text{within-group variability}}$

33 F ratio
$F = \dfrac{\text{between-group variability}}{\text{within-group variability}}$
As between-group variability increases, the F-statistic increases, and this leans more in favor of the alternative hypothesis that at least one pair of means is significantly different. As within-group variability increases, the F-statistic decreases, and this leans more in favor of the null hypothesis that the means are not significantly different.

34 F ratio
$F = \dfrac{\text{between-group variability}}{\text{within-group variability}}$
Just like in the t-statistic, the numerator indicates how much the means differ. This is explained variation because it most likely results from differences due to the treatment or simply from differences between the populations (recall the beer prices: different brands are priced differently). The denominator is a measure of error. It measures individual differences of subjects within each group. This is considered error variation because we don't know why individual subjects in the same group are different.

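35 Between-group variability
SS – sum of squares, suma čtverců
MS – mean squares, průměrné čtverce
$MSB = \dfrac{SSB}{df_B} = \dfrac{\sum_{\text{samples}} n_K (\bar{x}_k - \bar{x}_G)^2}{k-1}$
if all samples have the same size $n$: $MSB = \dfrac{n \sum_{\text{samples}} (\bar{x}_k - \bar{x}_G)^2}{k-1}$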

36 Within-group variability
$MSW = \dfrac{SSW}{df_W} = \dfrac{\sum_{\text{samples}} \sum_i (x_i - \bar{x}_k)^2}{N-k}$
$MSB = \dfrac{SSB}{df_B} = \dfrac{\sum_{\text{samples}} n_K (\bar{x}_k - \bar{x}_G)^2}{k-1}$
Primátor   Kocour   Matuška
15         39       65
12         45       45
14         48       32
11         60       38
$x_i$ ... value of each data point
$\bar{x}_k$ ... sample mean
$N$ ... total number of data points
$k$ ... number of samples
$n_K$ ... number of data points in each sample $K$
$\bar{x}_G = \dfrac{\sum x_i}{N}$ ... grand mean

37 Total variability
$df_B = k-1$, $df_W = N-k$
What is the total number of degrees of freedom?
$df_B + df_W = k - 1 + N - k = N - 1 = df_{Total}$
Likewise, we have a total variation
$SST = SSB + SSW = \sum (x_i - \bar{x}_G)^2$
The mean squares do not add up in the same way: $MST = \dfrac{SST}{df_{Total}}$, which in general differs from $MSB + MSW$.
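The decomposition of the sums of squares and of the degrees of freedom can be checked numerically. The sketch below uses a small made-up data set with unequal sample sizes (hypothetical values, not from the slides) and a hypothetical helper `anova_sums_of_squares`; it also shows that dividing by the degrees of freedom breaks additivity for the mean squares.

```python
from statistics import mean

def anova_sums_of_squares(samples):
    """Return (SSB, SSW, SST) for a list of samples of possibly unequal size."""
    all_values = [x for s in samples for x in s]
    grand = mean(all_values)
    # Between groups: squared distance of each sample mean from the grand
    # mean, weighted by that sample's size.
    ssb = sum(len(s) * (mean(s) - grand) ** 2 for s in samples)
    # Within groups: squared distance of each value from its own sample mean.
    ssw = sum((x - mean(s)) ** 2 for s in samples for x in s)
    # Total: squared distance of each value from the grand mean.
    sst = sum((x - grand) ** 2 for x in all_values)
    return ssb, ssw, sst

# Hypothetical data with unequal sample sizes.
samples = [[1.0, 2.0, 3.0], [4.0, 6.0], [7.0, 8.0, 9.0, 12.0]]
ssb, ssw, sst = anova_sums_of_squares(samples)

k = len(samples)
N = sum(len(s) for s in samples)
df_b, df_w, df_t = k - 1, N - k, N - 1
msb, msw, mst = ssb / df_b, ssw / df_w, sst / df_t

print(f"SSB + SSW = {ssb + ssw:.3f}, SST = {sst:.3f}")
print(f"df_B + df_W = {df_b + df_w}, df_Total = {df_t}")
print(f"MSB + MSW = {msb + msw:.3f}, MST = {mst:.3f}")
```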

38 F-ratio
$F_{df_B,\, df_W} = \dfrac{MSB}{MSW}$, $df_B = k-1$, $df_W = N-k$

39 F-distribution

40 F distribution

41 Beer prices
             Primátor   Kocour   Matuška
             15         39       65
             12         45       45
             14         48       32
             11         60       38
$\bar{x}_k$  13         48       45
$\bar{x}_G = 35.33$
$SSB = n \sum_{\text{samples}} (\bar{x}_k - \bar{x}_G)^2 = 3011$, $df_B = k-1 = 2$, $MSB = \dfrac{SSB}{df_B} = 1505.3$
$SSW = \sum_{\text{samples}} \sum_i (x_i - \bar{x}_k)^2 = 862$, $df_W = N-k = 9$, $MSW = \dfrac{SSW}{df_W} = 95.78$
$F_{2,9} = \dfrac{MSB}{MSW} = 15.72$
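The numbers on this slide can be reproduced in a few lines. This is a sketch under one stated assumption: the second-row price of 45 for Matuška is inferred rather than read directly from the flattened slide text, as the value that reproduces the reported grand mean 35.33 and $SSW = 862$. The p-value uses a closed form of the F distribution that holds only when $df_B = 2$; it is not taken from the slides.

```python
from statistics import mean

prices = {
    "Primátor": [15, 12, 14, 11],
    "Kocour": [39, 45, 48, 60],
    # ASSUMPTION: the second value (45) is inferred from the reported
    # grand mean and SSW, not read directly from the slide text.
    "Matuška": [65, 45, 32, 38],
}
samples = list(prices.values())

k = len(samples)                   # number of samples
N = sum(len(s) for s in samples)   # total number of data points
grand = mean(x for s in samples for x in s)

ssb = sum(len(s) * (mean(s) - grand) ** 2 for s in samples)
ssw = sum((x - mean(s)) ** 2 for s in samples for x in s)
df_b, df_w = k - 1, N - k
msb, msw = ssb / df_b, ssw / df_w
f_ratio = msb / msw

# Upper-tail probability of F(2, df_w); this closed form is valid only
# because df_b == 2 here.
p_value = (1.0 + 2.0 * f_ratio / df_w) ** (-df_w / 2.0)

print(f"grand mean = {grand:.2f}")
print(f"MSB = {msb:.1f}, MSW = {msw:.2f}")
print(f"F({df_b},{df_w}) = {f_ratio:.2f}, p = {p_value:.4f}")
```

A single F-test like this says only that at least one pair of brands differs; identifying which pair needs the follow-up multiple comparison steps mentioned on the hypothesis slide.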

42 Beer brands – ANOVA

