EXAM IN SF2950 (old name 5B1550) APPLIED MATHEMATICAL STATISTICS
WEDNESDAY 17TH MARCH 2010, 14.00-19.00

Examiner: Gunnar Englund, tel. 790 7416, email: gunnare@math.kth.se

Allowed aids: Formel- och tabellsamling i matematisk statistik. Formel- och tabellsamling i TMS. Calculator.

Results are given before Wednesday April 2nd 2010 via Mina sidor. If you want the result by email, send a mail to gunnare@math.kth.se asking for this.

Notation should be defined. Arguments and calculations should be so clear that they are easy to follow. Numerical answers should be given with at least 2 significant decimals.

Each problem is worth 10 points. The limit for passing is preliminarily 20 points. The exams will be available at the student office (studentexpedition) until 6 weeks after the exam.

The problems are not listed in increasing order of complexity but according to area.

Problem 1

a) The strength of two types of alloys is to be investigated. 12 samples of each alloy were tested and the following results were obtained.

Alloy 1:  12.3  17.5   6.2   6.2   2.6   9.8   8.1   4.3  18.3  17.1  10.3   5.5
Alloy 2:  11.0  18.4   9.5  12.4   9.0   6.1   8.2  14.6  20.5   8.5  17.3  13.8

Use an appropriate non-parametric test to test the hypothesis that the alloys are equivalent against the hypothesis that they differ. (5 p)

b) To investigate the homework time spent per week in a school, a sample survey was made. The following data were obtained:

Level                Number of students   Number of students in sample   Mean   Variance
Junior level         200                  30                             0.6    0.5
Intermediate level   250                  50                             1.1    0.8
Upper level          220                  40                             1.5    1.4
High school          170                  50                             3.4    2.1

Give an approximate 95% confidence interval for the average time spent per week for the students. (5 p)

Problem 2

Viscosity measurements were made on mixtures of the hydrocarbon styrene and a polyester for different levels of styrene, namely two independent replications for each of the styrene concentrations 16%, 18%, 20%, 22% and 24%. The following data were obtained:

% styrene (x)    16          18          20          22          24
viscosity (y)    21.0 25.0   20.0 20.0   18.0 14.0   16.0 14.0   11.0 11.0

By coincidence all decimals were 0. Using a computer package the following two ANOVA tables were calculated:

Regression of y on x:
  Regression   d.f. = 1   Sum of squares = 168.2
  Residual     d.f. = 8   Sum of squares = 21.8
  Total        d.f. = 9   Sum of squares = 190.0
  y-mean = 17.00, regr-coeff. = -1.45, R2 = 88.53 %

ANOVA model for one-way layout in groups after x:
  Between groups   d.f. = 4   Sum of squares = 172.0
  Within groups    d.f. = 5   Sum of squares = 18.0
  Total            d.f. = 9   Sum of squares = 190.0

a) Under each of these models, estimate the error standard deviation. (2 p)

b) Test for linearity in the variable x. (4 p)

c) Under the assumption of linear regression we want a confidence band for the true regression line which, with 95% confidence level, contains the line within the whole x-interval 16 < x < 24. Determine the shape of this confidence band and calculate its value for 18% styrene. (4 p)

Problem 3

The dissolving times for the three substances A, B, and C were compared. In each of four solvents 25 g of the substances were dissolved, yielding the following results:

            Substance
Solvent     A      B      C
1           24.6   22.7   19.1
2           25.3   24.8   20.4
3           28.0   27.0   22.3
4           30.2   30.9   25.6

Auxiliary sums:

$\sum_{i=1}^{4} (\bar y_{i\cdot} - \bar y_{\cdot\cdot})^2 = 26.243$, $\sum_{j=1}^{3} (\bar y_{\cdot j} - \bar y_{\cdot\cdot})^2 = 15.829$, $\sum_{i=1}^{4} \sum_{j=1}^{3} (y_{ij} - \bar y_{\cdot\cdot})^2 = 143.983$, $\bar y_{\cdot\cdot} = 25.075$

a) It was thought that there was no interaction between substance and solvent. Test at the 5% level that the substances are equivalent as to dissolving time. (3 p)

b) Estimate the systematic difference between the effects of substances A and B and give a 95% confidence interval for this difference. (3 p)

c) Assume that we know that $\sigma^2 = 0.12$. Show how we can then test whether there is any interaction. Perform the test at the 5% level.
(4 p)

Problem 4

In a comparison made by the Institute for applied environmental science they wanted to find out with what precision laboratories could determine the amount of Kjeldahl-nitrogen in nutrients. A number of laboratories were given two samples to analyze. All samples were in fact taken from the same material, and they therefore represented identical concentrations. The following data concern the six laboratories which used a combination of Cu-catalysts and a photometric method. This analysis method was thought to be without systematic errors and all the participating laboratories were judged to be representative for their type of laboratory. Here are the results:

Lab id-nr   Sample 1   Sample 2   Avg.
26          1.85       2.01       1.93
44          2.33       2.08       2.205
50          2.95       2.60       2.775
57          2.20       1.90       2.05
89          1.93       2.33       2.13
192         2.34       2.16       2.25
Avg.        2.27       2.18       2.22

If these data are analyzed in a statistical package for Full factorial with Lab, Sample and Lab*Sample we get the following ANOVA:

Source       DF   SS
Lab          5    0.8604
Sample       1    0.0225
Lab*Sample   5    0.2240
Total        11   1.1069

Note: We do not get any Error sum of squares or an F-quotient since there were no degrees of freedom left for Error.

Task: Give a reasonable statistical model for the data and estimate the parameters of the model. (10 p)

Problem 5

In a fractional $2^{4-1}$ design without replications, where the three-way interaction ABC was confounded with the identity I, we got the following effect estimates:

$I = 1400$, $\widehat{A} = 20$, $\widehat{B} = 30$, $\widehat{C} = 6$, $\widehat{D} = 26$, $\widehat{AD} = 6$, $\widehat{BD} = 4$, $\widehat{ABD} = 4$

a) Describe how the other effects were confounded with the above. (3 p)

b) Estimate the residual variance assuming all two-factor interactions (and higher) can be ignored. (4 p)

c) Test which effects are significant in the model. (3 p)
SUGGESTED SOLUTIONS FOR SF2950 APPLIED MATHEMATICAL STATISTICS 2010-03-17

Problem 1

a) Use the Wilcoxon two-sample test, since we have two independent samples. Order the observations by size; those from the first sample (Alloy 1) are marked with an asterisk:

Observation   2.6*   4.3*   5.5*   6.1    6.2*   6.2*   8.1*   8.2    8.5    9.0    9.5    9.8*
Rank          1      2      3      4      5.5    5.5    7      8      9      10     11     12

Observation   10.3*  11.0   12.3*  12.4   13.8   14.6   17.1*  17.3   17.5*  18.3*  18.4   20.5
Rank          13     14     15     16     17     18     19     20     21     22     23     24

If $T_1$ is the rank sum of the first sample, its observed value is $T_{1,\text{obs}} = 1 + 2 + 3 + 5.5 + \dots + 21 + 22 = 126$. The sample sizes are $n_1 = n_2 = 12$, so we cannot use the table and instead use the normal approximation. The expectation of $T_1$ is $12(12+12+1)/2 = 150$ and the variance is $12 \cdot 12(12+12+1)/12 = 300$, so $T_1$ is approximately normally distributed with mean 150 and variance 300. Set $T = (T_1 - 150)/\sqrt{300}$. The critical region (two-sided test) is then $|T| > \lambda_{0.025} = 1.96$. We get $|T_{\text{obs}}| = 24/\sqrt{300} = 1.39 < 1.96$. The observed value is thus not significant, i.e. there is no significant difference between the alloys.

b) According to the formula collection (with the usual notation, $c_i = N_i/N$ and $N = 840$),

$$V(\hat m) = \sum_{i=1}^{k} c_i^2 \, \frac{S_i^2}{n_i} \left(1 - \frac{n_i}{N_i}\right),$$

which is estimated by

$$s^2 = \left(\tfrac{200}{840}\right)^2 \tfrac{0.5}{30}\left(1 - \tfrac{30}{200}\right) + \left(\tfrac{250}{840}\right)^2 \tfrac{0.8}{50}\left(1 - \tfrac{50}{250}\right) + \left(\tfrac{220}{840}\right)^2 \tfrac{1.4}{40}\left(1 - \tfrac{40}{220}\right) + \left(\tfrac{170}{840}\right)^2 \tfrac{2.1}{50}\left(1 - \tfrac{50}{170}\right) = 0.005115.$$

The mean is estimated by the weighted average of the stratum means,

$$\hat m = \tfrac{200}{840} \cdot 0.6 + \tfrac{250}{840} \cdot 1.1 + \tfrac{220}{840} \cdot 1.5 + \tfrac{170}{840} \cdot 3.4 = 1.5512.$$

Hence an approximate 95% confidence interval for $m$ is given by $\hat m \pm \lambda_{0.025}\, s$, i.e. $1.5512 \pm 0.1402$.

Problem 2

a) From the ANOVA tables we get $\hat\sigma^2 = 21.8/8$, $\hat\sigma = 1.65$ (regression model) and $\hat\sigma^2 = 18.0/5$, $\hat\sigma = 1.90$ (one-way layout), respectively.

b) SS(non-linearity) $= 21.8 - 18.0 = 3.8$, with $8 - 5 = 3$ d.f. Hence the F-ratio of the test is $(3.8/3)/(18.0/5) < 1$, i.e. there is no indication of non-linearity.

c) Simultaneous confidence intervals for the regression line, with values taken from the upper ANOVA table. Note that $N = 10$, $\bar x = 20$ and $\sum (x_i - \bar x)^2 = 2(16 + 4 + 0 + 4 + 16) = 80$:

$$17.00 - 1.45(x - 20) \pm \hat\sigma \sqrt{2 F_{0.05}(2, 8)} \sqrt{\frac{1}{10} + \frac{(x - 20)^2}{80}}.$$

The band is hyperbolic and narrowest at $x = \bar x = 20$. With $\hat\sigma = 1.65$ and the table value $F_{0.05}(2, 8) = 4.46$, the band at $x = 18$ is $19.90 \pm 1.65 \sqrt{8.92} \sqrt{0.15} \approx 19.90 \pm 1.91$.

Problem 3

a) Two-way analysis of variance with systematic (fixed) components and one observation per cell. No interaction between solvents and substances is assumed: $Y_{ij} = \alpha_i + \beta_j + \varepsilon_{ij}$, where the $\varepsilon_{ij}$ are $N(0, \sigma^2)$ and independent. The auxiliary sums give the ANOVA table

Source of variation            d.f.   SS        MS       F-ratio
Between solvents (rows)        3      78.729    26.243   81.234
Between substances (columns)   2      63.315    31.658   97.994
Residual                       6      1.938     0.323
Total                          11     143.983

The F-ratio 97.994 is to be compared with the F-value $F_{0.05}(2, 6) = 5.14$. The hypothesis of equivalent substances is rejected.

b) The effect of substance A, $\beta_1$, is estimated by $\bar Y_{\cdot 1} - \bar Y_{\cdot\cdot}$, and the effect of substance B, $\beta_2$, by $\bar Y_{\cdot 2} - \bar Y_{\cdot\cdot}$. The difference $\beta_1 - \beta_2$ is thus estimated by $\bar Y_{\cdot 1} - \bar Y_{\cdot 2}$, which is normally distributed with mean $\beta_1 - \beta_2$ and variance $\sigma^2/4 + \sigma^2/4 = \sigma^2/2$. As usual, $\sigma^2$ is estimated by the residual mean square, i.e. $\hat\sigma^2 = 0.323$. We obtain the confidence interval

$$I_{\beta_1 - \beta_2} = \bar y_{\cdot 1} - \bar y_{\cdot 2} \pm t_{0.025}(6)\, \hat\sigma / \sqrt{2} = 27.025 - 26.35 \pm 2.45 \cdot 0.568 / \sqrt{2} = 0.675 \pm 0.985.$$

There is no significant difference between substances A and B, since 0 belongs to the interval.

c) To test that the interaction is negligible one can use the residual sum of squares in the ANOVA table above. If the interaction effect is 0, then SS(residual)$/\sigma^2$ is $\chi^2$-distributed with 6 degrees of freedom. The hypothesis of no interaction should thus be rejected if SS(residual)$/\sigma^2 > \chi^2_{0.05}(6) = 12.6$. With $\sigma^2 = 0.12$ we get SS(residual)$/\sigma^2 = 1.938/0.12 = 16.15$. The interaction is thus significant at the 5% level.

Problem 4

Hierarchical model with variance components between labs and between samples within lab, or in other words a one-way classification of type II (random effects). In formulas: $Y_{ij} = \mu + \delta_i + \varepsilon_{ij}$, with parameters $\mu$, $\sigma^2_{\text{lab}}$ ($= \sigma^2_\delta$) and $\sigma^2_{\text{sample}}$ ($= \sigma^2_\varepsilon$).
The ANOVA should be rewritten by merging the last two rows (Sample and Lab*Sample) and calling them error or within lab. Here MS (mean squares) and E(MS) have also been included:

Source               DF   SS       MS       E(MS)
Between labs         5    0.8604   0.1721   $\sigma^2_{\text{sample}} + 2\sigma^2_{\text{lab}}$
Samples within lab   6    0.2465   0.0411   $\sigma^2_{\text{sample}}$
Total                11   1.1069
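The sums of squares in the pooled table can be checked directly from the raw data of Problem 4. The following is a minimal Python sketch; the variable names (`data`, `ss_lab`, `ss_within` and so on) are chosen here for illustration and are not part of the original solution.

```python
# Raw data from Problem 4: (sample 1, sample 2) for each laboratory id.
data = {
    26: (1.85, 2.01),
    44: (2.33, 2.08),
    50: (2.95, 2.60),
    57: (2.20, 1.90),
    89: (1.93, 2.33),
    192: (2.34, 2.16),
}

n_lab = len(data)  # 6 laboratories
n_rep = 2          # 2 samples per laboratory

grand_mean = sum(sum(pair) for pair in data.values()) / (n_lab * n_rep)
lab_means = {lab: sum(pair) / n_rep for lab, pair in data.items()}

# Between-laboratory sum of squares, n_lab - 1 = 5 d.f.
ss_lab = n_rep * sum((m - grand_mean) ** 2 for m in lab_means.values())

# Within-laboratory sum of squares, n_lab * (n_rep - 1) = 6 d.f.
# This equals the Sample and Lab*Sample rows of the package output pooled.
ss_within = sum(
    (y - lab_means[lab]) ** 2 for lab, pair in data.items() for y in pair
)

ms_lab = ss_lab / (n_lab - 1)
ms_within = ss_within / (n_lab * (n_rep - 1))

print(f"SS lab    = {ss_lab:.4f}, MS lab    = {ms_lab:.4f}")
print(f"SS within = {ss_within:.4f}, MS within = {ms_within:.4f}")
```

Pooling works here because the within-lab deviations carry both the Sample and the Lab*Sample variation, so their squared sum equals the sum of those two rows.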
Identifying the mean squares with their expectations gives the variance component estimates $\hat\sigma^2_{\text{sample}} = 0.0411$ and $\hat\sigma^2_{\text{lab}} = (0.1721 - 0.0411)/2 = 0.0655$. Finally, the least-squares estimate of $\mu$ is $\bar y_{\cdot\cdot} = 2.22$.

Problem 5

a) The aliases are I = ABC, A = BC, B = AC, C = AB, D = ABCD, AD = BCD, BD = ACD, and CD = ABD.

b) As an estimate of $\sigma^2$ we get

$$\hat\sigma^2 = \frac{8}{3}\left(\widehat{AD}^2 + \widehat{BD}^2 + \widehat{ABD}^2\right) = 181.33.$$

c) Effect estimates whose absolute value is larger than $t_{0.025}(3) \sqrt{\hat\sigma^2/8} = 3.18 \sqrt{181.33/8} = 15.14$ are significant at the 5% level. In this case these are A, B and D.
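The arithmetic in parts b and c of Problem 5 can be retraced with a short Python sketch. It assumes the effect estimates exactly as printed in the problem statement and uses the table value $t_{0.025}(3) = 3.18$ quoted in the solution; the names `effects`, `noise` and `threshold` are illustrative only.

```python
import math

# Effect estimates as printed in the problem statement.
effects = {"A": 20, "B": 30, "C": 6, "D": 26, "AD": 6, "BD": 4, "ABD": 4}
n_runs = 8  # a 2^(4-1) design has 8 runs

# Contrasts assumed to carry only noise when two-factor and higher
# interactions are ignored: AD, BD and ABD (aliased with CD).
noise = ["AD", "BD", "ABD"]
sigma2_hat = n_runs / len(noise) * sum(effects[k] ** 2 for k in noise)

t_quantile = 3.18  # t_{0.025}(3), table value quoted in the solution
threshold = t_quantile * math.sqrt(sigma2_hat / n_runs)

# Main effects whose absolute estimate exceeds the threshold.
significant = [k for k in ("A", "B", "C", "D") if abs(effects[k]) > threshold]

print(f"sigma2_hat = {sigma2_hat:.2f}")
print(f"threshold  = {threshold:.2f}")
print("significant:", significant)
```

Only the main effects are tested, since the three interaction contrasts have already been used to estimate the residual variance.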