Course code: TAMS28 Mathematical Statistics. Exam code: TEN1. 20 August 2014, English version

20 August 2014, 14-18. Examiner: Xiangfeng Yang (Tel: 070 2234765)

a. You are permitted to bring a calculator and the formula and table collection published by MAI. Please answer in ENGLISH if you can.
b. Grading: 8-11 points give grade 3; 11.5-14.5 points give grade 4; 15-18 points give grade 5.

English Version

1 (3 points) The lifetime (in hours) of a certain type of radio tube is assumed to be a continuous random variable $X$ with density function
\[ f_X(x) = \begin{cases} 100/x^2, & \text{if } x > 100, \\ 0, & \text{otherwise.} \end{cases} \]
(1.1). (1p) Find the probability that a tube works less than 200 hours.
(1.2). (1p) Find the probability that a tube still works after 150 hours.
(1.3). (1p) Find the probability that a tube works less than 200 hours, given that this tube still works after 150 hours.

Solution.
(1.1)
\[ P(X < 200) = \int_{-\infty}^{200} f_X(x)\,dx = \int_{100}^{200} \frac{100}{x^2}\,dx = \frac{1}{2}. \]
(1.2)
\[ P(X > 150) = \int_{150}^{\infty} f_X(x)\,dx = \int_{150}^{\infty} \frac{100}{x^2}\,dx = \frac{2}{3}. \]
(1.3)
\[ P(X < 200 \mid X > 150) = \frac{P(150 < X < 200)}{P(X > 150)} = \frac{\int_{150}^{200} 100/x^2\,dx}{2/3} = \frac{1/6}{2/3} = \frac{1}{4} = 0.25. \]

2 (3 points) A large freight elevator can transport a maximum of 9800 kg. Suppose a load of cargo containing 49 boxes must be transported via the elevator, and that the weight of a box of this type follows a distribution with mean $\mu = 205$ kg and standard deviation $\sigma = 15$ kg. Based on this information, what is the probability that all 49 boxes can be safely loaded onto the elevator and transported?

Solution. Let $\{X_1, \ldots, X_{49}\}$ be the weights of the 49 boxes. For all 49 boxes to be safely loaded onto the elevator and transported, it is required that $X_1 + \ldots + X_{49} \le 9800$. Therefore we aim to find the probability $P(X_1 + \ldots + X_{49} \le 9800)$. With $n = 49$, the CLT gives
\[ P(X_1 + \ldots + X_{49} \le 9800) = P\left( \frac{X_1 + \ldots + X_{49}}{49} \le \frac{9800}{49} \right) = P(\bar{X} \le 200) \]
\[ = P\left( \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \le \frac{200 - \mu}{\sigma/\sqrt{n}} \right) \approx P\left( N(0,1) \le \frac{200 - 205}{15/\sqrt{49}} \right) = P(N(0,1) \le -2.333) \approx 0.0099. \]
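The answers to problems 1 and 2 above can be cross-checked numerically. The sketch below is only an illustration and not part of the exam solution: it uses the closed-form CDF $F(x) = 1 - 100/x$ obtained by integrating the given density, and the standard normal CDF from Python's standard library. The names `tube_cdf`, `p_cond` and `p_safe` are made up for this example.

```python
from math import sqrt
from statistics import NormalDist


def tube_cdf(x):
    """CDF of the lifetime density f(x) = 100/x^2 on x > 100, i.e. F(x) = 1 - 100/x."""
    return 0.0 if x <= 100 else 1.0 - 100.0 / x


# Problem 1: probabilities for the tube lifetime X.
p_less_200 = tube_cdf(200)                              # P(X < 200)
p_more_150 = 1.0 - tube_cdf(150)                        # P(X > 150)
p_cond = (tube_cdf(200) - tube_cdf(150)) / p_more_150   # P(X < 200 | X > 150)

# Problem 2: CLT approximation for the total weight of 49 boxes.
n, mu, sigma, limit = 49, 205.0, 15.0, 9800.0
z = (limit / n - mu) / (sigma / sqrt(n))                # standardised bound for the sample mean
p_safe = NormalDist().cdf(z)                            # P(N(0,1) <= z)

print(p_less_200, p_more_150, p_cond)
print(round(z, 3), round(p_safe, 4))
```

The normal probability computed this way may differ from the table-based value in the fourth decimal, since a printed table is read at a rounded argument.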

3 (3 points) Suppose that the distribution of a population $X$ has the following probability mass function

$x$       0        1
$p(x)$    $1-p$    $p$

where $p$ is unknown. We have a sample from this distribution with the following observations:

1 0 0 1 1 1

(3.1). (1p) Find a point estimate $\hat{p}_{MM}$ of $p$ using the Method of Moments.
(3.2). (2p) Find a point estimate $\hat{p}_{ML}$ of $p$ using the Maximum-Likelihood method. (Hint: $P(X = x) = p^x (1-p)^{1-x}$)

Solution.
(3.1). For the Method of Moments, the first equation is $E(X) = \bar{X}$. The mean $E(X)$ can be calculated as $E(X) = 0 \cdot (1-p) + 1 \cdot p = p$. Solving $E(X) = \bar{X}$ gives $p = \bar{X}$, which yields $\hat{p}_{MM} = \bar{X}$. From the data, $\bar{x} = \frac{1+0+0+1+1+1}{6} = 2/3$, thus $\hat{p}_{MM} = \frac{2}{3}$.

(3.2). For the Maximum-Likelihood method, we write the likelihood function as
\[ L(p) = f(X_1) f(X_2) \cdots f(X_n) = p^{\sum X_i} (1-p)^{\sum (1 - X_i)}. \]
Maximizing $L(p)$ is equivalent to maximizing $\ln L(p)$, where
\[ \ln L(p) = \sum X_i \ln p + \sum (1 - X_i) \ln(1-p). \]
Setting $\frac{d \ln L(p)}{dp} = 0$, we have
\[ \frac{\sum X_i}{p} - \frac{\sum (1 - X_i)}{1-p} = 0, \]
therefore $\hat{p}_{ML} = \frac{\sum X_i}{n} = \bar{X}$. From the data, $\hat{p}_{ML} = \frac{2}{3}$. (The second derivative $\frac{d^2 \ln L(p)}{dp^2} < 0$, which shows that $\hat{p}_{ML}$ is indeed a maximum point.)

4 (3 points) A certain proportion of the antibiotics that are injected into the blood is bound to serum proteins. This phenomenon directly affects the effectiveness of the medication, because the absorption of the preparation decreases. The table below gives the proportions (unit: percent) of two common antibiotics that are bound to the serum of experimental animals.

Preparation     Measured values           $\bar{x}_i$   $s_i$
Penicillin G    29.6  24.3  28.5  32.0    28.60         3.22
Erythromycin    21.6  17.4  18.3  19.0    19.08         1.81

Model: We have two independent samples from $N(\mu_i, \sigma^2)$, namely, Penicillin G from $N(\mu_1, \sigma^2)$, and Erythromycin from $N(\mu_2, \sigma^2)$.
(4.1). (1p) Construct a (two-sided) 95% confidence interval for $\mu_1$.
(4.2). (1p) Construct a (two-sided) 95% confidence interval for $\mu_2$.
(4.3). (1p) Compare parameters $\mu_1$ and $\mu_2$ by constructing an appropriate 99% confidence interval.

Solution.
(4.1).
Since the population variance $\sigma^2$ is unknown, the confidence interval for $\mu_1$ is
\[ \bar{x}_1 \pm t_{\alpha/2}(n_1 - 1) \frac{s_1}{\sqrt{n_1}} = 28.60 \pm 3.18 \cdot \frac{3.22}{\sqrt{4}} = 28.60 \pm 5.1198 = (23.4802, 33.7198). \]
(4.2). Since the population variance $\sigma^2$ is unknown, the confidence interval for $\mu_2$ is
\[ \bar{x}_2 \pm t_{\alpha/2}(n_2 - 1) \frac{s_2}{\sqrt{n_2}} = 19.08 \pm 3.18 \cdot \frac{1.81}{\sqrt{4}} = 19.08 \pm 2.8779 = (16.2021, 21.9579). \]
(4.3). We compare $\mu_1$ and $\mu_2$ by constructing a 99% confidence interval for $\mu_1 - \mu_2$, that is
\[ (\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2}(n_1 + n_2 - 2) \cdot s \sqrt{\frac{1}{n_1} + \frac{1}{n_2}} = (28.60 - 19.08) \pm 3.71 \cdot 2.6119 \cdot \sqrt{\frac{1}{4} + \frac{1}{4}} = 9.52 \pm 6.852 = (2.668, 16.372), \]
where
\[ s^2 = \frac{(n_1 - 1) s_1^2 + (n_2 - 1) s_2^2}{n_1 - 1 + n_2 - 1} = \frac{3 \cdot 3.22^2 + 3 \cdot 1.81^2}{4 - 1 + 4 - 1} = 6.82225 \]
and $s = \sqrt{s^2} = \sqrt{6.82225} = 2.6119$. From this confidence interval we can say that $\mu_1 > \mu_2$, since both endpoints 2.668 and 16.372 are $> 0$.
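As a numerical cross-check of Solutions 3 and 4 above, the sketch below recomputes the estimates from the raw observations. It is an illustration under stated assumptions, not part of the exam solution: the t-quantiles 3.18 and 3.71 are taken directly from the solution text (the Python standard library has no t-distribution), the maximum-likelihood estimate is located by a simple grid search rather than by calculus, and all function and variable names are made up for this example.

```python
from math import log, sqrt
from statistics import mean, stdev

# Problem 3: Bernoulli sample, method of moments and a grid-search ML estimate.
obs = [1, 0, 0, 1, 1, 1]


def log_lik(p):
    """Log-likelihood ln L(p) = sum(x_i) ln p + sum(1 - x_i) ln(1 - p)."""
    ones = sum(obs)
    return ones * log(p) + (len(obs) - ones) * log(1 - p)


p_mm = mean(obs)                                   # method of moments: sample mean
grid = [k / 1000 for k in range(1, 1000)]          # candidate values 0.001 .. 0.999
p_ml = max(grid, key=log_lik)                      # grid point with the largest ln L(p)

# Problem 4: t-based confidence intervals from the raw measurements.
penicillin = [29.6, 24.3, 28.5, 32.0]
erythromycin = [21.6, 17.4, 18.3, 19.0]
T_025_3 = 3.18   # t_{0.025}(3), value quoted in the solution
T_005_6 = 3.71   # t_{0.005}(6), value quoted in the solution


def t_interval(data, t_quantile):
    """Two-sided interval: mean +/- t * s / sqrt(n)."""
    half = t_quantile * stdev(data) / sqrt(len(data))
    return mean(data) - half, mean(data) + half


def pooled_interval(a, b, t_quantile):
    """Two-sided interval for the difference in means with a pooled variance."""
    na, nb = len(a), len(b)
    s2 = ((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / (na + nb - 2)
    half = t_quantile * sqrt(s2) * sqrt(1 / na + 1 / nb)
    diff = mean(a) - mean(b)
    return diff - half, diff + half


ci_mu1 = t_interval(penicillin, T_025_3)
ci_mu2 = t_interval(erythromycin, T_025_3)
ci_diff = pooled_interval(penicillin, erythromycin, T_005_6)

print(p_mm, p_ml)
print(ci_mu1, ci_mu2, ci_diff)
```

Because the sketch works from the unrounded sample means and standard deviations, its interval endpoints can differ from the hand calculation in the second decimal; the conclusion that the interval for $\mu_1 - \mu_2$ lies entirely above 0 is unaffected.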

5 (3 points) The minimum daily requirement of zinc for a man over 30 years of age is 15 mg. A scientist conjectures that the expected intake is lower and wants to conduct a study in order to show that. Assume that the scientist measures the zinc intake of 25 randomly selected men over 30 years of age and uses these data to test the hypotheses
\[ H_0: \mu = 15 \quad \text{versus} \quad H_1: \mu < 15. \]
Assume that the observations are independent and from a population $N(\mu, \sigma^2)$. The sample mean is $\bar{x} = 13$ and the sample standard deviation is $s = 6$.
(5.1). (1.5p) If $\sigma$ is unknown, do you reject $H_0$ at significance level $\alpha = 0.01$? Why?
(5.2). (1.5p) If $\sigma$ is known, $\sigma = 4$, do you reject $H_0$ at significance level $\alpha = 0.01$? Why?

Solution.
(5.1) Since $\sigma$ is unknown, the rejection region is $(-\infty, -t_{\alpha}(n-1)) = (-\infty, -t_{0.01}(25-1)) = (-\infty, -2.49)$. The test statistic is
\[ \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{13 - 15}{6/\sqrt{25}} = -1.6667. \]
Because the test statistic is NOT in the rejection region, we do NOT reject $H_0$.
(5.2) Since $\sigma$ is known, $\sigma = 4$, the rejection region is $(-\infty, -z_{\alpha}) = (-\infty, -z_{0.01}) = (-\infty, -2.325)$. The test statistic is
\[ \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} = \frac{13 - 15}{4/\sqrt{25}} = -2.5. \]
Because the test statistic is in the rejection region, we reject $H_0$.

6 (3 points) A scientific paper reported measurements of the thermal conductivity of polymer melts obtained with the short-hot-wire method. The measurements are thermal conductivity $y$ and temperature $x$ (unit: 1000 °C), and the data are analyzed according to the models

Model 1: $Y = \beta_0 + \beta_1 x + \varepsilon$
Model 2: $Y = \beta_0 + \beta_1 x + \beta_2 x^2 + \varepsilon$

where the error term $\varepsilon$ in each model is assumed to be normally distributed. Analyses from Minitab are

Model 1.
Regression Analysis: y versus x

y = 0.254 - 0.0451 x

Predictor    Coef        SE Coef     T        P
Constant     0.253770    0.006334    40.07    0.000
x            -0.04510    0.03847     -1.17    0.261

S = 0.0108651   R-Sq = 8.9%   R-Sq(adj) = 2.4%

Source            DF    SS           MS           F       P
Regression        1     0.0001622    0.0001622    1.37    0.261
Residual Error    14    0.0016527    0.0001180
Total             15    0.0018149

[Figure: residuals from y vs x]

Model 2.
Regression Analysis: y versus x, x^2

y = 0.221 + 0.553 x - 2.08 x^2

Predictor    Coef        SE Coef     T         P
Constant     0.221269    0.003085    71.72     0.000
x            0.55278     0.04768     11.59     0.000
x^2          -2.0814     0.1617      -12.87    0.000

S = 0.00304108   R-Sq = 93.4%   R-Sq(adj) = 92.4%

Source            DF    SS            MS            F        P
Regression        2     0.00169471    0.00084736    91.62    0.000
Residual Error    13    0.00012023    0.00000925
Total             15    0.00181494

Source    DF    Seq SS
x         1     0.00016224
x^2       1     0.00153247

(6.1). (1p) How does the analysis indicate that Model 1 works very poorly? Explain your answer using an appropriate numerical value from the analysis.
(6.2). (2p) Is the term $x^2$ useful as an explanatory variable in Model 2? Explain your answer using an appropriate 95% confidence interval or test.

Solution.
(6.1). In Model 1, R-Sq = 8.9%, which is very low. R-Sq is the proportion of the variation in $y$ that is explained by $x$, so a low R-Sq means that $x$ explains very little of $y$. Hence Model 1 works very poorly.
(6.2). Yes, it is. We can see this by constructing a (two-sided) 95% confidence interval for the coefficient $\beta_2$ of $x^2$, which is
\[ \hat{\beta}_2 \pm t_{0.025}(16 - 2 - 1) \cdot se(\hat{\beta}_2) = -2.0814 \pm 2.16 \cdot 0.1617 = -2.0814 \pm 0.3493 = (-2.4307, -1.7321). \]
Since 0 is not in this confidence interval, we conclude that $\beta_2 \ne 0$. Therefore $x^2$ is useful as an explanatory variable.
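The test decisions in problem 5 and the two numerical arguments in problem 6 can be reproduced with a few lines of arithmetic. The sketch below is an illustration only: the t-quantiles 2.49 and 2.16 are copied from the solutions (the Python standard library has no t-distribution), the normal quantile is computed with `statistics.NormalDist`, and the variable names are made up for this example.

```python
from math import sqrt
from statistics import NormalDist

# Problem 5: one-sided tests of H0: mu = 15 against H1: mu < 15.
n, xbar, mu0 = 25, 13.0, 15.0
s, sigma = 6.0, 4.0
T_01_24 = 2.49                                 # t_{0.01}(24), value quoted in the solution

t_stat = (xbar - mu0) / (s / sqrt(n))          # sigma unknown: t statistic
z_stat = (xbar - mu0) / (sigma / sqrt(n))      # sigma known: z statistic
z_crit = NormalDist().inv_cdf(0.01)            # lower 1% point of N(0, 1), a negative number

reject_unknown_sigma = t_stat < -T_01_24       # statistic inside (-inf, -t) ?
reject_known_sigma = z_stat < z_crit           # statistic inside (-inf, -z) ?

# Problem 6: R-Sq for Model 1 and the interval for beta_2 in Model 2.
r_sq_model1 = 0.0001622 / 0.0018149            # SS(Regression) / SS(Total)
T_025_13 = 2.16                                # t_{0.025}(13), value quoted in the solution
b2, se_b2 = -2.0814, 0.1617                    # coefficient and standard error of x^2
half_width = T_025_13 * se_b2
ci_beta2 = (b2 - half_width, b2 + half_width)
zero_excluded = not (ci_beta2[0] <= 0.0 <= ci_beta2[1])

print(round(t_stat, 4), reject_unknown_sigma)
print(round(z_stat, 4), round(z_crit, 3), reject_known_sigma)
print(round(r_sq_model1, 3), ci_beta2, zero_excluded)
```

Computing the normal quantile instead of reading it from a table changes the critical value only in the third decimal, which does not affect the decision in (5.2).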

Swedish version

a. Permitted aids are a calculator and the formula and table collection published by MAI. Please answer in ENGLISH if you can.
b. Grade limits: 8-11 points give grade 3; 11.5-14.5 points give grade 4; 15-18 points give grade 5.

1 (3 points) The lifetime (in hours) of a certain type of radio tube is assumed to be a continuous random variable $X$ with density function
\[ f_X(x) = \begin{cases} 100/x^2, & \text{if } x > 100, \\ 0, & \text{otherwise.} \end{cases} \]
(1.1). (1p) Determine the probability that such a tube works less than 200 hours.
(1.2). (1p) Determine the probability that such a tube still works after 150 hours.
(1.3). (1p) Determine the probability that such a tube works less than 200 hours, given that the tube still works after 150 hours.

2 (3 points) A large freight elevator can transport at most 9800 kg. Suppose that a load of 49 boxes must be transported via the elevator, and that the weight of a box of this type follows a distribution with mean $\mu = 205$ kg and standard deviation $\sigma = 15$ kg. Based on this information, what is the probability that all 49 boxes can be safely loaded onto the elevator and transported?

3 (3 points) Suppose that the distribution of a population $X$ has the following probability mass function

$x$       0        1
$p(x)$    $1-p$    $p$

where $p$ is unknown. We have a sample from this distribution with observed values:

1 0 0 1 1 1

(3.1). (1p) Find a point estimate $\hat{p}_{MM}$ of $p$ using the method of moments.
(3.2). (2p) Find a point estimate $\hat{p}_{ML}$ of $p$ using the Maximum Likelihood method. (Hint: $P(X = x) = p^x (1-p)^{1-x}$)

4 (3 points) A certain proportion of the antibiotics that are injected into the blood is bound to serum proteins. This phenomenon directly affects the effectiveness of the medication, because the absorption of the preparation decreases. The table below gives, for two common antibiotic preparations, the proportion (unit: percent) that is bound in experiments with animal serum.

Preparation     Measured values           $\bar{x}_i$   $s_i$
Penicillin G    29.6  24.3  28.5  32.0    28.60         3.22
Erythromycin    21.6  17.4  18.3  19.0    19.08         1.81

Model: We have two independent samples from $N(\mu_i, \sigma^2)$, that is, Penicillin G from $N(\mu_1, \sigma^2)$, and Erythromycin from $N(\mu_2, \sigma^2)$.
(4.1). (1p) Construct a (two-sided) 95% confidence interval for $\mu_1$.
(4.2). (1p) Construct a (two-sided) 95% confidence interval for $\mu_2$.
(4.3). (1p) Compare the parameters $\mu_1$ and $\mu_2$ by computing an appropriate 99% confidence interval.

5 (3 points) The minimum daily requirement of zinc is 15 mg for men over 30 years of age. It is in fact suspected that the expected intake is lower, and a study is to be carried out in order to show this. Assume that the zinc intake of 25 randomly selected men over 30 years of age is measured and that the data are used to test the hypotheses
\[ H_0: \mu = 15 \quad \text{versus} \quad H_1: \mu < 15. \]
Assume that the observations are independent and from a population $N(\mu, \sigma^2)$. The sample mean is $\bar{x} = 13$ and the sample standard deviation is $s = 6$.
(5.1). (1.5p) If $\sigma$ is unknown, do you reject $H_0$ at significance level $\alpha = 0.01$? Why?
(5.2). (1.5p) If $\sigma$ is known, $\sigma = 4$, do you reject $H_0$ at significance level $\alpha = 0.01$? Why?

6 (3 points) A scientific paper reports measurements of the thermal conductivity of polymer melts obtained with the short-hot-wire method. Values of thermal conductivity $y$ and temperature $x$ (unit: 1000 °C) were obtained, and the data have been analyzed according to the models

Model 1: $Y = \beta_0 + \beta_1 x + \varepsilon$
Model 2: $Y = \beta_0 + \beta_1 x + \beta_2 x^2 + \varepsilon$

where the error term $\varepsilon$ in each model is assumed to be normally distributed. The analyses from Minitab are

Model 1.
Regression Analysis: y versus x

y = 0.254 - 0.0451 x

Predictor    Coef        SE Coef     T        P
Constant     0.253770    0.006334    40.07    0.000
x            -0.04510    0.03847     -1.17    0.261

S = 0.0108651   R-Sq = 8.9%   R-Sq(adj) = 2.4%

Source            DF    SS           MS           F       P
Regression        1     0.0001622    0.0001622    1.37    0.261
Residual Error    14    0.0016527    0.0001180
Total             15    0.0018149

[Figure: residuals from y vs x]

Model 2.
Regression Analysis: y versus x, x^2

y = 0.221 + 0.553 x - 2.08 x^2

Predictor    Coef        SE Coef     T         P
Constant     0.221269    0.003085    71.72     0.000
x            0.55278     0.04768     11.59     0.000
x^2          -2.0814     0.1617      -12.87    0.000

S = 0.00304108   R-Sq = 93.4%   R-Sq(adj) = 92.4%

Source            DF    SS            MS            F        P
Regression        2     0.00169471    0.00084736    91.62    0.000
Residual Error    13    0.00012023    0.00000925
Total             15    0.00181494

Source    DF    Seq SS
x         1     0.00016224
x^2       1     0.00153247

(6.1). (1p) How does the analysis show that Model 1 works very poorly? Justify your answer using an appropriate numerical value from the analysis.
(6.2). (2p) Is $x^2$ useful as an explanatory variable in Model 2? Justify your answer using an appropriate 95% confidence interval or test.