Course code: TAMS28 MATEMATISK STATISTIK. Exam code: TEN1. 16 January 2015, 8:00-12:00. English Version

Examiner: Xiangfeng Yang (Tel: 070 2234765). Please answer in ENGLISH if you can.
a. You are allowed to use a calculator and the formula and table collection edited by MAI.
b. Grading: 8-11 points give grade 3; 11.5-14.5 points give grade 4; 15-18 points give grade 5.

1 (3 points) The random variable X is discrete with probability function

x       -1     0     1
f(x)    0.3    c     0.1

(1.1). (1p) Find the value of c.
(1.2). (1p) Find the mean $\mu = E(X)$ of the random variable X.
(1.3). (1p) Find the variance $\sigma^2 = V(X)$ of the random variable X.

Solution.
(1.1). Since the probabilities sum to 1, we have $0.3 + c + 0.1 = 1$. Thus $c = 0.6$.
(1.2). The mean is $\mu = E(X) = (-1) \cdot 0.3 + 0 \cdot 0.6 + 1 \cdot 0.1 = -0.2$.
(1.3). We first find $E(X^2) = (-1)^2 \cdot 0.3 + 0^2 \cdot 0.6 + 1^2 \cdot 0.1 = 0.4$. So the variance is $\sigma^2 = V(X) = E(X^2) - \mu^2 = 0.4 - (-0.2)^2 = 0.36$.

2 (3 points) Times (in minutes) to pick up patient records in a hospital archive are independent and Re(3, 7), i.e., uniformly distributed on the interval (3, 7). The archive is managed by one person. One morning there are orders for 96 patient records. Calculate the probability that the person can finish picking up all these records during normal working hours plus at most 15 minutes of overtime, i.e., within eight hours and 15 minutes. (Hint: use the central limit theorem.)

Solution. For a uniform distribution $X \sim Re(3, 7)$, the mean is $\mu = E(X) = (3 + 7)/2 = 5$ and the variance is $\sigma^2 = V(X) = (7 - 3)^2/12 = 16/12 = 1.333$. Let $\{X_1, \ldots, X_{96}\}$ be the times spent on picking up the 96 records. To pick up all 96 records within eight hours and 15 minutes, it is required that $X_1 + \ldots + X_{96} \le 8 \cdot 60 + 15 = 495$. Therefore

$$P(X_1 + \ldots + X_{96} \le 495) = P\left(\frac{X_1 + \ldots + X_{96}}{96} \le \frac{495}{96}\right) = P(\bar{X} \le 5.156)$$
$$= P\left(\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \le \frac{5.156 - \mu}{\sigma/\sqrt{n}}\right) \approx P\left(N(0, 1) \le \frac{5.156 - 5}{\sqrt{1.333}/\sqrt{96}}\right) = P(N(0, 1) \le 1.324) = 0.9082.$$

(Remark: if only two (or even one) decimals are kept for $\sigma^2$ and $495/96$, you probably will not get the value 1.324. But you should at least show that you fully understand the problem and follow a correct idea.)
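
The three steps of problem 1 (normalisation, mean, variance) can be checked with a few lines of Python. This is only an illustrative sketch, not part of the original exam; the variable names are chosen here for readability.

```python
# Probability function of problem 1: X takes the values -1, 0, 1.
values = [-1, 0, 1]

# The unknown probability c follows from the requirement that all
# probabilities sum to 1.
c = 1 - (0.3 + 0.1)
probs = [0.3, c, 0.1]

# Mean E(X) and second moment E(X^2) as probability-weighted sums.
mean = sum(x * p for x, p in zip(values, probs))
second_moment = sum(x ** 2 * p for x, p in zip(values, probs))

# Variance via V(X) = E(X^2) - E(X)^2.
variance = second_moment - mean ** 2

print(f"c = {c:.2f}, mean = {mean:.2f}, variance = {variance:.2f}")
```

The same pattern works for any finite discrete distribution once the list of values and probabilities is replaced.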

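The central limit theorem calculation in problem 2 above can also be reproduced numerically. The sketch below is an illustration only: the function name `clt_sum_probability` is hypothetical, and it standardises the total time directly instead of the sample mean, which is algebraically the same step. Because it carries full precision, the standardised value comes out near 1.326 rather than the hand-rounded 1.324, and the probability agrees with the table-based answer to about two decimals.

```python
from math import sqrt
from statistics import NormalDist


def clt_sum_probability(n, a, b, limit):
    """Approximate P(X_1 + ... + X_n <= limit) for i.i.d. uniform(a, b)
    variables using the central limit theorem."""
    mu = (a + b) / 2           # mean of a single pick-up time
    var = (b - a) ** 2 / 12    # variance of a single pick-up time
    # Standardise the total: (limit - n*mu) / sqrt(n*var).
    z = (limit - n * mu) / sqrt(n * var)
    return z, NormalDist().cdf(z)


# 96 records, times uniform on (3, 7), 8 hours 15 minutes = 495 minutes.
z, p = clt_sum_probability(n=96, a=3, b=7, limit=8 * 60 + 15)
print(f"z = {z:.3f}, probability = {p:.4f}")
```
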
3 (3 points) The following data set represents a sample from a continuous distribution with probability density function $f(x) = \theta x^{\theta - 1}$, $0 \le x \le 1$, where $\theta > 0$ is unknown. The sample consists of the observations 0.6, 0.1, 0.2, 0.8.
(3.1). (1p) Find a point estimate $\hat{\theta}_{MM}$ of $\theta$ using the Method of Moments.
(3.2). (2p) Find a point estimate $\hat{\theta}_{ML}$ of $\theta$ using the Maximum-Likelihood method.

Solution.
(3.1). For the Method of Moments, the first equation is $E(X) = \bar{x}$. The mean $E(X)$ can be calculated as

$$E(X) = \int_0^1 x f(x)\,dx = \int_0^1 x \cdot \theta x^{\theta - 1}\,dx = \int_0^1 \theta x^{\theta}\,dx = \frac{\theta}{\theta + 1}.$$

Solving $E(X) = \bar{x}$ gives $\theta = \frac{\bar{x}}{1 - \bar{x}}$, which yields $\hat{\theta}_{MM} = \frac{\bar{x}}{1 - \bar{x}}$. From the data, $\bar{x} = \frac{0.6 + 0.1 + 0.2 + 0.8}{4} = 1.7/4$, thus $\hat{\theta}_{MM} = \frac{1.7/4}{1 - 1.7/4} = 0.74$.
(3.2). For the Maximum-Likelihood method, we write the likelihood function as

$$L(\theta) = f(x_1) f(x_2) \cdots f(x_n) = \theta x_1^{\theta - 1} \cdot \theta x_2^{\theta - 1} \cdots \theta x_n^{\theta - 1} = \theta^n (x_1 \cdots x_n)^{\theta - 1}.$$

Maximizing $L(\theta)$ is equivalent to maximizing $\ln L(\theta)$, where $\ln L(\theta) = n \ln(\theta) + (\theta - 1) \ln(x_1 \cdots x_n)$. Setting $\frac{d \ln L(\theta)}{d\theta} = 0$ gives $\frac{n}{\theta} + \ln(x_1 \cdots x_n) = 0$, therefore $\hat{\theta}_{ML} = -\frac{n}{\ln(x_1 \cdots x_n)}$. From the data we have

$$\hat{\theta}_{ML} = -\frac{n}{\ln(x_1 \cdots x_n)} = -\frac{4}{\ln(0.6 \cdot 0.1 \cdot 0.2 \cdot 0.8)} = \frac{4}{4.646} = 0.86.$$

(The second derivative is $\frac{d^2 \ln L(\theta)}{d\theta^2} = -\frac{n}{\theta^2} < 0$, so $\hat{\theta}_{ML}$ is indeed a maximum point.)

4 (3 points) A certain proportion of the antibiotics that are injected into the blood is bound to serum proteins. This phenomenon directly affects the effectiveness of the medication, because the absorption of the preparation decreases. The table below gives, for two common antibiotics, the proportions (unit: percent) that were bound in experiments with animal serum.

Preparation     Measured values             mean    s_i
Penicillin G    29.6  24.3  28.5  32.0      28.60   3.22
Erythromycin    21.6  17.4  18.3  19.0      19.08   1.81

Model: We have two independent samples from $N(\mu_i, \sigma)$, namely, Penicillin G from $N(\mu_1, \sigma)$ and Erythromycin from $N(\mu_2, \sigma)$.
(4.1). (1p) Construct a (two-sided) 95% confidence interval for $\mu_1 - \mu_2$.
(4.2). (1p) Construct a (two-sided) 95% confidence interval for $\sigma$.
(4.3). (1p) Test the hypotheses with significance level $\alpha = 5\%$: $H_0: \mu_1 = \mu_2$ versus $H_1: \mu_1 > \mu_2$.

Solution.
(4.1). A (two-sided) 95% confidence interval for $\mu_1 - \mu_2$ is

$$I_{\mu_1 - \mu_2} = (\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2}(n_1 + n_2 - 2) \cdot s \cdot \sqrt{\frac{1}{n_1} + \frac{1}{n_2}},$$

where $\bar{x}_1 - \bar{x}_2 = 28.6 - 19.08 = 9.52$; $t_{\alpha/2}(n_1 + n_2 - 2) = t_{0.025}(4 + 4 - 2) = 2.45$;

$$s^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} = \frac{(4 - 1) \cdot 3.22^2 + (4 - 1) \cdot 1.81^2}{4 + 4 - 2} = 6.82225,$$

so $s = 2.61$; and $\sqrt{\frac{1}{n_1} + \frac{1}{n_2}} = \sqrt{\frac{1}{4} + \frac{1}{4}} = 0.707$. Thus,

$$I_{\mu_1 - \mu_2} = 9.52 \pm 4.525 = (4.995, 14.045).$$

(4.2). A (two-sided) 95% confidence interval for $\sigma^2$ is

$$I_{\sigma^2} = \left(\frac{(n_1 + n_2 - 2)s^2}{\chi^2_{\alpha/2}(n_1 + n_2 - 2)}, \frac{(n_1 + n_2 - 2)s^2}{\chi^2_{1 - \alpha/2}(n_1 + n_2 - 2)}\right) = \left(\frac{(4 + 4 - 2) \cdot 6.82225}{\chi^2_{0.025}(4 + 4 - 2)}, \frac{(4 + 4 - 2) \cdot 6.82225}{\chi^2_{0.975}(4 + 4 - 2)}\right)$$
$$= \left(\frac{(4 + 4 - 2) \cdot 6.82225}{14.45}, \frac{(4 + 4 - 2) \cdot 6.82225}{1.24}\right) = (2.83, 33.0).$$

Therefore a (two-sided) 95% confidence interval for $\sigma$ is

$$I_{\sigma} = (\sqrt{2.83}, \sqrt{33.0}) = (1.683, 5.746).$$

(4.3). The test statistic is

$$TS = \frac{(\bar{x}_1 - \bar{x}_2) - 0}{s\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} = 5.155.$$

The rejection region is $C = (t_{\alpha}(n_1 + n_2 - 2), \infty) = (1.94, \infty)$. Since $TS \in C$, we reject $H_0$ (namely, we believe that $\mu_1 > \mu_2$; this is consistent with the confidence interval $I_{\mu_1 - \mu_2}$, which lies entirely above 0).

5 (3 points) In an area there are many flowers which are white, red or pink. We randomly pick 100 flowers and get

flowers          white   red   pink
frequency N_i    20      24    56

Use a $\chi^2$-test to test the following hypothesis with significance level $\alpha = 0.05$:
$H_0$: P(white flower) = 1/4; P(red flower) = 1/4; P(pink flower) = 1/2.

Solution. The test statistic is

$$TS = \sum_{i=1}^{3} \frac{(N_i - np_i)^2}{np_i} = 1.76,$$

where $N_1 = 20$, $N_2 = 24$ and $N_3 = 56$, $np_1 = 100 \cdot 1/4 = 25$, $np_2 = 100 \cdot 1/4 = 25$, and $np_3 = 100 \cdot 1/2 = 50$. The rejection region is

$$C = (\chi^2_{\alpha}(k - 1 - \#\text{unknown parameters}), \infty) = (\chi^2_{0.05}(3 - 1 - 0), \infty) = (5.99, \infty).$$

Since $TS \notin C$, we do NOT reject $H_0$ (namely, the data give no reason to doubt the hypothesis).
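
For problem 4, the pooled standard deviation, the confidence interval and the test statistic can be recomputed from the raw measurements. The sketch below is an illustration under stated assumptions: the helper `pooled_summary` is a hypothetical name, and the t quantile is hard-coded from the table value used in the solution rather than computed. Since the code works from unrounded sample statistics, its output differs slightly from the hand calculation, which starts from the rounded values 19.08, 3.22 and 1.81.

```python
from math import sqrt
from statistics import mean, stdev

penicillin = [29.6, 24.3, 28.5, 32.0]
erythromycin = [21.6, 17.4, 18.3, 19.0]


def pooled_summary(x, y):
    """Return (difference of means, pooled standard deviation,
    sqrt(1/n1 + 1/n2)) for two samples with a common sigma."""
    n1, n2 = len(x), len(y)
    pooled_var = ((n1 - 1) * stdev(x) ** 2
                  + (n2 - 1) * stdev(y) ** 2) / (n1 + n2 - 2)
    return mean(x) - mean(y), sqrt(pooled_var), sqrt(1 / n1 + 1 / n2)


# Table values quoted in the solution (assumed, not computed here).
T_TWO_SIDED = 2.45   # t_{0.025}(6)
T_ONE_SIDED = 1.94   # t_{0.05}(6)

diff, s, factor = pooled_summary(penicillin, erythromycin)
margin = T_TWO_SIDED * s * factor
ci = (diff - margin, diff + margin)

# One-sided test of H0: mu1 = mu2 against H1: mu1 > mu2.
ts = diff / (s * factor)
reject_h0 = ts > T_ONE_SIDED

print(f"s = {s:.3f}, CI = ({ci[0]:.3f}, {ci[1]:.3f})")
print(f"TS = {ts:.3f}, reject H0: {reject_h0}")
```

If SciPy is available, the hard-coded quantiles could be replaced by a call to its t distribution, at the cost of an external dependency.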

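The goodness-of-fit statistic of problem 5 is a short sum and is easy to verify in code. The following is a minimal sketch for illustration; the dictionary layout and names are choices made here, and the critical value is the table value from the solution, not something the code computes.

```python
# Observed counts and hypothesised probabilities from problem 5.
observed = {"white": 20, "red": 24, "pink": 56}
h0_probs = {"white": 1 / 4, "red": 1 / 4, "pink": 1 / 2}

# Table value chi^2_{0.05}(2); degrees of freedom = 3 - 1 - 0 = 2.
CRITICAL_VALUE = 5.99

n = sum(observed.values())

# Pearson statistic: sum over categories of (N_i - n*p_i)^2 / (n*p_i).
ts = sum(
    (observed[color] - n * h0_probs[color]) ** 2 / (n * h0_probs[color])
    for color in observed
)
reject_h0 = ts > CRITICAL_VALUE

print(f"TS = {ts:.2f}, reject H0: {reject_h0}")
```
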
6 (3 points) In a study of the profitability of movie companies, 20 Hollywood films were selected randomly and for each film we observed values of

y = gross revenue (unit: million dollars),
x1 = production costs (unit: million dollars),
x2 = marketing costs (unit: million dollars),
x3 = 1 for a film based on a book, 0 otherwise.

The data have been analyzed according to the model $Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \varepsilon$, where $\varepsilon$ is assumed to be $N(0, \sigma)$. Analyses from Minitab are:

Regression Analysis: y versus x1, x2, x3

The regression equation is
y = 7.84 + 2.85 x1 + 2.28 x2 + 7.17 x3

Predictor    Coef      SE Coef
Constant     7.836     2.333
x1           2.8477    0.3923
x2           2.2782    0.2534
x3           7.166     1.818

S = 3.690    R-Sq = 96.7%

Analysis of Variance
Source            DF    SS        MS
Regression         3    6325.1    2108.4
Residual Error    16     217.8      13.6
Total             19    SST=?

(6.1). (1p) Estimate $\sigma$.
(6.2). (1p) What is $SS_T$?
(6.3). (1p) Is the term $x_3$ useful as an explanatory variable in the model? Explain your answer using an appropriate 95% confidence interval or test.

Solution.
(6.1). $\sigma \approx s = 3.69$.
(6.2). $SS_T = SS_R + SS_E = 6325.1 + 217.8 = 6542.9$.
(6.3). The first method is to use a confidence interval for $\beta_3$:

$$I_{\beta_3} = \hat{\beta}_3 \pm t_{\alpha/2}(n - k - 1) \cdot se(\hat{\beta}_3) = 7.166 \pm 2.12 \cdot 1.818 = 7.166 \pm 3.854 = (3.312, 11.02).$$

Since $0 \notin I_{\beta_3}$, we believe that $\beta_3 \ne 0$. Thus $x_3$ is useful.

The second method is to test the hypotheses $H_0: \beta_3 = 0$ against $H_1: \beta_3 \ne 0$. The test statistic and the rejection region are

$$TS = \frac{\hat{\beta}_3 - 0}{se(\hat{\beta}_3)} = \frac{7.166}{1.818} = 3.942, \qquad C = (-\infty, -t_{\alpha/2}(n - k - 1)) \cup (t_{\alpha/2}(n - k - 1), \infty) = (-\infty, -2.12) \cup (2.12, \infty).$$

Since $TS \in C$, we reject $H_0$, which suggests that $\beta_3 \ne 0$. Thus $x_3$ is useful.
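
The quantities in problem 6 all follow from a handful of numbers in the Minitab output, so they can be cross-checked in a few lines. The sketch below is illustrative only: the inputs are the values as read from the output (sums of squares 6325.1 and 217.8, coefficient 7.166 with standard error 1.818), and the t quantile 2.12 is the table value for 16 degrees of freedom taken from the solution, not computed.

```python
from math import sqrt

# Values read from the Minitab output (20 films, 3 predictors).
n, k = 20, 3
ss_r, ss_e = 6325.1, 217.8            # regression and residual sums of squares
beta3_hat, se_beta3 = 7.166, 1.818    # coefficient of x3 and its standard error
T_QUANTILE = 2.12                     # t_{0.025}(16), table value (assumed)

df_error = n - k - 1                  # residual degrees of freedom
sigma_hat = sqrt(ss_e / df_error)     # estimate of sigma, Minitab's "S"
ss_t = ss_r + ss_e                    # total sum of squares
r_squared = ss_r / ss_t               # should match R-Sq in the output

# 95% confidence interval and two-sided t test for beta_3.
margin = T_QUANTILE * se_beta3
ci = (beta3_hat - margin, beta3_hat + margin)
ts = beta3_hat / se_beta3
reject_h0 = abs(ts) > T_QUANTILE

print(f"sigma_hat = {sigma_hat:.3f}, SS_T = {ss_t:.1f}, R^2 = {r_squared:.3f}")
print(f"CI = ({ci[0]:.3f}, {ci[1]:.3f}), TS = {ts:.3f}, reject H0: {reject_h0}")
```

Recomputing `sigma_hat` and `r_squared` from the sums of squares is a useful consistency check on the table: both should reproduce the S and R-Sq lines of the output.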
