Business Statistics, by David F. Groebner, Patrick W. Shannon, Phillip C. Fry, and Kent D. Smith (2011), Part 2


CHAPTER 12
Analysis of Variance

Chapter 12 Quick Prep Links
• Review the computational methods for the sample mean and the sample variance in Chapter 3.
• Review the basics of hypothesis testing discussed in Section 9.1.
• Re-examine the material on hypothesis testing for the difference between two population variances in Section 11.2.

12.1 One-Way Analysis of Variance (pg. 476–497)
Outcome 1. Understand the basic logic of analysis of variance.
Outcome 2. Perform a hypothesis test for a single-factor design using analysis of variance manually and with the aid of Excel or Minitab software.
Outcome 3. Conduct and interpret post-analysis-of-variance pairwise comparisons procedures.

12.2 Randomized Complete Block Analysis of Variance (pg. 497–509)
Outcome 4. Recognize when randomized block analysis of variance is useful and be able to perform analysis of variance on a randomized block design.

12.3 Two-Factor Analysis of Variance with Replication (pg. 509–520)
Outcome 5. Perform analysis of variance on a two-factor design of experiments with replications using Excel or Minitab and interpret the output.

Why you need to know

Chapters 9 through 11 introduced hypothesis testing. By now you should understand that regardless of the population parameter in question, the hypothesis-testing steps are basically the same:

1. Specify the population parameter of interest.
2. Formulate the null and alternative hypotheses.
3. Specify the level of significance.
4. Determine a decision rule defining the rejection and "acceptance" regions.
5. Select a random sample of data from the population(s). Compute the appropriate sample statistic(s). Finally, calculate the test statistic.
6. Reach a decision. Reject the null hypothesis, H0, if the sample statistic falls in the rejection region; otherwise, do not reject the null hypothesis.
If the test is conducted using the p-value approach, H0 is rejected whenever the p-value is smaller than the significance level; otherwise, H0 is not rejected.
7. Draw a conclusion. State the result of your hypothesis test in the context of the exercise or analysis of interest.

Chapter 9 focused on hypothesis tests involving a single population. Chapters 10 and 11 expanded the hypothesis-testing process to include applications in which differences between two populations are involved. However, you will encounter many instances involving more than two populations. For example, the vice president of operations at Farber Rubber, Inc., oversees production at Farber's six different U.S. manufacturing plants. Because each plant uses slightly different manufacturing processes, the vice president needs to know if there are any differences in the average strength of the products produced at the different plants. Similarly, Golf Digest, a major publisher of articles about golf, might wish to determine which of five major brands of golf balls has the highest mean distance off the tee. The Environmental Protection Agency (EPA) might conduct a test to determine if there is a difference in the average miles-per-gallon performance

of cars manufactured by the Big Three U.S. automobile producers. In each of these cases, testing a hypothesis involving more than two population means could be required.
This chapter introduces a tool called analysis of variance (ANOVA), which can be used to test whether there are differences among three or more population means. There are several ANOVA procedures, depending on the type of test being conducted. Our aim in this chapter is to introduce you to ANOVA and to illustrate how to use Microsoft Excel and Minitab to help conduct hypothesis tests involving three or more population parameters. You will almost certainly need either to apply ANOVA in future decision-making situations or to interpret the results of an ANOVA study performed by someone else. Thus, you need to be familiar with this powerful statistical technique.

12.1 One-Way Analysis of Variance (Chapter Outcome 1)

In Chapter 10 we introduced the t-test for testing whether two populations have equal means when the samples from the two populations are independent. However, you will often encounter situations in which you are interested in determining whether three or more populations have equal means. To conduct this test, you will need a new tool called analysis of variance (ANOVA). There are many different analysis of variance designs to fit different situations; the simplest is a completely randomized design. Analyzing a completely randomized design results in a one-way analysis of variance.

Completely Randomized Design
An experiment is completely randomized if it consists of the independent random selection of observations representing each level of one factor.

One-Way Analysis of Variance
An analysis of variance design in which independent samples are obtained from two or more levels of a single factor for the purpose of testing whether the levels have equal means.
Factor
A quantity under examination in an experiment as a possible cause of variation in the response variable.

Levels
The categories, measurements, or strata of a factor of interest in the current experiment.

Introduction to One-Way ANOVA

BUSINESS APPLICATION: APPLYING ONE-WAY ANALYSIS OF VARIANCE

BAYHILL MARKETING COMPANY The Bayhill Marketing Company is a full-service marketing and advertising firm in San Francisco. Although Bayhill provides many different marketing services, one of its most lucrative in recent years has been Web site sales designs. Companies that wish to increase Internet sales have contracted with Bayhill to design effective Web sites. Bayhill executives have learned that certain Web site features are more effective than others. For example, a major greeting card company wants to work with Bayhill on developing a Web-based sales campaign for its "Special Events" card set. The company plans to work with Bayhill designers to come up with a Web site that will maximize sales effectiveness. Sales effectiveness can be determined by the dollar value of the greeting card sets purchased.
Through a series of meetings with the client and focus-group sessions with potential customers, Bayhill has developed four Web site design options. Bayhill plans to test the effectiveness of the designs by sending e-mails to a random sample of regular greeting card customers. The sample of potential customers will be divided into four groups of eight customers each. Group 1 will be directed to a Web site with design 1, group 2 to a Web site with design 2, and so forth. The dollar values of the cards ordered are recorded and shown in Table 12.1.
In this example, we are interested in whether the different Web site designs result in different mean order sizes. In other words, we are trying to determine if "Web site design" is one of the possible causes of the variation in the dollar value of the card sets ordered (the response variable).
In this case, Web site design is called a factor. The single factor of interest is Web site design. This factor has four categories, measurements, or strata, called levels. These four levels are the four designs: 1, 2, 3, and 4. Because we are using only one factor, each dollar value of card sets ordered is associated with only one level (that is, with Web site design type 1, 2, 3, or 4), as you can see in Table 12.1. Each level is a population of interest, and the values seen in Table 12.1 are sample values taken from those populations. The null and alternative hypotheses to be tested are

H0: μ1 = μ2 = μ3 = μ4 (mean order sizes are equal)
HA: At least two of the population means are different

The appropriate statistical tool for conducting the hypothesis test related to this experimental design is analysis of variance. Because this ANOVA addresses an experiment with only one factor, it is a one-way ANOVA, or a one-factor ANOVA. Because the sample size for each Web site design (level) is the same, the experiment has a balanced design.

Balanced Design
An experiment has a balanced design if the factor levels have equal sample sizes.

TABLE 12.1 | Bayhill Marketing Company Web Site Order Data

                             Web Site Design
Customer          1            2            3            4
1             $ 4.10       $ 6.90       $ 4.60       $12.50
2               5.90         9.10        11.40         7.50
3              10.45        13.00         6.15         6.25
4              11.55         7.90         7.85         8.75
5               5.25         9.10         4.30        11.15
6               7.75        13.40         8.70        10.25
7               4.78         7.60        10.20         6.40
8               6.22         5.00        10.80         9.20
Mean       x̄1 = $7.00   x̄2 = $9.00   x̄3 = $8.00   x̄4 = $9.00    Grand Mean x̄ = $8.25
Variance   s1² = 7.341   s2² = 8.423   s3² = 7.632   s4² = 5.016

Note: Data are the dollar value of card sets ordered with each Web site design.

ANOVA tests the null hypothesis that three or more populations have the same mean. The test is based on four assumptions:

Assumptions
1. All populations are normally distributed.
2. The population variances are equal.
3. The observations are independent; that is, the occurrence of any one individual value does not affect the probability that any other observation will occur.
4. The data are interval or ratio level.

If the null hypothesis is true, the populations have identical distributions. If so, the sample means for random samples from each population should be close in value. The basic logic of ANOVA is the same as for the two-sample t-test introduced in Chapter 10. The null hypothesis should be rejected only if the sample means are substantially different.

Partitioning the Sum of Squares

Total Variation
The aggregate dispersion of the individual data values across the various factor levels is called the total variation in the data.

Within-Sample Variation
The dispersion that exists among the data values within a particular factor level is called the within-sample variation.

Between-Sample Variation
Dispersion among the factor sample means is called the between-sample variation.

To understand the logic of ANOVA, you should note several things about the data in Table 12.1.
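The summary rows of Table 12.1 (level means, level variances, and the grand mean) can be reproduced with a few lines of Python. This is a sketch, with the order data transcribed from the table and all variable names our own:

```python
from statistics import mean, variance

# Dollar value of card sets ordered under each Web site design (Table 12.1)
designs = {
    1: [4.10, 5.90, 10.45, 11.55, 5.25, 7.75, 4.78, 6.22],
    2: [6.90, 9.10, 13.00, 7.90, 9.10, 13.40, 7.60, 5.00],
    3: [4.60, 11.40, 6.15, 7.85, 4.30, 8.70, 10.20, 10.80],
    4: [12.50, 7.50, 6.25, 8.75, 11.15, 10.25, 6.40, 9.20],
}

for level, orders in designs.items():
    # variance() uses the n - 1 divisor, matching the sample variances in the table
    print(level, round(mean(orders), 2), round(variance(orders), 3))

all_orders = [x for orders in designs.values() for x in orders]
grand_mean = mean(all_orders)
print("grand mean:", round(grand_mean, 2))  # 8.25
```

The first loop prints the four level means ($7.00, $9.00, $8.00, $9.00) and variances (7.341, 8.423, 7.632, 5.016) shown in the table's summary rows.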
First, the dollar values of the orders differ throughout the data table. Some values are higher; others are lower. Thus, variation exists across all customer orders. This variation is called the total variation in the data.
Next, within any particular Web site design (i.e., factor level), not all customers ordered the same dollar value of greeting card sets. For instance, within level 1, order size ranged from $4.10 to $11.55. Similar differences occur within the other levels. The variation within the factor levels is called the within-sample variation.
Finally, the sample means for the four Web site designs are not all equal. Thus, variation exists between the four designs' averages. This variation between the factor levels is referred to as the between-sample variation.
Recall that the sample variance is computed as

s² = Σ(x − x̄)² / (n − 1)

The sample variance is the sum of squared deviations from the sample mean divided by its degrees of freedom. When all the data from all the samples are included, s² is the estimator of the total variation. The numerator of this estimator is called the total sum of squares (SST) and can be partitioned into the sum of squares associated with the estimators of the between-sample variation and the within-sample variation, as shown in Equation 12.1.

Partitioned Sum of Squares

SST = SSB + SSW    (12.1)

where:
SST = Total sum of squares
SSB = Sum of squares between
SSW = Sum of squares within

After separating the sum of squares, SSB and SSW are divided by their respective degrees of freedom to produce two estimates for the overall population variance. If the between-sample variance estimate is large relative to the within-sample estimate, the ANOVA procedure will lead us to reject the null hypothesis and conclude that the population means are different. The question is, how can we determine at what point any difference is statistically significant?

The ANOVA Assumptions (Chapter Outcome 2)

BUSINESS APPLICATION: UNDERSTANDING THE ANOVA ASSUMPTIONS

BAYHILL MARKETING COMPANY (CONTINUED) Recall that Bayhill is testing whether the four Web site designs generate orders of equal average dollar value. The null and alternative hypotheses are

H0: μ1 = μ2 = μ3 = μ4
HA: At least two population means are different

Before we jump into the ANOVA calculations, recall the four basic assumptions of ANOVA:

1. All populations are normally distributed.
2. The population variances are equal.
3. The sampled observations are independent.
4. The data's measurement level is interval or ratio.

FIGURE 12.1 | Normal Populations with Equal Variances and Unequal Means
[Figure: four normal curves with identical spread, centered at different means μ1 through μ4.]

Figure 12.1 illustrates the first two assumptions. The populations are normally distributed and the spread (variance) is the same for each population. However, this figure shows the

populations have different means, and therefore the null hypothesis is false. Figure 12.2 illustrates the same assumptions but in a case in which the population means are equal; therefore, the null hypothesis is true.

FIGURE 12.2 | Normal Populations with Equal Variances and Equal Means
[Figure: four identical normal curves, all centered at the same mean.]

You can do a rough check to determine whether the normality assumption is satisfied by developing graphs of the sample data from each population. Histograms are probably the best graphical tool for checking the normality assumption, but they require a fairly large sample size. The stem and leaf diagram and the box and whisker plot are alternatives when sample sizes are smaller. If the graphical tools show plots consistent with a normal distribution, then that evidence suggests the normality assumption is satisfied.¹ Figure 12.3 illustrates the box and

FIGURE 12.3 | Box and Whisker Plot for Bayhill Marketing Company
[Figure: side-by-side box and whisker plots for the four Web site designs.]

Five-Number Summary        1        2        3        4
Minimum                  4.10     5.00     4.30     6.25
First Quartile           4.78     6.90     4.60     6.40
Median                   6.06     8.50     8.275    8.975
Third Quartile          10.45    13.00    10.80    11.15
Maximum                 11.55    13.40    11.40    12.50

¹Chapter 13 introduces a goodness-of-fit approach to testing whether sample data come from a normally distributed population.

<span class='text_page_counter'>(6)</span> 480. CHAPTER 12. |. Analysis of Variance. whisker plot for the Bayhill data. Note, when the sample sizes are very small, as they are here, the graphical techniques may not be very effective. In Chapter 11, you learned how to test whether two populations have equal variances using the F-test. To determine whether the second assumption is satisfied, we can hypothesize that all the population variances are equal: H 0: s12  s 22  ⋅ ⋅ ⋅  s k2 HA : Not all variances are equal Because you are now testing a null hypothesis involving more than two population variances, you need an alternative to the F-test introduced in Chapter 11. This alternative method is called Hartley’s Fmax . The Hartley’s F-test statistic is computed as shown in Equation 12.2. Hartley’s F-Test Statistic Fmax . 2 smax 2 smin. (12.2). where: 2 smax  Largest sample variance 2 smin  Smallest sample variance. We can use the F-value computed using Equation 12.2 to test whether the variances are equal by comparing the calculated F to a critical value from the Hartley’s Fmax distribution, which appears in Appendix I.2 For the Bayhill example, the computed variance for each of the four samples is s12  7.341. s22  8.423. s33  7.632. s42  5.016. Using Equation 12.2, we compute the Fmax value as Fmax . 8.423  1.679 5.016. This value is now compared to the critical value Fa from the table in Appendix I for a  0.05, with k  4 and n  1  7 degrees of freedom. The value k is the number of populations (k  4). The value n is the average sample size, which equals 8 in this example. If n is not an integer value, then set n equal to the integer portion of the computed n. If Fmax  Fa, reject the null hypothesis of equal variances. If Fmax  Fa, do not reject the null hypothesis and conclude the population variances are equal. From the Hartley’s Fmax distribution table, the critical F0.05  8.44. 
Because Fmax = 1.679 < 8.44, the null hypothesis of equal variances is not rejected.³ Examining the sample data to see whether the basic assumptions are satisfied is always a good idea, but you should be aware that the analysis of variance procedures discussed in this chapter are robust, in the sense that the analysis of variance test is relatively unperturbed when the equal-variance assumption is not met. This is especially so when all samples are the same size, as in the Bayhill Marketing Company example. Hence, for one-way analysis of variance, or any other ANOVA design, try to have equal sample sizes when possible. Recall that we earlier referred to an analysis of variance design with equal sample sizes as a balanced design. If for some reason you are unable to use a balanced design, the rule of thumb is that the ratio of the largest sample size to the smallest sample size should not exceed 1.5. When the samples are the same size (or meet the 1.5 ratio rule), the analysis of variance is also robust with respect to the assumption that the populations are normally distributed. So, in brief, the one-way ANOVA for independent samples can be applied to virtually any set of interval- or ratio-level data.

²Other tests for equal variances exist. For example, Minitab has a procedure that uses Bartlett's and Levene's tests.
³Hartley's Fmax test is very dependent on the populations being normally distributed and should not be used if the populations' distributions are skewed. Note also that in Hartley's Fmax table, c = k and v = n̄ − 1.

Finally, if the data are not interval or ratio level, or if they do not satisfy the normal distribution assumption, Chapter 17 introduces an ANOVA procedure called the Kruskal-Wallis one-way ANOVA, which does not require these assumptions.

Applying One-Way ANOVA

Although the previous discussion covers the essence of ANOVA, determining whether the null hypothesis should be rejected requires that we actually compute the values of the estimators for the total variation, between-sample variation, and within-sample variation. Most ANOVA tests are done using a computer, but we will illustrate the manual computational approach one time to show you how it is done. Because software such as Excel and Minitab can be used to perform all calculations, future examples will be done using the computer. The software packages will do all the computations while we focus on interpreting the results.

BUSINESS APPLICATION: DEVELOPING THE ANOVA TABLE

BAYHILL MARKETING COMPANY (CONTINUED) Now we are ready to perform the necessary one-way ANOVA computations for the Bayhill example. Recall from Equation 12.1 that we can partition the total sum of squares into two components:

SST = SSB + SSW

The total sum of squares is computed as shown in Equation 12.3.

Total Sum of Squares

SST = Σ(i=1 to k) Σ(j=1 to ni) (xij − x̄)²    (12.3)

where:
SST = Total sum of squares
k = Number of populations (treatments)
ni = Sample size from population i
xij = jth measurement from population i
x̄ = Grand mean (mean of all the data values)

Equation 12.3 is not as complicated as it appears. Manually applying Equation 12.3 to the Bayhill data shown in Table 12.1 on page 477 (grand mean x̄ = 8.25), we can compute the SST as follows:

SST = (4.10 − 8.25)² + (5.90 − 8.25)² + (10.45 − 8.25)² + ⋯ + (9.20 − 8.25)²
SST = 220.88

Thus, the sum of the squared deviations of all values from the grand mean is 220.88. Equation 12.3 can also be restated as
SST = Σ(i=1 to k) Σ(j=1 to ni) (xij − x̄)² = (nT − 1)s²

where s² is the sample variance for all the data combined and nT is the sum of the combined sample sizes. We now need to determine how much of this total sum of squares is due to the between-sample sum of squares and how much is due to the within-sample sum of squares. The between-sample portion is called the sum of squares between and is found using Equation 12.4.

Sum of Squares Between

SSB = Σ(i=1 to k) ni (x̄i − x̄)²    (12.4)

where:
SSB = Sum of squares between samples
k = Number of populations
ni = Sample size from population i
x̄i = Sample mean from population i
x̄ = Grand mean

We can use Equation 12.4 to manually compute the sum of squares between for the Bayhill data, as follows:

SSB = 8(7 − 8.25)² + 8(9 − 8.25)² + 8(8 − 8.25)² + 8(9 − 8.25)²
SSB = 22

Once both the SST and SSB have been computed, the sum of squares within (also called the sum of squares error, SSE) is easily computed using Equation 12.5. The sum of squares within can also be computed directly, using Equation 12.6.

Sum of Squares Within

SSW = SST − SSB    (12.5)

or

SSW = Σ(i=1 to k) Σ(j=1 to ni) (xij − x̄i)²    (12.6)

where:
SSW = Sum of squares within samples
k = Number of populations
ni = Sample size from population i
x̄i = Sample mean from population i
xij = jth measurement from population i

For the Bayhill example, the SSW is

SSW = 220.88 − 22.00 = 198.88

These computations are the essential first steps in performing the ANOVA test to determine whether the population means are equal. Table 12.2 illustrates the ANOVA table format used to conduct the test. The format shown in Table 12.2 is the standard ANOVA table layout. For the Bayhill example, we substitute the numerical values for SSB, SSW, and SST and complete the ANOVA table, as shown in Table 12.3. The mean square column contains the MSB (mean square between samples) and the MSW (mean square within samples).⁴ These values are computed by dividing the sums of squares by their respective degrees of freedom, as shown in Table 12.3.

⁴MSW is also known as the mean square for error (MSE).
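The three sums of squares just computed (Equations 12.3 through 12.6) can be verified in a few lines of Python. A sketch, again using the Table 12.1 data:

```python
from statistics import mean

# Bayhill order data, one list per Web site design (Table 12.1)
designs = [
    [4.10, 5.90, 10.45, 11.55, 5.25, 7.75, 4.78, 6.22],
    [6.90, 9.10, 13.00, 7.90, 9.10, 13.40, 7.60, 5.00],
    [4.60, 11.40, 6.15, 7.85, 4.30, 8.70, 10.20, 10.80],
    [12.50, 7.50, 6.25, 8.75, 11.15, 10.25, 6.40, 9.20],
]
all_data = [x for s in designs for x in s]
grand = mean(all_data)

# Equation 12.3: total sum of squares
sst = sum((x - grand) ** 2 for x in all_data)
# Equation 12.4: sum of squares between
ssb = sum(len(s) * (mean(s) - grand) ** 2 for s in designs)
# Equation 12.6: sum of squares within, computed directly
ssw = sum((x - mean(s)) ** 2 for s in designs for x in s)

print(round(sst, 2), round(ssb, 2), round(ssw, 2))  # 220.88 22.0 198.88
```

The partition of Equation 12.5 holds: SST (220.88) equals SSB (22.00) plus SSW (198.88).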

TABLE 12.2 | One-Way ANOVA Table: The Basic Format

Source of Variation      SS     df       MS      F-Ratio
Between samples          SSB    k − 1    MSB     MSB/MSW
Within samples           SSW    nT − k   MSW
Total                    SST    nT − 1

where:
k = Number of populations
nT = Sum of the sample sizes from all populations
df = Degrees of freedom
MSB = Mean square between = SSB/(k − 1)
MSW = Mean square within = SSW/(nT − k)

Restating the null and alternative hypotheses for the Bayhill example:

H0: μ1 = μ2 = μ3 = μ4
HA: At least two population means are different

Glance back at Figures 12.1 and 12.2. If the null hypothesis is true (that is, all the means are equal, as in Figure 12.2), the MSW and MSB will be equal, except for the presence of sampling error. However, the more the sample means differ (Figure 12.1), the larger the MSB becomes. As the MSB increases, it will tend to get larger than the MSW. When this difference gets too large, we will conclude that the population means must not be equal, and the null hypothesis will be rejected. But how do we determine what "too large" is? How do we know when the difference is due to more than just sampling error?
To answer these questions, recall from Chapter 11 that the F-distribution is used to test whether two populations have the same variance. In the ANOVA test, if the null hypothesis is true, the ratio of MSB over MSW forms an F-distribution with D1 = k − 1 and D2 = nT − k degrees of freedom. If the calculated F-ratio in Table 12.3 gets too large, the null hypothesis is rejected. Figure 12.4 illustrates the hypothesis test for a significance level of 0.05. Because the calculated F-ratio = 1.03 is less than the critical F0.05 = 2.95 (found using Excel's FINV function) with 3 and 28 degrees of freedom, the null hypothesis cannot be rejected. The F-ratio indicates that the between-levels estimate and the within-levels estimate are not different enough to conclude that the population means are different.
This means there is insufficient statistical evidence to conclude that any one of the four Web site designs will generate higher average dollar values of orders than any of the other designs. Therefore, the choice of which Web site design to use can be based on other factors, such as company preference.

TABLE 12.3 | One-Way ANOVA Table for the Bayhill Marketing Company

Source of Variation      SS        df    MS      F-Ratio
Between samples          22.00     3     7.33    1.03
Within samples           198.88    28    7.10
Total                    220.88    31

where:
MSB = Mean square between = 22/3 = 7.33
MSW = Mean square within = 198.88/28 = 7.10
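The mean squares and F-ratio in Table 12.3 follow directly from the sums of squares. A sketch (the critical value 2.95 is the tabled F0.05 quoted in the text, not computed here):

```python
# Sums of squares from the Bayhill example (Table 12.3)
ssb, ssw = 22.00, 198.88
k, n_t = 4, 32  # number of factor levels and total sample size

msb = ssb / (k - 1)    # mean square between, df = k - 1 = 3
msw = ssw / (n_t - k)  # mean square within,  df = nT - k = 28
f_ratio = msb / msw

print(round(msb, 2), round(msw, 2), round(f_ratio, 2))  # 7.33 7.1 1.03
# F = 1.03 is below the critical F0.05 = 2.95, so H0 is not rejected
```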

FIGURE 12.4 | Bayhill Company Hypothesis Test

H0: μ1 = μ2 = μ3 = μ4
HA: At least two population means are different
α = 0.05
Degrees of freedom: D1 = k − 1 = 4 − 1 = 3; D2 = nT − k = 32 − 4 = 28
Decision rule: If F > F0.05 = 2.95, reject H0; otherwise do not reject H0.
Then: F = MSB/MSW = 7.33/7.10 = 1.03
Because F = 1.03 < F0.05 = 2.95, we do not reject H0.
[Figure: F-distribution density with the rejection region to the right of F0.05 = 2.95; the test statistic F = 1.03 falls well below it.]

EXAMPLE 12-1: ONE-WAY ANALYSIS OF VARIANCE (Chapter Outcome 2)

Roderick, Wilterding & Associates Roderick, Wilterding & Associates (RWA) operates automobile dealerships in three regions: the West, Southwest, and Northwest. Recently, RWA's general manager questioned whether the company's mean profit margin per vehicle sold differed by region. To determine this, the following steps can be performed:

Step 1 Specify the parameter(s) of interest.
The parameter of interest is the mean dollars of profit margin per vehicle sold in each region.

Step 2 Formulate the null and alternative hypotheses.
The appropriate null and alternative hypotheses are
H0: μW = μSW = μNW
HA: At least two populations have different means

Step 3 Specify the significance level (α) for testing the hypothesis.
The test will be conducted using α = 0.05.

Step 4 Select independent simple random samples from each population, and compute the sample means and the grand mean.
There are three regions. Simple random samples of vehicles sold in these regions have been selected: 10 in the West, 8 in the Southwest, and 12 in the Northwest. Note that even though the sample sizes are not equal, the largest sample is not more than 1.5 times the size of the smallest sample. The following sample data were collected (in dollars):

West:       3,700  2,900  4,100  4,900  4,900  5,300  2,200  3,700  4,800  3,000
Southwest:  3,300  2,100  2,600  2,100  3,600  2,700  4,500  2,400
Northwest:  2,900  4,300  5,200  3,300  3,600  3,300  3,700  2,400  4,400  3,300  4,400  3,200

The sample means are

x̄W = Σx/n = $39,500/10 = $3,950
x̄SW = $23,300/8 = $2,912.50
x̄NW = $44,000/12 = $3,666.67

and the grand mean, the mean of the data from all samples combined, is

x̄ = ΣΣx/nT = ($3,700 + $2,900 + ⋯ + $3,200)/30 = $106,800/30 = $3,560

Step 5 Determine the decision rule.
The F-critical value from the F-distribution table in Appendix H for D1 = 2 and D2 = 27 degrees of freedom is a value between 3.316 and 3.403. The exact value F0.05 = 3.354 can be found using Excel's FINV function or Minitab's Calc > Probability Distributions command. The decision rule is
If F > 3.354, reject the null hypothesis; otherwise, do not reject the null hypothesis.

Step 6 Check to see that the equal-variance assumption has been satisfied.
As long as we assume that the populations are normally distributed, Hartley's Fmax test can be used to test whether the three populations have equal variances. The test statistic is

Fmax = s²max / s²min

The three variances are computed using s² = Σ(x − x̄)²/(n − 1):

s²W = 1,062,777.8
s²SW = 695,535.7
s²NW = 604,242.4

Hartley's Fmax = 1,062,777.8/604,242.4 = 1.76

From the Fmax table in Appendix I, the critical value for α = 0.05, c = 3 (c = k), and v = 9 (v = n̄ − 1 = 10 − 1 = 9) is 5.34. Because 1.76 < 5.34, we do not reject the null hypothesis of equal variances.

Step 7 Create the ANOVA table.
Compute the total sum of squares, sum of squares between, and sum of squares within, and complete the ANOVA table.

Total sum of squares:
SST = Σi Σj (xij − x̄)² = (3,700 − 3,560)² + (2,900 − 3,560)² + ⋯ + (3,200 − 3,560)² = 26,092,000

Sum of squares between:
SSB = Σi ni(x̄i − x̄)² = 10(3,950 − 3,560)² + 8(2,912.50 − 3,560)² + 12(3,666.67 − 3,560)² = 5,011,583

Sum of squares within:
SSW = SST − SSB = 26,092,000 − 5,011,583 = 21,080,417

The ANOVA table is

Source of Variation      SS            df    MS           F-Ratio
Between samples          5,011,583     2     2,505,792    3.209
Within samples           21,080,417    27    780,756
Total                    26,092,000    29

where F = 2,505,792/780,756 = 3.209.

Step 8 Reach a decision.
Because the F-test statistic = 3.209 < F0.05 = 3.354, we do not reject the null hypothesis based on these sample data.

Step 9 Draw a conclusion.
We are not able to detect a difference in the mean profit margin per vehicle sold by region.

END EXAMPLE
TRY PROBLEM 12-2 (pg. 493)

BUSINESS APPLICATION: USING SOFTWARE TO PERFORM ONE-WAY ANOVA (Excel and Minitab Tutorial)

HYDRONICS CORPORATION The Hydronics Corporation makes and distributes health products. Currently, the company's research department is experimenting with two new herb-based weight-loss-enhancing products. To gauge their effectiveness, researchers at the company conducted a test using 300 human subjects over a six-week period. All the people in the study were between 30 and 40 pounds overweight. One third of the subjects were randomly selected to receive a placebo, in this case a pill containing only vitamin C. One third of the subjects were randomly selected and given product 1. The remaining 100 people received product 2. The subjects did not know which pill they had been assigned. Each person was asked to take the pill regularly for six weeks and otherwise observe his or her normal routine. At the end of six weeks, the subjects' weight loss was recorded.
The company was hoping to find statistical evidence that at least one of the products is an effective weight-loss aid. The file Hydronics shows the study data. Positive values indicate that the subject lost weight, whereas negative values indicate that the subject gained weight during the six-week study period. As often happens in studies involving human subjects, people drop out. Thus, at the end of six weeks, only 89 placebo subjects, 91 product 1 subjects, and 83 product 2 subjects with valid data remained. Consequently, this experiment resulted in an unbalanced design. Although the sample sizes are not equal, they are close to being the same size and do not violate the 1.5-ratio rule of thumb mentioned earlier..
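The Hydronics data file itself is not reproduced here, but the one-way F computation extends unchanged to unbalanced designs. A small helper function (a sketch; since we cannot rerun the Hydronics data, it is checked against the unbalanced regional data of Example 12-1, whose F-ratio of about 3.21 is known from the worked example):

```python
from statistics import mean

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA.

    Handles unequal (unbalanced) sample sizes as well as balanced ones.
    """
    all_data = [x for g in groups for x in g]
    grand = mean(all_data)
    k, n_t = len(groups), len(all_data)

    ssb = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)  # between
    ssw = sum((x - mean(g)) ** 2 for g in groups for x in g)    # within

    return (ssb / (k - 1)) / (ssw / (n_t - k)), k - 1, n_t - k

# Profit-margin data from Example 12-1 (West, Southwest, Northwest)
regions = [
    [3700, 2900, 4100, 4900, 4900, 5300, 2200, 3700, 4800, 3000],
    [3300, 2100, 2600, 2100, 3600, 2700, 4500, 2400],
    [2900, 4300, 5200, 3300, 3600, 3300, 3700, 2400, 4400, 3300, 4400, 3200],
]
f, d1, d2 = one_way_anova_f(regions)
print(round(f, 2), d1, d2)  # 3.21 2 27
```

Applied to the Hydronics samples of 89, 91, and 83 subjects, the same function would reproduce the F statistic that Excel and Minitab report below.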

FIGURE 12.5A | Excel 2007 Output: Hydronics Weight Loss ANOVA Results
[Output shows the F statistic, p-value, and F-critical value.]

Excel 2007 Instructions:
1. Open file: Hydronics.xls.
2. On the Data tab, click Data Analysis.
3. Select ANOVA: Single Factor.
4. Define the data range (columns B, C, and D).
5. Specify alpha level = 0.05.
6. Indicate the output location.
7. Click OK.

The null and alternative hypotheses to be tested using a significance level of 0.05 are

H0: μ1 = μ2 = μ3
HA: At least two population means are different

The experimental design is completely randomized. The factor is diet supplement, which has three levels: placebo, product 1, and product 2. We will use a significance level of α = 0.05. Figure 12.5a and Figure 12.5b show the Excel and Minitab analysis of variance results. The top section of the Excel ANOVA output and the bottom section of the Minitab ANOVA output provide descriptive information for the three levels. The ANOVA table is shown in the other section of the output. These tables look like the one we generated manually in the Bayhill example. However, Excel and Minitab also compute the p-value. In addition, Excel displays the critical value, F-critical, from the F-distribution table. Thus, you can test the null hypothesis by comparing the calculated F to the F-critical value or by comparing the p-value to the significance level. The decision rule is

If F > F0.05 = 3.03, reject H0; otherwise, do not reject H0

FIGURE 12.5B | Minitab Output: Hydronics Weight Loss ANOVA Results
[Output shows the F statistic and p-value.]

Minitab Instructions:
1. Open file: Hydronics.MTW.
2. Choose Stat > ANOVA > One-Way.
3. In Response, enter the data column, Loss.
4. In Factor, enter the factor level column, Program.
5. Click OK.

or

If p-value < α = 0.05, reject H0; otherwise, do not reject H0.

Because F = 20.48 > F0.05 = 3.03 (or p-value ≈ 0.0000 < α = 0.05), we reject the null hypothesis and conclude there is a difference in the mean weight loss for people on the three treatments. At least two of the populations have different means. The top portion of Figure 12.5a shows the descriptive measures for the sample data. For example, the subjects who took the placebo actually gained an average of 1.75 pounds. Subjects on product 1 lost an average of 2.45 pounds, and subjects on product 2 lost an average of 2.58 pounds.

Chapter Outcome 3.

The Tukey-Kramer Procedure for Multiple Comparisons

What does this conclusion imply about which treatment results in greater weight loss? One approach to answering this question is to use confidence interval estimates for all possible pairs of population means, based on the pooling of the two relevant sample variances, as introduced in Chapter 10:

sp = sqrt[ ((n1 − 1)s1² + (n2 − 1)s2²) / (n1 + n2 − 2) ]

These confidence intervals are constructed using the formula also given in Chapter 10:

(x̄1 − x̄2) ± t · sp · sqrt(1/n1 + 1/n2)

Experiment-Wide Error Rate: The proportion of experiments in which at least one of the set of confidence intervals constructed does not contain the true value of the population parameter being estimated.

This interval uses a weighted average of only the two sample variances corresponding to the two sample means in the confidence interval. However, in the Hydronics example, we have three samples, and thus three variances, involved. If we were to use the pooled standard deviation sp shown here, we would be disregarding one third of the information available to estimate the common population variance. Instead, we use confidence intervals based on the pooled standard deviation obtained from the square root of MSW.
This is the square root of the weighted average of all (three in this example) sample variances. It is preferred to the interval estimate shown here because we are assuming that each of the three sample variances is an estimate of the common population variance. A better method for testing which populations have different means after the one-way ANOVA has led us to reject the null hypothesis is called the Tukey-Kramer procedure for multiple comparisons.5 To understand why the Tukey-Kramer procedure is superior, we introduce the concept of an experiment-wide error rate. The Tukey-Kramer procedure is based on the simultaneous construction of confidence intervals for all differences of pairs of treatment means. In this example, there are three different pairs of means (μ1 − μ2, μ1 − μ3, μ2 − μ3). The Tukey-Kramer procedure simultaneously constructs three different confidence intervals for a specified confidence level, say 95%. Intervals that do not contain zero imply that a difference exists between the associated population means. Suppose we repeat the study a large number of times. Each time, we construct the Tukey-Kramer 95% confidence intervals. The Tukey-Kramer method assures us that in 95% of these experiments, the three confidence intervals constructed will include the true difference between the population means, μi − μj. In 5% of the experiments, at least one of the confidence intervals will not contain the true difference between the population means. Thus in 5% of the situations, we would make at least one mistake in our conclusions about which populations have different means. This proportion of errors (0.05) is known as the experiment-wide error rate. For a 95% confidence interval, the Tukey-Kramer procedure controls the experiment-wide error to a 0.05 level. However, because we are concerned with only this one experiment (with one set of sample data), the error rate associated with any one of the three confidence intervals is actually less than 0.05.
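A quick calculation shows why controlling the experiment-wide error rate matters. If the three 95% intervals were constructed separately and treated as independent (an approximation; in reality they share the same sample data), the chance of at least one mistake would be well above 5%:

```python
# Illustrative only: assumes the three intervals are independent,
# which overstates the separation; the point is the inflation itself.
alpha = 0.05
n_intervals = 3   # the pairs (mu1 - mu2), (mu1 - mu3), (mu2 - mu3)

# P(at least one interval misses) = 1 - P(all three cover)
experiment_wide_error = 1 - (1 - alpha) ** n_intervals
print(round(experiment_wide_error, 4))  # 0.1426
```

So three separate 95% intervals carry roughly a 14% family error rate under this approximation; the Tukey-Kramer procedure instead holds the whole family of comparisons to the stated 5%.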
5 There are other methods for making these comparisons. Statisticians disagree over which method to use. Later, we introduce alternative methods.
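The distinction drawn above between the two-sample pooled estimate sp and the square root of MSW can be made concrete. The sample variances and sizes below are hypothetical; the point is that MSW pools all k sample variances, weighted by their degrees of freedom, rather than just two of them.

```python
import math

# Hypothetical sample variances and sizes for three groups
variances = [35.0, 30.0, 23.0]
sizes     = [10, 10, 10]

# Two-sample pooled estimate (Chapter 10) uses only groups 1 and 2:
sp_two = math.sqrt(((sizes[0] - 1) * variances[0] + (sizes[1] - 1) * variances[1])
                   / (sizes[0] + sizes[1] - 2))

# MSW pools all k sample variances, each weighted by its df (n - 1):
msw = sum((n - 1) * s2 for n, s2 in zip(sizes, variances)) / (sum(sizes) - len(sizes))
sp_all = math.sqrt(msw)

print(sp_two, sp_all)
```

Here the third group's smaller variance pulls the pooled estimate down, which is exactly the information the two-sample version discards.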

The Tukey-Kramer procedure allows us to simultaneously examine all pairs of populations after the ANOVA test has been completed without increasing the true alpha level. Because these comparisons are made after the ANOVA F-test, the procedure is called a post-test (or post-hoc) procedure. The first step in using the Tukey-Kramer procedure is to compute the absolute differences between each pair of sample means. Using the results shown in Figure 12.5a, we get the following absolute differences:

|x̄1 − x̄2| = |−1.75 − 2.45| = 4.20
|x̄1 − x̄3| = |−1.75 − 2.58| = 4.33
|x̄2 − x̄3| = |2.45 − 2.58| = 0.13

The Tukey-Kramer procedure requires us to compare these absolute differences to the critical range that is computed using Equation 12.7.

Tukey-Kramer Critical Range

Critical range = q(1−α) · sqrt( (MSW/2) · (1/ni + 1/nj) )    (12.7)

where:
q(1−α) = value from the studentized range table (Appendix J), with D1 = k and D2 = nT − k degrees of freedom for the desired level of 1 − α [k = number of groups or factor levels, and nT = total number of data values from all populations (levels) combined]
MSW = mean square within
ni and nj = sample sizes from populations (levels) i and j, respectively

A critical range is computed for each pairwise comparison, but if the sample sizes are equal, only one critical-range calculation is necessary because the quantity under the radical in Equation 12.7 will be the same for all comparisons. If the calculated pairwise comparison value is greater than the critical range, we conclude the difference is significant. To determine the q-value from the studentized range table in Appendix J for a significance level equal to α = 0.05 and k = 3 and nT − k = 260 degrees of freedom: because the table does not list D2 = 260, we use the row labeled ∞. The studentized range value for 1 − 0.05 = 0.95 is approximately

q0.95 = 3.31

Then, for the placebo versus product 1 comparison,

n1 = 89 and n2 = 91
we use Equation 12.7 to compute the critical range, as follows:

Critical range = q(1−α) · sqrt( (MSW/2) · (1/ni + 1/nj) )

Critical range = 3.31 · sqrt( (26.18/2) · (1/89 + 1/91) ) = 1.785
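This critical-range calculation is easy to verify directly. The values below (q = 3.31, MSW = 26.18, n1 = 89, n2 = 91) are the ones used in the example:

```python
import math

q_95 = 3.31       # studentized range value from Appendix J (k = 3, infinity row)
msw  = 26.18      # mean square within from the ANOVA output
n1, n2 = 89, 91   # placebo and product 1 sample sizes

# Equation 12.7: critical range = q * sqrt((MSW/2) * (1/ni + 1/nj))
critical_range = q_95 * math.sqrt((msw / 2) * (1 / n1 + 1 / n2))
print(round(critical_range, 3))  # 1.785
```

Recent SciPy versions also expose the studentized range distribution itself (`scipy.stats.studentized_range`), so the table value q can be computed rather than looked up.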

TABLE 12.4 | Hydronics Pairwise Comparisons—Tukey-Kramer Test

Comparison               | |x̄i − x̄j| | Critical Range | Significant?
Placebo vs. product 1    | 4.20      | 1.785          | Yes
Placebo vs. product 2    | 4.33      | 1.827          | Yes
Product 1 vs. product 2  | 0.13      | 1.818          | No

Because

|x̄1 − x̄2| = 4.20 > 1.785

we conclude that μ1 ≠ μ2. The mean weight loss for the placebo group is not equal to the mean weight loss for the product 1 group. Table 12.4 summarizes the results for the three pairwise comparisons. From the table we see that product 1 and product 2 both offer significantly higher average weight loss than the placebo. However, the sample data do not indicate a difference in the average weight loss between product 1 and product 2. Thus, the company can conclude that both product 1 and product 2 are superior to taking a placebo.

Chapter Outcome 3.

EXAMPLE 12-2 THE TUKEY-KRAMER PROCEDURE FOR MULTIPLE COMPARISONS

Digitron, Inc. Digitron, Inc., makes disc brakes for automobiles. Digitron's research and development (R&D) department recently tested four brake systems to determine if there is a difference in the average stopping distance among them. Forty identical mid-sized cars were driven on a test track. Ten cars were fitted with brake A, 10 with brake B, and so forth. An electronic, remote switch was used to apply the brakes at exactly the same point on the road. The number of feet required to bring the car to a full stop was recorded. The data are in the file Digitron. Because we care to determine only whether the four brake systems have the same or different mean stopping distances, the test is a one-way (single-factor) test with four levels and can be completed using the following steps:

Step 1 Specify the parameter(s) of interest.
The parameter of interest is the mean stopping distance for each brake type.
The company is interested in knowing whether a difference exists in mean stopping distance for the four brake types.

Step 2 Formulate the appropriate null and alternative hypotheses.
The appropriate null and alternative hypotheses are

H0: μ1 = μ2 = μ3 = μ4
HA: At least two population means are different

Step 3 Specify the significance level for the test.
The test will be conducted using α = 0.05.

Step 4 Select independent simple random samples from each population.

Step 5 Check to see that the normality and equal-variance assumptions have been satisfied.
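The equal-variance check in Step 5 uses Hartley's Fmax statistic, the ratio of the largest to the smallest sample variance. Using the four sample variances reported for the Digitron data in this example, a quick sketch:

```python
# Hartley's Fmax check for equal variances, using the four sample
# variances computed in the Digitron example.
variances = [49.9001, 61.8557, 21.7356, 106.4385]

# Fmax = largest sample variance / smallest sample variance
f_max = max(variances) / min(variances)
print(round(f_max, 4))  # 4.897

# Critical value from the Fmax table in Appendix I
# (alpha = 0.05, k = 4 groups, n - 1 = 9 df per group)
f_max_crit = 6.31
equal_variances_plausible = f_max < f_max_crit
```

Because the computed Fmax falls below the table value, the assumption of equal population variances is not rejected, which is the conclusion the example reaches.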

Because of the small sample size, the box and whisker plot is used.

[Box and whisker plots of stopping distance, roughly 255 to 285 feet, for Brake A, Brake B, Brake C, and Brake D.]

The box plots indicate some skewness in the samples and question the assumption of equality of variances. However, if we assume that the populations are approximately normally distributed, Hartley's Fmax test can be used to test whether the four populations have equal variances. The test statistic is

Fmax = s²max / s²min

The four variances are computed using s² = Σ(x − x̄)² / (n − 1):

s1² = 49.9001   s2² = 61.8557   s3² = 21.7356   s4² = 106.4385

Hartley's Fmax = 106.4385 / 21.7356 = 4.8970

From the Fmax table in Appendix I, the critical value for α = 0.05, k = 4, and n − 1 = 9 is F0.05 = 6.31. Because 4.8970 < 6.31, we conclude that the population variances could be equal. Recall our earlier discussion stating that when the sample sizes are equal, as they are in this example, the ANOVA test is robust in regard to both the equal-variance and normality assumptions.

Step 6 Determine the decision rule.
Because k − 1 = 3 and nT − k = 36, from Excel or Minitab, F0.05 = 2.8663. The decision rule is

If the calculated F > F0.05 = 2.8663, reject H0, or if the p-value < α = 0.05, reject H0; otherwise, do not reject H0.

Step 7 Use Excel or Minitab to construct the ANOVA table.
Figure 12.6 shows the Excel output for the ANOVA.

Step 8 Reach a decision.
From Figure 12.6, we see that

F = 3.89 > F0.05 = 2.8663, and p-value = 0.0167 < 0.05

We reject the null hypothesis.

Step 9 Draw a conclusion.
We conclude that not all population means are equal. But which systems are different? Is one system superior to all the others?

Step 10 Use the Tukey-Kramer test to determine which populations have different means.
Because we have rejected the null hypothesis of equal means, we need to perform a post-ANOVA multiple comparisons test. Using Equation 12.7 to

FIGURE 12.6 | Excel 2007 One-Way ANOVA Output for the Digitron Example

Because the calculated F = 3.8854 > 2.8663, we reject the null hypothesis and conclude the means are not equal.

Excel 2007 Instructions:
1. Open file: Digitron.xls.
2. On the Data tab, click Data Analysis.
3. Select ANOVA: Single Factor.
4. Define data range (columns B, C, D, E).
5. Specify alpha level = 0.05.
6. Specify output location.
7. Click OK.

Minitab Instructions (for similar results):
1. Open file: Digitron.MTW.
2. Choose Stat > ANOVA > One-way.
3. In Response, enter data column, Distance.
4. In Factor, enter factor level column, Brake.
5. Click OK.

construct the critical range to compare to the absolute differences in all possible pairs of sample means, the critical range is6

Critical range = q(1−α) · sqrt( (MSW/2) · (1/ni + 1/nj) ) = 3.85 · sqrt( (59.98/2) · (1/10 + 1/10) )

Critical range = 9.43

Only one critical range is necessary because the sample sizes are equal. If any pair of sample means has an absolute difference, |x̄i − x̄j|, greater than the critical range, we can infer that a difference exists in those population means. The possible pairwise comparisons (part of a family of comparisons called contrasts) are:

Contrast                                          | Significant Difference
|x̄1 − x̄2| = |272.3590 − 271.3299| = 1.0291 < 9.43  | No
|x̄1 − x̄3| = |272.3590 − 262.3140| = 10.0450 > 9.43 | Yes
|x̄1 − x̄4| = |272.3590 − 265.2357| = 7.1233 < 9.43  | No
|x̄2 − x̄3| = |271.3299 − 262.3140| = 9.0159 < 9.43  | No
|x̄2 − x̄4| = |271.3299 − 265.2357| = 6.0942 < 9.43  | No
|x̄3 − x̄4| = |262.3140 − 265.2357| = 2.9217 < 9.43  | No

6 The q-value from the studentized range table with α = 0.05 and degrees of freedom equal to k = 4 and nT − k = 36 must be approximated using degrees of freedom 4 and 30 because the table does not show degrees of freedom of 4 and 36. This value is 3.85. Rounding down to 30 gives a larger q-value and a conservatively large critical range.
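The six contrasts above reduce to a simple loop: compare each pair of sample means against the single critical range. Using the sample means and the critical range of 9.43 from this example:

```python
from itertools import combinations

# Sample mean stopping distances (feet) from the Digitron output
means = {"A": 272.3590, "B": 271.3299, "C": 262.3140, "D": 265.2357}
critical_range = 9.43  # one value suffices: equal sample sizes (n = 10)

# Flag each contrast whose absolute mean difference exceeds the range
significant = []
for (b1, m1), (b2, m2) in combinations(means.items(), 2):
    if abs(m1 - m2) > critical_range:
        significant.append((b1, b2))

print(significant)  # [('A', 'C')]
```

Only the A versus C contrast exceeds the critical range, matching the table's single "Yes."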

Therefore, based on the Tukey-Kramer procedure, we can infer that population 1 (brake system A) and population 3 (brake system C) have different mean stopping distances. Because short stopping distances are preferred, system C would be preferred over system A, but no other differences are supported by these sample data. For the other contrasts, the difference between the two sample means is insufficient to conclude that a difference in population means exists.

END EXAMPLE

TRY PROBLEM 12-6 (pg. 494)

Fixed Effects Versus Random Effects in Analysis of Variance

In the Digitron brake example, the company was testing four brake systems. These were the only brake systems under consideration. The ANOVA was intended to determine whether there was a difference in these four brake systems only. In the Hydronics weight-loss example, the company was interested in determining whether there was a difference in mean weight loss for two supplements and the placebo. In the Bayhill example involving Web site designs, the company narrowed its choices to four different designs, and the ANOVA test was used to determine whether there was a difference in means for these four designs only. Thus, in each of these examples, the inferences extend only to the factor levels being analyzed, and the levels are assumed to be the only levels of interest. This type of test is called a fixed effects analysis of variance test. Suppose in the Bayhill Web site example that instead of reducing the list of possible Web site designs to a final four, the company had simply selected a random sample of four Web site designs from all possible designs being considered. In that case, the factor levels included in the test would be a random sample of the possible levels. Then, if the ANOVA leads to rejecting the null hypothesis, the conclusion applies to all possible Web site designs.
The assumption is that the possible levels have a normal distribution and that the tested levels are a random sample from this distribution. When the factor levels are selected through random sampling, the analysis of variance test is called a random effects test.

MyStatLab

12-1: Exercises

Skill Development

12-1. A start-up cell phone applications company is interested in determining whether household incomes are different for subscribers to three different service providers. A random sample of 25 subscribers to each of the three service providers was taken, and the annual household income for each subscriber was recorded. The partially completed ANOVA table for the analysis is shown here:

ANOVA
Source of Variation | SS            | df | MS | F
Between Groups      | 2,949,085,157 |    |    |
Within Groups       |               |    |    |
Total               | 9,271,678,090 |    |    |

a. Complete the ANOVA table by filling in the missing sums of squares, the degrees of freedom for each source, the mean square, and the calculated F-test statistic.
b. Based on the sample results, can the start-up firm conclude that there is a difference in household incomes for subscribers to the three service providers? You may assume normal distributions and equal variances. Conduct your test at the α = 0.10 level of significance. Be sure to state a critical F-statistic, a decision rule, and a conclusion.

12-2. An analyst is interested in testing whether four populations have equal means. The following sample data have been collected from populations that are assumed to be normally distributed with equal variances:

Sample 1: 9, 6, 11, 14, 14
Sample 2: 12, 16, 16, 12, 9
Sample 3: 8, 8, 12, 7, 10
Sample 4: 17, 15, 17, 16, 13

<span class='text_page_counter'>(20)</span> 494. |. CHAPTER 12. Analysis of Variance. Conduct the appropriate hypothesis test using a significance level equal to 0.05. 12-3. A manager is interested in testing whether three populations of interest have equal population means. Simple random samples of size 10 were selected from each population. The following ANOVA table and related statistics were computed: ANOVA: Single Factor Summary Groups. Count. Sum. Average. Variance. Sample 1. 10. 507.18. 50.72. 35.06. Sample 2. 10. 405.79. 40.58. 30.08. Sample 3. 10. 487.64. 48.76. 23.13. ANOVA Source. SS. Between Groups Within Groups Total. df. MS. F. 578.78 2 289.39 9.84 794.36 27 29.42. p-value. F-crit. 0.0006. 3.354. 1,373.14 29. a. State the appropriate null and alternative hypotheses. b. Conduct the appropriate test of the null hypothesis assuming that the populations have equal variances and the populations are normally distributed. Use a 0.05 level of significance. c. If warranted, use the Tukey-Kramer procedure for multiple comparisons to determine which populations have different means. (Assume a  0.05.) 12-4. Respond to each of the following questions using this partially completed one-way ANOVA table: Source of Variation. SS. Between Samples. 1,745. Within Samples Total. 6,504. df. MS. SS. Between Samples Within Samples Total. 3 405. __. 888. 31. Group 1. Group 2. Group 3. Group 4. 1 2 3 4 5 6 7. 20.9 27.2 26.6 22.1 25.3 30.1 23.8. 28.2 26.2 21.6 29.7 30.3 25.9. 17.8 15.9 18.4 20.2 14.1. 21.2 23.9 19.5 17.4. a. Based on the computations for the within- and between-sample variation, develop the ANOVA table and test the appropriate null hypothesis using a  0.05. Use the p-value approach. b. If warranted, use the Tukey-Kramer procedure to determine which populations have different means. Use a  0.05. 12-7. Examine the three samples obtained independently from three populations: Item. Group 1. Group 2. Group 3. 1 2 3 4 5 6. 14 13 12 15 16. 17 16 16 18. 
17 14 15 16 14 16. a. Conduct a one-way analysis of variance on the data. Use alpha  0.05. b. If warranted, use the Tukey-Kramer procedure to determine which populations have different means. Use an experiment-wide error rate of 0.05.. 240 246. df. Item. F-ratio. a. How many different populations are being considered in this analysis? b. Fill in the ANOVA table with the missing values. c. State the appropriate null and alternative hypotheses. d. Based on the analysis of variance F-test, what conclusion should be reached regarding the null hypothesis? Test using a significance level of 0.01. 12-5. Respond to each of the following questions using this partially completed one-way ANOVA table: Source of Variation. a. How many different populations are being considered in this analysis? b. Fill in the ANOVA table with the missing values. c. State the appropriate null and alternative hypotheses. d. Based on the analysis of variance F-test, what conclusion should be reached regarding the null hypothesis? Test using a  0.05. 12-6. Given the following sample data. MS. F-ratio. Business Applications 12-8. In conjunction with the housing foreclosure crisis of 2009, many economists expressed increasing concern about the level of credit card debt and efforts of banks to raise interest rates on these cards. The banks claimed the increases were justified. A Senate sub-committee decided to determine if the average credit card balance depends on the type of credit card used. Under consideration are Visa, MasterCard, Discover, and American Express. The sample sizes to be used for each level are 25, 25, 26, and 23, respectively. a. Describe the parameter of interest for this analysis. b. Determine the factor associated with this experiment. c. Describe the levels of the factor associated with this analysis..

<span class='text_page_counter'>(21)</span> CHAPTER 12. d. State the number of degrees of freedom available for determining the between-samples variation. e. State the number of degrees of freedom available for determining the within-samples variation. f. State the number of degrees of freedom available for determining the total variation. 12-9. EverRun Incorporated produces treadmills for use in exercise clubs and recreation centers. EverRun assembles, sells, and services its treadmills, but it does not manufacture the treadmill motors. Rather, treadmill motors are purchased from an outside vendor. Currently, EverRun is considering which motor to include in its new ER1500 series. Three potential suppliers have been identified: Venetti, Madison, and Edison; however, only one supplier will be used. The motors produced by these three suppliers are identical in terms of noise and cost. Consequently, EverRun has decided to make its decision based on how long a motor operates at a high level of speed and incline before it fails. A random sample of 10 motors of each type is selected, and each motor is tested to determine how many minutes (rounded to the nearest minute) it operates before it needs to be repaired. The sample information for each motor is as follows: Venetti. Madison. Edison. 14,722 14,699 12,627 13,010 13,570 14,217 13,687 13,465 14,786 12,494. 13,649 13,592 11,788 12,623 14,552 13,441 13,404 13,427 12,049 11,672. 13,296 13,262 11,552 11,036 12,978 12,170 12,674 11,851 12,342 11,557. a. At the a  0.01 level of significance, is there a difference in the average time before failure for the three different supplier motors? b. Is it possible for EverRun to decide on a single motor supplier based on the analysis of the sample results? Support your answer by conducting the appropriate post-test analysis. 12-10. ESSROC Cement Corporation is a leading North American cement producer, with over 6.5 million metric tons of annual capacity. 
With headquarters in Nazareth, Pennsylvania, ESSROC operates production facilities strategically located throughout the United States, Canada, and Puerto Rico. One of its products is Portland cement. Portland cement’s properties and performance standards are defined by its type designation. Each type is designated by a Roman numeral. Ninety-two percent of the Portland cement produced in North America is Type I, II, or I/II.. |. Analysis of Variance. 495. One characteristic of the type of cement is its compressive strength. Sample data for the compressive strength (psi) are shown as follows: Type. Compressive Strength. I. 4,972. 4,983. 4,889. 5,063. II. 3,216. 3,399. 3,267. 3,357. I/II. 4,073. 3,949. 3,936. 3,925. a. Develop the appropriate ANOVA table to determine if there is a difference in the average compressive strength among the three types of Portland cement. Use a significance level of 0.01. b. If warranted, use the Tukey-Kramer procedure to determine which populations have different mean compressive strengths. Use an experiment-wide error rate of 0.01. 12-11. The Weidmann Group Companies, with headquarters in Rapperswil, Switzerland, are worldwide leaders in insulation systems technology for power and distribution transformers. One facet of its expertise is the development of dielectric fluids in electrical equipment. Mineral oil–based dielectric fluids have been used more extensively than other dielectric fluids. Their only shortcomings are their relatively low flash and fire point. One study examined the fire point of mineral oil, high-molecular-weight hydrocarbon (HMWH), and silicone. The fire points for each of these fluids were as follows: Fluid. Fire Points (°C). Mineral Oil HMWH. 162 312. 151 310. 168 300. 165 311. 169 308. Silicone. 343. 337. 345. 345. 337. a. Develop the appropriate ANOVA table to determine if there is a difference in the average fire points among the types of dielectric fluids. Use a significance level of 0.05. b. 
If warranted, use the Tukey-Kramer procedure to determine which populations have different mean fire points. Use an experiment-wide error rate of 0.05. 12-12. The manager at the Hillsberg Savings and Loan is interested in determining whether there is a difference in the mean time that customers spend completing their transactions depending on which of four tellers they use. To conduct the test, the manager has selected simple random samples of 15 customers for each of the tellers and has timed them (in seconds) from the moment they start their transaction to the time the transaction is completed and they leave the teller station. The manager then asked one of her assistants to perform the appropriate statistical test. The assistant.

<span class='text_page_counter'>(22)</span> 496. CHAPTER 12. |. Analysis of Variance. Computer Database Exercises. returned with the following partially completed ANOVA table. Summary Groups. Count. Sum. Average. Variance. Teller 1. 15. 3,043.9. 827.4. Teller 2. 15. 3,615.5. 472.2. Teller 3. 15. 3,427.7. 445.6. Teller 4. 15. 4,072.4. 619.4. ANOVA Source of Variation Between Groups. SS. df MS F-ratio p-value. 36,530.6. F-crit. 4.03E–09 2.7694. Within Groups Total. 69,633.7 59. a. State the appropriate null and alternative hypotheses. b. Test to determine whether the population variances are equal. Use a significance level equal to 0.05. c. Fill in the missing parts of the ANOVA table and perform the statistical hypothesis test using a  0.05. d. Based on the result of the test in part c, if warranted, use the Tukey-Kramer method with a  0.05 to determine which teller require the most time on average to complete a customer’s transaction. 12-13. Suppose as part of your job you are responsible for installing emergency lighting in a series of state office buildings. Bids have been received from four manufacturers of battery-operated emergency lights. The costs are about equal, so the decision will be based on the length of time the lights last before failing. A sample of four lights from each manufacturer has been tested with the following values (time in hours) recorded for each manufacturer: Type A. Type B. Type C. Type D. 1,024. 1,270. 1,121. 923. 1,121. 1,325. 1,201. 983. 1,250. 1,426. 1,190. 1,087. 1,022. 1,322. 1,122. 1,121. a. Using a significance level equal to 0.01, what conclusion should you reach about the four manufacturers’ battery-operated emergency lights? Explain. b. If the test conducted in part a reveals that the null hypothesis should be rejected, what manufacturer should be used to supply the lights? Can you eliminate one or more manufacturers based on these data? Use the appropriate test and a  0.01 for multiple comparisons. Discuss.. 12-14. 
Damage to homes caused by burst piping can be expensive to repair. By the time the leak is discovered, hundreds of gallons of water may have already flooded the home. Automatic shutoff valves can prevent extensive water damage from plumbing failures. The valves contain sensors that cut off water flow in the event of a leak, thereby preventing flooding. One important characteristic is the time (in milliseconds) required for the sensor to detect the water leak. Sample data obtained for four different shutoff valves are contained in the file entitled Waterflow. a. Produce the relevant ANOVA table and conduct a hypothesis test to determine if the mean detection time differs among the four shutoff valve models. Use a significance level of 0.05. b. Use the Tukey-Kramer multiple comparison technique to discover any differences in the average detection time. Use a significance level of 0.05. c. Which of the four shutoff valves would you recommend? State your criterion for your selection. 12-15. A regional package delivery company is considering changing from full-size vans to minivans. The company sampled minivans from each of three manufacturers. The number sampled represents the number the manufacturer was able to provide for the test. Each minivan was driven for 5,000 miles, and the operating cost per mile was computed. The operating costs, in cents per mile, for the 12 are provided in the data file called Delivery: Mini 1. Mini 2. Mini 3. 13.3 14.3 13.6 12.8 14.0. 12.4 13.4 13.1. 13.9 15.5 15.2 14.5. a. Perform an analysis of variance on these data. Assume a significance level of 0.05. Do the experimental data provide evidence that the average operating costs per mile for the three types of minivans are different? Use a p-value approach. b. Referring to part a, based on the sample data and the appropriate test for multiple comparisons, what conclusions should be reached concerning which type of car the delivery company should adopt? 
Discuss and prepare a report to the company CEO. Use a  0.05. c. Provide an estimate of the maximum and minimum difference in average savings per year if the CEO chooses the “best” versus the “worst” minivan using operating costs as a criterion. Assume that minivans are driven 30,000 miles a year. Use a 90% confidence interval. 12-16. The Lottaburger restaurant chain in central New Mexico is conducting an analysis of its restaurants,.

<span class='text_page_counter'>(23)</span> CHAPTER 12. which take pride in serving burgers and fries to go faster than the competition. As a part of its analysis, Lottaburger wants to determine if its speed of service is different across its four outlets. Orders at Lottaburger restaurants are tracked electronically, and the chain is able to determine the speed with which every order is filled. The chain decided to randomly sample 20 orders from each of the four restaurants it operates. The speed of service for each randomly sampled order was noted and is contained in the file Lottaburger. a. At the a  0.05 level of service, can Lottaburger conclude that the speed of service is different across the four restaurants in the chain? b. If the chain concludes that there is a difference in speed of service, is there a particular restaurant the chain should focus its attention on? Use the appropriate test for multiple comparisons to support your decision. Use a  0.05. 12-17. Most auto batteries are made by just three manufacturers—Delphi, Exide, and Johnson Controls Industries. Each makes batteries sold under several. |. Analysis of Variance. 497. different brand names. Delphi makes ACDelco and some EverStart (Wal-Mart) models. Exide makes Champion, Exide, Napa, and some EverStart batteries. Johnson Controls makes Diehard (Sears), Duralast (AutoZone), Interstate, Kirkland (Costco), Motorcraft (Ford), and some EverStarts. To determine if who makes the auto batteries affects the average length of life of the battery, the samples in the file entitled Start were obtained. The data represent the length of life (months) for batteries of the same specifications for each of the three manufacturers. a. Determine if the average length of battery life is different among the batteries produced by the three manufacturers. Use a significance level of 0.05. b. Which manufacturer produces the battery with the longest average length of life? 
If warranted, conduct the Tukey-Kramer procedure to determine this. Use a significance level of 0.05. (Note: You will need to manipulate the data columns to obtain the appropriate factor levels). END EXERCISES 12-1. Chapter Outcome 4.. 12.2 Randomized Complete Block. Analysis of Variance Section 12.1 introduced one-way ANOVA for testing hypotheses involving three or more population means. This ANOVA method is appropriate as long as we are interested in analyzing one factor at a time and we select independent random samples from the populations. For instance, Example 12-2 involving brake assembly systems at the Digitron Corporation (Figure 12.6) illustrated a situation in which we were interested in only one factor: type of brake assembly system. The measurement of interest was the stopping distance with each brake system. To test the hypothesis that the four brake systems were equal with respect to average stopping distance, four groups of the same make and model cars were assigned to each brake system independently. Thus, the one-way ANOVA design was appropriate. There are, however, situations in which another factor may affect the observed response in a one-way design. Often, this additional factor is unknown. This is the reason for randomization within the experiment. However, there are also situations in which we know the factor that is impinging on the response variable of interest. Chapter 10 introduced the concept of paired samples and indicated that there are instances when you will want to test for differences in two population means by controlling for sources of variation that might adversely affect the analysis. For instance, in the Digitron example, we might be concerned that, even though we used the same make and model of car in the study, the cars themselves may interject a source of variability that could affect the result. To control for this, we could use the concept of paired samples by using the same 10 cars for each of the four brake systems. 
When an additional factor with two or more levels is involved, a design technique called blocking can be used to eliminate the additional factor's effect on the statistical analysis of the main factor of interest.

Randomized Complete Block ANOVA

Excel and Minitab Tutorial

BUSINESS APPLICATION: A RANDOMIZED BLOCK DESIGN

CITIZEN'S STATE BANK  At Citizen's State Bank, homeowners can borrow money against the equity they have in their homes. To determine equity, the bank determines the home's value and subtracts the mortgage balance. The maximum loan is 90% of the equity.

<span class='text_page_counter'>(24)</span>
The bank outsources the home appraisals to three companies: Allen & Associates, Heist Appraisal, and Appraisal International. The bank managers know that appraisals are not exact. Some appraisal companies may overvalue homes on average, whereas others might undervalue homes. Bank managers wish to test the hypothesis that there is no difference in the average house appraisal among the three different companies.
The managers could select a random sample of homes for Allen & Associates to appraise, a second sample of homes for Heist Appraisal to work on, and a third sample of homes for Appraisal International. One-way ANOVA would be used to compare the sample means. Obviously a problem could occur if, by chance, one company received larger, higher-quality homes located in better neighborhoods than the other companies. This company's appraisals would naturally be higher on average, not because it tended to appraise higher, but because the homes were simply more expensive.
Citizen's State Bank officers need to control for the variation in size, quality, and location of homes to fairly test that the three companies' appraisals are equal on the average. To do this, they select a random sample of properties and have each company appraise the same properties. In this case, the properties are called blocks, and the test design is called a randomized complete block design. The data in Table 12.5 were obtained when each appraisal company was asked to appraise the same five properties. The bank managers wish to test the following hypothesis:

H0: μ1 = μ2 = μ3
HA: At least two populations have different means

The randomized block design requires the following assumptions:

Assumptions
1. The populations are normally distributed.
2. The populations have equal variances.
3. The observations within samples are independent.
4. The data measurement must be interval or ratio level.
Because the managers have chosen to have the same properties appraised by each company (block on property), the samples are not independent, and a method known as randomized complete block ANOVA must be employed to test the hypothesis. This method is similar to the one-way ANOVA in Section 12.1. However, there is one more source of variation to be accounted for: the block variation. As was the case in Section 12.1, we must find estimators for each source of variation. Identifying the appropriate sums of squares and then dividing each by its degrees of freedom does this. As was the case in the one-way ANOVA, the sums of squares are obtained by partitioning the total sum of squares (SST). However, in this case the SST is divided into three components instead of two, as shown in Equation 12.8.

TABLE 12.5 | Citizen's State Bank Property Appraisals (in thousands of dollars)

                                Appraisal Company
Property (Block)    Allen & Associates   Heist Appraisal   Appraisal International   Block Mean
1                           78                  82                    79                79.67
2                          102                 102                    99               101.00
3                           68                  74                    70                70.67
4                           83                  88                    86                85.67
5                           95                  99                    92                95.33
Factor-Level Mean       x̄1 = 85.2           x̄2 = 89.0             x̄3 = 85.2       x̄ = 86.47 = Grand mean
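As a quick check, the factor-level, block, and grand means in Table 12.5 can be reproduced in a few lines. This is a plain-Python sketch; the appraisal values are keyed in directly from the table.

```python
# Table 12.5 appraisal data (thousands of dollars); rows = blocks (properties),
# columns = factor levels (appraisal companies: Allen, Heist, International).
appraisals = [
    [78, 82, 79],
    [102, 102, 99],
    [68, 74, 70],
    [83, 88, 86],
    [95, 99, 92],
]

b = len(appraisals)        # number of blocks
k = len(appraisals[0])     # number of factor levels

factor_means = [sum(row[i] for row in appraisals) / b for i in range(k)]
block_means = [sum(row) / k for row in appraisals]
grand_mean = sum(map(sum, appraisals)) / (b * k)

print([round(m, 2) for m in factor_means])   # [85.2, 89.0, 85.2]
print(round(grand_mean, 2))                  # 86.47
```

The printed values match the factor-level means and grand mean shown in the last row of Table 12.5.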

<span class='text_page_counter'>(25)</span>

Sum of Squares Partitioning for Randomized Complete Block Design

SST = SSB + SSBL + SSW    (12.8)

where:
SST = Total sum of squares
SSB = Sum of squares between factor levels
SSBL = Sum of squares between blocks
SSW = Sum of squares within levels

Both SST and SSB are computed just as we did with one-way ANOVA, using Equations 12.3 and 12.4. The sum of squares for blocking (SSBL) is computed using Equation 12.9.

Sum of Squares for Blocking

SSBL = Σ(j=1 to b) k(x̄j − x̄)²    (12.9)

where:
k = Number of levels for the factor
b = Number of blocks
x̄j = The mean of the jth block
x̄ = Grand mean

Finally, the sum of squares within (SSW) is computed using Equation 12.10. This sum of squares is what remains (the residual) after the variation for all known factors has been removed. This residual sum of squares may be due to the inherent variability of the data, measurement error, or other unidentified sources of variation. Therefore, the sum of squares within is also known as the sum of squares of error, SSE.

Sum of Squares Within

SSW = SST − (SSB + SSBL)    (12.10)

The effect of computing SSBL and subtracting it from SST in Equation 12.10 is that SSW is reduced. Also, if the corresponding variation in the blocks is significant, the variation within the factor levels will be significantly reduced. This can make it easier to detect a difference in the population means if such a difference actually exists. If it does, the estimator for the within variability will in all likelihood be reduced, and thus the denominator for the F-test statistic will be smaller. This will produce a larger F-test statistic, which will more likely lead to rejecting the null hypothesis. This will depend, of course, on the relative size of SSBL and the respective changes in the degrees of freedom. Table 12.6 shows the randomized complete block ANOVA table format and equations for degrees of freedom, mean squares, and F-ratios.
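The partition in Equations 12.8 through 12.10, and the claim that removing SSBL shrinks the within term, can be checked numerically. The sketch below is plain Python with the Table 12.5 appraisal data keyed in; it computes each sum of squares from its definition and then compares the within mean square with and without the block variation removed.

```python
import math

# Table 12.5: rows = properties (blocks), columns = appraisal companies (factor levels)
data = [
    [78, 82, 79],
    [102, 102, 99],
    [68, 74, 70],
    [83, 88, 86],
    [95, 99, 92],
]
b, k = len(data), len(data[0])
n_t = b * k

grand = sum(map(sum, data)) / n_t
factor_means = [sum(row[i] for row in data) / b for i in range(k)]
block_means = [sum(row) / k for row in data]

sst = sum((x - grand) ** 2 for row in data for x in row)        # Equation 12.3
ssb = b * sum((m - grand) ** 2 for m in factor_means)           # Equation 12.4
ssbl = k * sum((m - grand) ** 2 for m in block_means)           # Equation 12.9
ssw = sst - (ssb + ssbl)                                        # Equation 12.10

assert math.isclose(sst, ssb + ssbl + ssw)                      # Equation 12.8 holds

# With blocking, SSBL is pulled out of the error term; without blocking,
# the block-to-block variation stays inside the "within" estimate.
msw_blocked = ssw / ((k - 1) * (b - 1))
msw_oneway = (sst - ssb) / (n_t - k)
print(round(msw_blocked, 2), round(msw_oneway, 2))   # 2.82 vs. 148.47
```

For these data the within estimate drops from roughly 148.5 to 2.8 once block-to-block variation is removed, which is exactly why the randomized block design can detect the company effect here.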
As you can see, we now have two F-ratios. The reason for this is that we test not only to determine whether the population means are equal but also to obtain an indication of whether the blocking was necessary, by examining the ratio of the mean square for blocks to the mean square within. Although you could manually compute the necessary values for the randomized block design, both Excel and Minitab contain a procedure that will do all the computations and build the ANOVA table. The Citizen's State Bank appraisal data are included in the file Citizens. (Note that the first column contains labels for each block.) Figures 12.7a and 12.7b show the ANOVA output. Using Excel or Minitab to perform the computations frees the decision maker to focus on interpreting the results. Note that Excel

<span class='text_page_counter'>(26)</span>

TABLE 12.6 | Basic Format for the Randomized Block ANOVA Table

Source of Variation    SS     df              MS     F-ratio
Between blocks         SSBL   b − 1           MSBL   MSBL/MSW
Between samples        SSB    k − 1           MSB    MSB/MSW
Within samples         SSW    (k − 1)(b − 1)  MSW
Total                  SST    nT − 1

where:
k = Number of levels
b = Number of blocks
df = Degrees of freedom
nT = Combined sample size
MSBL = Mean square blocking = SSBL/(b − 1)
MSB = Mean square between = SSB/(k − 1)
MSW = Mean square within = SSW/[(k − 1)(b − 1)]

Note: Some randomized block ANOVA tables put SSB first, followed by SSBL.

refers to the randomized block ANOVA as Two-Factor ANOVA Without Replication. Minitab refers to the randomized block ANOVA as Two-Way ANOVA. The main issue is to determine whether the three appraisal companies differ in average appraisal values. The primary test is

H0: μ1 = μ2 = μ3
HA: At least two populations have different means
α = 0.05

Using the output presented in Figures 12.7a and 12.7b, you can test this hypothesis two ways. First, we can use the F-distribution approach. Figure 12.8 shows the results of this test. Based on the sample data, we reject the null hypothesis and conclude that the three appraisal companies do not provide equal average values for properties.
The second approach to testing the null hypothesis is the p-value approach. The decision rule in an ANOVA application for p-values is

If p-value < α, reject H0; otherwise, do not reject H0.

In this case, α = 0.05 and the p-value in Figure 12.7a is 0.0103. Because

p-value = 0.0103 < α = 0.05

we reject the null hypothesis. Both the F-distribution approach and the p-value approach give the same result, as they must.

Was Blocking Necessary? Before we take up the issue of determining which company provides the highest mean property values, we need to discuss one other issue.
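The two F-ratios in the software output can be reproduced from the Table 12.5 data without Excel or Minitab. The following plain-Python sketch fills in the Table 12.6 layout and recovers the main-factor F-statistic shown in Figure 12.8 along with the blocking F-ratio.

```python
# Table 12.5 appraisal data: rows = blocks (properties), columns = companies
data = [
    [78, 82, 79],
    [102, 102, 99],
    [68, 74, 70],
    [83, 88, 86],
    [95, 99, 92],
]
b, k = len(data), len(data[0])

grand = sum(map(sum, data)) / (b * k)
factor_means = [sum(row[i] for row in data) / b for i in range(k)]
block_means = [sum(row) / k for row in data]

sst = sum((x - grand) ** 2 for row in data for x in row)
ssb = b * sum((m - grand) ** 2 for m in factor_means)     # between samples
ssbl = k * sum((m - grand) ** 2 for m in block_means)     # between blocks
ssw = sst - (ssb + ssbl)                                   # within (error)

msb = ssb / (k - 1)
msbl = ssbl / (b - 1)
msw = ssw / ((k - 1) * (b - 1))

f_main = msb / msw        # tests H0: mu1 = mu2 = mu3
f_blocks = msbl / msw     # pseudotest for blocking effectiveness

print(f"F (between samples) = {f_main:.2f}")    # 8.54, as in Figure 12.8
print(f"F (between blocks)  = {f_blocks:.2f}")  # 156.13
```

The main-factor statistic uses D1 = k − 1 = 2 and D2 = (k − 1)(b − 1) = 8 degrees of freedom, so it is compared against the F0.05 = 4.459 critical value given in Figure 12.8.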
Recall that the bank managers chose to control for variation between properties by having each appraisal company evaluate the same five properties. This restriction is called blocking, and the properties are the blocks. The ANOVA output in Figure 12.7a contains information that allows us to test whether blocking was necessary. If blocking was necessary, it would mean that appraisal values are in fact influenced by the particular property being appraised. The blocks then form a second factor of interest, and we formulate a secondary hypothesis test for this factor, as follows:

H0: μb1 = μb2 = μb3 = μb4 = μb5
HA: Not all block means are equal

<span class='text_page_counter'>(27)</span>

FIGURE 12.7A | Excel 2007 Output: Citizen's State Bank Analysis of Variance

Excel 2007 Instructions:
1. Open file: Citizens.xls.
2. On the Data tab, click Data Analysis.
3. Select ANOVA: Two-Factor Without Replication.
4. Define data range (include column A).
5. Specify alpha level = 0.05.
6. Indicate output location.
7. Click OK.
(Output annotations: Blocks, Blocking Test, Main Factor Test, Within)

FIGURE 12.7B | Minitab Output: Citizen's State Bank Analysis of Variance

Minitab Instructions:
1. Open file: Citizens.MTW.
2. Choose Stat > ANOVA > Two-way.
3. In Response, enter the data column (Appraisal).
4. In Row Factor, enter the main factor indicator column (Company) and select Display Means.
5. In Column Factor, enter the block indicator column (Property) and select Display Means.
6. Choose Fit additive model.
7. Click OK.
(Output annotations: Main Factor Test, Blocking Test, Blocks)

Note that we are using μbj to represent the mean of the jth block. It seems only natural to use a test statistic that consists of the ratio of the mean square for blocks to the mean square within. However, certain (randomization) restrictions placed on the complete block design make this proposed test statistic invalid from a theoretical statistics

<span class='text_page_counter'>(28)</span>

FIGURE 12.8 | Appraisal Company Hypothesis Test for Citizen's State Bank

H0: μ1 = μ2 = μ3
HA: At least two population means are different
α = 0.05
Degrees of Freedom: D1 = k − 1 = 3 − 1 = 2; D2 = (b − 1)(k − 1) = (4)(2) = 8
Rejection Region: α = 0.05, F0.05 = 4.459
Because F = 8.54 > F0.05 = 4.459, reject H0.

point of view. As an approximate procedure, however, the examination of the ratio MSBL/MSW is certainly reasonable. If it is large, it implies that the blocks had a large effect on the response variable and that they were probably helpful in improving the precision of the F-test for the primary factor's means.7 In performing the analysis of variance, we may also conduct a pseudotest to see whether the average appraisals for each property are equal. If the null hypothesis is rejected, we have an indication that the blocking is necessary and that the randomized block design is justified. However, we should be careful to present this only as an indication and not as a precise test of hypothesis for the blocks. The output in Figure 12.7a provides the F-value and p-value for this pseudotest to determine if the blocking was a necessity. Because F = 156.13 > F0.05 = 3.838, we definitely have an indication that the blocking design was necessary.
If a hypothesis test indicates blocking is not necessary, the chance of a Type II error for the primary hypothesis has been unnecessarily increased by the use of blocking. The reason is that by blocking we not only partition the sum of squares, we also partition the degrees of freedom. Therefore, the denominator of MSW is decreased, and MSW will most likely increase. If blocking isn't needed, the MSW will tend to be relatively larger than if we had run a one-way design with independent samples. This can lead to failing to reject the null hypothesis for the primary test when it actually should have been rejected.
Therefore, if blocking is indicated to be unnecessary, follow these rules:
1. If the primary H0 is rejected, proceed with your analysis and decision making. There is no concern.
2. If the primary H0 is not rejected, redo the study without using blocking. Run a one-way ANOVA with independent samples.

Chapter Outcome 4.

EXAMPLE 12-3 PERFORMING A RANDOMIZED BLOCK ANALYSIS OF VARIANCE

Frankle Training & Education  Frankle Training & Education conducts project management training courses throughout the eastern United States and Canada. The company has developed three 1,000-point practice examinations meant to simulate the certification exams given by the Project Management Institute (PMI). The Frankle leadership wants to know if the three exams will yield the same or different mean scores. To test this, a random sample of fourteen people who have been through the project management

7 Many authors argue that the randomization restriction imposed by using blocks means that the F-ratio really is a test for the equality of the block means plus the randomization restriction. For a summary of this argument and references, see D. C. Montgomery, Design and Analysis of Experiments, 4th ed. (New York: John Wiley & Sons, 1997), pp. 175–176.

<span class='text_page_counter'>(29)</span>
training are asked to take the three tests. The order in which the tests are taken is randomized and the scores are recorded. A randomized block analysis of variance test can be performed using the following steps:

Step 1 Specify the parameter of interest and formulate the appropriate null and alternative hypotheses.
The parameter of interest is the mean test score for the three different exams, and the question is whether there is a difference among the mean scores for the three. The appropriate null and alternative hypotheses are

H0: μ1 = μ2 = μ3
HA: At least two populations have different means

In this case, the Frankle leadership wants to control for variation in student ability by having the same students take all three tests. The test scores will be independent because the scores achieved by one student do not influence the scores achieved by other students. Here, the students are the blocks.

Step 2 Specify the level of significance for conducting the tests.
The tests will be conducted using α = 0.05.

Step 3 Select simple random samples from each population, and compute treatment means, block means, and the grand mean.
The following sample data were observed:

Student    Exam 1   Exam 2   Exam 3   Block Means
1            830      647      630      702.33
2            743      840      786      789.67
3            652      747      730      709.67
4            885      639      617      713.67
5            814      943      632      796.33
6            733      916      410      686.33
7            770      923      727      806.67
8            829      903      726      819.33
9            847      760      648      751.67
10           878      856      668      800.67
11           728      878      670      758.67
12           693      990      825      836.00
13           807      871      564      747.33
14           901      980      719      866.67
Treatment
means       793.57   849.50   668.00    770.36 = Grand mean

Step 4 Compute the sums of squares and complete the ANOVA table.
Four sums of squares are required:

Total Sum of Squares (Equation 12.3)
SST = ΣΣ(xij − x̄)² = 614,641.6

Sum of Squares Between (Equation 12.4)
SSB = Σ ni(x̄i − x̄)² = 241,912.7

<span class='text_page_counter'>(30)</span>

Sum of Squares Blocking (Equation 12.9)
SSBL = Σ k(x̄j − x̄)² = 116,605.0

Sum of Squares Within (Equation 12.10)
SSW = SST − (SSB + SSBL) = 256,123.9

The ANOVA table is (see Table 12.6 format):

Source             SS          df    MS          F-Ratio
Between blocks     116,605.0   13      8,969.6    0.9105
Between samples    241,912.7    2    120,956.4   12.2787
Within samples     256,123.9   26      9,850.9
Total              614,641.6   41

Step 5 Test to determine whether blocking is effective.
Fourteen people were used to evaluate the three tests. These people constitute the blocks, so if blocking is effective, the mean test scores across the three tests will not be the same for all 14 students. The null and alternative hypotheses are

H0: μb1 = μb2 = μb3 = ... = μb14
HA: Not all means are equal (blocking is effective)

As shown in Step 4, the F-test statistic to test this null hypothesis is formed by

F = MSBL/MSW = 8,969.6/9,850.9 = 0.9105

The F-critical from the F-distribution, with α = 0.05 and D1 = 13 and D2 = 26 degrees of freedom, can be approximated using the F-distribution table in Appendix H as F0.05 ≈ 2.15. The exact F-critical can be found using the FINV function in Excel or the Calc > Probability Distributions command in Minitab as F0.05 = 2.119. Then, because F = 0.9105 < F0.05 = 2.119, do not reject the null hypothesis. This means that based on these sample data we cannot conclude that blocking was effective.

Step 6 Conduct the main hypothesis test to determine whether the populations have equal means.
We have three different project management exams being considered. At issue is whether the mean score is equal for the three exams. The appropriate null and alternative hypotheses are

H0: μ1 = μ2 = μ3
HA: At least two populations have different means

As shown in the ANOVA table in Step 4, the F-test statistic for this null hypothesis is formed by

F = MSB/MSW = 120,956.4/9,850.9 = 12.2787

<span class='text_page_counter'>(31)</span>
The F-critical from the F-distribution, with α = 0.05 and D1 = 2 and D2 = 26 degrees of freedom, can be approximated using the F-distribution table in Appendix H as F0.05 ≈ 3.40. The exact F-critical can be found using the FINV function in Excel or the Calc > Probability Distributions command in Minitab as F0.05 = 3.369. Then, because F = 12.2787 > F0.05 = 3.369, reject the null hypothesis. Even though in Step 5 we concluded that blocking was not effective, the sample data still lead us to reject the primary null hypothesis and conclude that the three tests do not all have the same mean score. The Frankle leaders will now be interested in looking into the issue in more detail to determine which tests yield higher or lower average scores. (See Example 12-4.)

END EXAMPLE
TRY PROBLEM 12-21 (pg. 507)

Chapter Outcome 3.

Fisher's Least Significant Difference Test

An analysis of variance test can be used to test whether the populations of interest have different means. However, even if the null hypothesis of equal population means is rejected, the ANOVA does not specify which population means are different. In Section 12.1, we showed how the Tukey-Kramer multiple comparisons procedure is used to determine where the population differences occur for a one-way ANOVA design. Likewise, Fisher's least significant difference test is one test for multiple comparisons that we can use for a randomized block ANOVA design. If the primary null hypothesis has been rejected, then we can compare the absolute differences in sample means from any two populations to the least significant difference (LSD), as computed using Equation 12.11.

Fisher's Least Significant Difference

LSD = t(α/2) √(MSW (2/b))    (12.11)
where:
t(α/2) = One-tailed value from Student's t-distribution for α/2 and (k − 1)(b − 1) degrees of freedom
MSW = Mean square within from ANOVA table
b = Number of blocks
k = Number of levels of the main factor

EXAMPLE 12-4 APPLYING FISHER'S LEAST SIGNIFICANT DIFFERENCE TEST

Frankle Training & Education (continued)  Recall that in Example 12-3 the Frankle leadership used a randomized block ANOVA design to conclude that the three project management tests do not all have the same mean test score. To determine which populations (tests) have different means, you can use the following steps:

Step 1 Compute the LSD statistic using Equation 12.11.
LSD = t(α/2) √(MSW (2/b))
Using a significance level equal to 0.05, the t-critical value for (3 − 1)(14 − 1) = 26 degrees of freedom is t0.05/2 = 2.0555. The mean square within from the ANOVA table (see Example 12-3, Step 4) is MSW = 9,850.9.
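These Step 1 inputs are all Equation 12.11 needs, so before working through the arithmetic by hand, the entire LSD procedure (including the pairwise comparisons in Step 3 below) can be sketched in code. The t-value is taken from the example above (2.0555 for 26 degrees of freedom) rather than computed, since the Python standard library has no t-distribution inverse.

```python
import math

# Values from Examples 12-3 and 12-4: MSW from the ANOVA table, b = 14 blocks,
# t-critical = 2.0555 for alpha = 0.05 and (k - 1)(b - 1) = 26 df (from a t-table).
msw, b, t_crit = 9850.9, 14, 2.0555

lsd = t_crit * math.sqrt(msw * 2 / b)          # Equation 12.11
print(round(lsd, 2))                           # 77.11

means = {"Exam 1": 793.57, "Exam 2": 849.50, "Exam 3": 668.00}
pairs = [("Exam 1", "Exam 2"), ("Exam 1", "Exam 3"), ("Exam 2", "Exam 3")]
for name_i, name_j in pairs:
    diff = abs(means[name_i] - means[name_j])
    verdict = "different" if diff > lsd else "not different"
    print(f"{name_i} vs {name_j}: |diff| = {diff:.2f} -> {verdict}")
```

The loop reproduces the three contrasts of Step 3: only the comparisons involving Exam 3 exceed the LSD of 77.11.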

<span class='text_page_counter'>(32)</span>

The LSD is

LSD = t(α/2) √(MSW (2/b)) = 2.0555 √(9,850.9 (2/14)) = 77.11

Step 2 Compute the sample means from each population.
x̄1 = Σx/n = 793.57    x̄2 = Σx/n = 849.50    x̄3 = Σx/n = 668.00

Step 3 Form all possible contrasts by finding the absolute differences between all pairs of sample means. Compare these to the LSD value.

Absolute Difference                           Comparison       Significant Difference
|x̄1 − x̄2| = |793.57 − 849.50| = 55.93      55.93 < 77.11           No
|x̄1 − x̄3| = |793.57 − 668.00| = 125.57     125.57 > 77.11          Yes
|x̄2 − x̄3| = |849.50 − 668.00| = 181.50     181.50 > 77.11          Yes

We infer, based on the sample data, that the mean score for test 1 exceeds the mean for test 3, and the mean for test 2 exceeds the mean for test 3. Now the manager may wish to evaluate test 3 to see why the scores are lower than for the other two tests. No difference is detected between tests 1 and 2.

END EXAMPLE
TRY PROBLEM 12-22 (pg. 507)

MyStatLab

12-2: Exercises

Skill Development

12-18. A study was conducted to determine if differences in new textbook prices exist between on-campus bookstores, off-campus bookstores, and Internet bookstores. To control for differences in textbook prices that might exist across disciplines, the study randomly selected 12 textbooks and recorded the price of each of the 12 books at each of the three retailers. You may assume normality and equal-variance assumptions have been met. The partially completed ANOVA table based on the study's findings is shown here:

ANOVA
Source of Variation    SS          df    MS    F
Textbooks              16,624
Retailer                    2.4
Error
Total                  17,477.6

a. Complete the ANOVA table by filling in the missing sums of squares, the degrees of freedom for each source, the mean square, and the calculated F-test statistic for each possible hypothesis test.
b. Based on the study's findings, was it correct to block for differences in textbooks? Conduct the appropriate test at the α = 0.10 level of significance.
c. Based on the study's findings, can it be concluded that there is a difference in the average price of textbooks across the three retail outlets? Conduct the appropriate hypothesis test at the α = 0.10 level of significance.

12-19. The following data were collected for a randomized block analysis of variance design with four populations and eight blocks:

          Group 1   Group 2   Group 3   Group 4
Block 1      56        44        57        84
Block 2      34        30        38        50
Block 3      50        41        48        52
Block 4      19        17        21        30
Block 5      33        30        35        38
Block 6      74        72        78        79
Block 7      33        24        27        33
Block 8      56        44        56        71

<span class='text_page_counter'>(33)</span>
a. State the appropriate null and alternative hypotheses for the treatments and determine whether blocking is necessary.
b. Construct the appropriate ANOVA table.
c. Using a significance level equal to 0.05, can you conclude that blocking was necessary in this case? Use a test-statistic approach.
d. Based on the data and a significance level equal to 0.05, is there a difference in population means for the four groups? Use a p-value approach.
e. If you found that a difference exists in part d, use the LSD approach to determine which populations have different means.

12-20. The following ANOVA table and accompanying information are the result of a randomized block ANOVA test.

Summary     Count    Sum      Average   Variance
1             4        443     110.8       468.9
2             4        275      68.8        72.9
3             4      1,030     257.5     1,891.7
4             4        300      75.0       433.3
5             4        603     150.8       468.9
6             4        435     108.8        72.9
7             4      1,190     297.5     1,891.7
8             4        460     115.0       433.3
Sample 1      8      1,120     140.0     7,142.9
Sample 2      8      1,236     154.5     8,866.6
Sample 3      8      1,400     175.0     9,000.0
Sample 4      8        980     122.5     4,307.1

ANOVA
Source of Variation    SS        df    MS         F       p-value   F-crit
Rows                   199,899    7    28,557.0   112.8    0.0000    2.488
Columns                 11,884    3     3,961.3    15.7    0.0000    3.073
Error                    5,317   21       253.2
Total                  217,100   31

a. How many blocks were used in this study?
b. How many populations are involved in this test?
c. Test to determine whether blocking is effective using an alpha level equal to 0.05.
d. Test the main hypothesis of interest using α = 0.05.
e. If warranted, conduct an LSD test with α = 0.05 to determine which population means are different.

12-21. The following sample data were recently collected in the course of conducting a randomized block analysis of variance. Based on these sample data, what conclusions should be reached about blocking effectiveness and about the means of the three populations involved? Test using a significance level equal to 0.05.
         Sample 1   Sample 2   Sample 3
Block 1     30         40         40
Block 2     50         70         50
Block 3     60         40         70
Block 4     40         40         30
Block 5     80         70         90
Block 6     20         10         10

12-22. A randomized complete block design is carried out, resulting in the following statistics:

                   x̄1        x̄2        x̄3        x̄4
Primary Factor   237.15     315.15     414.01     612.52
Block            363.57     382.22     438.33
SST = 364,428

a. Determine if blocking was effective for this design.
b. Using a significance level of 0.05, produce the relevant ANOVA and determine if the average responses of the factor levels are equal to each other.
c. If you discovered that there were differences among the average responses of the factor levels, use the LSD approach to determine which populations have different means.

Business Applications

12-23. Frasier and Company manufactures four different products that it ships to customers throughout the United States. Delivery times are not a driving factor in the decision as to which type of carrier to use (rail, plane, or truck) to deliver the product. However, breakage cost is very expensive, and Frasier would like to select a mode of delivery that reduces the amount of product breakage. To help it reach a decision, the managers have decided to examine the dollar amount of breakage incurred by the three alternative modes of transportation under consideration. Because each product's fragility is different, the executives conducting the study wish to control for differences due to type of product. The company randomly assigns each product to each carrier and monitors the dollar breakage that occurs over the course of 100 shipments. The dollar breakage per shipment (to the nearest dollar) is as follows:

             Rail     Plane    Truck
Product 1   $7,960   $8,053   $8,818
Product 2   $8,399   $7,764   $9,432
Product 3   $9,429   $9,196   $9,260
Product 4   $6,022   $5,821   $5,676

a. Was Frasier and Company correct in its decision to block for type of product? Conduct the appropriate hypothesis test using a level of significance of 0.01.
b. Is there a difference due to carrier type? Conduct the appropriate hypothesis test using a level of significance of 0.01.

<span class='text_page_counter'>(34)</span>
12-24. The California Lettuce Research Board was originally formed as the Iceberg Lettuce Advisory Board in 1973. The primary function of the board is to fund research on iceberg and leaf lettuce. The California Lettuce Research Board published research (M. Cahn and H. Ajwa, "Salinity Effects on Quality and Yield of Drip Irrigated Lettuce") concerning the effect of varying levels of sodium absorption ratios (SAR) on the yield of head lettuce. The trials followed a randomized complete block design where variety of lettuce (Salinas and Sniper) was the main factor and salinity levels were the blocks. The measurements (the number of lettuce heads from each plot) of the kind observed were

SAR   Salinas   Sniper
 3      104       109
 5      160       163
 7      142       146
10      133       156

a. Determine if blocking was effective for this design.
b. Using a significance level of 0.05, produce the relevant ANOVA and determine if the average number of lettuce heads among the SARs are equal to each other.
c. If you discovered that there were differences among the average number of lettuce heads among the SARs, use the LSD approach to determine which populations have different means.

12-25. CB Industries operates three shifts every day of the week. Each shift includes full-time hourly workers, nonsupervisory salaried employees, and supervisors/managers. CB Industries would like to know if there is a difference among the shifts in terms of the number of hours of work missed due to employee illness. To control for differences that might exist across employee groups, CB Industries randomly selects one employee from each employee group and shift and records the number of hours missed for one year. The results of the study are shown here:

                        Shift 1   Shift 2   Shift 3
Hourly                    48        54        60
Nonsupervisory            31        36        55
Supervisors/Managers      25        33        40

a. Develop the appropriate test to determine whether blocking is effective or not.
Conduct the test at the α = 0.05 level of significance.
b. Develop the appropriate test to determine whether there are differences in the average number of hours missed due to illness across the three shifts. Conduct the test at the α = 0.05 level of significance.
c. If it is determined that a difference in the average hours of work missed due to illness is not the same for the three shifts, use the LSD approach to determine which shifts have different means.

12-26. Grant Thornton LLP is the U.S. member firm of Grant Thornton International, one of the six global accounting, tax, and business advisory organizations. It provides firmwide auditing training for its employees in three different auditing methods. Auditors were grouped into four blocks according to the education they had received: (1) high school, (2) bachelor's, (3) master's, (4) doctorate. Three auditors at each education level were used—one assigned to each method. They were given a posttraining examination consisting of complicated auditing scenarios. The scores for the 12 auditors were as follows:

               Method 1   Method 2   Method 3
Doctorate         83         81         82
Master's          77         75         79
Bachelor's        74         73         75
High School       72         70         69

a. Indicate why blocking was employed in this design.
b. Determine if blocking was effective for this design by producing the relevant ANOVA.
c. Using a significance level of 0.05, determine if the average posttraining examination scores among the auditing methods are equal to each other.
d. If you discovered that there were differences among the average posttraining examination scores among the auditing methods, use the LSD approach to determine which populations have different means.

Computer Database Exercises

12-27. Applebee's International, Inc., is a U.S. company that develops, franchises, and operates the Applebee's Neighborhood Grill and Bar restaurant chain. It is the largest chain of casual dining restaurants in the country, with over 1,500 restaurants across the United States.
The headquarters is located in Overland Park, Kansas. The company is interested in determining if mean weekly revenue differs among three restaurants in a particular city. The file entitled Applebees contains revenue data for a sample of weeks for each of the three locations. a. Test to determine if blocking the week on which the testing was done was necessary. Use a significance level of 0.05. b. Based on the data gathered by Applebee’s, can it be concluded that there is a difference in the average revenue among the three restaurants? c. If you did conclude that there was a difference in the average revenue, use Fisher’s LSD approach to determine which restaurant has the lowest mean sales. 12-28. In a local community there are three grocery chain stores. The three have been carrying out a spirited advertising campaign in which each claims to have the lowest prices. A local news station recently sent a reporter to the three stores to check prices on several items. She found that for certain items each store had the lowest price. This survey didn’t really answer the question for consumers. Thus, the station set up a test in which 20 shoppers were given different lists of grocery items and were sent to each of the three chain stores. The sales receipts from each of the three stores are recorded in the data file Groceries..

a. Why should this price test be conducted using the design that the television station used? What was it attempting to achieve by having the same shopping lists used at each of the three grocery stores?
b. Based on a significance level of 0.05 and these sample data, test to determine whether blocking was necessary in this example. State the null and alternative hypotheses. Use a test-statistic approach.
c. Based on these sample data, can you conclude the three grocery stores have different sample means? Test using a significance level of 0.05. State the appropriate null and alternative hypotheses. Use a p-value approach.
d. Based on the sample data, which store has the highest average prices? Use Fisher's LSD test if appropriate.
12-29. The Cordage Institute, based in Wayne, Pennsylvania, is an international association of manufacturers, producers, and resellers of cordage, rope, and twine. It is a not-for-profit corporation that reports on research concerning these products. Although natural fibers like manila, sisal, and cotton were once the predominant rope materials, industrial synthetic fibers dominate the marketplace today, with most ropes made of nylon, polyester, or polypropylene. One of the principal traits of rope material is its breaking strength. A research project generated data given in the file entitled Knots. The data listed were gathered on 10 different days from 1/2-inch-diameter ropes.
a. Test to determine if blocking on the day on which the testing was done was necessary. Use a significance level of 0.05.
b. Based on the data gathered by the Cordage Institute, can it be concluded that there is a difference in the average breaking strength of nylon, polyester, and polypropylene?
c.
If you concluded that there was a difference in the average breaking strength of the rope materials, use Fisher's LSD approach to determine which material has the highest breaking strength.
12-30. When the world's largest retailer, Wal-Mart, decided to enter the grocery marketplace in a big way with its "Super Stores," it changed the retail grocery landscape in a major way. The other major chains, such as Albertsons, have struggled to stay competitive. In addition, regional discounters such as WINCO in the western United States have made it difficult for the traditional grocery chains. Recently, a study was conducted in which a "market basket" of products was selected at random from those items offered in three stores in Boise, Idaho: Wal-Mart, Winco, and Albertsons. At issue was whether the mean prices at the three stores are equal or whether there is a difference in prices. The sample data are in the data file called Food Price Comparisons. Using an alpha level equal to 0.05, test to determine whether the three stores have equal population mean prices. If you conclude that there are differences in the mean prices, perform the appropriate posttest to determine which stores have different means.
END EXERCISES 12-2

Chapter Outcome 5.

12.3 Two-Factor Analysis of Variance with Replication

Section 12.2 introduced an ANOVA procedure called the randomized complete block ANOVA. This method is used when we want to test whether the means of the populations (levels) of a factor of interest are equal while controlling for potential variation due to a second factor, called the blocking factor. Consider again the Citizen's State Bank property appraisal application, in which the bank was interested in determining whether the mean property valuation was the same for three different appraisal companies.
The bank used the same five properties to test each appraisal company in an attempt to reduce any variability that might exist due to the properties involved in the test. The properties were the blocks in that example, but we were not really interested in knowing whether the mean appraisal was the same for all properties. The single factor of interest was the appraisal companies. However, you will encounter many situations in which there are actually two or more factors of interest in the same study. In this section, we limit our discussion to situations involving only two factors. The technique that is used when we wish to analyze two factors is called two-factor ANOVA with replications.

Two-Factor ANOVA with Replications

BUSINESS APPLICATION: USING SOFTWARE FOR TWO-FACTOR ANOVA

FLY HIGH AIRLINES Like other major U.S. airlines, Fly High Airlines is concerned because many of its frequent flier program members have accumulated large quantities of free miles.8 The airline worries that at some point in the future there will be a big influx of customers wanting to use their miles and the airline will have difficulty satisfying all the requests at once. Thus, Fly High recently conducted an experiment in which each of three methods for redeeming frequent flier miles was offered to a sample of 16 customers. Each customer had accumulated more than 100,000 frequent flier miles. The customers were equally divided into four age groups. The variable of interest was the number of miles redeemed by the customers during the six-week trial. Table 12.7 shows the number of miles redeemed for each person in the study. These data are also contained in the Fly High file. Method 1 offered cash inducements to use miles. Method 2 offered discount vacation options, and method 3 offered access to a discount-shopping program through the Internet. The airline wants to know if the mean number of miles redeemed under the three redemption methods is equal and whether the mean miles redeemed is the same across the four age groups. A two-factor ANOVA design is the appropriate method in this case because the airline has two factors of interest. Factor A is the redemption offer type, with three levels. Factor B is the age group of each customer, with four levels. As shown in Table 12.7, there are 3 × 4 = 12 cells in the study and four customers in each cell. The measurements are called replications because we get four measurements (miles redeemed) at each combination of redemption offer level (factor A) and age level (factor B).
Two-factor ANOVA follows the same logic as all other ANOVA designs. Each factor of interest introduces variability into the experiment. As was the case in Sections 12.1 and 12.2, we must find estimators for each source of variation. Identifying the appropriate sums of squares and then dividing each by its degrees of freedom does this. As in the one-way ANOVA, the total sum of squares (SST) in two-factor ANOVA can be partitioned. The SST is partitioned into four parts as follows:
1. One part is due to differences in the levels of factor A (SSA).
2. Another part is due to the levels of factor B (SSB).
3. Another part is due to the interaction between factor A and factor B (SSAB). (We will discuss the concept of interaction between factors later.)
4. The final component making up the total sum of squares is the sum of squares due to the inherent random variation in the data (SSE).

TABLE 12.7 | Fly High Airlines Frequent Flier Miles Data

                      Cash Option    Vacation    Shopping
Under 25 years           30,000        40,000      25,000
                              0        25,000      25,000
                         25,000             0      75,000
                              0             0       5,000
25 to 40 years           60,000        40,000      30,000
                              0        25,000      25,000
                              0         5,000      50,000
                         25,000        25,000           0
41 to 60 years           40,000        25,000      25,000
                         25,000        50,000      50,000
                         25,000             0           0
                              0        25,000           0
Over 60 years                 0        45,000      30,000
                          5,000        25,000      25,000
                         25,000             0      25,000
                         50,000        50,000      50,000

8 Name changed at request of the airline.
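The Table 12.7 values can be loaded into a short Python sketch to compute the sample mean miles redeemed for each level of factor A. These level means are descriptive only; whether they differ significantly is exactly what the two-factor ANOVA below decides.

```python
# Table 12.7 data (miles redeemed), organized by redemption method
# (factor A). Each column holds 16 observations: 4 age groups (factor B)
# times 4 replications per cell, listed top to bottom as in the table.
miles = {
    "Cash Option": [30000, 0, 25000, 0, 60000, 0, 0, 25000,
                    40000, 25000, 25000, 0, 0, 5000, 25000, 50000],
    "Vacation":    [40000, 25000, 0, 0, 40000, 25000, 5000, 25000,
                    25000, 50000, 0, 25000, 45000, 25000, 0, 50000],
    "Shopping":    [25000, 25000, 75000, 5000, 30000, 25000, 50000, 0,
                    25000, 50000, 0, 0, 30000, 25000, 25000, 50000],
}

# Sample mean for each level of factor A (the redemption methods).
level_means = {method: sum(vals) / len(vals) for method, vals in miles.items()}
```

The sample means differ from one another, but with this much within-cell variation the ANOVA in the text finds the differences are not statistically significant.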

FIGURE 12.9 | Two-Factor ANOVA: Partitioning of Total Sums of Squares. (SSA, due to factor A; SSB, due to factor B; SSAB, due to the interaction between A and B; and SSE, due to inherent variation (error), together make up SST.)

Figure 12.9 illustrates this partitioning concept. The variations due to each of these components will be estimated using the respective mean squares, obtained by dividing the sums of squares by their degrees of freedom. If the variation accounted for by factor A and factor B is large relative to the error variation, we will tend to conclude that the factor levels have different means. Table 12.8 illustrates the format of the two-factor ANOVA. Three different hypotheses can be tested from the information in this ANOVA table. First, for factor A (redemption options), we have

H0: μA1 = μA2 = μA3
HA: Not all factor A means are equal

TABLE 12.8 | Basic Format of the Two-Factor ANOVA Table

Source of Variation     SS      df               MS       F-Ratio
Factor A                SSA     a - 1            MSA      MSA/MSE
Factor B                SSB     b - 1            MSB      MSB/MSE
AB interaction          SSAB    (a - 1)(b - 1)   MSAB     MSAB/MSE
Error                   SSE     nT - ab          MSE
Total                   SST     nT - 1

where:
a = Number of levels of factor A
b = Number of levels of factor B
nT = Total number of observations in all cells
MSA = Mean square factor A = SSA/(a - 1)
MSB = Mean square factor B = SSB/(b - 1)
MSAB = Mean square interaction = SSAB/((a - 1)(b - 1))
MSE = Mean square error = SSE/(nT - ab)
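The arithmetic in Table 12.8 can be sketched in a few lines of Python. The sum-of-squares values and design sizes below are made-up illustrative numbers, not the Fly High data; the point is only how the df, MS, and F columns are built from the SS column.

```python
# Build the two-factor ANOVA table of Table 12.8 from sums of squares.
# The SS values and design sizes here are illustrative, not the Fly High data.

def anova_table(ssa, ssb, ssab, sse, a, b, n):
    """Return rows (source, SS, df, MS, F) for a two-factor ANOVA with
    a levels of factor A, b levels of factor B, n replications per cell."""
    nt = a * b * n                      # total number of observations
    df_a, df_b = a - 1, b - 1
    df_ab = df_a * df_b
    df_e = nt - a * b
    mse = sse / df_e
    return [
        ("Factor A", ssa, df_a, ssa / df_a, (ssa / df_a) / mse),
        ("Factor B", ssb, df_b, ssb / df_b, (ssb / df_b) / mse),
        ("AB interaction", ssab, df_ab, ssab / df_ab, (ssab / df_ab) / mse),
        ("Error", sse, df_e, mse, None),
        ("Total", ssa + ssb + ssab + sse, nt - 1, None, None),
    ]

table = anova_table(ssa=200.0, ssb=32.0, ssab=0.0, sse=8.0, a=2, b=2, n=2)
```

Each F-ratio is simply that source's mean square divided by MSE, mirroring the F-Ratio column of Table 12.8.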

For factor B (age levels):

H0: μB1 = μB2 = μB3 = μB4
HA: Not all factor B means are equal

Test to determine whether interaction exists between the two factors:

H0: Factors A and B do not interact to affect the mean response
HA: Factors A and B do interact

Here is what we must assume to be true to use two-factor ANOVA:

Assumptions
1. The population values for each combination of pairwise factor levels are normally distributed.
2. The variances for each population are equal.
3. The samples are independent.
4. The data measurement is interval or ratio level.

Although all the necessary values to complete Table 12.8 could be computed manually using the equations shown in Table 12.9, this would be a time-consuming task for even a small example because the equations for the various sum-of-squares values are quite complicated. Instead, you will want to use software such as Excel or Minitab to perform the two-factor ANOVA.

Interaction Explained Before we share the ANOVA results for the Fly High Airlines example, a few comments regarding the concept of factor interaction are needed. Consider our example involving the two factors: miles-redemption-offer type and age category of customer. The response variable is the number of miles redeemed in the six weeks after the offer. Suppose one redemption-offer type is really better and results in higher average miles being redeemed. If there is no interaction between age and offer type, then customers of all ages will have uniformly higher average miles redeemed for this offer type compared with the other offer types. If another offer type yields lower average miles, and if there is no interaction, all age groups receiving this offer type will redeem uniformly lower miles on average than the other offer types. Figure 12.10 illustrates a situation with no interaction between the two factors.
However, if interaction exists between the factors, we would see a graph similar to the one shown in Figure 12.11. Interaction would be indicated if one age group redeemed higher average miles than the other age groups with one program but lower average miles than the other age groups on the other mileage-redemption programs. In general, interaction occurs if the differences in the averages of the response variable for the various levels of one factor (say, factor A) are not the same for each level of the other factor (say, factor B). The general idea is that interaction between two factors means that the effect due to one of them is not uniform across all levels of the other factor. Another example in which potential interaction might exist occurs in plywood manufacturing, where thin layers of wood called veneer are glued together to form plywood. One of the important quality attributes of plywood is its strength. However, plywood is made from different species of wood (pine, fir, hemlock, etc.), and different types of glue are available. If some species of wood work better (stronger plywood) with certain glues, whereas other species work better with different glues, we say that the wood species and the glue type interact. If interaction is suspected, it should be accounted for by subtracting the interaction term (SSAB) from the total sum-of-squares term in the ANOVA. From a strictly arithmetic point of view, the effect of computing SSAB and subtracting it from SST is that SSE is reduced. Also, if the corresponding variation due to interaction is significant, the variation within the factor levels (error) will be significantly reduced. This can make it easier to detect a difference in the population means if such a difference actually exists. If so, MSE will most likely be reduced.
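The idea that interaction is a departure from uniform differences can be sketched numerically. The interaction effect for a cell is its mean minus the corresponding row and column means plus the grand mean; the effects are all zero exactly when the profiles in a plot like Figure 12.10 are parallel. The cell means below are made-up numbers chosen only to contrast the two situations.

```python
# Sketch: measure interaction as the cell-mean deviations
# (cell mean - row mean - column mean + grand mean).
# All cell means below are made-up numbers for illustration.

def interaction_effects(cell_means):
    """cell_means[i][j] = mean response at level i of factor A, level j of B."""
    a, b = len(cell_means), len(cell_means[0])
    row = [sum(r) / b for r in cell_means]                     # factor A level means
    col = [sum(cell_means[i][j] for i in range(a)) / a for j in range(b)]
    grand = sum(row) / a
    return [[cell_means[i][j] - row[i] - col[j] + grand
             for j in range(b)] for i in range(a)]

# Parallel profiles (the Figure 12.10 situation): all effects are zero.
no_inter = interaction_effects([[11, 15], [21, 25]])
# Crossing profiles (the Figure 12.11 situation): effects are nonzero.
inter = interaction_effects([[11, 15], [25, 21]])
```

These same deviations, squared, summed over cells, and multiplied by the replications per cell, are exactly what SSAB measures in Table 12.9.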

TABLE 12.9 | Two-Factor ANOVA Equations

Total Sum of Squares:
SST = Σi Σj Σk (xijk − x̄)²    (12.12)

Sum of Squares Factor A:
SSA = bn Σi (x̄i.. − x̄)²    (12.13)

Sum of Squares Factor B:
SSB = an Σj (x̄.j. − x̄)²    (12.14)

Sum of Squares Interaction between Factors A and B:
SSAB = n Σi Σj (x̄ij. − x̄i.. − x̄.j. + x̄)²    (12.15)

Sum of Squares Error:
SSE = Σi Σj Σk (xijk − x̄ij.)²    (12.16)

where:
x̄ = Σi Σj Σk xijk / (abn) = Grand mean
x̄i.. = Σj Σk xijk / (bn) = Mean of each level of factor A
x̄.j. = Σi Σk xijk / (an) = Mean of each level of factor B
x̄ij. = Σk xijk / n = Mean of each cell
a = Number of levels of factor A
b = Number of levels of factor B
n = Number of replications in each cell

FIGURE 12.10 | Differences between Factor-Level Mean Values: No Interaction. (The plot shows mean response versus the levels of factor A, with one line for each level of factor B; the lines are roughly parallel.)
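Equations 12.12 through 12.16 can be checked with a short pure-Python sketch. The tiny 2 × 2 data set below is made up so the arithmetic is easy to follow by hand; note that the partition SST = SSA + SSB + SSAB + SSE holds exactly.

```python
# Sketch of Equations 12.12-12.16 on a made-up 2x2 design with n = 2
# replications per cell. data[i][j] is the list of replications for
# level i of factor A and level j of factor B.
data = [[[10, 12], [14, 16]],
        [[20, 22], [24, 26]]]

a, b, n = len(data), len(data[0]), len(data[0][0])
all_obs = [x for row in data for cell_vals in row for x in cell_vals]
grand = sum(all_obs) / (a * b * n)                        # grand mean

cell = [[sum(data[i][j]) / n for j in range(b)] for i in range(a)]
a_mean = [sum(cell[i]) / b for i in range(a)]             # factor A level means
b_mean = [sum(cell[i][j] for i in range(a)) / a for j in range(b)]

sst = sum((x - grand) ** 2 for x in all_obs)              # Eq. 12.12
ssa = b * n * sum((m - grand) ** 2 for m in a_mean)       # Eq. 12.13
ssb = a * n * sum((m - grand) ** 2 for m in b_mean)       # Eq. 12.14
ssab = n * sum((cell[i][j] - a_mean[i] - b_mean[j] + grand) ** 2
               for i in range(a) for j in range(b))       # Eq. 12.15
sse = sum((x - cell[i][j]) ** 2
          for i in range(a) for j in range(b)
          for x in data[i][j])                            # Eq. 12.16
```

For these numbers SST = 240, and the four components (200, 32, 0, and 8) account for all of it, which is the partitioning pictured in Figure 12.9.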

FIGURE 12.11 | Differences between Factor-Level Mean Values when Interaction Is Present. (The plot shows mean response versus the levels of factor A, with one line for each level of factor B; the lines are not parallel and cross one another.)

This will produce a larger F-test statistic, which will more likely lead to correctly rejecting the null hypothesis. Thus, by considering potential interaction, your chances of finding a difference in the factor A and factor B mean values, if such a difference exists, are improved. This will depend, of course, on the relative size of SSAB and the respective changes in the degrees of freedom. We will comment later on the appropriateness of testing the factor hypotheses if interaction is present. Note that to measure the interaction effect, the sample size for each combination of factor A and factor B must be 2 or greater.
Excel and Minitab contain a data analysis tool for performing two-factor ANOVA with replications. They can be used to compute the different sums of squares and complete the ANOVA table. However, Excel requires that the data be organized in a special way, as shown in Figure 12.12.9 (Note: the first row must contain the names for each level of factor A. Also, column 1 contains the factor B level names. These must be in the row corresponding to the first sample item for each factor B level.) The Excel two-factor ANOVA output for this example is actually too big to fit on one screen. The top portion of the printout shows summary information for each cell, including

FIGURE 12.12 | Excel 2007 Data Format for Two-Factor ANOVA for Fly High Airlines. (The worksheet shows the factor A names across the first row and the factor B names down column 1.)
Excel 2007 Instruction: 1. Open file: Fly High.xls.

9 Minitab uses the same data input format for two-factor ANOVA as for randomized block ANOVA (see Section 12.2).

FIGURE 12.13 | Excel 2007 Output (Part 1) for Two-Factor ANOVA for Fly High Airlines
Excel 2007 Instructions: 1. Open file: Fly High.xls. 2. On the Data tab, click Data Analysis. 3. Select ANOVA: Two Factor with Replication. 4. Define data range (include factor A and B labels). 5. Specify the number of rows per sample. 6. Specify alpha level.

means and variances (see Figure 12.13). At the bottom of the output (scroll down) is the ANOVA table shown in Figure 12.14a. Figure 12.14b shows the Minitab output. Excel changes a few labels. For example, factor A (the miles redemption options) is referred to as Columns. Factor B (age groups) is referred to as Sample. In Figures 12.14a and 12.14b, we see all the information necessary to test whether the three redemption offers (factor A) result in different mean miles redeemed.

H0: μA1 = μA2 = μA3
HA: Not all factor A means are equal
α = 0.05

Both the p-value and F-distribution approaches can be used. Because

p-value (Columns) = 0.5614 > α = 0.05

the null hypothesis H0 is not rejected. (Also, F = 0.59 < F0.05 = 3.259; the null hypothesis is not rejected.) This means the test data do not indicate that a difference exists between the average amounts of mileage redeemed for the three types of offers. None seems superior to the others. We can also test to determine if age level makes a difference in frequent flier miles redeemed.

H0: μB1 = μB2 = μB3 = μB4
HA: Not all factor B means are equal
α = 0.05
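The factor A test above used two equivalent decision rules, which can be sketched directly in code. The numbers plugged in are the ones the text reports from the Excel output for the Columns (factor A) test: p = 0.5614, F = 0.59, and critical value F0.05 = 3.259.

```python
# Sketch of the two equivalent decision rules used in the text.
# The numbers are the values reported from the Excel output of Figure
# 12.14a for the factor A (Columns) test: p = 0.5614, F = 0.59,
# critical value F_0.05 = 3.259.

def reject_by_pvalue(p_value, alpha):
    """Reject H0 when the p-value is below the significance level."""
    return p_value < alpha

def reject_by_f(f_stat, f_crit):
    """Reject H0 when the F statistic exceeds the critical value."""
    return f_stat > f_crit

p_decision = reject_by_pvalue(0.5614, 0.05)   # do not reject H0
f_decision = reject_by_f(0.59, 3.259)         # do not reject H0
```

The two rules always agree because the p-value is below α exactly when the F statistic lies beyond the critical value.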

FIGURE 12.14A | Excel 2007 Output (Part 2) for Two-Factor ANOVA for Fly High Airlines
Excel terminology: Sample = Factor B (age); Columns = Factor A (program); Within = Error. The output shows the F-statistics and p-values for testing the three different hypotheses of interest in the two-factor test.

In Figure 12.14a, we see that the

p-value = 0.8796 > α = 0.05

(Also, F = 0.22 < F0.05 = 2.866.) Thus, the null hypothesis is not rejected. The test data do not indicate that customer age significantly influences the average number of frequent flier miles that will be redeemed. Finally, we can also test for interaction. The null hypothesis is that no interaction exists. The alternative is that interaction does exist between the two factors. The ANOVA table in Figure 12.14b shows a p-value of 0.939, which is greater than α = 0.05. Based on these data,

FIGURE 12.14B | Minitab Output for Two-Factor ANOVA for Fly High Airlines. (The output shows the F-statistics and p-values for testing the three different hypotheses of interest in the two-factor test.)
Minitab Instructions: 1. Open file: Fly High.MTW. 2. Choose Stat > ANOVA > Two-way. 3. In Response, enter the data column (Value). 4. In Row Factor, enter the main factor indicator column (Redemption Option). 5. In Column Factor, enter the block indicator column (Age). 6. Click OK.

interaction between the two factors does not appear to exist. This would indicate that the differences in the average mileage redeemed between the various age categories are the same for each redemption-offer type.

A Caution about Interaction In this example, the sample data indicate that no interaction between factors A and B is present. Based on the sample data, we were unable to conclude that the three redemption offers resulted in different average frequent flier miles redeemed. Finally, we were unable to conclude that a difference in average miles redeemed occurred over the four different age groups. The appropriate approach is to begin by testing for interaction. If the interaction null hypothesis is not rejected, proceed to test the factor A and factor B hypotheses. However, if we conclude that interaction is present between the two factors, hypothesis tests for factors A and B generally should not be performed. The reason is that findings of significance for either factor might be due only to interactive effects when the two factors are combined and not to the fact that the levels of the factor differ significantly. It is also possible that interactive effects might mask differences between means of one of the factors for at least some of the levels of the other factor. If significant interaction is present, the experimenter may conduct a one-way ANOVA to test the levels of one of the factors, for example, factor A, using only one level of the other factor, factor B. Thus, when conducting hypothesis tests for a two-factor ANOVA:
1. Test for interaction.
2. If interaction is present, conduct a one-way ANOVA to test the levels of one of the factors using only one level of the other factor.10
3. If no interaction is found, test factor A and factor B.
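The three-step sequence above can be sketched as a small decision function. The p-values passed in below are the ones reported in the text for the Fly High example (interaction 0.939, factor A 0.5614, factor B 0.8796).

```python
# Sketch of the three-step testing sequence for a two-factor ANOVA.
# The default p-values below are the ones reported for the Fly High
# example: interaction 0.939, factor A 0.5614, factor B 0.8796.

def two_factor_test_sequence(p_interaction, p_factor_a, p_factor_b, alpha=0.05):
    """Return a list of conclusions following the 1-2-3 procedure above."""
    if p_interaction < alpha:
        # Step 2: with interaction present, do not test the factors directly.
        return ["interaction present: run one-way ANOVAs within factor levels"]
    # Step 3: no interaction found, so test factor A and factor B.
    conclusions = ["no significant interaction"]
    conclusions.append("factor A means differ" if p_factor_a < alpha
                       else "fail to reject equal factor A means")
    conclusions.append("factor B means differ" if p_factor_b < alpha
                       else "fail to reject equal factor B means")
    return conclusions

result = two_factor_test_sequence(0.939, 0.5614, 0.8796)
```

With the Fly High p-values, the sequence reaches the same three conclusions as the text: no significant interaction, and no significant differences for either factor.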
10 There are, however, some instances in which the effects of the factors provide important and meaningful information even though interaction is present. See D. R. Cox, Planning of Experiments (New York: John Wiley and Sons, 1992), pp. 107–108.

MyStatLab

12-3: Exercises

Skill Development
12-31. Consider the following data from a two-factor experiment:

                     Factor A
Factor B      Level 1   Level 2   Level 3
Level 1          43        25        37
                 49        26        45
Level 2          50        27        46
                 53        31        48

a. Determine if there is interaction between factor A and factor B. Use the p-value approach and a significance level of 0.05.
b. Does the average response vary among the levels of factor A? Use the test-statistic approach and a significance level of 0.05.
c. Determine if there are differences in the average response between the levels of factor B. Use the p-value approach and a significance level of 0.05.
12-32. Examine the following two-factor analysis of variance table:

Source             SS        df      MS      F-Ratio
Factor A          162.79      4
Factor B                             28.12
AB Interaction    262.31     12
Error             _______    __
Total           1,298.74     84

a. Complete the analysis of variance table.
b. Determine if interaction exists between factor A and factor B. Use α = 0.05.

c. Determine if the levels of factor A have equal means. Use a significance level of 0.05.
d. Does the ANOVA table indicate that the levels of factor B have equal means? Use a significance level of 0.05.
12-33. Consider the following data for a two-factor experiment:

                     Factor A
Factor B      Level 1   Level 2   Level 3
Level 1          33        30        21
                 31        42        30
                 35        36        30
Level 2          23        30        21
                 32        27        33
                 27        25        18

a. Based on the sample data, do factors A and B have significant interaction? State the appropriate null and alternative hypotheses and test using a significance level of 0.05.
b. Based on these sample data, can you conclude that the levels of factor A have equal means? Test using a significance level of 0.05.
c. Do the data indicate that the levels of factor B have different means? Test using a significance level equal to 0.05.
12-34. Consider the following partially completed two-factor analysis of variance table, which is an outgrowth of a study in which factor A has four levels and factor B has three levels. The number of replications was 11 in each cell.

Source of Variation     SS        df      MS      F-Ratio
Factor A               345.1       4
Factor B                                 28.12
AB Interaction       1,123.2      12
Error                  256.7      __
Total                1,987.3      84

a. Complete the analysis of variance table.
b. Based on the sample data, can you conclude that the two factors have significant interaction? Test using a significance level equal to 0.05.
c. Based on the sample data, should you conclude that the means for factor A differ across the four levels or the means for factor B differ across the three levels? Discuss.
d. Considering the outcome of part b, determine what can be said concerning the differences of the levels of factors A and B. Use a significance level of 0.10 for any hypothesis tests required. Provide a rationale for your response to this question.
12-35. A two-factor experiment yielded the following data:

                     Factor A
Factor B      Level 1     Level 2     Level 3
Level 1      375  390    402  396    395  390
Level 2      335  342    336  338    320  331
Level 3      302  324    485  455    351  346

a. Determine if there is interaction between factor A and factor B. Use the p-value approach and a significance level of 0.05.
b. Given your findings in part a, determine any significant differences among the response means of the levels of factor A for level 1 of factor B.
c. Repeat part b at levels 2 and 3 of factor B, respectively.

Business Applications
12-36. A PEW Research Center survey concentrated on the issue of weight loss. It investigated how many pounds heavier the respondents were than their perceived ideal weight, and whether these perceptions differed among different regions of the country and by the gender of the respondents. The following data (pounds) reflect the survey results:

                          Region
Gender     West   Midwest   South   Northeast
Men         14       18       15       16
            13       16       15       14
Women       16       20       17       17
            13       18       17       13

a. Determine if there is interaction between Region and Gender. Use the p-value approach and a significance level of 0.05.
b. Given your findings in part a, determine any significant differences among the discrepancies between the average existing and desired weights across the regions.
c. Repeat part b for the Gender factor.
12-37. A manufacturing firm produces a single product on three production lines. Because the lines were developed at different points in the firm's history, they use different equipment. The firm is considering changing the layouts of the lines and would like to know what effects different layouts would have on production output. A study was conducted to determine the average output for each line over four randomly selected weeks using each of the three layouts under consideration. The output (in hundreds of units

produced) was measured for each line for each of the four weeks for each layout being evaluated. The results of the study are as follows:

              Line 1         Line 2         Line 3
Layout 1   12 10 12 12    12 14 10 11    11 10 14 12
Layout 2   17 18 15 17    16 15 16 17    18 18 17 18
Layout 3   12 12 11 11    10 11 11 11    11 11 10 12

a. Based on the sample data, can the firm conclude that there is an interaction effect between the type of layout and the production line? Conduct the appropriate test at the 0.05 level of significance.
b. At the 0.05 level of significance, can the firm conclude that there is a difference in mean output across the three production lines?
c. At the 0.05 level of significance, can the firm conclude that there is a difference in mean output due to the type of layout used?
12-38. A popular consumer staple was displayed in different locations in the same aisle of a grocery store to determine what, if any, effect different placement might have on its sales. The product was placed at one of three heights on the aisle (low, medium, and high) and at one of three locations in the store (at the front, at the middle, or at the rear of the store). The number of units sold of the product at the various height and distance combinations was recorded each week for five weeks. The following results were obtained:

                  Front                  Middle                  Rear
Low       125 143 150 138 149    195 150 160 195 162    126 136 129 136 147
Medium    141 137 145 150 130    186 161 157 165 194    128 133 148 145 141
High      129 141 148 130 137    157 152 186 164 176    149 137 138 126 138

a. At the 0.10 level of significance, is there an interaction effect?
b. At the 0.10 level of significance, does the height of the product's placement have an effect on the product's mean sales?
c. At the 0.10 level of significance, does the location in the store have an effect on the product's mean sales?
Computer Database Exercises
12-39. Mt. Jumbo Plywood Company makes plywood for use in furniture production. The first major step in the plywood process is the peeling of the logs into thin layers of veneer. The peeling is done by a lathe that rotates the log against a knife, peeling it into layers 3/8 inch thick. Ideally, when a log is reduced to a 4-inch core diameter, the lathe releases the core and a new log is loaded onto the lathe. However, a problem called "spinouts" occurs if the lathe kicks out a core that has more than 4 inches left. This wastes wood and costs the company money. Before going to the lathe, the logs are conditioned in a heated, water-filled vat to warm the logs. The company is concerned that improper log conditioning may lead to excessive spinouts. Two factors are believed to affect the core diameter: the vat temperature and the time the logs spend in the vat prior to peeling. The lathe supervisor has recently conducted a test during which logs were peeled at each combination of temperature and time. The sample data for this experiment are in the data file called Mt Jumbo. The data are the core diameters in inches.
a. Based on the sample data, is there an interaction between water temperature and vat hours? Test using a significance level of 0.01. Discuss what interaction would mean in this situation. Use a p-value approach.
b. Based on the sample data, is there a difference in mean core diameter at the three water temperatures? Test using a significance level of 0.01.
c. Do the sample data indicate a difference in mean core diameter across the three vat times analyzed in this study? Use a significance level of 0.10 and a p-value approach.
12-40. A psychologist is conducting a study to determine whether there are differences between the ability of history majors and mathematics majors to solve various types of puzzles.
Five mathematics majors and five history majors were randomly selected from the students at a liberal arts college in Maine. Each student was given five different puzzles to complete: a crossword puzzle, a cryptogram, a logic problem, a maze, and a cross sums. The time in minutes (rounded to the nearest minute) was recorded for each student in the study. If a student could not complete a puzzle in the maximum time allowed, or completed a puzzle incorrectly, then a penalty of 10 minutes was added to his or her time. The results are shown in the file Puzzle.

a. Plot the mean time to complete a puzzle for each puzzle type by major. What conclusion would you reach about the interaction between major and puzzle type?
b. At the 0.05 level of significance, is there an interaction effect?
c. If interaction is present, conduct a one-way ANOVA to test whether the mean time to complete a puzzle for history majors depends on the type of puzzle. Does the mean time to complete a puzzle for mathematics majors depend on the type of puzzle? Conduct the one-way ANOVA tests at a level of significance of 0.05.
12-41. The Iams Company sells Eukanuba and Iams premium dog and cat foods (dry and canned) in 70 countries. Iams makes dry dog and cat food at plants in Lewisburg, Ohio; Aurora, Nebraska; Henderson, North Carolina; Leipsic, Ohio; and Coevorden, The Netherlands. Its Eukanuba brand dry dog foods come in five formulas. One ingredient is of particular importance: crude fat. To discover if there is a difference in the average percent of crude fat among the five formulas and among the production sites, the sample data found in the file entitled Eukanuba were obtained.
a. Determine if there is interaction between the Eukanuba formulas and the plant sites where they are produced. Use the p-value approach and a significance level of 0.025.
b. Given your findings in part a, determine if there is a difference in the average percentage of crude fat in the Eukanuba formulas. Use a test-statistic approach with a significance level of 0.025.
c. Repeat part b for the plant sites in which the formulas are produced.
d. One important finding will be whether the average percent of crude fat for the "Reduced Fat" formula is equal to the advertised 9%. Conduct a relevant hypothesis test to determine this using a significance level of 0.05.
12-42. The amount of sodium in food has been of increasing concern due to its health implications.
Beers from various producers have been analyzed for their sodium content. The file entitled Sodium contains the amount of sodium (mg) discovered in 12 fluid ounces of beer produced by the four major producers: Anheuser-Busch Inc., Miller Brewing Co., Coors Brewing Co., and Pabst Brewing Co. The types of beer (ales, lagers, and specialty beers) were also scrutinized in the analysis.
a. Determine if there is interaction between the producer and the type of beer. Use a significance level of 0.05.
b. Given your findings in part a, determine if there is a difference in the average amount of sodium in 12 ounces of beer among the producers of the beer. Use a significance level of 0.05.
c. Repeat part b for the types of beer. Use a significance level of 0.05.

END EXERCISES 12-3.
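Exercises 12-40 through 12-42 each call for a two-factor ANOVA with replications and an interaction test. The arithmetic behind that test can be sketched in a few lines of code. The factor levels and measurements below are invented for illustration (they are not from the Puzzle, Eukanuba, or Sodium files); the partition mirrors the sums of squares developed in Section 12.3.

```python
# Two-factor ANOVA with replication on hypothetical data (pure Python).
# Partitions total variation: SST = SSA + SSB + SSAB + SSE.

data = {  # data[(factor A level, factor B level)] -> n replicate measurements
    ("A1", "B1"): [10.0, 12.0],
    ("A1", "B2"): [20.0, 22.0],
    ("A2", "B1"): [14.0, 16.0],
    ("A2", "B2"): [16.0, 18.0],
}
a_levels = sorted({ai for ai, _ in data})
b_levels = sorted({bj for _, bj in data})
a, b, n = len(a_levels), len(b_levels), len(next(iter(data.values())))

def mean(v):
    return sum(v) / len(v)

all_x = [x for cell in data.values() for x in cell]
xbar = mean(all_x)                                  # grand mean

a_mean = {ai: mean([x for (ai2, _), c in data.items() if ai2 == ai for x in c])
          for ai in a_levels}                       # factor A level means
b_mean = {bj: mean([x for (_, bj2), c in data.items() if bj2 == bj for x in c])
          for bj in b_levels}                       # factor B level means
cell_mean = {key: mean(c) for key, c in data.items()}

ssa = b * n * sum((a_mean[ai] - xbar) ** 2 for ai in a_levels)
ssb = a * n * sum((b_mean[bj] - xbar) ** 2 for bj in b_levels)
ssab = n * sum((cell_mean[(ai, bj)] - a_mean[ai] - b_mean[bj] + xbar) ** 2
               for ai in a_levels for bj in b_levels)   # interaction
sse = sum((x - cell_mean[key]) ** 2 for key, c in data.items() for x in c)
sst = sum((x - xbar) ** 2 for x in all_x)
assert abs(sst - (ssa + ssb + ssab + sse)) < 1e-9   # the partition holds

msa, msb = ssa / (a - 1), ssb / (b - 1)
msab = ssab / ((a - 1) * (b - 1))
mse = sse / (a * b * (n - 1))
print(f"F_A={msa/mse:.2f}  F_B={msb/mse:.2f}  F_AB={msab/mse:.2f}")
```

In practice each F ratio would be compared against an F critical value (or a p-value from Excel or Minitab) at the stated significance level, with the interaction ratio F_AB examined first, as parts a of these exercises require.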

Visual Summary

Chapter 12: A group of procedures known as analysis of variance (ANOVA) was introduced in this chapter. The procedures presented here represent a wide range of techniques used to determine whether three or more populations have equal means. Depending upon the experimental design employed, there are different hypothesis tests that must be performed. The one-way design is used to test whether three or more populations have equal mean values when the samples from the populations are considered to be independent. If an outside source of variation is present, the randomized complete block design is used. If there are two factors of interest and we wish to test to see whether the levels of each separate factor have equal means, then a two-factor design with replications is used. Regardless of which method is used, if the null hypothesis of equal means is rejected, methods presented in this chapter enable you to determine which pairs of populations have different means. Analysis of variance is actually an array of statistical techniques used to test hypotheses related to these (and many other) experimental designs. By completing this chapter, you have been introduced to some of the most popular ANOVA techniques.

12.1 One-Way Analysis of Variance (pg. 476–497)

Summary: There are often circumstances in which independent samples are obtained from two or more levels of a single factor to determine if the levels have equal means. The experimental design that produces the data for this experiment is referred to as a completely randomized design. The appropriate statistical tool for conducting the hypothesis test related to this experimental design is analysis of variance. Because this procedure addresses an experiment with only one factor, it is called a one-way analysis of variance.
The concept acknowledges that the data produced by the completely randomized design will not all be the same value; this variation in the data is referred to as the total variation. Each level's data exhibit dispersion as well, called the within-sample variation, and the dispersion between the factor levels is designated the between-sample variation. The ratio between estimators of these two variances forms the test statistic used to detect differences in the levels' means. If the null hypothesis of equal means is rejected, the Tukey-Kramer procedure can be used to determine which pairs of populations have different means.

Outcome 1. Understand the basic logic of analysis of variance.
Outcome 2. Perform a hypothesis test for a single-factor design using analysis of variance manually and with the aid of Excel or Minitab software.
Outcome 3. Conduct and interpret post-analysis of variance pairwise comparisons procedures.

12.2 Randomized Complete Block Analysis of Variance (pg. 497–509)

Summary: Section 12.1 addressed procedures for determining the equality of three or more population means of the levels of a single factor. In that case all other unknown sources of variation are addressed by the use of randomization. However, there are situations in which an additional known factor with at least two levels is impinging on the response variable of interest. A technique called blocking is used in such cases to eliminate the effects of the levels of the additional known factor on the analysis of variance. As was the case in Section 12.1, a multiple comparisons procedure, known here as Fisher's least significant difference, can be used to determine any differences among the population means of a randomized block ANOVA design.

Outcome 3. Conduct and interpret post-analysis of variance pairwise comparisons procedures.
Outcome 4.
Recognize when randomized block analysis of variance is useful and be able to perform analysis of variance on a randomized block design.

12.3 Two-Factor Analysis of Variance with Replication (pg. 509–520)

Summary: Two-factor ANOVA follows the same logic as the one-way and randomized complete block ANOVA designs. In those two procedures, there is only one factor of interest; in two-factor ANOVA there are two, and each factor of interest introduces variability into the experiment. There are circumstances in which the presence of a level of one factor affects the relationship between the response variable and the levels of the other factor. This effect is called interaction and, if present, is another source of variation. As in Sections 12.1 and 12.2, we must find estimators for each source of variation; this is done by identifying the appropriate sums of squares and dividing each by its degrees of freedom. If the variation accounted for by factor A, factor B, and interaction is large relative to the error variation, we will tend to conclude that the factor levels have different means. The technique used when we wish to analyze two factors as described above is called two-factor ANOVA with replications.

Outcome 5. Perform analysis of variance on a two-factor design of experiments with replications using Excel or Minitab and interpret the output.

Conclusion

Chapter 12 has illustrated that there are many instances in business in which we are interested in testing whether three or more populations have equal means. The technique for performing such tests is called analysis of variance. If the sample means tend to be substantially different, then the hypothesis of equal means is rejected. The most elementary ANOVA experimental design is the one-way design, which is used to test whether three or more populations have equal mean values when the samples from the populations are considered to be independent.
If we need to control for an outside source of variation (analogous to forming paired samples in Chapter 10), we can use the randomized complete block design. If there are two factors of interest and we wish to test to see whether the levels of each separate factor have equal means, then a two-factor design with replications is used.
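As a compact illustration of the one-way logic summarized above, the following sketch partitions the total variation for three small invented samples and forms the F ratio. The data are hypothetical; in practice the resulting F would be compared with a critical value from the F table, or the test would be run directly in Excel or Minitab.

```python
# One-way ANOVA by hand on hypothetical data (pure Python, no libraries).
# Partitions total variation: SST = SSB + SSW, then F = MSB / MSW.

samples = {                      # k = 3 levels of a single factor
    "A": [5.0, 7.0, 6.0, 8.0],
    "B": [9.0, 11.0, 10.0, 10.0],
    "C": [4.0, 5.0, 6.0, 5.0],
}

all_x = [x for d in samples.values() for x in d]
grand_mean = sum(all_x) / len(all_x)

# Between-sample variation: SSB = sum of n_i * (xbar_i - grand mean)^2
ssb = sum(len(d) * (sum(d) / len(d) - grand_mean) ** 2
          for d in samples.values())

# Within-sample variation: SSW = sum over levels of (x - xbar_i)^2
ssw = sum(sum((x - sum(d) / len(d)) ** 2 for x in d)
          for d in samples.values())

sst = sum((x - grand_mean) ** 2 for x in all_x)   # total variation
assert abs(sst - (ssb + ssw)) < 1e-9              # the partition holds

k, n_t = len(samples), len(all_x)
msb = ssb / (k - 1)          # D1 = k - 1 degrees of freedom
msw = ssw / (n_t - k)        # D2 = n_T - k degrees of freedom
f_stat = msb / msw           # a large F favors rejecting equal means
print(f"SST={sst:.2f}  SSB={ssb:.2f}  SSW={ssw:.2f}  F={f_stat:.2f}")
```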

Equations

(12.1) Partitioned Sum of Squares (pg. 478): $SST = SSB + SSW$

(12.2) Hartley's F-Test Statistic (pg. 480): $F_{max} = \dfrac{s^2_{max}}{s^2_{min}}$

(12.3) Total Sum of Squares (pg. 481): $SST = \sum_{i=1}^{k}\sum_{j=1}^{n_i}(x_{ij}-\bar{x})^2$

(12.4) Sum of Squares Between (pg. 482): $SSB = \sum_{i=1}^{k} n_i(\bar{x}_i-\bar{x})^2$

(12.5) Sum of Squares Within (pg. 482): $SSW = SST - SSB$

(12.6) Sum of Squares Within (pg. 482): $SSW = \sum_{i=1}^{k}\sum_{j=1}^{n_i}(x_{ij}-\bar{x}_i)^2$

(12.7) Tukey-Kramer Critical Range (pg. 489): $\text{Critical range} = q_{1-\alpha}\sqrt{\dfrac{MSW}{2}\left(\dfrac{1}{n_i}+\dfrac{1}{n_j}\right)}$

(12.8) Sum of Squares Partitioning for Randomized Complete Block Design (pg. 499): $SST = SSB + SSBL + SSW$

(12.9) Sum of Squares for Blocking (pg. 499): $SSBL = \sum_{j=1}^{b} k(\bar{x}_j-\bar{x})^2$

(12.10) Sum of Squares Within (pg. 499): $SSW = SST - (SSB + SSBL)$

(12.11) Fisher's Least Significant Difference (pg. 505): $LSD = t_{\alpha/2}\sqrt{MSW\,\dfrac{2}{b}}$

(12.12) Total Sum of Squares (pg. 513): $SST = \sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{n}(x_{ijk}-\bar{x})^2$

(12.13) Sum of Squares Factor A (pg. 513): $SSA = bn\sum_{i=1}^{a}(\bar{x}_{i..}-\bar{x})^2$

(12.14) Sum of Squares Factor B (pg. 513): $SSB = an\sum_{j=1}^{b}(\bar{x}_{.j.}-\bar{x})^2$

(12.15) Sum of Squares Interaction between Factors A and B (pg. 513): $SSAB = n\sum_{i=1}^{a}\sum_{j=1}^{b}(\bar{x}_{ij.}-\bar{x}_{i..}-\bar{x}_{.j.}+\bar{x})^2$

(12.16) Sum of Squares Error (pg. 513): $SSE = \sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{n}(x_{ijk}-\bar{x}_{ij.})^2$

Key Terms

Balanced design pg. 476
Between-sample variation pg. 477
Completely randomized design pg. 476
Experiment-wide error rate pg. 488
Factor pg. 476
Levels pg. 476
One-way analysis of variance pg. 476
Total variation pg. 477
Within-sample variation pg. 477

Chapter Exercises (MyStatLab)

Conceptual Questions

12-43. A one-way analysis of variance has just been performed. The conclusion reached is that the null hypothesis stating the population means are equal has not been rejected. What would you expect the Tukey-Kramer procedure for multiple comparisons to show if it were performed for all pairwise comparisons? Discuss.

12-44.
In journals related to your major, locate two articles where tests of three or more population means were

important. Discuss the issue being addressed, how the data were collected, the results of the statistical test, and any conclusions drawn based on the analysis.

12-45. Discuss why in some circumstances it is appropriate to use the randomized complete block design. Give an example other than those discussed in the text where this design could be used.

12-46. A two-way analysis of variance experiment is to be conducted to examine CEO salaries ($K) as a function of the number of years the CEO has been with the company and the size of the company's sales. The years spent with the company are categorized into 0–3, 4–6, 7–9, and more than 9 years. The size of the company is categorized using sales ($million) per year into three categories: 0–50, 51–100, and more than 100.
a. Describe the factors associated with this experiment.
b. List the levels of each of the factors identified in part a.
c. List the treatment combinations of the experiment.
d. Indicate the components of the ANOVA table that will be used to explain the variation in the CEOs' salaries.
e. Determine the degrees of freedom for each of the components in the ANOVA if two replications are used.

12-47. In any of the multiple comparison techniques (Tukey-Kramer, LSD), the estimate of the within-sample variance uses data from the entire experiment. However, if one were to do a two-sample t-test to determine if there were a difference between any two means, the estimate of the population variances would only include data from the two specific samples under consideration. Explain this seeming discrepancy.

Business Applications

12-48. The development of the Internet has made many things possible, in addition to downloading music. In particular, it allows an increasing number of people to telecommute, or work from home.
Although this has many advantages, it has required some companies to provide employees with the necessary equipment, which has made your job as office manager more difficult. Your company provides computers, printers, and Internet service to a number of engineers and programmers, and although the cost of hardware has decreased, the cost of supplies, in this case printer cartridges, has not. Because of the cost of name-brand printer replacement cartridges, several companies have entered the secondary market. You are currently considering offers from four companies. The prices are equivalent, so you will make your decision based on length of service, specifically number of pages printed. You have given samples of four cartridges to 16 programmers and engineers and have received the following values:

Supplier A   Supplier B   Supplier C   Supplier D
424          650          521          323
521          725          601          383
650          826          590          487
422          722          522          521

a. Using a significance level equal to 0.01, what conclusion should you reach about the four manufacturers' printer cartridges? Explain.
b. If the test conducted in part a reveals that the null hypothesis should be rejected, which supplier should be used? Is there one or more you can eliminate based on these data? Use the appropriate test for multiple comparisons. Discuss.

12-49. The W. Atlee Burpee Co. was founded in Philadelphia in 1876 by an 18-year-old with a passion for plants and animals and a mother willing to lend him $1,000 of "seed money" to get started in business. Today, it is owned by George Ball Jr. One of Burpee's most demanded seeds is corn. Burpee continues to increase production to meet the growing demand. To this end, an experiment such as the one presented here is used to determine the combination of fertilizer and seed type that produces the largest number of kernels per ear.

         Fert. 1        Fert. 2        Fert. 3        Fert. 4
Seed A   807    800     995    909     894    907     903    904
Seed B   1,010  912     1,098  987     1,000  801     1,008  912
Seed C   1,294  1,097   1,286  1,099   1,298  1,099   1,199  1,201

a. Determine if there is interaction between the type of seed and the type of fertilizer. Use a significance level of 0.05.
b. Given your findings in part a, determine if there is a difference in the average number of kernels per ear among the seeds.
c. Repeat part b for the types of fertilizer. Use a significance level of 0.05.

12-50. Recent news stories have highlighted errors national companies such as H & R Block have made in preparing taxes. However, many people rely on local accountants to handle their tax work. A local television station, which prides itself on doing investigative reporting, decided to determine whether similar preparation problems occur in its market area. The station selected eight people to have their taxes figured at each of three accounting offices in its market

area. The following data show the tax bills (in dollars) as figured by each of the three accounting offices:

Return   Office 1    Office 2    Office 3
1        4,376.20    5,100.10    4,988.03
2        5,678.45    6,234.23    5,489.23
3        2,341.78    2,242.60    2,121.90
4        9,875.33    10,300.30   9,845.60
5        7,650.20    8,002.90    7,590.88
6        1,324.80    1,450.90    1,356.89
7        2,345.90    2,356.90    2,345.90
8        15,468.75   16,080.70   15,376.70

a. Discuss why this test was conducted as a randomized block design. Why did the station think it important to have all three offices do the returns for each of the eight people?
b. Test to determine whether blocking was necessary in this situation. Use a significance level of 0.01. State the null and alternative hypotheses.
c. Based on the sample data, can the station report statistical evidence that there is a difference in the mean taxes due on tax returns? Test using a significance level of 0.01. State the appropriate null and alternative hypotheses.
d. Referring to part c, if you did conclude that a difference exists, use the appropriate test to determine which office has the highest mean tax due.

12-51. A senior analyst working for Ameritrade has reviewed purchases his customers have made over the last six months. He has categorized the mutual funds purchased into eight categories: (1) Aggressive Growth (AG), (2) Growth (G), (3) Growth-Income (G-I), (4) Income Funds (IF), (5) International (I), (6) Asset Allocation (AA), (7) Precious Metal (PM), and (8) Bond (B). The percentage gains accrued by 3 randomly selected customers in each group are as follows:

Mutual Fund   AG   G    G-I   IF   I    AA   PM   B
              6    7    5     1    14   -3   5    -1
              7    -2   6     0    13   7    7    3
              12   0    2     6    10   7    5    2

a. Develop the appropriate ANOVA table to determine if there is a difference in the average percentage gains accrued by his customers among the mutual fund types. Use a significance level of 0.05.
b.
Use the Tukey-Kramer procedure to determine which mutual fund type has the highest average percentage gain. Use an experiment-wide error rate of 0.05.

12-52. Anyone who has gone into a supermarket or discount store has walked by displays at the end of aisles. These are referred to as endcaps and are often prized because they increase the visibility of products. A manufacturer of tortilla chips has recently developed a new product, a blue corn tortilla chip. The manufacturer has arranged with a regional supermarket chain to display the chips on endcaps at four different locations in stores that have had similar weekly sales in snack foods. The dollar volumes of sales for the last six weeks in the four stores are as follows:

Week   Store 1   Store 2   Store 3   Store 4
1      $1,430    $980      $1,780    $2,300
2      $2,200    $1,400    $2,890    $2,680
3      $1,140    $1,200    $1,500    $2,000
4      $880      $1,300    $1,470    $1,900
5      $1,670    $1,300    $2,400    $2,540
6      $990      $550      $1,600    $1,900

a. If the assumptions of a one-way ANOVA design are satisfied in this case, what should be concluded about the average sales at the four stores? Use a significance level of 0.05.
b. Discuss whether you think the assumptions of a one-way ANOVA are satisfied in this case and indicate why or why not. If they are not, what design is appropriate? Discuss.
c. Perform a randomized block analysis of variance test using a significance level of 0.05 to determine whether the mean sales for the four stores are different.
d. Comment on any differences between the means in parts b and c.
e. Suppose blocking was necessary and the researcher chooses not to use blocks. Discuss what impact this could have on the results of the analysis of variance.
f. Use Fisher's least significant difference procedure to determine which, if any, stores have different true average weekly sales.

Computer Database Exercises

12-53. A USA Today editorial addressed the growth of compensation for corporate CEOs.
As an example, quoting a study made by BusinessWeek, USA Today indicated that the pay packages for CEOs have increased almost sevenfold on average from 1994 to 2004. The file entitled CEODough contains the salaries of CEOs in 1994 and in 2004, adjusted for inflation.
a. Use analysis of variance to determine if there is a difference in the CEOs' average salaries between 1994 and 2004, adjusted for inflation.
b. Determine if there is a difference in the CEOs' average salaries between 1994 and 2004 using the two-sample t-test procedure.
c. What is the relationship between the two test statistics and the critical values, respectively, that were used in parts a and b?
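Part c of Exercise 12-53 points at a general fact: with exactly two groups, one-way ANOVA and the pooled two-sample t-test are the same test, because F = t² (and the critical values obey the same relationship). A quick check on invented numbers, not the CEODough data:

```python
# With k = 2 groups, the one-way ANOVA F statistic equals the square of
# the pooled two-sample t statistic. Demonstrated on hypothetical samples.
import math

g1 = [4.0, 6.0, 5.0, 7.0]     # hypothetical group 1 values
g2 = [9.0, 11.0, 8.0, 12.0]   # hypothetical group 2 values

def mean(v):
    return sum(v) / len(v)

n1, n2 = len(g1), len(g2)
m1, m2 = mean(g1), mean(g2)
grand = mean(g1 + g2)

# One-way ANOVA pieces (k = 2, so SSB has 1 degree of freedom)
ssb = n1 * (m1 - grand) ** 2 + n2 * (m2 - grand) ** 2
ssw = sum((x - m1) ** 2 for x in g1) + sum((x - m2) ** 2 for x in g2)
f_stat = (ssb / 1) / (ssw / (n1 + n2 - 2))

# Pooled two-sample t statistic (sp2 is the pooled variance)
sp2 = ssw / (n1 + n2 - 2)
t_stat = (m1 - m2) / math.sqrt(sp2 * (1 / n1 + 1 / n2))

print(f"F = {f_stat:.4f},  t^2 = {t_stat ** 2:.4f}")
assert abs(f_stat - t_stat ** 2) < 1e-9   # F = t^2
```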

12-54. The use of high-technology materials and design has dramatically impacted the game of golf. Not only are the professionals hitting the balls farther but so too are the average players. This has led to a rush to design new and better equipment. Gordon Manufacturing produces golf balls. Recently, Gordon developed a golf ball made from a space-age material. This new golf ball promises greater distance off the tee. To test Gordon Manufacturing's claim, a test was set up to measure the average distance of four different golf balls (the New Gordon, Competitor 1, Competitor 2, Competitor 3) hit by a driving machine using three different types of drivers (Driver 1, Driver 2, Driver 3). The results (rounded to the nearest yard) are listed in the data file called Gordon. Conduct a test to determine if there are significant differences due to type of golf ball.
a. Does there appear to be interaction between type of golf ball and type of driver?
b. Conduct a test to determine if there is a significant effect due to the type of driver used.
c. How could the results of the tests be used by Gordon Manufacturing?

12-55. Maynards, a regional home improvement store chain located in the Intermountain West, is considering upgrading to a new series of scanning systems for its automatic checkout lanes. Although scanners can save customers a great deal of time, scanners will sometimes misread an item's price code. Before investing in one of three new systems, Maynards would like to determine if there is a difference in scanner accuracy. To investigate possible differences in scanner accuracy, 30 shopping carts were randomly selected from customers at the Golden, Colorado, store. The 30 carts differed from each other in both the number and types of items each contained.
The items in each cart were then scanned by the three new scanning systems under consideration, as well as by the current scanner used in all stores, at a specially designed test facility for the purposes of the analysis. Each item was also checked manually, and a count was kept of the number of scanning errors made by each scanner for each basket. Each of the scannings was repeated 30 times, and the average number of scanning errors was determined. The sample data are in the data file called Maynards.
a. What type of experimental design did Maynards use to test for differences among scanning systems? Why was this type of design selected?
b. State the primary hypotheses of interest for this test.
c. At the 0.01 level of significance, is there a difference in the average number of errors among the four different scanners?
d. (1) Is there a difference in the average number of errors by cart? (2) Was Maynards correct in blocking by cart?
e. If you determined that there is a difference in the average number of errors among the four different scanners, identify where those differences exist.
f. Do you think that Maynards should upgrade from its existing scanner to Scanner A, Scanner B, or Scanner C? What other factors may it want to consider before making a decision?

12-56. PhoneEx provides call center services for many different companies. A large increase in its business has made it necessary to establish a new call center. Four cities are being considered: Little Rock, Wichita, Tulsa, and Memphis. The new center will employ approximately 1,500 workers, and PhoneEx will transfer 75 people from its Omaha center to the new location. One concern in the choice of where to locate the new center is the cost of housing for the employees who will be moving there.
To help determine whether significant housing cost differences exist across the competing sites, PhoneEx has asked a real estate broker in each city to randomly select a list of 33 homes between 5 and 15 years old and ranging in size between 1,975 and 2,235 square feet. The prices (in dollars) that were recorded for each city are contained in the file called PhoneEx.
a. At the 0.05 level of significance, is there evidence to conclude that the average price of houses between 5 and 15 years old and ranging in size between 1,975 and 2,235 square feet is not the same in the four cities? Use the p-value approach.
b. At the 0.05 level of significance, is there a difference in average housing price between Wichita and Little Rock? Between Little Rock and Tulsa? Between Tulsa and Memphis?
c. Determine the sample size required to estimate the average housing price in Wichita to within $500 with a 95% confidence level. Assume the required parameters' estimates are sufficient for this calculation.

12-57. An investigation into the effects of various levels of nitrogen (M. L. Vitosh, Tri-State Fertilizer Recommendations for Corn, Soybeans, Wheat and Alfalfa, Bulletin E-2567) at Ohio State University addressed the pounds per acre of nitrogen required to produce certain yield levels of corn on fields that had previously been planted with other crops. The file entitled Nitrogen indicates the amount of nitrogen required to produce given quantities of corn planted.
a. Determine if there is interaction between the yield levels of corn and the crop that had been previously planted in the field. Use a significance level of 0.05.
b. Given your findings in part a, determine any significant differences among the average pounds per acre of nitrogen required to produce yield levels of corn on fields that had been planted with corn as the previous crop.
c. Repeat part b for soybeans and grass sod, respectively.
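Part c of Exercise 12-56 is a margin-of-error sample-size calculation, n = (z·σ/e)², rounded up. The sketch below uses an assumed planning value for the standard deviation; the actual calculation would use an estimate taken from the PhoneEx housing-price data.

```python
# Sample size for estimating a mean to within a margin of error e:
# n = ceil((z * sigma / e)^2). Sigma below is an assumed planning value,
# NOT a figure from the PhoneEx file.
import math

z = 1.96          # z-value for 95% confidence
sigma = 4500.0    # assumed planning estimate of the price std. dev. ($)
e = 500.0         # desired margin of error ($)

n = math.ceil((z * sigma / e) ** 2)
print(f"required sample size: n = {n}")
```

Because σ is only a planning value, the result is a planning number; a larger assumed σ would drive n up quadratically.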

Video Case 3: Drive-Thru Service Times @ McDonald's

When you're on the go and looking for a quick meal, where do you go? If you're like millions of people every day, you make a stop at McDonald's. Known as "quick service restaurants" in the industry (not "fast food"), companies such as McDonald's invest heavily to determine the most efficient and effective ways to provide fast, high-quality service in all phases of their business. Drive-thru operations play a vital role. It's not surprising that attention is focused on the drive-thru process. After all, over 60% of individual restaurant revenues in the United States come from the drive-thru experience. Yet, understanding the process is more complex than just counting cars. Marla King, professor at the company's international training center, Hamburger University, got her start 25 years ago working at a McDonald's drive-thru. She now coaches new restaurant owners and managers. "Our stated drive-thru service time is 90 seconds or less. We train every manager and team member to understand that a quality customer experience at the drive-thru depends on them," says Marla. Some of the factors that affect a customer's ability to complete their purchases within 90 seconds include restaurant staffing, equipment layout in the restaurant, training, efficiency of the grill team, and frequency of customer arrivals, to name a few. Customer-order patterns also play a role. Some customers will just order drinks, whereas others seem to need enough food to feed an entire soccer team. And then there are the special orders. Obviously, there is plenty of room for variability here. Yet, that doesn't stop the company from using statistical techniques to better understand the drive-thru action. In particular, McDonald's utilizes statistical techniques to display data and to help transform the data into useful information.
For restaurant managers to achieve the goal in their own restaurants, they need training in proper restaurant and drive-thru operations. Hamburger University, McDonald's training center located near Chicago, satisfies that need. In the mock-up restaurant service lab, managers go thru a "before and after" training scenario. In the "before" scenario, they run the restaurant for 30 minutes as if they were back in their home restaurants. Managers in the training class are assigned to be crew, customers, drive-thru cars, special needs guests (such as hearing impaired, indecisive, or clumsy), or observers. Statistical data about the operations, revenues, and service times are collected and analyzed. Without the right training, the restaurant's operations usually start breaking down after 10–15 minutes. After debriefing and analyzing the data collected, the managers make suggestions for adjustments and head back to the service lab to try again. This time, the results usually come in well within standards. "When presented with the quantitative results, managers are pretty quick to make the connections between better operations, higher revenues, and happier customers," Marla states. When managers return to their respective restaurants, the training results and techniques are shared with staff charged with implementing the ideas locally. The results of the training eventually are measured when McDonald's conducts a restaurant operations improvement process study, or ROIP. The goal is simple: improved operations. When the ROIP review is completed, statistical analyses are performed and managers are given their results. Depending on the results, decisions might be made that require additional financial resources, building construction, staff training, or reconfiguring layouts. Yet one thing is clear: statistics drive the decisions behind McDonald's drive-thru service operations.

Discussion Questions:

1.
After returning from the training session at Hamburger University, a McDonald's store owner selected a random sample of 362 drive-thru customers and carefully measured the time it took from when a customer entered the McDonald's property until the customer had received the order at the drive-thru window. These data are in the file called McDonald's Drive-Thru Waiting Times. Note, the owner selected some customers during the breakfast period, others during lunch, and others during dinner. Test, using an alpha level equal to 0.05, to determine whether the mean drive-thru time is equal during the three dining periods (breakfast, lunch, and dinner).

2. Referring to question 1, write a short report discussing the results of the test conducted. Make sure to include a discussion of any ramifications the results of this test might have regarding the efforts the manager will need to take to reduce drive-thru times.
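If the test in Question 1 rejects the null hypothesis of equal mean drive-thru times, the Tukey-Kramer procedure from Section 12.1 shows which dining periods differ. The sketch below is illustrative only: the q value is an assumed studentized-range table entry, and the sample sizes, means, and MSW are invented stand-ins rather than values from the McDonald's Drive-Thru Waiting Times file.

```python
# Tukey-Kramer pairwise comparisons (illustrative, assumed numbers).
# Critical range = q * sqrt((MSW / 2) * (1/n_i + 1/n_j)) for each pair.
import math

q = 3.36      # ASSUMED q value from a studentized-range table (k = 3)
msw = 820.0   # ASSUMED mean square within, in seconds^2
n = {"breakfast": 120, "lunch": 130, "dinner": 112}              # hypothetical
xbar = {"breakfast": 164.2, "lunch": 186.9, "dinner": 171.0}     # hypothetical

pairs = [("breakfast", "lunch"), ("breakfast", "dinner"), ("lunch", "dinner")]
for a, b in pairs:
    crit = q * math.sqrt((msw / 2) * (1 / n[a] + 1 / n[b]))
    diff = abs(xbar[a] - xbar[b])
    verdict = "different" if diff > crit else "not different"
    print(f"{a} vs {b}: |diff|={diff:.1f}, critical range={crit:.1f} -> {verdict}")
```

The experiment-wide error rate stays at the chosen alpha across all three comparisons, which is the point of using Tukey-Kramer rather than three separate t-tests.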

course, is shopping for food and other necessities. Denise had just returned from her first outing to a supermarket with a recently arrived Somali Bantu family. It was their first time also, and they were astonished by both the variety and selection. Since the family was on a very limited budget, Denise spent much time talking about comparison shopping, and for someone working with a new currency this was hard. She didn't even want to tell them the store they were in was only one of four possible chains within a mile of their apartment. Denise realized the store she started with would be the one they would automatically return to when on their own. Next week Denise and the family were scheduled to go to a discount store. Denise typically goes to a national chain close to her house but hasn't felt the need to be primarily a value shopper for some time. Since she feels the Somali family will automatically return to the store she picks, and she has her choice of two national chains and one regional chain, she decides to not automatically take them to "her" store. Because each store advertises low prices and meeting all competitors' prices, she also doesn't want to base her decision on what she hears on commercials. Instead, she picks a random selection of items and finds the prices in each store. The items and prices are shown in the file New Americans. In looking at the data, Denise sees there are differences in some prices but wonders if there is any way to determine which store to take the family to.

Required Tasks:
1. Identify the major issue in the case.
2. Identify the appropriate statistical test that could be conducted to address the case's major issue.
3. Explain why you selected the test you chose in (2).
4. State the appropriate null and alternative hypotheses for the statistical test you identified.
5. Perform the statistical test(s).
Be sure to state your conclusion(s).
6. If possible, identify the stores that Denise should recommend to the family.
7. Summarize your analysis and findings in a short report.

Case 12.2 McLaughlin Salmon Works

John McLaughlin's father correctly predicted that a combination of declining wild populations of salmon and an increase in demand for fish in general would create a growing market for salmon grown in "fish farms." Over recent years, an increasing percentage of salmon, trout, and catfish, for example, come from commercial operations. At first, operating a fish farm consisted of finding an appropriate location, installing the pens, putting in smelt, and feeding the fish until they grew to the appropriate size. However, as the number of competitors increased, successful operation required taking a more scientific approach to raising fish. Over the past year, John has been looking at the relationship between food intake and weight gain. Since food is a major cost of the operation, the higher the weight gain for a given amount of food, the more cost-effective the food. John's most recent effort involved trying to determine the relationship between four component mixes and three size progressions for the food pellets. Since smaller fish require smaller food pellets but larger pellets contain more food, one question John was addressing is at what rate to move from smaller to larger pellets. Also, since fish are harder to individually identify than livestock, the study involved constructing small individual pens and giving fish in each pen a different
John is not only interested in whether one component mix, or one pellet size progression, seemed to lead to maximum weight gain but would really like to find one combination of mix and size progression that proved to be superior.. Required Tasks: 1. 2. 3. 4.. Identify the major issues in the case. Identify an appropriate statistical analysis to perform. Explain why you selected the test you choose in (2) State the appropriate null and alternative hypotheses for the statistical test you identified. 5. Perform the statistical test(s). Be sure to state your conclusion(s). 6. Is there one combination of mix and size progression that is superior to the others? 7. Summarize your analysis and findings in a short report.. Case 12.3 NW Pulp and Paper Cassie Coughlin had less than a week to finish her presentation to the CEO of NW Pulp and Paper. Cassie had inherited a project started by her predecessor as head of the new-product development section of the company, and by the nature of the business, dealing with wood products, projects tended to have long lifetimes. Her. predecessor had successfully predicted the consequences of a series of events that, in fact, had occurred: 1. The western United States, where NW Pulp and Paper had its operations, was running out of water, caused by a combination of population growth and increased irrigation. The situation.

had currently been made worse by several years of drought. This meant many farming operations were becoming unprofitable.
2. The amount of timber harvesting from national forests continued to be limited.
3. At least some of the land that had been irrigated would become less productive due to alkaline deposits caused by taking water from rivers.

Based on these three factors, Cassie's predecessor had convinced the company to purchase a 2,000-acre farm that had four types of soil commonly found in the West and also had senior water rights. Water rights in the West are granted by the state, and senior rights are those that will continue to receive irrigation water after those with junior rights are cut off. His idea had been to plant three types of genetically modified poplar trees (these are generally fast-growing trees) on the four types of soil and assess growth rates. His contention was that it might be economically feasible for the company to purchase more farms that were becoming less productive and to become self-sufficient in its supply of raw material for making paper. The project had been started 15 years ago, and since her predecessor had retired, Cassie was now in charge of it.

The primary focus of the 15-year review was tree growth. Growth in this case referred not to height but to wood volume. Volume is assessed by measuring the girth of the tree three feet above the ground. She had just received data from the foresters who had been managing the experiment. They had taken a random sample of measurements from each of the tree types. The data are shown in the file NW Pulp and Paper. Cassie knew the CEO would at least be interested in whether one type of tree was generally superior and whether there was some unique combination of soil type and tree type that stood out.
Case 12.4 Quinn Restoration

Last week John Quinn sat back in a chair with his feet on his deck and nodded at his wife, Kate. They had just finished a conversation that would likely influence the direction of their lives for the next several years or longer. John retired a little less than a year ago after 25 years in the Lake Oswego police department. He had steadily moved up the ranks and retired as a captain. Although his career had, in his mind, gone excellently, he had been working much more than he had been home. Initially upon retiring he had reveled in the ability to spend time doing things he was never able to do while working: complete repairs around the house, travel with his wife, spend time with the children still at home, and visit those who had moved out. He was even able to knock five strokes off his golf handicap. However, he had become increasingly restless, and both he and Kate agreed he needed something to do, but that something did not involve a full-time job.

John had, over the years, bought, restored, and sold a series of older Corvettes. Although this had been entirely a hobby, it also had been a profitable one. The discussion John and Kate had just concluded involved expanding this hobby, not into a full-time job, but into a part-time business. John would handle the actual restoration, which he enjoyed, and Kate would cover the paperwork, ordering parts, keeping track of expenses, and billing clients, which John did not like. The last part of their conversation involved ordering parts. In the past John had ordered parts for old Corvettes from one of three possible sources: Weckler's, American Auto Parts, or Corvette Central. Kate, however, didn't want to call all three any time John needed a part but instead wanted to set up an account with one of the three and be able to order parts over the Internet. The question was which company, if any, would be the appropriate choice. John agreed to develop a list of common parts. Kate would then call each of the companies asking for their prices and, based on this information, determine with which company to establish the account. Kate spent time over the last week on the phone developing the data located in the data file called Quinn Restoration. The question John now faced is whether the prices he found could lead him to conclude one company will be less expensive, on average, than the other two.

Business Statistics Capstone Project

Theme: Analysis of Variance

Project Objective: The objective of this business statistics capstone project is to provide you with an opportunity to integrate the statistical tools and concepts you have learned in your business statistics course. As in all real-world applications, it is not expected that through the completion of this project you will have utilized every statistical technique you have been taught in this course. Rather, an objective of the assignment will be for you to determine which of the statistical tools and techniques are appropriate to employ for the situation you have selected.

Project Description: You are to identify a business or organizational issue that is appropriately addressed using analysis of variance or experimental design. You will need to specify one or more sets of null and alternative hypotheses to be tested in order to reach conclusions

pertaining to the business or organizational issue you have selected. You are responsible for designing and carrying out an "experiment" or otherwise collecting the appropriate data required to test the hypotheses using one or more of the analysis of variance designs introduced in your text and statistics course. There is no minimum sample size; the sample size should depend on the design you choose and the cost and difficulty of obtaining the data. You are responsible for making sure that the data are accurate. All methods (or sources) for data collection should be fully documented.

Project Deliverables: To successfully complete this capstone project, you are required to deliver, at a minimum, the following items in the context of a management report:

• A complete description of the central issue of your project and of the background of the company or organization you have selected as the basis for the project.
• A clear and concise explanation of the data collection method used, including a discussion of your rationale for selecting the analysis of variance technique(s) used in your analysis.
• A complete descriptive analysis of all variables in the data set, including both numerical and graphical analysis. You should demonstrate the extent to which the basic assumptions of the analysis of variance designs have been satisfied.
• A clear and concise review of the hypothesis tests that formed the objective of your project, showing any post-ANOVA multiple comparison tests where appropriate.
• A summary and conclusion section that relates back to the central issue(s) of your project and discusses the results of the hypothesis tests.
• All pertinent appendix materials.

The final report should be presented in a professional format using the style or format suggested by your instructor.

chapters 8–12

Special Review Section

Chapter 8: Estimating Single Population Parameters
Chapter 9: Introduction to Hypothesis Testing
Chapter 10: Estimation and Hypothesis Testing for Two Population Parameters
Chapter 11: Hypothesis Tests and Estimation for Population Variances
Chapter 12: Analysis of Variance

This review section, which is presented using block diagrams and flowcharts, is intended to help you tie together the material from several key chapters. This section is not a substitute for reading and studying the chapters covered by the review. However, you can use this review material to add to your understanding of the individual topics in the chapters.

Chapters 8 to 12

Statistical inference is the process of reaching conclusions about a population based on a random sample selected from the population. Chapters 8 to 12 introduced the fundamental concepts of statistical inference involving two major categories of inference, estimation and hypothesis testing. These chapters have covered a fairly wide range of different situations that can sometimes seem overwhelming for beginning students. The following diagrams will, we hope, help you better identify which specific estimation or hypothesis-testing technique to use in a given situation. These diagrams form something resembling a decision support system that you should be able to use as a guide through the estimation and hypothesis-testing processes.

A. Business Application
   Estimation:
      1 Population → Go to B
      2 Populations → Go to C
   Hypothesis Test:
      1 Population → Go to D
      2 Populations → Go to E
      ≥ 3 Populations → Go to F

B. Estimating 1 Population Parameter
   Population Mean (estimate μ):
      σ known → Go to B-1
      σ unknown → Go to B-2
   Population Proportion (estimate π) → Go to B-3

B-1. Estimate μ, σ Known
   Point estimate for μ:  x̄ = Σx / n
   Confidence interval estimate for μ:  x̄ ± z σ/√n
      (critical z from the standard normal distribution; e = margin of error)
   Determine sample size:  n = z²σ² / e²

B-2. Estimate μ, σ Unknown
   Point estimate for μ:  x̄ = Σx / n
   Confidence interval estimate for μ:  x̄ ± t s/√n
      where s = √( Σ(x − x̄)² / (n − 1) )
      (critical t from the t-distribution with n − 1 degrees of freedom)
   Assumption: Population is normally distributed.

B-3. Estimate π (Population Proportion)
   Point estimate for π:  p = x / n
   Confidence interval estimate for π:  p ± z √( p(1 − p) / n )
      (critical z from the standard normal distribution; e = margin of error; π estimated from a pilot sample or specified)
   Determine sample size:  n = z²π(1 − π) / e²
   Requirement: nπ ≥ 5 and n(1 − π) ≥ 5.
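The B-1 interval and sample-size formulas can be sketched in a few lines of Python using only the standard library. The data values, the assumed known σ, and the margin of error below are hypothetical, chosen for illustration; the text's own examples use Excel or Minitab.

```python
from math import sqrt, ceil
from statistics import NormalDist, mean

# Hypothetical sample data (illustrative values only)
data = [102, 98, 101, 97, 103, 99, 100, 104, 96, 100]
sigma = 3.0                      # assumed known population standard deviation
z = NormalDist().inv_cdf(0.975)  # critical z for 95% confidence, about 1.96

x_bar = mean(data)
margin = z * sigma / sqrt(len(data))     # z * sigma / sqrt(n)
ci = (x_bar - margin, x_bar + margin)    # confidence interval for mu

# Sample size needed for a desired margin of error e = 1.0: n = z^2 sigma^2 / e^2
n_needed = ceil(z ** 2 * sigma ** 2 / 1.0 ** 2)
```

With these values the interval is roughly 100 ± 1.86, and about 35 observations would be needed to tighten the margin of error to 1.0.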

C. Estimating 2 Population Parameters
   Population Means, Independent Samples (estimate μ1 − μ2):
      σ1 and σ2 known → Go to C-1
      σ1 and σ2 unknown → Go to C-2
   Paired Samples (estimate μd) → Go to C-3
   Population Proportions (estimate π1 − π2) → Go to C-4

C-1. Estimate μ1 − μ2, σ1 and σ2 Known
   Point estimate for μ1 − μ2:  x̄1 − x̄2
   Confidence interval estimate for μ1 − μ2:  (x̄1 − x̄2) ± z √( σ1²/n1 + σ2²/n2 )
      (critical z from the standard normal distribution)

C-2. Estimate μ1 − μ2, σ1 and σ2 Unknown
   Point estimate for μ1 − μ2:  x̄1 − x̄2
   Confidence interval estimate for μ1 − μ2:  (x̄1 − x̄2) ± t sp √( 1/n1 + 1/n2 )
      where sp = √( ((n1 − 1)s1² + (n2 − 1)s2²) / (n1 + n2 − 2) )
      (critical t from the t-distribution with n1 + n2 − 2 degrees of freedom)
   Assumptions: 1. Populations are normally distributed. 2. Populations have equal variances. 3. Independent random samples. 4. Measurements are interval or ratio.

C-3. Estimate μd (Paired Samples)
   Point estimate for μd:  d̄ = Σd / n
   Confidence interval estimate for μd:  d̄ ± t sd/√n
      where sd = √( Σ(d − d̄)² / (n − 1) )
      (critical t from the t-distribution with n − 1 degrees of freedom)

C-4. Estimate π1 − π2 (Difference Between Proportions)
   Point estimate for π1 − π2:  p1 − p2
   Confidence interval estimate for π1 − π2:  (p1 − p2) ± z √( p1(1 − p1)/n1 + p2(1 − p2)/n2 )
      (critical z from the standard normal distribution)

D. Hypothesis Tests for 1 Population Parameter
   Population Mean (test μ):
      σ known → Go to D-1
      σ unknown → Go to D-2
   Population Variance → Go to D-3
   Population Proportion (test π) → Go to D-4

D-1. Hypothesis Test for μ, σ Known
   Null and alternative hypothesis options for μ (example with μ0 = 20):
      H0: μ = 20 vs. HA: μ ≠ 20;  H0: μ ≤ 20 vs. HA: μ > 20;  H0: μ ≥ 20 vs. HA: μ < 20
   z-test statistic:  z = (x̄ − μ) / (σ/√n)
      (critical z from the standard normal distribution)
   α = significance level. One-tailed test: critical value = zα or −zα; two-tailed test: critical values = ±zα/2.

D-2. Hypothesis Test for μ, σ Unknown
   Null and alternative hypothesis options for μ:
      H0: μ = 20 vs. HA: μ ≠ 20;  H0: μ ≤ 20 vs. HA: μ > 20;  H0: μ ≥ 20 vs. HA: μ < 20
   t-test statistic:  t = (x̄ − μ) / (s/√n)
      (critical t from the t-distribution with n − 1 degrees of freedom)
   α = significance level. One-tailed test: critical value = tα or −tα; two-tailed test: critical values = ±tα/2.
   Assumption: Population is normally distributed.
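The D-1 z-test can be sketched in a few lines of Python. The numbers below are hypothetical (they happen to echo exercise SR.1 later in this section, treating the sample standard deviation as if σ were known, which is a common large-sample simplification); the text's own examples use Excel or Minitab.

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical upper-tailed test: H0: mu <= 1000 vs. HA: mu > 1000, alpha = 0.05
x_bar, mu0, sigma, n = 1010, 1000, 48, 50   # sigma treated as known here

z = (x_bar - mu0) / (sigma / sqrt(n))       # z-test statistic from D-1
p_value = 1 - NormalDist().cdf(z)           # upper-tail p-value

z_crit = NormalDist().inv_cdf(0.95)         # one-tailed critical value, about 1.645
reject_h0 = z > z_crit
```

Here z is about 1.47, below the critical value of about 1.645, so H0 is not rejected at the 0.05 level.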

D-3. Hypothesis Test for σ²
   Null and alternative hypothesis options for σ² (example with σ0² = 50):
      H0: σ² = 50 vs. HA: σ² ≠ 50;  H0: σ² ≤ 50 vs. HA: σ² > 50;  H0: σ² ≥ 50 vs. HA: σ² < 50
   χ² test statistic:  χ² = (n − 1)s² / σ²
   α = significance level; df = n − 1. One-tailed test: critical value = χ²α or χ²1−α; two-tailed test: critical values = χ²α/2 and χ²1−α/2.
   Assumption: Population is normally distributed.

D-4. Hypothesis Test for π
   Null and alternative hypothesis options for π (example with π0 = 0.20):
      H0: π = 0.20 vs. HA: π ≠ 0.20;  H0: π ≤ 0.20 vs. HA: π > 0.20;  H0: π ≥ 0.20 vs. HA: π < 0.20
   z-test statistic:  z = (p − π) / √( π(1 − π)/n )
      (critical z from the standard normal distribution)
   α = significance level. One-tailed test: critical value = zα or −zα; two-tailed test: critical values = ±zα/2.
   Requirement: nπ ≥ 5 and n(1 − π) ≥ 5.
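The D-4 proportion test can be sketched the same way. The numbers below are hypothetical (they echo exercise SR.6 later in this section: 11 defective fans in a sample of 400 against a claimed 2% defect rate); treat this only as an illustrative sketch.

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical upper-tailed test: H0: pi <= 0.02 vs. HA: pi > 0.02
n, x, pi0 = 400, 11, 0.02
assert n * pi0 >= 5 and n * (1 - pi0) >= 5   # requirement before using the z-test

p = x / n                                    # sample proportion = 0.0275
z = (p - pi0) / sqrt(pi0 * (1 - pi0) / n)    # z-test statistic from D-4
p_value = 1 - NormalDist().cdf(z)
```

Here z is about 1.07 with a one-tailed p-value near 0.14, so at α = 0.05 the 2% claim would not be rejected.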

E. Hypothesis Tests for 2 Population Parameters
   Population Means, Independent Samples (test μ1 − μ2):
      σ1 and σ2 known → Go to E-1
      σ1 and σ2 unknown → Go to E-2
   Paired Samples (test μd) → Go to E-3
   Population Proportions (test π1 − π2) → Go to E-4
   Population Variances (test σ1², σ2²) → Go to E-5

E-1. Test μ1 − μ2, σ1 and σ2 Known
   Hypothesis options for testing μ1 − μ2:
      H0: μ1 − μ2 = 0 vs. HA: μ1 − μ2 ≠ 0;  H0: μ1 − μ2 ≤ 0 vs. HA: μ1 − μ2 > 0;  H0: μ1 − μ2 ≥ 0 vs. HA: μ1 − μ2 < 0
   z-test statistic for μ1 − μ2:  z = ((x̄1 − x̄2) − (μ1 − μ2)) / √( σ1²/n1 + σ2²/n2 )
   α = significance level. One-tailed test: critical value = zα or −zα; two-tailed test: critical values = ±zα/2.

E-2. Test μ1 − μ2, σ1 and σ2 Unknown
   Hypothesis options for testing μ1 − μ2: (same three forms as in E-1)
   t-test statistic for μ1 − μ2:  t = ((x̄1 − x̄2) − (μ1 − μ2)) / ( sp √( 1/n1 + 1/n2 ) )
      pooled standard deviation: sp = √( ((n1 − 1)s1² + (n2 − 1)s2²) / (n1 + n2 − 2) )
      (critical t from the t-distribution with n1 + n2 − 2 degrees of freedom)
   α = significance level. One-tailed test: critical value = tα or −tα; two-tailed test: critical values = ±tα/2.
   Assumptions: 1. Populations are normally distributed. 2. Populations have equal variances. 3. Samples are independent. 4. Measurements are interval or ratio.

E-3. Test μd (Paired Samples)
   Hypothesis options for testing μd:
      H0: μd = 0 vs. HA: μd ≠ 0;  H0: μd ≤ 0 vs. HA: μd > 0;  H0: μd ≥ 0 vs. HA: μd < 0
   t-test statistic for μd:  t = (d̄ − μd) / (sd/√n)
      where sd = √( Σ(d − d̄)² / (n − 1) )
      (critical t from the t-distribution with n − 1 degrees of freedom)
   α = significance level. One-tailed test: critical value = tα or −tα; two-tailed test: critical values = ±tα/2.

E-4. Test for Difference Between Proportions, π1 − π2
   Hypothesis options for testing π1 − π2:
      H0: π1 − π2 = 0 vs. HA: π1 − π2 ≠ 0;  H0: π1 − π2 ≤ 0 vs. HA: π1 − π2 > 0;  H0: π1 − π2 ≥ 0 vs. HA: π1 − π2 < 0
   z-test statistic for testing π1 − π2:  z = ((p1 − p2) − (π1 − π2)) / √( p̄(1 − p̄)(1/n1 + 1/n2) )
      where p̄ = (n1p1 + n2p2) / (n1 + n2)
      (critical z from the standard normal distribution)
   α = significance level. One-tailed test: critical value = zα or −zα; two-tailed test: critical values = ±zα/2.

E-5. Test for Difference Between Population Variances, σ1² and σ2²
   Hypothesis options for testing σ1² and σ2²:
      H0: σ1² = σ2² vs. HA: σ1² ≠ σ2²;  H0: σ1² ≤ σ2² vs. HA: σ1² > σ2²;  H0: σ1² ≥ σ2² vs. HA: σ1² < σ2²
   F-test statistic for testing σ1² and σ2²:  F = s1² / s2²
      For a two-tailed test, put the larger sample variance in the numerator; df = D1 = n1 − 1 and D2 = n2 − 1.
   α = significance level. One-tailed test: critical value = Fα; two-tailed test: critical value = Fα/2.

F. Hypothesis Tests for 3 or More Population Means
   One-Way ANOVA Design → Go to F-1
   Randomized Block ANOVA Design Without Replications → Go to F-2
   Two-Factor ANOVA Design with Replications → Go to F-3
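The E-2 pooled-variance t statistic can be sketched in plain Python. The function name and the tiny data set below are invented for illustration; degrees of freedom are n1 + n2 − 2, and the hypothesized difference is taken to be zero.

```python
from math import sqrt

def pooled_t_stat(sample1, sample2):
    """Two-sample t statistic for mu1 - mu2 with sigmas unknown (E-2 sketch).

    Returns (t, degrees of freedom); assumes equal population variances.
    """
    n1, n2 = len(sample1), len(sample2)
    m1, m2 = sum(sample1) / n1, sum(sample2) / n2
    v1 = sum((v - m1) ** 2 for v in sample1) / (n1 - 1)   # s1^2
    v2 = sum((v - m2) ** 2 for v in sample2) / (n2 - 1)   # s2^2
    sp = sqrt(((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2))
    t = (m1 - m2) / (sp * sqrt(1 / n1 + 1 / n2))          # hypothesized diff = 0
    return t, n1 + n2 - 2

# Hypothetical data
t, df = pooled_t_stat([10, 12, 14], [8, 9, 10])
```

For these made-up samples t is about 2.32 with 4 degrees of freedom; the decision then depends on comparing t to the critical value from a t table, Excel, or Minitab.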

F-1. One-Way ANOVA Design for 3 or More Population Means
   Null and alternative hypotheses:
      H0: μ1 = μ2 = μ3 = . . . = μk
      HA: At least two populations have different means.

   ANOVA Table:
      Source of Variation    SS     df       MS     F-Ratio
      Between samples        SSB    k − 1    MSB    MSB/MSW
      Within samples         SSW    nT − k   MSW
      Total                  SST    nT − 1

   α = significance level; critical F from the F-distribution with D1 = k − 1 and D2 = nT − k degrees of freedom; critical value = Fα.
   If the null hypothesis is rejected, compare all possible pairs |x̄i − x̄j| to the Tukey-Kramer critical range:
      critical range = q1−α √( (MSW/2)(1/ni + 1/nj) )
   Assumptions: 1. Populations are normally distributed. 2. Populations have equal variances. 3. Samples are independent. 4. Measurements are interval or ratio.
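The sums of squares in the F-1 table can be computed directly. The function name and three small groups below are invented for illustration; the F-ratio returned is MSB/MSW, which would then be compared to Fα from an F table, Excel, or Minitab.

```python
def one_way_f_ratio(groups):
    """F-ratio for the one-way ANOVA table in F-1 (illustrative sketch)."""
    k = len(groups)                            # number of populations sampled
    n_t = sum(len(g) for g in groups)          # total observations, nT
    grand = sum(sum(g) for g in groups) / n_t  # grand mean
    # Between-samples and within-samples sums of squares
    ssb = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups)
    ssw = sum(sum((v - sum(g) / len(g)) ** 2 for v in g) for g in groups)
    msb = ssb / (k - 1)       # between-samples mean square
    msw = ssw / (n_t - k)     # within-samples mean square
    return msb / msw

# Hypothetical data for three populations
f = one_way_f_ratio([[1, 2, 3], [2, 3, 4], [3, 4, 5]])
```

For these made-up groups the F-ratio works out to exactly 3.0.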

F-2. Randomized Block ANOVA Design for 3 or More Population Means
   Primary null and alternative hypotheses:
      H0: μ1 = μ2 = μ3 = . . . = μk
      HA: At least two populations have different means.
   Blocking null and alternative hypotheses:
      H0: μb1 = μb2 = μb3 = . . . = μbn (blocking is not effective)
      HA: At least two block means differ (blocking is effective).

   ANOVA Table:
      Source of Variation    SS      df               MS      F-Ratio
      Between blocks         SSBL    b − 1            MSBL    MSBL/MSW
      Between samples        SSB     k − 1            MSB     MSB/MSW
      Within samples         SSW     (k − 1)(b − 1)   MSW
      Total                  SST     nT − 1

   α = significance level. Blocking critical value = Fα with df D1 = b − 1 and D2 = (k − 1)(b − 1); primary critical value = Fα with df D1 = k − 1 and D2 = (k − 1)(b − 1).
   If the primary null hypothesis is rejected, compare all |x̄i − x̄j| to Fisher's LSD = tα/2 √( MSW (2/b) ).
   Assumptions: 1. Populations are normally distributed. 2. Populations have equal variances. 3. Observations within samples are independent. 4. Measurements are interval or ratio.
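The F-2 partition SST = SSBL + SSB + SSW can also be sketched directly. The function name and the tiny 2-block by 3-treatment table below are invented for illustration; the function returns both F-ratios from the table above (treatments and blocks, each over MSW).

```python
def randomized_block_anova(table):
    """F-ratios for a randomized complete block design (F-2 sketch).

    table[i][j] = response for block i under treatment j.
    Returns (F for treatments, F for blocks).
    """
    b, k = len(table), len(table[0])
    n_t = b * k
    grand = sum(sum(row) for row in table) / n_t
    block_means = [sum(row) / k for row in table]
    treat_means = [sum(table[i][j] for i in range(b)) / b for j in range(k)]
    ssbl = k * sum((m - grand) ** 2 for m in block_means)   # between blocks
    ssb = b * sum((m - grand) ** 2 for m in treat_means)    # between samples
    sst = sum((v - grand) ** 2 for row in table for v in row)
    ssw = sst - ssbl - ssb                                  # within samples
    msbl = ssbl / (b - 1)
    msb = ssb / (k - 1)
    msw = ssw / ((k - 1) * (b - 1))
    return msb / msw, msbl / msw

# Hypothetical data: 2 blocks, 3 treatments
f_treat, f_block = randomized_block_anova([[10, 12, 14], [11, 15, 13]])
```

For these made-up responses the treatment F-ratio is 3.0 and the blocking F-ratio is 0.75; each would be compared to its own Fα from the F-distribution.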

F-3. Two-Factor ANOVA Design with Replications
   Factor A null and alternative hypotheses:
      H0: μA1 = μA2 = μA3 = . . . = μAk
      HA: Not all Factor A means are equal.
   Factor B null and alternative hypotheses:
      H0: μB1 = μB2 = μB3 = . . . = μBn
      HA: Not all Factor B means are equal.
   Null and alternative hypotheses for testing whether the two factors interact:
      H0: Factors A and B do not interact to affect the mean response.
      HA: Factors A and B do interact.

   ANOVA Table:
      Source of Variation    SS      df               MS      F-Ratio
      Factor A               SSA     a − 1            MSA     MSA/MSE
      Factor B               SSB     b − 1            MSB     MSB/MSE
      AB Interaction         SSAB    (a − 1)(b − 1)   MSAB    MSAB/MSE
      Error                  SSE     nT − ab          MSE
      Total                  SST     nT − 1

   α = significance level. Factor A critical value = Fα, df D1 = a − 1 and D2 = nT − ab; Factor B critical value = Fα, df D1 = b − 1 and D2 = nT − ab; interaction critical value = Fα, df D1 = (a − 1)(b − 1) and D2 = nT − ab.
   Assumptions: 1. The population values for each combination of pairwise factor levels are normally distributed. 2. The variances for each population are equal. 3. The samples are independent. 4. Measurements are interval or ratio.

Using the Flow Diagrams

Example Problem: A travel agent in Florida is interested in determining whether there is a difference in the mean out-of-pocket costs incurred by customers on two major cruise lines. To test this, she has selected a simple random sample of 20 customers who have taken cruise line I and has asked these people to track their costs over and above the fixed price of the cruise. She did the same for a second simple random sample of 15 people who took cruise line II. You can use the flow diagrams to direct you to the appropriate statistical tool.

Step 1 (Diagram A): The travel agency wishes to test a hypothesis involving two populations. Following diagram A (Business Application → Hypothesis Test → 2 Populations), proceed to E.

Step 2 (Diagram E): The hypothesis test is for two population means. The samples are independent, because the spending by customers on one cruise line in no way influences the spending by customers on the second cruise line. The population standard deviations are unknown. Following diagram E (Population Means, Independent Samples → σ1 and σ2 unknown), proceed to E-2.

At E-2, we determine the null hypothesis to be
   H0: μ1 − μ2 = 0
   HA: μ1 − μ2 ≠ 0
Next, we establish the test statistic as
   t = ((x̄1 − x̄2) − (μ1 − μ2)) / ( sp √( 1/n1 + 1/n2 ) )
   where sp = √( ((n1 − 1)s1² + (n2 − 1)s2²) / (n1 + n2 − 2) )
Finally, the critical value is a t-value from the t-distribution with 20 + 15 − 2 = 33 degrees of freedom. Note: if the degrees of freedom are not shown in the t table, use Excel's TINV function or Minitab to determine the t-value.

Thus, by using the flow diagrams and answering a series of basic questions, you should be successful in identifying the statistical tools required to address any problem or application covered in Chapters 8 to 12. You are encouraged to apply this process to the application problems and projects listed here.

Exercises (MyStatLab)

Integrative Application Problems

SR.1. Brandon Outdoor Advertising supplies neon signs to retail stores. A major complaint from its clients is that letters in the signs can burn out and leave the signs looking silly, depending on which letters stop working. The primary cause of neon letters not working is the failure of the starter unit attached to each letter. Starter units fail primarily based on turn-on/turn-off cycles. The present unit bought by Brandon averages 1,000 cycles before failure. A new manufacturer has approached Brandon claiming to have a model that is superior to the current unit. Brandon is skeptical but agrees to sample 50 starter units. It says it will buy from the new supplier if the sample results indicate the new unit is better. The sample of 50 gives the following values:

   Sample mean = 1,010 cycles
   Sample standard deviation = 48 cycles

Would you recommend changing suppliers?

SR.2. PestFree Chemicals has developed a new fungus preventative that may have a significant market among potato growers. Unfortunately, the actual extent of the fungus problem in any year depends on rainfall, temperature, and many other factors. To test the new chemical, PestFree has used it on 500 acres of potatoes and has used the leading competitor on an additional 500 acres. At the end of the season, 120 acres treated by the new chemical show significant levels of fungus infestation, whereas 160 of the acres treated by the leading chemical show significant infestation. Do these data provide statistical proof that the new product is superior to the leading competitor?

SR.3. Last year Tucker Electronics decided to try to do something about turnover among assembly-line workers at its plants. It implemented two trial personnel policies, one based on an improved hiring policy and the other based on increasing worker responsibility. These policies were put into effect at two different plants, with the following results:

                             Plant 1            Plant 2
                             Improved Hiring    Increased Responsibility
   Workers in trial group        800                900
   Turnover proportion          0.05               0.09

Do these data provide evidence that there is a difference between the turnover rates for the two trial policies?

SR.4. A Big 10 University has been approached by Wilson Sporting Goods. Wilson has developed a football designed specifically for practice sessions. Wilson would like to claim the ball will last for 500 practice hours before it needs to be replaced. Wilson has supplied six balls for use during spring and fall

practice. The following data have been gathered on the time used before the ball must be replaced.

   Hours: 551   511   479   435   440   466

Do you see anything wrong with Wilson claiming the ball will last 500 hours?

SR.5. The management of a chain of movie theaters believes the average weekend attendance at its downtown theater is greater than at its suburban theater. The following sample results were found from their accounting data.

                          Downtown   Suburban
   Number of weekends         11         10
   Average attendance        855        750
   Sample variance         1,684      1,439

Do these data provide sufficient evidence to indicate there is a difference in average attendance? The company is also interested in whether there is a significant difference in the variability of attendance.

SR.6. A large mail-order company has placed an order for 5,000 thermal-powered fans to sit on wood-burning stoves from a supplier in Canada, with the stipulation that no more than 2% of the units will be defective. To check the shipment, the company tests a random sample of 400 fans and finds 11 defective. Should this sample evidence lead the company to conclude the supplier has violated the terms of the contract?

SR.7. A manufacturer of automobile shock absorbers is interested in comparing the durability of its shocks with that of its two biggest competitors. To make the comparison, a set of one each of the manufacturer's and of the competitors' shocks was randomly selected and installed on the rear wheels of each of six randomly selected cars of the same type. After the cars had been driven 20,000 miles, the strength of each test shock was measured, coded, and recorded.

   Car number   Manufacturer's   Competitor 1   Competitor 2
       1              8.8             9.3            8.6
       2             10.5             9.0           13.7
       3             12.5             8.4           11.2
       4              9.7            13.0            9.7
       5              9.6            12.0           12.2
       6             13.2            10.1            8.9

Do these data present sufficient evidence to conclude there is a difference in the mean strength of the three types of shocks after 20,000 miles?

SR.8. AstraZeneca is the maker of the stomach medicine Prilosec, which is the second-best-selling drug in the world. Recently, the company has come under close scrutiny concerning the cost of its medicines. The company's internal audit department selected a random sample of 300 purchases of Prilosec. They wished to characterize how much is being spent on this medicine. In the sample, the mean price per 20-milligram tablet of Prilosec was $2.70. The sample had a standard deviation of $0.30. Determine an estimate that will characterize the average range of values charged for a tablet of Prilosec.

SR.9. A manufacturer of PC monitors is interested in the effects that the type of glass and the type of phosphor used in the manufacturing process have on the brightness of the monitors. The director of research and development has received anecdotal evidence that the type of glass does not affect the brightness of the monitor as long as phosphor type 2 is used. However, the evidence seems to indicate that the type of glass does make a difference if the two other phosphor types are used. Here are data to validate this anecdotal evidence.

                     Phosphor Type
   Glass Type      1      2      3
       1          279    307    287
       1          254    313    290
       1          297    294    285
       2          243    253    252
       2          245    232    236
       2          267    223    278

Conduct a procedure to verify or repudiate the anecdotal evidence.

SR.10. The Vilmore Corporation is considering two word processing programs for its PCs. One factor that will influence its decision is the ease of use in preparing a business report. Consequently, Jody Vilmore selected a random sample of nine typists from the clerical pool and asked them to type a typical report using both word processors. The typists were then timed (in seconds) to determine how quickly they could type one of the frequently used forms. The results were as follows.

   Typist   Processor 1   Processor 2
     1           82            75
     2           76            80
     3           90            70
     4           55            58
     5           49            53
     6           82            75
     7           90            80
     8           45            45
     9           70            80

Jody wishes to have an estimate of the smallest and biggest differences that might exist in the average time required for typing the business form using the two programs. Provide this information.

SR.11. The research department of an appliance manufacturing firm has developed a solid-state switch for its blender that the department claims will reduce the percentage of appliances being returned under the one-year full warranty by a range of 3% to 6%. To determine if the claim can be supported, the testing department selects a group of the blenders manufactured with the new switch and the old switch and subjects them to a normal year's worth of wear. Out of 250 blenders tested with the new switch, 9 would have been returned. Sixteen would have been returned out of the 250 blenders with the old switch. Use a statistical procedure to verify or refute the department's claim.

SR.12. The Ecco Company makes electronics products for distribution throughout the world. As a member of the quality department, you are interested in the warranty claims that are made by customers who have experienced problems with Ecco products. The file called Ecco contains data for a random sample of warranty claims. Large warranty claims not only cost the company money but also provide adverse publicity. The quality manager has asked you to provide her with a range of values that would represent the percentage of warranty claims filed for more than $300. Provide this information for your quality manager.

END EXERCISES

Term Project Assignments

Investigate whether there are differences in grocery prices for three or more stores in your city.
a. Specify the type of testing procedure you will use.
b. What type of experimental design will be used? Why?
c. Develop a "typical" market basket of at least 10 items that you will price-check.
Collect price data on these items at three or more different stores that sell groceries.
d. Analyze your price data using the testing procedure and experimental design you specified in parts a and b.
e. Present your findings in a report. Did you find differences in average prices of the "market basket" across the different grocery stores?

Business Statistics Capstone Project

Theme: Financial Data Analysis

Project Objective: The objective of this business statistics capstone project is to provide you with an opportunity to integrate the statistical tools and concepts that you have learned in your business statistics course. As with all real-world applications, it is not expected that in completing this project you will use every statistical technique you have been taught in this course. Rather, an objective of the assignment is for you to determine which statistical tools and techniques are appropriate for the situation you have selected.

Project Description: Assume that you are working as an intern for a financial management company. Your employer has a large number of clients who trust the company managers to invest their funds. In your position, you are responsible for producing reports for clients when they request information. Your company has two large data files with financial information for a large number of U.S. companies. The first is called US Companies 2003, which contains financial information for the companies' 2001 or 2002 fiscal year-end. The second file is called US Companies 2005, which has data for the fiscal 2003 or 2004 year-end. The 2003 file has data for 7,441 companies. The 2005 file has data for 6,992 companies. Thus, many companies are listed in both files, but some are in just one or the other. The two files have many of the same variables, but the 2003 file has a larger range of financial variables than the 2005 file.
For some companies, the data for certain variables are not available, and a code of NA is used to indicate this. The 2003 file has a special worksheet containing the description of each variable. These descriptions apply to the 2005 data file as well. You have been given access to these two data files for use in preparing your reports. Your role will be to perform certain statistical analyses that can be used to help convert these data into useful information in order to respond to the clients' questions. This morning, one of the partners of your company received a call from a client who asked for a report that would compare companies in the financial services industry (SIC codes in the 6000s) to companies in production-oriented industries (SIC codes in the 2000s and 3000s). There are no firm guidelines on what the report should entail, but the partner has suggested the following:

● Start with the 2005 data file. Pull the data for all companies with the desired SIC codes into a new worksheet.
● Prepare a complete descriptive analysis of key financial variables using appropriate charts and graphs to help compare the two types of businesses.
● Determine whether there are statistical differences between the two classes of companies in terms of key financial measures.
● Using data from the 2003 file for companies that have these SIC codes and which are also in the 2005 file, develop a comparison that shows the changes over the time span, both within SIC code grouping and between SIC code groupings.

Project Deliverables: To successfully complete this capstone project, you are required to deliver a management report that addresses the partner's requests (listed above) and also contains at least one other substantial type of analysis not mentioned by the partner. This latter work should be set off in a special section of the report. The final report should be presented in a professional format using the style or format suggested by your instructor.
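The first two partner suggestions amount to filtering companies by SIC code and summarizing each group. A minimal Python sketch of that logic follows; the records, field names, and values below are invented for illustration, since the actual US Companies files are not reproduced in the text:

```python
# Classify companies into SIC code groups and compare a summary measure.
# The records and field names below are hypothetical, not taken from the
# US Companies 2003/2005 files.
companies = [
    {"name": "Bank A", "sic": 6021, "sales": 500.0},
    {"name": "Mill B", "sic": 2430, "sales": 120.0},
    {"name": "Plant C", "sic": 3310, "sales": "NA"},  # NA = not available
    {"name": "Fund D", "sic": 6726, "sales": 310.0},
]

def sic_group(sic):
    """Map a SIC code to the report's two groups, or None if out of scope."""
    if 6000 <= sic <= 6999:
        return "financial services"
    if 2000 <= sic <= 3999:
        return "production"
    return None

# Pull companies with the desired SIC codes, skipping NA values
groups = {}
for c in companies:
    g = sic_group(c["sic"])
    if g is not None and c["sales"] != "NA":
        groups.setdefault(g, []).append(c["sales"])

means = {g: sum(v) / len(v) for g, v in groups.items()}
print(means)
```

The same grouping would feed the descriptive analysis and the two-sample tests the partner asks for; in practice a spreadsheet or a data-frame library would replace the toy list of dictionaries.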

Chapter 13 Quick Prep Links
• Review the logic involved in testing a hypothesis discussed in Chapter 9.
• Review the characteristics of probability distributions such as the binomial, Poisson, uniform, and normal distributions in Chapters 5 and 6.
• Review the definitions of Type I and Type II errors in Chapter 9.

chapter 13

Goodness-of-Fit Tests and Contingency Analysis

13.1 Introduction to Goodness-of-Fit Tests (pg. 548–562)
Outcome 1. Utilize the chi-square goodness-of-fit test to determine whether data from a process fit a specified distribution.

13.2 Introduction to Contingency Analysis (pg. 562–572)
Outcome 2. Set up a contingency analysis table and perform a chi-square test of independence.

Why you need to know

The previous 12 chapters introduced a wide variety of statistical techniques that are frequently used in business decision making. We have discussed numerous descriptive tools and techniques, as well as estimation and hypothesis tests for one and two populations, hypothesis tests using the t-distribution, and analysis of variance. However, as we have often mentioned, these statistical tools are limited to use under those conditions for which they were originally developed. For example, the tests based on the standard normal distribution assume that the data can be measured at least at the interval level. The tests that employ the t-distribution assume that the sampled populations are normally distributed. In those situations in which the conditions just mentioned are not satisfied, we suggest using nonparametric statistics. Several of the more widely used nonparametric techniques will be discussed in Chapter 17. These procedures will be shown to be generally the nonparametric equivalent of the classical procedures discussed in Chapters 8–12. The obvious questions when faced with a realistic decision-making situation are "Which test do I use?
Should I consider a nonparametric test?" These questions are generally followed by a second question: "Do the data come from a normal distribution?" But recall that we also described situations involving data from Poisson or binomial distributions. How do we know which distribution applies to our situation? Fortunately, a statistical technique called goodness-of-fit exists that can help us answer this question. Using goodness-of-fit tests, we can decide whether a set of data comes from a specific hypothesized distribution. You will also encounter many business situations in which the level of data measurement for the variable of interest is either nominal or ordinal, not interval or ratio. For example, a bank may use a code to indicate whether a customer is a good or poor credit risk. The bank may also have data for these customers that indicate, by a code, whether each person is buying or renting a home. The loan officer may be interested in determining whether credit-risk status is independent of home ownership. Because both credit risk and home ownership are qualitative, or categorical, variables, their measurement level is nominal, and the statistical techniques introduced in Chapters 8–12 cannot be used to analyze this problem. We therefore need a new statistical tool to assist the manager in reaching an inference about the customer population. That statistical tool is contingency analysis. Contingency analysis is a widely used tool for analyzing the relationship between qualitative variables, one that decision makers in all business areas find helpful for data analysis.

Chapter Outcome 1.

TABLE 13.1 | Customer Door Entrance Data

Entrance    Number of Customers
East            260
West            290
North           230
South           220

13.1 Introduction to Goodness-of-Fit Tests

Many of the statistical procedures introduced in earlier chapters require that the sample data come from populations that are normally distributed. For example, when we use the t-distribution in confidence interval estimation or hypothesis testing about one or two population means, the population(s) of interest is (are) assumed to be normally distributed. The F-test introduced in Chapters 11 and 12 is based on the assumption that the populations are normally distributed. But how can you determine whether these assumptions are satisfied? In other instances, you may wish to employ a particular probability distribution to help solve a problem related to an actual business process. To solve the problem, you may find it necessary to know whether the actual data from the process fit the probability distribution being considered. In such instances, a statistical technique known as a goodness-of-fit test can be used. The term goodness-of-fit aptly describes the technique. Suppose Macy's, a major retail department store, believes the proportions of customers who use each of the four entrances to its Portland, Oregon, store are the same. This would mean that customer arrivals are uniformly distributed across the four entrances. Suppose a sample of 1,000 customers is observed entering the store, and the entrance (East, West, North, South) selected by each customer is recorded. Table 13.1 shows the results of the sample. If the manager's assumption about the entrances being used uniformly holds and if there was no sampling error involved, we would expect one fourth of the customers, or 250, to enter through each door.
When we allow for the potential of sampling error, we would still expect close to 250 customers to enter through each entrance. The question is, how "good is the fit" between the sample data in Table 13.1 and the expected number of 250 people at each entrance? At what point do we no longer believe that the differences between what is actually observed at each entrance and what we expected can be attributed to sampling error? If these differences get too big, we will reject the uniformity assumption and conclude that customers prefer some entrances to others.

Chi-Square Goodness-of-Fit Test

The chi-square goodness-of-fit test is one of the statistical tests that can be used to determine whether the sample data come from any hypothesized distribution. Consider the following application.

BUSINESS APPLICATION: CONDUCTING A GOODNESS-OF-FIT TEST

VISTA HEALTH GUARD  Vista Health Guard, a Pennsylvania health clinic with 25 offices, is open seven days a week. The operations manager was recently hired from a similar position at a smaller chain of clinics in Florida. She is naturally concerned that the level of staffing—physicians, nurses, and other support personnel—be balanced with patient demand. Currently, the staffing level is balanced Monday through Friday, with reduced staffing on Saturday and Sunday. Her predecessor explained that patient demand is fairly level throughout the week and about 25% less on weekends, but the new manager suspects that the staff members want to have weekends free. Although she was willing to operate with this schedule for a while, she has decided to study patient demand to see whether the assumed demand pattern still applies. The operations manager requested a random sample of 20 days for each day of the week that showed the number of patients on each of the sample days. A portion of those data follows:

Day                   Patient Count
Monday, May 6             325
Monday, October 7         379
Tuesday, July 2           456
Monday, July 15           323
Wednesday, April 3        467
etc.                      etc.

FIGURE 13.1 | Graph of Actual Frequencies for Vista Health Guard
[Bar chart of total patient count (0 to 12,000) by day of week, Sunday through Saturday.]

TABLE 13.2 | Patient Count Data for the Vista Health Guard Example

Day          Total Patient Count
Sunday             4,502
Monday             6,623
Tuesday            8,308
Wednesday         10,420
Thursday          11,032
Friday            10,754
Saturday           4,361
Total             56,000

For the 140 days observed, the total count was 56,000 patients. The total patient counts for each day of the week are shown in Table 13.2 and are graphed in Figure 13.1. Recall that the previous operations manager at Vista Health Guard based his staffing on the premise that from Monday to Friday the patient count remained essentially the same and on Saturdays and Sundays it went down 25%. If this is so, how many of the 56,000 patients would we expect on Monday? How many on Tuesday, and so forth? To figure out this demand, we determine weighting factors by allocating four units each to days Monday through Friday and three units each (representing the 25% reduction) to Saturday and Sunday. The total number of units is then (5 × 4) + (2 × 3) = 26. The proportion of total patients expected on each weekday is 4/26, and the proportion expected on each weekend day is 3/26. The expected number of patients on a weekday is (4/26) × 56,000 = 8,615.38, and the expected number on each weekend day is (3/26) × 56,000 = 6,461.54. Figure 13.2 shows a graph with the actual sample data and the expected values. With the exception of what might be attributed to sampling error, if the distribution claimed by the previous operations manager is correct, the actual frequencies for each day of the week should fit quite closely with the expected frequencies. As you can see in Figure 13.2, the actual data and the expected data do not match perfectly.
However, is the difference enough to warrant a change in staffing patterns? The situation facing Vista Health Guard is one for which a number of statistical tests have been developed. One of the most frequently used is the chi-square goodness-of-fit test. What we need to examine is how well the sample data fit the hypothesized distribution. The following null and alternative hypotheses can represent this:

H0: The patient demand distribution is evenly spread through the weekdays and is 25% lower on the weekend.
HA: The patient demand follows some other distribution.

Equation 13.1 is the equation for the chi-square goodness-of-fit test statistic. The logic behind this test is based on determining how far the actual observed frequency is from the expected frequency. Because we are interested in whether a difference exists, positive or negative, we remove the effect of negative values by squaring the differences. In addition, how important this difference is really depends on the magnitude of the expected frequency (e.g., a difference of 5 is more important if the expected frequency is 10 than if the expected

frequency is 1,000), so we divide the squared difference by the expected frequency. Finally, we sum these difference ratios for all days. This sum is a statistic that has an approximate chi-square distribution.

FIGURE 13.2 | Actual and Expected Frequencies for Vista Health Guard
[Chart of patient count (0 to 12,000) by day of week, comparing the actual data with the hypothesized distribution.]

Chi-Square Goodness-of-Fit Test Statistic

    χ² = Σ (oi − ei)² / ei,  summed over the i = 1, …, k categories    (13.1)

where:
    oi = Observed frequency for category i
    ei = Expected frequency for category i
    k = Number of categories

The χ² statistic is distributed approximately as a chi-square only if the sample size is large.

Special Note: A sample size of at least 30 is sufficient in most cases, provided that none of the expected frequencies is too small.

This issue of expected cell frequencies will be discussed later. If the calculated chi-square statistic gets large, this is evidence to suggest that the fit of the actual data to the hypothesized distribution is not good and that the null hypothesis should be rejected. Figure 13.3 shows the hypothesis-test process and results for this chi-square goodness-of-fit test. Note that the degrees of freedom for the chi-square test are equal to k − 1, where k is the number of categories or observed cell frequencies. In this example, we have 7 categories corresponding to the days of the week, so the degrees of freedom are 7 − 1 = 6. The critical value of 12.5916 is found in Appendix G for an upper-tail test with 6 degrees of freedom and a significance level of 0.05.
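The statistic in Equation 13.1 is straightforward to compute directly. As an illustration, the following Python sketch reproduces the Vista Health Guard calculation shown in Figure 13.3, using the observed counts from Table 13.2 and the expected counts of 8,615.38 for weekdays and 6,461.54 for weekend days:

```python
# Chi-square goodness-of-fit statistic (Equation 13.1) for the
# Vista Health Guard patient counts, Sunday through Saturday.
observed = [4502, 6623, 8308, 10420, 11032, 10754, 4361]

weekend = 3 / 26 * 56000   # expected Saturday/Sunday count (6,461.54)
weekday = 4 / 26 * 56000   # expected Monday-Friday count (8,615.38)
expected = [weekend, weekday, weekday, weekday, weekday, weekday, weekend]

chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print(round(chi_square, 1))  # about 3,335.6, far above the critical value 12.5916
```

Because the computed statistic greatly exceeds the χ² critical value for 6 degrees of freedom at the 0.05 level, the same rejection decision shown in Figure 13.3 follows.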

FIGURE 13.3 | Chi-Square Goodness-of-Fit Test for Vista Health Guard

Hypotheses:
H0: Patient demand is evenly spread through the weekdays and is 25% lower on weekends.
HA: Patient demand follows some other distribution.
α = 0.05

             Total Patient Count
Day          Observed oi    Expected ei
Sunday          4,502        6,461.54
Monday          6,623        8,615.38
Tuesday         8,308        8,615.38
Wednesday      10,420        8,615.38
Thursday       11,032        8,615.38
Friday         10,754        8,615.38
Saturday        4,361        6,461.54
Total          56,000       56,000

Test Statistic:
χ² = Σ (oi − ei)²/ei = (4,502 − 6,461.54)²/6,461.54 + (6,623 − 8,615.38)²/8,615.38 + … + (4,361 − 6,461.54)²/6,461.54
χ² = 594.2 + 460.8 + … + 682.9
χ² = 3,335.6

df = k − 1 = 7 − 1 = 6
Rejection region: α = 0.05; χ²0.05 = 12.5916
[Chi-square distribution curve with the rejection region in the upper tail beyond 12.5916.]

Decision Rule: If χ² > 12.5916, reject H0. Otherwise, do not reject H0.
Because 3,335.6 > 12.5916, reject H0. Based on the sample data, we can conclude that the patient distribution is not the same as previously indicated.

As Figure 13.3 indicates, χ² = 3,335.6 > 12.5916, so the null hypothesis is rejected, and we should conclude that the demand pattern does not match the previously defined distribution. The data in Figure 13.3 indicate that demand is heavier than expected Wednesday through Friday and less than expected on the other days. The operations manager may now wish to increase staffing on Wednesday, Thursday, and Friday to more closely approximate current demand patterns.

EXAMPLE 13-1  CHI-SQUARE GOODNESS-OF-FIT TEST

Standardized Exams  Now that you are in college, you have already taken a number of standardized exams such as the SAT and ACT exams.
A Southern California company that creates standardized exams used by a variety of organizations for applications such as employee aptitude assessment tries to develop multiple-choice questions that are not so difficult that those taking the exams are forced to guess at the answers to most questions. Recently, the company received a complaint about one of the math problems on an exam. The problem listed five possible answers. If the test takers were forced to guess, the company might expect an even percentage of choices across the five possible answers. That

is, each answer is as likely to be a guess as any other answer. To determine if people were actually forced to guess, the company plans to conduct a study to see if the choices on this question are in fact uniformly distributed across the five answers. A chi-square goodness-of-fit test can be conducted to determine whether test takers are forced to guess at the answer using the following steps:

Step 1 Formulate the appropriate null and alternative hypotheses.
Because the number of selected choices is supposed to be the same across the five possible answers, the following null and alternative hypotheses are formed:
H0: Distribution of choices is uniform across the five answers.
HA: Distribution of choices is not uniform.

Step 2 Specify the significance level.
The test will be conducted using α = 0.05.

Step 3 Determine the critical value.
The critical value for the goodness-of-fit test is a chi-square value from the chi-square distribution with k − 1 = 5 − 1 = 4 degrees of freedom; for α = 0.05 it is 9.4877.

Step 4 Collect the sample data and compute the chi-square test statistic.
A random sample of the responses to the exam question for 2,120 people is selected. The following data represent the number of times each of the possible answers was chosen:

Answer           1    2    3    4    5
Times Selected  358  402  577  403  380

Under the hypothesis of a uniform distribution, 20% of this total (424) should be selected for each answer. This is the expected cell frequency. Equation 13.1 is used to form the test statistic based on these sample data:

χ² = Σ (oi − ei)²/ei = (358 − 424)²/424 + (402 − 424)²/424 + (577 − 424)²/424 + (403 − 424)²/424 + (380 − 424)²/424 = 72.231

Step 5 Reach a decision.
The decision rule is:
If χ² > 9.4877, reject H0.
Otherwise, do not reject H0.
Because χ² = 72.231 > 9.4877, reject the null hypothesis.

Step 6 Draw a conclusion.
We conclude that the answer choices are not occurring equally across the five possible answers. This is evidence to suggest that this particular question is not being answered by random guessing.

END EXAMPLE  TRY PROBLEM 13-2 (pg. 559)

BUSINESS APPLICATION: USING SOFTWARE TO CONDUCT A GOODNESS-OF-FIT TEST (Excel and Minitab Tutorial)

WOODTRIM PRODUCTS, INC.  Woodtrim Products, Inc., makes wood moldings, doorframes, and window frames. It purchases lumber from mills throughout New England and eastern Canada. The first step in the production process is to rip the lumber into narrower

strips. Different widths are used for different products. For example, wider pieces with no imperfections are used to make door and window frames. Once an operator decides on the appropriate width, that information is locked into a computer, and a ripsaw automatically cuts the board to the desired size. The manufacturer of the saw claims that the ripsaw cuts with an average deviation of zero from target and that the differences from target will be normally distributed, with a standard deviation of 0.01 inch. Woodtrim has recently become concerned that the ripsaw may not be cutting to the manufacturer's specifications because operators at other machines downstream in the production process are finding excessive numbers of ripped pieces that are too wide or too narrow. A quality improvement team (QIT) has started to investigate the problem. Team members selected a random sample of 300 boards just as they came off the ripsaw. To provide a measure of control, the only pieces sampled in the initial study had stated widths of 2 7/8 (2.875) inches. Each piece's width was measured halfway from its end. A portion of the data and the differences between the target 2.875 inches and the actual measured width are shown in Figure 13.4. The full data set is contained in the file Woodtrim. The team can use these data and the chi-square goodness-of-fit testing procedure to test the following null and alternative hypotheses:

H0: The differences are normally distributed, with μ = 0 and σ = 0.01.
HA: The differences are not normally distributed, with μ = 0 and σ = 0.01.

This example differs slightly from the previous examples because the hypothesized distribution is continuous rather than discrete. Thus, we must organize the data into a grouped-data frequency distribution (see Chapter 2), as shown in Figure 13.5. Our choice of classes requires careful consideration.
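The grouped-data approach requires the probability of each class under the hypothesized normal distribution, with expected frequencies equal to n times each probability. Below is a Python sketch of that step for the claimed distribution (mean 0, standard deviation 0.01) and n = 300; the class boundaries chosen here are illustrative assumptions, not necessarily the exact classes used in the text's Figure 13.5:

```python
import math

def normal_cdf(x, mu=0.0, sigma=0.01):
    """CDF of the hypothesized normal distribution of saw errors."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

# Illustrative class boundaries (inches from target), with open-ended tails.
edges = [-0.02, -0.01, 0.0, 0.01, 0.02]
n = 300  # sample size

# Probability of each of the k = 6 classes under N(0, 0.01)
probs = []
prev = -math.inf
for edge in edges + [math.inf]:
    probs.append(normal_cdf(edge) - normal_cdf(prev))
    prev = edge

expected = [n * p for p in probs]
print([round(e, 2) for e in expected])
```

Any class whose expected frequency falls below 5 would then be combined with a neighbor before computing the chi-square statistic, per the guideline discussed next.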
The chi-square goodness-of-fit test compares the actual cell frequencies with the expected cell frequencies. The test statistic from Equation 13.1,

    χ² = Σ (oi − ei)² / ei,

is approximately chi-square distributed if the expected cell frequencies are large. Because the expected cell frequencies are used in computing the test statistic, the general recommendation is that the goodness-of-fit test be performed only when all expected cell frequencies are at least 5. If any of the cells have expected frequencies less than 5, the cells should be combined in a meaningful way such that the expected frequencies are at least 5. We have chosen to use k = 6 classes. The number of classes is your choice.

FIGURE 13.4 | Woodtrim Products Test Data
Excel 2007 Instruction:
1. Open file: Woodtrim.xls.

You can perform the chi-square goodness-of-fit

test using Excel functions and formulas and Minitab commands. (The Excel and Minitab tutorials that accompany this text take you through the specific steps required to complete this example.)

FIGURE 13.5 | Excel 2007 Results—Goodness-of-Fit Test for the Woodtrim Example
Excel 2007 Instructions:
1. Open file: Woodtrim.xls (see Figure 13.4).
2. Define Classes (column J).
3. Determine observed frequencies [e.g., cell K4 formula is =COUNTIF($D$2:$D$301,"<0.0")-SUM($K$2:K3)].
4. Determine normal distribution probabilities, assuming the mean = 0.0 and st. dev. = 0.01 [e.g., cell L4 formula is =NORMDIST(0,0,0.01,TRUE)-SUM($L$2:L3)].
5. Determine expected frequencies by multiplying the normal probability by the sample size (n = 300).
6. Compute values for chi-square in column N [e.g., cell N5 formula is =(K5-M5)^2/M5].
7. Sum column N to get the chi-square statistic.
8. Find the p-value using the CHITEST function [cell N10 formula is =CHITEST(K2:K7,M2:M7)].

Figure 13.5 shows the normal distribution probabilities, expected cell frequencies, and the chi-square calculation. The calculated chi-square statistic is χ² = 26.432. The p-value associated with χ² = 26.432 and 6 − 1 = 5 degrees of freedom is 0.0001. Therefore, because the p-value of 0.0001 is less than any reasonable level of alpha, we reject the null hypothesis and conclude the ripsaw is not currently meeting the manufacturer's specification. The saw errors are not normally distributed with mean equal to 0 and a standard deviation equal to 0.01.

Special Note: Note that in this case, because the null hypothesis specified both the mean and the standard deviation, the normal distribution probabilities were computed using these values. However, if the mean and/or the standard deviation had not been specified, the sample mean and standard deviation would be used in the probability computation.
You would lose 1 additional degree of freedom for each parameter that was estimated from the sample data. This is true any time sample statistics are specified in place of population parameters in the hypothesis.

Minitab has a procedure for performing a goodness-of-fit test for a normal distribution. In fact, it offers three different approaches, none of which is exactly the chi-square approach

just outlined. Figure 13.6 shows the Minitab results for the Woodtrim example. Consistent with our other Minitab and Excel results, this output illustrates that the null hypothesis should be rejected because the p-value < 0.01.

FIGURE 13.6 | Minitab Output—Test of Normally Distributed Ripsaw Cuts for the Woodtrim Example
Minitab Instructions:
1. Open file: Woodtrim.MTW.
2. Choose Stat > Basic Statistics > Normality Test.
3. In Variable, enter data column (Difference).
4. Under Normality test, select Kolmogorov-Smirnov.
5. Click OK.
Because the p-value < 0.01 is less than any reasonable level of significance, reject the null hypothesis of normality.

EXAMPLE 13-2  GOODNESS-OF-FIT TEST

Early Dawn Egg Company  The Early Dawn Egg Company operates an egg-producing operation in Maine. One of the key steps in the egg-production business is packaging eggs into cartons so that eggs arrive at stores unbroken. That means the eggs have to leave the Early Dawn plant unbroken. Because of the high volume of egg cartons shipped each day, the employees at Early Dawn can't inspect every carton of eggs. Instead, every hour, 10 cartons are inspected. If two or more contain broken or cracked eggs, a full inspection is done for all eggs produced since the previous inspection an hour earlier. If the inspectors find one or fewer cartons containing cracked or broken eggs, they ship that hour's production without further analysis. The company's contract with retailers calls for at most 10% of the egg cartons to have broken or cracked eggs. At issue is whether Early Dawn Egg Company managers can evaluate this sampling plan using a binomial distribution with n = 10 and p = 0.10. To test this, a goodness-of-fit test can be performed using the following steps:

Step 1 State the appropriate null and alternative hypotheses.
In this case, the null and alternative hypotheses are
H0: Distribution of defects is binomial, with n = 10 and p = 0.10.
HA: Distribution is not binomial, with n = 10 and p = 0.10.

Step 2 Specify the level of significance.
The test will be conducted using α = 0.025.

Step 3 Determine the critical value.
The critical value depends on the number of degrees of freedom and the level of significance. The degrees of freedom will be equal to k − 1, where k is the number of categories for which observed and expected frequencies will be recorded. In this case, the managers have set up the following groups:
Defects: 0, 1, 2, 3 and over
Therefore, k = 4, and the degrees of freedom are 4 − 1 = 3. The critical chi-square value for α = 0.025 found in Appendix G is 9.3484.

Step 4 Collect the sample data and compute the chi-square test statistic using Equation 13.1.
The company selected a simple random sample of 100 hourly test results from past production records and recorded the number of defective cartons when the sample of 10 cartons was inspected. The following table shows the computations for the chi-square statistic.

Defective Cartons   Observed o   Binomial Probability (n = 10, p = 0.10)   Expected Frequency e   (o − e)²/e
0                       30              0.3487                                  34.87                0.6802
1                       40              0.3874                                  38.74                0.0410
2                       20              0.1937                                  19.37                0.0205
3 and over              10              0.0702                                   7.02                1.2650
Total                  100                                                                           2.0067

The calculated chi-square test statistic is χ² = 2.0067.

Step 5 Reach a decision.
Because χ² = 2.0067 is less than the critical value of 9.3484, we do not reject the null hypothesis.

Step 6 Draw a conclusion.
The binomial distribution may be the appropriate distribution to describe the company's sampling plan.

END EXAMPLE  TRY PROBLEM 13-1 (pg. 559)

EXAMPLE 13-3  GOODNESS-OF-FIT TEST

University Internet Service  Students in a computer information systems class at a major university have established an Internet service provider (ISP) company for the university's students, faculty, and staff. Customers of this ISP connect via a wireless signal available throughout the campus and surrounding business area.
Capacity is always an issue for an ISP, and the students had to estimate the capacity demands for their service. Before opening for business, the students conducted a survey of likely customers. Based on this survey, they estimated that demand during the late afternoon and evening hours is Poisson distributed (refer to Chapter 5) with a mean equal to 10 users per hour. Based on this assumption, the students developed the ISP with the capacity to handle 20 users simultaneously. However, they have lately been receiving complaints from customers saying they have been denied access to the system because 20 users are

CHAPTER 13 | Goodness-of-Fit Tests and Contingency Analysis

already online. The students are now interested in determining whether the demand distribution is still Poisson distributed with a mean equal to 10 per hour. To test this, they have collected data on the number of user requests for ISP access for 225 randomly selected time periods during the heavy-use hours. The following steps can be used to conduct the statistical test:

Step 1 State the appropriate null and alternative hypotheses.
The null and alternative hypotheses are

H0: The demand distribution is Poisson distributed with a mean equal to 10 users per time period.
HA: The demand distribution is not Poisson distributed with a mean equal to 10 per time period.

Step 2 Specify the level of significance.
The hypothesis test will be conducted using α = 0.05.

Step 3 Determine the critical value.
The critical value depends on the level of significance and the number of degrees of freedom. The degrees of freedom equal k − 1, where k is the number of categories. In this case, after collapsing the categories to get the expected frequencies to be at least 5, we have 13 categories. Thus, the degrees of freedom for the chi-square critical value are 13 − 1 = 12. For 12 degrees of freedom and a level of significance equal to 0.05, from Appendix G we find a critical value of χ²(0.05) = 21.0261. Thus the decision rule is

If χ² > 21.0261, reject the null hypothesis; otherwise, do not reject.

Step 4 Collect the sample data and compute the chi-square test statistic using Equation 13.1.
A random sample of 225 time periods was selected, and the number of users requesting access to the ISP at each time period was recorded. The observed frequencies based on the sample data are as follows:

Number of Requests   Observed Frequency     Number of Requests   Observed Frequency
        0                     0                    10                    18
        1                     2                    11                    14
        2                     1                    12                    17
        3                     3                    13                    18
        4                     4                    14                    25
        5                     3                    15                    28
        6                     8                    16                    23
        7                     6                    17                    17
        8                    11                    18                     9
        9                     7                    19 and over           11
                                                   Total                225

To compute the chi-square test statistic you must determine the expected frequencies. Start by determining the probability for each number of user requests based on the hypothesized distribution (Poisson with λt = 10).
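The Poisson probabilities needed for the next step can also be generated programmatically. A minimal sketch using scipy (shown only as an illustration; the text obtains these values from the Poisson table):

```python
from scipy import stats

n = 225       # number of sampled time periods
mean = 10.0   # hypothesized Poisson mean (lambda * t)

# P(X = x) for x = 0..18, plus P(X >= 19) for the open-ended class
probs = [stats.poisson.pmf(x, mean) for x in range(19)]
probs.append(stats.poisson.sf(18, mean))   # P(X >= 19) = 1 - P(X <= 18)

expected = [n * p for p in probs]

for x, (p, e) in enumerate(zip(probs, expected)):
    label = str(x) if x < 19 else "19+"
    print(f"{label:>3}  prob = {p:.4f}  expected = {e:.2f}")
```

The probabilities sum to 1 by construction (the last class absorbs the entire upper tail), so the expected frequencies sum to 225, matching the sample size.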

The expected frequencies are calculated by multiplying each probability by the total observed frequency of 225. These results are as follows:

Number of Requests   Observed Frequency   Poisson Probability (λt = 10)   Expected Frequency
0                             0                  0.0000                         0.00
1                             2                  0.0005                         0.11
2                             1                  0.0023                         0.52
3                             3                  0.0076                         1.71
4                             4                  0.0189                         4.25
5                             3                  0.0378                         8.51
6                             8                  0.0631                        14.20
7                             6                  0.0901                        20.27
8                            11                  0.1126                        25.34
9                             7                  0.1251                        28.15
10                           18                  0.1251                        28.15
11                           14                  0.1137                        25.58
12                           17                  0.0948                        21.33
13                           18                  0.0729                        16.40
14                           25                  0.0521                        11.72
15                           28                  0.0347                         7.81
16                           23                  0.0217                         4.88
17                           17                  0.0128                         2.88
18                            9                  0.0071                         1.60
19 and over                  11                  0.0072                         1.62
Total                       225                  1                            225

Now you need to check whether any of the expected cell frequencies are less than 5. In this case, we see there are several instances where this is so. To deal with this, collapse categories so that all expected frequencies are at least 5. Doing this gives the following:

Number of Requests   Observed Frequency   Poisson Probability (λt = 10)   Expected Frequency
4 or fewer                   10                  0.0293                         6.59
5                             3                  0.0378                         8.51
6                             8                  0.0631                        14.20
7                             6                  0.0901                        20.27
8                            11                  0.1126                        25.34
9                             7                  0.1251                        28.15
10                           18                  0.1251                        28.15
11                           14                  0.1137                        25.58
12                           17                  0.0948                        21.33
13                           18                  0.0729                        16.40
14                           25                  0.0521                        11.72
15                           28                  0.0347                         7.81
16 or more                   60                  0.0488                        10.98
Total                       225                  1                            225
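The arithmetic in the computation that follows can be checked in a few lines of code. A sketch, with the observed and expected values copied from the collapsed table above:

```python
from scipy import stats

observed = [10, 3, 8, 6, 11, 7, 18, 14, 17, 18, 25, 28, 60]
expected = [6.59, 8.51, 14.20, 20.27, 25.34, 28.15, 28.15,
            25.58, 21.33, 16.40, 11.72, 7.81, 10.98]

# Chi-square goodness-of-fit statistic: sum of (o - e)^2 / e over all classes
chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Critical value for alpha = 0.05 with k - 1 = 13 - 1 = 12 degrees of freedom
critical = stats.chi2.ppf(0.95, df=12)

print(f"chi-square = {chi_sq:.1f}, critical value = {critical:.4f}")
print("reject H0" if chi_sq > critical else "do not reject H0")
```

The statistic is computed by hand here rather than with `scipy.stats.chisquare` because the rounded expected frequencies no longer sum to exactly 225, which that routine checks for.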

Now we can compute the chi-square test statistic using Equation 13.1 as follows:

χ² = Σ (oᵢ − eᵢ)² / eᵢ
   = (10 − 6.59)²/6.59 + (3 − 8.51)²/8.51 + . . . + (60 − 10.98)²/10.98
   = 338.1

Step 5 Reach a decision.
Because χ² = 338.1 > 21.0261, reject the null hypothesis.

Step 6 Draw a conclusion.
The demand distribution is not Poisson distributed with a mean of 10. The students should conclude that either the mean demand per period has increased from 10, or the distribution is not Poisson, or both. They may need to add more capacity to the ISP business.

>>END EXAMPLE
TRY PROBLEM 13-3 (pg. 559)

MyStatLab

13-1: Exercises

Skill Development

13-1. A large retailer receives shipments of batteries for consumer electronic products in packages of 50 batteries. The packages are held at a distribution center and are shipped to retail stores as requested. Because some packages may contain defective batteries, the retailer randomly samples 400 packages from its distribution center and tests to determine whether the batteries are defective or not. The most recent sample of 400 packages revealed the following observed frequencies for defective batteries per package:

# of Defective Batteries per Package   Frequency of Occurrence
0                                             165
1                                             133
2                                              65
3                                              28
4 or more                                       9

The retailer's managers would like to know if they can evaluate this sampling plan using a binomial distribution with n = 50 and p = 0.02. Test at the α = 0.01 level of significance.

13-2. The following frequency distribution shows the number of times an outcome was observed from the toss of a die. Based on the frequencies that were observed from 2,400 tosses of the die, can it be concluded at the 0.05 level of significance that the die is fair?

Outcome   Frequency
1           352
2           418
3           434
4           480
5           341
6           375

13-3. Based on the sample data in the following frequency distribution, conduct a test to determine whether the population from which the sample data were selected is Poisson distributed with mean equal to 6. Test using α = 0.05.

x           Frequency     x            Frequency
2 or less       7         9               53
3              29         10              35
4              26         11              28
5              52         12              18
6              77         13              13
7              77         14 or more      13
8              72         Total          500

13-4. A chi-square goodness-of-fit test is to be conducted to test whether a population is normally distributed. No statement has been made regarding the value of the population mean and standard deviation. A frequency distribution has been formed based on a random sample of 1,000 values. The frequency distribution has k = 8 classes. Assuming that the test is to be conducted at the α = 0.10 level, determine the correct decision rule to be used.

13-5. An experiment that is claimed to have a binomial distribution with p = 0.15 and n = 18 is run, and the number of successes is recorded. The experiment is conducted 200 times with the following results:

Number of Successes    0    1    2    3    4    5
Observed Frequency    80   75   39    6    0    0

Using a significance level of 0.01, is there sufficient evidence to conclude that the distribution is binomial with p = 0.15 and n = 18?

13-6. Data collected from a hospital emergency room reflect the number of patients per day that visited the emergency room due to cardiac-related symptoms. It is believed that the distribution of the number of cardiac patients entering the emergency room per day over a two-month period has a Poisson distribution with a mean of 8 patients per day.

 6   9  12   4   8   8
 7   9  12   9   8  11
 9   7  10   6  10   7
 7   2   8   4   7   9
 5   8   8  11   9  11
 6   5  14   9   2   7
 7   7   7  10  10  16
 7  10   9   7  12   7
 5   6  10   5  10   9
10   7   7  10   9  10

Use a chi-square goodness-of-fit test to determine whether the data come from a Poisson distribution with a mean of 8. Test using a significance level of 0.01.

Business Applications

13-7. HSBC Bank is a large, London-based international banking company. One of its most important sources of income is home loans. A component of its effort to maintain and increase its customer base is excellent service. The loan manager at one of its branches in New York keeps track of the number of loan applicants who visit his branch's loan department per week. Having enough loan officers available is one of the ways of providing excellent service. Over the last year, the loan manager accumulated the following data:

Number of Customers    0    1    2    3    4    5    6
Frequencies            1    2    9   11   14    6    9

From previous years, the manager believes that the distribution of the number of customer arrivals is Poisson with an average of 3.5 loan applicants per week. Determine whether the loan manager's belief is correct using a significance level of 0.025.

13-8. Managers of a major book publisher believe that the occurrence of typographical errors in the books the company publishes is Poisson distributed with a mean of 0.2 per page. Because of some customer quality complaints, the managers have arranged for a test to be conducted to determine if the error distribution still holds. A total of 400 pages were randomly selected and the number of errors per page was counted. These data are summarized in the following frequency distribution:

Errors   Frequency
0           335
1            56
2             7
3             2
Total       400

Conduct the appropriate hypothesis test using a significance level equal to 0.01. Discuss the results.

13-9. The Baltimore Steel and Pipe Company recently developed a new pipe product for a customer. According to specifications, the pipe is supposed to have an average outside diameter of 2.00 inches with a standard deviation equal to 0.10 inch, and the distribution of outside diameters is to be normally distributed. Before going into full-scale production, the company selected a random sample of 30 sections of pipe from the initial test run. The following data were recorded:

Pipe Section   Diameter (inches)   Pipe Section   Diameter (inches)
  1               2.04               16              1.96
  2               2.13               17              1.89
  3               2.07               18              1.99
  4               1.99               19              2.13
  5               1.90               20              1.90
  6               2.06               21              1.91
  7               2.19               22              1.95
  8               2.01               23              2.18
  9               2.05               24              1.94
 10               1.98               25              1.93
 11               1.95               26              2.08
 12               1.90               27              1.82
 13               2.10               28              1.94
 14               2.02               29              1.96
 15               2.11               30              1.81

a. Using a significance level of 0.01, perform the appropriate test.
b. Based on these data, should the company conclude that it is meeting the product specifications? Explain your reasoning.

13-10. Quality control managers work in every type of production environment possible, from producing dictionaries to dowel cutting for boat plugs. The Cincinnati Dowel & Wood Products Co., located in

Mount Orab, Ohio, manufactures wood dowels and wood turnings. Four-inch-diameter boat plugs are one of its products. The quality control procedures aimed at maintaining the 4-inch diameter are only valid if the diameters have a normal distribution. The quality control manager recently obtained the following summary of diameters taken from randomly selected boat plugs on the production line:

Interval        Frequency     Interval        Frequency
< 3.872             4         4.001–4.025         8
3.872–3.916         6         4.026–4.052         2
3.917–3.948        11         4.053–4.084         4
3.949–3.975         9         4.085–4.128         0
3.976–4.000         5         > 4.128             1

The boat plug diameters are specified to have a normal distribution with a mean of 4 inches and a standard deviation of 0.10. Determine whether the distribution of the 4-inch boat plugs is currently adhering to specification. Use a chi-square goodness-of-fit test and a significance level of 0.05.

Computer Database Exercises

13-11. The owners of Big Boy Burgers are considering remodeling their facility to include a drive-thru window. There will be room for three cars in the drive-thru line if they build it. However, they are concerned that the capacity may be too low during their busy lunch time hours between 11:00 A.M. and 1:30 P.M. One of the factors they need to know is the distribution of the length of time it takes to fill an order for cars coming to the drive-thru. To collect information on this, the owners have received permission from a similar operation owned by a relative in a nearby town to collect some data at that drive-thru. The data in the file called Clair's Deli reflect the service time per car. Based on these sample data, is there sufficient evidence to conclude that the distribution of service time is not normally distributed? Test using the chi-square distribution and α = 0.05.

13-12. Executives at The Walt Disney Company are interested in estimating the mean spending per capita for people who visit Disney World in Orlando, Florida. Since they do not know the population standard deviation, they plan to use the t-distribution (see Chapter 9) to conduct the test. However, they realize that the t-distribution requires that the population be normally distributed. Six hundred customers were randomly surveyed, and the amount spent during their stay at Disney World was recorded. These data are in the file called Disney. Before using these sample data to estimate the population mean, the managers wish to test to determine whether the population is normally distributed.
a. State the appropriate null and alternative hypotheses.
b. Organize the data into six classes and form the grouped data frequency distribution (refer to Chapter 2).
c. Using the sample mean and sample standard deviation, calculate the expected frequencies, assuming that the null hypothesis is true.
d. Compute the test statistic and compare it to the appropriate critical value for a significance level equal to 0.05. What conclusion should be reached? Discuss.

13-13. Again working with the data in Problem 13-11, the number of cars that arrive in each 10-minute period is another factor that will determine whether there will be the capacity to handle the drive-thru business. In addition to studying the service times, the owners also counted the number of cars that arrived at the deli in the nearby town in a sample of 10-minute time periods. These data are as follows:

3 2 3 0 0 1 2 1 2 4
2 3 3 2 3 1 4 2 1 1
0 3 3 3 3 0 9 4 1 3

Based on these data, is there evidence to conclude that the arrivals are not Poisson distributed? State the appropriate null and alternative hypotheses and test using a significance level of 0.025.

13-14. Damage to homes caused by burst piping can be expensive to repair. By the time the leak is discovered, hundreds of gallons of water may have already flooded the home. Automatic shutoff valves can prevent extensive water damage from plumbing failures. The valves contain sensors that cut off water flow in the event of a leak, thereby preventing flooding. One important characteristic is the time (in milliseconds) required for the sensor to detect the water flow. The data obtained for four different shutoff valves are contained in the file entitled Waterflow. The differences between the observed times for the sensor to detect the water flow and the predicted times (termed residuals) are listed and are assumed to be normally distributed. Using the four sets of residuals given in the data file, determine whether the residuals have a normal distribution. Use a chi-square goodness-of-fit test and a significance level of 0.05. Use five groups of equal width to conduct the test.

13-15. An article in the San Francisco Chronicle indicated that just 38% of drivers crossing the San Francisco Bay Area's seven state-owned bridges pay their tolls electronically, compared with rates nearing 80% at systems elsewhere in the nation. Albert Yee, director

of highway and arterial operations for the regional Metropolitan Transportation Commission, indicated that the commission is eager to drive up the percentage of tolls paid electronically. In an attempt to see if its efforts are producing the required results, 15 vehicles each day are tracked through the toll lanes of the Bay Area bridges. The number of drivers using electronic payment to pay their toll for a period of three months appears in the file entitled Fastrak.
a. Determine if the distribution of the number of FasTrak users could be described as a binomial distribution with a population proportion equal to 0.50. Use a chi-square goodness-of-fit test and a significance level of 0.05.
b. Conduct a test of hypothesis to determine if the percent of tolls paid electronically has increased to more than 70% since Yee's efforts.

END EXERCISES 13-1

Chapter Outcome 2.

13.2 Introduction to Contingency Analysis

In Chapters 9 and 10 you were introduced to hypothesis tests involving one and two population proportions. Although these techniques are useful in many cases, you will also encounter many situations involving multiple population proportions. For example, a mutual fund company offers six different mutual funds. The president of the company may wish to determine whether the proportion of customers selecting each mutual fund is related to the four sales regions in which the customers reside. A hospital administrator who collects service-satisfaction data from patients might be interested in determining whether there is a significant difference in patient ratings by hospital department. A personnel manager for a large corporation might be interested in determining whether there is a relationship between the level of employee job satisfaction and job classification. In each of these cases, the proportions relate to characteristic categories of the variable of interest. The six mutual funds, four sales regions, hospital departments, and job classifications are all specific categories. These situations involving categorical data call for a new statistical tool known as contingency analysis, which helps make decisions when multiple proportions are involved. Contingency analysis can be used when the level of data measurement is either nominal or ordinal and the values are determined by counting the number of occurrences in each category.

2 × 2 Contingency Tables

BUSINESS APPLICATION: APPLYING CONTINGENCY ANALYSIS

DALGARNO PHOTO, INC. Dalgarno Photo, Inc., gets much of its business from taking photographs for college yearbooks. Dalgarno hired a first-year masters of business administration (MBA) student to develop the survey it mailed to 850 yearbook representatives at the colleges and universities in its market area. The representatives were unaware that Dalgarno Photo had developed the survey. The survey asked about the photography and publishing activities associated with yearbook development. For instance, what photographer and publisher services did the schools use, and what factors were most important in selecting services? The survey instrument contained 30 questions, which were coded into 137 separate variables. Among his many interests in this study, Dalgarno's marketing manager questioned whether college funding source and gender of the yearbook editor were related in some manner. To analyze this issue, we examine these two variables more closely. Source of university funding is a categorical variable, coded as follows:

1 = Private funding
2 = State funding

Of the 221 respondents who provided data for this variable, 155 came from privately funded colleges or universities and 66 were from state-funded institutions.

The second variable, gender of the yearbook editor, is also a categorical variable, with two response categories, coded as follows:

1 = Male
2 = Female

Contingency Table: A table used to classify sample observations according to two or more identifiable characteristics. It is also called a cross-tabulation table.

TABLE 13.3 | Contingency Table for Dalgarno Photo

                 Source of Funding
Gender        Private     State
Male             14          43      57
Female          141          23     164
                155          66     221

Of the 221 responses to the survey, 164 were from females and 57 were from males. In cases in which the variables of interest are both categorical and the decision maker is interested in determining whether a relationship exists between the two, a statistical technique known as contingency analysis is useful. We first set up a two-dimensional table called a contingency table. The contingency table for these two variables is shown in Table 13.3. Table 13.3 shows that 14 of the respondents were males from schools that are privately funded. The numbers at the extreme right and along the bottom are called the marginal frequencies. For example, 57 respondents were males, and 155 respondents were from privately funded institutions. The issue of whether there is a relationship between responses to these two variables is formally addressed through a hypothesis test, in which the null and alternative hypotheses are stated as follows:

H0: Gender of yearbook editor is independent of the college's funding source.
HA: Gender of yearbook editor is not independent of the college's funding source.

If the null hypothesis is true, the population proportion of yearbook editors from private institutions who are male should equal the proportion of male editors from state-funded institutions. These two proportions should also equal the population proportion of male editors without regard to a school's funding source. To illustrate, we can use the sample data to determine the sample proportion of male editors as follows:

pM = Number of male editors / Number of respondents = 57/221 = 0.2579

Then, if the null hypothesis is true, we would expect 25.79% of the 155 privately funded schools, or 39.98 schools, to have a male yearbook editor. We would also expect 25.79% of the 66 state-funded schools, or 17.02, to have male yearbook editors. (Note that the expected numbers need not be integer values. Note also that the expected frequencies in any column or row sum to the marginal frequency.) We can use this reasoning to determine the expected number of respondents in each cell of the contingency table, as shown in Table 13.4. You can simplify the calculations needed to produce the expected values for each cell. Note that the first cell's expected value, 39.98, was obtained by the following calculation:

e11 = 0.2579(155) = 39.98

However, because the probability, 0.2579, is calculated by dividing the row total, 57, by the grand total, 221, the calculation can also be represented as

e11 = (Row total)(Column total) / Grand total = (57)(155)/221 = 39.98

TABLE 13.4 | Contingency Table for Dalgarno Photo

                 Source of Funding
Gender        Private            State
Male          o11 = 14           o12 = 43        57
              e11 = 39.98        e12 = 17.02
Female        o21 = 141          o22 = 23       164
              e21 = 115.02       e22 = 48.98
              155                66             221
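Every expected cell frequency can be generated at once from the marginal totals. A short sketch using numpy's outer product, with the observed counts taken from Table 13.3 (shown as an illustration only):

```python
import numpy as np

# Observed 2 x 2 table (rows: Male, Female; columns: Private, State)
observed = np.array([[14, 43],
                     [141, 23]])

row_totals = observed.sum(axis=1)    # [57, 164]
col_totals = observed.sum(axis=0)    # [155, 66]
grand_total = observed.sum()         # 221

# e_ij = (row total)(column total) / grand total, computed for every cell at once
expected = np.outer(row_totals, col_totals) / grand_total
print(np.round(expected, 2))
```

The result reproduces the expected values in Table 13.4: 39.98 and 17.02 in the first row, 115.02 and 48.98 in the second.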

As a further example, we can calculate the expected value for the next cell in the same row. The expected number of male yearbook editors in state-funded schools is

e12 = (Row total)(Column total) / Grand total = (57)(66)/221 = 17.02

Keep in mind that the row and column totals (the marginal frequencies) must be the same for the expected values as for the observed values. Therefore, when there is only one cell left in a row or a column for which you must calculate an expected value, you can obtain it by subtraction. So, as an example, the expected value e12 could have been calculated as

e12 = 57 − 39.98 = 17.02

Allowing for sampling error, we would expect the actual frequencies in each cell to approximately match the corresponding expected cell frequencies when the null hypothesis is true. The greater the difference between the actual and the expected frequencies, the more likely it is that the null hypothesis of independence is false and should be rejected. The statistical test to determine whether the sample data support or refute the null hypothesis is given by Equation 13.2. Do not be confused by the double summation in Equation 13.2; it merely indicates that all rows and columns must be used in calculating χ². As was the case in the goodness-of-fit tests, the degrees of freedom are the number of independent data values obtained from the experiment. In any given row, once you know c − 1 of the data values, the remaining data value is determined. For instance, once you know that 14 of the 57 male editors were from privately funded institutions, you know that 43 were from state-funded institutions.

Chi-Square Contingency Test Statistic

χ² = Σ(i=1 to r) Σ(j=1 to c) (oij − eij)² / eij,  with df = (r − 1)(c − 1)    (13.2)

where:
oij = Observed frequency in cell (i, j)
eij = Expected frequency in cell (i, j)
r = Number of rows
c = Number of columns

Similarly, once r − 1 data values in a column are known, the remaining data value is determined. Therefore, the degrees of freedom are obtained by the expression (r − 1)(c − 1). Figure 13.7 presents the hypotheses and test results for this example. As was the case in the goodness-of-fit tests, the test statistic has a distribution that can be approximated by the chi-square distribution if the expected values are larger than 5. Note that the calculated chi-square statistic is compared to the tabled value of chi-square for α = 0.05 and degrees of freedom = (2 − 1)(2 − 1) = 1. Because χ² = 76.19 > 3.8415, the null hypothesis of independence should be rejected. Dalgarno Photo representatives should conclude that the gender of the yearbook editor and each school's source of funding are not independent. By examining the data in Figure 13.7, you can see that private schools are more likely to have female editors, whereas state schools are more likely to have male yearbook editors.

EXAMPLE 13-4  2 × 2 CONTINGENCY ANALYSIS

Wireridge Marketing. Before releasing a major advertising campaign to the media, Wireridge Marketing managers run a test on the media material. Recently, they randomly called 100 people and asked them to listen to a commercial that was slated to run nationwide on the radio. At the end of the commercial, the respondents were asked to name the company that was in the advertisement. The company is interested in determining

FIGURE 13.7 | Chi-Square Contingency Analysis Test for Dalgarno Photo

Hypotheses:
H0: Gender of yearbook editor is independent of college's funding source.
HA: Gender of yearbook editor is not independent of college's funding source.
α = 0.05

              Private              State
Male          o11 = 14             o12 = 43
              e11 = 39.98          e12 = 17.02
Female        o21 = 141            o22 = 23
              e21 = 115.02         e22 = 48.98

Test Statistic:
χ² = Σ Σ (oij − eij)² / eij
   = (14 − 39.98)²/39.98 + (43 − 17.02)²/17.02 + (141 − 115.02)²/115.02 + (23 − 48.98)²/48.98
   = 76.19

df = (r − 1)(c − 1) = (1)(1) = 1
χ²(0.05) = 3.8415
Decision Rule: If χ² > 3.8415, reject H0; otherwise, do not reject H0.
Because 76.19 > 3.8415, reject H0. Thus, the gender of the yearbook editor and the school's source of funding are not independent.

whether there is a relationship between gender and a person's ability to recall the company name. To test this, the following steps can be used:

Step 1 Specify the null and alternative hypotheses.
The company is interested in testing whether a relationship exists between gender and recall ability. Here are the appropriate null and alternative hypotheses:
H0: Ability to correctly recall the company name is independent of gender.
HA: Recall ability and gender are not independent.

Step 2 Determine the significance level.
The test will be conducted using a 0.01 level of significance.

Step 3 Determine the critical value.
The critical value for this test will be the chi-square value with (r − 1)(c − 1) = (2 − 1)(2 − 1) = 1 degree of freedom and α = 0.01. From Appendix G, the critical value is 6.6349.

Step 4 Collect the sample data and compute the chi-square test statistic using Equation 13.2.
The following contingency table shows the results of the sampling:

                   Female   Male   Total
Correct Recall       33      25      58
Incorrect Recall     22      20      42
Total                55      45     100

Note that 58 percent of all respondents correctly recalled the company name. If the ability to correctly recall the company name is independent of gender, you would expect the same percentage (58%) to occur for each gender. Thus, 58% of the males [0.58(45) = 26.10] would be expected to have a correct recall. In general, the expected cell frequencies are determined by multiplying the row total by the column total and dividing by the overall sample size. For example, for the cell corresponding to female and correct recall, we get

Expected = (58)(55)/100 = 31.90

The expected cell values for all cells are

                   Female         Male          Total
Correct Recall     o = 33         o = 25          58
                   e = 31.90      e = 26.10
Incorrect Recall   o = 22         o = 20          42
                   e = 23.10      e = 18.90
Total              55             45             100

After checking to make sure all the expected cell frequencies

are at least 5, the test statistic is computed using Equation 13.2:

χ² = Σ Σ (oij − eij)² / eij
   = (33 − 31.90)²/31.90 + (25 − 26.10)²/26.10 + (22 − 23.10)²/23.10 + (20 − 18.90)²/18.90
   = 0.20

Step 5 Reach a decision.
Because χ² = 0.20 < 6.6349, do not reject the null hypothesis.

Step 6 Draw a conclusion.
Based on the sample data, there is no reason to believe that the ability to recall the name of the company in the ad is related to gender.

>>END EXAMPLE
TRY PROBLEM 13-17 (pg. 569)

r × c Contingency Tables

BUSINESS APPLICATION: LARGER CONTINGENCY TABLES

Excel and Minitab Tutorial

BENTON STONE & TILE Benton Stone & Tile makes a wide variety of products for the building industry. It pays market wages, provides competitive benefits, and offers attractive options for employees in an effort to create a satisfied workforce and reduce turnover. Recently, however, several supervisors have complained that employee absenteeism is becoming a problem. In response to these complaints, the human resources manager studied a random

sample of 500 employees. One aim of this study was to determine whether there is a relationship between absenteeism and marital status. Absenteeism during the past year was broken down into three levels:

1. 0 absences
2. 1 to 5 absences
3. Over 5 absences

Marital status was divided into four categories:

1. Single      2. Married
3. Divorced    4. Widowed

Table 13.5 shows the contingency table for the sample of 500 employees. The table is also shown in the file Benton. The null and alternative hypotheses to be tested are

H0: Absentee behavior is independent of marital status.
HA: Absentee behavior is not independent of marital status.

As with 2 × 2 contingency analysis, the test for independence can be made using the chi-square test, where the expected cell frequencies are compared to the actual cell frequencies and the test statistic shown as Equation 13.2 is used. The logic of the test says that if the actual and expected frequencies closely match, then the null hypothesis of independence is not rejected. However, if the actual and expected cell frequencies are substantially different overall, the null hypothesis of independence is rejected. The calculated chi-square statistic is compared to an Appendix G critical value for the desired significance level and degrees of freedom equal to (r − 1)(c − 1). The expected cell frequencies are determined assuming that the row and column variables are independent. This means, for example, that the probability of a married person being absent more than 5 days during the year is the same as the probability of any employee being absent more than 5 days. An easy way to compute the expected cell frequencies, eij, is given by Equation 13.3.

Expected Cell Frequencies

eij = (ith row total)(jth column total) / total sample size    (13.3)

For example, the expected cell frequency for row 1, column 1 is

e11 = (200)(200)/500 = 80

and the expected cell frequency for row 2, column 3 is

e23 = (150)(100)/500 = 30

TABLE 13.5 | Contingency Table for Benton Stone & Tile

                         Absentee Rate
Marital Status      0      1–5    Over 5   Row Totals
Single              84      82      34        200
Married             50      64      36        150
Divorced            50      34      16        100
Widowed             16      20      14         50
Column Total       200     200     100        500
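The whole Benton analysis (expected frequencies, test statistic, and degrees of freedom) can be reproduced in a few lines. A sketch using scipy's contingency-table routine, shown only as an illustration alongside the text's Excel and Minitab procedures:

```python
import numpy as np
from scipy.stats import chi2, chi2_contingency

# Observed frequencies from Table 13.5 (rows: marital status, columns: absentee rate)
observed = np.array([[84, 82, 34],
                     [50, 64, 36],
                     [50, 34, 16],
                     [16, 20, 14]])

# correction=False: the Yates continuity correction applies only to 2 x 2 tables
stat, p_value, dof, expected = chi2_contingency(observed, correction=False)

critical = chi2.ppf(0.95, dof)   # alpha = 0.05, df = (4 - 1)(3 - 1) = 6
print(f"chi-square = {stat:.3f}, df = {dof}, critical = {critical:.4f}")
print("reject H0" if stat > critical else "do not reject H0")
```

The routine returns the expected-frequency matrix as well, so the table of e values built from Equation 13.3 can be checked against `expected` directly.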

FIGURE 13.8A | Excel 2007 Output—Benton Stone & Tile Contingency Analysis Test

Excel 2007 Instructions:
1. Open file: Benton.xls.
2. Compute expected cell frequencies using an Excel formula.
3. Compute the chi-square statistic using an Excel formula.
(Expected frequency found using =(D$16*$F13)/$F$16.)

Figures 13.8a and 13.8b show the completed contingency table with the actual and expected cell frequencies that were developed using Excel and Minitab. The calculated chi-square test value is computed as follows:

χ² = Σ Σ (oij − eij)² / eij
   = (84 − 80)²/80 + (82 − 80)²/80 + . . . + (20 − 20)²/20 + (14 − 10)²/10
   = 10.88

FIGURE 13.8B | Minitab Output—Benton Stone & Tile Contingency Analysis Test

Minitab Instructions:
1. Open file: Benton.MTW.
2. Choose Stat > Tables > Chi-Square Test.
3. In Columns containing the table, enter data columns.
4. Click OK.

<span class='text_page_counter'>(106)</span> CHAPTER 13. |. Goodness-of-Fit Tests and Contingency Analysis. 569. The degrees of freedom are (r  1)(c  1)  (4  1)(3  1)  6. You can use the chi-square table in Appendix G to get the chi-square critical value for a  0.05 and 6 degrees of freedom, or you can use Minitab’s Probability Distributions command or Excel’s CHIINV function (CHIINV(0.05,6)  12.5916). Because the calculated chi-square value (10.883) shown in Figures 13.8a and 13.8b is less than 12.5916, we cannot reject the null hypothesis. Based on these sample data, there is insufficient evidence to conclude that absenteeism and marital status are not independent.. Chi-Square Test Limitations The chi-square distribution is only an approximation for the true distribution for contingency analysis. We use the chi-square approximation because the true distribution is impractical to compute in most instances. However, the approximation (and, therefore, the conclusion reached) is quite good when all expected cell frequencies are at least 5.0. When expected cell frequencies drop below 5.0, the calculated chi-square value tends to be inflated and may inflate the true probability of a Type I error beyond the stated significance level. As a rule, if the null hypothesis is not rejected, you do not need to worry when the expected cell frequencies drop below 5.0. There are two alternatives that can be used to overcome the small expected-cell-frequency problem. The first is to increase the sample size. This may increase the marginal frequencies in each row and column enough to increase the expected cell frequencies. The second option is to combine the categories of the row and/or column variables. If you do decide to group categories together, there should be some logic behind the resulting categories. You don’t want to lose the meaning of the results through poor groupings. 
You will need to examine each situation individually to determine whether the option of grouping classes to increase expected cell frequencies makes sense.

13-2: Exercises

Skill Development

13-16. The billing department of a national cable service company is conducting a study of how customers pay their monthly cable bills. The cable company accepts payment in one of four ways: in person at a local office, by mail, by credit card, or by electronic funds transfer from a bank account. The cable company randomly sampled 400 customers to determine if there is a relationship between the customer's age and the payment method used. The following sample results were obtained:

                              Age of Customer
Payment Method        20–30    31–40    41–50    Over 50
In Person                8       12       11        13
By Mail                 29       67       72        50
By Credit Card          26       19        5         7
By Funds Transfer       23       35       17         6

Based on the sample data, can the cable company conclude that there is a relationship between the age of the customer and the payment method used? Conduct the appropriate test at the α = 0.01 level of significance.

13-17. A contingency analysis table has been constructed from data obtained in a phone survey of customers in a market area in which respondents were asked to indicate whether they owned a domestic or foreign car and whether they were a member of a union or not. The following contingency table is provided.

                Union
Car           Yes     No
Domestic      155    470
Foreign        40    325

a. Use the chi-square approach to test whether type of car owned (domestic or foreign) is independent of union membership. Test using an α = 0.05 level.
b. Calculate the p-value for this hypothesis test.

13-18. Utilize the following contingency table to answer the questions listed below.

        C1     C2
R1      51    207
R2     146    185
R3     240    157

a. State the relevant null and alternative hypotheses.
b. Calculate the expected values for each of the cells.
c. Compute the chi-square test statistic for the hypothesis test.

d. Determine the appropriate critical value and reach a decision for the hypothesis test. Use a significance level of 0.05.
e. Obtain the p-value for this hypothesis test.

13-19. A manufacturer of sports drinks has randomly sampled 198 men and 202 women. Each sampled participant was asked to taste an unflavored version and a flavored version of a new sports drink currently in development. The participants' preferences are shown below:

           Flavored    Unflavored
Men           101           97
Women          68          134

a. State the relevant null and alternative hypotheses.
b. Conduct the appropriate test and state a conclusion. Use a level of significance of 0.05.

Business Applications

13-20. A marketing research firm is conducting a study to determine if there is a relationship between an individual's age and the individual's preferred source of news. The research firm asked 1,000 individuals to list their preferred source for news: newspaper, radio and television, or the Internet. The following results were obtained:

                           Age of Respondent
Preferred News Source    20–30   31–40   41–50   Over 50
Newspaper                  19      62      95      147
Radio/TV                   27     125     168       88
Internet                  104     113      37       15

At the 0.01 level of significance, can the marketing research firm conclude that there is a relationship between the age of the individual and the individual's preferred source for news?

13-21. A loan officer wished to determine if the marital status of loan applicants was independent of the approval of loans. The following table presents the result of her survey:

            Approved    Rejected
Single         213         189
Married        374         231
Divorced       358         252

a. Conduct the appropriate hypothesis test that will provide an answer to the loan officer. Use a significance level of 0.01.
b. Calculate the p-value for the hypothesis test in part a.

13-22. An instructor in a large accounting class is interested in determining whether the grades that students get are related to how close to the front of the room the students sit. He has categorized the room seating as "Front," "Middle," and "Back." The following data were collected over two sections with 400 total students. Based on the sample data, can you conclude that there is a dependency relationship between seating location and grade using a significance level equal to 0.05?

           A     B     C     D    F   Total
Front     18    55    30     3    0    106
Middle     7    42    95    11    1    156
Back       3    15   104    14    2    138
Total     28   112   229    28    3    400

13-23. A study was conducted to determine if there is a difference between the investing preferences of midlevel managers working in the public and private sectors in New York City. A random sample of 320 public sector employees and 380 private sector employees was taken. The sampled participants were then asked about their retirement investment decisions and classified as being either "aggressive," if they invested only in stocks or stock mutual funds, or "balanced," if they invested in some combination of stocks, bonds, cash, and other. The following results were found:

           Aggressive    Balanced
Public        164           156
Private       236           144

a. State the hypothesis of interest and conduct the appropriate hypothesis test to determine whether there is a relationship between employment sector and investing preference. Use a level of significance of 0.01.
b. State the conclusion of the test conducted in part a.
c. Calculate the p-value for the hypothesis test conducted in part a.

13-24. The following table classifies a stock's price change as up, down, or no change for both today's and yesterday's prices. Price changes were examined for 100 days. A financial theory states that stock prices follow what is called a "random walk." This means, in part, that the price change today for a stock must be independent of yesterday's price change. Test the hypothesis that daily stock price changes for this stock are independent. Let α = 0.05.

Price Change          Price Change Today
Previous Day        Up    No Change    Down
Up                  14        16        12
No Change            6         8         6
Down                16        14         8

13-25. A local appliance retailer handles four washing machine models for a major manufacturer: standard, deluxe, superior, and XLT. The marketing manager has recently conducted a study on the purchasers of the washing machines. The study recorded the model of appliance purchased and the credit account balance of

the customer at the time of purchase. The sample data are in the following table. Based on these data, is there evidence of a relationship between the account balance and the model of washer purchased? Use a significance level of 0.025. Conduct the test using a p-value approach.

                         Washer Model Purchased
Credit Balance      Standard   Deluxe   Superior   XLT
Under $200             10        16        40        5
$200–$800               8        12        24       15
Over $800              16        12        16       30

13-26. A random sample of 980 heads of households was taken from the customer list for State Bank and Trust. Those sampled were asked to classify their own attitudes and their parents' attitudes toward borrowing money as follows:

A: Borrow only for real estate and car purchases
B: Borrow for short-term purchases such as appliances and furniture
C: Never borrow money

The following table indicates the responses from those in the study.

               Respondent
Parent       A      B      C
A          240     80     20
B          180    120     40
C          180     80     40

Test the hypothesis that the respondents' borrowing habits are independent of what they believe their parents' attitudes to be. Let α = 0.01.

13-27. The California Lettuce Research Board was originally formed as the Iceberg Lettuce Advisory Board in 1973. The primary function of the board is to fund research on iceberg and leaf lettuce. A recent project involved studying the effect of varying levels of sodium absorption ratios (SAR) on the yield of head lettuce. The measurements (the number of lettuce heads harvested from each plot) were as follows:

          Lettuce Type
SAR     Salinas   Sniper
3         104       109
5         160       163
7         142       146
10        133       156

a. Determine if the number of lettuce heads harvested for the two lettuce types is independent of the levels of sodium absorption ratios (SAR). Use a significance level of 0.025 and a p-value approach.
b. Which type of lettuce would you recommend?

13-28. In its ninth year, the Barclaycard Business Travel Survey has become an information source for business travelers not only in the United Kingdom but internationally as well. Each year, as a result of the research, Barclaycard Business has been able to predict and comment on trends within the business travel industry. One question asked in the 2003/2004 and 2004/2005 surveys was, "Have you considered reducing hours spent away from home to increase quality of life?" The following table represents the responses:

                         2003/2004   2004/2005   Total
Yes—have reduced            400         384        784
Yes—not been able to        400         300        700
No—not certain              400         516        916
Total                      1200        1200       2400

a. Determine if the response to the survey question was independent of the year in which the question was asked. Use a significance level of 0.05.
b. Determine if there is a significant difference between the proportion of travelers who say they have reduced hours spent away from home between the 2003/2004 and the 2004/2005 years.

Computer Database Exercises

13-29. Daniel Vinson of the University of Missouri–Columbia led a team of researchers investigating the increased risk, when people are angry, of serious injuries in the workplace requiring emergency medical care. The file entitled Angry contains the data collected by the team of researchers. It displays the emotions reported by patients just before they were injured.
a. Use the data in the file entitled Angry to construct a contingency table.
b. Determine if the type of emotion felt by patients just before they were injured is independent of the severity of that emotion. Use a contingency analysis and a significance level of 0.05.

13-30. Gift of the Gauche, a left-handedness information Web site (www.left-handedness.info), provides information concerning left-handed activities, products, and demography. It indicates that about 10%–11% of the population of Europe and North America are left-handed. It also reports on demographic surveys. It cites an American study of over one million magazine respondents, which found that 12.6% of the male respondents were left-handed, as were 9.9% of the female respondents, although this was not a random sample. The data obtained by a British survey of over 8,000 randomly selected men and women, published in Issue 37 of The Graphologist in 1992, are furnished in a file entitled Lefties. Based on these data, determine if the "handedness" of an individual is independent of gender. Use a significance level of 0.01 and a p-value approach.

13-31. The Marriott Company owns and operates a number of different hotel chains, including the Courtyard chain. Recently, a survey was mailed to a random sample of

400 Courtyard customers. A total of 62 customers responded to the survey. The customers were asked a number of questions related to their satisfaction with the chain as well as several demographic questions. Among the issues of interest was whether there is a relationship between the likelihood that customers will stay at the chain again and whether or not this was the customer's first stay at the Courtyard. The following contingency table has been developed from the data set contained in the file called CourtyardSurvey:

                        First Stay
Stay Again?          Yes    No    Total
Definitely Will        9    12      21
Probably Will         18     2      20
Maybe                 15     3      18
Probably Not           2     1       3
Total                 44    18      62

Using a significance level equal to 0.05, test to see whether these sample data imply a relationship between the two variables. Discuss the results.

13-32. ECCO (Electronic Controls Company) makes backup alarms that are used on such equipment as forklifts and delivery trucks. The quality manager recently performed a study involving a random sample of 110 warranty claims. One of the questions the manager wanted to answer was whether there is a relationship between the type of warranty complaint and the plant at which the alarm was made. The data are in the file called ECCO.
a. Calculate the expected values for the cells in this analysis. Suggest a way in which cells can be combined to assure that the expected value of each cell is at least 5 so that as many level combinations of the two variables as possible are retained.
b. Using a significance level of 0.01, conduct a relevant hypothesis test and provide an answer to the manager's question.

13-33. Referring to Problem 13-32, can the quality control manager conclude that the type of warranty problem is independent of the shift on which the alarm was manufactured? Test using a significance level of 0.05. Discuss your results.

END EXERCISES 13-2
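One connection worth noting before leaving these exercises: for a 2 × 2 table such as the one in Exercise 13-17, the uncorrected chi-square test of independence is algebraically equivalent to the two-tailed two-proportion z test, with the chi-square statistic equal to z squared. A sketch of the equivalence, assuming scipy and numpy are available:

```python
# Demonstrating chi-square = z^2 for a 2 x 2 table (Exercise 13-17 data).
import math
import numpy as np
from scipy.stats import chi2_contingency

observed = np.array([[155, 470],   # domestic: union yes / no
                     [40, 325]])   # foreign:  union yes / no

chi_sq = chi2_contingency(observed, correction=False)[0]

# Two-proportion z test comparing P(union | domestic) with P(union | foreign)
n1, n2 = observed[0].sum(), observed[1].sum()
p1, p2 = observed[0, 0] / n1, observed[1, 0] / n2
p_pool = observed[:, 0].sum() / (n1 + n2)
z = (p1 - p2) / math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))

print(abs(chi_sq - z ** 2) < 1e-6)   # the two statistics agree
```

This is the equivalence the conceptual questions at the end of the chapter ask you to explore.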

Visual Summary

Chapter 13: Many of the statistical procedures introduced in earlier chapters require that the sample data come from populations that are normally distributed and that the data be measured at least at the interval level. However, you will encounter many situations in which these specifications are not met. This chapter introduces two sets of procedures that address these issues in turn. Goodness-of-fit procedures are used to determine whether sample data have been drawn from a specific hypothesized distribution. Contingency analysis is an often-used technique for determining the relationship between qualitative variables. Though these procedures are not used as much as those requiring normally distributed populations, you will discover that far too few of the procedures presented in this chapter are used when they should be. It is, therefore, important that you learn when and how to use them.

13.1 Introduction to Goodness-of-Fit Tests (pg. 548–562)

Summary The chi-square goodness-of-fit test can be used to determine if a set of data comes from a specific hypothesized distribution. Recall that several of the procedures presented in Chapters 8–12 require that the sampled populations are normally distributed. For example, tests involving the t-distribution are based on such a requirement. In order to verify this requirement, the goodness-of-fit test determines if the observed set of values agrees with a set of data obtained from a specified probability distribution. Perhaps the goodness-of-fit test is most often used to verify a normal distribution. However, it can be used to detect many other probability distributions.

Outcome 1. Utilize the chi-square goodness-of-fit test to determine whether data from a process fit a specified distribution.

13.2 Introduction to Contingency Analysis (pg. 562–572)

Summary You will encounter many business situations in which the level of data measurement for the variable of interest is either nominal or ordinal, not interval or ratio. In Chapters 9 and 10 you were introduced to hypothesis tests involving one and two population proportions. However, you will also encounter many situations involving multiple population proportions for which two-population procedures are not applicable. In each of these cases, the proportions relate to characteristic categories of the variable of interest. These situations involving categorical data call for a new statistical tool known as contingency analysis to help make decisions when multiple proportions are involved. Contingency analysis can be used when the level of data measurement is either nominal or ordinal and the values are determined by counting the number of occurrences in each category.

Outcome 2. Set up a contingency table analysis and perform a chi-square test of independence.

Conclusion This chapter has introduced two very useful statistical tools: goodness-of-fit tests and contingency analysis. Goodness-of-fit testing is used when a decision maker wishes to determine whether sample data come from a population having specific characteristics. The chi-square goodness-of-fit procedure introduced in this chapter addresses this issue. This test relies on the idea that if the distribution of the sample data is substantially different from the hypothesized population distribution, then the population distribution from which these sample data came must not be what was hypothesized. Contingency analysis is a frequently used statistical tool that allows the decision maker to test whether responses to two variables are independent. Market researchers, for example, use contingency analysis to determine whether attitude about the quality of their company's product is independent of the gender of a customer. By using contingency analysis and the chi-square contingency test, they can make this determination based on a sample of customers.
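The goodness-of-fit side summarized above reduces to comparing observed category counts with the counts expected under the hypothesized distribution. A minimal sketch, assuming scipy is available (the counts are the four checkout-stand totals from Chapter Exercise 13-38, with equal proportions hypothesized):

```python
# Chi-square goodness-of-fit test for equal proportions across 4 categories.
from scipy.stats import chisquare

observed = [338, 275, 201, 186]          # customers by checkout stand
n = sum(observed)
expected = [n / 4] * 4                   # H0: each stand serves 25% of customers

stat, p_value = chisquare(observed, f_exp=expected)
print(f"chi-square = {stat:.3f}, p-value = {p_value:.2e}")
```

With a p-value this small, the hypothesis of equal use would be rejected at any common significance level.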

Equations

(13.1) Chi-Square Goodness-of-Fit Test Statistic (pg. 550)

    χ² = Σᵢ₌₁ᵏ (oᵢ − eᵢ)² / eᵢ

(13.2) Chi-Square Contingency Test Statistic (pg. 564)

    χ² = Σᵢ₌₁ʳ Σⱼ₌₁ᶜ (oᵢⱼ − eᵢⱼ)² / eᵢⱼ,  with df = (r − 1)(c − 1)

(13.3) Expected Cell Frequencies (pg. 567)

    eᵢⱼ = (i-th row total × j-th column total) / total sample size

Key Term
Contingency table (pg. 563)

Chapter Exercises

Conceptual Questions

13-34. Locate a journal article that uses either contingency analysis or a goodness-of-fit test. Discuss the article, paying particular attention to the reasoning behind using the particular statistical test.

13-35. Find a marketing research book (or borrow one from a friend). Does it discuss either of the tests considered in this chapter? If yes, outline the discussion. If no, determine where in the text such a discussion would be appropriate.

13-36. One of the topics in Chapter 10 was hypothesis testing for the difference between two population proportions. For the test to have validity, there were conditions set on the sample sizes with respect to the sample proportions. A 2 × 2 contingency table may also be utilized to test the difference between proportions of two independent populations. This procedure has conditions placed on the expected value of each cell. Discuss the relationship between these two conditions.

13-37. A 2 × 2 contingency table and a hypothesis test of the difference between two population proportions can be used to analyze the same data set. However, despite the similarities of the two methods, the hypothesis test of the difference between two proportions has two advantages. Identify these advantages.

Business Applications

13-38. The College Bookstore has just hired a new manager, one with a business background. Claudia Markman has been charged with increasing the profitability of the bookstore, with the profits going to the general scholarship fund. Claudia started her job just before the beginning of the semester and was analyzing the sales during the days when students are buying their textbooks. The store has four checkout stands, and Claudia noticed registers three and four served more students than registers one and two. She is not sure whether the layout of the store channels customers into these registers, whether the checkout clerks in these lines are simply slower than the other two, or whether she was just seeing random differences. Claudia kept a record of which stands the next 1,000 students chose for checkout. The students checked out of the four stands according to the following pattern:

Stand 1    Stand 2    Stand 3    Stand 4
  338        275        201        186

a. Based on these data, can Claudia conclude the proportion of students using the four checkout stands is equal? (Use α = 0.05.)
b. A friend suggested that you could just as well conduct four hypothesis tests that the proportion of customers visiting each stand is equal to p = 0.25. Discuss the merits of this suggestion.

13-39. A regional cancer treatment center has had success treating localized cancers with a linear accelerator. Whereas admissions for further treatment nationally average 2.1 per patient per year, the center's director thinks that re-admissions with the new treatment are Poisson distributed, with a mean of 1.2 patients

per year. He has collected the following data on a random sample of 300 patients:

Re-admissions Last Year    Patients
0                            139
1                             87
2                             48
3                             14
4                              8
5                              1
6                              1
7                              0
8                              2
Total                        300

a. Adjust the data so that you can test the director's claim using a test statistic whose sampling distribution can be approximated by a chi-square distribution.
b. Assume the Type I error rate is to be controlled at 0.05. Do you agree with the director's claim? Why? Conduct a statistical procedure to support your opinion.

13-40. Cooper Manufacturing, Inc., of Dallas, Texas, has a contract with the U.S. Air Force to produce a part for a new fighter plane being manufactured. The part is a bolt that has specifications requiring that the length be normally distributed with a mean of 3.05 inches and a standard deviation of 0.015 inch. As part of the company's quality control efforts, each day Cooper's engineers select a random sample of 100 bolts produced that day and carefully measure the bolts to determine whether the production is within specifications. The following data were collected yesterday:

Length (inches)            Frequency
Under 3.030                    5
3.030 and under 3.035         16
3.035 and under 3.040          7
3.040 and under 3.050         20
3.050 and under 3.060         36
3.060 and under 3.065          8
3.065 and over                 8

Based on these sample data, what should Cooper's engineers conclude about the production output if they test using α = 0.01? Discuss.

13-41. The Cooper Company discussed in Problem 13-40 has a second contract with a private firm for which it makes fuses for an electronic instrument. The quality control department at Cooper periodically selects a random sample of five fuses and tests each fuse to determine whether it is defective. Based on these findings, the production process is either shut down (if too many defectives are observed) or allowed to run. The quality control department believes that the sampling process follows a binomial distribution, and it has been using the binomial distribution to compute the probabilities associated with the sampling outcomes. The contract allows for at most 5% defectives. The head of quality control recently compiled a list of the sampling results for the past 300 days in which five randomly selected fuses were tested, with the following frequency distribution for the number of defectives observed. She is concerned that the binomial distribution with a sample size of 5 and a probability of defectives of 0.05 may not be appropriate.

Number of Defectives    Frequency
0                          209
1                           33
2                           43
3                           10
4                            5
5                            0

a. Calculate the expected values for the cells in this analysis. Suggest a way in which cells can be combined to assure that the expected value of each cell is at least 5.
b. Using a significance level of 0.10, what should the quality control manager conclude based on these sample data? Discuss.

13-42. A survey performed by Simmons Market Research investigated the percentage of individuals in various age groups who indicated they were willing to pay more for environmentally friendly products. The results were presented in USA Today "Snapshots" (July 21, 2005). The survey had approximately 3,240 respondents in each age group. Results of the survey follow:

Age Group      18–24   25–34   35–44   45–54   55–64   65 and over
Percentage       11      17      19      20      14        19

Conduct a goodness-of-fit analysis to determine if the proportions of individuals willing to pay more for environmentally friendly products in the various age groups are equal. Use a significance level of 0.01.

13-43. An article published in USA Today asserts that many children are abandoning outdoor for indoor activities. The National Sporting Goods Association annual survey for 2004 (the latest data available) compared activity levels in 1995 versus 2004. A random selection

of children (7- to 11-year-olds) indicating their favorite outdoor activity is given in the following table:

        Bicycling   Swimming   Baseball   Fishing   Touch Football
1995       68          60         29        25           16
2004       47          42         22        18           10

Construct a contingency analysis to determine if the type of preferred outdoor activity is dependent on the year in this survey. Use a significance level of 0.05 and the p-value approach.

Computer Database Exercises

13-44. With the economic downturn that started in late 2007, many people started worrying about retirement and whether they even would be able to retire. A study recently done by the Employee Benefit Research Institute (EBRI) found about 69% of workers said they and/or their spouses have saved for retirement. The file entitled Retirement contains the total savings and investments indicated. Use a contingency analysis to determine if the amount of total savings and investments is dependent on the age of the worker.

13-45. The airport manager at the Sacramento, California, airport recently conducted a study of passengers departing from the airport. A random sample of 100 passengers was selected. The data are in the file called Airline Passengers. An earlier study showed the following usage by airline:

Delta        20%
Horizon      10%
Northwest    10%
Skywest       3%
Southwest    25%
United       32%

a. If the manager wishes to determine whether the airline usage pattern has changed from that reported in the earlier study, state the appropriate null and alternative hypotheses.
b. Based on the sample data, what should be concluded? Test using a significance level of 0.01.

13-46. A pharmaceutical company is planning to market a drug that is supposed to help reduce blood pressure. The company claims that if the drug is taken properly, the amount of blood pressure decrease will be normally distributed with a mean equal to 10 points on the diastolic reading and a standard deviation equal to 4.0. One hundred patients were administered the drug, and data were collected showing the reduction in blood pressure at the end of the test period. The data are located in the file labeled Blood Pressure.
a. Using a goodness-of-fit test and a significance level equal to 0.05, what conclusion should be reached with respect to the distribution of diastolic blood pressure reduction? Discuss.
b. Conduct a hypothesis test to determine if the standard deviation for this population could be considered to be 4.0. Use a significance level of 0.10.
c. Given the results of the two tests in parts a and b, is it appropriate to construct a confidence interval based on a normal distribution with a population standard deviation of 4.0? Explain your answer.
d. If appropriate, construct a 99% confidence interval for the mean reduction in blood pressure. Based on this confidence interval, does an average diastolic loss of 10 seem reasonable for this procedure? Explain your reasoning.

13-47. An Ariel Capital Management and Charles Schwab survey addressed the proportion of African-Americans and White Americans who have money invested in the stock market. Suppose the file entitled Stockrace contains data obtained in the surveys. The survey asked 500 African-American and 500 White respondents if they personally had money invested in the stock market.
a. Create a contingency table using the data in the file Stockrace.
b. Conduct a contingency analysis to determine if the proportion of African-Americans differs from the proportion of White Americans who invest in stocks. Use a significance level of 0.05.

13-48. The state transportation department recently conducted a study of motorists in Idaho. Two main factors of interest were whether the vehicle was insured with liability insurance and whether the driver was wearing a seat belt. A random sample of 100 cars was stopped at various locations throughout the state. The data are in the file called Liabins. The investigators were interested in determining whether seat belt status is independent of insurance status. Conduct the appropriate hypothesis test using a 0.05 level of significance and discuss your results.
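Exercises 13-40 and 13-46 (and Case 13.1 below) all require expected bin frequencies under a hypothesized normal distribution before the goodness-of-fit statistic can be formed. A sketch of that step, assuming scipy is available; the hypothesized spec and bin edges are those of Exercise 13-40:

```python
# Expected bin frequencies under N(mu, sigma) for a goodness-of-fit test.
from scipy.stats import norm

mu, sigma, n = 3.05, 0.015, 100   # hypothesized spec and sample size (13-40)
edges = [3.030, 3.035, 3.040, 3.050, 3.060, 3.065]   # interior bin boundaries

# Cumulative probabilities at the boundaries, padded for the open-ended tails
cdf = [0.0] + [norm.cdf(x, mu, sigma) for x in edges] + [1.0]
probs = [b - a for a, b in zip(cdf, cdf[1:])]   # P(bin) under H0
expected = [n * p for p in probs]               # expected frequency, 7 bins

for e in expected:
    print(round(e, 2))
```

These expected counts would then be paired with the observed frequencies in Equation 13.1; bins whose expected counts fall below 5 would typically be combined first.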

Case 13.1 American Oil Company

Chad Williams sat back in his airline seat to enjoy the hour-long flight between Los Angeles and Oakland, California. The hour would give him time to reflect on his upcoming trip to Australia and the work he had been doing the past week in Los Angeles.

Chad is one man on a six-man crew employed by the American Oil Company to literally walk the earth searching for oil. His college degrees in geology and petroleum engineering landed him the job with American, but he never dreamed he would be doing the exciting work he now does. Chad and his crew spend several months in special locations around the world using highly sensitive electronic equipment for oil exploration. The upcoming trip to Australia is one that Chad has been looking forward to since it was announced that his crew would be going there to search the Outback for oil.

In preparation for the trip, the crew has been in Los Angeles at American's engineering research facility working on some new equipment that will be used in Australia. Chad's thoughts centered on the problem he was having with a particular component part on the new equipment. The specifications called for 200 of the components, with each having a diameter of between 0.15 and 0.18 inch. The only available supplier of the component manufactures the components in New Jersey to specifications calling for normally distributed output, with a mean of 0.16 inch and a standard deviation of 0.02 inch.

Chad faces two problems. First, he is unsure that the supplier actually does produce parts with a mean of 0.16 inch and a standard deviation of 0.02 inch according to a normal distribution. Second, if the parts are made to specifications, he needs to determine how many components to purchase if enough acceptable components are to be received to make two oil exploration devices.
The supplier has sent Chad the following data for 330 randomly selected components. Chad believes that the supplier is honest and that he can rely on the data.. Diameter (Inch) Under 0.14 0.14 and under 0.15 0.15 and under 0.16 0.16 and under 0.17 0.17 and under 0.18 Over 0.18 Total. Frequency 5 70 90 105 50 10 330. Chad needs to have a report ready for Monday indicating whether he believes the supplier delivers at its stated specifications and, if so, how many of the components American should order to have enough acceptable components to outfit two oil exploration devices.. Required Tasks: 1. State the problems faced by Chad Williams. 2. Identify the statistical test Chad Williams can use to determine whether the supplier’s claim is true or not. 3. State the null and alternative hypotheses for the test to determine whether the supplier’s claim is true or not. 4. Assuming that the supplier produces output whose diameter is normally distributed with a mean of 0.16 inches and a standard deviation of 0.02 inches, determine the expected frequencies that Chad would expect to see in a sample of 330 components. 5. Based on the observed and expected frequencies, calculate the appropriate test statistic. 6. Calculate the critical value of the test statistic. Select an alpha value. 7. State a conclusion. Is the supplier’s claim with respect to specifications of the component parts supported by the sample data? 8. Provide a short report that summarizes your analysis and conclusion.. Case 13.2 Bentford Electronics—Part 1 On Saturday morning, Jennifer Bentford received a call at her home from the production supervisor at Bentford Electronics Plant 1. The supervisor indicated that she and the supervisors from Plants 2, 3, and 4 had agreed that something must be done to improve company morale and thereby increase the production output of their plants. Jennifer Bentford, president of Bentford Electronics, agreed to set up a Monday morning meeting with the. 
supervisors to see if they could arrive at a plan for accomplishing these objectives. By Monday each supervisor had compiled a list of several ideas, including a four-day work week and interplant competitions of various kinds. A second meeting was set for Wednesday to discuss the issue further.
Following the Wednesday afternoon meeting, Jennifer Bentford and her plant supervisors agreed to implement a weekly contest called the NBE Game of the Week. The plant producing the

most each week would be considered the NBE Game of the Week winner and would receive 10 points. The second-place plant would receive 7 points, and the third- and fourth-place plants would receive 3 points and 1 point, respectively. The contest would last 26 weeks. At the end of that period, a $200,000 bonus would be divided among the employees in the four plants proportional to the total points accumulated by each plant.
The announcement of the contest created a lot of excitement and enthusiasm at the four plants. No one complained about the rules because the four plants were designed and staffed to produce equally.
At the close of the contest, Jennifer Bentford called the supervisors into a meeting, at which time she asked for data to determine whether the contest had significantly improved productivity. She indicated that she had to know this before she could authorize a second contest. The supervisors, expecting this request, had put together the following data:

    Units Produced          Before-Contest    During-Contest
    (4 Plants Combined)     Frequency         Frequency
    0–2,500                       11                 0
    2,501–8,000                   23                20
    8,001–15,000                  56                83
    15,001–20,000                 15                52
    Total                    105 days          155 days

Jennifer examined the data and indicated that the contest looked to be a success, but she wanted to base her decision to continue the contest on more than just an observation of the data. "Surely there must be some way to statistically test the worthiness of this contest," Jennifer stated. "I have to see the results before I will authorize the second contest."
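Required Tasks 4 through 6 of Case 13.1 amount to a chi-square goodness-of-fit calculation. The sketch below is purely illustrative and is not part of the text (which works the cases by hand or in Excel); it computes the expected bin frequencies under the supplier's claimed N(0.16, 0.02) distribution and the resulting chi-square statistic for the 330 sampled components.

```python
import math

def normal_cdf(x, mu=0.16, sigma=0.02):
    # Normal CDF via the error function, so no external packages are needed
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

n = 330
edges = [0.14, 0.15, 0.16, 0.17, 0.18]   # class boundaries from the case
observed = [5, 70, 90, 105, 50, 10]      # supplier's sample data

# Bin probabilities under the claimed N(0.16, 0.02) distribution
cdf = [0.0] + [normal_cdf(e) for e in edges] + [1.0]
probs = [cdf[i + 1] - cdf[i] for i in range(len(cdf) - 1)]
expected = [n * p for p in probs]

chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1   # no parameters estimated from the sample

print([round(e, 1) for e in expected])  # roughly [52.4, 49.5, 63.2, 63.2, 49.5, 52.4]
print(round(chi2, 1))                   # far beyond any usual chi-square critical value at df = 5
```

Comparing this statistic with a chi-square critical value for 5 degrees of freedom at the chosen alpha (Task 6) drives the conclusion in Task 7.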

Chapter 14 Quick Prep Links

• Review the methods for testing a null hypothesis using the t-distribution in Chapter 9.
• Review confidence intervals discussed in Chapter 8.
• Make sure you review the discussion about scatter plots in Chapter 2.
• Review the concepts associated with selecting a simple random sample in Chapter 1.
• Review the F-distribution and the approach for finding critical values from the F-table as discussed in Chapters 11 and 12.

chapter 14

Introduction to Linear Regression and Correlation Analysis

14.1 Scatter Plots and Correlation (pg. 580–589)
Outcome 1. Calculate and interpret the correlation between two variables.
Outcome 2. Determine whether the correlation is significant.

14.2 Simple Linear Regression Analysis (pg. 589–612)
Outcome 3. Calculate the simple linear regression equation for a set of data and know the basic assumptions behind regression analysis.
Outcome 4. Determine whether a regression model is significant.

14.3 Uses for Regression Analysis (pg. 612–623)
Outcome 5. Recognize regression analysis applications for purposes of description and prediction.
Outcome 6. Calculate and interpret confidence intervals for the regression analysis.
Outcome 7. Recognize some potential problems if regression analysis is used incorrectly.

Why you need to know

Although some business situations involve only one variable, others require decision makers to consider the relationship between two or more variables. For example, a financial manager might be interested in the relationship between stock prices and the dividends issued by a publicly traded company. A marketing manager would be interested in examining the relationship between product sales and the amount of money spent on advertising. Finally, consider a loan manager at a bank who is interested in determining the fair market value of a home or business.
She would begin by collecting data on a sample of comparable properties that have sold recently. In addition to the selling price, she would collect data on other factors, such as the size and age of the property. She might then analyze the relationship between the price and the other variables and use this relationship to determine an appraised price for the property in question. Simple linear regression and correlation analysis, which are introduced in this chapter, are the statistical techniques these managers will need in their analyses. These techniques are two of the most often applied statistical procedures used by business decision makers for analyzing the relationship between two variables. In Chapter 15, we will extend the discussion to include three or more variables.

580 CHAPTER 14 | Introduction to Linear Regression and Correlation Analysis

14.1 Scatter Plots and Correlation

Scatter Plot
A two-dimensional plot showing the values for the joint occurrence of two quantitative variables. The scatter plot may be used to graphically represent the relationship between two variables. It is also known as a scatter diagram.

Decision-making situations that call for understanding the relationship between two quantitative variables are aided by the use of scatter plots, or scatter diagrams. Figure 14.1 shows scatter plots that depict several potential relationships between values of a dependent variable, y, and an independent variable, x. A dependent (or response) variable is the variable whose variation we wish to explain. An independent (or explanatory) variable is a variable used to explain variation in the dependent variable.
In Figure 14.1, (a) and (b) are examples of strong linear (or straight-line) relationships between x and y. Note that the linear relationship can be either positive (as the x variable increases, the y variable also increases) or negative (as the x variable increases, the y variable decreases). Figures 14.1 (c) and (d) illustrate situations in which the relationship between the x and y variables is nonlinear. There are many possible nonlinear relationships that can occur. The scatter plot is very useful for visually identifying the nature of the relationship. Figures 14.1 (e) and (f) show examples in which there is no identifiable relationship between the two variables. This means that as x increases, y sometimes increases and sometimes decreases but with no particular pattern.

The Correlation Coefficient

Correlation Coefficient
A quantitative measure of the strength of the linear relationship between two variables. The correlation ranges from −1.0 to +1.0.
A correlation of ±1.0 indicates a perfect linear relationship, whereas a correlation of 0 indicates no linear relationship.

In addition to analyzing the relationship between two variables graphically, we can also measure the strength of the linear relationship between two variables using a measure called the correlation coefficient. The correlation coefficient of two variables can be estimated from sample data using Equation 14.1 or the algebraic equivalent, Equation 14.2.

Sample Correlation Coefficient

    r = Σ(x − x̄)(y − ȳ) / √{ [Σ(x − x̄)²][Σ(y − ȳ)²] }    (14.1)

Chapter Outcome 1.

FIGURE 14.1 | Two-Variable Relationships
[Six scatter plot panels: (a) Linear, (b) Linear, (c) Curvilinear, (d) Curvilinear, (e) No Relationship, (f) No Relationship]

or the algebraic equivalent:

    r = [nΣxy − (Σx)(Σy)] / √{ [nΣx² − (Σx)²][nΣy² − (Σy)²] }    (14.2)

where:
    r = Sample correlation coefficient
    n = Sample size
    x = Value of the independent variable
    y = Value of the dependent variable

The sample correlation coefficient computed using Equations 14.1 and 14.2 is called the Pearson Product Moment Correlation. The sample correlation coefficient, r, can range from a perfect positive correlation, +1.0, to a perfect negative correlation, −1.0. A perfect correlation is one in which all points on the scatter plot fall on a straight line. If two variables have no linear relationship, the correlation between them is 0 and there is no linear relationship between the change in x and y. Consequently, the more the correlation differs from 0.0, the stronger the linear relationship between the two variables. The sign of the correlation coefficient indicates the direction of the relationship. Figure 14.2 illustrates some examples of correlation between two variables. Once again, for the correlation coefficient to equal plus or minus 1.0, all the (x, y) points form a perfectly straight line. The more the points depart from a straight line, the weaker (closer to 0.0) the correlation is between the two variables.

BUSINESS APPLICATION: TESTING FOR SIGNIFICANT CORRELATIONS

MIDWEST DISTRIBUTION COMPANY Consider the application involving Midwest Distribution, which supplies soft drinks and snack foods to convenience stores in Michigan, Illinois, and Iowa. Although Midwest Distribution has been profitable, the director of marketing has been concerned about the rapid turnover in her salesforce. In the course of exit interviews, she discovered a major concern with the compensation structure.
FIGURE 14.2 | Correlation between Two Variables
[Six scatter plot panels illustrating r = +1, r = +0.7, r = −1, r = −0.55, and two panels with r = 0]

Midwest Distribution has a two-part wage structure: a base salary and a commission computed on monthly sales. Typically, about half of the total wages paid comes from the base

salary, which increases with longevity with the company. This portion of the wage structure is not an issue. The concern expressed by departing employees is that new employees tend to be given parts of the sales territory previously covered by existing employees and are assigned prime customers as a recruiting inducement. At issue, then, is the relationship between sales (on which commissions are paid) and number of years with the company. The data for a random sample of 12 sales representatives are in the file called Midwest.
The first step is to develop a scatter plot of the data. Both Excel and Minitab have procedures for constructing a scatter plot and computing the correlation coefficient. The scatter plot for the Midwest data is shown in Figure 14.3. Based on this plot, total sales and years with the company appear to be linearly related. However, the strength of this relationship is uncertain. That is, how close do the points come to being on a straight line? To answer this question, we need a quantitative measure of the strength of the linear relationship between the two variables. That measure is the correlation coefficient.
Equation 14.1 is used to determine the correlation between sales and years with the company. Table 14.1 shows the manual calculations that were used to determine this correlation coefficient of 0.8325. However, because the calculations are rather tedious and long, we almost always use computer software to perform the computation, as shown in Figure 14.4. The r = 0.8325 indicates that there is a fairly strong, positive correlation between these two variables for the sample data.

Significance Test for the Correlation Although a correlation coefficient of 0.8325 seems quite large (relative to 0), you should remember that this value is based on a sample of 12 data points and is subject to sampling error.
Therefore, a formal hypothesis-testing

FIGURE 14.3 | Excel 2007 Scatter Plot of Sales vs. Years with Midwest Distribution

Excel 2007 Instructions:
1. Open file: Midwest.xls.
2. Move the Sales column to the right of the Years with Midwest column.
3. Select data for chart.
4. On the Insert tab, click XY (Scatter), and then click the Scatter with only Markers option.
5. Use the Layout tab of the Chart Tools to add titles and remove grid lines.
6. Use the Design tab of the Chart Tools to move the chart to a new worksheet.

Minitab Instructions (for similar results):
1. Open file: Midwest.MTW.
2. Choose Graph > Scatterplot.
3. Under Scatterplot, choose Simple. Click OK.
4. Under Y variable, enter the y column.
5. In X variable, enter the x column.
6. Click OK.

TABLE 14.1 | Correlation Coefficient Calculations for the Midwest Distribution Example

    Sales   Years
      y       x     (x − x̄)   (y − ȳ)    (x − x̄)²    (y − ȳ)²     (x − x̄)(y − ȳ)
     487      3      −1.58      82.42       2.50      6,793.06        −130.22
     445      5       0.42      40.42       0.18      1,633.78          16.98
     272      2      −2.58    −132.58       6.66     17,577.46         342.06
     641      8       3.42     236.42      11.70     55,894.42         808.56
     187      2      −2.58    −217.58       6.66     47,341.06         561.36
     440      6       1.42      35.42       2.02      1,254.58          50.30
     346      7       2.42     −58.58       5.86      3,431.62        −141.76
     238      1      −3.58    −166.58      12.82     27,748.90         596.36
     312      4      −0.58     −92.58       0.34      8,571.06          53.70
     269      2      −2.58    −135.58       6.66     18,381.94         349.80
     655      9       4.42     250.42      19.54     62,710.18       1,106.86
     563      6       1.42     158.42       2.02     25,096.90         224.96
    Σy = 4,855  Σx = 55                  Σ = 76.92  Σ = 276,434.92  Σ = 3,838.92

    ȳ = Σy/n = 4,855/12 = 404.58        x̄ = Σx/n = 55/12 = 4.58

Using Equation 14.1,

    r = Σ(x − x̄)(y − ȳ) / √{ [Σ(x − x̄)²][Σ(y − ȳ)²] } = 3,838.92 / √[(76.92)(276,434.92)] = 0.8325

FIGURE 14.4 | Excel 2007 Correlation Output for Midwest Distribution

Excel 2007 Instructions:
1. Open file: Midwest.xls.
2. On the Data tab, click Data Analysis.
3. Select Correlation.
4. Define Data Range.
5. Click on Labels in First Row.
6. Specify output location.
7. Click OK.

Minitab Instructions (for similar results):
1. Open file: Midwest.MTW.
2. Choose Stat > Basic Statistics > Correlation.
3. In Variables, enter the Y and X columns.
4. Click OK.
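The hand calculations in Table 14.1 can be cross-checked with a few lines of code. This is an illustrative sketch only (the text itself uses Excel or Minitab); it applies the computational form in Equation 14.2 to the twelve Midwest observations.

```python
import math

years = [3, 5, 2, 8, 2, 6, 7, 1, 4, 2, 9, 6]            # x, years with the company
sales = [487, 445, 272, 641, 187, 440, 346, 238,
         312, 269, 655, 563]                             # y, total sales

n = len(years)
sx, sy = sum(years), sum(sales)
sxy = sum(x * y for x, y in zip(years, sales))
sxx = sum(x * x for x in years)
syy = sum(y * y for y in sales)

# Equation 14.2: computational form of the Pearson correlation coefficient
r = (n * sxy - sx * sy) / math.sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))
print(round(r, 4))  # 0.8325, matching Table 14.1
```

The computational form avoids the deviation columns of Table 14.1 entirely, which is why it is usually preferred for hand or spreadsheet work.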

procedure is needed to determine whether the linear relationship between sales and years with the company is significant. The null and alternative hypotheses to be tested are

    H0: ρ = 0   (no correlation)
    HA: ρ ≠ 0   (correlation exists)

where the Greek symbol ρ (rho) represents the population correlation coefficient. We must test whether the sample data support or refute the null hypothesis. The test procedure utilizes the t-test statistic in Equation 14.3.

Chapter Outcome 2.

Test Statistic for Correlation

    t = r / √[(1 − r²)/(n − 2)],   df = n − 2    (14.3)

where:
    t = Number of standard errors r is from 0
    r = Sample correlation coefficient
    n = Sample size

The degrees of freedom for this test are n − 2, because we lose 1 degree of freedom for each of the two sample means (x̄ and ȳ) that are used to estimate the population means for the two variables.
Figure 14.5 shows the hypothesis test for the Midwest Distribution example using an alpha level of 0.05. Recall that the sample correlation coefficient was r = 0.8325. Based on these sample data, we should conclude there is a significant, positive linear relationship in the population between years of experience and total sales for Midwest Distribution sales representatives.

FIGURE 14.5 | Correlation Significance Test for the Midwest Distribution Example

Hypothesis:
    H0: ρ = 0 (no correlation)
    HA: ρ ≠ 0
    α = 0.05, df = n − 2 = 10

Rejection regions: α/2 = 0.025 in each tail, with critical values −t0.025 = −2.228 and t0.025 = 2.228.

The calculated t-value is

    t = r / √[(1 − r²)/(n − 2)] = 0.8325 / √(0.3069/10) = 4.752

Decision Rule:
    If t > t0.025 = 2.228, reject H0.
    If t < −t0.025 = −2.228, reject H0.
    Otherwise, do not reject H0.

Because 4.752 > 2.228, reject H0. Based on the sample evidence, we conclude there is a significant positive linear relationship between years with the company and sales volume.
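The arithmetic in Figure 14.5 can be sketched the same way. The values r = 0.8325 and n = 12 come from the Midwest example, and 2.228 is the book's two-tailed t-table value for df = 10; the snippet is illustrative only, not part of the text.

```python
import math

r, n = 0.8325, 12
df = n - 2

# Equation 14.3: t = r / sqrt((1 - r^2) / (n - 2))
t = r / math.sqrt((1 - r ** 2) / df)

t_crit = 2.228            # two-tailed t(0.025) for df = 10, from the t-table
reject = abs(t) > t_crit  # two-tailed decision rule from Figure 14.5

print(round(t, 3))  # 4.752, matching Figure 14.5
print(reject)       # True: reject H0, the correlation is significant
```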

The implication is that the more years an employee has been with the company, the more sales that employee generates. This runs counter to the claims made by some of the departing employees. The manager will probably want to look further into the situation to see whether a problem might exist in certain regions.
The t-test for determining whether the population correlation is significantly different from 0 requires the following assumptions:

Assumptions
1. The data are interval or ratio-level.
2. The two variables (y and x) are distributed as a bivariate normal distribution.

Although the formal mathematical representation is beyond the scope of this text, two variables are bivariate normal if their joint distribution is normally distributed. Although the t-test assumes a bivariate normal distribution, it is robust—that is, correct inferences can be reached even with slight departures from the normal-distribution assumption. (See Kutner et al., Applied Linear Statistical Models, for further discussion of bivariate normal distributions.)

EXAMPLE 14-1 CORRELATION ANALYSIS

Stock Portfolio Analysis A student intern at the investment firm of McMillan & Associates was given the assignment of determining whether there is a positive correlation between the number of individual stocks in a client's portfolio (x) and the annual rate of return (y) for the portfolio. The intern selected a simple random sample of 10 client portfolios and determined the number of individual company stocks and the annual rate of return earned by the client on his or her portfolio. To determine whether there is a statistically significant positive correlation between the two variables, the following steps can be employed:

Step 1 Specify the population parameter of interest.
The intern wishes to determine whether the number of stocks is positively correlated with the rate of return earned by the client. The parameter of interest is, therefore, the population correlation, ρ.

Step 2 Formulate the appropriate null and alternative hypotheses.
Because the intern was asked to determine whether a positive correlation exists between the variables of interest, the hypothesis test will be one-tailed, as follows:

    H0: ρ ≤ 0
    HA: ρ > 0

Step 3 Specify the level of significance.
A significance level of 0.05 is chosen.

Step 4 Compute the correlation coefficient and the test statistic.
Compute the sample correlation coefficient using Equation 14.1 or 14.2, or by using software such as Excel or Minitab. The following sample data were obtained:

    Number of Stocks    Rate of Return
           9                 0.13
          16                 0.16
          25                 0.21
          16                 0.18
          20                 0.18
          16                 0.19
          20                 0.15
          20                 0.17
          16                 0.13
           9                 0.11

Using Equation 14.1, we get

    r = Σ(x − x̄)(y − ȳ) / √{ [Σ(x − x̄)²][Σ(y − ȳ)²] } = 0.7796

Compute the t-test statistic using Equation 14.3:

    t = r / √[(1 − r²)/(n − 2)] = 0.7796 / √[(1 − 0.7796²)/(10 − 2)] = 3.52

Step 5 Construct the rejection region and decision rule.
For an alpha level equal to 0.05, the one-tailed, upper-tail critical value for n − 2 = 10 − 2 = 8 degrees of freedom is t0.05 = 1.8595. The decision rule is:

    If t > 1.8595, reject the null hypothesis.
    Otherwise, do not reject the null hypothesis.

Step 6 Reach a decision.
Because t = 3.52 > 1.8595, reject the null hypothesis.

Step 7 Draw a conclusion.
Because the null hypothesis is rejected, the sample data do support the contention that there is a positive linear relationship between the number of individual stocks in a client's portfolio and the portfolio's rate of return.

>>END EXAMPLE

TRY PROBLEM 14-3 (pg. 587)

Cause-and-Effect Interpretations
Care must be used when interpreting the correlation results. For example, even though we found a significant linear relationship between years of experience and sales for the Midwest Distribution sales force, the correlation does not imply cause and effect. Although an increase in experience may, in fact, cause sales to change, simply because the two variables are correlated does not guarantee a cause-and-effect situation. Two seemingly unconnected variables may be highly correlated. For example, over a period of time, teachers' salaries in North Dakota might be highly correlated with the price of grapes in Spain. Yet, we doubt that a change in grape prices will cause a corresponding change in salaries for teachers in North Dakota, or vice versa. When a correlation exists between two seemingly unrelated variables, the correlation is said to be a spurious correlation. You should take great care to avoid basing conclusions on spurious correlations.
The Midwest Distribution marketing director has a logical reason to believe that years of experience with the company and total sales are related. That is, sales theory and customer feedback hold that product knowledge is a major component in successfully marketing a product. However, a statistically significant correlation alone does not prove that this cause-and-effect relationship exists. When two seemingly unrelated variables are correlated, they may both be responding to changes in some third variable. For example, the observed correlation could be the effect of a company policy of giving better sales territories to more senior salespeople.
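Returning to Example 14-1, its r and t values can be reproduced directly from the ten portfolio observations. Again, this Python sketch is illustrative only and is not part of the text's Excel/Minitab workflow.

```python
import math

stocks  = [9, 16, 25, 16, 20, 16, 20, 20, 16, 9]        # x, number of stocks
returns = [0.13, 0.16, 0.21, 0.18, 0.18, 0.19,
           0.15, 0.17, 0.13, 0.11]                      # y, annual rate of return

n = len(stocks)
mx = sum(stocks) / n
my = sum(returns) / n

# Deviation sums needed for Equation 14.1
sxy = sum((x - mx) * (y - my) for x, y in zip(stocks, returns))
sxx = sum((x - mx) ** 2 for x in stocks)
syy = sum((y - my) ** 2 for y in returns)

r = sxy / math.sqrt(sxx * syy)               # Equation 14.1
t = r / math.sqrt((1 - r ** 2) / (n - 2))    # Equation 14.3

print(round(r, 4), round(t, 2))  # r ≈ 0.7796 and t ≈ 3.52, matching Example 14-1
```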

14-1: Exercises (MyStatLab)

Skill Development

14-1. An industry study was recently conducted in which the sample correlation between units sold and marketing expenses was 0.57. The sample size for the study included 15 companies. Based on the sample results, test to determine whether there is a significant positive correlation between these two variables. Use α = 0.05.

14-2. The following data for the dependent variable, y, and the independent variable, x, have been collected using simple random sampling:

    x     y
    10   120
    14   130
    16   170
    12   150
    20   200
    18   180
    16   190
    14   150
    16   160
    18   200

a. Construct a scatter plot for these data. Based on the scatter plot, how would you describe the relationship between the two variables?
b. Compute the correlation coefficient.

14-3. A random sample of the following two variables was obtained:

    x   29  48  28  22  28  42  33  26  48  44
    y   16  46  34  26  49  11  41  13  47  16

a. Calculate the correlation between these two variables.
b. Conduct a test of hypothesis to determine if there exists a correlation between the two variables in the population. Use a significance level of 0.10.

14-4. A random sample of two variables, x and y, produced the following observations:

    x    y
    19    7
    13    9
    17    8
     9   11
    12    9
    25    6
    20    7
    17    8

a. Develop a scatter plot for the two variables and describe what relationship, if any, exists.
b. Compute the correlation coefficient for these sample data.
c. Test to determine whether the population correlation coefficient is negative. Use a significance level of 0.05 for the hypothesis test.

14-5. You are given the following data for variables x and y:

    x     y
    3.0   1.5
    2.0   0.5
    2.5   1.0
    3.0   1.8
    2.5   1.2
    4.0   2.2
    1.5   0.4
    1.0   0.3
    2.0   1.3
    2.5   1.0

a. Plot these variables in scatter plot format. Based on this plot, what type of relationship appears to exist between the two variables?
b.
Compute the correlation coefficient for these sample data. Indicate what the correlation coefficient measures.
c. Test to determine whether the population correlation coefficient is positive. Use the α = 0.01 level to conduct the test. Be sure to state the null and alternative hypotheses and show the test and decision rule clearly.

14-6. For each of the following circumstances, perform the indicated hypothesis tests:
a. HA: ρ ≠ 0, r = 0.53, and n = 30 with α = 0.01, using a test-statistic approach.
b. HA: ρ ≠ 0, r = 0.48, and n = 20 with α = 0.05, using a p-value approach.
c. HA: ρ ≠ 0, r = 0.39, and n = 45 with α = 0.02, using a test-statistic approach.
d. HA: ρ ≠ 0, r = 0.34, and n = 25 with α = 0.05, using a test-statistic approach.

Business Applications

14-7. The Federal No Child Left Behind Act requires periodic testing in standard subjects. A random sample of 50 junior high school students from Atlanta was selected, and each student's scores on a standardized mathematics examination and a standardized English examination were recorded. School administrators were interested in the relation between the two scores. Suppose the correlation coefficient for the two examination scores is 0.75.

a. Provide an explanation of the sample correlation coefficient in this context.
b. Using a level of significance of α = 0.01, test to determine whether there is a positive linear relationship between mathematics scores and English scores for junior high school students in Atlanta.

14-8. Because of the current concern over credit card balances, a bank's Chief Financial Officer is interested in whether there is a relationship between account balances and the number of times a card is used each month. A random sample of 50 accounts was selected. The account balance and the number of charges during the past month were the two variables recorded. The correlation coefficient for the two variables was 0.23.
a. Discuss what the r = 0.23 measures. Make sure to frame your discussion in terms of the two variables mentioned here.
b. Using an α = 0.10 level, test to determine whether there is a significant linear relationship between account balance and the number of card uses during the past month. State the null and alternative hypotheses and show the decision rule.
c. Consider the decision you reached in part b. Describe the type of error you could have made in the context of this problem.

14-9. Farmers National Bank issues MasterCard credit cards to its customers. A main factor in determining whether a credit card will be profitable to the bank is the average monthly balance that the customer will maintain on the card that will be subject to finance charges. Bank analysts wish to determine whether there is a relationship between the average monthly credit card balance and the income stated on the original credit card application form. The following sample data have been collected from existing credit card customers:
    Income     Credit Balance
    $43,000        $345
    $35,000      $1,370
    $47,000      $1,140
    $55,000        $201
    $55,000         $56
    $59,000        $908
    $28,000      $2,345
    $43,000        $104
    $54,000          $0
    $36,000      $1,290
    $39,000        $130
    $31,000        $459
    $30,000          $0
    $37,000      $1,950
    $39,000        $240

a. Indicate which variable is to be the independent variable and which is to be the dependent variable in the bank's analysis and indicate why.
Data for selected years are shown as follows. Year Network Cable. 1996. 1999. 2001. 2004. 9.88. 14.00. 14.65. 15.80. 12.77. 13.88. 14.50. 14.92. a. Calculate the correlation coefficient for the average nonprogram minutes in an hour of prime time between network and cable television. b. Conduct a hypothesis test to determine if a positive correlation exists between the average nonprogram.

minutes in an hour of prime time between network and cable television. Use a significance level of 0.05 and assume that these figures form a random sample.

Computer Database Exercises

14-12. Platinum Billiards, Inc., is a retailer of billiard supplies based in Jacksonville, Florida. It stands out among billiard suppliers because of the research it does to assure its products are top-notch. One experiment was conducted to measure the speed attained by a cue ball struck by various weighted pool cues. The conjecture is that a light cue generates faster speeds while breaking the balls at the beginning of a game of pool. Anecdotal experience has indicated that a billiard cue weighing less than 19 ounces generates faster speeds. Platinum used a robotic arm to investigate this claim. Its research generated the data given in the file entitled Breakcue.
a. To determine if there is a negative relationship between the weight of the pool cue and the speed attained by the cue ball, calculate a correlation coefficient.
b. Conduct a test of hypothesis to determine if there is a negative relationship between the weight of the pool cue and the speed attained by the cue ball. Use a significance level of 0.025 and a p-value approach.

14-13. Customers who made online purchases last quarter from an Internet retailer were randomly sampled from the retailer's database. The dollar value of each customer's quarterly purchases along with the time the customer spent shopping the company's online catalog that quarter were recorded. The sample results are contained in the file Online.
a. Create a scatter plot of the variables Time (x) and Purchases (y). What relationship, if any, appears to exist between the two variables?
b. Compute the correlation coefficient for these sample data. What does the correlation coefficient measure?
c.
Conduct a hypothesis test to determine if there is a positive relationship between time viewing the retailer's catalog and dollar amount purchased. Use a level of significance equal to 0.025. Provide a managerial explanation of your results.

14-14. A regional accreditation board for colleges and universities is interested in determining whether a relationship exists between student applicant verbal SAT scores and the in-state tuition costs at the university. Data have been collected on a sample of colleges and universities and are in the data file called Colleges and Universities.
a. Develop a scatter plot for these two variables and discuss what, if any, relationship you see between the two variables based on the scatter plot.
b. Compute the sample correlation coefficient.
c. Based on the correlation coefficient computed in part b, test to determine whether the population correlation coefficient is positive for these two variables. That is, can we expect schools that charge higher in-state tuition will attract students with higher average verbal SAT scores? Test using a 0.05 significance level.

14-15. As the number of air travelers with time on their hands increases, logic would indicate spending on retail purchases in airports would increase as well. A study by Airport Revenue News addressed the per person spending at select airports for merchandise, excluding food, gifts, and news items. A file entitled Revenues contains sample data selected from airport retailers in 2001 and again in 2004.
a. Produce a scatter plot for the per person spending at selected airports for merchandise, excluding food, gifts, and news items, for the years 2001 and 2004. Does there appear to be a linear relationship between spending in 2001 and spending in 2004? Explain your response.
b. Calculate the correlation coefficient between the per person spending in 2001 and the per person spending in 2004.
Does it appear that an increase in per person spending in 2001 would be associated with an increase in spending in 2004? Support your assertion.
c. Conduct a hypothesis test to determine if a positive correlation exists between the per person spending in 2001 and that in 2004. Use a significance level of 0.05 and assume that these figures form a random sample.

END EXERCISES 14-1

14.2 Simple Linear Regression Analysis

Simple Linear Regression: The method of regression analysis in which a single independent variable is used to predict the dependent variable.

In the Midwest Distribution application, we determined that the relationship between years of experience and total sales is linear and statistically significant, based on the correlation analysis performed in the previous section. Because hiring and training costs have been increasing, we would like to use this relationship to help formulate a more acceptable wage package for the sales force. The statistical method we will use to analyze the relationship between years of experience and total sales is regression analysis. When we have only two variables—a dependent variable, such as sales, and an independent variable, such as years with the company—the technique is referred to as simple regression analysis. When the relationship between the dependent variable and the independent variable is linear, the technique is simple linear regression.
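The correlation analysis referred to here follows the same computational pattern regardless of the data file. As a sketch in Python (an assumption on our part; the textbook itself works in Excel and Minitab), the Midwest Distribution sample listed later in Table 14.3 reproduces the r = 0.8325 and t = 4.752 values cited in this chapter:

```python
import math

# Midwest Distribution sample (Table 14.3): years with company (x), sales (y)
x = [3, 5, 2, 8, 2, 6, 7, 1, 4, 2, 9, 6]
y = [487, 445, 272, 641, 187, 440, 346, 238, 312, 269, 655, 563]
n = len(x)

# Sample correlation coefficient (shortcut computational form)
sxy = n * sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y)
sxx = n * sum(a * a for a in x) - sum(x) ** 2
syy = n * sum(b * b for b in y) - sum(y) ** 2
r = sxy / math.sqrt(sxx * syy)

# t statistic for testing H0: rho = 0 against HA: rho > 0
t = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

print(round(r, 4), round(t, 3))  # 0.8325 4.752
```

The same few lines, pointed at the Breakcue, Online, Colleges and Universities, or Revenues data, would carry out the computations the exercises ask for.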

The Regression Model and Assumptions

The objective of simple linear regression (which we shall call regression analysis) is to represent the relationship between values of x and y with a model of the form shown in Equation 14.4.

Simple Linear Regression Model (Population Model)

y = β0 + β1x + ε   (14.4)

where:
y = Value of the dependent variable
x = Value of the independent variable
β0 = Population's y intercept
β1 = Slope of the population regression line
ε = Random error term

The simple linear regression population model described in Equation 14.4 has four assumptions:

1. Individual values of the error terms, ε, are statistically independent of one another, and these values represent a random sample from the population of possible ε-values at each level of x.
2. For a given value of x, there can exist many values of y and therefore many values of ε. Further, the distribution of possible ε-values for any x-value is normal.
3. The distributions of possible ε-values have equal variances for all values of x.
4. The means of the dependent variable, y, for all specified values of the independent variable, μy|x, can be connected by a straight line called the population regression model.

Figure 14.6 illustrates assumptions 2, 3, and 4. The regression model (straight line) connects the average of the y-values for each level of the independent variable, x. The actual y-values for each level of x are normally distributed around the mean of y. Finally, observe that the spread of possible y-values is the same regardless of the level of x.

The population regression line is determined by two values, β0 and β1. These values are known as the population regression coefficients. Value β0 identifies the y intercept and β1 the slope of the regression line. Under the regression assumptions, the coefficients define the true population model. For each observation, the actual value of the dependent variable, y, for any x is the sum of a linear component and a random error component:

y = β0 + β1x + ε

The random error component, ε, may be positive, zero, or negative, depending on whether a single value of y for a given x falls above, on, or below the population regression line. Section 15.5 in Chapter 15 discusses how to check whether assumptions have been violated and the possible courses of action if the violations occur.

FIGURE 14.6 | Graphical Display of Linear Regression Assumptions
[Figure: the population regression line μy|x = β0 + β1x connects the means μy|x1, μy|x2, μy|x3 at x1, x2, x3; at each x, the possible y-values are normally distributed around the line with equal spread.]
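The four assumptions can be made concrete with a small simulation. The following Python sketch (the β0, β1, and σ values are hypothetical, chosen only for illustration) generates y-values from the population model y = β0 + β1x + ε with independent, normal, equal-variance errors:

```python
import random

def simulate(beta0, beta1, x_values, sigma, seed=7):
    """Draw (x, y) pairs from y = beta0 + beta1*x + epsilon, epsilon ~ N(0, sigma^2)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    return [(xv, beta0 + beta1 * xv + rng.gauss(0, sigma)) for xv in x_values]

# Hypothetical parameters: intercept 176, slope 50, error spread sigma = 90
pairs = simulate(176, 50, [xv for xv in range(1, 11) for _ in range(500)], 90)

# Subtracting the linear component recovers the random error component
errors = [yv - (176 + 50 * xv) for xv, yv in pairs]
mean_error = sum(errors) / len(errors)  # should be near 0 for a large sample
```

Plotting the simulated y-values for each distinct x would show exactly the picture in Figure 14.6: a normal spread of the same width centered on the line at every level of x.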

Meaning of the Regression Coefficients

Regression Slope Coefficient: The average change in the dependent variable for a unit change in the independent variable. The slope coefficient may be positive or negative, depending on the relationship between the two variables.

Coefficient β1, the regression slope coefficient of the population regression line, measures the average change in the value of the dependent variable, y, for each unit change in x. The regression slope can be positive, zero, or negative, depending on the relationship between x and y. For example, a positive population slope of 12 (β1 = 12) means that for a 1-unit increase in x, we can expect an average 12-unit increase in y. Correspondingly, if the population slope is negative 12 (β1 = −12), we can expect an average decrease of 12 units in y for a 1-unit increase in x.

The population's y intercept, β0, indicates the mean value of y when x is 0. However, this interpretation holds only if the population could have x values equal to 0. When this cannot occur, β0 does not have a meaningful interpretation in the regression model.

BUSINESS APPLICATION: SIMPLE LINEAR REGRESSION ANALYSIS

MIDWEST DISTRIBUTION (CONTINUED) The Midwest Distribution marketing manager has data for a sample of 12 sales representatives. In Section 14.1, she established that a significant linear relationship exists between years of experience and total sales using correlation analysis. (Recall that the sample correlation between the two variables was r = 0.8325.) Now she would like to estimate the regression equation that defines the true linear relationship (that is, the population's linear relationship) between years of experience and sales. Figure 14.3 shows the scatter plot for the two variables: years with the company and sales.

We need to use the sample data to estimate β0 and β1, the true intercept and slope of the line representing the relationship between the two variables. The regression line through the sample data is the best estimate of the population regression line. However, there are an infinite number of possible regression lines for a set of points. For example, Figure 14.7 shows three of the

FIGURE 14.7 | Possible Regression Lines
[Figure: three scatter plots of sales (in $ thousands) versus years with the company, each with a different candidate line: (a) ŷ = 450 + 0x, (b) ŷ = 250 + 40x, (c) ŷ = 150 + 60x.]

FIGURE 14.8 | Computation of Regression Error for the Midwest Distribution Example
[Figure: scatter plot of sales (in $ thousands) versus years with the company, with the line ŷ = 150 + 60x; for the point at x = 4, the actual value y = 312 lies below the predicted value ŷ = 390, giving Residual = 312 − 390 = −78.]

Least Squares Criterion: The criterion for determining a regression line that minimizes the sum of squared prediction errors.

Residual: The difference between the actual value of y and the predicted value ŷ for a given level of the independent variable, x.

possible different lines that pass through the Midwest Distribution data. Which line should be used to estimate the true regression model? We must establish a criterion for selecting the best line. The criterion used is the least squares criterion. To understand the least squares criterion, you need to know about the prediction error, or residual, which is the distance between the actual y coordinate of an (x, y) point and the predicted value of that y coordinate produced by the regression line.

Figure 14.8 shows how the prediction error is calculated for the employee who was with Midwest for four years (x = 4) using one possible regression line, ŷ = 150 + 60x (where ŷ is the predicted sales value). The predicted sales value is

ŷ = 150 + 60(4) = 390

However, the actual sales (y) for this employee is 312 (see Table 14.2). Thus, when x = 4, the difference between the observed value, y = 312, and the predicted value, ŷ = 390, is 312 − 390 = −78. The residual (or prediction error) for this case when x = 4 is −78.

Table 14.2 shows the calculated prediction errors and sum of squared errors for each of the three regression lines shown in Figure 14.7.¹ Of these three potential regression models, the line with the equation ŷ = 150 + 60x has the smallest sum of squared errors. However, is there a better line than this? That is, would Σ(yi − ŷi)² be smaller for some other line? One way to determine this is to calculate the sum of squared errors for all other regression lines. However, because there are an infinite number of these lines, this approach is not feasible. Fortunately, through the use of calculus, equations can be derived to directly determine the slope and intercept estimates such that Σ(yi − ŷi)² is minimized.² This is accomplished by letting the estimated regression model be of the form shown in Equation 14.5.

Chapter Outcome 3.

Estimated Regression Model (Sample Model)

ŷ = b0 + b1x   (14.5)

where:
ŷ = Estimated, or predicted, y-value
b0 = Unbiased estimate of the regression intercept, found using Equation 14.8
b1 = Unbiased estimate of the regression slope, found using Equation 14.6 or 14.7
x = Value of the independent variable

Equations 14.6 and 14.8 are referred to as the solutions to the least squares equations because they provide the slope and intercept that minimize the sum of squared errors. Equation 14.7 is

¹The reason we use the sum of the squared residuals is that the sum of the residuals will be zero for the best regression line (the positive values of the residuals will balance the negative values).
²The calculus derivation of the least squares equations is contained in the Kutner et al. reference shown at the end of this chapter.

TABLE 14.2 | Sum of Squared Errors for Three Linear Equations for Midwest Distribution

From Figure 14.7(a): ŷ = 450 + 0x

x    ŷ      y      Residual (y − ŷ)   (y − ŷ)²
3    450    487     37                 1,369
5    450    445     −5                 25
2    450    272     −178               31,684
8    450    641     191                36,481
2    450    187     −263               69,169
6    450    440     −10                100
7    450    346     −104               10,816
1    450    238     −212               44,944
4    450    312     −138               19,044
2    450    269     −181               32,761
9    450    655     205                42,025
6    450    563     113                12,769
                                       Σ = 301,187

From Figure 14.7(b): ŷ = 250 + 40x

x    ŷ      y      Residual (y − ŷ)   (y − ŷ)²
3    370    487     117                13,689
5    450    445     −5                 25
2    330    272     −58                3,364
8    570    641     71                 5,041
2    330    187     −143               20,449
6    490    440     −50                2,500
7    530    346     −184               33,856
1    290    238     −52                2,704
4    410    312     −98                9,604
2    330    269     −61                3,721
9    610    655     45                 2,025
6    490    563     73                 5,329
                                       Σ = 102,307

From Figure 14.7(c): ŷ = 150 + 60x

x    ŷ      y      Residual (y − ŷ)   (y − ŷ)²
3    330    487     157                24,649
5    450    445     −5                 25
2    270    272     2                  4
8    630    641     11                 121
2    270    187     −83                6,889
6    510    440     −70                4,900
7    570    346     −224               50,176
1    210    238     28                 784
4    390    312     −78                6,084
2    270    269     −1                 1
9    690    655     −35                1,225
6    510    563     53                 2,809
                                       Σ = 97,667
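The sums of squared errors in Table 14.2 can be checked directly. A Python sketch (the book does these sums by hand or in Excel):

```python
# Midwest Distribution sample (x = years with company, y = sales), as in Table 14.2
x = [3, 5, 2, 8, 2, 6, 7, 1, 4, 2, 9, 6]
y = [487, 445, 272, 641, 187, 440, 346, 238, 312, 269, 655, 563]

def sse(b0, b1):
    """Sum of squared residuals for the candidate line y-hat = b0 + b1*x."""
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))

print(sse(450, 0))   # 301187 -> Figure 14.7(a)
print(sse(250, 40))  # 102307 -> Figure 14.7(b)
print(sse(150, 60))  # 97667  -> Figure 14.7(c)
```

As the table shows, ŷ = 150 + 60x produces the smallest of the three sums, but only the least squares equations that follow guarantee the minimum over all possible lines.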

the algebraic equivalent of Equation 14.6 and may be easier to use when the computation is performed using a calculator.

Least Squares Equations

b1 = Σ(xi − x̄)(yi − ȳ) / Σ(xi − x̄)²   (14.6)

algebraic equivalent:

b1 = [Σxy − (Σx Σy)/n] / [Σx² − (Σx)²/n]   (14.7)

and

b0 = ȳ − b1x̄   (14.8)

Table 14.3 shows the manual calculations, which are subject to rounding, for the least squares estimates for the Midwest Distribution example. However, you will almost always

TABLE 14.3 | Manual Calculations for Least Squares Regression Coefficients for the Midwest Distribution Example

y      x    xy       x²    y²
487    3    1,461    9     237,169
445    5    2,225    25    198,025
272    2    544      4     73,984
641    8    5,128    64    410,881
187    2    374      4     34,969
440    6    2,640    36    193,600
346    7    2,422    49    119,716
238    1    238      1     56,644
312    4    1,248    16    97,344
269    2    538      4     72,361
655    9    5,895    81    429,025
563    6    3,378    36    316,969
Σy = 4,855   Σx = 55   Σxy = 26,091   Σx² = 329   Σy² = 2,240,687

ȳ = Σy/n = 4,855/12 = 404.58     x̄ = Σx/n = 55/12 = 4.58

b1 = [Σxy − (Σx Σy)/n] / [Σx² − (Σx)²/n] = [26,091 − 55(4,855)/12] / [329 − (55)²/12] = 49.91

Then,

b0 = ȳ − b1x̄ = 404.58 − 49.91(4.58) = 175.99

The least squares regression line is, therefore, ŷ = 175.99 + 49.91x. There is a slight difference between the manual calculation and the computer result due to rounding.
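The manual computation in Table 14.3 can be reproduced with Equations 14.7 and 14.8. A Python sketch (carrying full precision gives the Excel values, 175.8288 and 49.9101, rather than the hand-rounded 175.99):

```python
# Midwest Distribution sample (Table 14.3)
x = [3, 5, 2, 8, 2, 6, 7, 1, 4, 2, 9, 6]
y = [487, 445, 272, 641, 187, 440, 346, 238, 312, 269, 655, 563]
n = len(x)

# Slope via the algebraic form (Equation 14.7)
b1 = (sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y) / n) / \
     (sum(a * a for a in x) - sum(x) ** 2 / n)

# Intercept (Equation 14.8)
b0 = sum(y) / n - b1 * (sum(x) / n)

print(round(b1, 4), round(b0, 4))  # 49.9101 175.8288
```

This is the same line the Excel and Minitab outputs in Figures 14.9a and 14.9b report.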

FIGURE 14.9A | Excel 2007 Midwest Distribution Regression Results

Excel 2007 Instructions:
1. Open file: Midwest.xls.
2. On the Data tab, click Data Analysis.
3. Select Regression Analysis.
4. Define x (Years with Midwest) and y (Sales) variable data range.
5. Select output location.
6. Check Labels.
7. Click Residuals.
8. Click OK.

[Figure callouts: SSE = 84,834.2947; estimated regression equation is ŷ = 175.8288 + 49.9101(x).]

use a software package such as Excel or Minitab to perform these computations. (Figures 14.9a and 14.9b show the Excel and Minitab output.) In this case, the "best" regression line, given the least squares criterion, is ŷ = 175.8288 + 49.9101(x). Figure 14.10 shows the predicted sales values along with the prediction errors and squared errors associated with this best simple linear regression line. Keep in mind that the prediction errors are also referred to as residuals. From Figure 14.10, the sum of the squared errors is 84,834.29. This is the smallest sum of squared residuals possible for this set of sample data. No other simple linear regression line

FIGURE 14.9B | Minitab Midwest Distribution Regression Results

Minitab Instructions:
1. Open file: Midwest.MTW.
2. Choose Stat → Regression → Regression.
3. In Response, enter the y variable column.
4. In Predictors, enter the x variable column.
5. Click Storage; under Diagnostic Measures select Residuals.
6. Click OK. OK.

[Figure callouts: estimated regression equation is ŷ = 176 + 49.9(x); sum of squares residual = 84,834.]

FIGURE 14.10 | Residuals and Squared Residuals for the Midwest Distribution Example

Excel 2007 Instructions:
1. Create Squared Residuals using an Excel formula (i.e., for cell D25, use =C25^2).
2. Sum the residuals and squared residuals columns.

Minitab Instructions (for similar results):
1. Choose Calc → Column Statistics.
2. Under Statistics, choose Sum.
3. In Input variable, enter residual column.
4. Click OK.
5. Choose Calc → Column Statistics.
6. Under Statistic, choose Sum of Squares.
7. In Input variable, enter residual column.
8. Click OK.

[Figure callouts: sum of residuals equals zero; SSE = 84,834.2947.]

through these 12 (x, y) points will produce a smaller sum of squared errors. Equation 14.9 presents a formula that can be used to calculate the sum of squared errors manually.

Sum of Squared Errors

SSE = Σy² − b0Σy − b1Σxy   (14.9)

Figure 14.11 shows the scatter plot of sales and years of experience and the least squares regression line for Midwest Distribution. This line is the best fit for these sample data. The regression line passes through the point corresponding to (x̄, ȳ). This will always be the case.

Least Squares Regression Properties

Figure 14.10 illustrates several important properties of least squares regression. These are as follows:

1. The sum of the residuals from the least squares regression line is 0 (Equation 14.10). The total underprediction by the regression model is exactly offset by the total overprediction.

Sum of Residuals

Σ(yi − ŷi) = 0, for i = 1, …, n   (14.10)

FIGURE 14.11 | Least Squares Regression Line for Midwest Distribution

Excel 2007 Instructions:
1. Open file: Midwest.xls.
2. Move the Sales column to the right of the Years with Midwest column.
3. Select data for chart.
4. On the Insert tab, click XY (Scatter), and then click the Scatter with only Markers option.
5. Use the Layout tab of the Chart Tools to add titles and remove grid lines.
6. Use the Design tab of the Chart Tools to move the chart to a new worksheet.
7. Click on a chart point.
8. Right click and select Add Trendline.
9. Select Linear.

[Figure shows the fitted line ŷ = 175.8288 + 49.9101x.]

2. The sum of the squared residuals is the minimum (Equation 14.11).

Sum of Squared Residuals (Errors)

SSE = Σ(yi − ŷi)²   (14.11)

This property provided the basis for developing the equations for b0 and b1.
3. The simple regression line always passes through the mean of the y variable, ȳ, and the mean of the x variable, x̄. So, to manually draw any simple linear regression line, all you need to do is to draw a line connecting the least squares y intercept with the (x̄, ȳ) point.
4. The least squares coefficients are unbiased estimates of β0 and β1. Thus, the expected values of b0 and b1 equal β0 and β1, respectively.

EXAMPLE 14-2 SIMPLE LINEAR REGRESSION AND CORRELATION

Fitzpatrick & Associates The investment firm Fitzpatrick & Associates wants to manage the pension fund of a major Chicago retailer. For their presentation to the retailer, the Fitzpatrick analysts want to use simple linear regression to model the relationship between profits and numbers of employees for 50 Fortune 500 companies in the firm's portfolio. The data for the analysis are contained in the file Fortune 50. This analysis can be done using the following steps:

Step 1 Specify the independent and dependent variables.
The objective in this example is to model the linear relationship between the number of employees (the independent variable) and each company's profits (the dependent variable).
Step 2 Develop a scatter plot to graphically display the relationship between the independent and dependent variables.
Figure 14.12 shows the scatter plot, where the dependent variable, y, is company profits and the independent variable, x, is number of employees.

FIGURE 14.12 | Excel 2007 Scatter Plot for Fitzpatrick & Associates

Excel 2007 Instructions:
1. Open file: Fortune 50.xls.
2. Copy the Profits column to the immediate right of the Employees column.
3. Select data for chart (Employees and Profits).
4. On the Insert tab, click XY (Scatter), and then click the Scatter with only Markers option.
5. Use the Layout tab of the Chart Tools to add titles and remove grid lines.
6. Use the Design tab of the Chart Tools to move the chart to a new worksheet.

Minitab Instructions (for similar result):
1. Open file: Fortune 50.MTW.
2. Choose Graph → Character Graphs → Scatterplot.
3. In Y variable, enter y column.
4. In X variable, enter x column.
5. Click OK.

There appears to be a slight positive linear relationship between the two variables.

Step 3 Calculate the correlation coefficient and the linear regression equation.
Do either manually using Equations 14.1, 14.6 (or 14.7), and 14.8, respectively, or by using Excel or Minitab software. Figure 14.13 shows the regression results. The sample correlation coefficient (called "Multiple R" in Excel) is

r = 0.3638

The regression equation is

ŷ = 2,556.88 + 0.0048x

The regression slope is estimated to be 0.0048, which means that for each additional employee, the average increase in company profit is 0.0048 million dollars, or $4,800. The intercept can be interpreted only when a value equal to zero for the x variable (employees) is plausible. Clearly, no company has zero employees, so the intercept in this case has no meaning other than locating the height of the regression line at x = 0.

FIGURE 14.13 | Excel 2007 Regression Results for Fitzpatrick & Associates

[Figure callouts: correlation coefficient r = √0.1323 = 0.3638; regression equation.]

Excel 2007 Instructions:
1. Open file: Fortune 50.xls.
2. On the Data tab, click Data Analysis.
3. Select Regression Analysis.
4. Define x (Employees) and y (Profits) variable data ranges.
5. Check Labels.
6. Select output location.
7. Click OK.

Minitab Instructions (for similar result):
1. Open file: Fortune 50.MTW.
2. Choose Stat → Regression → Regression.
3. In Response, enter the y variable column.
4. In Predictors, enter the x variable column.
5. Click OK.

END EXAMPLE
TRY PROBLEM 14-17 a, b, c (pg. 610)

Chapter Outcome 4.

Significance Tests in Regression Analysis

In Section 14.1, we pointed out that the correlation coefficient computed from sample data is a point estimate of the population correlation coefficient and is subject to sampling error. We also introduced a test of significance for the correlation coefficient. Likewise, the regression coefficients developed from a sample of data are point estimates of the true regression coefficients for the population. The regression coefficients are subject to sampling error. For example, due to sampling error the estimated slope coefficient may be positive or negative while the population slope is really 0. Therefore, we need a test procedure to determine whether the regression slope coefficient is statistically significant. As you will see in this section, the test for the simple linear regression slope coefficient is equivalent to the test for the correlation coefficient. That is, if the correlation between two variables is found to be significant, then the regression slope coefficient will also be significant.

The Coefficient of Determination, R²

BUSINESS APPLICATION: TESTING THE REGRESSION MODEL

MIDWEST DISTRIBUTION (CONTINUED) Recall that the Midwest Distribution marketing manager was analyzing the relationship between the number of years an employee had been with the company (independent variable) and the sales generated by the employee (dependent variable). We note when looking at the sample data for the 12 employees (see Table 14.3) that sales vary among employees. Regression analysis aims to determine the extent to which an independent variable can explain this variation. In this case, does number of years with the company help explain the variation in sales from employee to employee?

The total sum of squares (SST) can be used to measure the variation in the dependent variable. SST is computed using Equation 14.12. For Midwest Distribution, the total sum of squares for sales is provided in the output generated by Excel or Minitab, as shown in Figure 14.14a and Figure 14.14b. As you can see, the total sum of squares in sales that needs to be explained is 276,434.92. Note that the SST value is in squared units and has no particular meaning.

Total Sum of Squares

SST = Σ(yi − ȳ)²   (14.12)

where:
SST = Total sum of squares
n = Sample size
yi = ith value of the dependent variable
ȳ = Average value of the dependent variable

FIGURE 14.14A | Excel 2007 Regression Results for Midwest Distribution

Excel 2007 Instructions:
1. Open file: Midwest.xls.
2. On the Data tab, click Data Analysis.
3. Select Regression Analysis.
4. Define x (Years with Midwest) and y (Sales) variable data range.
5. Click on Labels.
6. Specify output location.
7. Click OK.

[Figure callouts: R-squared = 0.6931; SSR = 191,600.62; SST = 276,434.92; SSE = 84,834.29.]
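The SST value reported in Figure 14.14a can be verified from the sample data with Equation 14.12. A Python sketch:

```python
# Midwest Distribution sales (y) from Table 14.3
y = [487, 445, 272, 641, 187, 440, 346, 238, 312, 269, 655, 563]

y_bar = sum(y) / len(y)                       # average sales
sst = sum((yi - y_bar) ** 2 for yi in y)      # Equation 14.12

print(round(sst, 2))  # 276434.92
```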

FIGURE 14.14B | Minitab Regression Results for Midwest Distribution

Minitab Instructions:
1. Open file: Midwest.MTW.
2. Choose Stat → Regression → Regression.
3. In Response, enter the y variable column.
4. In Predictors, enter the x variable column.
5. Click OK.

[Figure callouts: R-squared = 0.693; SSR = 191,601; SSE = 84,834; SST = 276,435.]

The least squares regression line is computed so that the sum of squared residuals is minimized (recall the discussion of the least squares equations). The sum of squared residuals is also called the sum of squares error (SSE) and is defined by Equation 14.13.

Sum of Squares Error

SSE = Σ(yi − ŷi)²   (14.13)

where:
n = Sample size
yi = ith value of the dependent variable
ŷi = ith predicted value of y given the ith value of x

SSE represents the amount of the total sum of squares in the dependent variable that is not explained by the least squares regression line. Excel refers to SSE as sum of squares residual, and Minitab refers to SSE as residual error. This value is contained in the regression output shown in Figure 14.14a and Figure 14.14b:

SSE = Σ(y − ŷ)² = 84,834.29

Thus, of the total sum of squares (SST = 276,434.92), the regression model leaves SSE = 84,834.29 unexplained. The portion of the total sum of squares that is explained by the regression line is called the sum of squares regression (SSR) and is calculated by Equation 14.14.

Sum of Squares Regression

SSR = Σ(ŷi − ȳ)²   (14.14)

where:
ŷi = Estimated value of y for each value of x
ȳ = Average value of the y variable

The sum of squares regression (SSR = 191,600.62) is also provided in the regression output shown in Figure 14.14a and Figure 14.14b. You should also note that the following holds:

SST = SSR + SSE

For the Midwest Distribution example, in the Minitab output we get

276,435 = 191,601 + 84,834

Coefficient of Determination: The portion of the total variation in the dependent variable that is explained by its relationship with the independent variable. The coefficient of determination is also called R-squared and is denoted as R².

We can use these calculations to compute an important measure in regression analysis called the coefficient of determination. The coefficient of determination is calculated using Equation 14.15.

Coefficient of Determination, R²

R² = SSR/SST   (14.15)

Then, for the Midwest Distribution example, the proportion of variation in sales that can be explained by its linear relationship with years of sales force experience is

R² = SSR/SST = 191,600.62/276,434.92 = 0.6931

This means that 69.31% of the variation in the sales data for this sample can be explained by the linear relationship between sales and years of experience. Notice that R² is part of the regression output in Figures 14.14a and 14.14b.

R² can take on any value between 0 and 1.0. If there is a perfect linear relationship between two variables, then the coefficient of determination, R², will be 1.0. This corresponds to a situation in which the least squares regression line passes through each of the points in the scatter plot. R² is the measure used by many decision makers to indicate how well the linear regression line fits the (x, y) data points. The better the fit, the closer R² will be to 1.0. R² will be close to 0 when there is a weak linear relationship.

Finally, when you are employing simple linear regression (a linear relationship between a single independent variable and the dependent variable), there is an alternative way of computing R², as shown in Equation 14.16.

Coefficient of Determination for the Single Independent Variable Case

R² = r²   (14.16)

where:
R² = Coefficient of determination
r = Sample correlation coefficient
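The decomposition SST = SSR + SSE and both routes to R² (Equations 14.15 and 14.16) can be confirmed numerically. A Python sketch using the Midwest sample:

```python
# Midwest Distribution sample (Table 14.3)
x = [3, 5, 2, 8, 2, 6, 7, 1, 4, 2, 9, 6]
y = [487, 445, 272, 641, 187, 440, 346, 238, 312, 269, 655, 563]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

# Least squares fit (Equations 14.6 and 14.8)
sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
sxx = sum((a - x_bar) ** 2 for a in x)
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

sst = sum((b - y_bar) ** 2 for b in y)                      # Equation 14.12
sse = sum((b - (b0 + b1 * a)) ** 2 for a, b in zip(x, y))   # Equation 14.13
ssr = sum(((b0 + b1 * a) - y_bar) ** 2 for a in x)          # Equation 14.14

r_squared = ssr / sst  # Equation 14.15
```

Running this reproduces the Figure 14.14 values: SSR = 191,600.62, SSE = 84,834.29, and R² = 0.6931, and the total 276,434.92 splits exactly into the explained and unexplained pieces.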

Therefore, by squaring the correlation coefficient, we can get R² for the simple regression model. Figure 14.14a shows the correlation, r = 0.8325, which is referred to as Multiple R in Excel. Then, using Equation 14.16, we get

R² = r² = 0.8325² = 0.6931

Keep in mind that R² = 0.6931 is based on a random sample of size 12 and is subject to sampling error. Thus, just because R² = 0.6931 for the sample data does not mean that knowing the number of years an employee has worked for the company will explain 69.31% of the variation in sales for the population of all employees with the company. Likewise, just because R² > 0.0 for the sample data does not mean that the population coefficient of determination, denoted ρ² (rho-squared), is greater than zero. However, a statistical test exists for testing the following null and alternative hypotheses:

H0: ρ² = 0
HA: ρ² > 0

The test statistic is an F-test, with the test statistic defined as shown in Equation 14.17.

Test Statistic for Significance of the Coefficient of Determination

F = (SSR/1) / (SSE/(n − 2)),   df = (D1 = 1, D2 = n − 2)   (14.17)

where:
SSR = Sum of squares regression
SSE = Sum of squares error

For the Midwest Distribution example, the test statistic is computed using Equation 14.17 as follows:

F = (191,600.62/1) / (84,834.29/(12 − 2)) = 22.58

The critical value from the F-distribution table in Appendix H for α = 0.05 and for 1 and 10 degrees of freedom is 4.965. This gives the following decision rule:

If F > 4.965, reject the null hypothesis. Otherwise, do not reject the null hypothesis.

Because F = 22.58 > 4.965, we reject the null hypothesis and conclude that the population coefficient of determination (ρ²) is greater than zero. This means the independent variable explains a significant proportion of the variation in the dependent variable.
For a simple regression model (a regression model with a single independent variable), the test for ρ² is equivalent to the test shown earlier for the population correlation coefficient, ρ. Refer to Figure 14.5 to see that the t-test statistic for the correlation coefficient was t = 4.752. If we square this t-value, we get

t² = 4.752² = 22.58 = F

Thus, the tests are equivalent. They will provide the same conclusions about the relationship between the x and y variables.
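The F-test of Equation 14.17 and its equivalence to the t-test for the correlation coefficient can both be checked numerically. A Python sketch (note that carrying full precision gives F = 22.585, which the text rounds to 22.58):

```python
import math

# Midwest Distribution sample (Table 14.3)
x = [3, 5, 2, 8, 2, 6, 7, 1, 4, 2, 9, 6]
y = [487, 445, 272, 641, 187, 440, 346, 238, 312, 269, 655, 563]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
sxx = sum((a - x_bar) ** 2 for a in x)
sst = sum((b - y_bar) ** 2 for b in y)

ssr = sxy ** 2 / sxx   # sum of squares regression
sse = sst - ssr        # sum of squares error

# Equation 14.17: F statistic with df = (1, n - 2)
f_stat = (ssr / 1) / (sse / (n - 2))

# Equivalent t-test for the correlation coefficient
r = sxy / math.sqrt(sxx * sst)
t_stat = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)
```

Algebraically, t² = r²(n − 2)/(1 − r²) = SSR(n − 2)/SSE = F, which is why the two tests always agree in the single-variable case.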

Significance of the Slope Coefficient

For a simple linear regression model (one independent variable), there are three equivalent statistical tests:

1. Test for significance of the correlation between x and y.
2. Test for significance of the coefficient of determination.
3. Test for significance of the regression slope coefficient.

We have already introduced the first two of these tests. The third one deals specifically with the significance of the regression slope coefficient. The null and alternative hypotheses to be tested are

H0: β1 = 0
HA: β1 ≠ 0

To test the significance of the simple linear regression slope coefficient, we are interested in determining whether the population regression slope coefficient is 0. A slope of 0 would imply that there is no linear relationship between the x and y variables and that the x variable, in its linear form, is of no use in explaining the variation in y. If the linear relationship is useful, then we should reject the hypothesis that the regression slope is 0. However, because the estimated regression slope coefficient, b1, is calculated from sample data, it is subject to sampling error. Therefore, even though b1 is not 0, we must determine whether its difference from 0 is greater than would generally be attributed to sampling error.

If we selected several samples from the same population and for each sample determined the least squares regression line, we would likely get regression lines with different slopes and different y intercepts. This is analogous to getting different sample means from different samples when attempting to estimate a population mean. Just as the distribution of possible sample means has a standard error, the possible regression slopes also have a standard error, which is given in Equation 14.18.

Simple Regression Standard Error of the Slope Coefficient (Population)

σb1 = σε / √Σ(x − x̄)²   (14.18)

where:
σb1 = Standard deviation of the regression slope (called the standard error of the slope)
σε = Population standard error of the estimate
s ∑(x  x)2. (14.18). sb  Standard deviation of the regression sloope 1 (called the standard error of the slope) sε  Population standard error of the estimate. Equation 14.18 requires that we know the standard error of the estimate. It measures the dispersion of the dependent variable about its mean value at each value of the dependent variable in the original units of the dependent variable. However, because we are sampling from the population, we can estimate se as shown in Equation 14.19. Simple Regression Estimator for the Standard Error of the Estimate sε . SSE n2. (14.19). where: SSE  Sum of squares error n  Sample size Equation 14.18, the standard error of the regression slope, applies when we are dealing with a population. However, in most cases, such as the Midwest Distribution example, we are.

dealing with a sample from the population. Thus, we need to estimate the regression slope's standard error using Equation 14.20.

Simple Regression Estimator for the Standard Error of the Slope

s_b1 = s_ε / √Σ(x − x̄)²   (14.20)

where:
s_b1 = Estimate of the standard error of the least squares slope
s_ε = √(SSE/(n − 2)) = Sample standard error of the estimate (the measure of deviation of the actual y-values around the regression line)

BUSINESS APPLICATION: REGRESSION ANALYSIS USING COMPUTER SOFTWARE

MIDWEST DISTRIBUTION (CONTINUED) For Midwest Distribution, the regression outputs in Figures 14.15a and 14.15b show b1 = 49.91. The question is whether this value is different enough from 0 to have not been caused by sampling error. We find the answer by looking at the value of the estimate of the standard error of the slope, calculated using Equation 14.20, which is also shown in Figure 14.15a. The standard error of the slope coefficient is 10.50.

If the standard error of the slope, s_b1, is large, then the value of b1 will be quite variable from sample to sample. Conversely, if s_b1 is small, the possible slope values will be less variable. However, regardless of the standard error of the slope, the average value of b1 will equal β1, the true regression slope, if the assumptions of the regression analysis are satisfied. Figure 14.16 illustrates what this means.

FIGURE 14.15A | Excel 2007 Regression Results for Midwest Distribution

Excel 2007 Instructions:
1. Open file: Midwest.xls.
2. On the Data tab, click Data Analysis.
3. Select Regression Analysis.
4. Define x (Years with Midwest) and y (Sales) variable data range.
5. Click on Labels.
6. Specify output location.
7. Click OK.

Figure callouts: the calculated t statistic and p-value for testing whether the regression slope is 0; the standard error of the regression slope (10.50); and the corresponding F-ratio and p-value for testing whether the regression slope equals 0.

Notice in Figure 14.16 that when the standard error of the slope is
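The computations in Equations 14.19 and 14.20 are straightforward to verify in software. The following Python sketch uses a small hypothetical sample (not the Midwest data file) to estimate the standard error of the estimate and then the standard error of the slope:

```python
import numpy as np

# Hypothetical sample (not the Midwest data): x = years, y = sales.
x = np.array([2.0, 4.0, 5.0, 7.0, 8.0])
y = np.array([3.1, 5.0, 5.9, 8.2, 8.8])
n = len(x)

# Least squares coefficients.
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
SSE = np.sum((y - (b0 + b1 * x)) ** 2)

# Equation 14.19: sample standard error of the estimate.
s_eps = np.sqrt(SSE / (n - 2))

# Equation 14.20: estimated standard error of the slope.
s_b1 = s_eps / np.sqrt(np.sum((x - x.mean()) ** 2))

print(s_eps, s_b1)
```

The same two quantities appear on the Excel and Minitab regression output as "Standard Error" for the overall fit and for the slope coefficient, respectively.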

FIGURE 14.15B | Minitab Regression Results for Midwest Distribution

Minitab Instructions:
1. Open file: Midwest.MTW.
2. Choose Stat > Regression > Regression.
3. In Response, enter the y variable column.
4. In Predictors, enter the x variable column.
5. Click OK.

Figure callouts: the calculated t statistic and p-value for testing whether the regression slope is 0; the corresponding F-ratio and p-value; regression slope = 49.91; standard error of the regression slope = 10.50.

large, the sample slope can take on values much different from the true population slope. As Figure 14.16(a) shows, a sample slope and the true population slope can even have different signs. However, when σ_b1 is small, the sample regression lines will cluster closely around the true population line [Figure 14.16(b)]. Because the sample regression slope will most likely not equal the true population slope, we must test to determine whether the true slope could possibly be 0. A slope of 0 in the linear model means that the independent variable will not explain any variation in the dependent variable, nor will it be useful in predicting the dependent variable. Assuming a 0.05 level of significance, the null and alternative hypotheses to be tested are

H0: β1 = 0
HA: β1 ≠ 0

FIGURE 14.16 | Standard Error of the Slope
(a) Large Standard Error: the sample regression lines (Samples 1, 2, and 3) scatter widely around the population line E(y) = β0 + β1x.
(b) Small Standard Error: the sample regression lines cluster closely around the population line E(y) = β0 + β1x.

To test the significance of a slope coefficient, we use the t-test statistic in Equation 14.21.

Simple Linear Regression Test Statistic for Test of the Significance of the Slope

t = (b1 − β1) / s_b1,  df = n − 2   (14.21)

where:
b1 = Sample regression slope coefficient
β1 = Hypothesized slope (usually β1 = 0)
s_b1 = Estimator of the standard error of the slope

Figure 14.17 illustrates this test for the Midwest Distribution example. The calculated t-value of 4.752 exceeds the critical value, t = 2.228, from the t-distribution with 10 degrees of freedom and α/2 = 0.025. This indicates that we should reject the hypothesis that the true regression slope is 0. Thus, years of experience can be used to help explain the variation in an individual representative's sales. This is not a coincidence: this test is always equivalent to the tests for ρ and ρ² presented earlier. The output shown in Figures 14.15a and 14.15b also contains the calculated t statistic and its p-value. As with other situations involving two-tailed hypothesis tests, if the p-value is less than α, the null hypothesis is rejected. In this case, because p-value = 0.0008 < 0.05, we reject the null hypothesis.

FIGURE 14.17 | Significance Test of the Regression Slope for Midwest Distribution

Hypotheses:
H0: β1 = 0
HA: β1 ≠ 0
α = 0.05,  df = 12 − 2 = 10
Rejection regions: α/2 = 0.025 in each tail, with −t0.025 = −2.228 and t0.025 = 2.228.

The calculated t is
t = (b1 − β1)/s_b1 = (49.91 − 0)/10.50 = 4.752

Decision Rule:
If t > t0.025 = 2.228, reject H0.
If t < −t0.025 = −2.228, reject H0.
Otherwise, do not reject H0.

Because 4.752 > 2.228, we reject the null hypothesis and conclude that the true slope is not 0. Thus, the simple linear relationship that uses the independent variable, years with the company, is useful in explaining the variation in the dependent variable, sales volume.
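The decision rule in Figure 14.17 can be reproduced with a few lines of Python. The inputs below are the values reported for Midwest Distribution (b1 = 49.91, s_b1 = 10.50, n = 12); scipy supplies the critical value. Note that computing from these rounded inputs gives t = 4.753, fractionally different from the text's 4.752, which was computed from unrounded values:

```python
from scipy import stats

# Values reported for Midwest Distribution (Figure 14.17).
b1, s_b1 = 49.91, 10.50
n = 12
df = n - 2
alpha = 0.05

t_stat = (b1 - 0) / s_b1                  # Equation 14.21 with hypothesized slope 0
t_crit = stats.t.ppf(1 - alpha / 2, df)   # two-tailed critical value
p_value = 2 * stats.t.sf(abs(t_stat), df)

reject = abs(t_stat) > t_crit
print(round(t_stat, 2), round(t_crit, 3), reject)  # 4.75 2.228 True
```

The p-value route gives the same decision: p ≈ 0.0008 is well below α = 0.05.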

How to do it (Example 14-3)
Simple Linear Regression Analysis

The following steps outline the process that can be used in developing a simple linear regression model and the various hypothesis tests used to determine the significance of a simple linear regression model.

1. Define the independent (x) and dependent (y) variables and select a simple random sample of pairs of (x, y) values.
2. Develop a scatter plot of y and x. You are looking for a linear relationship between the two variables.
3. Compute the correlation coefficient for the sample data.
4. Calculate the least squares regression line for the sample data and the coefficient of determination, R². The coefficient of determination measures the proportion of variation in the dependent variable explained by the independent variable.
5. Conduct any of the following tests for determining whether the regression model is statistically significant.
   a. Test to determine whether the true regression slope is 0. The test statistic, with df = n − 2, is
      t = (b1 − β1)/s_b1 = (b1 − 0)/s_b1
   b. Test to see whether ρ is significantly different from 0. The test statistic is
      t = r / √((1 − r²)/(n − 2))
   c. Test to see whether ρ² is significantly greater than 0. The test statistic is
      F = (SSR/1) / (SSE/(n − 2))
6. Reach a decision.
7. Draw a conclusion.

EXAMPLE 14-3  SIMPLE LINEAR REGRESSION ANALYSIS

Vantage Electronic Systems Consider the example involving Vantage Electronic Systems in Deerfield, Michigan, which started out supplying electronic equipment for the automobile industry but in recent years has ventured into other areas. One area is visibility sensors used by airports to provide takeoff and landing information and by transportation departments to detect low visibility on roadways during fog and snow. The recognized leader in the visibility sensor business is the SCR Company, which makes a sensor called the Scorpion. The research and development (R&D) department at Vantage has recently performed a test on its new unit by locating a Vantage sensor and a Scorpion sensor side by side. Various data, including visibility measurements, were collected at randomly selected points in time over a two-week period. These data are contained in a file called Vantage.

Step 1 Define the independent (x) and dependent (y) variables.
The analysis included a simple linear regression using the Scorpion visibility measurement as the dependent variable, y, and the Vantage visibility measurement as the independent variable, x.

Step 2 Develop a scatter plot of y and x.
The scatter plot is shown in Figure 14.18. There does not appear to be a strong linear relationship.

Step 3 Compute the correlation coefficient for the sample data.
Equation 14.1 or 14.2 can be used for manual computation, or we can use Excel or Minitab. The sample correlation coefficient is r = 0.5778.

Step 4 Calculate the least squares regression line for the sample data and the coefficient of determination, R².
Equations 14.7 and 14.8 can be used to manually compute the regression slope coefficient and intercept, respectively, and Equation 14.15 or 14.16 can be used to manually compute R². Excel and Minitab can also be used to eliminate the computational burden. The coefficient of determination is

R² = r² = 0.5778² = 0.3339

FIGURE 14.18 | Scatter Plot—Example 14-3: Scorpion Visibility (y) versus Vantage Visibility (x).

Thus, approximately 33% of the variation in the Scorpion visibility measures is explained by knowing the corresponding Vantage system visibility measure. The least squares regression equation is

ŷ = 0.586 + 3.017x

Step 5 Conduct a test to determine whether the regression model is statistically significant (or whether the population correlation is equal to 0).
The null and alternative hypotheses to test the correlation coefficient are

H0: ρ = 0
HA: ρ ≠ 0

The t-test statistic using Equation 14.3 is

t = r / √((1 − r²)/(n − 2)) = 0.5778 / √((1 − 0.5778²)/(280 − 2)) = 11.8

The t = 11.8 exceeds the critical t for any reasonable level of α for 278 degrees of freedom, so the null hypothesis is rejected and we conclude that there is a statistically significant linear relationship between visibility measures for the two visibility sensors. Alternatively, the null and alternative hypotheses to test the regression slope coefficient are

H0: β1 = 0
HA: β1 ≠ 0

The t-test statistic is

t = (b1 − β1)/s_b1 = (3.017 − 0)/0.2557 = 11.8

Step 6 Reach a decision.
The t-test statistic of 11.8 exceeds the t-critical for any reasonable level of α for 278 degrees of freedom.

Step 7 Draw a conclusion.
The population regression slope coefficient is not equal to 0. This means that knowing the Vantage visibility reading provides useful help in knowing what the Scorpion visibility reading will be.

END EXAMPLE  TRY PROBLEM 14-16 (pg. 609)

14-2: Exercises

Skill Development

14-16. You are given the following sample data for variables y and x:

y: 140.1 120.3 80.8 100.7 130.2 90.6 110.5 120.2 130.4 130.3 100.1
x: 5 3 2 4 5 4 4 5 6 5 4

a. Develop a scatter plot for these data and describe what, if any, relationship exists.
b. (1) Compute the correlation coefficient. (2) Test to determine whether the correlation is significant at the significance level of 0.05. Conduct this hypothesis test using the p-value approach. (3) Compute the regression equation based on these sample data and interpret the regression coefficients.
c. Test the significance of the overall regression model using a significance level equal to 0.05.
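For readers working these exercises in software rather than by hand, the steps in the "How to do it" box map onto a single scipy call. The data below are made up purely for illustration; they are not taken from any exercise in this section:

```python
import numpy as np
from scipy import stats

# Hypothetical (x, y) sample used only to illustrate the workflow.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0])
y = np.array([2.0, 4.1, 5.2, 7.8, 9.1, 11.3, 12.2])

# Steps 3-5: correlation, least squares line, and slope test in one call.
res = stats.linregress(x, y)

r = res.rvalue                 # sample correlation coefficient
r_squared = r ** 2             # coefficient of determination, R^2
b0, b1 = res.intercept, res.slope
p_value = res.pvalue           # two-tailed p-value for H0: slope = 0

print(f"y-hat = {b0:.3f} + {b1:.3f}x, R^2 = {r_squared:.3f}, p = {p_value:.4f}")
```

Comparing the reported p-value with the chosen α then completes Steps 6 and 7 (reach a decision, draw a conclusion).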

14-17. You are given the following sample data for variables x and y:

x (independent): 1 7 3 8 11 5 4
y (dependent): 16 50 22 59 63 46 43

a. Construct a scatter plot for these data and describe what, if any, relationship appears to exist.
b. Compute the regression equation based on these sample data and interpret the regression coefficients.
c. Based on the sample data, what percentage of the total variation in the dependent variable can be explained by the independent variable?
d. Test the significance of the overall regression model using a significance level of 0.01.
e. Test to determine whether the true regression slope coefficient is equal to 0. Use a significance level of 0.01.

14-18. The following data for the dependent variable, y, and the independent variable, x, have been collected using simple random sampling:

x: 10 14 16 12 20 18 16 14 16 18
y: 120 130 170 150 200 180 190 150 160 200

a. Develop a simple linear regression equation for these data.
b. Calculate the sum of squared residuals, the total sum of squares, and the coefficient of determination.
c. Calculate the standard error of the estimate.
d. Calculate the standard error for the regression slope.
e. Conduct the hypothesis test to determine whether the regression slope coefficient is equal to 0. Test using α = 0.02.

14-19. Consider the following sample data for the variables y and x:

x: 30.3 4.8 15.2 24.9 8.6 20.1 9.3 11.2
y: 14.6 27.9 17.6 15.3 19.8 13.2 25.6 19.4

a. Calculate the linear regression equation for these data.
b. Determine the predicted y-value when x = 10.
c. Estimate the change in the y variable resulting from the increase in the x variable of 10 units.
d. Conduct a hypothesis test to determine if an increase of 1 unit in the x variable will result in the decrease of the average value of the y variable. Use a significance level of 0.025.

14-20. Examine the following sample data for the variables y and x:

x: 1 2 3 4 5
y: 4 2 5 8 9

a. Construct a scatter plot of these data. Describe the relationship between x and y.
b. Calculate the sum of squares error for the following equations: (1) ŷ = 0.8 + 1.60x, (2) ŷ = 1 + 1.50x, and (3) ŷ = 0.7 + 1.60x.
c. Which of these equations provides the "best" fit of these data? Describe the criterion you used to determine "best" fit.
d. Determine the regression line that minimizes the sum of squares error.

Business Applications

14-21. The Skelton Manufacturing Company recently did a study of its customers. A random sample of 50 customer accounts was pulled from the computer records. Two variables were observed:

y = Total dollar volume of business this year
x = Miles customer is from corporate headquarters

The following statistics were computed:

ŷ = 2,140.23 − 10.12x,  s_b1 = 3.12

a. Interpret the regression slope coefficient.
b. Using a significance level of 0.01, test to determine whether it is true that the farther a customer is from the corporate headquarters, the smaller the total dollar volume of business.

14-22. A shipping company believes that the variation in the cost of a customer's shipment can be explained by differences in the weight of the package being shipped. To investigate whether this relationship is useful, a random sample of 20 customer shipments was selected, and the weight (in lb) and the cost (in dollars, rounded to the nearest dollar) for each shipment were recorded. The following results were obtained:

Weight (x): 8 6 5 7
Cost (y): 11 8 11 11

Weight (x): 12 9 17 13 8 18 17 17 10 20 9 5 13 6 6 12
Cost (y): 17 11 27 16 9 25 21 24 16 24 21 10 21 16 11 20

a. Construct a scatter plot for these data. What, if any, relationship appears to exist between the two variables?
b. Compute the linear regression model based on the sample data. Interpret the slope and intercept coefficients.
c. Test the significance of the overall regression model using a significance level equal to 0.05.
d. What percentage of the total variation in shipping cost can be explained by the regression model you developed in part b?

14-23. College tuition has risen at a pace faster than inflation for more than two decades, according to an article in USA Today. The following data indicate the average college tuition (in 2003 dollars) for private and public colleges:

Period: 1983–1984, 1988–1989, 1993–1994, 1998–1999, 2003–2004, 2008–2009
Private: 9,202  12,146  13,844  16,454  19,710  21,582
Public: 2,074  2,395  3,188  3,632  4,694  5,652

a. Conduct a simple linear regression analysis of these data in which the average tuition for private colleges is predicted by the average public college tuition. Test the significance of the model using α = 0.10.
b. How much does the average private college tuition increase when the average public college tuition increases by $100?
c. When the average public college tuition reaches $7,500, how much would you expect the average private college tuition to be?

Computer Database Exercises

14-24. The file Online contains a random sample of 48 customers who made purchases last quarter from an online retailer. The file contains information related to the time each customer spent viewing the online catalog and the dollar amount of purchases made.
The retailer would like to analyze the sample data to determine whether a relationship exists between the time spent viewing the online catalog and the dollar amount of purchases.
a. Compute the regression equation based on these sample data and interpret the regression coefficients.
b. Compute the coefficient of determination and interpret its meaning.
c. Test the significance of the overall regression model using a significance level of 0.01.
d. Test to determine whether the true regression slope coefficient is equal to 0. Use a significance level of 0.01 to conduct the hypothesis test.

14-25. The National Football League (NFL) is arguably the most successful professional sports league in the United States. Following the recent season, the commissioner's office staff performed an analysis in which a simple linear regression model was developed with average home attendance used as the dependent variable and the total number of games won during the season as the independent variable. The staff was interested in determining whether games won could be used as a predictor for average attendance. Develop the simple linear regression model. The data are in the file called NFL.
a. What percentage of total variation in average home attendance is explained by knowing the number of games the team won?
b. What is the standard error of the estimate for this regression model?
c. Using α = 0.05, test to determine whether the regression slope coefficient is significantly different from 0.
d. After examining the regression analysis results, what should the NFL staff conclude about how the average attendance is related to the number of games the team won?

14-26. The consumer price index (CPI) is a measure of the average change in prices over time in a fixed market basket of goods and services typically purchased by consumers. The CPI for all urban consumers covers about 80% of the total population. It is prepared and published by the Bureau of Labor Statistics of the Department of Labor, which measures average changes in prices of goods and services. The CPI is one way the government measures the general level of inflation; the annual percentage change in the value of this index is one way of measuring the annual inflation rate. The file entitled CPI contains the monthly CPI and inflation rate for the period 2000–2005.
a. Construct a scatter plot of the CPI versus inflation for the period 2000–2005. Describe the relationship that appears to exist between these two variables.
b. Conduct a hypothesis test to confirm your preconception of the relationship between the CPI and the inflation rate. Use α = 0.05.

c. Does it appear that the CPI and the inflation rate are measuring the same component of our economy? Support your assertion with statistical reasoning.

14-27. The College Board, administrator of the SAT test for college entrants, has made several changes to the test in recent years. One recent change occurred between 2005 and 2006. In a press release the College Board announced SAT scores for students in the class of 2005, the last class to take the former version of the SAT featuring math and verbal sections. The board indicated that for the class of 2005, the average SAT math scores continued their strong upward trend, increasing from 518 in 2004 to 520 in 2005, 14 points above 10 years previous and an all-time high. The file entitled MathSAT contains the math SAT scores for the interval 1967 to 2005.
a. Produce a scatter plot of the average SAT math scores versus the year the test was taken for all students (male and female) during the last 10 years (1996–2005).
b. Construct a regression equation to predict the average math scores with the year as the predictor.
c. Use regression to determine if the College Board's assertion concerning the improvement in SAT average math test scores over the last 10 years is overly optimistic.

14-28. One of the editors of a major automobile publication has collected data on 30 of the best-selling cars in the United States. The data are in a file called Automobiles. The editor is particularly interested in the relationship between highway mileage and curb weight of the vehicles.
a. Develop a scatter plot for these data. Discuss what the plot implies about the relationship between the two variables. Assume that you wish to predict highway mileage by using vehicle curb weight.
b. Compute the correlation coefficient for the two variables and test to determine whether there is a linear relationship between the curb weight and the highway mileage of automobiles.
c. (1) Compute the linear regression equation based on the sample data. (2) A CTS Sedan weighs approximately 4,012 pounds. Provide an estimate of the average highway mileage you would expect to obtain from this model.

14-29. The Insider View of Las Vegas Web site (www.insidervlv.com) furnishes information and facts concerning Las Vegas. A set of data published by them provides the amount of gaming revenue for various portions of Clark County, Nevada. The file entitled VEGAS provides the gaming revenue for the year 2005.
a. Compute the linear regression equation to predict the gaming revenue for Clark County based on the gaming revenue of the Las Vegas Strip.
b. Conduct a hypothesis test to determine if the gaming revenue from the Las Vegas Strip can be used to predict the gaming revenue for all of Clark County.
c. Estimate the increased gaming revenue that would accrue to all of Clark County if the gaming revenue on the Las Vegas Strip were to increase by a million dollars.

END EXERCISES 14-2

14.3 Uses for Regression Analysis

Regression analysis is a statistical tool that is used for two main purposes: description and prediction. This section discusses these two applications.

Regression Analysis for Description

BUSINESS APPLICATION: USING REGRESSION ANALYSIS FOR DECISION-MAKING

CAR MILEAGE In the summer of 2006, gasoline prices soared to record levels in the United States, heightening motor vehicle customers' concern for fuel economy. Analysts at a major automobile company collected data on a variety of variables for a sample of 30 different cars and small trucks. Included among those data were the Environmental Protection Agency (EPA)'s highway mileage rating and the horsepower of each vehicle.
The analysts were interested in the relationship between horsepower (x) and highway mileage (y). The data are contained in the file Automobiles. A simple linear regression model can be developed using Excel or Minitab. The Excel output is shown in Figure 14.19. For these sample data, the coefficient of determination,.

FIGURE 14.19 | Excel 2007 Regression Results for the Automobile Mileage Study

Excel 2007 Instructions:
1. Open file: Automobiles.xls.
2. Click on Data tab.
3. Select Data Analysis > Regression.
4. Define y variable range (Highway Mileage) and x variable range (Horse Power).
5. Check Labels.
6. Specify Output Location.
7. Click OK.

Regression equation: HW Mileage = 31.1658 − 0.0286 (Horse Power)

Minitab Instructions (for similar results):
1. Open file: Automobiles.MTW.
2. Choose Stat > Regression > Regression.
3. In Response, enter the y variable column.
4. In Predictors, enter the x variable column.
5. Click OK.

R² = 0.3016, indicates that knowing the horsepower of the vehicle explains 30.16% of the variation in the highway mileage. The estimated regression equation is

ŷ = 31.1658 − 0.0286x

Before the analysts attempt to describe the relationship between horsepower and highway mileage, they first need to test whether there is a statistically significant linear relationship between the two variables. To do this, they can apply the t-test described in Section 14.2 to test the following null and alternative hypotheses at the significance level α = 0.05:

H0: β1 = 0
HA: β1 ≠ 0

The calculated t statistic and the corresponding p-value are shown in Figure 14.19. Because the p-value (Significance F) = 0.0017 < 0.05, the null hypothesis, H0, is rejected and the analysts can conclude that the population regression slope is not equal to 0. The sample slope, b1, equals −0.0286. This means that for each 1-unit increase in horsepower, the highway mileage is estimated to decrease by an average of 0.0286 miles per gallon. However, b1 is subject to sampling error and is considered a point estimate for the true regression slope coefficient. From earlier discussions about point estimates in Chapters 8 and 10, we expect that b1 ≠ β1. Therefore, to help describe the relationship between the independent variable, horsepower, and the dependent variable, highway miles per gallon, we need to develop a confidence interval estimate for β1. Equation 14.22 is used to do this.

Confidence Interval Estimate for the Regression Slope, Simple Linear Regression

b1 ± t s_b1   (14.22)

or equivalently,

b1 ± t s_ε / √Σ(x − x̄)²,  df = n − 2

where:
s_b1 = Standard error of the regression slope coefficient
s_ε = Standard error of the estimate

The regression output shown in Figure 14.19 contains the 95% confidence interval estimate for the slope coefficient, which is

−0.045 ≤ β1 ≤ −0.012

Thus, at the 95% confidence level, based on the sample data, the analysts for the car company can conclude that a 1-unit increase in horsepower will result in a drop in mileage by an average amount between 0.012 and 0.045 miles per gallon. There are many other situations in which the prime purpose of regression analysis is description. Economists use regression analysis for descriptive purposes as they search for a way of explaining the economy. Market researchers also use regression analysis, among other techniques, in an effort to describe the factors that influence the demand for products.

EXAMPLE 14-4  DEVELOPING A CONFIDENCE INTERVAL ESTIMATE FOR THE REGRESSION SLOPE

Home Prices Home values are determined by a variety of factors. One factor is the size of the house (square feet). Recently, a study was conducted by First City Real Estate aimed at estimating the average value of each additional square foot of space in a house. A simple random sample of 319 homes sold within the past year was collected. Here are the steps required to compute a confidence interval estimate for the regression slope coefficient:

Step 1 Define the y (dependent) and x (independent) variables.
The dependent variable is sales price, and the independent variable is square feet.

Step 2 Obtain the sample data.
The study consists of sales prices and corresponding square feet for a random sample of 319 homes. The data are in a file called First-City.

Step 3 Compute the regression equation and the standard error of the slope coefficient.
These computations can be performed manually using Equations 14.7 and 14.8 for the regression model and Equation 14.20 for the standard error of the slope. Alternatively, we can use Excel or Minitab to obtain these values.

                  Coefficients    Standard Error
Intercept (b0)      39,838.48          7,304.95
Square Feet (b1)        75.70              3.78
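Step 4 below constructs the confidence interval from these coefficients by hand; the same arithmetic is easy to cross-check in Python, with scipy supplying the exact critical t rather than the rounded value of 1.97 used in the text (so the endpoints may differ by a cent or two):

```python
from scipy import stats

# Figures reported in Example 14-4 (First City Real Estate).
b1 = 75.70      # estimated slope ($ per square foot)
s_b1 = 3.78     # standard error of the slope
n = 319
df = n - 2      # 317

t_crit = stats.t.ppf(0.975, df)   # ~1.967; the text rounds to 1.97
lower = b1 - t_crit * s_b1
upper = b1 + t_crit * s_b1
print(f"95% CI for the slope: ({lower:.2f}, {upper:.2f})")
```

The interval agrees with the hand computation to within rounding of the critical value.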

The point estimate for the regression slope coefficient is $75.70. Thus, for a 1-square-foot increase in the size of a house, house prices increase by an average of $75.70. This is a point estimate and is subject to sampling error.

Step 4 Construct and interpret the confidence interval estimate for the regression slope using Equation 14.22.
The confidence interval estimate is

b1 ± t s_b1

where the degrees of freedom for the critical t is 319 − 2 = 317. The critical t for a 95% confidence interval estimate is approximately 1.97, and the interval estimate is

$75.70 ± 1.97($3.78)
$75.70 ± $7.45
$68.25 ≤ β1 ≤ $83.15

So, for a 1-square-foot increase in house size, at the 95% confidence level, we estimate that homes increase in price by an average of between $68.25 and $83.15.

END EXAMPLE  TRY PROBLEM 14-31 (pg. 620)

Regression Analysis for Prediction

BUSINESS APPLICATION: PREDICTING HOSPITAL COSTS USING REGRESSION ANALYSIS

FREEDOM HOSPITAL One of the main uses of regression analysis is prediction. You may need to predict the value of the dependent variable based on the value of the independent variable. Consider the administrator for Freedom Hospital, who has been asked by the hospital's board of directors to develop a model to predict the total charges for a geriatric patient. The file Patients contains the data that the administrator has collected. Although the Regression tool in Excel works well for generating the simple linear regression equation and other useful information, it does not provide predicted values for the dependent variable. However, both Minitab and the PHStat add-in do provide predictions. We will illustrate the Minitab output, which is formatted somewhat differently from the Excel output but contains the same basic information.

The administrator is attempting to construct a simple linear regression model, with total charges as the dependent (y) variable and length of stay as the independent (x) variable. Figure 14.20 shows the Minitab regression output. The least squares regression equation is

ŷ = 528 + 1,353x

As shown in the figure, the regression slope coefficient is significantly different from 0 (t = 14.17; p-value = 0.000). The model explains 59.6% of the variation in the total charges (R² = 59.6%). Notice in Figure 14.20 that Minitab has rounded the regression coefficients. The more precise values are provided in the column headed "Coef" and are

ŷ = 527.6 + 1,352.80x

The administrator could use this equation to predict total charges by substituting the length of stay into the regression equation for x. For example, suppose a patient has a five-day stay. The predicted total charges are

ŷ = 527.6 + 1,352.80(5) = $7,291.60

Note that this predicted value is a point estimate of the actual charges for this patient. The true charges will be either higher or lower than this amount. The administrator can develop a prediction interval, which is similar to the confidence interval estimates developed in Chapter 8.
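The point prediction above is simple enough to wrap in a small function. A sketch, using the Minitab-reported coefficients (the function name is mine, not the text's):

```python
# Coefficients as reported by Minitab in Figure 14.20.
b0, b1 = 527.6, 1352.80

def predict_total_charges(length_of_stay):
    """Return predicted total charges for a given length of stay (days)."""
    return b0 + b1 * length_of_stay

print(round(predict_total_charges(5), 2))  # 7291.6, matching the text
```

Remember that this is only a point estimate; the interval estimates that follow quantify how far actual charges may fall from it.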

FIGURE 14.20 | Minitab Regression Output for Freedom Hospital

Minitab Instructions:
1. Open file: Patients.MTW.
2. Choose Stat > Regression > Regression.
3. In Response, enter the y variable column.
4. In Predictors, enter the x variable column.
5. Click OK.

Chapter Outcome 6.

Confidence Interval for the Average y, Given x

The hospital administrator might like a 95% confidence interval for the average, or expected, value of charges for patients who stay in the hospital five days. The confidence interval for the expected value of a dependent variable, given a specific level of the independent variable, is determined by Equation 14.23. Observe that the specific value of x used to provide the prediction is denoted as xp.

Confidence Interval for E(y)|xp

ŷ ± t·sε·√[1/n + (xp − x̄)²/Σ(x − x̄)²]    (14.23)

where:
ŷ = point estimate of the dependent variable
t = critical value with n − 2 degrees of freedom
n = sample size
xp = specific value of the independent variable
x̄ = mean of the independent variable observations in the sample
sε = estimate of the standard error of the estimate

Although the confidence interval estimate can be manually computed using Equation 14.23, using your computer is much easier. Both PHStat and Minitab have built-in options to generate the confidence interval estimate for the dependent variable for a given value of the x variable. Figure 14.21 shows the Minitab results when length of stay, x, equals five days. Given this length of stay, the point estimate for the mean total charges is rounded by Minitab to $7,292, and at the 95% confidence level, the administrators believe the mean total charges will be in the interval $6,790 to $7,794.

Prediction Interval for a Particular y, Given x

The confidence interval shown in Figure 14.21 is for the average value of y given xp.
The administrator might also be interested in predicting the total charges for a particular patient with a five-day stay, rather than the average charges for all patients staying five days. Developing this 95% prediction interval requires only a slight modification to Equation 14.23. This prediction interval is given by Equation 14.24.
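Equations 14.23 and 14.24 differ only by the leading 1 under the square root. Here is a minimal sketch of both on a small hypothetical data set (the x and y values below are invented for illustration; they are not the Patients file):

```python
import math

# Hypothetical sample, made up for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
b0 = y_bar - b1 * x_bar
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
s_eps = math.sqrt(sse / (n - 2))  # standard error of the estimate

t_crit = 2.776  # t(0.025) for n - 2 = 4 degrees of freedom, from a t table
xp = 3.5
y_hat = b0 + b1 * xp

ci_margin = t_crit * s_eps * math.sqrt(1 / n + (xp - x_bar) ** 2 / sxx)      # Eq. 14.23
pi_margin = t_crit * s_eps * math.sqrt(1 + 1 / n + (xp - x_bar) ** 2 / sxx)  # Eq. 14.24

print("CI for E(y)|xp:", y_hat - ci_margin, y_hat + ci_margin)
print("PI for y|xp:   ", y_hat - pi_margin, y_hat + pi_margin)
```

The prediction interval is always the wider of the two, since it carries the extra 1 inside the square root.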

FIGURE 14.21 | Minitab Output: Freedom Hospital Confidence Interval Estimate
[Figure annotations: the margin-of-error term sε·√(1/n + (xp − x̄)²/Σ(x − x̄)²); point estimate 7,292; interval estimate 6,790 to 7,794]

Minitab Instructions:
1. Use the instructions in Figure 14.20 to get the regression results.
2. Before clicking OK, select Options.
3. In Prediction Interval for New Observations, enter the value(s) of the x variable.
4. In Confidence level, enter 0.95.
5. Click OK. OK.

Prediction Interval for y|xp

ŷ ± t·sε·√[1 + 1/n + (xp − x̄)²/Σ(x − x̄)²]    (14.24)

As was the case with the confidence interval application discussed previously, the manual computations required to use Equation 14.24 can be onerous. We recommend using your computer and software such as Minitab or PHStat to find the prediction interval. Figure 14.22 shows the PHStat results. Note that the same PHStat process generates both the prediction and confidence interval estimates.

FIGURE 14.22 | Excel 2007 (PHStat) Prediction Interval for Freedom Hospital

Excel 2007 (PHStat) Instructions:
1. Open file: Patients.xls.
2. Click on Add-Ins > PHStat.
3. Select Regression > Simple Linear Regression.
4. Define the y variable range (Total Charges) and the x variable range (Length of Stay).
5. Select Confidence and Prediction Interval; set x = 5 and 95% confidence.

[Figure annotations: the point estimate; the margin-of-error term tα/2·sε·√(1 + 1/n + (xp − x̄)²/Σ(x − x̄)²); prediction interval 1,545 to 13,038]

Based on this regression model, at the 95% confidence level, the hospital administrators can predict total charges for any patient with a length of stay of five days to be between $1,545 and $13,038. As you can see, this prediction has extremely poor precision. We doubt any hospital administrator will use a prediction interval that is so wide. Although the regression model explains a significant proportion of the variation in the dependent variable, it is relatively imprecise for predictive purposes. To improve the precision, we might decrease the confidence level or increase the sample size and redevelop the model.

The prediction interval for a specific value of the dependent variable is wider (less precise) than the confidence interval for predicting the average value of the dependent variable. This will always be the case, as seen in Equations 14.23 and 14.24. From an intuitive viewpoint, we should expect to come closer to predicting an average value than a single value.

Note that the term (xp − x̄)² has a particular effect on the intervals determined by both Equations 14.23 and 14.24. The farther xp (the value of the independent variable used to predict y) is from x̄, the greater (xp − x̄)² becomes. Figure 14.23 shows two regression lines developed from two samples with the same set of x-values. We have made both lines pass through the same (x̄, ȳ) point; however, they have different slopes and intercepts. At xp = x1, the two regression lines give predictions of y that are close to each other. However, for xp = x2, the predictions of y are quite different. Thus, when xp is close to x̄, the problems caused by variations in regression slopes are not as great as when xp is far from x̄. Figure 14.24 shows the prediction intervals over the range of possible xp values.
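The effect of the (xp − x̄)² term can be seen by treating the margin of error as a function of xp. A minimal sketch with made-up summary values (n, x̄, Σ(x − x̄)², sε, and the critical t below are all hypothetical):

```python
import math

# Hypothetical regression summary values, for illustration only.
n = 20
x_bar = 10.0
sxx = 400.0      # sum of (x - x_bar)^2
s_eps = 2.5      # standard error of the estimate
t_crit = 2.101   # t(0.025), df = 18

def ci_margin(xp):
    """Margin of error for the mean of y given xp (Equation 14.23)."""
    return t_crit * s_eps * math.sqrt(1 / n + (xp - x_bar) ** 2 / sxx)

def pi_margin(xp):
    """Margin of error for a single y given xp (Equation 14.24)."""
    return t_crit * s_eps * math.sqrt(1 + 1 / n + (xp - x_bar) ** 2 / sxx)

for xp in (6.0, 10.0, 14.0):
    print(xp, round(ci_margin(xp), 3), round(pi_margin(xp), 3))
```

At every xp the prediction-interval margin exceeds the confidence-interval margin, and both are smallest at xp = x̄, which is why the bands in Figure 14.24 bow outward on either side of x̄.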
The band around the estimated regression line bends away from the regression line as xp moves in either direction from x̄.

Chapter Outcome 7.

Common Problems Using Regression Analysis

Regression is perhaps the most widely used statistical tool other than descriptive statistical techniques. Because it is so widely used, you need to be aware of the common problems encountered when the technique is employed. One potential problem occurs when decision makers apply regression analysis for predictive purposes. The conclusions and inferences made from a regression line are statistically valid only over the range of the data contained in the sample used to develop the regression line. For instance, in the Midwest Distribution example, we analyzed the performance of sales representatives with one to nine years of experience. Therefore, predicting sales levels for employees with one to nine years of experience would be justified. However, if we were to try to predict the sales performance of someone with more than nine years of experience, the relationship between sales and experience might be different. Because no observations were

FIGURE 14.23 | Regression Lines Illustrating the Increase in Potential Variation in ŷ as xp Moves Farther from x̄
[Figure shows two sample regression lines, ŷ1 and ŷ2, passing through the common point (x̄, ȳ); at xp = x1, near x̄, the two predictions are close, while at xp = x2, far from x̄, they diverge]

FIGURE 14.24 | Confidence Intervals for y|xp and E(y)|xp
[Figure shows the fitted line ŷ = b0 + b1x with two bands around it: a wider prediction-interval band for y|xp and a narrower confidence-interval band for E(y)|xp]

taken for experience levels beyond the 1- to 9-year range, we have no information about what might happen outside that range. Figure 14.25 shows a case in which the true relationship between sales and experience reaches a peak value at about 20 years and then starts to decline. If a linear regression equation were used to predict sales based on experience levels beyond the relevant range of data, large prediction errors could occur.

A second important consideration, one that was discussed previously, involves correlation and causation. The fact that a significant linear relationship exists between two variables does not imply that one variable causes the other. Although there may be a cause-and-effect relationship, you should not infer that such a relationship is present based only on regression and/or correlation analysis. You should also recognize that a cause-and-effect relationship between two variables is not necessary for regression analysis to be an effective tool. What matters is that the regression model accurately reflects the relationship between the two variables and that the relationship remains stable.

Many users of regression analysis mistakenly believe that a high coefficient of determination (R²) guarantees that the regression model will be a good predictor. You should remember that R² is a measure of the variation in the dependent variable explained by the independent variable. Although the least squares criterion assures us that R² will be maximized (because the sum of squares error is a minimum) for the given set of sample data, the

FIGURE 14.25 | Graph for a Sales Peak at 20 Years
[Figure plots Sales in Thousands (0 to 1,200) against Years (0 to 30); sales rise to a peak near 20 years and then decline]

value applies only to those data used to develop the model. Thus, R² measures the fit of the regression line to the sample data. There is no guarantee that there will be an equally good fit with new data. The only true test of a regression model's predictive ability is how well the model actually predicts.

Finally, we should mention that you might find a large R² with a large standard error. This can happen if the total sum of squares is large in comparison to the SSE. Then, even though R² is relatively large, so too is the estimate of the model's standard error. Thus, confidence and prediction intervals may simply be too wide for the model to be used in many situations. This is discussed more fully in Chapter 15.

MyStatLab

14-3: Exercises

Skill Development

14-30. The following data have been collected by an accountant who is performing an audit of paper products at a large office supply company. The dependent variable, y, is the time taken (in minutes) by the accountant to count the units. The independent variable, x, is the number of units on the computer inventory record.

y: 23.1 100.5 242.9 56.4 178.7 10.5 94.2 200.4 44.2 128.7 180.5
x: 24 120 228 56 190 13 85 190 32 120 230

a. Develop a scatter plot for these data.
b. Determine the regression equation representing the data. Is the model significant? Test using a significance level of 0.10 and the p-value approach.
c. Develop a 90% confidence interval estimate for the true regression slope and interpret this interval estimate. Based on this interval, could you conclude the accountant takes an additional minute to count each additional unit?

Problems 14-32 and 14-33 refer to the following output for a simple linear regression model:

Summary Output

Regression Statistics
Multiple R: 0.1027
R-Square: 0.0105
Adjusted R-Square: −0.0030
Standard Error: 9.8909
Observations: 75

14-31.
You are given the following sample data:

x: 10 6 9 3 2 8 3
y: 3 7 3 8 9 5 7

a. Develop a scatter plot for these data.
b. Determine the regression equation for the data.
c. Develop a 95% confidence interval estimate for the true regression slope and interpret this interval estimate.
d. Provide a 95% prediction interval estimate for a particular y, given xp = 7.

Anova

            df    SS         MS      F      Significance F
Regression  1     76.124     76.12   0.778  0.3806
Residual    73    7,141.582  97.83
Total       74    7,217.706

            Coefficients  Standard Error  t-Statistic  p-value  Lower 95%  Upper 95%
Intercept   4.0133        3.878           1.035        0.3041   −3.715     11.742
x           0.0943        0.107           0.882        0.3806   −0.119     0.307

14-32. Referring to the displayed regression model, what percent of the variation in the y variable is explained by the x variable in the model?

14-33. Construct and interpret a 90% confidence interval estimate for the regression slope coefficient.

14-34. You are given the following summary statistics from a regression analysis:

ŷ = 200 + 150x
SSE = 25.25
SSX = sum of squares X = Σ(x − x̄)² = 99,645
n = 18
x̄ = 52.0

a. Determine the point estimate for y if xp = 48 is used.
b. Provide a 95% confidence interval estimate for the average y, given xp = 48. Interpret this interval.
c. Provide a 95% prediction interval estimate for a particular y, given xp = 48. Interpret.
d. Discuss the difference between the estimates provided in parts b and c.

14-35. The sales manager at Sun City Real Estate Company in Tempe, Arizona, is interested in describing the relationship between condo sales prices and the number of weeks the condo is on the market before it sells. He has collected a random sample of 17 low-end condos that have sold within the past three months in the Tempe area. These data are shown as follows:

Weeks on the Market / Selling Price
23 / $76,500
48 / $102,000
9 / $53,000
26 / $84,200
20 / $73,000
40 / $125,000
51 / $109,000
18 / $60,000
25 / $87,000
62 / $94,000
33 / $76,000
11 / $90,000
15 / $61,000
26 / $86,000
27 / $70,000
56 / $133,000
12 / $93,000

a. Develop a simple linear regression model to explain the variation in selling price based on the number of weeks the condo is on the market.
b. Test to determine whether the regression slope coefficient is significantly different from 0 using a significance level equal to 0.05.
c. Construct and interpret a 95% confidence interval estimate for the regression slope coefficient.

14-36. A sample of 10 yields the following data:

x: 10 8 11 7 10 11 6 7 15 9
y: 103 85 115 73 97 102 65 75 155 95

a. Provide a 95% confidence interval for the average y when xp = 9.4.
b. Provide a 95% confidence interval for the average y when xp = 10.
c. Obtain the margins of error for both part a and part b. Explain why the margin of error obtained in part b is larger than that in part a.

14-37. A regression analysis from a sample of 15 produced the following:

Σ(xi − x̄)(yi − ȳ) = 156.4
Σ(xi − x̄)² = 173.5
Σ(yi − ȳ)² = 181.6
Σ(yi − ŷi)² = 40.621
x̄ = 13.4 and ȳ = 56.4

a. Produce the regression line.
b. Determine if there is a linear relationship between the dependent and independent variables. Use a significance level of 0.05 and a p-value approach.
c. Calculate a 90% confidence interval for the amount the dependent variable changes when the independent variable increases by 1 unit.

Business Applications

14-38. During the recession that began in 2008, not only did some people stop making house payments, they also stopped making payments for local government services such as trash collection and water and sewer services. The following data have been collected by an accountant who is performing an audit of account balances for a major city billing department.
The population from which the data were collected represents those accounts for which the customer had indicated the balance was incorrect. The dependent variable, y, is the actual account balance as verified by the accountant. The independent variable, x, is the computer account balance.

y: 233 10 24 56 78 102 90 200 344 120 18
x: 245 12 22 56 90 103 85 190 320 120 23

a. Compute the least squares regression equation.
b. If the computer account balance was 100, what would you expect to be the actual account balance as verified by the accountant?
c. The computer balance for Timothy Jones is listed as 100 in the computer account record. Provide a 90% interval estimate for Mr. Jones's actual account balance.
d. Provide a 90% interval estimate for the average of all customers' actual account balances in which a computer account balance is the same as that of Mr. Jones (part c). Interpret.

14-39. Gym Outfitters sells and services exercise equipment such as treadmills, ellipticals, and stair climbers to gymnasiums and recreational centers. The company's management would like to determine if there is a relationship between the number of minutes required to complete a routine service call and the number of machines serviced.
A random sample of 12 records revealed the following information concerning the.

<span class='text_page_counter'>(159)</span> 622. CHAPTER 14. |. Introduction to Linear Regression and Correlation Analysis. number of machines serviced and the time (in minutes) to complete the routine service call: Number of Machines. Service Time (minutes). 11 8 9 10 7 6 8 4 10 5 5 12. 115 60 80 90 55 65 70 33 95 50 40 110. a. Estimate the least squares regression equation. b. If a gymnasium had six machines, how many minutes should Gym Outfitters expect a routine service call to require? c. Provide a 90% confidence interval for the average amount of time required to complete a routine service call when the number of machines being serviced is nine. d. Provide a 90% prediction interval for the time required to complete a particular routine service call for a gymnasium that has seven machines. 14-40. The National Association of Realtors (NAR) ExistingHome Sales Series provides a measurement of the residential real estate market. On or about the 25th of each month, NAR releases statistics on sales and prices of condos and co-ops, in addition to existing singlefamily homes, for the nation and the four regions. The data presented here indicate the number of (in thousands) existing-home sales as well as condo/ co-op sales: Single-Family Condo/Co-op Sales Sales. a. Construct the regression equation that would predict the number of condo/co-op sales using the number of single-family sales. b. One might conjecture that these two markets (single-family sales and condo/co-op sales) would be competing for the same audience. Therefore, we would expect that as the number of single-family sales increases, the number of condo/co-op sales would decrease. Conduct a hypothesis test to determine this using a significance level of 0.05. c. Provide a prediction interval for the number of condo/co-op sales when the number of singlefamily sales is 6,000 (thousands). Use a confidence level of 95%. 14-41. J.D. 
Power and Associates conducts an initial quality study (IQS) each year to determine the quality of newly manufactured automobiles. IQS measures 135 attributes across nine categories, including ride/ handling/braking, engine and transmission, and a broad range of quality problem symptoms reported by vehicle owners. The 2008 IQS was based on responses from more than 62,000 purchasers and lessees of new 2008 model-year cars and trucks, who were surveyed after 90 days of ownership. The data given here portray the industry average of the number of reported problems per 100 vehicles for 1998–2008.. Year 1998 1999 2000 2001 2002 2003 2004 2005 2006 2007 2008 Problems 176 167 154 147 133 133 125 121 119 118 118. a. Construct a scatter plot of the number of reported problems per 100 vehicles as a function of the year. b. Determine if the average number of reported problems per 100 vehicles declines from year to year. Use a significance level of 0.01 and a p-value approach. c. Assume the relationship between the number of reported problems per 100 vehicles and the year continues into the future. Provide a 95% prediction interval for the initial quality industry average of the number of reported problems per 100 vehicles for 2010.. Year. Month. 2009. Apr May. 6,270 6,230. 895 912. Jun. 6,330. 943. Computer Database Exercises. Jul. 6,220. 914. Aug. 6,280. 928. Sept. 6,290. 908. Oct. 6,180. 867. Nov. 6,150. 876. Dec. 5,860. 885. Jan. 5,790. 781. Feb. 6,050. 852. Mar. 6,040. 862. Apr. 5,920. 839. 14-42. A manufacturer produces a wash-down motor for the food service industry. The company manufactures the motors to order by modifying a base model to meet the specifications requested by the customer. The motors are produced in a batch environment with the batch size equal to the number ordered. The manufacturer has recently sampled 27 customer orders. 
The motor manufacturer would like to determine if there is a relationship between the cost of producing the order and the order size so that it could estimate the cost of producing a particular size order. The sampled data are contained in the file Washdown Motors.. 2010.

<span class='text_page_counter'>(160)</span> CHAPTER 14. a. Use the sample data to estimate the least squares regression model. b. Provide an interpretation of the regression coefficients. c. Test the significance of the overall regression model using a significance level of 0.01. d. The company has just received an order for 30 motors. Use the regression model developed in part a to estimate the cost of producing this particular order. e. Referring to part d, what is the 90% confidence interval for an average cost of an order of 30 motors? 14-43. Each month, the Bureau of Labor Statistics (BLS) of the U.S. Department of Labor announces the total number of employed and unemployed persons in the United States for the previous month. At the same time, it also publishes the inflation rate, which is the rate of change in the price of goods and services from one month to the next. It seems quite plausible that there should be some relationship between these two indicators. The file entitled CPI provides the monthly unemployment and inflation rates for the period 2000–2005. a. Construct a scatter plot of the unemployment rate versus inflation rate for the period 2000–2005. Describe the relationship that appears to exist between these two variables. b. Produce a 95% prediction interval for the unemployment rate for the maximum inflation rate in the period 2000–2005. Interpret the interval.. |. Introduction to Linear Regression and Correlation Analysis. 623. c. Produce a 95% prediction interval for the unemployment rate when the inflation rate is 0.00. d. Which of the prediction intervals in parts b and c has the larger margin of error? Explain why this is the case. 14-44. The National Highway Transportation Safety Administration’s National Center for Statistics and Analysis released its Vehicle Survivability Travel Mileage Schedules in January 2006. 
One item investigated was the relationship between the annual vehicle miles traveled (VMT) as a function of vehicle age for passenger cars up to 25 years old. The VMT data were collected by asking consumers to estimate the number of miles driven in a given year. The data were collected over a 14-month period, starting in March 2001 and ending in May 2002. The file entitled Miles contains this data. a. Produce a regression equation modeling the relationship between VMT and the age of the vehicle. Estimate how many more annual vehicle miles would be traveled for a vehicle that is 10 years older than another vehicle. b. Provide a 90% confidence interval estimate for the average annual vehicle miles traveled when the age of the vehicle is 15 years. c. Determine if it is plausible for a vehicle that is 10 years old to travel 12,000 miles in a year. Support your answer with statistical reasoning. END EXERCISES 14-3.

<span class='text_page_counter'>(161)</span> 624. CHAPTER 14. |. Introduction to Linear Regression and Correlation Analysis. Visual Summary Chapter 14: Although some business situations involve only one variable, others require decision makers to consider the relationship between two or more variables. In analyzing the relationship between two variables, there are two basic models that we can use. The regression model covered in this chapter is referred to as simple linear regression. This relationship between x and y assumes that the x variable takes on known values specifically selected from all the possible values for x. The y variable is a random variable observed at the different levels of x. Testing that a linear relationship exists between the dependent and independent variables is performed using the standard statistical procedures of hypothesis testing and confidence intervals. A second model is referred to as the correlation model and is used in applications in which both the x and y variables are considered to be random variables. These two models arise in practice by the way in which the data are obtained. Regression analysis and correlation analysis are two of the most often applied statistical tools for business decision making.. 14.1 Scatter Plots and Correlation (pg. 580–589) Summary Decision-making situations that call for understanding the relationship between two quantitative variables are aided by the use of scatter plots, or scatter diagrams. A scatter plot is a two-dimensional plot showing the values for the joint occurrence of two quantitative variables. The scatter plot may be used to graphically represent the relationship between two variables. A numerical quantity that measures the strength of the linear relationship between two variables is labeled the correlation coefficient. The sample correlation coefficient, r, can range from a perfect positive correlation, +1.0, to a perfect negative correlation, –1.0. 
A test based upon the t-distribution can determine whether the population correlation coefficient is significantly different from 0 and, therefore, whether a linear relationship exists between the dependent and independent variables.

Outcome 1. Calculate and interpret the correlation between two variables.
Outcome 2. Determine whether the correlation is significant.

14.2 Simple Linear Regression Analysis (pg. 589–612)

Summary The statistical technique we use to analyze the relationship between the dependent variable and the independent variable is known as regression analysis. When the relationship between the dependent variable and the independent variable is linear, the technique is referred to as simple linear regression. The population regression model is determined by three values: (1) the y-intercept and (2) the slope of the regression line, which together are the population regression coefficients, and (3) the random error term. The criterion used to determine the best estimate of the population regression line is known as the least squares criterion. It chooses values for the y-intercept and slope that will produce the smallest possible sum of squared prediction errors. Testing whether the population slope coefficient is equal to zero provides a method for determining whether a linear relationship exists between the dependent and independent variables. The test for the significance of the simple linear regression slope is equivalent to the test that the correlation coefficient is significant. A less involved procedure that indicates the goodness of fit of the regression equation to the data is known as the coefficient of determination. Simple linear regression, which is introduced in this chapter, is one of the statistical tools most often applied by business decision makers for analyzing the relationship between two variables.

Outcome 3. Calculate the simple linear regression equation for a set of data and know the basic assumptions behind regression analysis.
Outcome 4.
Determine whether a regression model is significant.

14.3 Uses for Regression Analysis (pg. 612–623)

Summary Regression analysis is a statistical tool that is used for two main purposes: description and prediction. Description is accomplished by describing the plausible values the population slope coefficient may attain. To provide this, a confidence interval estimator of the population slope is employed. There are many other situations in which the prime purpose of regression analysis is description. Market researchers also use regression analysis, among other techniques, in an effort to describe the factors that influence the demand for their products. The analyst may wish to provide a confidence interval for the expected value of a dependent variable, given a specific level of the independent variable. This is obtained by the use of a confidence interval for the average y, given x. Another interval estimate is available in the case that the analyst wishes to predict a particular y for a given x. This interval estimator is called a prediction interval. Any procedure in statistics is valid only if the assumptions it is built upon are valid. This is particularly true in regression analysis. Therefore, before using a regression model for description or prediction, you should check to see whether the assumptions associated with linear regression analysis are valid. Residual analysis is the procedure that is used for that purpose.

Outcome 5. Recognize regression analysis applications for purposes of description and prediction.
Outcome 6. Calculate and interpret confidence intervals for the regression analysis.
Outcome 7. Recognize some potential problems if regression analysis is used incorrectly.

Conclusion

Correlation and regression analysis are two of the most frequently used statistical techniques by business decision makers. This chapter has introduced the basics of these two topics.
The discussion of regression analysis has been limited to situations in which you have one dependent variable and one independent variable. Chapter 15 will extend the discussion of regression analysis by showing how two or more independent variables are included in the analysis. The focus of that chapter will be on building a model for explaining the variation in the dependent variable. However, the basic concepts presented in this chapter will be carried forward.
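The correlation side of the chapter's toolkit (Equations 14.1 and 14.3) fits in a few lines of code. A sketch on a small made-up sample (the x and y values are hypothetical):

```python
import math

# Hypothetical sample, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

# Equation 14.1: sample correlation coefficient.
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
r = sxy / math.sqrt(sum((xi - x_bar) ** 2 for xi in x)
                    * sum((yi - y_bar) ** 2 for yi in y))

# Equation 14.3: test statistic for the correlation, df = n - 2.
t = r / math.sqrt((1 - r ** 2) / (n - 2))
print(round(r, 4), round(t, 4))  # 0.7746 2.1213
```

The computed t would then be compared to a critical t with n − 2 degrees of freedom to decide whether the population correlation differs from 0.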

Equations

(14.1) Sample Correlation Coefficient pg. 580

r = Σ(x − x̄)(y − ȳ) / √{[Σ(x − x̄)²][Σ(y − ȳ)²]}

(14.2) or the algebraic equivalent: pg. 581

r = [nΣxy − (Σx)(Σy)] / √{[n(Σx²) − (Σx)²][n(Σy²) − (Σy)²]}

(14.3) Test Statistic for Correlation pg. 584

t = r / √[(1 − r²)/(n − 2)],  df = n − 2

(14.4) Simple Linear Regression Model (Population Model) pg. 590

y = β0 + β1x + ε

(14.5) Estimated Regression Model (Sample Model) pg. 592

ŷ = b0 + b1x

(14.6) Least Squares Equations pg. 594

b1 = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)²

(14.7) or the algebraic equivalent: pg. 594

b1 = [Σxy − (ΣxΣy)/n] / [Σx² − (Σx)²/n]

(14.8) and pg. 594

b0 = ȳ − b1x̄

(14.9) Sum of Squared Errors pg. 596

SSE = Σy² − b0Σy − b1Σxy

(14.10) Sum of Residuals pg. 596

Σ(yi − ŷi) = 0

(14.11) Sum of Squared Residuals (Errors) pg. 597

SSE = Σ(yi − ŷi)²

(14.12) Total Sum of Squares pg. 600

SST = Σ(yi − ȳ)²

(14.13) Sum of Squares Error pg. 601

SSE = Σ(yi − ŷi)²

(14.14) Sum of Squares Regression pg. 602

SSR = Σ(ŷi − ȳ)²

(14.15) Coefficient of Determination, R² pg. 602

R² = SSR/SST

(14.16) Coefficient of Determination for the Single Independent Variable Case pg. 602

R² = r²

(14.17) Test Statistic for Significance of the Coefficient of Determination pg. 603

F = (SSR/1) / [SSE/(n − 2)],  df = (D1 = 1, D2 = n − 2)

(14.18) Simple Regression Standard Error of the Slope Coefficient (Population) pg. 604

σb1 = σε / √Σ(x − x̄)²

(14.19) Simple Regression Estimator for the Standard Error of the Estimate pg. 604

sε = √[SSE/(n − 2)]

(14.20) Simple Regression Estimator for the Standard Error of the Slope pg. 605

sb1 = sε / √Σ(x − x̄)²

(14.21) Simple Linear Regression Test Statistic for Test of the Significance of the Slope pg. 607

t = (b1 − β1)/sb1,  df = n − 2

(14.22) Confidence Interval Estimate for the Regression Slope, Simple Linear Regression pg. 614

b1 ± t·sb1
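Several of the equations above (14.6, 14.8, 14.12 through 14.15, and 14.19 through 14.21) chain together into one short computation. A sketch on a small hypothetical sample (the data are made up; the point is the chain of formulas):

```python
import math

# Hypothetical sample, for illustration only.
x = [2.0, 4.0, 6.0, 8.0, 10.0]
y = [3.0, 7.0, 5.0, 11.0, 14.0]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)

b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx  # (14.6)
b0 = y_bar - b1 * x_bar                                              # (14.8)

sst = sum((yi - y_bar) ** 2 for yi in y)                             # (14.12)
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))        # (14.13)
ssr = sum((b0 + b1 * xi - y_bar) ** 2 for xi in x)                   # (14.14)
r_squared = ssr / sst                                                # (14.15)

s_eps = math.sqrt(sse / (n - 2))                                     # (14.19)
s_b1 = s_eps / math.sqrt(sxx)                                        # (14.20)
t_stat = b1 / s_b1                                                   # (14.21), H0: slope = 0
print(b1, b0, round(r_squared, 3), round(t_stat, 3))
```

A useful internal check, guaranteed by the least squares fit, is that SST = SSR + SSE.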

or equivalently,

b1 ± t·sε/√Σ(x − x̄)²,  df = n − 2

(14.23) Confidence Interval for E(y)|xp pg. 616

ŷ ± t·sε·√[1/n + (xp − x̄)²/Σ(x − x̄)²]

(14.24) Prediction Interval for y|xp pg. 617

ŷ ± t·sε·√[1 + 1/n + (xp − x̄)²/Σ(x − x̄)²]

Key Terms

Coefficient of determination pg. 602
Correlation coefficient pg. 580
Least squares criterion pg. 592
Regression slope coefficient pg. 591
Residual pg. 592
Scatter plot pg. 580
Simple linear regression pg. 589

Chapter Exercises

Conceptual Questions

14-45. A statistics student was recently working on a class project that required him to compute a correlation coefficient for two variables. After careful work he arrived at a correlation coefficient of 0.45. Interpret this correlation coefficient for the student who did the calculations.

14-46. Referring to the previous problem, another student in the same class computed a regression equation relating the same two variables. The slope of the equation was found to be −0.735. After trying several times and always coming up with the same result, she felt that she must have been doing something wrong since the value was negative and she knew that this could not be right. Comment on this student's conclusion.

14-47. If we select a random sample of data for two variables and, after computing the correlation coefficient, conclude that the two variables may have zero correlation, can we say that there is no relationship between the two variables? Discuss.

14-48. Discuss why prediction intervals that attempt to predict a particular y-value are less precise than confidence intervals for predicting an average y.

14-49. Consider the following two scenarios:
a. The number of new workers hired per week in your county has a high positive correlation with the average weekly temperature. Can you conclude that an increase in temperature causes an increase in the number of new hires?
Discuss.
b. Suppose the stock price and the common dividends declared for a certain company have a high positive correlation. Are you safe in concluding on the basis of the correlation coefficient that an increase in the common dividends declared causes an increase in the stock price? Present other reasons than the correlation coefficient that might lead you to conclude that an increase in common dividends declared causes an increase in the stock price.

14-50. Consider the following set of data:

x:  48  27  34  24  49  29  39  38  46  32
y:  47  23  31  20  50  48  47  47  42  47

a. Calculate the correlation coefficient of these two variables.
b. Multiply each value of the variable x by 5 and add 10 to the resulting products. Now multiply each value of the variable y by 3 and subtract 7 from the resulting products. Finally, calculate the correlation coefficient of the new x and y variables.
c. Describe the principle that the example developed in parts a and b demonstrates.

14-51. Go to the library and locate an article in a journal related to your major (Journal of Marketing, Journal of Finance, etc.) that uses linear regression. Discuss the following:
a. How the author chose the dependent and independent variables
b. How the data were gathered
c. What statistical tests the author used
d. What conclusions the analysis allowed the author to draw

Business Applications

14-52. The Smithfield Organic Milk Company recently studied a random sample of 30 of its distributors and found the correlation between sales and advertising dollars to be 0.67.

a. Is there a significant linear relationship between sales and advertising? If so, is it fair to conclude that advertising causes sales to increase?
b. If a regression model were developed using sales as the dependent variable and advertising as the independent variable, determine the proportion of the variation in sales that would be explained by its relationship to advertising. Discuss what this says about the usefulness of using advertising to predict sales.

14-53. A previous exercise discussed the relationship between the average college tuition (in 2003 dollars) for private and public colleges. The data indicated in the article follow:

Period       Private   Public
1983–1984      9,202    2,074
1988–1989     12,146    2,395
1993–1994     13,844    3,188
1998–1999     16,454    3,632
2003–2004     19,710    4,694
2008–2009     21,582    5,652

a. Construct the regression equation that would predict the average college tuition for private colleges using that of the public colleges.
b. Determine if there is a linear tendency for the average college tuition for private colleges to increase when the average college tuition for public colleges increases. Use a significance level of 0.05 and a p-value approach.
c. Provide a 95% confidence interval for the average college tuition for private colleges when the average college tuition for public colleges reaches $7,000.
d. Is it plausible that the average college tuition for private colleges would be larger than $35,000 when the average college tuition for public colleges reaches $7,000? Support your assertion with statistical reasoning.

14-54. The Farmington City Council recently commissioned a study of park users in their community. Data were collected on the age of the person surveyed and the amount of hours he or she spent in the park in the past month. The data collected were as follows:

Time in Park:  7.2  3.5  6.6  5.4  1.5  2.3  4.4  8.8  4.9  5.1  1.0
Age:            16   15   28   16   29   38   48   18   24   33   56

a. Draw a scatter plot for these data and discuss what, if any, relationship appears to be present between the two variables.
b. Compute the correlation coefficient between age and the amount of time spent in the park. Provide an explanation to the Farmington City Council of what the correlation measures.
c. Test to determine whether the amount of time spent in the park decreases with the age of the park user. Use a significance level of 0.10. Use a p-value approach to conduct this hypothesis test.

14-55. At State University, a study was done to establish whether a relationship exists between students' graduating grade point average (GPA) and the SAT verbal score when the student originally entered the university. The sample data are reported as follows:

GPA:  2.5  3.2  3.5  2.8  3.0  2.4  3.4  2.9  2.7  3.8
SAT:  640  700  550  540  620  490  710  600  505  710

a. Develop a scatter plot for these data and describe what, if any, relationship exists between the two variables, GPA and SAT score.
b. (1) Compute the correlation coefficient. (2) Does it appear that the success of students at State University is related to the SAT verbal scores of those students? Conduct a statistical procedure to answer this question. Use a significance level of 0.01.
c. (1) Compute the regression equation based on these sample data if you wish to predict the university GPA using the student SAT score. (2) Interpret the regression coefficients.

14-56. An American airline company recently performed a customer survey in which it asked a random sample of 100 passengers to indicate their income and the total cost of the airfares they purchased for pleasure trips during the past year. A regression model was developed for the purpose of determining whether income could be used as a variable to explain the variation in the total cost of airfare on airlines in a year. The following regression results were obtained:

ŷ = 0.25 + 0.0150x   s_ε = 721.44   R² = 0.65   s_b₁ = 0.0000122

a. Produce an estimate of the maximum and minimum differences in the amounts allocated to purchase airline tickets by two families that have a difference of $20,000 in family income. Assume that you wish to use a 90% confidence level.
b. Can the intercept of the regression equation be interpreted in this case, assuming that no one who was surveyed had an income of 0 dollars? Explain.

c. Use the information provided to perform an F-test for the significance of the regression model. Discuss your results, assuming the test is performed at the significance level of 0.05.

14-57. One of the advances that have helped to diminish carpal tunnel syndrome is the ergonomic keyboard, which may also increase typing speed. Ten administrative assistants were chosen to type on both standard and ergonomic keyboards. The resulting typing speeds follow:

Ergonomic:  69  80  60  71  73  64  63  70  63  74
Standard:   70  68  54  56  58  64  62  51  64  53

a. Produce a scatter plot of the typing speed of administrative assistants using ergonomic and standard keyboards. Does there appear to be a linear relationship between these two variables? Explain your response.
b. Calculate the correlation coefficient of the typing speeds of administrative assistants using ergonomic and standard keyboards.
c. Conduct a hypothesis test to determine if a positive correlation exists between the typing speeds on ergonomic and standard keyboards. Use a significance level of 0.05.

14-58. A company is considering recruiting new employees from a particular college and plans to place a great deal of emphasis on the student's college GPA. However, the company is aware that not all schools have the same grading standards, so it is possible that a student at this school might have a lower (or higher) GPA than a student from another school, yet really be on par with the other student. To make this comparison between schools, the company has devised a test that it has administered to a sample of 400 students. With the results of the test, it has developed a regression model that it uses to predict student GPA.
The following equation represents the model:

ŷ = 1.0 + 0.028x

The R² for this model is 0.88 and the standard error of the estimate is 0.20, based on the sample data used to develop the model. Note that the dependent variable is the GPA and the independent variable is the test score, where this score can range from 0 to 100. For the sample data used to develop the model, the following values are known:

ȳ = 2.76   x̄ = 68   Σ(x − x̄)² = 148,885.73

a. Based on the information contained in this problem, can you conclude that as the test score increases, the GPA will also increase, using a significance level of 0.05?
b. Suppose a student interviews with this company, takes the company test, and scores 80 correct. What is the 90% prediction interval estimate for this student's GPA? Interpret the interval.
c. Suppose the student in part b actually has a 2.90 GPA at this school. Based on this evidence, what might be concluded about this person's actual GPA compared with other students with the same GPA at other schools? Discuss the limitations you might place on this conclusion.
d. Suppose a second student with a 2.45 GPA took the test and scored 65 correct. What is the 90% prediction interval for this student's "real" GPA? Interpret.

Computer Database Exercises

14-59. Although the Jordan Banking System, a smaller regional bank, generally avoided the subprime mortgage market and consequently did not take money from the Federal Troubled Asset Relief Program (TARP), its board of directors has decided to look into all aspects of revenues and costs. One service the bank offers is free checking, and the board is interested in whether the costs of this service are offset by revenues from interest earned on the deposits. One aspect in studying checking accounts is to determine whether changes in average checking account balance can be explained by knowing the number of checks written per month. The sample data selected are contained in the data file named Jordan.
a.
Draw a scatter plot for these data.
b. Develop the least squares regression equation for these data.
c. Develop the 90% confidence interval estimate for the change in the average checking account balance when a person who formerly wrote 25 checks a month doubles the number of checks used.
d. Test to determine if an increase in the number of checks written by an individual can be used to predict the checking account balance of that individual. Use α = 0.05. Comment on this result and the result of part c.

14-60. An economist for the state government of Mississippi recently collected the data contained in the file called Mississippi on the percentage of people unemployed in the state at randomly selected points in time over the past 25 years and the interest rate of Treasury bills offered by the federal government at that point in time.
a. (1) Develop a plot showing the relationship between the two variables. (2) Describe the relationship as being either linear or curvilinear.
b. (1) Develop a simple linear regression model with unemployment rate as the dependent variable. (2) Write a short report describing the model and indicating the important measures.

14-61. Terry Downes lost his job as an operations analyst last year in a company downsizing effort. In looking for job opportunities Terry remembered reading an article in Fortune stating companies were looking to outsource activities they were currently doing that were not part of their core competence. Terry decided no company's core competence involved cleaning its facilities, and so using his savings, he started a cleaning company. In a surprise to his friends, Terry's company proved to be successful. Recently, Terry decided to survey customers to determine how satisfied they are with the work performed. He devised a rating scale between 0 and 100, with 0 being poor and 100 being excellent service. He selected a random sample of 14 customers and asked them to rate the service. He also recorded the number of worker hours spent in the customer's facility. These data are in the data file named Downes.
a. (1) Draw a scatter plot showing these two variables, with the y variable on the vertical axis and the x variable on the horizontal axis. (2) Describe the relationship between these two variables.
b. (1) Develop a linear regression model to explain the variation in the service rating. (2) Write a short report describing the model and showing the results of pertinent hypothesis tests, using a significance level of 0.10.

14-62. A previous problem discussed the College Board changing the SAT test between 2005 and 2006. The class of 2005 was the last to take the former version of the SAT featuring math and verbal sections. The file entitled MathSAT contains the math SAT scores for the interval 1967 to 2005. One point of interest concerning the data is the relationship between the average scores of male and female students.
a. Produce a scatter plot depicting the relationship between the average math SAT score of males (the dependent variable) and females (independent variable) over the period 1967 to 2005.
Describe the relationship between these two variables. b. Is there a linear relationship between the average score for males and females over the period 1967 to 2005? Use a significance level of 0.05 and the p-value approach to determine this. 14-63. The housing market in the United States saw a major decrease in value between 2007 and 2008. The file entitled House contains the data on average and median housing prices between November 2007 and November 2008. Assume the data can be viewed as samples of the relevant populations. a. Determine the linear relationship that could be used to predict the average selling prices for November 2007 using the median selling prices for that period.. |. Introduction to Linear Regression and Correlation Analysis. 629. b. Conduct a hypothesis test to determine if the median selling prices for November 2007 could be used to determine the average selling prices in that period. Use a significance level of 0.05 and the p-value approach to conduct the test. c. Provide an interval estimate of the average selling price of homes in November 2007 if the median selling price was $195,000. Use a 90% confidence interval. 14-64. The Grinfield Service Company’s marketing director is interested in analyzing the relationship between her company’s sales and the advertising dollars spent. In the course of her analysis, she selected a random sample of 20 weeks and recorded the sales for each week and the amount spent on advertising. These data are contained in the data file called Grinfield. a. Identify the independent and dependent variables. b. Draw a scatter plot with the dependent variable on the vertical axis and the independent variable on the horizontal axis. c. The marketing director wishes to know if increasing the amount spent on advertising increases sales. As a first attempt, use a statistical test that will provide the required information. Use a significance level of 0.025. 
On careful consideration, the marketing manager realizes that it takes a certain amount of time for the effect of advertising to register in terms of increased sales. She therefore asks you to calculate a correlation coefficient for sales of the current week against amount of advertising spent in the previous week and to conduct a hypothesis test to determine if, under this model, increasing the amount spent on advertising increases sales. Again, use a significance level of 0.025. 14-65. Refer to the Grinfield Service Company discussed in Problem 14-64. a. Develop the least squares regression equation for these variables. Plot the regression line on the scatter plot. b. Develop a 95% confidence interval estimate for the increase in sales resulting from increasing the advertising budget by $50. Interpret the interval. c. Discuss whether it is appropriate to interpret the intercept value in this model. Under what conditions is it appropriate? Discuss. d. Develop a 90% confidence interval for the mean sales amount achieved during all weeks in which advertising is $200 for the week. e. Suppose you are asked to use this regression model to predict the weekly sales when advertising is to be set at $100. What would you reply to the request? Discuss..

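Several of the exercises above ask for inference about the slope and for interval estimates at a given x_p. The pieces fit together as follows: the standard error of the estimate (Equation 14.19) feeds the standard error of the slope (14.20), which drives both the t-test for the slope (14.21) and the confidence interval for the slope (14.22); the same s_ε appears in the intervals for E(y)|x_p (14.23) and for a single y|x_p (14.24). A minimal Python sketch with made-up data illustrates this; the critical value t = 2.571 for df = 5 at 95% confidence is a standard t-table value hard-coded here as an assumption rather than computed.

```python
import math

# Small illustrative dataset (made up for demonstration)
x = [1, 2, 3, 4, 5, 6, 7]
y = [2.1, 2.9, 3.8, 4.2, 5.1, 5.8, 6.9]
n = len(x)
mx, my = sum(x) / n, sum(y) / n

sxx = sum((xi - mx) ** 2 for xi in x)
b1 = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
b0 = my - b1 * mx

# Standard error of the estimate (Eq. 14.19) and of the slope (Eq. 14.20)
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
s_eps = math.sqrt(sse / (n - 2))
s_b1 = s_eps / math.sqrt(sxx)

# Test statistic for H0: beta1 = 0 (Eq. 14.21), df = n - 2 = 5
t_stat = b1 / s_b1

# Two-tailed critical value t(0.025, df = 5), taken from a t-table
t_crit = 2.571

# Confidence interval for the slope (Eq. 14.22)
slope_ci = (b1 - t_crit * s_b1, b1 + t_crit * s_b1)

# Interval estimates at a chosen x_p (Eqs. 14.23 and 14.24)
xp = 4.0
core = 1 / n + (xp - mx) ** 2 / sxx
y_hat = b0 + b1 * xp
conf_half = t_crit * s_eps * math.sqrt(core)        # half-width for E(y)|xp
pred_half = t_crit * s_eps * math.sqrt(1 + core)    # half-width for a single y|xp

# The prediction interval is always wider: the extra 1 under the square
# root accounts for the variation of an individual y around its mean
assert pred_half > conf_half
```

This is the algebraic reason behind Exercise 14-48: predicting a particular y must absorb the scatter of individual observations around the regression line, not just the uncertainty in estimating the line itself.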
Case 14.1 A & A Industrial Products

Alex Court, the cost accountant for A & A Industrial Products, was puzzled by the repair cost analysis report he had just reviewed. This was the third consecutive report where unscheduled plant repair costs were out of line with the repair cost budget allocated to each plant. A & A budgets for both scheduled maintenance and unscheduled repair costs for its plants' equipment, mostly large industrial machines. Budgets for scheduled maintenance activities are easy to estimate and are based on the equipment manufacturer's recommendations. The unscheduled repair costs, however, are harder to determine. Historically, A & A Industrial Products has estimated unscheduled maintenance using a formula based on the average number of hours of operation between major equipment failures at a plant. Specifically, plants were given a budget of $65.00 per hour of operation between major failures. Alex had arrived at this amount by dividing aggregate historical repair costs by the total number of hours between failures. Then plant averages would be used to estimate unscheduled repair cost. For example, if a plant averaged 450 hours of run time before a major repair occurred, the plant would be allocated a repair budget of 450 × $65 = $29,250 per repair. If the plant was expected to be in operation 3,150 hours per year, the company would anticipate seven unscheduled repairs (3,150/450) annually and budget $204,750 for annual unscheduled repair costs. Alex was becoming more and more convinced that this approach was not working. Not only was upper management upset about the variance between predicted and actual costs of repair, but plant managers believed that the model did not account for potential differences among the company's three plants when allocating dollars for unscheduled repairs.
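The historical allocation rule described in the case is straightforward arithmetic; a short Python sketch confirms the figures quoted above:

```python
# A & A's historical allocation rule: $65.00 per hour of operation
# between major failures (figures taken from the case narrative)
rate_per_hour = 65.00
hours_between_failures = 450
annual_hours = 3150

budget_per_repair = hours_between_failures * rate_per_hour   # 450 x $65
expected_repairs = annual_hours / hours_between_failures     # 3,150 / 450
annual_budget = expected_repairs * budget_per_repair

assert budget_per_repair == 29_250    # $29,250 per repair
assert expected_repairs == 7          # seven unscheduled repairs per year
assert annual_budget == 204_750       # $204,750 annual repair budget
```

The case's complaint is that this single aggregate rate ignores plant-to-plant differences, which is exactly what the per-plant regressions in the Required Tasks are meant to expose.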
At the weekly management meeting, Alex was informed that he needed to analyze his cost projections further and produce a report that provided a more reliable method for predicting repair costs. On leaving the meeting, Alex had his assistant randomly pull 64 unscheduled repair reports. The data are in the file A & A Costs. The management team is anxiously waiting for Alex's analysis.

Required Tasks:
1. Identify the major issue(s) of the case.
2. Analyze the overall cost allocation issues by developing a scatter plot of Cost vs. Hours of Operation. Which variable, cost or hours of operation, should be the dependent variable? Explain why.
3. Fit a linear regression equation to the data.
4. Explain how the results of the linear regression equation could be used to develop a cost allocation formula. State any adjustments or modifications you have made to the regression output to develop a cost allocation formula that can be used to predict repair costs.
5. Sort the data by plant.
6. Fit a linear regression equation to each plant's data.
7. Explain how the results of the individual plant regression equations can help the manager determine whether a different linear regression equation could be used to develop a cost allocation formula for each plant. State any adjustments or modifications you have made to the regression output to develop a cost allocation formula.
8. Based on the individual plant regression equations, determine whether there is reason to believe there are differences among the repair costs of the company's three plants.
9. Summarize your analysis and findings in a report to the company's manager.

Case 14.2 Sapphire Coffee—Part 1

Jennie Garcia could not believe that her career had moved so far so fast. When she left graduate school with a master's degree in anthropology, she intended to work at a local coffee shop until something else came along that was more related to her academic background.
But after a few months she came to enjoy the business, and in a little over a year she was promoted to store manager. When the company for whom she worked continued to grow, Jennie was given oversight of a few stores. Now, eight years after she started as a barista, Jennie was in charge of operations and planning for the company's southern region. As a part of her responsibilities, Jennie tracks store revenues and forecasts coffee demand. Historically, Sapphire Coffee would base its demand forecast on the number of stores, believing that each store sold approximately the same amount of coffee. This approach seemed to work well when the company had shops of similar size and layout, but as the company grew, stores became more varied. Now, some stores had drive-thru windows, a feature that top management added to some stores believing that it would increase coffee sales for customers who wanted a cup of coffee on their way to work but who were too rushed to park and enter the store to place an order. Jennie noticed that weekly sales seemed to be more variable across stores in her region and was wondering what, if anything, might explain the differences. The company's financial vice president had also noticed the increased differences in sales across stores and was wondering what might be happening. In an e-mail to Jennie he stated that weekly store sales are expected to average $5.00 per square foot. Thus, a 1,000-square-foot store would have average weekly sales of $5,000. He asked that Jennie analyze the stores in her region to see if this rule of thumb was a reliable measure of a store's performance. The vice president of finance was expecting the analysis to be completed by the weekend. Jennie decided to randomly select weekly sales records for 53 stores. The data are in the file Sapphire Coffee-1. A full analysis needs to be sent to the corporate office by Friday.

Required Tasks:
1. Identify the major issue(s) of the case.
2. Develop a scatter plot of the variables Store Size and Weekly Sales. Identify the dependent variable. Briefly describe the relationship between the two variables.
3. Fit a linear regression equation to the data. Does the variable Store Size explain a significant amount of the variation in Weekly Sales?
4. Based on the estimated regression equation, does it appear the $5.00 per square foot weekly sales expectation the company currently uses is a valid one?
5. Summarize your analysis and findings in a report to the company's vice president of finance.

Case 14.3 Alamar Industries

While driving home in northern Kentucky at 8:00 P.M., Juan Alamar wondered whether his father had done him any favor by retiring early and letting him take control of the family machine tool–restoration business. When his father started the business of overhauling machine tools (both for resale and on a contract basis), American companies dominated the tool manufacturing market. During the past 30 years, however, the original equipment industry had been devastated, first by competition from Germany and then from Japan. Although foreign competition had not yet invaded the overhaul segment of the business, Juan had heard about foreign companies establishing operations on the West Coast. The foreign competitors were apparently stressing the high-quality service and operations that had been responsible for their great inroads into the original equipment market. Last week Juan attended a daylong conference on total quality management that had discussed the advantages of competing for the Baldrige Award, the national quality award established in 1987.
Presenters from past Baldrige winners, including Xerox, Federal Express, Cadillac, and Motorola, stressed the positive effects on their companies of winning and said similar effects would be possible for any company. This assertion of only positive effects was what Juan questioned. He was certain that the effect on his remaining free time would not be positive. The Baldrige Award considers seven corporate dimensions of quality. Although the award is not based on a numerical score, an overall score is calculated. The maximum score is 1,000, with most recent winners scoring about 800. Juan did not doubt the award was good for the winners, but he wondered about the nonwinners. In particular, he wondered about any relationship between attempting to improve quality according to the Baldrige dimensions and company profitability. Individual company scores are not released, but Juan was able to talk to one of the conference presenters, who shared some anonymous data, such as companies' scores in the year they applied, their returns on investment (ROIs) in the year applied, and returns on investment in the year after application. Juan decided to commit the company to a total quality management process if the data provided evidence that the process would lead to increased profitability.

Baldrige Score   ROI Application Year   ROI Next Year
470              11%                    13%
520              10                     11
660              14                     15
540              12                     12
600              15                     16
710              16                     16
580              11                     12
600              12                     13
740              16                     16
610              11                     14
570              12                     13
660              17                     19

Case 14.4 Continental Trucking

Norm Painter is the newly hired cost analyst for Continental Trucking. Continental is a nationwide trucking firm, and until recently, most of its routes were driven under regulated rates. These rates were set to allow small trucking firms to earn an adequate profit, leaving little incentive to work to reduce costs by efficient management techniques. In fact, the greatest effort was made to try to influence regulatory agencies to grant rate increases.
A recent rash of deregulation has made the long-distance trucking industry more competitive. Norm has been hired to analyze Continental’s whole expense structure. As part of this study, Norm is looking at truck repair costs. Because the trucks are involved in long hauls, they inevitably break down. In the past, little preventive maintenance was done, and if a truck broke down in the middle of a haul, either a replacement tractor was sent or an independent contractor finished the haul. The truck was then repaired at the nearest local shop. Norm is sure this procedure has.

led to more expense than if major repairs had been made before the trucks failed. Norm thinks that some method should be found for determining when preventive maintenance is needed. He believes that fuel consumption is a good indicator of possible breakdowns; as trucks begin to run badly, they will consume more fuel. Unfortunately, the major determinants of fuel consumption are the weight of a truck and headwinds. Norm picks a sample of a single truck model and gathers data relating fuel consumption to truck weight. All trucks in the sample are in good condition. He separates the data by direction of the haul, realizing that winds tend to blow predominantly out of the west. Although he can rapidly gather future data on fuel consumption and haul weight, now that Norm has these data, he is not quite sure what to do with them.

East-West Haul
Miles/Gallon   Haul Weight
4.1            41,000 lb
4.7            36,000
3.9            37,000
4.3            38,000
4.8            32,000
5.1            37,000
4.3            46,000
4.6            35,000
5.0            37,000

West-East Haul
Miles/Gallon   Haul Weight
4.3            40,000 lb
4.5            37,000
4.8            36,000
5.2            38,000
5.0            35,000
4.7            42,000
4.9            37,000
4.5            36,000
5.2            42,000
4.8            41,000

References

Berenson, Mark L., and David M. Levine, Basic Business Statistics: Concepts and Applications, 11th ed. (Upper Saddle River, NJ: Prentice Hall, 2009).
Cryer, Jonathan D., and Robert B. Miller, Statistics for Business: Data Analysis and Modeling, 2nd ed. (Belmont, CA: Duxbury Press, 1994).
Dielman, Terry E., Applied Regression Analysis—A Second Course in Business and Economic Statistics, 4th ed. (Belmont, CA: Duxbury Press, 2005).
Draper, Norman R., and Harry Smith, Applied Regression Analysis, 3rd ed. (New York: John Wiley and Sons, 1998).
Frees, Edward W., Data Analysis Using Regression Models: The Business Perspective (Upper Saddle River, NJ: Prentice Hall, 1996).
Kleinbaum, David G., Lawrence L. Kupper, Azhar Nizam, and Keith E. Muller, Applied Regression Analysis and Multivariable Methods, 4th ed. (Florence, KY: Cengage Learning, 2008).
Kutner, Michael H., Christopher J. Nachtsheim, John Neter, and William Li, Applied Linear Statistical Models, 5th ed. (New York: McGraw-Hill Irwin, 2005).
Microsoft Excel 2007 (Redmond, WA: Microsoft Corp., 2007).
Minitab for Windows Version 15 (State College, PA: Minitab, 2007).

chapter 15

Chapter 15 Quick Prep Links
• Make sure you review the discussion about scatter plots in Chapters 2 and 14.
• Review the concepts associated with simple linear regression and correlation analysis presented in Chapter 14.
• Review the methods for testing a null hypothesis using the t-distribution in Chapter 9.
• Review confidence intervals discussed in Chapter 8.
• In Chapter 14, review the steps involved in using the t-distribution for testing the significance of a correlation coefficient and a regression coefficient.

Multiple Regression Analysis and Model Building

15.1 Introduction to Multiple Regression Analysis (pg. 634–653)
Outcome 1. Understand the general concepts behind model building using multiple regression analysis.
Outcome 2. Apply multiple regression analysis to business decision-making situations.
Outcome 3. Analyze the computer output for a multiple regression model and interpret the regression results.
Outcome 4. Test hypotheses about the significance of a multiple regression model and test the significance of the independent variables in the model.
Outcome 5. Recognize potential problems when using multiple regression analysis and take steps to correct the problems.

15.2 Using Qualitative Independent Variables (pg. 654–661)
Outcome 6. Incorporate qualitative variables into a regression model by using dummy variables.

15.3 Working with Nonlinear Relationships (pg. 661–678)
Outcome 7. Apply regression analysis to situations where the relationship between the independent variable(s) and the dependent variable is nonlinear.

15.4 Stepwise Regression (pg. 678–689)
Outcome 8. Understand the uses of stepwise regression.

15.5 Determining the Aptness of the Model (pg. 689–699)
Outcome 9. Analyze the extent to which a regression model satisfies the regression assumptions.
Why you need to know

Chapter 14 introduced linear regression and correlation analyses for analyzing the relationship between two variables. As you might expect, business problems are not limited to linear relationships involving only two variables. Many practical situations involve analyzing the relationships among three or more variables. For example, a vice president of planning for an automobile manufacturer would be interested in the relationship between her company's automobile sales and the variables that influence those sales. Included in her analysis might be such independent or explanatory variables as automobile price, competitors' sales, and advertising, as well as economic variables such as disposable personal income, the inflation rate, and the unemployment rate. When multiple independent variables are to be included in an analysis simultaneously, the technique introduced in this chapter—multiple linear regression—is very useful. When a relationship between variables is

nonlinear, we may be able to transform the independent variables in ways that allow us to use multiple linear regression analysis to model the nonlinear relationships. This chapter examines the general topic of model building by extending the concepts of simple linear regression analysis provided in Chapter 14.

15.1 Introduction to Multiple Regression Analysis

Chapter 14 introduced the concept of simple linear regression analysis. The simple regression model is characterized by two variables: y, the dependent variable, and x, the independent variable. The single independent variable explains some variation in the dependent variable, but unless x and y are perfectly correlated, the proportion explained will be less than 100%. In multiple regression analysis, additional independent variables are added to the regression model to clear up some of the as yet unexplained variation in the dependent variable. Multiple regression is merely an extension of simple regression analysis; however, as we expand the model for the population from one independent variable to two or more, there are some new considerations. The general format of a multiple regression model for the population is given by Equation 15.1.

Multiple Regression Model (Population)

y = β₀ + β₁x₁ + β₂x₂ + . . . + βₖxₖ + ε    (15.1)

where:
β₀ = Population's regression constant
βⱼ = Population's regression coefficient for each variable xⱼ, j = 1, 2, . . . , k
k = Number of independent variables
ε = Model error

Four assumptions similar to those that apply to the simple linear regression model must also apply to the multiple regression model.

Assumptions
1. Individual model errors, ε, are statistically independent of one another, and these values represent a random sample from the population of possible errors at each level of x.
2.
For a given value of x there can exist many values of y, and therefore many possible values for ε. Further, the distribution of possible model errors for any level of x is normally distributed.
3. The distributions of possible ε-values have equal variances at each level of x.
4. The means of the dependent variable, y, for all specified values of x can be connected with a line called the population regression model.

Equation 15.1 represents the multiple regression model for the population. However, in most instances, you will be working with a random sample from the population. Given the preceding assumptions, the estimated multiple regression model, based on the sample data, is of the form shown in Equation 15.2.

Estimated Multiple Regression Model

    ŷ = b0 + b1x1 + b2x2 + . . . + bkxk    (15.2)

This estimated model is an extension of an estimated simple regression model. The principal difference is that whereas the estimated simple regression model is the equation for a straight line in a two-dimensional space, the estimated multiple regression model forms a hyperplane (or response surface) through multidimensional space. Each regression coefficient represents a different slope. Therefore, using Equation 15.2, a value of the dependent variable can be

estimated using values of two or more independent variables. The regression hyperplane represents the relationship between the dependent variable and the k independent variables. For example, Table 15.1A shows sample data for a dependent variable, y, and one independent variable, x1. Figure 15.1 shows a scatter plot and the regression line for the simple regression analysis for y and x1. The points are plotted in two-dimensional space, and the regression model is represented by a line through the points such that the sum of squared errors [SSE = Σ(y − ŷ)²] is minimized. If we add variable x2 to the model, as shown in Table 15.1B, the resulting multiple regression equation becomes

    ŷ = 307.71 + 2.85x1 + 10.94x2

For the time being don't worry about how this equation was computed. That will be discussed shortly. Note, however, that the (y, x1, x2) points form a three-dimensional space, as shown in Figure 15.2. The regression equation forms a slice (hyperplane) through the data such that Σ(y − ŷ)² is minimized. This is the same least squares criterion that is used with simple linear regression. The mathematics for developing the least squares regression equation for simple linear regression involves differential calculus. The same is true for the multiple regression equation, but the mathematical derivation is beyond the scope of this text.¹

Regression Hyperplane: The multiple regression equivalent of the simple regression line. The plane typically has a different slope for each independent variable.

TABLE 15.1 | Sample Data to Illustrate the Difference between Simple and Multiple Regression Models

(A) One Independent Variable        (B) Two Independent Variables

     y       x1                          y       x1      x2
  564.99     50                       564.99     50      10
  601.06     60                       601.06     60      13
  560.11     40                       560.11     40      14
  616.41     50                       616.41     50      12
  674.96     60                       674.96     60      15
  630.58     45                       630.58     45      16
  554.66     53                       554.66     53      14

FIGURE 15.1 | Simple Regression Line: scatter plot of y versus x1 with the fitted line ŷ = 463.89 + 2.67x1.

¹For a complete treatment of the matrix algebra approach for estimating multiple regression coefficients, consult Applied Linear Statistical Models by Kutner et al.
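The least squares computations that the text leaves to Excel or Minitab can be reproduced directly. The following Python sketch (our own illustration, not part of the text) solves the normal equations in deviation form for the Table 15.1 data, recovering both the simple fit shown in Figure 15.1 and the two-variable multiple regression equation.

```python
# Least squares fits for the Table 15.1 data, using plain Python.
y  = [564.99, 601.06, 560.11, 616.41, 674.96, 630.58, 554.66]
x1 = [50, 60, 40, 50, 60, 45, 53]
x2 = [10, 13, 14, 12, 15, 16, 14]

n = len(y)
my, m1, m2 = sum(y) / n, sum(x1) / n, sum(x2) / n

# Deviation sums of squares and cross products
S11 = sum((a - m1) ** 2 for a in x1)
S22 = sum((b - m2) ** 2 for b in x2)
S12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
S1y = sum((a - m1) * (c - my) for a, c in zip(x1, y))
S2y = sum((b - m2) * (c - my) for b, c in zip(x2, y))

# Simple regression of y on x1 alone (approximately 463.89 + 2.67 x1)
b1_simple = S1y / S11
b0_simple = my - b1_simple * m1

# Multiple regression of y on x1 and x2, from the two normal equations
# (approximately 307.71 + 2.85 x1 + 10.94 x2)
det = S11 * S22 - S12 ** 2
b1 = (S22 * S1y - S12 * S2y) / det
b2 = (S11 * S2y - S12 * S1y) / det
b0 = my - b1 * m1 - b2 * m2

print(round(b0_simple, 2), round(b1_simple, 2))
print(round(b0, 2), round(b1, 2), round(b2, 2))
```

Note how the slope on x1 changes (from about 2.67 to about 2.85) once x2 enters the model; in multiple regression each coefficient is estimated holding the other variables constant.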

FIGURE 15.2 | Multiple Regression Hyperplane for Population: the regression plane passes through the (y, x1, x2) points in three-dimensional space.

Multiple regression analysis is usually performed with the aid of a computer and appropriate software. Both Minitab and Excel contain procedures for performing multiple regression. Minitab has a far more complete regression procedure; however, the PHStat Excel add-ins expand Excel's capabilities. Each software package presents the output in a slightly different format, but the same basic information will appear in all regression output.

Chapter Outcome 1.

Model: A representation of an actual system using either a physical or a mathematical portrayal.

Basic Model-Building Concepts

An important activity in business decision making is referred to as model building. Models are often used to test changes in a system without actually having to change the real system. Models are also used to help describe a system or to predict the output of a system based on certain specified inputs. You are probably quite aware of physical models. Airlines use flight simulators to train pilots. Wind tunnels are used to determine the aerodynamics of automobile designs. Golf ball makers use a physical model of a golfer called "Iron Mike" that can be set to swing golf clubs in a very controlled manner to determine how far a golf ball will fly. Although physical models are very useful in business decision making, our emphasis in this chapter is on statistical models that are developed using multiple regression analysis. Modeling is both an art and a science. Determining an appropriate model is a challenging task, but it can be made manageable by employing a model-building process consisting of three components: model specification, model fitting, and model diagnosis.
Model Specification

Model specification, or model identification, is the process of determining the dependent variable, deciding which independent variables should be included in the model, and obtaining the sample data for all variables. As with any statistical procedure, the larger the sample size the better, because the potential for extreme sampling error is reduced when the sample size is large. However, at a minimum, the sample size required to compute a regression model must be at least one greater than the number of independent variables.² If we are thinking of developing a regression model with five independent variables, the absolute minimum number of cases required is six. Otherwise, the computer software will indicate an error has been made or will print out meaningless values. As a practical matter, however, the sample size should be at least four times the number of independent variables. Thus, if we had five independent variables (k = 5), we would want a sample of at least 20.

²There are mathematical reasons for this sample-size requirement that are beyond the scope of this text. In essence, the regression coefficients in Equation 15.2 can't be computed if the sample size is not at least one larger than the number of independent variables.

Model Building: Model building is the process of actually constructing a mathematical equation in which some or all of the independent variables are used in an attempt to explain the variation in the dependent variable.

Chapter Outcome 2.

How to do it: Model Specification

In the context of the statistical models discussed in this chapter, this component involves the following three steps:

1. Decide what question you want to ask. The question being asked usually indicates the dependent variable. In the previous chapter, we discussed how simple linear regression analysis could be used to describe the relationship between a dependent and an independent variable.

2. List the potential independent variables for your model. Here, your knowledge of the situation you are modeling guides you in identifying potential independent variables.

3. Gather the sample data (observations) for all variables.

How to do it: Developing a Multiple Regression Model

The following steps are employed in developing a multiple regression model:

1. Specify the model by determining the dependent variable and potential independent variables, and select the sample data.

2. Formulate the model. This is done by computing the correlation coefficients for the dependent variable and each independent variable, and for each independent variable with all other independent variables. The multiple regression equation is also computed. The computations are performed using computer software such as Excel or Minitab.

3. Perform diagnostic checks on the model to determine how well the specified model fits the data and how well the model appears to meet the multiple regression assumptions.

Model Diagnosis: Model diagnosis is the process of analyzing the quality of the model you have constructed by determining how well a specified model fits the data you just gathered.
You will examine output values such as R-squared and the standard error of the model. At this stage, you will also assess the extent to which the model's assumptions appear to be satisfied. (Section 15.5 is devoted to examining whether a model meets the regression analysis assumptions.) If the model is unacceptable in any of these areas, you will be forced to revert to the model-specification step and begin again. However, you will be the final judge of whether the model provides acceptable results, and you will always be constrained by time and cost considerations. You should use the simplest available model that will meet your needs. The objective of model building is to help you make better decisions, and you do not need a sophisticated model if a simpler one will provide acceptable results.

BUSINESS APPLICATION: DEVELOPING A MULTIPLE REGRESSION MODEL

First City Real Estate. First City Real Estate executives wish to build a model to predict sales prices for residential property. Such a model will be valuable when working with potential sellers who might list their homes with First City. This can be done using the following steps:

Step 1: Model Specification. The question being asked is how the real estate firm can determine the selling price for a house. Thus, the dependent variable is the sales price. This is what the managers want to be able to predict. The managers met in a brainstorming session to determine a list of possible independent (explanatory) variables. Some variables, such as "condition of the house," were eliminated because of lack of data. Others, such as "curb appeal" (the appeal of the house to people as they drive by), were eliminated because the values for these variables would be too subjective and difficult to quantify.
From a wide list of possibilities, the managers selected the following variables as good candidates:

    x1 = Home size (in square feet)
    x2 = Age of house
    x3 = Number of bedrooms
    x4 = Number of bathrooms
    x5 = Garage size (number of cars)

Data were obtained for a sample of 319 residential properties that had sold within the previous two months in an area served by two of First City's offices. For each house in the sample, the sales price and values for each potential independent variable were collected. The data are in the file First City.

Step 2: Model Building. The regression model is developed by including independent variables from among those for which you have complete data. There is no way to determine whether an independent variable will be a good predictor variable by analyzing the individual variable's descriptive statistics, such as the mean and standard deviation. Instead, we need to look at the correlation between the independent

variables and the dependent variable, which is measured by the correlation coefficient. When we have multiple independent variables and one dependent variable, we can look at the correlation between all pairs of variables by developing a correlation matrix. Each correlation is computed using one of the equations in Equation 15.3. The appropriate formula is determined by whether the correlation is being calculated for an independent variable and the dependent variable or for two independent variables.

Correlation Coefficient: A quantitative measure of the strength of the linear relationship between two variables. The correlation coefficient, r, ranges from −1.0 to +1.0.

Correlation Matrix: A table showing the pairwise correlations between all variables (dependent and independent).

Correlation Coefficient

    r = Σ(x − x̄)(y − ȳ) / √[Σ(x − x̄)² Σ(y − ȳ)²]     (one x variable with y)

or                                                      (15.3)

    r = Σ(xi − x̄i)(xj − x̄j) / √[Σ(xi − x̄i)² Σ(xj − x̄j)²]     (one x variable with another x)

The actual calculations are done using Excel's correlation tool or Minitab's correlation command, and the results are shown in Figure 15.3a and Figure 15.3b. The output provides the correlation between y and each x variable and between each pair of independent variables.³ Recall that in Chapter 14, a t-test (see Equation 14-3) was used to test whether the correlation coefficient is statistically significant:

    H0: ρ = 0
    HA: ρ ≠ 0

We will conduct the test with a significance level of α = 0.05.

FIGURE 15.3A | Excel 2007 Results Showing First City Real Estate Correlation Matrix. (Note on output: the correlation between age and square feet is −0.0729; older homes tend to be smaller.)

Excel 2007 Instructions: 1. Open file: First City.xls. 2. Select Home Sample 1 worksheet. 3. Click on Data > Data Analysis. 4. Select Correlation. 5. Define y variable range (all rows and columns). 6. Click on Labels. 7. Click OK.
3Minitab, in addition to providing the correlation matrix, can provide the p-values for each correlation. If the p-value is less than the specified alpha, the correlation is statistically significant..

FIGURE 15.3B | Minitab Results Showing First City Real Estate Correlation Matrix. (Note on output: the correlation between age and square feet is −0.073; older homes tend to have fewer square feet.)

Minitab Instructions: 1. Open file: First City.MTW. 2. Choose Stat > Basic Statistics > Correlation. 3. In Variables, enter variable columns. 4. Click OK.

Given degrees of freedom equal to n − 2 = 319 − 2 = 317, the critical t (see Appendix E) for a two-tailed test is approximately 1.96.⁴ Any correlation coefficient generating a t-value greater than 1.96 or less than −1.96 is determined to be significant. For now, we will focus on the correlations in the first column in Figures 15.3a and 15.3b, which measure the strength of the linear relationship between each independent variable and the dependent variable, sales price. For example, the t statistic for price and square feet is

    t = r / √[(1 − r²)/(n − 2)] = 0.7477 / √[(1 − 0.7477²)/(319 − 2)] = 20.048

Because t = 20.048 > 1.96, we reject H0 and conclude that the correlation between sales price and square feet is statistically significant. Similar calculations for the other independent variables with price show that all variables are statistically correlated with price. This indicates that a significant linear relationship exists between each independent variable and sales price. Variable x1, square feet, has the highest correlation at 0.748. Variable x2, age of the house, has the lowest correlation at −0.485. The negative correlation implies that older homes tend to have lower sales prices. As we discussed in Chapter 14, it is always a good idea to develop scatter plots to visualize the relationship between two variables. Figure 15.4 shows the scatter plots for each independent variable and the dependent variable, sales price. In each case, the plots indicate a linear relationship between the independent variable and the dependent variable.
Note that several of the independent variables (bedrooms, bathrooms, garage size) are quantitative but discrete. The scatter plots for these variables show points at each level of the independent variable rather than over a continuum of values.

⁴You can use the Excel TINV function to get the precise t-value, which is 1.967.
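The significance test for a correlation coefficient is easy to verify numerically. The short Python sketch below (ours, not part of the text) applies the t statistic from Equation 14-3 to the price/square-feet correlation.

```python
import math

# t-test for the significance of a correlation coefficient:
# t = r / sqrt((1 - r^2) / (n - 2)), with df = n - 2
def corr_t(r, n):
    return r / math.sqrt((1 - r ** 2) / (n - 2))

t = corr_t(0.7477, 319)  # price vs. square feet, n = 319 homes
print(round(t, 3))       # approximately 20.05, far beyond the critical 1.96
```

Because the computed t greatly exceeds the two-tailed critical value of about 1.96 at α = 0.05 with 317 degrees of freedom, the correlation is statistically significant.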

FIGURE 15.4 | First City Real Estate Scatter Plots: (a) Price versus Square Feet, (b) Price versus Age, (c) Price versus Bedrooms, (d) Price versus Bathrooms, (e) Price versus # Car Garage.

Chapter Outcome 3.

Computing the Regression Equation

First City's goal is to develop a regression model to predict the appropriate selling price for a home, using certain measurable characteristics. The first attempt at developing the model will be to run a multiple regression computer program using all available independent variables. The regression outputs from Excel and Minitab are shown in Figure 15.5a and Figure 15.5b. The estimate of the multiple regression model given in Figure 15.5a is

    ŷ = 31,127.6 + 63.1(sq. ft.) − 1,144.4(age) − 8,410.4(bedrooms) + 3,522.0(bathrooms) + 28,203.5(garage)

The coefficients for each independent variable represent an estimate of the average change in the dependent variable for a 1-unit change in the independent variable, holding all other independent variables constant. For example, for houses of the same age, with the same number of bedrooms, baths, and garages, a 1-square-foot increase in the size of the house is estimated to increase its price by an average of $63.10. Likewise, for houses with the same square feet, bedrooms, bathrooms, and garages, a 1-year increase in the

age of the house is estimated to result in an average drop in sales price of $1,144.40. The other coefficients are interpreted in the same way. Note, in each case, we are interpreting the regression coefficient for one independent variable while holding the other variables constant.

FIGURE 15.5A | Excel 2007 Multiple Regression Model Results for First City Real Estate. (The output shows the multiple coefficient of determination, the standard error of the estimate, and the regression coefficients, with SSR = 1.0389E+12, SSE = 2.34135E+11, and SST = 1.27303E+12.)

Excel 2007 Instructions: 1. Open file: First City.xls. 2. Click on Data > Data Analysis. 3. Select Regression. 4. Define y variable range and the x variable range (include labels). 5. Click Labels. 6. Click OK.

To estimate the value of a residential property, First City Real Estate brokers would substitute values for the independent variables into the regression equation. For example, suppose a house with the following characteristics is considered:

    x1 = Square feet = 2,100
    x2 = Age = 15
    x3 = Number of bedrooms = 4
    x4 = Number of bathrooms = 3
    x5 = Size of garage = 2

The point estimate for the sales price is

    ŷ = 31,127.6 + 63.1(2,100) − 1,144.4(15) − 8,410.4(4) + 3,522.0(3) + 28,203.5(2)
    ŷ = $179,802.70
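This point estimate is just the fitted equation evaluated at the house's characteristics, which a short script can confirm. The Python sketch below (ours, for illustration) uses the coefficients as printed in Figure 15.5a; because the printed coefficients are rounded, the result differs from the text's $179,802.70 by a few cents.

```python
# Point estimate from the fitted First City model, using the printed
# (rounded) coefficients from Figure 15.5a.
b = {"const": 31127.6, "sqft": 63.1, "age": -1144.4,
     "bed": -8410.4, "bath": 3522.0, "garage": 28203.5}
house = {"sqft": 2100, "age": 15, "bed": 4, "bath": 3, "garage": 2}

y_hat = b["const"] + sum(b[k] * v for k, v in house.items())
print(round(y_hat, 2))  # close to the text's point estimate of $179,802.70
```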

FIGURE 15.5B | Minitab Multiple Regression Model Results for First City Real Estate. (The output shows the regression coefficients, the multiple coefficient of determination, and the standard error of the estimate, with SSR = 1.0389E+12, SSE = 2.34135E+11, and SST = 1.27303E+12.)

Minitab Instructions: 1. Open file: First City.MTW. 2. Choose Stat > Regression > Regression. 3. In Response, enter the dependent (y) variable. 4. In Predictors, enter the independent (x) variables. 5. Click OK.

Multiple coefficient of determination (R²): The proportion of the total variation of the dependent variable in a multiple regression model that is explained by its relationship to the independent variables. As in the simple linear model, it is called R-squared and is denoted as R².

The Coefficient of Determination

You learned in Chapter 14 that the coefficient of determination, R², measures the proportion of variation in the dependent variable that can be explained by the dependent variable's relationship to a single independent variable. When there are multiple independent variables in a model, R² is called the multiple coefficient of determination and is used to determine the proportion of variation in the dependent variable that is explained by the dependent variable's relationship to all the independent variables in the model. Equation 15.4 is used to compute R² for a multiple regression model.

Multiple Coefficient of Determination (R²)

    R² = Sum of squares regression / Total sum of squares = SSR / SST    (15.4)

As shown in Figure 15.5a, R² = 0.8161. Both SSR and SST are also included in the output. Therefore, you can also use Equation 15.4 to get R², as follows:

    R² = SSR / SST = 1.0389E+12 / 1.27303E+12 = 0.8161

More than 81% of the variation in sales price can be explained by the linear relationship of the five independent variables in the regression model to the dependent variable.
However, as we shall shortly see, not all independent variables are equally important to the model's ability to explain this variation.
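Equation 15.4 can be checked directly from the sums of squares reported in Figure 15.5a; a minimal sketch:

```python
# Multiple coefficient of determination (Equation 15.4): R^2 = SSR / SST,
# using the sums of squares from the Figure 15.5a output.
SSR = 1.0389e12     # sum of squares regression
SSE = 2.34135e11    # sum of squares error (residual)
SST = 1.27303e12    # total sum of squares

r_squared = SSR / SST
print(round(r_squared, 4))  # approximately 0.8161
```

Note that SSR + SSE reproduces SST (up to the rounding in the printed output), which is the partition of variation that makes R² interpretable as a proportion.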

Model Diagnosis

Before First City actually uses this regression model to estimate the sales price of a house, there are several questions that should be answered.

1. Is the overall model significant?
2. Are the individual variables significant?
3. Is the standard deviation of the model error too large to provide meaningful results?
4. Is multicollinearity a problem?
5. Have the regression analysis assumptions been satisfied?

We shall answer the first four questions in order. We will have to wait until Section 15.5 before we have the procedures to answer the fifth important question.

Chapter Outcome 4.

Is the Model Significant?

Because the regression model we constructed is based on a sample of data from the population and is subject to sampling error, we need to test the statistical significance of the overall regression model. The specific null and alternative hypotheses tested for First City Real Estate are

    H0: β1 = β2 = β3 = β4 = β5 = 0
    HA: At least one βi ≠ 0

If the null hypothesis is true and all the slope coefficients are simultaneously equal to zero, the overall regression model is not useful for predictive or descriptive purposes. The F-test is a method for testing whether the regression model explains a significant proportion of the variation in the dependent variable (and whether the overall model is significant). The F-test statistic for a multiple regression model is shown in Equation 15.5.

F-Test Statistic

    F = (SSR / k) / (SSE / (n − k − 1))    (15.5)

where:
    SSR = Sum of squares regression = Σ(ŷ − ȳ)²
    SSE = Sum of squares error = Σ(y − ŷ)²
    n = Sample size
    k = Number of independent variables
    Degrees of freedom: D1 = k and D2 = (n − k − 1)

The ANOVA portion of the output shown in Figure 15.5a contains values for SSR, SSE, and the F-value. The general format of the ANOVA table in a regression analysis is as follows:

ANOVA

Source       df          SS     MS                      F         Significance F
Regression   k           SSR    MSR = SSR/k             MSR/MSE   computed p-value
Residual     n − k − 1   SSE    MSE = SSE/(n − k − 1)
Total        n − 1       SST

The ANOVA portion of the output from Figure 15.5a is as follows:

ANOVA

Source       df     SS            MS          F       Significance F
Regression   5      1.04E+12      2.08E+11    277.8   0.0000
Residual     313    2.34E+11      7.48E+08
Total        318    1.27303E+12

We can test the model's significance

    H0: β1 = β2 = β3 = β4 = β5 = 0
    HA: At least one βi ≠ 0

by either comparing the calculated F-value, 277.8, with a critical value for a given alpha level (for α = 0.01 with k = 5 and n − k − 1 = 313 degrees of freedom, Excel's FINV function gives F0.01 = 3.079) or comparing the p-value in the output with the specified alpha level. Because

    F = 277.8 > 3.079, reject H0, or because p-value ≈ 0.0 < 0.01, reject H0,

Adjusted R-squared: A measure of the percentage of explained variation in the dependent variable in a multiple regression model that takes into account the relationship between the sample size and the number of independent variables in the regression model.
Therefore, as the number of independent variables is increased (regardless of the quality of the variables), R2 will increase. However, each additional variable results in the loss of one degree of freedom. This is viewed as part of the cost of adding the specified variable. The addition to R2 may not justify the reduction in degrees of freedom. The RA2 value takes into account this cost and adjusts the RA2 value accordingly. RA2 will always be less than R2. When a variable is added that does not contribute its fair share to the explanation of the variation in the dependent variable, the RA2 value may actually decline, even though R2 will always increase. The adjusted R-squared is a particularly important measure when the number of independent variables is large relative to the sample size. It takes into account the relationship between sample size and number of variables. R2 may appear artificially high if the number of variables is large compared with the sample size. In this example, in which the sample size is quite large relative to the number of independent variables, the adjusted R-squared is 81.3%, only slightly less than R2  81.6%..

<span class='text_page_counter'>(182)</span> CHAPTER 15 Chapter Outcome 4.. |. Multiple Regression Analysis and Model Building. 645. Are the Individual Variables Significant? We have concluded that the overall model is significant. This means at least one independent variable explains a significant proportion of the variation in sales price. This does not mean that all the variables are significant, however. To determine which variables are significant, we test the following hypotheses: H0: bj  0 HA: bj  0. for all j. We can test the significance of each independent variable using significance level a  0.05 and a t-test, as discussed in Chapter 14. The calculated t-values should be compared to the critical t-value with n  k  1  319  5  1  313 degrees of freedom, which is approximately t0.025 ≈ 1.97 for a  0.05. The calculated t-value for each variable is provided on the computer printout in Figures 15.5a and 15.5b. Recall that the t statistic is determined by dividing the regression coefficient by the estimator of the standard deviation of the regression coefficient, as shown in Equation 15.7. t-Test for Significance of Each Regression Coefficient t. bj 0 sb. df  n  k 1. (15.7). j. where: bj  Sample slope coefficient for the jth indeependent variable sb  Estimate of the standard error for the jth sample slope coefficient j. For example, the t-value for square feet shown in Figure 15.5a is 15.70. This was computed using Equation 15.7, as follows: t. bj 0 sb. j. . 63.1 0  15.70 4.02. Because t  15.70  1.97, we reject H0, and conclude that, given the other independent variables in the model, the regression slope for square feet is not zero. We can also look at the Excel or Minitab output and compare the p-value for each regression slope coefficient with alpha. If the p-value is less than alpha, we reject the null hypothesis and conclude that the independent variable is statistically significant in the model. 
Both the t-test and the p-value techniques will give the same results. You should consider that these t-tests are conditional tests. This means that the null hypothesis is the value of each slope coefficient is 0, given that the other independent variables are already in the model.5 Figure 15.6 shows the hypothesis tests for each independent. 5Note that the t-tests may be affected if the independent variables in the model are themselves correlated. A procedure known as the sum of squares drop F-test, discussed by Kutner et al. in Applied Linear Statistical Models, should be used in this situation. Each t-test considers only the marginal contribution of the independent variables and may indicate that none of the variables in the model are significant, even though the ANOVA procedure indicates otherwise..

FIGURE 15.6 | Significance Tests for Each Independent Variable in the First City Real Estate Example

Hypotheses:
    H0: βj = 0, given all other variables are already in the model
    HA: βj ≠ 0, given all other variables are already in the model
    α = 0.05, df = n − k − 1 = 319 − 5 − 1 = 313

Decision Rule:
    If t > 1.97 or t < −1.97, reject H0; otherwise, do not reject H0.
    (Two-tailed test with α/2 = 0.025 in each tail; critical values −t0.025 = −1.97 and t0.025 = 1.97.)

The test is:
    For β1: Calculated t (from printout) = 15.70. Because 15.70 > 1.97, reject H0.
    For β2: Calculated t = −10.15. Because −10.15 < −1.97, reject H0.
    For β3: Calculated t = −2.80. Because −2.80 < −1.97, reject H0.
    For β4: Calculated t = 2.23. Because 2.23 > 1.97, reject H0.
    For β5: Calculated t = 9.87. Because 9.87 > 1.97, reject H0.

variable using a 0.05 significance level. We conclude that all five independent variables in the model are significant. When a regression model is to be used for prediction, the model should contain no insignificant variables. If insignificant variables are present, they should be dropped and a new regression equation obtained before the model is used for prediction purposes. We will have more to say about this later.

Is the Standard Deviation of the Regression Model Too Large?

The purpose of developing the First City regression model is to be able to determine values of the dependent variable when corresponding values of the independent variables are known. An indication of how good the regression model is can be found by looking at the relationship between the measured values of the dependent variable and those values that would be predicted by the regression model. The standard deviation of the regression model (also called the standard error of the estimate) measures the dispersion of observed home sale values, y, around values predicted by the regression model.
The standard error of the estimate is shown in Figure 15.5a and can be computed using Equation 15.8.

Standard Error of the Estimate

    sε = √(SSE / (n − k − 1)) = √MSE    (15.8)

where:
    SSE = Sum of squares error (residual)
    n = Sample size
    k = Number of independent variables

Examining Equation 15.8 closely, we see that the standard error of the estimate is the square root of the mean square error of the residuals found in the analysis of variance table. Sometimes, even though a model has a high R², the standard error of the estimate will be too large to provide adequate precision for confidence and prediction intervals. A rule of thumb that we have found useful is to examine the range ±2sε. Taking into account the mean value of the dependent variable, if this range is acceptable from a practical viewpoint, the standard error of the estimate might be considered acceptable.⁶ In this First City Real Estate Company example, the standard error, shown in Figure 15.5a, is $27,350. Thus, the rough prediction range for the price of an individual home is

    ±2($27,350) = ±$54,700

Considering that the mean price of homes in this study is in the low $200,000s, a potential error of $54,700 high or low is probably not acceptable. Not many homeowners would be willing to have their appraisal value set by a model with a possible error this large. Even though the model is statistically significant, the company needs to take steps to reduce the standard deviation of the estimate. Subsequent sections of this chapter discuss some ways we can attempt to reduce it.

Chapter Outcome 5.

Multicollinearity: A high correlation between two independent variables such that the two variables contribute redundant information to the model. When highly correlated independent variables are included in the regression model, they can adversely affect the regression results.

Is Multicollinearity a Problem?

Even if the overall regression model is significant and each independent variable is significant, decision makers should still examine the regression model to determine whether it appears reasonable. This is referred to as checking for face validity.
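Returning to the standard error discussed above, Equation 15.8 and the ±2sε rough prediction range are easy to verify from the ANOVA output. A minimal Python check (ours, for illustration):

```python
import math

# Standard error of the estimate (Equation 15.8) and the rough +/- 2s
# prediction range, using SSE from the Figure 15.5a output.
SSE = 2.34135e11
n, k = 319, 5

MSE = SSE / (n - k - 1)   # mean square error from the ANOVA table
s_e = math.sqrt(MSE)      # standard error of the estimate
rough_range = 2 * s_e     # half-width of the rough prediction range

print(round(s_e))          # approximately 27,350
print(round(rough_range))  # approximately 54,700
```

Against a mean sales price in the low $200,000s, a potential error of roughly $54,700 in either direction is what leads the text to judge the model's precision inadequate despite its statistical significance.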
Specifically, you should check to see that the signs on the regression coefficients are consistent with the signs on the correlation coefficients between the independent variables and the dependent variable. Does any regression coefficient have an unexpected sign? Before answering this question for the First City Real Estate example, we should review what the regression coefficients mean. First, the constant term, b0, is the estimate of the model's y intercept. If the data used to develop the regression model contain values of x1, x2, x3, x4, and x5 that are simultaneously 0 (such as would be the case for vacant land), b0 is the mean value of y, given that x1, x2, x3, x4, and x5 all equal 0. Under these conditions b0 would estimate the average value of a vacant lot. However, in the First City example, no vacant land was in the sample, so b0 has no particular meaning.

The coefficient for square feet, b1, estimates the average change in sales price corresponding to a change in house size of 1 square foot, holding the other independent variables constant. The value shown in Figure 15.5a for b1 is 63.1. The coefficient is positive, indicating that an increase in square footage is associated with an increase in sales price. This relationship is expected. All other things being equal, bigger houses should sell for more money. Likewise, the coefficient for x5, the size of the garage, is positive, indicating that an increase in size is also associated with an increase in price. This is expected. The coefficient for x2, the age of the house, is negative, indicating that an older house is worth less than a similar younger house. This also seems reasonable. Finally, variable x4 for bathrooms has the expected positive sign. However, the coefficient for variable x3, the number of bedrooms, is -$8,410.40, meaning that if we hold the other variables constant but increase the number of bedrooms by one, the average price will drop by $8,410.40. Does this seem reasonable?
Referring to the correlation matrix that was shown earlier in Figure 15.3, the correlation between variable x3, bedrooms, and y, the sales price, is 0.540. This indicates that without considering the other independent variables, the linear relationship between number of bedrooms and sales price is positive. But why does the regression coefficient for variable x3 turn out to be negative in the model? The answer lies in what is called multicollinearity.

Multicollinearity occurs when independent variables are correlated with each other and therefore overlap with respect to the information they provide in explaining the variation in

6The actual confidence interval for prediction of a new observation requires the use of matrix algebra. However, when the sample size is large and dependent variable values near the means of the independent variables are used, the rule of thumb given here is a close approximation. Refer to Applied Linear Statistical Models by Kutner et al. for further discussion.

the dependent variable. For example, x3 and the other independent variables have the following correlations (see Figure 15.3b):

r(x3, x1) = 0.706
r(x3, x2) = -0.202
r(x3, x4) = 0.600
r(x3, x5) = 0.312

All four correlations have t-values indicating a significant linear relationship. Refer to the correlation matrix in Figure 15.3 to see that other independent variables are also correlated with each other. The problems caused by multicollinearity, and how to deal with them, continue to be of prime concern to statisticians. From a decision maker's viewpoint, you should be aware that multicollinearity can (and often does) exist and recognize the basic problems it can cause. The following are some of the most obvious problems and indications of severe multicollinearity:

1. Unexpected, and therefore potentially incorrect, signs on the coefficients
2. A sizable change in the values of the previously estimated coefficients when a new variable is added to the model
3. A variable that was previously significant in the regression model becomes insignificant when a new independent variable is added
4. The estimate of the standard deviation of the model error increases when a variable is added to the model

Variance Inflation Factor
A measure of how much the variance of an estimated regression coefficient increases if the independent variables are correlated. A VIF equal to 1.0 for a given independent variable indicates that this independent variable is not correlated with the remaining independent variables in the model. The greater the multicollinearity, the larger the VIF.

Mathematical approaches exist for dealing with multicollinearity and reducing its impact. Although these procedures are beyond the scope of this text, one suggestion is to eliminate the variables that are the chief cause of the multicollinearity problems.
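Scanning a correlation matrix for troublesome predictor pairs can be automated. The following sketch (Python; the variable names and cutoff are illustrative, not taken from the text's data files) computes pairwise correlations among predictors and flags any pair whose magnitude exceeds a chosen threshold:

```python
def pearson_r(x, y):
    # Sample correlation coefficient between two equal-length lists of observations
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def flag_collinear_pairs(predictors, cutoff=0.7):
    # predictors: dict mapping variable name -> list of observations.
    # Returns (name_i, name_j, r) for every pair with |r| at or above the cutoff.
    names = list(predictors)
    flagged = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            r = pearson_r(predictors[names[i]], predictors[names[j]])
            if abs(r) >= cutoff:
                flagged.append((names[i], names[j], r))
    return flagged
```

With a cutoff of 0.7, a pair like bedrooms and square feet, whose correlation of 0.706 is reported above, would be flagged for a closer look.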
If the independent variables in a regression model are correlated and multicollinearity is present, another potential problem is that the t-tests for the significance of the individual independent variables may be misleading. That is, a t-test may indicate that the variable is not statistically significant when in fact it is. One method of measuring multicollinearity is known as the variance inflation factor (VIF). Equation 15.9 is used to compute the VIF for each independent variable.

Variance Inflation Factor

VIF = 1 / (1 - Rj^2)     (15.9)

where:
Rj^2 = Coefficient of determination when the jth independent variable is regressed against the remaining k - 1 independent variables

Both the PHStat add-in to Excel and Minitab contain options that provide VIF values.7 Figure 15.7 shows the Excel (PHStat) output of the VIFs for the First City Real Estate example. The effect of multicollinearity is to decrease the test statistic, thus reducing the probability that the variable will be declared significant. A related impact is to increase the width of the confidence interval estimate of the slope coefficient in the regression model. Generally, if the VIF < 5 for a particular independent variable, multicollinearity is not considered a problem for that variable. VIF values > 5 imply that the correlation between the independent variables is too extreme and should be dealt with by dropping variables from the

7Excel's Regression procedure in the Data Analysis Tools area does not provide VIF values directly. Without PHStat, you would need to compute each regression analysis individually and record the R-squared value to compute the VIF.

FIGURE 15.7 | Excel 2007 (PHStat) Multiple Regression Model Results for First City Real Estate with Variance Inflation Factors

Excel 2007 Instructions:
1. Open file: First City.xls.
2. Click on Add-Ins > PHStat.
3. Select Regression > Multiple Regression.
4. Define y variable range and the x variable range.
5. Select Regression Statistics Table and ANOVA and Coefficients Table.
6. Select Variance Inflation Factor (VIF).
7. Click OK.
Note: VIFs consolidated to one page for display in Figure 15.7.

Minitab Instructions (for similar result):
1. Open file: First City.MTW.
2. Choose Stat > Regression > Regression.
3. In Response, enter dependent (y) variable.
4. In Predictors, enter independent (x) variables.
5. Click Options.
6. In Display, select Variance inflation factors.
7. Click OK. OK.
We interpret this to mean that, holding the other variables constant, for each increase in the size of a home by 1 square foot, the price of a house is estimated to increase by $63.1. But like all point estimates, this is subject to sampling error. In Chapter 14 you were introduced to the concept of confidence interval estimates for the regression coefficients. That same concept applies in multiple.

FIGURE 15.8A | Excel 2007 Multiple Regression Model Results for First City Real Estate
[Callout: 95% confidence interval estimates for the regression coefficients]

Excel 2007 Instructions:
1. Open file: First City.xls.
2. Click on Data > Data Analysis.
3. Select Regression.
4. Define y variable range and the x variable range (include labels).
5. Click Labels.
6. Click OK.

regression models. Equation 15.10 is used to develop the confidence interval estimate for the regression coefficients.

Confidence Interval Estimate for the Regression Slope

bj +/- t(s_bj)     (15.10)

where:
bj = Point estimate for the regression coefficient for xj
t = Critical t-value for the specified confidence level
s_bj = The standard error of the jth regression coefficient

The Excel output in Figure 15.8a provides the confidence interval estimates for each regression coefficient. For example, the 95% interval estimate for square feet is

$55.2 -------- $71.0

Minitab does not have a command to generate confidence intervals for the individual regression parameters. However, statistical quantities are provided on the Minitab output in Figure 15.8b to allow the manual calculation of these confidence intervals. As an example,

FIGURE 15.8B | Minitab Multiple Regression Model Results for First City Real Estate
[Callouts on the output: s_b1 and b1 = 63.1]

Minitab Instructions:
1. Open file: First City.MTW.
2. Choose Stat > Regression > Regression.
3. In Response, enter dependent (y) variable.
4. In Predictors, enter independent (x) variables.
5. Click OK.

the confidence interval for the coefficient associated with the square feet variable can be computed using Equation 15.10 as8

b1 +/- t(s_b1)
63.1 +/- 1.967(4.017)
63.1 +/- 7.90
$55.2 -------- $71.0

We interpret this interval as follows: Holding the other variables constant, using a 95% confidence level, a change in square feet by 1 foot is estimated to generate an average change in home price of between $55.20 and $71.00. Each of the other regression coefficients can be interpreted in the same manner.

8Note: we used Excel's TINV function to get the precise t-value of 1.967.

15-1: Exercises

Skill Development

15-1. The following output is associated with a multiple regression model with three independent variables:

              df    SS          MS         F      Significance F
Regression     3    16,646.091  5,548.697  5.328  0.007
Residual      21    21,871.669  1,041.508
Total         24    38,517.760

              Coefficients  Standard Error  t Stat   p-value
Intercept      87.790       25.468           3.447   0.002
x1             -0.970        0.586          -1.656   0.113
x2              0.002        0.001           3.133   0.005
x3             -8.723        7.495          -1.164   0.258

              Lower 95%   Upper 95%   Lower 90%   Upper 90%
Intercept      34.827     140.753      43.966     131.613
x1             -2.189       0.248      -1.979       0.038
x2              0.001       0.004       0.001       0.004
x3            -24.311       6.864     -21.621       4.174

a. What is the regression model associated with these data?
b. Is the model statistically significant?
c. How much of the variation in the dependent variable can be explained by the model?
d. Are all of the independent variables in the model significant? If not, which are not and how can you tell?
e. How much of a change in the dependent variable will be associated with a one-unit change in x2? In x3?
f. Do any of the 95% confidence interval estimates of the slope coefficients contain zero? If so, what does this indicate?

15-2. You are given the following estimated regression equation involving a dependent and two independent variables:

y-hat = 12.67 + 4.14x1 + 8.72x2

a. Interpret the values of the slope coefficients in the equation.
b. Estimate the value of the dependent variable when x1 = 4 and x2 = 9.

15-3. In working for a local retail store you have developed the following estimated regression equation:

y-hat = 22,167 - 412x1 + 818x2 - 93x3 - 71x4

where:
y = Weekly sales
x1 = Local unemployment rate
x2 = Weekly average high temperature
x3 = Number of activities in the local community
x4 = Average gasoline price

a. Interpret the values of b1, b2, b3, and b4 in this estimated regression equation.
b. What are the estimated sales if the unemployment rate is 5.7%, the average high temperature is 61 degrees, there are 14 activities, and the average gasoline price is $1.39?

15-4. The following correlation matrix is associated with the same data used to build the regression model in Problem 15-1:

       y      x1     x2     x3
y      1
x1     0.406  1
x2     0.459  0.051  1
x3     0.244  0.504  0.272  1

Does this output indicate any potential multicollinearity problems with the analysis?

15-5. Consider the following set of data:
x1 | 29  48  28  22  28  42  33  26  48  44
x2 | 15  37  24  32  47  13  43  12  58  19
y  | 16  46  34  26  49  11  41  13  47  16

a. Obtain the estimated regression equation.
b. Develop the correlation matrix for this set of data. Select the independent variable whose correlation magnitude with the dependent variable is the smallest. Determine if its correlation with the dependent variable is significant.
c. Determine if the overall model is significant. Use a significance level of 0.05.
d. Calculate the variance inflation factor for each of the independent variables. Indicate if multicollinearity exists between the two independent variables.

15-6. Consider the following set of data:

x2 | 10   8   11   7   10   11   6
x1 | 50  45   37  32   44   51  42
y  | 103 85  115  73   97  102  65

a. Obtain the estimated regression equation.
b. Examine the coefficient of determination and the adjusted coefficient of determination. Does it seem that either independent variable's addition to R2 does not justify the reduction in degrees of freedom that results from its addition to the regression model? Support your assertions.
c. Conduct a hypothesis test to determine if the dependent variable increases when x2 increases. Use a significance level of 0.025 and the p-value approach.
d. Construct a 95% confidence interval for the coefficient of x1.

Computer Database Exercises

15-7. An investment analyst collected data about 20 randomly chosen companies. The data consisted of the 52-week-high stock prices, price-to-earnings (PE) ratios, and the market values of the companies. These data are in the file entitled Investment.
a. Produce a regression equation to predict the market value using the 52-week-high stock price and the PE ratio of the company.
b. Determine if the overall model is significant. Use a significance level of 0.05.
c. OmniVision Technologies (Sunnyvale, CA) in April 2006 had a 52-week-high stock price of 31 and a PE ratio of 19. Estimate its market value for that time period.
(Note: Its actual market value for that time period was $1,536.) 15-8. An article in BusinessWeek presents a list of the 100 companies perceived as having “hot growth” characteristics. A company’s rank on the list is the sum of 0.5 times its rank in return on total capital and 0.25 times its sales and profit-growth ranks. The file entitled Growth contains sales ($million), sales increase (%),.

return on capital, market value ($million), and recent stock price of the top 20 ranked companies.
a. Produce a correlation matrix for the variables contained in the file entitled Growth.
b. Select the two variables that are most highly correlated with the recent stock price and produce the regression equation to predict the recent stock price as a function of the two variables you chose.
c. Determine if the overall model is significant. Use a significance level of 0.10.
d. Examine the coefficient of determination and the adjusted coefficient of determination. Does it seem that either independent variable's addition to R2 does not justify the reduction in degrees of freedom that results from its addition to the regression model? Support your assertions.
e. Select the variable that is most correlated with the stock price and test to see if it is a significant predictor of the stock price. Use a significance level of 0.10 and the p-value approach.

15-9. Refer to Exercise 15-8, which referenced a list of the 100 companies perceived as having "hot growth" characteristics. The file entitled Logrowth contains sales ($million), sales increase (%), return on capital, market value ($million), and recent stock price of the companies ranked from 81 to 100. In Exercise 15-8, stock prices were the focus. Examine the sales of the companies.
a. Produce a regression equation that will predict the sales as a function of the other variables.
b. Determine if the overall model is significant. Use a significance level of 0.05.
c. Conduct a test of hypothesis to discover if market value should be removed from this model.
d. To see that a variable can be insignificant in one model but very significant in another, construct a regression equation in which sales is the dependent variable and market value is the independent variable.
Test the hypothesis that market value is a significant predictor of sales for those companies ranked from 81 to 100. Use a significance level of 0.05 and the p-value approach.

15-10. The National Association of Theatre Owners is the largest exhibition trade organization in the world, representing more than 26,000 movie screens in all 50 states and in more than 20 countries worldwide. Its membership includes the largest cinema chains and hundreds of independent theatre owners. It publishes statistics concerning the movie sector of the economy. The file entitled Flicks contains data on total U.S. box office grosses ($billion), total number of admissions (billion), average U.S. ticket price ($), and number of movie screens.
a. Construct a regression equation in which total U.S. box office grosses are predicted using the other variables.
b. Determine if the overall model is significant. Use a significance level of 0.05.
c. Determine the range of plausible values for the change in box office grosses if the average ticket price were to be increased by $1. Use a confidence level of 95%.
d. Calculate the variance inflation factor for each of the independent variables. Indicate if multicollinearity exists between any two independent variables.
e. Produce the regression equation suggested by your answer to part d.

15-11. The athletic director of State University is interested in developing a multiple regression model that might be used to explain the variation in attendance at football games at his school. A sample of 16 games was selected from home games played during the past 10 seasons. Data for the following factors were determined:

y = Game attendance
x1 = Team win/loss percentage to date
x2 = Opponent win/loss percentage to date
x3 = Games played this season
x4 = Temperature at game time

The data collected are in the file called Football.
a.
Produce scatter plots for each independent variable versus the dependent variable. Based on the scatter plots, produce a model that you believe represents the relationship between the dependent variable and the group of predictor variables represented in the scatter plots.
b. Based on the correlation matrix developed from these data, comment on whether you think a multiple regression model will be effectively developed from these data.
c. Use the sample data to estimate the multiple regression model that contains all four independent variables.
d. What percentage of the total variation in the dependent variable is explained by the four independent variables in the model?
e. Test to determine whether the overall model is statistically significant. Use alpha = 0.05.
f. Which, if any, of the independent variables are statistically significant? Use a significance level of alpha = 0.08 and the p-value approach to conduct these tests.
g. Estimate the standard deviation of the model error and discuss whether this regression model is acceptable as a means of predicting the football attendance at State University at any given game.
h. Define the term multicollinearity and indicate the potential problems that multicollinearity can cause for this model. Indicate what, if any, evidence there is of multicollinearity problems with this regression model. Use the variance inflation factor to assist you in this analysis.
i. Develop a 95% confidence interval estimate for each of the regression coefficients and interpret each estimate. Comment on whether the interpretation of the intercept is relevant in this situation.

END EXERCISES 15-1

15.2 Using Qualitative Independent Variables

Chapter Outcome 6.

Dummy Variable
A variable that is assigned a value equal to either 0 or 1, depending on whether the observation possesses a given characteristic.

In Example 15-1 involving the First City Real Estate Company, the independent variables were quantitative and ratio level. However, you will encounter many situations in which you may wish to use a qualitative, lower-level variable as an explanatory variable. If a variable is nominal, and numerical codes are assigned to the categories, you already know not to perform mathematical calculations using those data. The results would be meaningless. Yet we may wish to use a variable such as marital status, gender, or geographical location as an independent variable in a regression model. If the variable of interest is coded as an ordinal variable, such as education level or job performance ranking, computing means and variances is also inappropriate. Then how are these variables incorporated into a multiple regression analysis? The answer lies in using what are called dummy (or indicator) variables.

For instance, consider the variable gender, which can take on two possible values: male or female. Gender can be converted to a dummy variable as follows:

x1 = 1 if female
x1 = 0 if male

Thus, a data set consisting of males and females will have corresponding values for x1 equal to 0s and 1s, respectively. Note that it makes no difference which gender is coded 1 and which is coded 0. If a categorical variable has more than two mutually exclusive outcome possibilities, multiple dummy variables must be created. Consider the variable marital status, with the following possible outcomes:

never married    married    divorced    widowed

In this case, marital status has four values.
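The 0/1 coding described here is mechanical enough to script. The sketch below (Python, not part of the text's Excel/Minitab toolset; the labels are illustrative) codes a two-level variable such as gender and also handles a variable with more than two categories by creating one fewer dummy than categories, as the text explains next:

```python
def gender_dummy(gender):
    # x1 = 1 if female, 0 if male; which level gets the 1 is arbitrary
    return 1 if gender == "female" else 0

def category_dummies(value, categories):
    # One dummy per category EXCEPT the last, which serves as the baseline.
    # Using one fewer dummy than categories avoids the dummy variable trap.
    return [1 if value == c else 0 for c in categories[:-1]]

# Four marital-status outcomes -> three dummies; "widowed" is the all-zeros baseline
statuses = ["never married", "married", "divorced", "widowed"]
print(category_dummies("married", statuses))   # [0, 1, 0]
print(category_dummies("widowed", statuses))   # [0, 0, 0]
```

Note that the baseline category never gets its own column: it is identified by all dummies being 0, which is exactly why a fourth dummy would be redundant and would introduce perfect multicollinearity.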
To account for all the possibilities, you would create three dummy variables, one less than the number of possible outcomes for the original variable. They could be coded as follows:

x1 = 1 if never married, 0 if not
x2 = 1 if married, 0 if not
x3 = 1 if divorced, 0 if not

Note that we don't need a fourth variable because we would know that a person is widowed if x1 = 0, x2 = 0, and x3 = 0. If the person isn't single, married, or divorced, he or she must be widowed. Always use one fewer dummy variables than categories. The mathematical reason that the number of dummy variables must be one less than the number of possible responses is called the dummy variable trap. Perfect multicollinearity is introduced, and the least squares regression estimates cannot be obtained, if the number of dummy variables equals the number of possible categories.

EXAMPLE 15-1 INCORPORATING DUMMY VARIABLES

Business Executive Salaries To illustrate the effect of incorporating dummy variables into a regression model, consider the sample data displayed in the scatter plot in Figure 15.9. The population from which the sample was selected consists of executives between the ages of 24 and 60 who are working in U.S. manufacturing businesses. Data for annual salary (y) and age (x1) are available. The objective is to determine whether a model can be generated to explain the variation in annual salary for business executives. Even though age and annual salary are significantly correlated (r = 0.686) at the alpha = 0.05 level, the coefficient of determination is only 47%. Therefore, we would likely search for other independent variables that could help us to further explain the variation in annual salary.

Suppose we can determine which of the 16 people in the sample had a master of business administration (MBA) degree. Figure 15.10 shows the scatter plot for these same data, with

FIGURE 15.9 | Executive Salary Data—Scatter Plot
[Scatter plot of annual salary versus age, r = 0.686; salary on the y-axis ($50,000 to $200,000), age on the x-axis (20 to 70).]

the MBA data represented by triangles. To incorporate a qualitative variable into the analysis, use the following steps:

Step 1 Code the qualitative variable as a dummy variable.
Create a new variable, x2, which is a dummy variable coded as

x2 = 1 if MBA, 0 if not

The data with the new variable are shown in Table 15.2.

TABLE 15.2 | Executive Salary Data Including MBA Variable

Salary ($)   Age   MBA
65,000       26    0
85,000       28    1
74,000       36    0
83,000       35    0
110,000      35    1
160,000      40    1
100,000      41    0
122,000      42    1
85,000       45    0
120,000      46    1
105,000      50    0
135,000      51    1
125,000      55    0
175,000      50    1
156,000      61    1
140,000      63    0

Step 2 Develop a multiple regression model with the dummy variables incorporated as independent variables.
The two-variable population multiple regression model has the following form:

y = B0 + B1x1 + B2x2 + e

Using either Excel or Minitab, we get the following regression equation as an estimate of the population model:

y-hat = 6,974 + 2,055x1 + 35,236x2

FIGURE 15.10 | Impact of a Dummy Variable
[Scatter plot of salary versus age with two parallel fitted lines: MBAs, y-hat = 42,210 + 2,055x1, and non-MBAs, y-hat = 6,974 + 2,055x1; b2 = 35,236 is the regression coefficient on the dummy variable.]

Because the dummy variable, x2, has been coded 0 or 1 depending on MBA status, incorporating it into the regression model is like having two simple

<span class='text_page_counter'>(194)</span> 656. CHAPTER 15. |. Multiple Regression Analysis and Model Building. linear regression lines with the same slopes, but different intercepts. For instance, when x2  0, the regression equation is yˆ  6, 974  2, 055 x1  35, 236(0 )  6, 974  2, 055 x1 This line is shown in Figure 15.10. However, when x2  1 (the executive has an MBA), the regression equation is yˆ  6, 974  2, 055 x1  35, 236(1)  42, 210  2, 055 x1 This regression line is also shown in Figure 15.10. As you can see, incorporating the dummy variable affects the regression intercept. In this case, the intercept for executives with an MBA degree is $35,236 higher than for those without an MBA. We interpret the regression coefficient on this dummy variable as follows: Based on these data, and holding age (x1) constant, we estimate that executives with an MBA degree make an average of $35,236 per year more in salary than their non–MBA counterparts. >>END EXAMPLE. TRY PROBLEM 15-17 (pg. 659). BUSINESS APPLICATION. Excel and Minitab. tutorials. Excel and Minitab Tutorial. REGRESSION MODELS USING DUMMY VARIABLES. FIRST CITY REAL ESTATE (CONTINUED) The regression model developed in Example 15-l for First City Real Estate showed potential because the overall model was statistically significant. Looking back at Figure 15.8, we see that the model explained nearly 82% of the variation in sales prices for the homes in the sample. All of the independent variables were significant, given that the other independent variables were in the model. However, the standard error of the estimate is $27,350. The managers have decided to try to improve the model. First, they have decided to add a new variable: area. However, at this point, the only area variable they have access to defines whether the home is in the foothills. 
Because this is a categorical variable with two possible outcomes (foothills or not foothills), a dummy variable can be created as follows:

x6 (area) = 1 if foothills, 0 if not

Of the 319 homes in the sample, 249 were homes in the foothills and 70 were not. Figure 15.11 shows the revised Minitab multiple regression with the variable area added. This model is an improvement over the original model because the adjusted R-squared has increased from 81.3% to 90.2% and the standard error of the estimate has decreased from $27,350 to $19,828. The conditional t-tests show that all of the regression model's slope coefficients, except that for the variable bathrooms, differ significantly from 0. The Minitab output shows that the variance inflation factors are all less than 5.0, so we don't need to be too concerned about the t-tests understating the significance of the regression coefficients. (See the Excel Tutorial for this example to get the full VIF output from PHStat.) The resulting regression model is

y-hat = -6,817 + 63.3(sq. ft.) - 334(age) - 8,445(bedrooms) - 949(bathrooms) + 26,246(garage) + 62,041(area)

Because the variable bathrooms is not significant in the presence of the other variables, we can remove it and rerun the multiple regression. The resulting model is

Price = -7,050 + 62.5(sq. ft.) - 322(age) - 8,830(bedrooms) + 26,054(garage) + 61,370(area)
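The foothills premium implied by the final model can be illustrated with a quick calculation. This sketch (Python) uses the final model shown above, with signs following the chapter's discussion (price falls with age and bedrooms, rises with square feet, garage size, and the foothills dummy); the specific house characteristics are made up for illustration:

```python
def predicted_price(sq_ft, age, bedrooms, garage, area):
    # Final First City model (bathrooms removed); area is the 0/1 foothills dummy
    return (-7050 + 62.5 * sq_ft - 322 * age - 8830 * bedrooms
            + 26054 * garage + 61370 * area)

# A hypothetical 2,000 sq ft, 10-year-old, 3-bedroom home with a 2-car garage:
in_foothills = predicted_price(2000, 10, 3, 2, 1)
elsewhere = predicted_price(2000, 10, 3, 2, 0)
# The difference is exactly the dummy's coefficient, $61,370, for any house.
```

Whatever values are plugged in for the other characteristics, the two predictions differ by the area coefficient alone, which is what "holding the other variables constant" means for a dummy variable.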

FIGURE 15.11 | Minitab Output—First City Real Estate Revised Regression Model
[Callouts: dummy variable coefficient; improved R-square, adjusted R-square, and standard error]

Minitab Instructions:
1. Open file: First City.MTW.
2. Choose Stat > Regression > Regression.
3. In Response, enter dependent (y) variable.
4. In Predictors, enter independent (x) variables.
5. Click Options.
6. In Display, select Variance inflation factors.
7. Click OK. OK.

Based on the sample data and this regression model, we estimate that a house with the same characteristics (square feet, age, bedrooms, and garages) is worth an average of $61,370 more if it is located in the foothills (based on how the dummy variable was coded). There are still signals of multicollinearity problems. The coefficient on the independent variable bedrooms is negative, when we might expect homes with more bedrooms to sell for more. Also, the standard error of the estimate is still very large ($19,817) and does not provide the precision the managers need to set prices for homes. More work needs to be done before the model is complete.

Possible Improvements to the First City Appraisal Model
Because the standard error of the estimate is still too high, we look to improve the model. We could start by identifying possible problems:

1. We may be missing useful independent variables.
2. Independent variables may have been included that should not have been included.

There is no sure way of determining the correct model specification. However, a recommended approach is for the decision maker to try adding variables to, or removing variables from, the model. We begin by removing the bedrooms variable, which has an unexpected sign on the regression slope coefficient.
(Note: If the regression model's sole purpose is prediction, independent variables with unexpected signs do not automatically pose a problem and do not necessarily need to be deleted. However, insignificant variables should be deleted.) The resulting model is shown in Figures 15.12a and 15.12b. Now, all the variables in the model have the expected signs. However, the standard error of the estimate has increased slightly. Adding other explanatory variables might help. For instance, consider whether the house has central air conditioning, which might affect the sales price. If we can identify whether a house has air conditioning, we could add a dummy variable coded as follows:

x7 = 1 if air conditioning, 0 if no air conditioning

Other potential independent variables might include a more detailed location variable, a measure of the physical condition, or whether the house has one or two stories. Can you think of others?
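The variance inflation factors mentioned above come from auxiliary regressions: each VIF is 1/(1 - R²_j), where R²_j measures how well predictor j is explained by the other predictors. A minimal sketch on synthetic data (not the First City variables) shows the calculation:

```python
import numpy as np

def vif(X, j):
    """Variance inflation factor for column j of predictor matrix X:
    regress x_j on the other predictors and return 1 / (1 - R^2_j)."""
    y = X[:, j]
    others = np.delete(X, j, axis=1)
    A = np.column_stack([np.ones(len(y)), others])  # add intercept
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    r2 = 1 - (resid @ resid) / ((y - y.mean()) @ (y - y.mean()))
    return 1.0 / (1.0 - r2)

# Hypothetical predictors: x2 is built to be collinear with x1; x3 is independent
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = x1 + 0.5 * rng.normal(size=200)   # strongly related to x1
x3 = rng.normal(size=200)
X = np.column_stack([x1, x2, x3])
print([round(vif(X, j), 2) for j in range(3)])
```

Here the collinear pair produces VIFs well above 1 (near the 5.0 caution threshold the chapter uses), while the independent predictor's VIF stays close to 1.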

FIGURE 15.12A | Excel 2007 Output for the First City Real Estate Revised Model

Excel 2007 Instructions:
1. Open file: First City.xls (worksheet: HomesSample-2).
2. Click on the Data tab, then click on Data Analysis.
3. Select Regression.
4. Define the y variable range and the x variables range.
5. Click OK.

(Callout: all variables are significant and have the expected signs.)

The First City example illustrates that even though a regression model may pass the statistical tests of significance, it may not be functional. Good appraisal models can be developed using multiple regression analysis, provided more detail is available about such characteristics as finish quality, landscaping, location, neighborhood characteristics, and so forth. The cost and effort required to obtain these data can be relatively high. Developing a multiple regression model is more of an art than a science. The real decisions revolve around how to select the best set of independent variables for the model.

FIGURE 15.12B | Minitab Output for the First City Real Estate Revised Model

Minitab Instructions:
1. Open file: First City.MTW.
2. Choose Stat > Regression > Regression.
3. In Response, enter the dependent (y) variable.
4. In Predictors, enter the independent (x) variables.
5. Click OK.

(Callout: all variables are significant and have the expected signs.)

15-2: Exercises (MyStatLab)

Skill Development

15-12. Consider the following regression model:

y = b0 + b1x1 + b2x2 + ε

where:
x1 = a quantitative variable
x2 = 1 if x1 ≥ 20, 0 if x1 < 20

The following estimated regression equation was obtained from a sample of 30 observations:

ŷ = 24.1 + 5.8x1 + 7.9x2

a. Provide the estimated regression equation for instances in which x1 ≥ 20.
b. Determine the value of ŷ when x1 = 10.
c. Provide the estimated regression equation for instances in which x1 < 20.
d. Determine the value of ŷ when x1 = 30.

15-13. You are considering developing a regression equation relating a dependent variable to two independent variables. One of the variables can be measured on a ratio scale, but the other is a categorical variable with two possible levels.
a. Write a multiple regression equation relating the dependent variable to the independent variables.
b. Interpret the meaning of the coefficients in the regression equation.

15-14. You are considering developing a regression equation relating a dependent variable to two independent variables. One of the variables can be measured on a ratio scale, but the other is a categorical variable with four possible levels.
a. How many dummy variables are needed to represent the categorical variable?
b. Write a multiple regression equation relating the dependent variable to the independent variables.
c. Interpret the meaning of the coefficients in the regression equation.

15-15. A real estate agent wishes to estimate the monthly rental for apartments based on the size (square feet) and the location of the apartments. She chose the following model:

y = b0 + b1x1 + b2x2 + ε

where:
x1 = square footage of the apartment
x2 = 1 if located in town center, 0 if not located in town center

This linear regression model was fitted to a sample of size 50 to produce the following regression equation:

ŷ = 145 + 1.2x1 + 300x2

a. Predict the average monthly rent for an apartment located in the town center that has 1,500 square feet.
b. Predict the average monthly rent for an apartment located in the suburbs that has 1,500 square feet.
c.
Interpret b2 in the context of this exercise.

Business Applications

15-16. The Polk Utility Corporation is developing a multiple regression model that it plans to use to predict customers' utility usage. The analyst currently has three quantitative variables (x1, x2, and x3) in the model, but she is dissatisfied with the R-squared and the estimate of the standard deviation of the model's error. Two variables she thinks might be useful are whether the house has a gas water heater or an electric water heater and whether the house was constructed after the 1974 energy crisis or before. Provide the model she should use to predict customers' utility usage. Specify the dummy variables to be used, the values these variables could assume, and what each value will represent.

15-17. A study was recently performed by the American Automobile Association in which it attempted to develop a regression model to explain variation in Environmental Protection Agency (EPA) mileage ratings of new cars. At one stage of the analysis, the estimate of the model took the following form:

ŷ = 34.20 - 0.003x1 + 4.56x2

where:
x1 = vehicle weight
x2 = 1 if standard transmission, 0 if automatic transmission

a. Interpret the regression coefficient for variable x1.
b. Interpret the regression coefficient for variable x2.
c. Present an estimate of a model that would predict the average EPA mileage rating for an automobile with standard transmission as a function of the vehicle's weight.
d. Cadillac's STS-V with automatic transmission weighs approximately 4,394 pounds. Provide an estimate of the average highway mileage you would expect to obtain from this model.
e. Discuss the effect of a dummy variable being incorporated in a regression equation like this one. Use a graph if it is helpful.

15-18. A real estate agent wishes to determine the selling price of residences using the size (square feet) and whether the residence is a condominium or a

single-family home. A sample of 20 residences was obtained with the following results:

Price ($)   Type     Sq. Feet     Price ($)   Type     Sq. Feet
199,700     Family   1,500        200,600     Condo    1,375
211,800     Condo    2,085        208,000     Condo    1,825
197,100     Family   1,450        210,500     Family   1,650
228,400     Family   1,836        233,300     Family   1,960
215,800     Family   1,730        187,200     Condo    1,360
190,900     Condo    1,726        185,200     Condo    1,200
312,200     Family   2,300        284,100     Family   2,000
313,600     Condo    1,650        207,200     Family   1,755
239,000     Family   1,950        258,200     Family   1,850
184,400     Condo    1,545        203,100     Family   1,630

a. Produce a regression equation to predict the selling price for residences using a model of the following form:

yi = b0 + b1x1 + b2x2 + ε

where:
x1 = square footage and x2 = 1 if a condo, 0 if a single-family home

b. Interpret the parameters b1 and b2 in the model given in part a.
c. Produce an equation that describes the relationship between the selling price and the square footage of (1) condominiums and (2) single-family homes.
d. Conduct a test of hypothesis to determine if the relationship between the selling price and the square footage is different between condominiums and single-family homes.

15-19. When cars from Korean automobile manufacturers started coming to the United States, they were given very poor quality ratings. That started changing several years ago. J.D. Power and Associates generates a widely respected report on initial quality. The improved quality started being seen in the 2004 Initial Quality Study. Results were based on responses from more than 62,000 purchasers and lessors of new-model-year cars and trucks, who were surveyed after 90 days of ownership. Initial quality is measured by the number of problems per 100 vehicles (PP100). The PP100 data for 1998–2004 follow:

Region      1998   1999   2000   2001   2002   2003   2004
Korean       272    227    222    214    172    152    117
Domestic     182    177    164    153    137    135    123
European     158    171    154    141    137    136    122

a. Produce a regression equation to predict the PP100 for vehicles in the model

yi = b0 + b1x1 + b2x2 + ε

where x1 = 1 if Domestic, 0 if not Domestic, and x2 = 1 if European, 0 if not European.

b. Interpret the parameters b0, b1, and b2 in the model given in part a.
c. Conduct a test of hypothesis using the model in part a to determine if the average PP100 is the same for the three international automobile production regions.

Computer Database Exercises

15-20. The Energy Information Administration (EIA), created by Congress in 1977, is a statistical agency of the U.S. Department of Energy. It provides data, forecasts, and analyses to promote sound policymaking and public understanding regarding energy and its interaction with the economy and the environment. One of the most important areas of analysis is petroleum. The file entitled Crude contains data for the period 1991–2006 concerning the price, supply, and demand for fuel. It has been conjectured that the pricing structure of gasoline changed at the turn of the century.
a. Produce a regression equation to predict the selling price of gasoline:

yi = b0 + b1x1 + ε

where x1 = 1 if in the twenty-first century, 0 if in the twentieth century.

b. Conduct a hypothesis test to address the conjecture. Use a significance level of 0.05 and the test statistic approach.
c. Produce a 95% confidence interval to estimate the change in the average selling price of gasoline between the twentieth and the twenty-first centuries.

15-21. The Gilmore Accounting firm, in an effort to explain variation in client profitability, collected the data found in the file called Gilmore, where:

y = net profit earned from the client
x1 = number of hours spent working with the client
x2 = type of client: 1 if manufacturing, 2 if service, 3 if governmental

a. Develop a scatter plot of each independent variable against the client income variable.
Comment on what, if any, relationship appears to exist in each case.
b. Run a simple linear regression analysis using only variable x1 as the independent variable. Describe the resulting estimate fully.
c. Test to determine if the number of hours spent working with the client is useful in predicting client profitability.

15-22. Using the data from the Gilmore Accounting firm found in the data file Gilmore (see Exercise 15-21),
a. Incorporate the client type into the regression analysis using dummy variables. Describe the resulting multiple regression estimate.
b. Test to determine if this model is useful in predicting the net profit earned from the client.
c. Test to determine if the number of hours spent working with the client is useful in this model in predicting the net profit earned from a client.
d. Considering the tests you have performed, construct a model and its estimate for predicting the net profit earned from the client.
e. Predict the average difference in profit if the client is governmental versus one in manufacturing. Also state this in terms of a 95% confidence interval estimate.

15-23. Several previous problems have dealt with the College Board changing the format of the SAT test taken by many entering college freshmen. Many reasons were given for changing the format. The class of 2005 was the last to take the former version of the SAT, featuring math and verbal sections. There had been conjecture about whether a relationship existed between the average math SAT score, the average verbal SAT score, and the gender of the student taking the SAT examination. Consider the following relationship:

yi = b0 + b1x1 + b2x2 + ε

where:
x1 = average verbal SAT score and x2 = 1 if female, 0 if male

a. Use the file MathSAT to compute the linear regression equation to predict the average math SAT score using the gender and the average verbal SAT score of the students taking the SAT examination.
b. Interpret the parameters in the model.
c. Conduct a hypothesis test to determine if the gender of the student taking the SAT examination is a significant predictor of the student's average math SAT score for a given average verbal SAT score.
d. Predict the average math SAT score of female students with an average verbal SAT score of 500.

END EXERCISES 15-2

Chapter Outcome 7

15.3 Working with Nonlinear Relationships

Section 14.1 in Chapter 14 showed there are a variety of ways in which two variables can be related. Correlation and regression analysis techniques are tools for measuring and modeling linear relationships between variables. Many situations in business have a linear relationship between two variables, and regression equations that model that relationship will be appropriate to use in those situations. However, there are also many instances in which the relationship between two variables will be curvilinear rather than linear. For instance, demand for electricity has grown at an almost exponential rate relative to the population growth in some areas. Advertisers believe that a diminishing-returns relationship will occur between sales and advertising if advertising is allowed to grow too large. These two situations are shown in Figures 15.13 and 15.14, respectively. They represent just two of the many possible curvilinear relationships that could exist between two variables. As you will soon see, models with nonlinear relationships become more complicated than models showing only linear relationships. Although complicated models are sometimes

FIGURE 15.13 | Exponential Relationship of Increased Demand for Electricity versus Population Growth (plot of electricity demand versus population)

necessary, decision makers should use them with caution for several reasons. First, management researchers and authors have written that people use decision aids they understand and don't use those they don't understand. So, the more complicated a model is, the less likely it is to be used. Second, the scientific principle of parsimony suggests using the simplest model possible that provides a reasonable fit of the data, because complex models typically do not reflect the underlying phenomena that produce the data in the first place. This section provides a brief introduction to how linear regression analysis can be used in dealing with curvilinear relationships. To model such curvilinear relationships, we must incorporate terms into the multiple regression model that will create "curves" in the model we are building. Including terms whose independent variable has an exponent larger than 1 generates these curves. When a model possesses such terms, we refer to it as a polynomial model. The general equation for a polynomial with one independent variable is given in Equation 15.11.

FIGURE 15.14 | Diminishing Returns Relationship of Advertising versus Sales (plot of sales versus advertising)

Polynomial Population Regression Model

y = b0 + b1x + b2x^2 + ... + bpx^p + ε     (15.11)

where:
b0 = population regression's constant
bj = population's regression coefficient for variable x^j; j = 1, 2, ..., p
p = order (or degree) of the polynomial
ε = model error

The order, or degree, of the model is determined by the largest exponent of the independent variable in the model. For instance, the model

y = b0 + b1x + b2x^2 + ε

is a second-order polynomial because the largest exponent in any term of the polynomial is 2. You will note that this model contains terms of all orders less than or equal to 2. A polynomial with this property is said to be a complete polynomial. Therefore, the previous model would be referred to as a complete second-order regression model. A second-order model produces a parabola. The parabola opens either upward (b2 > 0) or downward (b2 < 0), as shown in Figure 15.15. You will notice that the models in Figures 15.13, 15.14, and 15.15 possess a single curve. As more curves appear in the data, the order of the polynomial must be increased. A general (complete) third-order polynomial is given by the equation

y = b0 + b1x + b2x^2 + b3x^3 + ε

This model produces a curvilinear relationship that reverses the direction of the initial curve to produce a second curve, as shown in Figure 15.16. Note that there are two curves in the third-order model. In general, a pth-order polynomial will exhibit p - 1 curves. Although polynomials of all orders exist in the business sector, second-order polynomials are perhaps the most common. Sharp reversals in the curvature of a relationship between

variables in the business environment usually point to some unexpected or perhaps severe changes that were not foreseen. The vast majority of organizations try to avoid such reversals. For this reason, and because this is an introductory business statistics course, we will direct most of our attention to second-order polynomials. The following examples illustrate two of the most common instances in which curvilinear relationships can be used in decision making. They should give you an idea of how to approach similar situations.

FIGURE 15.15 | Second-Order Regression Models (parabolas opening upward when b2 > 0 and downward when b2 < 0)

FIGURE 15.16 | Third-Order Regression Models (curves for b3 > 0 and b3 < 0)

Chapter Outcome 7

EXAMPLE 15-2  MODELING CURVILINEAR RELATIONSHIPS

Ashley Investment Services  Ashley Investment Services was severely shaken by the downturn in the stock market during the summer and fall of 2008. To maintain profitability and save as many jobs as possible, everyone has since been extra busy analyzing new investment opportunities. The director of personnel has noticed an increased number of people suffering from "burnout," in which physical and emotional fatigue hurt job performance. Although he cannot change the job's pressures, he has read that the more time a person spends socializing with coworkers away from the job, the more likely there is to be a higher degree of burnout. With the help of the human resources lab at the local university, the personnel director has administered a questionnaire to company employees. A burnout index has been computed from the responses to the survey. Likewise, the survey responses are used to determine quantitative measures of socialization. Sample data from the questionnaires are contained in the file Ashley. The following steps

can be used to model the relationship between the socialization index and the burnout index for Ashley employees:

Step 1: Specify the model by determining the dependent and potential independent variables.
The dependent variable is the burnout index. The company wishes to explain the variation in burnout level. One potential independent variable is the socialization index.

Step 2: Formulate the model.
We begin by proposing that a linear relationship exists between the two variables. Figures 15.17a and 15.17b show the linear regression analysis results using Excel and Minitab. The correlation between the two variables is r = 0.818, which is statistically different from zero at any reasonable significance level. The estimate of the population linear regression model shown in Figure 15.17a is

ŷ = -66.164 + 9.589x

FIGURE 15.17A | Excel 2007 Output of a Simple Linear Regression for Ashley Investment Services

Excel 2007 Instructions:
1. Open file: Ashley.xls.
2. Select Data > Data Analysis.
3. Select Regression.
4. Specify the y variable range and x variable range (include labels).
5. Check the Labels option.
6. Specify the output location.
7. Click OK.

(Callout: regression coefficients.)

Step 3: Perform diagnostic checks on the model.
The sample data and the regression line are plotted in Figure 15.18. The line appears to fit the data. However, a closer inspection reveals instances where

<span class='text_page_counter'>(204)</span> CHAPTER 15. FIGURE 15.17B. |. Multiple Regression Analysis and Model Building. 665. |. Minitab Output of a Simple Linear Regression for Ashley Investment Services. Minitab Instructions: 1. Open file: Ashley.MTW. 2. Choose Stat  Regression  Regression. 3. In Response, enter the y variable column. 4. In Predictors, enter the x variable column. 5. Click OK.. Regression coefficients. several consecutive points lie above or below the line. The points are not randomly dispersed around the regression line, as should be the case given the regression analysis assumptions. As you will recall from earlier discussions, we can use an F-test to test whether a regression model explains a significant amount of variation in the dependent variable. H0: r2  0 HA: r2  0 From the output in Figure 15.17a, F  36.43 which has a p-value ≈ 0.0000. Thus, we conclude that the simple linear model is statistically significant. However, we should also examine the data to determine if any curvilinear relationships may be present.. FIGURE 15.18. | 1,200. Plot of Regression Line for the Ashley Investment Services Example. y. Burnout Index. 1,000 yˆ = –66.164 + 9.589x R2 = 0.6693. 800 600 400 200 0. 0. 20. 40 60 Socialization Measure. 80. x 100.

<span class='text_page_counter'>(205)</span> 666. CHAPTER 15. FIGURE 15.19A. |. Multiple Regression Analysis and Model Building. |. Excel 2007 Output of a Second-Order Polynomial Fit for Ashley Investment. R-squared. Regression Coefficients. Excel 2007 Instructions: 1. Open file: Ashley.xls. 2. Use Excel equations to create the new variable in column C (i.e. for the first data value use = A2^2. Then copy down). 3. Select Data > Data Analysis. 4. Select Regression.. FIGURE 15.19B. |. Minitab Output of a SecondOrder Polynomial Fit for Ashley Investment. Minitab Instructions: 1. Open file: Ashley.MTW. 2. Use Calc  Calculator to create socialization column 3. Choose Stat  Regression  Fitted Line Plot. 4. In Response, enter y variable. 5. In Predictor, enter x variables. 6. Under Type of Regression Model, choose Quadratic. 7. Click OK.. R-squared. vv. 5. Specify y variable range and x variable range (include the new variable and the labels). 6. Check Labels option. 7. Specify ouput location. 8. Click OK..

<span class='text_page_counter'>(206)</span> CHAPTER 15. |. Multiple Regression Analysis and Model Building. 667. Step 4 Model the curvilinear relationship. Finding instances of nonrandom patterns in the residuals for a regression model indicates the possibility of using a curvilinear relationship rather than a linear one. One possible approach to modeling the curvilinear nature of the data in the Ashley Investments example is with the use of polynomials. From Figure 15.18, we can see that there appears to be one curve in the data. This suggests fitting the second-order polynomial y  b0  b1x  b2x2   Before fitting the estimate for this population model, you will need to create the new independent variable by squaring the socialization measure variable. In Excel use the formula option or in Minitab use the Calc  Calculator command to create the new variable. Figures 15.19a and 15.19b show the output after fitting this second-order polynomial model. Step 5 Perform diagnostics on the revised curvilinear model. Notice the second-order polynomial provides a model whose estimated regression equation has an R2 of 74.1%. This is higher than the R2 of 66.9% for the linear model. Figure 15.20 shows the plot of the second-order polynomial model. Comparing Figure 15.20 with Figure 15.18, we can see that the polynomial model does appear to fit the sample data better than the linear model. >>END EXAMPLE. TRY PROBLEM 15-24 (pg. 675). Analyzing Interaction Effects BUSINESS APPLICATION Excel and Minitab. tutorials. Excel and Minitab Tutorial. DEALING WITH INTERACTION. ASHLEY INVESTMENT SERVICES (CONTINUED) Referring to Example 15-3 involving Ashley Investment Services, the director of personnel wondered if the effects of burnout differ among male and female workers. He therefore identified the gender of the previously surveyed employees (see file Ashley-2). A multiple scatter plot of the data appears as Figure 15.21. 
The personnel director tried to determine the relationship between the burnout index and socialization measure for men and women. The graphical result is presented in Figure 15.21. Note that both relationships appear to be curvilinear, with a similarly shaped curve. As we showed earlier, curvilinear shapes often can be modeled by the second-order polynomial yˆ  b0  b1 x1  b2 x12. FIGURE 15.20. | 1,200. Plot of Second-Order Polynomial Model for Ashley Investment. y. Second-Degree Polynomial. Burnout Index. 1,000 800 600 400 200 0. 0. 10. 20. 30. 40 50 60 70 Socialization Measure. 80. 90. x 100.

<span class='text_page_counter'>(207)</span> 668. CHAPTER 15. FIGURE 15.21. |. Multiple Regression Analysis and Model Building. |. Excel 2007 Multiple Scatter Plot for Ashley Investment Services. Male. Female. Excel 2007 Instructions: corresponding to females (rows 2–11). 1. Open file: Ashley-2.xls. For Series Y Values select data from the 2. Select the Socialization Measure and Burnout column corresponding to females Burnout Index columns. (rows 2–11). 3. Select the Insert tab. 8. Repeat step 7 for males. 4. Select the XY (Scatter). 5. Select Chart and click the right mouse 9. Click on layout tab to remove legend and to add chart and axis titles. button—choose Select Data. 10. Select data points for males—right click 6. Click on Add on the Legend Entry and select Add Trendline  Exponential. (Series) section. 11. Repeat step 10 for females. 7. Enter Series Name—(Females)— for Series X Values select data from Socialization column for row. However, the regression equations that estimate this second-order polynomial for men and women are not the same. The two equations seem to have different locations and different rates of curvature. Whether an employee is a man or woman seems to change the basic relationship between burnout index (y) and socialization measure (x1). To represent this difference, the equation’s coefficients b0, b1, and b2 must be different for male and female employees. Thus, we could use two models, one for each gender. Alternatively, we could use one model for both male and female employees by incorporating a dummy independent variable with two levels, which is shown as x2  1 if male, 0 if female As x2 changes values from 0 to 1, it affects the values of the coefficients b0, b1, and b2. Suppose the director fitted the second-order model for the female employees only. He obtained the following regression equation: yˆ  291.70  4.62x1  0.102 x12 The equation for only male employees was yˆ  149.59  4.40x1  0.160 x12.

<span class='text_page_counter'>(208)</span> CHAPTER 15. Interaction The case in which one independent variable (such as x2 ) affects the relationship between another independent variable (x1) and the dependent variable ( y ).. |. Multiple Regression Analysis and Model Building. 669. To explain how a change in gender can cause this kind of change, we must introduce interaction. In our example, gender (x2) interacts with the relationship between socialization measure (x1) and burnout index (y). The question is how do we obtain the interaction terms to model such a relationship? To answer this question, we first obtain the model for the basic relationship between the x1 and the y variables. The population model is y  b0  b1x1  b2 x12   To obtain the interaction terms, multiply the terms on the right-hand side of this model by the variable that is interacting with this relationship between y and x1. In this case, that interacting variable is x2. Then the interaction terms would be b3 x2  b4 x1 x2  b5 x12 x2. Composite Model The model that contains both the basic terms and the interaction terms.. Notice that we have changed the coefficient subscripts so we do not duplicate those in the original model. Then the interaction terms are added to the original model to produce the composite model. y  b0  b1 x1  b2 x12  b3 x2  b4 x1 x2  b5 x12 x2   Note, the model for women is obtained by substituting x2  0 into the composite model. This gives y  b0  b1 x1  b2 x12  b3 (0)  b4 x1 (0)  b5 x12 (0)    b0  b1 x1  b2 x12   Similarly, for men we substitute the value of x2  1. The model then becomes y  b0  b1 x1  b2 x12  b3 (1)  b4 x1 (1)  b5 x12 (1)    ( b0 + b3 )  ( b1 + b4 ) x1  ( b2  b5 ) x12   This illustrates how the coefficients are changed for different values of x2 and, therefore, how x2 is interacting with the relationship between x1 and y. 
Once we know b3, b4, and b5, we know the effect of the interaction of gender on the original relationship between the burnout index (y) and the socialization measure (x1). To estimate the composite model, we need to. FIGURE 15.22. |. Excel 2007 Data Preparation for Estimating Interactive Effects for Second-Order Model for Ashley Investment. Excel 2007 Instructions: 1. Open file: Ashley-2.xls. 2. Use Excel formulas to create new variables in columns C, E, and F..

create the required variables, as shown in Figure 15.22. Figures 15.23a and 15.23b show the regression for the composite model. The estimate for the composite model is

ŷ = 291.706 - 4.615x1 + 0.102x1^2 - 142.113x2 + 0.215x1x2 + 0.058x1^2x2

We obtain the model for females by substituting x2 = 0, giving

ŷ = 291.706 - 4.615x1 + 0.102x1^2 - 142.113(0) + 0.215x1(0) + 0.058x1^2(0)
ŷ = 291.706 - 4.615x1 + 0.102x1^2

FIGURE 15.23A | Excel 2007 Composite Model for Ashley Investment Services

Excel 2007 Instructions:
1. Open file: Ashley-2.xls.
2. Create the new variables (see the Figure 15.22 Excel 2007 instructions).
3. Rearrange the columns so all x variables are contiguous.
4. Select Data > Data Analysis.
5. Select Regression.
6. Specify the y variable range and x variable range (include the new variables and the labels).
7. Check the Labels option.
8. Specify the output location.
9. Click OK.

(Callout: regression coefficients for the composite model.)

FIGURE 15.23B | Minitab Composite Model for Ashley Investment Services

Minitab Instructions:
1. Continue from Figure 15.19b.
2. Choose Stat > Regression > Regression.
3. In Response, enter the dependent (y) variable.
4. In Predictors, enter the independent (x) variables.
5. Click OK.

(Callout: regression coefficients for the composite model.)
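Substituting x2 = 0 and x2 = 1 into the composite estimate is pure coefficient arithmetic, so the gender-specific equations quoted for this example can be checked directly against the composite fit:

```python
# Composite-model estimates for the Ashley example (from the fitted model)
b0, b1, b2 = 291.706, -4.615, 0.102   # base terms (female, x2 = 0)
b3, b4, b5 = -142.113, 0.215, 0.058   # interaction terms involving x2

# Female equation (x2 = 0) keeps the base coefficients; the male
# equation (x2 = 1) adds each interaction coefficient term by term
male = (b0 + b3, b1 + b4, b2 + b5)
print(tuple(round(c, 3) for c in male))  # → (149.593, -4.4, 0.16)
```

This reproduces the separate male equation, confirming that fitting one composite model with interaction terms is equivalent to fitting the two gender models separately.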

For males, we substitute x2 = 1, giving

ŷ = 291.706 - 4.615x1 + 0.102x1^2 - 142.113(1) + 0.215x1(1) + 0.058x1^2(1)
ŷ = 149.593 - 4.40x1 + 0.160x1^2

Note that these equations for male and female employees are the same as those we found earlier when we generated two separate regression models, one for each gender. In this example we have looked at a case in which a dummy variable interacts with the relationship between another independent variable and the dependent variable. However, the interacting variable need not be a dummy variable; it can be any independent variable. Also, strictly speaking, interaction is not said to exist if the only effect of the interaction variable is to change the y intercept of the equation relating another independent variable to the dependent variable. Therefore, when you examine a scatter plot to detect interaction, you are trying to determine whether the relationships produced when the interaction variable changes values are parallel. If the relationships are parallel, only the y intercept is being affected by the change in the interacting variable, and interaction does not exist. Figure 15.24 demonstrates this concept graphically.

The Partial-F Test

So far you have been given the procedures required to test the significance of either one or all of the coefficients in a regression model. For instance, in Example 15-2 a hypothesis test was used to determine that a second-order model involving the socialization measure fit the sample data better than the linear model. Testing H0: b2 = 0 was the mechanism used to establish this. We could have determined whether both the linear and quadratic components were useful in predicting the burnout index level by testing the hypothesis H0: b1 = b2 = 0. However, more complex models occur.
The interaction model involving Ashley Investment Services containing five predictor variables was

yi = β0 + β1x1i + β2x1i² + β3x2i + β4x1ix2i + β5x1i²x2i + εi

FIGURE 15.24 | Graphical Evidence of Interaction
(a) First-order polynomial without interaction
(b) First-order polynomial with interaction
(c) Second-order polynomial without interaction
(d) Second-order polynomial with interaction
[Each panel plots y against x1 for two values of x2 (x2 = 0 and x2 = 1 in the first-order panels; x2 = 11.3 and x2 = 13.2 in the second-order panels). The relationships are parallel when no interaction is present.]

Two of these predictor variables (x1ix2i and x1i²x2i) are interaction terms; their presence allows for interaction in this regression model. If the two interaction variables were absent, the model would be

yi = β0 + β1x1i + β2x1i² + β3x2i + εi

To determine whether there is statistical evidence of interaction, we must determine if the coefficients of the interaction terms are all equal to 0. If they are, there is no interaction. Otherwise, at least some interaction exists. For the Ashley Investment example, we test the hypotheses

H0: β4 = β5 = 0
HA: At least one of the βi ≠ 0

Earlier in this chapter we introduced procedures for testing whether all of the coefficients of a model equal 0. In that case, you use the analysis of variance F-test found on both Excel and Minitab output. However, to test whether there is significant interaction, we must test more than one but fewer than all the regression coefficients. The method for doing this is the partial-F test. This test relies on the fact that, given a choice between two models, one model is a better fit if its sum of squares of error (SSE) is significantly smaller than the other's. Therefore, to determine if interaction exists in our model, we must obtain the SSE for the model with the interaction terms and for the model without the interaction terms. The model without the interaction terms is called the reduced model. The model containing the interaction terms is called the complete model. We will denote the respective sums of squares as SSER and SSEC. It is important to note that the procedure is appropriate for testing any subset of more than one but fewer than all of the model's coefficients. We use the interaction terms in this example as just one such application; there are many models without interaction terms in which the partial-F test is applicable.
The test is based on the concept that the SSE will be significantly reduced only if not all of the regression coefficients being tested equal zero. Of course, if the SSE is significantly reduced, then SSER - SSEC must be significantly different from zero. To determine if this difference is significantly different from zero, we use the partial-F test statistic given by Equation 15.12.

Partial-F Test Statistic

F = [(SSER - SSEC)/(c - r)] / MSEC     (15.12)

where:
MSEC = Mean square error for the complete model = SSEC/(n - c - 1)
r = The number of coefficients in the reduced model
c = The number of coefficients in the complete model
n = Sample size

The numerator of this test statistic is basically the average reduction in SSE per degree of freedom gained by including the coefficients being tested in the model. This is compared to the average SSE per degree of freedom for the complete model. If these averages are significantly different, the null hypothesis is rejected. This test statistic has an F-distribution whose numerator degrees of freedom equal the number of parameters being tested (c - r) and whose denominator degrees of freedom equal the degrees of freedom for the complete model (n - c - 1). We are now prepared to determine if the director's data indicate a significant interaction between gender and the relationship between the Socialization Measure and the Burnout Index. In order to conduct the test of hypothesis, the director produced regression equations

FIGURE 15.25A | Sum of Squares for the Complete Model
[Regression output showing SSEC and MSEC.]

for both models (Figures 15.25a and 15.25b). He obtained SSEC and MSEC from Figure 15.25a and SSER from Figure 15.25b. He was then able to conduct the hypothesis test to determine if there was any interaction. Figure 15.25c displays this test. Since the null hypothesis was rejected, we can conclude that interaction does exist in this model. Apparently, the gender of the employee does affect the relationship between the Burnout Index and the Socialization Measure: the relationship differs between men and women.

You must be very careful with interpretations of regression coefficients when interaction exists. Notice that the equation that contains interaction terms is given by

ŷi = 292 - 4.61x1i + 0.102x1i² - 142x2i + 0.2x1ix2i + 0.058x1i²x2i

When interpreting the coefficient b1, you may be tempted to say that the Burnout Index will decrease by an average of 4.61 units for every unit the Socialization Measure (x1i) increases, holding all other predictor variables constant. However, this is not true: three other components of this regression equation also contain x1i. When x1i increases by one unit, x1i² will also increase. In addition, the interaction terms contain x1i, and therefore those terms will change as well. This being the case, every time the variable x2 changes, the rates of change associated with the interaction terms are also affected. Perhaps you will see this more clearly if we rewrite the equation as

ŷi = (292 - 142x2i) + (0.2x2i - 4.61)x1i + (0.102 + 0.058x2i)x1i²

In this form you can see that the coefficients of x1i and x1i² change whenever x2 changes. Thus, the interpretation of any of these components depends on the value of x2 as well as x1i.
Whenever interaction or higher-order components are present, you should be very careful in your attempts to interpret the results of your regression analysis.
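One way to make this caution concrete is to look at the marginal effect of x1 directly. Differentiating the interaction equation from the text with respect to x1 gives a slope that depends on both x1 and x2 (the function below is an illustrative sketch using the rounded coefficients quoted above):

```python
# Marginal effect of x1 on y for the model
#   y = 292 - 4.61*x1 + 0.102*x1^2 - 142*x2 + 0.2*x1*x2 + 0.058*x1^2*x2
# The derivative with respect to x1 depends on both x1 and the dummy x2.
def slope_x1(x1, x2):
    """dy/dx1 = -4.61 + 2(0.102)x1 + x2 * (0.2 + 2(0.058)x1)"""
    return -4.61 + 2 * 0.102 * x1 + x2 * (0.2 + 2 * 0.058 * x1)

# Same Socialization Measure, different gender -> different slope
print(slope_x1(30, 0))
print(slope_x1(30, 1))
```

At the same value of x1, the two genders have different slopes, which is exactly why no single number can be quoted as "the" effect of the Socialization Measure.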

FIGURE 15.25B | Sum of Squares for the Reduced Model
[Regression output showing SSER.]

FIGURE 15.25C | Partial-F Hypothesis Test for Interaction

H0: β4 = β5 = 0
HA: At least one of the βi ≠ 0
α = 0.05

Test Statistic:
F = (SSER - SSEC)/(c - r) / MSEC = (231,845 - 127,317)/(5 - 3) / 9,094 = 5.747

Rejection Region:
d.f.: D1 = c - r = 5 - 3 = 2
D2 = n - c - 1 = 20 - 5 - 1 = 14
F0.05 = 3.739

Because Partial-F = 5.747 > 3.739, we reject H0.
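The Figure 15.25c calculation can be reproduced in a few lines, using the SSE and MSE values quoted in the figures and scipy for the critical value (an illustrative check, not part of the text's Excel/Minitab workflow):

```python
from scipy import stats

# Re-deriving the Figure 15.25c numbers with Equation 15.12; the SSE values and
# MSE are taken from the text's figures.
sse_r, sse_c = 231_845, 127_317      # reduced- and complete-model SSEs
c, r, n = 5, 3, 20                   # coefficients in each model; sample size
mse_c = 9_094                        # mean square error of the complete model

f_stat = (sse_r - sse_c) / (c - r) / mse_c
f_crit = stats.f.ppf(0.95, dfn=c - r, dfd=n - c - 1)   # alpha = 0.05

print(round(f_stat, 3), round(f_crit, 3))   # 5.747 and 3.739, matching the figure
if f_stat > f_crit:
    print("Reject H0: at least one interaction coefficient differs from zero")
```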

MyStatLab

15-3: Exercises

Skill Development

15-24. Consider the following values for the dependent and independent variables:

x    y    x    y
1    2    2    3
5    10   50   44
15   15   60   79
40   25   80   112

a. Develop a scatter plot of the data. Does the plot suggest a linear or nonlinear relationship between the dependent and independent variables?
b. Develop an estimated linear regression equation for the data. Is the relationship significant? Test at an α = 0.05 level.
c. Develop a regression equation of the form ŷ = b0 + b1x + b2x². Does this equation provide a better fit to the data than that found in part b?

15-25. Consider the following values for the dependent and independent variables:

x    y    x    y
6    5    18   30
9    20   22   33
14   28   27   35

a. Develop a scatter plot of the data. Does the plot suggest a linear or nonlinear relationship between the dependent and independent variables?
b. Develop an estimated linear regression equation for the data. Is the relationship significant? Test at an α = 0.05 level.
c. Develop a regression equation of the form ŷ = b0 + b1 ln(x). Does this equation provide a better fit to the data than that found in part b?

15-26. Examine the following data:

x  2  8  9  12  15  22  21  25  37  39
y  4  75  175  415  620  7,830  7,551  7,850  11,112  11,617

a. Construct a scatter plot of the data. Determine the order of the polynomial that is represented by the data.
b. Obtain an estimate of the model identified in part a.
c. Conduct a test of hypothesis to determine if a third-order, as opposed to a second-order, polynomial is a better representation of the relationship between y and x. Use a significance level of 0.05 and the p-value approach.

15-27. Examine the following two sets of data:

When x2 = 1:
x1  4  5  7  8  12  11  14  19  20
y   15  23  52  60  154  122  200  381  392

When x2 = 2:
x1  3  6  7  9  10  14  13  16  21
y   9  5  10  48  50  87  51  63  202

a. Produce a distinguishable scatter plot for each of the data sets on the same graph. Does it appear that there is interaction between x2 and the relationship between y and x1? Support your assertions.
b. Consider the following model to represent the relationship among y, x1, and x2:
yi = β0 + β1x1 + β2x1² + β3x1x2 + β4x1²x2 + ε
Produce the estimated regression equation for this model.
c. Conduct a test of hypothesis for each interaction term. Use a significance level of 0.05 and the p-value approach.
d. Based on the two hypothesis tests in part c, does it appear that there is interaction between x2 and the relationship between y and x1? Support your assertions.

15-28. Consider the following data:

x  1  4  5  7  8  12  11  14  19  20
y  1  54  125  324  512  5,530  5,331  5,740  7,058  7,945

a. Construct a scatter plot of the data. Determine the order of the polynomial that is represented by this data.
b. Obtain an estimate of the model identified in part a.
c. Conduct a test of hypothesis to determine if a third-order, as opposed to a first-order, polynomial is a better representation of the relationship between y and x. Use a significance level of 0.05 and the p-value approach.

15-29. A regression equation to be used to predict a dependent variable with four independent variables is developed from a sample of size 10. The resulting equation is

ŷ = 32.8 + 0.470x1 + 0.554x2 + 4.77x3 + 0.929x4

Two other equations are developed from the sample:

ŷ = 12.4 + 0.60x1 + 1.60x2  and  ŷ = 49.7 + 5.38x3 + 1.35x4

The respective sum of squares errors for the three equations are 201.72, 1,343, and 494.6.
a. Use the summary information to determine if the independent variables x3 and x4 belong in the complete regression model. Use a significance level of 0.05.
b. Repeat part a for the independent variables x1 and x2. Use the p-value approach and a significance level of 0.05.

Computer Database Exercises

15-30. In a bit of good news for male students, American men have closed the gap with women on life span, according to a USA Today article. Male life expectancy attained a record 75.2 years, and women's reached 80.4. The National Center for Health Statistics provided the data given in the file entitled Life.
a. Produce a scatter plot depicting the relationship between the life expectancy of women and men.
b. Determine the order of the polynomial that is represented on the scatter plot obtained in part a. Produce the estimated regression equation that represents this relationship.
c. Determine if women's average life expectancy can be used in a second-order polynomial to predict the average life expectancy of men. Use a significance level of 0.05.
d. Use the estimated regression equation computed in part b to predict the average length of life of men when women's length of life equals 100. What does this tell you about the wisdom (or lack thereof) of extrapolation in regression models?

15-31.
The Gilmore Accounting firm, previously mentioned, in an effort to explain variation in client profitability, collected the data found in the file called Gilmore, where:

y = Net profit earned from the client
x1 = Number of hours spent working with the client
x2 = Type of client: 1, if manufacturing; 2, if service; 3, if governmental

Gilmore has asked if it needs the client type, in addition to the number of hours spent working with the client, to predict the net profit earned from the client. You are asked to provide this information.
a. Fit a model to the data that incorporates the number of hours spent working with the client and the type of client as independent variables. (Hint: Client type has three levels.)
b. Fit a second-order model to the data, again using dummy variables for client type. Does this model provide a better fit than that found in part a? Which model would you recommend be used?

15-32. McCullom's International Grains is constantly searching out areas in which to expand its market. Such markets present different challenges, since tastes in the international market are often different from domestic tastes. India is one country on which McCullom's has recently focused. Paddy is a grain used widely in India, but its characteristics are unknown to McCullom's. Charles Walters has been assigned to take charge of the handling of this grain. He has researched its various characteristics. During his research he came across an article, "Determination of Biological Maturity and Effect of Harvesting and Drying Conditions on Milling Quality of Paddy" [Journal of Agricultural Engineering Research (1975), pp. 353–361], which examines the relationship between y, the yield (kg/ha) of paddy, as a function of x, the number of days after flowering at which harvesting took place. The accompanying data appeared in the article and are in a file called Paddy.

y      x    y      x
2,508  16   3,823  32
2,518  18   3,646  34
3,304  20   3,708  36
3,423  22   3,333  38
3,057  24   3,517  40
3,190  26   3,241  42
3,500  28   3,103  44
3,883  30   2,776  46

a. Construct a scatter plot of the yield (kg/ha) of paddy as a function of the number of days after flowering at which harvesting took place. Display at least two models that would explain the relationship you see in the scatter plot.
b. Conduct tests of hypotheses to determine if the models you selected are useful in predicting the yield of paddy.
c. Consider the model that includes the second-order term x². Would a simple linear regression model be preferable to the model containing the second-order term? Conduct a hypothesis test using the p-value approach to arrive at your answer.
d. Which model should Charles use to predict the yield of paddy? Explain your answer.

15-33. The National Association of Realtors Existing-Home Sales Series provides a measurement of the residential real estate market. One of the measurements it produces is the Housing Affordability Index (HAI). It is a measure of the financial ability of U.S. families to buy a house: 100 means that families earning the national median income have just the amount of money needed to qualify for a mortgage on a median-priced

home; higher than 100 means they have more than enough, and lower than 100 means they have less than enough. The file entitled Index contains the HAI and associated variables.
a. Construct a scatter plot relating the HAI to the median family income.
b. Determine the order of the polynomial that is suggested by the scatter plot produced in part a. Obtain the estimated regression equation of the polynomial selected.
c. Determine if monthly principal and interest interacts with the relationship between the HAI and the median family income indicated in part b. (Hint: You may need to conduct more than one hypothesis test.) Use a significance level of 0.05 and the p-value approach.

15-34. Badeaux Brothers Louisiana Treats ships packages of Louisiana coffee, cakes, and Cajun spices to individual customers around the United States. The cost to ship these products depends primarily on the weight of the package being shipped. Badeaux charges the customers for shipping and then ships the product itself. As a part of a study of whether it is economically feasible to continue to ship products itself, Badeaux sampled 20 recent shipments to determine what, if any, relationship exists between shipping costs and package weight. These data are in the file called Badeaux.
a. Develop a scatter plot of the data with the dependent variable, cost, on the vertical axis and the independent variable, weight, on the horizontal axis. Does there appear to be a relationship between the two variables? Is the relationship linear?
b. Compute the sample correlation coefficient between the two variables. Conduct a test, using a significance level of 0.05, to determine whether the population correlation coefficient is significantly different from zero.
c. Badeaux Brothers has been using a simple linear regression equation to predict the cost of shipping various items. Would you recommend it use a second-order polynomial model instead? Is the second-order polynomial model a significant improvement on the simple linear regression equation?
d. Badeaux Brothers has made a decision to stop shipping products if the shipping charges exceed $100. The company has asked you to determine the maximum weight for future shipments. Do this for both the first- and second-order models you have developed.

15-35. The National Association of Theatre Owners is the largest exhibition trade organization in the world, representing more than 26,000 movie screens in all 50 states and in more than 20 countries worldwide. Its membership includes the largest cinema chains and hundreds of independent theater owners. It publishes statistics concerning the movie sector of the economy. The file entitled Flicks contains data on total U.S. box office grosses ($billion), total number of admissions (billion), average U.S. ticket price ($), and number of movie screens. One concern is the effect the increasing ticket prices have on the number of individuals who go to the theaters to view movies.
a. Construct a scatter plot depicting the relationship between the total number of admissions and the U.S. ticket price.
b. Determine the order of the polynomial that is suggested by the scatter plot produced in part a. Obtain the estimated regression equation of the polynomial selected.
c. Examine the p-value associated with the F-test for the polynomial you have selected in part a. Relate these results to those of the t-tests for the individual parameters and the adjusted coefficient of determination. To what is this attributed?
d. Conduct t-tests to remove higher-order components until no components can be removed.

15-36. The Energy Information Administration (EIA), created by Congress in 1977, is a statistical agency of the U.S. Department of Energy. It provides data, forecasts, and analyses to promote sound policymaking and public understanding regarding energy and its interaction with the economy and the environment. One of the most important areas of analysis is petroleum. The file entitled Crude contains data for the period 1991–2006 concerning the price, supply, and demand for fuel. One concern has been the increase in imported oil into the United States.
a. Examine the relationship between the price of gasoline and the annual amount of imported crude oil. Construct a scatter plot depicting this relationship.
b. Determine the order of the polynomial that would fit the data displayed in part a. Express "Imports" in millions of gallons, i.e., 3,146,454/1,000,000 = 3.146454. Produce an estimate of this polynomial.
c. Is a linear or quadratic model more appropriate for predicting the price of gasoline using the annual quantity of imported oil? Conduct the appropriate hypothesis test to substantiate your answer.

15-37. The National Association of Realtors Existing-Home Sales Series provides a measurement of the residential real estate market. One of the measurements it produces is the Housing Affordability Index (HAI). It is a measure of the financial ability of U.S. families to buy a house. A value of 100 means that families earning the national median income have just the amount of money needed to qualify for a mortgage on a median-priced home; higher than 100 means they have more than enough, and lower than 100 means they have less than enough. The file entitled INDEX contains the HAI and associated variables.
a. Construct a second-order polynomial relating the HAI to the median family income.
b. Conduct a test of hypothesis to determine if this polynomial is useful in predicting the HAI. Use a p-value approach and a significance level of 0.01.

c. Determine if monthly principal and interest interacts with the relationship between the HAI and the median family income indicated in part b. Use a significance level of 0.01. Hint: You must produce another regression equation with the interaction terms inserted. You must then use the appropriate test to determine if the interaction terms belong in this latter model.

15-38. An investment analyst collected data on 20 randomly chosen companies. The data consisted of the 52-week-high stock prices, PE ratios, and the market values of the companies. These data are in the file entitled INVESTMENT. The analyst wishes to produce a regression equation to predict the market value using the 52-week-high stock price and the PE ratio of the company. He creates a complete second-degree polynomial.
a. Construct an estimate of the regression equation using the indicated variables.
b. Determine if any of the quadratic terms are useful in predicting the average market value. Use a p-value approach with a significance level of 0.10.
c. Determine if any of the PE ratio terms are useful in predicting the average market value. Use a test statistic approach with a significance level of 0.05.

END EXERCISES 15-3

Chapter Outcome 8.

15.4 Stepwise Regression

One option in regression analysis is to bring all possible independent variables into the model in one step. This is what we have done in the previous sections. We use the term full regression to describe this approach. Another method for developing a regression model is called stepwise regression. Stepwise regression, as the name implies, develops the least squares regression equation in steps, either through forward selection, backward elimination, or standard stepwise regression.

Forward Selection
Coefficient of Partial Determination
The measure of the marginal contribution of each independent variable, given that other independent variables are in the model.

The forward selection procedure begins (Step 1) by selecting a single independent variable from all those available. The independent variable selected at Step 1 is the variable that is most highly correlated with the dependent variable. A t-test is used to determine if this variable explains a significant amount of the variation in the dependent variable. At Step 1, if the variable does explain a significant amount of the dependent variable's variation, it is selected to be part of the final model used to predict the dependent variable. If it does not, the process is terminated. If no variables are found to be significant, the researcher will have to search for different independent variables than the ones already tested.
In the next step (Step 2), a second independent variable is selected based on its ability to explain the remaining unexplained variation in the dependent variable. The independent variable selected in the second, and each subsequent, step is the variable with the highest coefficient of partial determination. Recall that the coefficient of determination (R²) measures the proportion of variation explained by all of the independent variables in the model. Thus, after the first variable (say, x1) is selected, R² will indicate the percentage of variation explained by this variable. The forward selection routine will then compute all possible two-variable regression models, with x1 included, and determine the R² for each model. The coefficient of partial determination at Step 2 is the proportion of the as yet unexplained variation (after x1 is in the model) that is explained by the additional variable. The independent variable that adds the most to R², given the variable(s) already in the model, is the one selected.
Then, a t-test is conducted to determine if the newly added variable is significant. This process continues until either all independent variables have been entered or the remaining independent variables do not add appreciably to R². For the forward selection procedure, the model begins with no variables. Variables are entered one at a time, and after a variable is entered, it cannot be removed.
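The procedure described above can be sketched in a few dozen lines. This is an illustrative implementation only: it uses a minimum R² gain as the entry criterion instead of the t-test (or p-value) cutoff that PHStat and Minitab apply, and the data are synthetic.

```python
import numpy as np

def r_squared(X, y):
    """R^2 of an OLS fit with intercept; X is the (n, k) matrix of predictors."""
    A = np.column_stack([np.ones(len(y)), X])
    b, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ b
    return 1 - (resid @ resid) / ((y - y.mean()) @ (y - y.mean()))

def forward_select(X, y, r2_gain=0.01):
    """Greedy forward selection: repeatedly add the column that raises R^2 most.

    A real implementation would test each entering variable with a t-test or
    p-value cutoff; a minimum R^2 gain keeps this sketch self-contained."""
    selected, remaining, best_r2 = [], list(range(X.shape[1])), 0.0
    while remaining:
        gains = {j: r_squared(X[:, selected + [j]], y) for j in remaining}
        j_best = max(gains, key=gains.get)
        if gains[j_best] - best_r2 < r2_gain:
            break                      # no variable adds appreciably to R^2
        selected.append(j_best)
        remaining.remove(j_best)
        best_r2 = gains[j_best]
    return selected, best_r2

# Small demonstration with synthetic data: y depends on columns 0 and 2 only.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 4))
y = 3 * X[:, 0] - 2 * X[:, 2] + rng.normal(scale=0.5, size=200)
order, r2 = forward_select(X, y)
print(order, round(float(r2), 3))
```

Note that the variables enter in order of explanatory power (the strongest first), and the two irrelevant columns never enter, mirroring the termination rule described in the text.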

Backward Elimination

Backward elimination is just the reverse of the forward selection procedure. In the backward elimination procedure, all variables are forced into the model to begin the process. Variables are removed one at a time until no more insignificant variables are found. Once a variable has been removed from the model, it cannot be re-entered.

EXAMPLE 15-3 APPLYING FORWARD SELECTION STEPWISE REGRESSION ANALYSIS

B. T. Longmont Company  The B. T. Longmont Company operates a large retail department store in Macon, Georgia. Like other department stores, Longmont has incurred heavy losses due to shoplifting and employee pilferage. The store's security manager wants to develop a regression model to explain the monthly dollar loss. The following steps can be used when developing a multiple regression model using stepwise regression:

Step 1 Specify the model by determining the dependent variable and potential independent variables.
The dependent variable (y) is the monthly dollar loss due to shoplifting and pilferage. The security manager has identified the following potential independent variables:
x1 = Average monthly temperature (degrees Fahrenheit)
x2 = Number of sales transactions
x3 = Dummy variable for holiday month (1 if holiday during month, 0 if not)
x4 = Number of persons on the store's monthly payroll
The data are contained in the file Longmont.

Step 2 Formulate the regression model.
The correlation matrix for the data is presented in Figure 15.26. The forward selection procedure will select the independent variable most highly correlated with the dependent variable. By examining the bottom row of the correlation matrix in Figure 15.26, you can see that the variable x2, number of sales transactions, is most highly correlated (r = 0.6307) with dollars lost.
Once this variable is entered into the model, the remaining independent variables will be entered based on their ability to explain the remaining variation in the dependent variable.

FIGURE 15.26 | Excel 2007 Correlation Matrix Output for the B.T. Longmont Company

Excel 2007 Instructions:
1. Open file: Longmont.xls.
2. Select Data tab.
3. Select Data Analysis > Correlation.
4. Specify data range (include labels).
5. Click Labels.
6. Specify output location.
7. Click OK.

FIGURE 15.27A | Excel 2007 (PHStat) Forward Selection Results for the B.T. Longmont Company—Step 1
R-Squared = SSR/SST = 1,270,172/3,192,631
Number of sales transactions entered: t = 3.1481, p-value = 0.0066

Excel 2007 (PHStat) Instructions:
1. Open file: Longmont.xls.
2. Select Add-Ins.
3. Select PHStat.
4. Select Regression > Stepwise Regression.
5. Define data range for y variable and data range for x variables.
6. Check p-value criteria.
7. Select Forward Selection.
8. Click OK.

Figure 15.27a shows the PHStat stepwise regression output, and Figure 15.27b shows the Minitab output. At Step 1 of the process, variable x2, number of monthly sales transactions, enters the model.

Step 3 Perform diagnostic checks on the model.
Although PHStat does not provide R² or the standard error of the estimate directly, they can be computed from the output in the ANOVA section of the printout. Recall from Chapter 14 that R² is computed as

R² = SSR/SST = 1,270,172.193/3,192,631.529 = 0.398

This single independent variable explains 39.8% (R² = 0.398) of the variation in the dependent variable. The standard error of the estimate is the square root of the mean square residual:

s = √MSE = √(MS residual) = √128,163.96 = 358

Now at Step 1, we test the following:
H0: β2 = 0 (slope for variable x2 = 0)
HA: β2 ≠ 0
α = 0.05
As shown in Figure 15.27a, the calculated t-value is 3.1481. We compare this to the critical value from the t-distribution for α/2 = (0.05/2) = 0.025 and degrees of freedom equal to

n - k - 1 = 17 - 1 - 1 = 15

FIGURE 15.27B | Minitab Forward Selection Results for the B.T. Longmont Company—Step 1
Variable x2 has entered the model: t-value = 3.15, p-value = 0.007
s = √(MS Residual) = 358
R² = SSR/SST = 1,270,172.193/3,192,631.529 = 0.3978

Minitab Instructions:
1. Open file: Longmont.MTW.
2. Choose Stat > Regression > Stepwise.
3. In Response, enter dependent variable (y).
4. In Predictors, enter independent variables (x).
5. Select Methods.
6. Select Forward selection; enter the α value in Alpha to enter and the F value in F to enter.
7. Click OK.

This critical value is

t0.025 = 2.1315

Because

t = 3.1481 > 2.1315

we reject the null hypothesis and conclude that the regression slope coefficient for the variable, number of sales transactions, is not zero. Note also, because the

p-value = 0.0066 < α = 0.05

we would reject the null hypothesis.⁹

⁹Some authors use an F-distribution to perform these tests. This is possible since squaring a random variable that has a t-distribution produces a random variable that has an F-distribution with one degree of freedom in the numerator and the same number of degrees of freedom as the t-distribution in the denominator.
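The Step 1 summary measures can be re-derived from the ANOVA quantities quoted above, with scipy supplying the t critical value (an illustrative check of the arithmetic, not a replacement for the PHStat/Minitab output):

```python
import math
from scipy import stats

# Step 1 diagnostics recomputed from the ANOVA quantities in Figures 15.27a and 15.27b.
ssr, sst = 1_270_172.193, 3_192_631.529
ms_resid = 128_163.96
n, k = 17, 1

r2 = ssr / sst                       # coefficient of determination
s_e = math.sqrt(ms_resid)            # standard error of the estimate
t_crit = stats.t.ppf(1 - 0.05 / 2, df=n - k - 1)   # two-tailed, alpha = 0.05

print(round(r2, 4), round(s_e, 1), round(t_crit, 4))
assert 3.1481 > t_crit               # so H0: beta_2 = 0 is rejected
```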

Step 4 Continue to formulate and diagnose the model by adding other independent variables.
The next variable to be selected will be the one that can do the most to increase R². If you were doing this manually, you would try each variable to see which one yields the highest R², given that the transactions variable is already in the model. Both the PHStat add-in and Minitab do this automatically. As shown in Figure 15.27b and Figure 15.28, the variable selected in Step 2 of the process is x4, number of employees. Using the ANOVA section, we can determine R² and s as before:

R² = SSR/SST = 1,833,270.524/3,192,631.529 = 0.5742

s = √(MS residual) = √97,097.22 = 311.6

The model now explains 57.42% of the variation in the dependent variable. The t-values for both slope coefficients exceed t = 2.145 (the critical value from the t-distribution table with a one-tailed area equal to 0.025 and 17 - 2 - 1 = 14 degrees of freedom), so we conclude that both variables are significant in explaining the variation in the dependent variable, shoplifting loss.
The forward selection routine continues to enter variables as long as each additional variable explains a significant amount of the remaining variation in the dependent variable. Note that PHStat allows you to set the significance level in terms of a p-value or in terms of the t-statistic. Then, as long as the calculated p-value for an incoming variable is less than your limit, the variable is allowed to enter the model. Likewise, if the calculated t-statistic exceeds your t limit, the variable is allowed to enter. In this example, with the p-value limit set at 0.05, neither of the two remaining independent variables would explain a significant amount of the remaining variation in the dependent variable. The procedure is, therefore, terminated.
The resulting regression equation provided by forward selection is

ŷ = 4,600.8 + 0.203x2 + 21.57x4

FIGURE 15.28 | PHStat Forward Selection Results for the B.T. Longmont Company—Step 2
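The Step 2 summary measures quoted above can be recomputed the same way as in Step 3 (an illustrative arithmetic check using the ANOVA values from the text):

```python
import math
from scipy import stats

# Step 2 diagnostics recomputed from the ANOVA values quoted in the text.
ssr_2, sst = 1_833_270.524, 3_192_631.529
ms_resid_2 = 97_097.22
n, k = 17, 2

r2_2 = ssr_2 / sst                    # now 57.42% of variation explained
s_e_2 = math.sqrt(ms_resid_2)         # standard error of the estimate
t_crit_2 = stats.t.ppf(0.975, df=n - k - 1)   # 14 degrees of freedom

print(round(r2_2, 4), round(s_e_2, 1), round(t_crit_2, 3))
```

The drop in the standard error of the estimate (from about 358 to about 311.6) is another way of seeing that adding the number of employees improved the model.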

Note that the dummy variable for holiday and the temperature variable did not enter the model. This implies that, given the other variables are already included, knowing whether the month in question has a holiday or knowing its average temperature does not add significantly to the model's ability to explain the variation in the dependent variable. The B.T. Longmont Company can now use this regression model to explain variation in shoplifting and pilferage losses based on knowing the number of sales transactions and the number of employees.

>>END EXAMPLE

TRY PROBLEM 15-40 (pg. 677)

Standard Stepwise Regression

The standard stepwise procedure (sometimes referred to as forward stepwise regression, not to be confused with forward selection) combines attributes of both backward elimination and forward selection. The standard stepwise method serves one more important function. If two or more independent variables are correlated, a variable selected in an early step may become insignificant when other variables are added at later steps. The standard stepwise procedure will drop this insignificant variable from the model. Standard stepwise regression also offers a means of observing multicollinearity problems, because we can see how the regression model changes as each new variable is added to it. The standard stepwise procedure is widely used in decision-making applications and is generally recognized as a useful regression method. However, care should be exercised when using this procedure because it is easy to rely too heavily on the automatic selection process. Remember, the order of variable selection is conditional, based on the variables already in the model. There is no guarantee that stepwise regression will lead you to the best set of independent variables from those available.
Decision makers still must use common sense in applying regression analysis to make sure they have usable regression models.

Best Subsets Regression

Another method for developing multiple regression models is called the best subsets method. As the name implies, the best subsets method works by selecting subsets from the chosen possible independent variables to form models. The user can then select the "best" model based on such measures as R-squared or the standard error of the estimate. Both Minitab and PHStat contain procedures for performing best subsets regression.

EXAMPLE 15-4 APPLYING BEST SUBSETS REGRESSION

Winston Investment Advisors Charles L. Winston, founder and CEO at Winston Investment Advisors in Burbank, California, is interested in developing a regression model to explain the variation in dividends paid per share by U.S. companies. Such a model would be useful in advising his clients. The following steps show how to develop such a model using the best subsets regression approach:

Step 1 Specify the model.
Some publicly traded companies pay higher dividends than others. The CEO is interested in developing a multiple regression model to explain the variation in dividends per share paid to shareholders. The dependent variable will be dividends per share. The CEO met with other analysts in his firm to identify

potential independent variables for which data would be readily available. The following list of potential independent variables was selected:

x1 = Return on equity (net income/equity)
x2 = Earnings per share
x3 = Current assets in millions of dollars
x4 = Year-ending stock price
x5 = Current ratio (current assets/current liabilities)

A random sample of 35 publicly traded U.S. companies was selected. For each company in the sample, the analyst obtained data on the dividend per share paid last year and the year-ending data on the five independent variables. These data are contained in the data file Company Financials.

Step 2 Formulate the regression model.
The CEO is interested in developing the "best" regression model for explaining the variation in the dependent variable, dividends per share. The approach taken is to use best subsets, which requires that multiple regression models be developed, each containing a different mix of variables. The models tried will contain from one to five independent variables. The resulting models will be evaluated by comparing values for R-squared and the standard error of the estimate. High R-squared values and low standard errors are desirable. Another statistic, identified as Cp, is sometimes used to evaluate regression models. This statistic measures the difference between the estimated model and the true population model. Values of Cp close to p = k + 1 (where k is the number of independent variables in the model) are preferred. Both the PHStat Excel add-ins and Minitab can be used to perform best subsets regression analysis. Figure 15.29 shows output from PHStat. Notice that all possible combinations of models with k = 1 to k = 5 independent variables are included. Several models appear to be good candidates based on R-squared, adjusted R-squared, standard error of the estimate, and Cp values. These are as follows:

Model            Cp    k + 1   R-square   Adj. R-square   Std. Error
x1 x2 x3 x4      4.0     5      0.628        0.579          0.496
x1 x2 x3 x4 x5   6.0     6      0.629        0.565          0.505
x1 x2 x3 x5      5.0     5      0.615        0.564          0.505
x2 x3            1.4     3      0.610        0.586          0.492
x2 x3 x4         2.5     4      0.622        0.585          0.493
x2 x3 x4 x5      4.5     5      0.622        0.572          0.500
x2 x3 x5         3.4     4      0.610        0.573          0.500

There is little difference in these seven models in terms of the statistics shown. We can examine any of them in more detail by looking at further PHStat output. For instance, the model containing variables x1, x2, x3, and x5 is shown in Figure 15.30. Note that although this model is among the best with respect to R-squared, adjusted R-squared, standard error of the estimate, and Cp value, two of the four variables (x1 = ROE and x5 = Current ratio) have statistically insignificant regression coefficients. Figure 15.31 shows the regression model with the two statistically significant variables remaining. The R-squared value is 0.61, the adjusted R-squared has increased, and the overall model is statistically significant.

>>END EXAMPLE

TRY PROBLEM 15-43 (pg. 687)
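The computations behind a best subsets table like the one above can be sketched directly. This is a hedged illustration on made-up data, not the Company Financials file: it enumerates every subset of three candidate predictors, fits each by ordinary least squares, and reports R-squared, adjusted R-squared, and Mallows' Cp = SSE_p/MSE_full − (n − 2p). A useful arithmetic check is that Cp for the full model always equals p = k + 1.

```python
from itertools import combinations

def sse_ols(xcols, y):
    """SSE of a least-squares fit with intercept (normal equations + elimination)."""
    n, k = len(y), len(xcols)
    X = [[1.0] + [xcols[j][i] for j in range(k)] for i in range(n)]
    p = k + 1
    A = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(p)] +
         [sum(X[r][i] * y[r] for r in range(n))] for i in range(p)]
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        for r in range(c + 1, p):
            f = A[r][c] / A[c][c]
            for j in range(c, p + 1):
                A[r][j] -= f * A[c][j]
    b = [0.0] * p
    for i in range(p - 1, -1, -1):
        b[i] = (A[i][p] - sum(A[i][j] * b[j] for j in range(i + 1, p))) / A[i][i]
    return sum((yi - sum(bj * xij for bj, xij in zip(b, row))) ** 2
               for yi, row in zip(y, X))

# Made-up data, with noise so the full model does not fit perfectly
x1 = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
x2 = [2, 1, 4, 3, 6, 5, 8, 7, 10, 9]
x3 = [5, 3, 8, 1, 9, 2, 7, 4, 10, 6]
noise = [0.5, -0.3, 0.2, -0.4, 0.1, 0.3, -0.2, 0.4, -0.1, -0.5]
y = [2 * a + 0.3 * c + e for a, c, e in zip(x1, x3, noise)]

preds = [x1, x2, x3]
n = len(y)
sst = sum((yi - sum(y) / n) ** 2 for yi in y)
mse_full = sse_ols(preds, y) / (n - len(preds) - 1)   # full-model MSE

results = {}
for size in range(1, len(preds) + 1):
    for subset in combinations(range(len(preds)), size):
        sse = sse_ols([preds[j] for j in subset], y)
        p = len(subset) + 1                       # parameters incl. intercept
        r2 = 1 - sse / sst
        adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p)
        cp = sse / mse_full - (n - 2 * p)         # Mallows' Cp
        results[subset] = (r2, adj_r2, cp)

cp_full = results[(0, 1, 2)][2]   # equals p = 4 for the full model, by construction
```

Scanning `results` for high adjusted R-squared and Cp near p reproduces, in miniature, the screening the CEO performs on the PHStat output.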

FIGURE 15.29 | Best Subsets Regression Output for Winston Investment Advisors

Excel 2007 (PHStat) Instructions:
1. Open file: Company Financials.xls.
2. Select Add-Ins.
3. Select PHStat.
4. Select Regression > Best Subsets Regression.
5. Define data range for y variable and data range for x variables.
6. Click OK.

FIGURE 15.30 | One Potential Model for Winston Investment Advisors

FIGURE 15.31 | A Final Model for Winston Investment Advisors

15-4: Exercises

Skill Development

15-39. Suppose you have four potential independent variables, x1, x2, x3, and x4, from which you want to develop a multiple regression model. Using stepwise regression, x2 and x4 entered the model.
a. Why did only two variables enter the model? Discuss.
b. Suppose a full regression with only variables x2 and x4 had been run. Would the resulting model be different from the stepwise model that included only these two variables? Discuss.
c. Now, suppose a full regression model had been developed, with all four independent variables in the model. Which would have the higher R2 value, the full regression model or the stepwise model? Discuss.

15-40. You are given the following set of data:

y    x1    x2    x3        y    x1    x2    x3
33    9   192    40        45   12   296    52
44   11   397    47        25    9   235    27
34   10   235    37        53   10   295    57
60   13   345    61        45   13   335    50
20   11   245    23        37   11   243    41
30    7   235    35        44   13   413    51

a. Determine the appropriate correlation matrix and use it to predict which variable will enter in the first step of a stepwise regression model.
b. Use standard stepwise regression to construct a model, entering all significant variables.
c. Construct a full regression model. What are the differences in the model? Which model explains the most variation in the dependent variable?

15-41. You are given the following set of data:

y    x1    x2    x3        y    x1    x2    x3
45   40    41    39        45   42    39    37
41   31    41    35        43   37    52    41
43   45    49    39        34   40    47    40
38   43    41    41        49   35    44    44
50   42    42    51        45   39    40    45
39   48    40    42        40   43    30    42
50   44    44    41        43   53    34    34

a. Determine the appropriate correlation matrix and use it to predict which variable will enter in the first step of a stepwise regression model.
b. Use standard stepwise regression to construct a model, entering all significant variables.
c. Construct a full regression model. What are the differences in the model?
Which model explains the most variation in the dependent variable?

15-42. The following data represent a dependent variable and four independent variables:

y    x1    x2    x3    x4        y    x1    x2    x3    x4
61   37    13    10    21        69   35    19     9    23
37   25     7    11    32        24   23    14     7    31
22   23     6     7    18        68   35    17     3    33
48   12     8     8    30        65   37    11    17    19
66   34    15     2    33        45   30     9    24    31

a. Use the standard stepwise regression to produce an estimate of a multiple regression model to predict y. Use 0.15 as the alpha to enter and to remove.
b. Change the alpha to enter to 0.01. Repeat the standard stepwise procedure.
c. Change the alpha to remove to 0.35, leaving alpha to enter at 0.15. Repeat the standard stepwise procedure.
d. Change the alpha to remove to 0.15, leaving alpha to enter at 0.05. Repeat the standard stepwise procedure.
e. Change the alpha to remove and to enter to 0.35. Repeat the standard stepwise procedure.
f. Compare the estimated regression equations developed in parts a–e.

15-43. Consider the following set of data:

y    x1    x2    x3    x4        y    x1    x2    x3    x4
61   37    18     2    13        69   35    21     2    20
37   25     5     4    10        24   23     7     6     9
22   23    12     7     4        68   35    15     3    14
48   12     6     2    15        65   37    19     2    19
66   34    14     3    25        45   30    12     3    12

a. Use standard stepwise regression to produce an estimate of a multiple regression model to predict y.
b. Use forward selection stepwise regression to produce an estimate of a multiple regression model to predict y.

c. Use backward elimination stepwise regression to produce an estimate of a multiple regression model to predict y.
d. Use best subsets regression to produce an estimate of a multiple regression model to predict y.

Computer Database Exercises

15-44. The U.S. Energy Information Administration publishes summary statistics concerning the energy sector of the U.S. economy. The electric power industry continues to grow. Electricity generation and sales rose to record levels. The file entitled Energy presents the revenue from retail sales ($million) and the net generation by energy source for the period 1993–2004.
a. Produce the correlation matrix of all the variables. Predict the variables that will remain in the estimated regression equation if standard stepwise regression is used to predict the revenues from retail sales of energy.
b. Use standard stepwise regression to develop an estimate of a model that is to predict the revenue from retail sales of energy ($million).
c. Compare the results of parts a and b. Explain any difference between the two models.

15-45. The Western State Tourist Association gives out pamphlets, maps, and other tourist-related information to people who call a toll-free number and request the information. The association orders the packets of information from a document-printing company and likes to have enough available to meet the immediate need without having too many sitting around taking up space. The marketing manager decided to develop a multiple regression model to be used in predicting the number of calls that will be received in the coming week.
A random sample of 12 weeks is selected, with the following variables:

y = Number of calls
x1 = Number of advertisements placed the previous week
x2 = Number of calls received the previous week
x3 = Number of airline tour bookings into Western cities for the current week

These data are in the data file called Western States.
a. Develop the multiple regression model for predicting the number of calls received, using backward elimination stepwise regression.
b. At the final step of the analysis, how many variables are in the model?
c. Discuss why the variables were removed from the model in the order shown by the stepwise regression.

15-46. Refer to Problem 15-45.
a. Develop the correlation matrix that includes all independent variables and the dependent variable. Predict the order in which the variables will be selected into the model if forward selection stepwise regression is used.
b. Use forward selection stepwise regression to develop a model for predicting the number of calls that the company will receive. Write a report that describes what has taken place at each step of the regression process.
c. Compare the forward selection stepwise regression results in part b with the backward elimination results determined in Problem 15-45. Which model would you choose? Explain your answer.

15-47. An investment analyst collected data on 20 randomly chosen companies. The data consisted of the 52-week-high stock prices, PE ratios, and the market values of the companies. These data are in the file entitled Investment. The analyst wishes to produce a regression equation to predict the market value using the 52-week-high stock price and the PE ratio of the company. He creates a complete second-degree polynomial.
a. Produce two scatter plots: (1) market value versus stock price and (2) market value versus PE ratio. Do the scatter plots support the analyst's decision to produce a second-order polynomial? Support your assertion with statistical reasoning.
b.
Use forward selection stepwise regression to eliminate any unneeded components from the analyst's model.
c. Does forward selection stepwise regression support the analyst's decision to produce a second-order polynomial? Support your assertion with statistical reasoning.

15-48. A variety of sources suggest that individuals assess their health, at least in part, by estimating their percentage of body fat. A widely accepted measure of body fat uses an underwater weighing technique. There are, however, more convenient methods using only a scale and a measuring tape. An article in the Journal of Statistics Education by Roger W. Johnson explored regression models to predict body fat. The file entitled Bodyfat lists a portion of the data presented in the cited article.
a. Use best subsets regression to establish the relationship between body fat and the variables in the specified file.
b. Predict the body fat of an individual whose age is 21, weight is 170 pounds, height is 70 inches, chest circumference is 100 centimeters, abdomen is 90 centimeters, hip is 105 centimeters, and thigh is 60 centimeters around.

15-49. The consumer price index (CPI) is a measure of the average change in prices over time in a fixed market basket of goods and services typically purchased by consumers. One of the items in this market basket that affects the CPI is the price of oil and its derivatives. The file entitled Consumer contains the prices of the derivatives of oil and the CPI adjusted to 2005 levels.

a. Use backward elimination stepwise regression to determine which combination of the oil derivative prices drives the CPI. If you encounter difficulties in completing this task, explain what caused the difficulties.
b. Eliminate the source of the difficulties in part a by producing a correlation matrix to determine where the difficulty lies.
c. Delete one of the variables indicated in part b and complete the instructions in part a.

END EXERCISES 15-4

15.5 Determining the Aptness of the Model

In Section 15.1 we discussed the basic steps involved in building a multiple regression model. These are as follows:
1. Specify the model.
2. Build the model.
3. Perform diagnostic checks on the model.

The final step is the diagnostic step in which you examine the model to determine how well it performs. In Section 15.2, we discussed several statistics that you need to consider when performing the diagnostic step, including analyzing R2, adjusted R2, and the standard error of the estimate. In addition, we discussed the concept of multicollinearity and the impacts that can occur when multicollinearity is present. Section 15.3 introduced another diagnostic step that involves looking for potential curvilinear relationships between the independent variables and the dependent variable. We presented some basic data transformation techniques for dealing with curvilinear situations. However, a major part of the diagnostic process involves an analysis of how well the model fits the regression analysis assumptions. The assumptions required to use multiple regression include the following:

Assumptions
1. Individual model errors, ε, are statistically independent of one another, and these values represent a random sample from the population of possible residuals at each level of x.
2.
For a given value of x there can exist many values of y, and therefore many possible values for ε. Further, the distribution of possible ε-values for any level of x is normally distributed.
3. The distributions of possible ε-values have equal variances at each level of x.
4. The means of the dependent variable, y, for all specified values of x can be connected with a line called the population regression model.

The degree to which a regression model satisfies these assumptions is called aptness.

Analysis of Residuals

Residual: The difference between the actual value of the dependent variable and the value predicted by the regression model. The residual is computed using Equation 15.13.

Residual

e_i = y_i − ŷ_i    (15.13)

A residual value can be computed for each observation in the data set. A great deal can be learned about the aptness of the regression model by analyzing the residuals. The principal means of residual analysis is a study of residual plots. The following problems can be inferred through graphical analysis of residuals:
1. The regression function is not linear.
2. The residuals do not have a constant variance.
3. The residuals are not independent.
4. The residual terms are not normally distributed.

We will address each of these in order. The regression options in both Minitab and Excel provide extensive residual analysis.
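Equation 15.13 is a plug-in computation once a model has been fit. The sketch below uses made-up (x, y) data and a simple linear regression; a useful by-product of least squares, visible in the test at the end, is that the residuals always sum to zero (as does the sum of x-weighted residuals).

```python
# Hypothetical (x, y) data; any small data set serves for the illustration
x = [2, 4, 5, 7, 8, 10, 12]
y = [3, 6, 8, 9, 12, 13, 17]

n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
b1 = sxy / sxx                 # slope
b0 = ybar - b1 * xbar          # intercept

yhat = [b0 + b1 * xi for xi in x]
# Equation 15.13: e_i = y_i - yhat_i, one residual per observation
residuals = [yi - yh for yi, yh in zip(y, yhat)]
```

Because the residuals are forced to sum to zero by the fit itself, their mean tells you nothing; it is their pattern, examined in the plots that follow, that carries the diagnostic information.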

Checking for Linearity

A plot of the residuals (on the vertical axis) against the independent variable (on the horizontal axis) is useful for detecting whether a linear function is the appropriate regression function. Figure 15.32 illustrates two different residual plots. Figure 15.32a shows residuals that systematically depart from 0. When x is small the residuals are negative. When x is in the midrange the residuals are positive, and for large x-values the residuals are negative again. This type of plot suggests that the relationship between y and x is nonlinear. Figure 15.32b shows a plot in which residuals do not show a systematic variation around 0, implying that the relationship between x and y is linear. If a linear model is appropriate, we expect the residuals to band around 0 with no systematic pattern displayed. If the residual plot shows a systematic pattern, it may be possible to transform the independent variable (refer to Section 15.3) so that the revised model will produce residual plots that will not systematically vary from 0.

BUSINESS APPLICATION RESIDUAL ANALYSIS

FIRST CITY REAL ESTATE (CONTINUED) We have been using First City Real Estate to introduce multiple regression tools throughout this chapter. Remember, the managers wish to develop a multiple regression model for predicting the sales prices of homes in their market. Suppose that the most current model (First City-3) incorporates a transformation of the lot size variable as log of lot size. The output for this model is shown in Figure 15.33. Notice the model now has an R2 value of 96.9%. There are currently four independent variables in the model: square feet, bedrooms, garage, and the log of lot size. Both Minitab and Excel provide procedures for automatically producing residual plots.
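The visual check in Figure 15.32 can be backed by a crude numeric check: read the residual signs in x order and count sign changes. With made-up, deliberately curved data (y quadratic in x, fit with a straight line), the residuals form long runs of one sign, positive at the extremes and negative in the middle, with only two sign changes, which is the signature of a nonlinear relationship.

```python
# Deliberately curved data: y is quadratic in x, but we fit a straight line
x = list(range(11))                 # 0, 1, ..., 10
y = [(xi - 5) ** 2 for xi in x]

n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / \
     sum((xi - xbar) ** 2 for xi in x)
b0 = ybar - b1 * xbar
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

# Count sign changes as the residuals are read in x order.
# Very few changes (long runs of one sign) hints at nonlinearity.
signs = [1 if e > 0 else -1 for e in residuals]
sign_changes = sum(1 for a, b in zip(signs, signs[1:]) if a != b)
```

For residuals that band randomly around zero, as in Figure 15.32b, the number of sign changes would instead be close to half the number of observations.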
Figure 15.34 shows the plots of the residuals against each of the independent variables.

FIGURE 15.32 | Residual Plots Showing Linear and Nonlinear Patterns: (a) Nonlinear Pattern; (b) Linear Pattern

FIGURE 15.33 | Minitab Output of First City Real Estate Appraisal Model

Minitab Instructions:
1. Open file: First City-3.MTW.
2. Choose Stat > Regression > Regression.
3. In Response, enter dependent (y) variable.
4. In Predictors, enter independent (x) variables.
5. Click OK.

FIGURE 15.34 | First City Real Estate Residual Plots versus the Independent Variables: (a) Residuals versus Square Feet; (b) Residuals versus Garage; (c) Residuals versus Bedrooms; (d) Residuals versus LOG Lot (Response Is Price)

The transformed variable, log lot size, has a residual pattern that shows a systematic pattern (Figure 15.34d). The residuals are positive for small values of log lot size, negative for intermediate values of log lot size, and positive again for large values of log lot size. This pattern suggests that the curvature of the relationship between sales prices of homes and lot size is even more pronounced than the logarithm implies. Potentially, a second- or third-degree polynomial in the lot size should be pursued.

Do the Residuals Have Equal Variances at All Levels of Each x Variable?

Residual plots also can be used to determine whether the residuals have a constant variance. Consider Figure 15.35, in which the residuals are plotted against an independent variable. The plot in Figure 15.35a shows an example in which as x increases the residuals become less variable. Figure 15.35b shows the opposite situation. When x is small, the residuals are tightly packed around 0, but as x increases the residuals become more variable. Figure 15.35c shows an example in which the residuals exhibit a constant variance around the zero mean. When a multiple regression model has been developed, we can analyze the equal variance assumption by plotting the residuals against the fitted (ŷ) values. When the residual plot is cone-shaped, as in Figure 15.36, it suggests that the assumption of equal variance has been violated. This is evident because the residuals are wider on one end than the other, which indicates that the standard error of the estimate is larger on one end than the other; that is, it is not constant.

FIGURE 15.35 | Residual Plots Showing Constant and Nonconstant Variances: (a) Variance Decreases as x Increases; (b) Variance Increases as x Increases; (c) Equal Variance

FIGURE 15.36 | Residual Plots against the Fitted (ŷ) Values: (a) Variance Not Constant; (b) Variance Not Constant

Figure 15.37 shows the residuals plotted against the ŷ-values for First City Real Estate's appraisal model. We have drawn a band around the residuals that shows that the variance of the residuals stays quite constant over the range of the fitted values.

Are the Residuals Independent?

If the data used to develop the regression model are measured over time, a plot of the residuals against time is used to determine whether the residuals are correlated. Figure 15.38a shows an example in which the residual plot against time suggests independence. The residuals in Figure 15.38a appear to be randomly distributed around the mean of zero over time. However, in Figure 15.38b, the plot suggests that the residuals are not independent, because in the early time periods the residuals are negative and in later time periods the residuals are positive. This, or any other nonrandom pattern in the residuals over time, indicates that the assumption of independent residuals has been violated. Generally, this means some variable associated with the passage of time has been omitted from the model. Often, time is used as a surrogate for other time-related variables in a regression model. Chapter 16 will discuss time-series data analysis and forecasting techniques in more detail and will address the issue of incorporating the time variable into the model. In Chapter 16, we introduce a procedure called the Durbin-Watson test to determine whether residuals are correlated over time.

Checking for Normally Distributed Error Terms

The need for normally distributed model errors occurs when we want to test a hypothesis about the regression model.
Small departures from normality do not cause serious problems. However, if the model errors depart dramatically from a normal distribution, there is cause for concern. Examining the sample residuals will allow us to detect such dramatic departures. One method for graphically analyzing the residuals is to form a frequency histogram of the residuals to determine whether the general shape is normal. The chi-square goodness-of-fit test presented in Chapter 13 can be used to test whether the residuals fit a normal distribution.

FIGURE 15.37 | Minitab Plot of Residuals versus Fitted Values for First City Real Estate

Minitab Instructions:
1. Open file: First City-3.MTW.
2. Choose Stat > Regression > Regression.
3. In Response, enter dependent (y) variable.
4. In Predictors, enter independent (x) variables.
5. Choose Graphs.
6. Under Residual Plots, select Residuals versus fits.
7. Click OK. OK.

FIGURE 15.38 | Plot of Residuals against Time: (a) Independent Residuals; (b) Residuals Not Independent
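The formal Durbin-Watson test is deferred to Chapter 16, but the statistic behind time plots like those in Figure 15.38 is easy to compute, so a hedged preview on two made-up residual sequences is sketched here. Values near 2 are consistent with independent residuals; values near 0 suggest positive autocorrelation (residuals drifting together, as in Figure 15.38b); values near 4 suggest negative autocorrelation.

```python
def durbin_watson(residuals):
    """d = sum((e_t - e_{t-1})^2) / sum(e_t^2), computed over residuals in time order."""
    num = sum((a - b) ** 2 for a, b in zip(residuals[1:], residuals[:-1]))
    den = sum(e ** 2 for e in residuals)
    return num / den

# Two made-up residual sequences, listed in time order
drifting = [-4.0, -3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0, 4.0]   # like Figure 15.38(b)
alternating = [1.0, -1.0] * 5                                   # flips sign each period

d_drift = durbin_watson(drifting)      # small: successive residuals move together
d_alt = durbin_watson(alternating)     # near 4: successive residuals reverse sign
```

The drifting sequence yields d well below 1 and the alternating one yields d = 3.6, both far from 2, flagging a violated independence assumption either way.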

Another method for determining normality is to calculate and plot the standardized residuals. In Chapter 3 you learned that a random variable is standardized by subtracting its mean and dividing the result by its standard deviation. The mean of the residuals is zero. Therefore, dividing each residual by an estimate of its standard deviation gives the standardized residual.10 Although the proof is beyond the scope of this text, it can be shown that the standardized residual for any particular observation for a simple linear regression model is found using Equation 15.14.

Standardized Residual for Linear Regression

se_i = e_i / ( s · √( 1 − 1/n − (x_i − x̄)² / ( Σx² − (Σx)²/n ) ) )    (15.14)

where:
e_i = ith residual value
s = Standard error of the estimate
x_i = Value of x used to generate the predicted y-value for the ith observation

Computing the standardized residual for an observation in a multiple regression model is too complicated to be done by hand. However, the standardized residuals are generated by most statistical software, including Minitab and Excel. The Excel and Minitab tutorials illustrate the methods required to generate the standardized residuals and residual plots. Because other problems such as nonconstant variance and nonindependent residuals can result in residuals that seem to be abnormal, you should check these other factors before addressing the normality assumption. Recall that for a normal distribution, approximately 68% of the values will fall within 1 standard deviation of the mean, 95% will fall within 2 standard deviations of the mean, and virtually all values will fall within 3 standard deviations of the mean. Figure 15.39 illustrates the histogram of the residuals for the First City Real Estate example. The distribution of residuals looks to be close to a normal distribution. Figure 15.40 shows the histogram for the standardized residuals, which will have the same basic shape as the residual distribution in Figure 15.39.

FIGURE 15.39 | Minitab Histogram of Residuals for First City Real Estate

Minitab Instructions:
1. Open file: First City-3.MTW.
2. Choose Stat > Regression > Regression.
3. In Response, enter dependent (y) variable.
4. In Predictors, enter independent (x) variables.
5. Choose Graphs.
6. Under Residual Plots, select Histogram of residuals.
7. Click OK. OK.

10 The standardized residual is also referred to as the studentized residual.
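Equation 15.14 is straightforward to apply once a simple regression has been fit. A hedged sketch on made-up data follows; note that the denominator term Σx² − (Σx)²/n in Equation 15.14 is algebraically the same as Σ(x_i − x̄)², which is what the code uses. One consequence of the formula, checked in the test, is that each standardized residual is at least as large in magnitude as e_i/s, since the square-root factor is always less than 1.

```python
import math

# Hypothetical simple-regression data with a little noise around y = 2x
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9, 14.2, 15.8]

n = len(x)
xbar = sum(x) / n
sxx = sum((xi - xbar) ** 2 for xi in x)          # equals sum(x^2) - (sum x)^2 / n
b1 = sum((xi - xbar) * (yi - sum(y) / n) for xi, yi in zip(x, y)) / sxx
b0 = sum(y) / n - b1 * xbar
e = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

s = math.sqrt(sum(ei ** 2 for ei in e) / (n - 2))   # standard error of the estimate

# Equation 15.14: divide each residual by its estimated standard deviation
se = [ei / (s * math.sqrt(1 - 1 / n - (xi - xbar) ** 2 / sxx))
      for ei, xi in zip(e, x)]

# Rough normality screen via the empirical rule: most standardized residuals
# should fall within +/-2, and virtually all within +/-3
within2 = sum(1 for v in se if abs(v) <= 2) / n
```

The fraction `within2` is the kind of quick tally the empirical-rule paragraph above suggests; a large share of standardized residuals beyond ±2 would be a warning sign.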
Minitab Histogram of Residuals for First City Real Estate. Minitab Instructions: 1. Open file: First City-3. MTW. 2. Choose Stat  Regression  Regression. 3. In Response, enter dependent (y) variable. 4. In Predictors, enter independent (x) variables. 5. Choose Graphs. 6. Under Residual Plots, select Histogram of residuals. 7. Click OK. OK. 10The. standardized residual is also referred to as the studentized residual..

FIGURE 15.40 | Minitab Histogram of Standardized Residuals for First City Real Estate

Minitab Instructions:
1. Open file: First City-3.MTW.
2. Choose Stat > Regression > Regression.
3. In Response, enter dependent (y) variable.
4. In Predictors, enter independent (x) variables.
5. Choose Graphs.
6. Under Residuals for Plots, select Standardized.
7. Under Residual Plots, select Histogram of residuals.
8. Click OK. OK.

Another approach for checking for normality of the residuals is to form a probability plot. We start by arranging the residuals in numerical order from smallest to largest. The standardized residuals are plotted on the horizontal axis, and the corresponding expected value for the standardized residual is plotted on the vertical axis. Although we won't delve into how the expected value is computed, you can examine the normal probability plot to see whether the plot forms a straight line. The closer the line is to linear, the closer the residuals are to being normally distributed. Figure 15.41 shows the normal probability plot for the First City Real Estate Company example.

FIGURE 15.41 | Minitab Normal Probability Plot of Residuals for First City Real Estate

Minitab Instructions:
1. Open file: First City-3.MTW.
2. Choose Stat > Regression > Regression.
3. In Response, enter dependent (y) variable.
4. In Predictors, enter independent (x) variables.
5. Choose Graphs.
6. Under Residual Plots, select Normal plot of residuals.
7. Click OK. OK.
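The "expected value" the probability-plot discussion leaves unexplained can be sketched concretely. This is a hedged illustration, not Minitab's exact method: it uses standard normal quantiles at the common plotting positions (i − 0.5)/n, and it summarizes the straightness of the plot with a single correlation, where values near 1 support the normality assumption. The sample residuals are made up.

```python
from statistics import NormalDist

def normality_score(std_residuals):
    """Correlation between sorted standardized residuals and their expected
    normal quantiles; a numeric stand-in for eyeballing the probability plot."""
    n = len(std_residuals)
    r_sorted = sorted(std_residuals)
    # Expected standard normal quantile for the i-th smallest value
    q = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    mr, mq = sum(r_sorted) / n, sum(q) / n
    cov = sum((a - mr) * (b - mq) for a, b in zip(r_sorted, q))
    var_r = sum((a - mr) ** 2 for a in r_sorted)
    var_q = sum((b - mq) ** 2 for b in q)
    return cov / (var_r * var_q) ** 0.5

# A symmetric, bell-like set of standardized residuals (made up) plots nearly straight
sample = [-1.8, -1.1, -0.6, -0.2, 0.2, 0.6, 1.1, 1.8]
score = normality_score(sample)
```

Heavily skewed or heavy-tailed residuals would bend the plot and pull this correlation noticeably below 1, just as they bend the line in a plot like Figure 15.41.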

You should be aware that Minitab and Excel format their residual plots in slightly different ways. However, the same general information is conveyed, and you can look for the same signs of problems with the regression model.

Corrective Actions

If, based on analyzing the residuals, you decide the model constructed is not appropriate but you still want a regression-based model, some corrective action may be warranted. There are three approaches that may work: transform some of the existing independent variables, remove some variables from the model, or start over in the development of the regression model. Earlier in this chapter, we discussed a basic approach involved in variable transformation. In general, the transformations of the independent variables (such as raising x to a power, taking the square root of x, or taking the log of x) are used to make the data better conform to a linear relationship. If the model suffers from nonlinearity and if the residuals have a nonconstant variance, you may want to transform both the independent and dependent variables. In cases in which the normality assumption is not satisfied, transforming the dependent variable is often useful. In many instances, a log transformation works. In some instances, a transformation involving the product of two independent variables will help. A more detailed discussion is beyond the scope of this text. However, you can read more about this subject in the Kutner et al. reference listed at the end of the chapter. The alternative of using a different regression model means that we respecify the model to include new independent variables or remove existing variables from the model. In most modeling situations, we are in a continual state of model respecification. We are always seeking to improve the regression model by finding new independent variables.
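The claim that a log transformation of the dependent variable often works can be made concrete with a small sketch. The data are made up so that y grows exponentially in x; a straight-line correlation with raw y leaves curvature behind, while the correlation with log(y) is essentially perfect, because log(y) is exactly linear in x for this construction.

```python
import math

def pearson_r(u, v):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = sum((a - mu) ** 2 for a in u)
    sv = sum((b - mv) ** 2 for b in v)
    return cov / math.sqrt(su * sv)

# Made-up curvilinear data: y grows exponentially in x
x = [float(i) for i in range(1, 11)]
y = [5.0 * math.exp(0.4 * xi) for xi in x]

r_raw = pearson_r(x, y)                            # a straight line misses the curvature
r_log = pearson_r(x, [math.log(yi) for yi in y])   # log(y) = log(5) + 0.4x, exactly linear
```

The same comparison of residual behavior before and after the transformation is what the residual plots earlier in this section let you judge visually.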
15-5: Exercises

Skill Development

15-50. Consider the following values for an independent and dependent variable:

x     y
6     5
9    20
14    28
18    30
22    33
27    35
33    45

a. Determine the estimated linear regression equation relating the dependent and independent variables.
b. Is the regression equation you found significant? Test at the α = 0.05 level.
c. Determine both the residuals and standardized residuals. Is there anything about the residuals that would lead you to question whether the assumptions necessary to use regression analysis are satisfied? Discuss.

15-51. Consider the following values for an independent and dependent variable:

x     y
6     5
9    20
14    28
18    15
22    27
27    31
33    32
50    60
61   132
75   160

a. Determine the estimated linear regression equation relating the dependent and independent variables.
b. Is the regression equation you found significant? Test at the α = 0.05 level.
c. Determine both the residuals and standardized residuals. Is there anything about the residuals that would lead you to question whether the assumptions necessary to use regression analysis are satisfied?

15-52. Examine the following data set:

y: 25, 35, 14, 45, 52, 41, 65, 63, 68
x: 10, 10, 10, 20, 20, 20, 30, 30, 30

a. Determine the estimated regression equation for this data set.
b. Calculate the residuals for this regression equation.
c. Produce the appropriate residual plot to determine if the linear function is the appropriate regression function for this data set.
d. Use a residual plot to determine if the residuals have a constant variance.
e. Produce a residual plot to determine if the residuals are independent. Assume the order of appearance is the time order of the data.
f. Use a probability plot to determine if the error terms are normally distributed.

15-53. Examine the following data set:

y:  25, 35, 14, 45, 52, 41, 65, 63, 68, 75
x1:  5,  5,  5, 25, 25, 25, 30, 30, 30, 40
x2: 25,  5,  5, 40,  5, 25, 30, 30, 25, 30

a. Determine the estimated regression equation for this data set.
b. Calculate the residuals and the standardized residuals for this regression equation.
c. Produce the appropriate residual plot to determine if the linear function is the appropriate regression function for this data set.
d. Use a residual plot to determine if the residuals have a constant variance.
e. Produce the appropriate residual plot to determine if the residuals are independent.
f. Construct a probability plot to determine if the error terms are normally distributed.

Computer Database Exercises

15-54.
Refer to Exercise 15-9, which referenced an article in BusinessWeek that presented a list of the 100 companies perceived as having "hot growth" characteristics. The file entitled Logrowth contains sales ($million), sales increase (%), return on capital, market value ($million), and recent stock price of the companies ranked from 81 to 100. In Exercise 15-9, a regression equation was constructed in which the sales of the companies were predicted using their market value.
a. Determine the estimated regression equation for this data set.
b. Calculate the residuals and the standardized residuals for this regression equation.
c. Produce the appropriate residual plot to determine if the linear function is the appropriate regression function for this data set.
d. Use a residual plot to determine if the residuals have a constant variance.
e. Produce the appropriate residual plot to determine if the residuals are independent. Assume the data were extracted in the order listed.
f. Construct a probability plot to determine if the error terms are normally distributed.

15-55. The White Cover Snowmobile Association promotes snowmobiling in both the Upper Midwest and the Rocky Mountain region. The industry has been affected in the West because of uncertainty associated with conflicting court rulings about the number of snowmobiles allowed in national parks. The Association advertises in outdoor- and tourist-related publications and then sends out pamphlets, maps, and other region-related information to people who call a toll-free number and request the information. The Association orders the packets from a document-printing company and likes to have enough available to meet the immediate need without having too many sitting around taking up space. The marketing manager decided to develop a multiple regression model to be used in predicting the number of calls that will be received in the coming week.

A random sample of 12 weeks is selected, with the following variables:
y = Number of calls
x1 = Number of advertisements placed the previous week
x2 = Number of calls received the previous week

x3 = Number of airline tour bookings into Western cities for the current week

The data are in the file called Winter Adventures.
a. Construct a multiple regression model using all three independent variables. Write a short report discussing the model.
b. Based on the appropriate residual plots, what can you conclude about the constant variance assumption? Discuss.
c. Based on the appropriate residual analysis, does it appear that the residuals are independent? Discuss.
d. Use an appropriate analysis of the residuals to determine whether the regression model meets the assumption of normally distributed error terms. Discuss.

15-56. The athletic director of State University is interested in developing a multiple regression model that might be used to explain the variation in attendance at football games at his school. A sample of 16 games was selected from home games played during the past 10 seasons. Data for the following factors were determined:
y = Game attendance
x1 = Team win/loss percentage to date
x2 = Opponent win/loss percentage to date
x3 = Games played this season
x4 = Temperature at game time
The sample data are in the file called Football.
a. Build a multiple regression model using all four independent variables. Write a short report that outlines the characteristics of this model.
b. Develop a table of residuals for this model. What is the average residual value? Why do you suppose it came out to this value? Discuss.
c. Based on the appropriate residual plot, what can you conclude about the constant variance assumption? Discuss.
d. Based on the appropriate residual analysis, does it appear that the model errors are independent? Discuss.
e. Can you conclude, based on the appropriate method of analysis, that the model error terms are approximately normally distributed?

15-57. The consumer price index (CPI) is a measure of the average change in prices over time in a fixed market basket of goods and services typically purchased by consumers. One of the items in this market basket that affects the CPI is the price of oil and its derivatives. The file entitled Consumer contains the price of the derivatives of oil and the CPI adjusted to 2005 levels. In Exercise 15-49, backward elimination stepwise regression was used to determine the relationship between CPI and two independent variables: the price of heating oil and of diesel fuel.
a. Construct an estimate of the regression equation using the same variables.
b. Produce the appropriate residual plots to determine if the linear function is the appropriate regression function for this data set.
c. Use a residual plot to determine if the residuals have a constant variance.
d. Produce the appropriate residual plot to determine if the residuals are independent. Assume the data were extracted in the order listed.
e. Construct a probability plot to determine if the error terms are normally distributed.

15-58. In Exercise 15-48 you were asked to use best subsets stepwise regression to establish the relationship between body fat and the independent variables weight, abdomen circumference, and thigh circumference based on data in the file Bodyfat. This is an extension of that exercise.
a. Construct an estimate of the regression equation using the same variables.
b. Produce the appropriate residual plots to determine if the linear function is the appropriate regression function for this data set.
c. Use a residual plot to determine if the residuals have a constant variance.
d. Produce the appropriate residual plot to determine if the residuals are independent. Assume the data were extracted in the order listed.
e. Construct a probability plot to determine if the error terms are normally distributed.

15-59. The National Association of Theatre Owners is the largest exhibition trade organization in the world, representing more than 26,000 movie screens in all 50 states and in more than 20 countries worldwide. Its membership includes the largest cinema chains and hundreds of independent theater owners. It publishes statistics concerning the movie sector of the economy. The file entitled Flicks contains data on total U.S. box office grosses ($billion), total number of admissions (billion), average U.S. ticket price ($), and number of movie screens.
a. Construct a regression equation in which total U.S. box office grosses are predicted using the other variables.
b. Produce the appropriate residual plots to determine if the linear function is the appropriate regression function for this data set.
c. Square each of the independent variables and add them to the model on which the regression equation in part a was built. Produce the new regression equation.
d. Use a residual plot to determine if the quadratic model in part c alleviates the problem identified in part b.
e. Construct a probability plot to determine if the error terms are normally distributed for the updated model.

END EXERCISES 15-5
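As a sketch of the computations the exercises above repeatedly call for, here is one way to produce residuals and standardized residuals by hand in plain Python, using the Exercise 15-50 data. The formulas follow Equations 15.13 and 15.14 for simple regression:

```python
import math

# Data from Exercise 15-50
x = [6, 9, 14, 18, 22, 27, 33]
y = [5, 20, 28, 30, 33, 35, 45]
n = len(x)

xbar, ybar = sum(x) / n, sum(y) / n
sxx = sum(xi ** 2 for xi in x) - sum(x) ** 2 / n        # sum of squares for x
sxy = sum(xi * yi for xi, yi in zip(x, y)) - sum(x) * sum(y) / n
b1 = sxy / sxx                                          # estimated slope
b0 = ybar - b1 * xbar                                   # estimated intercept

residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]   # Equation 15.13
s_e = math.sqrt(sum(e ** 2 for e in residuals) / (n - 2))   # std. error, df = n - 2

# Standardized residuals, following the Equation 15.14 pattern
std_res = [e / (s_e * math.sqrt(1 - 1 / n - (xi - xbar) ** 2 / sxx))
           for xi, e in zip(x, residuals)]
```

Plotting `residuals` against the fitted values (or against x) is then the basis for the constant-variance and linearity checks the exercises ask about.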

Visual Summary

Chapter 15: Chapter 14 introduced linear regression, concentrating on analyzing a linear relationship between two variables. However, business problems are not limited to linear relationships involving only two variables; many situations involve linear and nonlinear relationships among three or more variables. This chapter introduces several extensions of the techniques covered in the last chapter, including multiple linear regression, incorporating qualitative variables in the regression model, working with nonlinear relationships, techniques for determining how "good" the model fits the data, and stepwise regression.

15.1 Introduction to Multiple Regression Analysis (pg. 634–653)
Summary: Multiple linear regression analysis examines the relationship between a dependent variable and more than one independent variable. Determining the appropriate relationship starts with model specification, where the appropriate variables are determined; moves to model building; and ends with model diagnosis, where the quality of the model built is determined. The purpose of the model is to explain variation in the dependent variable. Useful independent variables are those highly correlated with the dependent variable. The percentage of variation in the dependent variable explained by the model is measured by the coefficient of determination, R². The overall model can be tested for significance, as can the individual terms in the model. A common problem in multiple regression models occurs when the independent variables are highly correlated with one another; this is called multicollinearity.
Outcome 1. Understand the general concepts behind model building using multiple regression analysis.
Outcome 2. Apply multiple regression analysis to business decision-making situations.
Outcome 3. Analyze the computer output for a multiple regression model and interpret the regression results.
Outcome 4. Test hypotheses about the significance of a multiple regression model and test the significance of the independent variables in the model.
Outcome 5. Recognize potential problems when using multiple regression analysis and take steps to correct the problems.

15.2 Using Qualitative Independent Variables (pg. 654–661)
Summary: Independent variables are not always quantitative and ratio level. Important independent variables might include whether someone is married, owns her home, is recently employed, or what type of car she owns. All of these are qualitative, not quantitative, variables and are incorporated into multiple regression analysis using dummy variables. Dummy variables are numerical codes, 0 or 1, depending on whether the observation has the indicated characteristic. Be careful to use one fewer dummy variable than the number of categories to avoid the dummy variable trap.
Outcome 6. Incorporate qualitative variables into a regression model by using dummy variables.

15.3 Working with Nonlinear Relationships (pg. 661–668)
Summary: Sometimes business situations involve a nonlinear relationship between the dependent and independent variables. Regression models with nonlinear relationships become more complicated to build and analyze. Start by plotting the data to see the relationships between the dependent variable and each independent variable. Exponential and second- or third-order polynomial relationships are commonly found. Once the appropriate relationship is determined, the independent variable is transformed accordingly and used in the model.
Outcome 7. Apply regression analysis to situations where the relationship between the independent variable(s) and the dependent variable is nonlinear.

15.4 Stepwise Regression (pg. 678–689)
Summary: Stepwise regression develops the regression equation through forward selection, backward elimination, or standard stepwise regression. Forward selection begins by selecting the single independent variable that is most highly correlated with the dependent variable; additional variables are added to the model as long as they explain a significant amount of the remaining variation in the dependent variable. Backward elimination starts with all variables in the model; variables are removed one at a time until no more insignificant variables are found. Standard stepwise is similar to forward selection. However, if two or more variables are correlated, a variable selected in an early step may become insignificant when other variables are added at later steps; the standard stepwise procedure will drop this insignificant variable from the model.
Outcome 8. Understand the uses of stepwise regression.

15.5 Determining the Aptness of the Model (pg. 689–699)
Summary: Determining the aptness of a model relies on an analysis of residuals, the differences between the observed values of the dependent variable and the values predicted by the model. The residuals should be randomly scattered about the regression line, with a normal distribution and constant variance. If a plot of the residuals indicates that any of these conditions does not hold, corrective action should be taken, which might involve transforming some independent variables, dropping some variables or adding new ones, or even starting over with the model-building process.
Outcome 9. Analyze the extent to which a regression model satisfies the regression assumptions.

Conclusion
Multiple regression uses two or more independent variables to explain the variation in the dependent variable. As a decision maker, you will generally not be required to manually develop the regression model, but you will have to judge its applicability based on a computer printout. Consequently, this chapter has largely involved an analysis of computer printouts. You no doubt will encounter printouts that look somewhat different from those shown in this text, and some of the terms used may differ slightly, but the Excel and Minitab software we have used are representative of the many software packages that are available.
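The first step of forward selection described in the Section 15.4 summary can be sketched as follows. The data are made up for illustration (not from the text), and a full implementation would also refit the model and test each added variable for significance at every step:

```python
def corr(a, b):
    """Pearson correlation coefficient (Equation 15.3 pattern)."""
    n = len(a)
    am, bm = sum(a) / n, sum(b) / n
    num = sum((x - am) * (y - bm) for x, y in zip(a, b))
    den = (sum((x - am) ** 2 for x in a) * sum((y - bm) ** 2 for y in b)) ** 0.5
    return num / den

# Made-up response and three candidate predictors
y = [30, 45, 38, 60, 72, 55, 80, 66]
candidates = {
    "age":    [25, 40, 33, 50, 58, 45, 62, 48],
    "income": [3.0, 4.5, 3.8, 6.0, 7.2, 5.5, 8.0, 6.6],  # exactly y / 10 here
    "family": [2, 3, 2, 4, 5, 3, 4, 4],
}

# Step 1 of forward selection: bring in the variable most correlated with y
first_in = max(candidates, key=lambda name: abs(corr(candidates[name], y)))
```

Because `income` was constructed as an exact linear function of y, it enters first; in real data the choice would be followed by a significance test before the variable is retained.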

Equations

(15.1) Population Multiple Regression Model (pg. 634)
y = β0 + β1x1 + β2x2 + … + βkxk + ε

(15.2) Estimated Multiple Regression Model (pg. 634)
ŷ = b0 + b1x1 + b2x2 + … + bkxk

(15.3) Correlation Coefficient (pg. 638)
r = Σ(x − x̄)(y − ȳ) / √[Σ(x − x̄)² Σ(y − ȳ)²]   (one x variable with y)
or
r = Σ(xi − x̄i)(xj − x̄j) / √[Σ(xi − x̄i)² Σ(xj − x̄j)²]   (one x variable with another x)

(15.4) Multiple Coefficient of Determination (R²) (pg. 642)
R² = Sum of squares regression / Total sum of squares = SSR / SST

(15.5) F-Test Statistic (pg. 643)
F = (SSR / k) / (SSE / (n − k − 1))

(15.6) Adjusted R-Squared (pg. 644)
R-sq(adj) = R²_A = 1 − (1 − R²)[(n − 1) / (n − k − 1)]

(15.7) t-Test for Significance of Each Regression Coefficient (pg. 645)
t = (bj − 0) / s_bj,   df = n − k − 1

(15.8) Standard Error of the Estimate (pg. 646)
s_ε = √[SSE / (n − k − 1)] = √MSE

(15.9) Variance Inflation Factor (pg. 648)
VIF = 1 / (1 − R²_j)

(15.10) Confidence Interval Estimate for the Regression Slope (pg. 650)
bj ± t s_bj

(15.11) Polynomial Population Regression Model (pg. 662)
y = β0 + β1x + β2x² + … + βpx^p + ε

(15.12) Partial-F Test Statistic (pg. 672)
F = [(SSE_R − SSE_C) / (c − r)] / MSE_C

(15.13) Residual (pg. 689)
ei = yi − ŷi

(15.14) Standardized Residual for Linear Regression (pg. 695)
ei / { s √[1 − 1/n − (xi − x̄)² / (Σx² − (Σx)²/n)] }

Key Terms
Adjusted R-squared (pg. 644)
Coefficient of partial determination (pg. 678)
Composite model (pg. 669)
Correlation coefficient (pg. 638)
Correlation matrix (pg. 638)
Dummy variable (pg. 654)
Interaction (pg. 669)
Model (pg. 636)
Multicollinearity (pg. 647)
Multiple coefficient of determination (R²) (pg. 642)
Regression hyperplane (pg. 635)
Residual (pg. 689)
Variance inflation factor (VIF) (pg. 648)

Chapter Exercises

Conceptual Questions

15-60. Go to the library or use the Internet to locate three articles using a regression model with more than one independent variable. For each article write a short summary covering the following points:
Purpose for using the model
How the variables in the model were selected
How the data in the model were selected
Any possible violations of the needed assumptions
The conclusions drawn from using the model
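Several of the chapter equations can be checked numerically with a small sketch. The sums of squares below are hypothetical values chosen for illustration, not taken from any exercise:

```python
# Hypothetical ANOVA quantities for a model with k = 3 predictors, n = 20 cases
n, k = 20, 3
sst = 1000.0          # total sum of squares (made-up value)
sse = 250.0           # sum of squares error (made-up value)
ssr = sst - sse       # sum of squares regression

r2 = ssr / sst                                    # Equation 15.4
r2_adj = 1 - (1 - r2) * (n - 1) / (n - k - 1)     # Equation 15.6
f_stat = (ssr / k) / (sse / (n - k - 1))          # Equation 15.5
s_eps = (sse / (n - k - 1)) ** 0.5                # Equation 15.8
```

Note how the adjusted R² is pulled below R² by the (n − 1)/(n − k − 1) penalty, which grows with the number of predictors k.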

15-61. Discuss in your own terms the similarities and differences between simple linear regression analysis and multiple regression analysis.
15-62. Discuss what is meant by the least squares criterion as it pertains to multiple regression analysis. Is the least squares criterion any different for simple regression analysis? Discuss.
15-63. List the basic assumptions of regression analysis and discuss in your own terms what each means.
15-64. What does it mean if we have developed a multiple regression model and have concluded that the model is apt?
15-65. Consider the following model:
ŷ = 5 + 3x1 + 5x2
a. Provide an interpretation of the coefficient of x1.
b. Is the interpretation provided in part a true regardless of the value of x2? Explain.
c. Now consider the model ŷ = 5 + 3x1 + 5x2 + 4x1x2. Let x2 = 1. Give an interpretation of the coefficient of x1 when x2 = 1.
d. Repeat part c when x2 = 2. Is the interpretation provided in part a true regardless of the value of x2? Explain.
e. Considering your answers to parts c and d, what type of regression components has conditional interpretations?

Computer Database Exercises

15-66. Amazon.com has become one of the most successful online merchants. Two measures of its success are sales and net income/loss figures. They are given here (losses are shown as negative values):

Year | Net Income/Loss | Sales
1995 | −0.3 | 0.5
1996 | −5.7 | 15.7
1997 | −27.5 | 147.7
1998 | −124.5 | 609.8
1999 | −719.9 | 1,639.8
2000 | −1,411.2 | 2,761.9
2001 | −567.3 | 3,122.9
2002 | −149.1 | 3,933
2003 | 35.3 | 5,263.7
2004 | 588.5 | 6,921
2005 | 359 | 8,490
2006 | 190 | 10,711
2007 | 476 | 14,835

a. Produce a scatter plot for Amazon's net income/loss and sales figures for the period 1995 to 2007. Determine the order (or degree) of the polynomial that could be used to predict Amazon's net income/loss using sales figures for the period 1995 to 2007.
b.
To simplify the analysis, consider only the values from 1995–2004. Produce the polynomial indicated by these data.
c. Test to determine whether the overall model from part b is statistically significant. Use a significance level of 0.10.
d. Conduct a hypothesis test to determine if curvature exists in the model that predicts Amazon's net income/loss using sales figures from part b. Use a significance level of 0.02 and the test statistic approach.

The following information applies to Exercises 15-67, 15-68, and 15-69.
A publishing company in New York is attempting to develop a model that it can use to help predict textbook sales for books it is considering for future publication. The marketing department has collected data on several variables from a random sample of 15 books. These data are given in the file Textbooks.

15-67. Develop the correlation matrix showing the correlation between all possible pairs of variables. Test statistically to determine which independent variables are significantly correlated with the dependent variable, book sales. Use a significance level of 0.05.

15-68. Develop a multiple regression model containing all four independent variables. Show clearly the regression coefficients. Write a short report discussing the model. In your report make sure you cover the following issues:
a. How much of the total variation in book sales can be explained by these four independent variables? Would you conclude that the model is significant at the 0.05 level?
b. Develop a 95% confidence interval for each regression coefficient and interpret these confidence intervals.
c. Which of the independent variables can you conclude to be significant in explaining the variation in book sales? Test using α = 0.05.
d. How much of the variation in the dependent variable is explained by the independent variables? Is the model statistically significant at the α = 0.01 level? Discuss.
e. How much, if at all, does adding one more page to the book impact the sales volume of the book? Develop and interpret a 95% confidence interval estimate to answer this question.
f. Perform the appropriate analysis to determine the aptness of this regression model. Discuss your results and conclusions.

15-69. The publishing company recently came up with some additional data for the 15 books in the original sample. Two new variables, production expenditures (x5) and

number of prepublication reviewers (x6), have been added. These additional data are as follows:

Book | x5 ($) | x6
1 | 38,000 | 5
2 | 86,000 | 8
3 | 59,000 | 3
4 | 80,000 | 9
5 | 29,500 | 3
6 | 31,000 | 3
7 | 40,000 | 5
8 | 69,000 | 4
9 | 51,000 | 4
10 | 34,000 | 6
11 | 20,000 | 2
12 | 80,000 | 5
13 | 60,000 | 5
14 | 87,000 | 8
15 | 29,000 | 3

Incorporating these additional data, calculate the correlation between each of these additional variables and the dependent variable, book sales.
a. Test the significance of the correlation coefficients, using α = 0.05. Comment on your results.
b. Develop a multiple regression model that includes all six independent variables. Which, if any, variables would you recommend be retained if this model is going to be used to predict book sales for the publishing company? For any statistical tests you might perform, use a significance level of 0.05. Discuss your results.
c. Use the F-test approach to test the null hypothesis that all slope coefficients are 0. Test with a significance level of 0.05. What do these results mean? Discuss.
d. Do multicollinearity problems appear to be present in the model? Discuss the potential consequences of multicollinearity with respect to the regression model.
e. Discuss whether the standard error of the estimate is small enough to make this model useful for predicting the sales of textbooks.
f. Plot the residuals against the predicted values of y and comment on what this plot means relative to the aptness of the model.
g. Compute the standardized residuals and form them into a frequency histogram. What does this indicate about the normality assumption?
h. Comment on the overall aptness of this model and indicate what might be done to improve the model.

The following information applies to Exercises 15-70 through 15-79.
The J. J. McCracken Company has authorized its marketing research department to make a study of customers who have been issued a McCracken charge card. The marketing research department hopes to identify the significant variables that explain the variation in purchases. Once these variables are determined, the department intends to try to attract new customers who would be predicted to make a high volume of purchases. Twenty-five customers were selected at random, and values for the following variables were recorded in the file called McCracken:
y = Average monthly purchases (in dollars) at McCracken
x1 = Customer age
x2 = Customer family income
x3 = Family size

15-70. A first step in regression analysis often involves developing a scatter plot of the data. Develop the scatter plots of all the possible pairs of variables, and with a brief statement indicate what each plot says about the relationship between the two variables.
15-71. Compute the correlation matrix for these data. Develop the decision rule for testing the significance of each coefficient. Which, if any, correlations are not significant? Use α = 0.05.
15-72. Use forward selection stepwise regression to develop the multiple regression model. The variable x2, family income, was brought into the model. Discuss why this happened.
15-73. Test the significance of the regression model at Step 1 of the process. Justify the significance level you have selected.
15-74. Develop a 95% confidence interval for the slope coefficient for the family income variable at Step 1 of the model. Be sure to interpret this confidence interval.
15-75. Describe the regression model at Step 2 of the analysis. In your discussion, be sure to discuss the effect of adding a new variable on the standard error of the estimate and on R².
15-76. Referring to Problem 15-75, suppose the manager of McCracken's marketing department questions the appropriateness of adding a second variable. How would you respond to her question?
15-77. Looking carefully at the stepwise regression model, you can see that the value of the slope coefficient for variable x2, family income, changes as a new variable is added to the regression model. Discuss why this change takes place.
15-78. Analyze the stepwise regression model. Write a report to the marketing manager pointing out the strengths and weaknesses of the model. Be sure to comment on the department's goal of being able to use the model to predict which customers will purchase high volumes from McCracken.
15-79. Plot the residuals against the predicted values of y and comment on what this plot means relative to the aptness of the model.
a. Compute the standardized residuals and form them into a frequency histogram. What does this indicate about the normality assumption?
b. Comment on the overall aptness of this model and indicate what might be done to improve the model.
15-80. The National Association of Realtors Existing-Home Sales Series provides a measurement of the residential

real estate market. One of the measurements it produces is the Housing Affordability Index (HAI), which is a measure of the financial ability of U.S. families to buy a house. A value of 100 means that families earning the national median income have just the amount of money needed to qualify for a mortgage on a median-priced home; higher than 100 means they have more than enough, and lower than 100 means they have less than enough. The file entitled Index contains the HAI and associated variables.
a. Produce the correlation matrix of all the variables. Predict the variables that will remain in the estimated regression equation if standard stepwise regression is used.
b. Use standard stepwise regression to develop an estimate of a model that is to predict the HAI from the associated variables found in the file entitled Index.
c. Compare the results of parts a and b. Explain any difference between the two models.

15-81. An investment analyst collected data from 20 randomly chosen companies. The data consisted of the 52-week-high stock prices, PE ratios, and the market values of the companies. These data are in the file entitled Investment. The analyst wishes to produce a regression equation to predict the market value using the 52-week-high stock price and the PE ratio of the company. He creates a complete second-degree polynomial.
a. Construct an estimate of the regression equation using the indicated variables.
b. Produce the appropriate residual plots to determine if the polynomial function is the appropriate regression function for this data set.
c. Use a residual plot to determine if the residuals have a constant variance.
d. Produce the appropriate residual plot to determine if the residuals are independent. Assume the data were extracted in the order listed.
e. Construct a probability plot to determine if the error terms are normally distributed.

15-82. The consumer price index (CPI) is a measure of the average change in prices over time in a fixed market basket of goods and services typically purchased by consumers. One of the items in this market basket that affects the CPI is the price of oil and its derivatives. The file entitled Consumer contains the price of the derivatives of oil and the CPI adjusted to 2005 levels.
a. Produce a multiple regression equation depicting the relationship between the CPI and the prices of the derivatives of oil.
b. Conduct a t-test on the coefficient that has the highest p-value. Use a significance level of 0.02 and the p-value approach.
c. Produce a multiple regression equation depicting the relationship between the CPI and the prices of the derivatives of oil, leaving out the variable tested in part b.
d. Referring to the regression results in part c, repeat the tests indicated in part b.
e. Perform a test of hypothesis to determine if the resulting overall model is statistically significant. Use a significance level of 0.02 and the p-value approach.

15-83. Badeaux Brothers Louisiana Treats ships packages of Louisiana coffee, cakes, and Cajun spices to individual customers around the United States. The cost to ship these products depends primarily on the weight of the package being shipped. Badeaux charges the customers for shipping and then ships the product itself. As part of a study of whether it is economically feasible to continue to ship products themselves, Badeaux sampled 20 recent shipments to determine what, if any, relationship exists between shipping costs and package weight. The data are contained in the file Badeaux.
a. Develop a scatter plot of the data with the dependent variable, cost, on the vertical axis and the independent variable, weight, on the horizontal axis. Does there appear to be a relationship between the two variables? Is the relationship linear?
b. Compute the sample correlation coefficient between the two variables. Conduct a test, using an alpha value of 0.05, to determine whether the population correlation coefficient is significantly different from zero.
c. Determine the simple linear regression model for these data. Plot the simple linear regression model together with the data. Would a nonlinear model better fit the sample data?
d. Now develop a nonlinear model and plot the model against the data. Does the nonlinear model provide a better fit than the linear model developed in part c?

15-84. The State Tax Commission must download information files each morning. The time to download the files primarily depends on the size of the file. The Tax Commission has asked your computer consulting firm to determine what, if any, relationship exists between download time and file size. The Tax Commission randomly selected a sample of days and provided the information contained in the file Tax Commission.
a. Develop a scatter plot of the data with the dependent variable, download time, on the vertical axis and the independent variable, size, on the horizontal axis. Does there appear to be a relationship between the two variables? Is the relationship linear?
b. Compute the sample correlation coefficient between the two variables. Conduct a test, using an alpha value of 0.05, to determine whether the population correlation coefficient is significantly different from zero.
c. Determine the simple linear regression model for these data. Plot the simple linear regression model together with the data. Would a nonlinear model better fit the sample data?
d. Now determine a nonlinear model and plot the model against the data. Does the nonlinear model provide a better fit than the linear model developed in part c?

15-85. Refer to the State Department of Transportation data set called Liabins. The department was interested in determining the rate of compliance with the state's
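The squared-predictor idea behind a second-degree polynomial model (as in Exercise 15-81) can be sketched with made-up data: a straight line misfits data that grow quadratically, while a line in the transformed variable z = x² fits them exactly.

```python
def fit_simple(xs, ys):
    """Least-squares intercept and slope."""
    n = len(xs)
    xm, ym = sum(xs) / n, sum(ys) / n
    b1 = (sum((x - xm) * (y - ym) for x, y in zip(xs, ys))
          / sum((x - xm) ** 2 for x in xs))
    return ym - b1 * xm, b1

def sse(xs, ys, b0, b1):
    """Sum of squared residuals for the fitted line."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))

# Made-up data that grow like 3 * x**2 -- clearly curved
x = [1, 2, 3, 4, 5, 6]
y = [3, 12, 27, 48, 75, 108]

b0, b1 = fit_simple(x, y)
sse_linear = sse(x, y, b0, b1)

# Transform the predictor by squaring it, then fit a line in the new variable
z = [xi ** 2 for xi in x]
c0, c1 = fit_simple(z, y)
sse_quad = sse(z, y, c0, c1)
```

Real data would rarely fit a transformed model perfectly; the comparison of the two SSE values is what the residual-plot exercises formalize graphically.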

CHAPTER 15 | Multiple Regression Analysis and Model Building

mandatory liability insurance law, as well as other things. Assume the data were collected using a simple random sampling process. Develop the best possible linear regression model using vehicle year as the dependent variable and any or all of the other variables as potential independent variables. Assume that your objective is to develop a predictive model. Write a report that discusses the steps you took to develop the final model. Include a correlation matrix and all appropriate statistical tests. Use an α = 0.05. If you are using a nominal or ordinal variable, remember that you must make sure it is in the form of one or more dummy variables.

Case 15.1 Dynamic Scales, Inc.
In 2005, Stanley Ahlon and three financial partners formed Dynamic Scales, Inc. The company was based on an idea Stanley had for developing a scale to weigh trucks in motion and thus eliminate the need for every truck to stop at weigh stations along highways. This dynamic scale would be placed in the highway approximately one-quarter mile from the regular weigh station. The scale would have a minicomputer that would automatically record truck speed, axle weights, and climate variables, including temperature, wind, and moisture. Stanley Ahlon and his partners believed that state transportation departments in the United States would be the primary market for such a scale. As with many technological advances, developing the dynamic scale has been difficult. When the scale finally proved accurate for trucks traveling 40 miles per hour, it would not perform for trucks traveling at higher speeds. However, eight months ago, Stanley announced that the dynamic scale was ready to be field-tested by the Nebraska State Department of Transportation under a grant from the federal government.
Stanley explained to his financial partners, and to Nebraska transportation officials, that the dynamic weight would not exactly equal the static weight (truck weight on a static scale). However, he was sure a statistical relationship between dynamic weight and static weight could be determined, which would make the dynamic scale useful. Nebraska officials, along with people from Dynamic Scales, installed a dynamic scale on a major highway in Nebraska. Each month for six months, data were collected for a random sample of trucks weighed on both the dynamic scale and a static scale. Table 15.3 presents these data. Once the data were collected, the next step was to determine whether, based on this test, the dynamic scale measurements could be used to predict static weights. A complete report will be submitted to the U.S. government and to Dynamic Scales.

TABLE 15.3 | Test Data for the Dynamic Scales Example

Month    | Front-Axle Static Weight (lb.) | Front-Axle Dynamic Weight (lb.) | Truck Speed (mph) | Temperature (°F) | Moisture (%)
January  | 1,800 | 1,625 | 52 | 21 | 0.00
January  | 1,311 | 1,904 | 71 | 17 | 0.15
January  | 1,504 | 1,390 | 48 | 13 | 0.40
January  | 1,388 | 1,402 | 50 | 19 | 0.10
January  | 1,250 | 1,100 | 61 | 24 | 0.00
February | 2,102 | 1,950 | 55 | 26 | 0.10
February | 1,410 | 1,475 | 58 | 32 | 0.20
February | 1,000 | 1,103 | 59 | 38 | 0.15
February | 1,430 | 1,387 | 43 | 24 | 0.00
February | 1,073 | 948   | 59 | 18 | 0.40
March    | 1,502 | 1,493 | 62 | 34 | 0.00
March    | 1,721 | 1,902 | 67 | 36 | 0.00
March    | 1,113 | 1,415 | 48 | 42 | 0.21
March    | 978   | 983   | 59 | 29 | 0.32
March    | 1,254 | 1,149 | 60 | 48 | 0.00
April    | 994   | 1,052 | 58 | 37 | 0.00
April    | 1,127 | 999   | 52 | 34 | 0.21
April    | 1,406 | 1,404 | 59 | 40 | 0.40
April    | 875   | 900   | 47 | 48 | 0.00
April    | 1,350 | 1,275 | 68 | 51 | 0.00
May      | 1,102 | 1,120 | 55 | 52 | 0.00
May      | 1,240 | 1,253 | 57 | 57 | 0.00
May      | 1,087 | 1,040 | 62 | 63 | 0.00
May      | 993   | 1,102 | 59 | 62 | 0.10
May      | 1,408 | 1,400 | 67 | 68 | 0.00
June     | 1,420 | 1,404 | 58 | 70 | 0.00
June     | 1,808 | 1,790 | 54 | 71 | 0.00
June     | 1,401 | 1,396 | 49 | 83 | 0.00
June     | 933   | 1,004 | 62 | 88 | 0.40
June     | 1,150 | 1,127 | 64 | 81 | 0.00
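As a starting point for the kind of analysis the case calls for, the simple linear relationship between dynamic and static front-axle weight can be estimated with ordinary least squares. The sketch below uses only the first ten observations from Table 15.3 and plain numpy; it is an illustration of the approach, not the full case analysis (which would also consider speed, temperature, and moisture).

```python
import numpy as np

# First ten observations from Table 15.3 (front-axle weights, lb.)
dynamic = np.array([1625, 1904, 1390, 1402, 1100, 1950, 1475, 1103, 1387, 948])
static = np.array([1800, 1311, 1504, 1388, 1250, 2102, 1410, 1000, 1430, 1073])

# Fit static = b0 + b1 * dynamic by ordinary least squares
b1, b0 = np.polyfit(dynamic, static, 1)

# Coefficient of determination: proportion of the variation in static
# weight explained by dynamic weight
pred = b0 + b1 * dynamic
ss_res = np.sum((static - pred) ** 2)
ss_tot = np.sum((static - static.mean()) ** 2)
r2 = 1 - ss_res / ss_tot

print(f"static = {b0:.1f} + {b1:.3f} * dynamic,  R^2 = {r2:.3f}")
```

With these ten rows the slope is positive, as expected: trucks that register heavier dynamic weights also weigh more on the static scale. The full case would fit the model to all 30 observations, add the other candidate independent variables, and test each coefficient for significance.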

Case 15.2 Glaser Machine Works
Glaser Machine Works has experienced a significant change in its business operations over the past 50 years. Glaser started business as a machine shop that produced specialty tools and products for the timber and lumber industry. This was a logical fit, given its location in the southern part of the United States. However, over the years Glaser looked to expand its offerings beyond the lumber and timber industry. Initially, its small size coupled with its rural location made it difficult to attract the attention of large companies that could use its products. All of that began to change as Glaser developed the ability not only to fabricate parts and tools but also to assemble products for customers who needed special components in large quantities. Glaser's business really took off when first foreign, and then domestic, automakers began to build automobile plants in the southern United States. Glaser was able to provide quality parts quickly for firms that expected high quality and responsive delivery. Many of Glaser's customers operated with little inventory and required that suppliers be able to provide shipments with short lead times. As part of its relationship with the automobile industry, Glaser was expected to buy into the lean manufacturing and quality improvement initiatives of its customers. Glaser had always prided itself on its quality, but as the number and variety of its products increased, along with ever higher expectations by its customers, Glaser knew that it would have to respond by ensuring its quality and operations were continually improving. Of recent concern was the performance of its manufacturing line 107B. This line produced a component part for a Japanese automobile company.
The Japanese firm had initially been pleased with Glaser's performance, but lately the number of defects was approaching an unacceptable level. Managers of the 107B line knew the line and its workers had been asked to ramp up production to meet increased demand and that some workers were concerned with the amount of overtime being required. There was also concern about the second shift now being run at 107B. Glaser had initially run only one shift, but when demand for its product became so high that there was not sufficient capacity with one shift, additional workers were hired to operate a night shift. Management was wondering if the new shift had been stretched beyond its capabilities. Glaser plant management asked Kristi Johnson, the assistant production supervisor for line 107B, to conduct an analysis of product defects for the line. Kristi randomly selected several days of output and counted the number of defective parts produced on the 107B line. This information, along with other data, is contained in the file Glaser Machine Works. Kristi promised to have a full report for the management team by the end of the month.

Required Tasks:
1. Identify the primary issue of the case.
2. Identify a statistical model you might use to help analyze the case.
3. Develop a multiple regression model that can be used to help Kristi Johnson analyze the product defects for line 107B. Be sure to carefully specify the dependent variable and the independent variables.
4. Discuss how the variables overtime hours, supervisor training, and shift will be modeled.
5. Run the regression model you developed and interpret the results.
6. Which variables are significant?
7. Provide a short report that describes your analysis and explains in managerial terms the findings of your model. Be sure to explain which variables, if any, are significant explanatory variables. Provide a recommendation to management.
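Task 4 of the case asks how categorical variables such as shift enter a regression. The standard approach is a 0/1 dummy variable. The sketch below uses made-up numbers, since the Glaser Machine Works file is not reproduced here, and includes only overtime and a shift dummy for brevity; all variable names are illustrative.

```python
import numpy as np

# Hypothetical daily records: (overtime hours, shift: 0 = day, 1 = night, defects)
data = np.array([
    [2, 0, 5], [4, 0, 7], [6, 0, 9], [1, 0, 4], [3, 0, 6],
    [2, 1, 8], [4, 1, 11], [6, 1, 13], [1, 1, 7], [3, 1, 10],
])
overtime, shift, defects = data[:, 0], data[:, 1], data[:, 2]

# Design matrix: intercept column, overtime hours, night-shift dummy
X = np.column_stack([np.ones(len(defects)), overtime, shift])

# Ordinary least squares fit of defects on overtime and shift
coef, *_ = np.linalg.lstsq(X, defects, rcond=None)
b0, b_overtime, b_shift = coef

print(f"defects = {b0:.2f} + {b_overtime:.2f}*overtime + {b_shift:.2f}*shift")
```

A positive coefficient on the shift dummy would suggest more defects on the night shift, holding overtime constant. With the real case data, a t-test on each coefficient decides whether that difference is statistically significant.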
Case 15.3 Hawlins Manufacturing
Ross Hawlins had done it all at Hawlins Manufacturing, a company founded by his grandfather 63 years ago. Among his many duties, Ross oversaw all the plant's operations, a task that had grown in responsibility given the company's rapid growth over the past three decades. When Ross's grandfather founded the company, there were only two manufacturing sites. Expansion and acquisition of competitors over the years had caused that number to grow to over 50 manufacturing plants in 18 states. Hawlins had a simple process that produced only two products, but the demand for these products was strong and Ross had spent millions of dollars upgrading his facilities over the past decade. Consequently, most of the company's equipment was less than 10 years old on average. Hawlins's two products were produced for local markets, as prohibitive shipping costs prevented shipping the product long distances. Product demand was sufficiently strong to support two manufacturing shifts (day and night) at every plant, and every plant had the capability to produce both products sold by Hawlins. Recently, the management team at Hawlins noticed that there were differences in output levels across the various plants. They were uncertain what, if anything, might explain these differences. Clearly, if some plants were more productive than others, there might be some meaningful insights that could be standardized across plants to boost overall productivity. Ross Hawlins asked Lisa Chandler, an industrial engineer at the company's headquarters, to conduct a study of the plant's productivity. Lisa randomly sampled 159 weeks of output from various plants together with the number of plant employees working that week, the plants' average age in years, the product mix produced that week (either product A or B), and whether the output was from the day or night shift. The sampled data are contained in the file Hawlins Manufacturing.
The Hawlins management team is expecting a written report and a presentation by Lisa when it meets again next Tuesday.

Required Tasks:
1. Identify the primary issue of the case.
2. Identify a statistical model you might use to help analyze the case.
3. Develop a multiple regression model for Lisa Chandler. Be sure to carefully specify the dependent variable and the independent variables.
4. Discuss how the type of product (A or B) and the shift (Day or Night) can be included in the regression model.
5. Run the regression model you developed and interpret the results.
6. Which variables are significant?
7. Provide a short report that describes your analysis and explains in management terms the findings of your model. Be sure to explain which variables, if any, are significant explanatory variables. Provide a recommendation to management.

Case 15.4 Sapphire Coffee—Part 2
Jennie Garcia could not believe that her career had moved so far so fast. When she left graduate school with a master's degree in anthropology, she intended to work at a local coffee shop until something else came along that was more related to her academic background. But after a few months, she came to enjoy the business, and in a little over a year she was promoted to store manager. When the company for whom she worked continued to grow, Jennie was given oversight of a few stores. Now, eight years after she started as a barista, Jennie is in charge of operations and planning for the company's southern region. As a part of her responsibilities, Jennie tracks store revenues and forecasts coffee demand. Historically, Sapphire Coffee would base its demand forecast on the number of stores, believing that each store sold approximately the same amount of coffee. This approach seemed to work well when the company had shops of similar size and layout, but as the company grew, stores became more varied.
Now, some stores have drive-thru windows, a feature that top management added to some stores believing that it would increase coffee sales for customers who wanted a cup of coffee on their way to work but who were too rushed to park and enter the store.
Jennie noticed that weekly sales seemed to be more variable across stores in her region and was wondering what, if anything, might explain the differences. The company's financial vice president had also noticed the increased differences in sales across stores and was wondering what might be happening. In an e-mail to Jennie he stated that weekly store sales are expected to average $5.00 per square foot. Thus, a 1,000-square-foot store would have average weekly sales of $5,000. He asked that Jennie analyze the stores in her region to see if this rule of thumb was a reliable measure of a store's performance. Jennie had been in the business long enough to know that a store's size, although an important factor, was not the only thing that might influence sales. She had never been convinced of the efficacy of the drive-thru window, believing that it detracted from the coffee house experience that so many of Sapphire Coffee customers had come to expect. The VP of finance was expecting the analysis to be completed by the weekend. Jennie decided to randomly select weekly sales records for 53 stores, along with each store's size, whether it was located close to a college, and whether it had a drive-thru window. The data are in the file Sapphire Coffee-2. A full analysis would need to be sent to the corporate office by Friday.

Case 15.5 Wendell Motors
Wendell Motors manufactures and ships small electric motors and drives to a variety of industrial and commercial customers in and around St. Louis. Wendell is a small operation with a single manufacturing plant.
Wendell's products are different from those of other motor and drive manufacturers because Wendell only produces small motors (25 horsepower or less) and because its products are used in a variety of industries and businesses that appreciate Wendell's quality and speed of delivery. Because it has only one plant, Wendell ships motors directly from the plant to its customers. Wendell's reputation for quality and speed of delivery allows it to maintain low inventories of motors and to ship make-to-order products directly. As part of its ongoing commitment to lean manufacturing and continuous process improvement, Wendell carefully monitors the cost associated with both production and shipping. The manager of shipping for Wendell, Tyler Jenkins, regularly reports the shipping costs to Wendell's management team. Because few finished goods inventories are maintained, competitive delivery times often require that Wendell expedite shipments. This is almost always the case for those customers who operate their business around the clock every day of the week. Such customers might maintain their own backup safety stock of a particular motor or drive, but circumstances often result in cases where replacement products have to be rushed through production and then expedited to the customer. Wendell's management team wondered if these special orders were too expensive to handle in this way and if it might be less expensive to produce and hold certain motors as finished goods inventory, enabling off-the-shelf delivery using less expensive modes of shipping. This might especially be true for orders that must be filled on a holiday, incurring an additional shipping charge. At the last meeting of the management team, Tyler Jenkins was asked to analyze expedited shipping costs and to develop a model that could be used to estimate the cost of expediting a customer's order.

Donna Layton, an industrial engineer in the plant, was asked to prepare an inventory cost analysis to determine the expenses of holding additional finished goods inventory. Tyler began his analysis by randomly selecting 45 expedited shipping records. The sampled data can be found in the file Wendell Motors.
The management team expects a full report in five days. Tyler knew he would need a model for explaining shipping costs for expedited orders and that he would also need to answer the questions as to what effect, if any, shipping on a holiday had on costs.

chapter 16
Chapter 16 Quick Prep Links
• Review the steps used to develop a line chart discussed in Chapter 2.
• Make sure you understand the steps necessary to construct and interpret linear and nonlinear regression models in Chapters 14 and 15.
• Review the concepts and properties associated with means discussed in Chapter 3.

Analyzing and Forecasting Time-Series Data

16.1 Introduction to Forecasting, Time-Series Data, and Index Numbers (pg. 710–723)
Outcome 1. Identify the components present in a time series.
Outcome 2. Understand and compute basic index numbers.

16.2 Trend-Based Forecasting Techniques (pg. 724–749)
Outcome 3. Apply the fundamental steps in developing and implementing forecasting models.
Outcome 4. Apply trend-based forecasting models, including linear trend, nonlinear trend, and seasonally adjusted trend.

16.3 Forecasting Using Smoothing Methods (pg. 750–761)
Outcome 5. Use smoothing-based forecasting models, including single and double exponential smoothing.

Why you need to know
No organization, large or small, can function effectively without a forecast for the goods or services it provides. A retail clothing store must forecast the demand for the shirts it sells by shirt size. The concessionaire at Dodger Stadium in Los Angeles must forecast each game's attendance to determine how many soft drinks and Dodger dogs to have on hand. Your state's elected officials must forecast tax revenues in order to establish a budget each year. These are only a few of the instances in which forecasting is required. For many organizations, the success of the forecasting effort will play a major role in determining the general success of the organization. When you graduate and join an organization in the public or private sectors, you will almost certainly be required to prepare forecasts or to use forecasts provided by someone else in the organization.
You won't have access to a crystal ball on which to rely for an accurate prediction of the future. Fortunately, if you have learned the material presented in this chapter, you will have a basic understanding of forecasting and of how and when to apply various forecasting techniques. We urge you to focus on the material and take with you the tools that will give you a competitive advantage over those who are not familiar with forecasting techniques.

CHAPTER 16 | Analyzing and Forecasting Time-Series Data

16.1 Introduction to Forecasting, Time-Series Data, and Index Numbers

Decision makers often confuse forecasting and planning. Planning is the process of determining how to deal with the future. On the other hand, forecasting is the process of predicting what the future will be like. Forecasts are used as inputs for the planning process. Experts agree that good planning is essential for an organization to be effective. Because forecasts are an important part of the planning process, you need to be familiar with forecasting methods. There are two broad categories of forecasting techniques: qualitative and quantitative. Qualitative forecasting techniques are based on expert opinion and judgment. Quantitative forecasting techniques are based on statistical methods for analyzing quantitative historical data. This chapter focuses on quantitative forecasting techniques. In general, quantitative forecasting techniques are used whenever the following conditions are true: historical data relating to the variable to be forecast exist, the historical data can be quantified, and you are willing to assume that the historical pattern will continue into the future.

General Forecasting Issues
Model Specification: The process of selecting the forecasting technique to be used in a particular situation.
Model Fitting: The process of estimating the specified model's parameters to achieve an adequate fit of the historical data.
Model Diagnosis: The process of determining how well a model fits past data and how well the model's assumptions appear to be satisfied.
Forecasting Horizon: The number of future periods covered by a forecast. It is sometimes referred to as forecast lead time.
Forecasting Period: The unit of time for which forecasts are to be made.

Decision makers who are actively involved in forecasting frequently say that forecasting is both an art and a science.
Operationally, the forecaster is engaged in the process of modeling a real-world system. Determining the appropriate forecasting model is a challenging task, but it can be made manageable by employing the same model-building process discussed in Chapter 15 consisting of model specification, model fitting, and model diagnosis. As we will point out in later sections, guidelines exist for determining which techniques may be more appropriate than others in certain situations. However, you may have to specify (and try) several model forms for a given situation before deciding on one that is acceptable. The idea is that if the future tends to look like the past, a model should adequately fit the past data to have a reasonable chance of forecasting the future. As a forecaster, you will spend much time selecting a model's specification and estimating its parameters to reach an acceptable fit of the past data. You will need to determine how well a model fits past data, how well it performs in mock forecasting trials, and how well its assumptions appear to be satisfied. If the model is unacceptable in any of these areas, you will be forced to revert to the model specification step and begin again. An important consideration when you are developing a forecasting model is to use the simplest available model that will meet your forecasting needs. The objective of forecasting is to provide good forecasts. You do not need to feel that a sophisticated approach is better if a simpler one will provide acceptable forecasts. As in football, in which some players specialize in defense and others in offense, forecasting techniques have been developed for special situations, which are generally dependent on the forecasting horizon. For the purpose of categorizing forecasting techniques in most business situations, the forecast horizon, or lead time, is typically divided into four categories:
1. Immediate term—less than one month
2. Short term—one to three months
3. Medium term—three months to two years
4. Long term—two years or more

Forecasting Interval: The frequency with which new forecasts are prepared.

As we introduce various forecasting techniques, we will indicate the forecasting horizon(s) for which each is typically best suited. In addition to determining the desired forecasting horizon, the forecaster must determine the forecasting period. For instance, the forecasting period might be a day, a week, a month, a quarter, or a year. Thus, the forecasting horizon consists of one or more forecasting periods. If quantitative forecasting techniques are to be employed, historical quantitative data must be available for a similar period. If we want weekly forecasts, weekly historical data must be available. The forecasting interval is generally the same length as the forecast period. That is, if the forecast period is one week, then we will provide a new forecast each week.

Components of a Time Series
Quantitative forecasting models have one factor in common: They use past measurements of the variable of interest to generate a forecast of the future. The past data, measured over time, are called time-series data. The decision maker who plans to develop a quantitative forecasting model must analyze the relevant time-series data.

Chapter Outcome 1.
BUSINESS APPLICATION: IDENTIFYING TIME-SERIES COMPONENTS
WEB SITE DESIGN AND CONSULTING
For the past four years, White-Space, Inc., has been helping firms to design and implement Web sites. The owners need to forecast revenues in order to make sure they have ample cash flows to operate the business. In forecasting this company's revenue for next year, they plan to consider the historical pattern over the prior four years. They want to know whether demand for consulting services has tended to increase or decrease and whether there have been particular times during the year when demand was typically higher than at other times. The forecasters can perform a time-series analysis of the historical sales. Table 16.1 presents the time-series data for the revenue generated by the firm's sales for the four-year period. An effective means for analyzing these data is to develop a time-series plot, or line chart, as shown in Figure 16.1. By graphing the data, much can be observed about the firm's revenue over the past four years. The time-series plot is an important tool in identifying the time-series components. All time-series data exhibit one or more of the following:
1. Trend component
2. Seasonal component
3. Cyclical component
4. Random component

Linear Trend: A long-term increase or decrease in a time series in which the rate of change is relatively constant.

Trend Component
A trend is the long-term increase or decrease in a variable being measured over time.
Figure 16.1 shows that White-Space's revenues exhibited an upward trend over the four-year period. In other situations, the time series may exhibit a downward trend. Trends can be classified as linear or nonlinear. A trend can be observed when a time series is measured in any time increment, such as years, quarters, months, or days. Figure 16.1 shows a good example of a positive linear trend. Time-series data that exhibit a linear trend will tend to increase or decrease at a fairly constant rate. However, not all trends are linear.

TABLE 16.1 | Time-Series Data for Sales Revenues (Thousands of Dollars)

Billing Total
Month     | 2006 | 2007 | 2008 | 2009
January   | 170  | 390  | 500  | 750
February  | 200  | 350  | 470  | 700
March     | 190  | 300  | 510  | 680
April     | 220  | 320  | 480  | 710
May       | 180  | 310  | 530  | 710
June      | 230  | 350  | 500  | 660
July      | 220  | 380  | 540  | 630
August    | 260  | 420  | 580  | 670
September | 300  | 460  | 630  | 700
October   | 330  | 500  | 690  | 720
November  | 370  | 540  | 770  | 850
December  | 390  | 560  | 760  | 880
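A quick way to quantify the upward trend in Table 16.1 is to fit a least-squares trend line to the monthly billings. The sketch below does this for the 2006 column only, using numpy; it is an illustrative calculation, not the formal trend-based forecasting model developed in Section 16.2.

```python
import numpy as np

# 2006 monthly billings from Table 16.1 ($1,000s), January through December
billings = np.array([170, 200, 190, 220, 180, 230, 220, 260, 300, 330, 370, 390])
t = np.arange(1, 13)  # time index: 1 = January, ..., 12 = December

# Least-squares linear trend: billings = b0 + b1 * t
b1, b0 = np.polyfit(t, billings, 1)

print(f"trend: billings = {b0:.1f} + {b1:.1f} * t")
```

The fitted slope works out to roughly 19, meaning billings grew by about $19,000 per month during 2006, consistent with the positive linear trend visible in Figure 16.1.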

<span class='text_page_counter'>(251)</span> 712. CHAPTER 16. FIGURE 16.1. |. Analyzing and Forecasting Time-Series Data. |. BILLINGS. 1,000. Time-Series Plot for Billing Data. 900. $ in 1,000s. 800 700 600 500 400 300 200 100 0 January 2006. July. January 2007. July. January 2008. July. January 2009. July. Many time series will show a nonlinear trend. For instance in the 8 years between 2001 and 2008, total annual game attendance for the New York Yankees Major League baseball team is shown in Figure 16.2. Attendance was fairly flat between 2001 and 2003, increased dramatically between 2003 and 2006, and then slowed down again through 2008.. Seasonal Component A wavelike pattern that is repeated throughout a time series and has a recurrence period of at most one year.. FIGURE 16.2. |. New York Yankees Annual Attendance Showing a Nonlinear Trend. Seasonal Component Another component that may be present in time-series data is the seasonal component. Many time series show a repeating pattern over time. For instance, Figure 16.1 showed a time series that exhibits a wavelike pattern. This pattern repeats itself throughout the time series. Web site consulting revenues reach an annual maximum around January and then decline to an annual minimum around April. This pattern repeats itself every 12 months. The shortest period of repetition for a pattern is known as its recurrence period. A seasonal component’s recurrence period is at most one year. If the time series exhibits a repetitious pattern with a recurrence period longer than a year, the time series is said to exhibit a cyclical effect—a concept to be explored shortly. In analyzing past sales data for a retail toy store, we would expect to see sales increase in the months leading into Christmas and then substantially decrease after Christmas. Automobile.

<span class='text_page_counter'>(252)</span> CHAPTER 16. FIGURE 16.3. |. |. Analyzing and Forecasting Time-Series Data. 713. 4,500. Hotel Sales by Quarter 4,000. Sales in Millions. 3,500 3,000 2,500 2,000 1,500 1,000 500 0 Sum.’04 Wint.’04 Sum.’05 Wint.’05 Sum.’06 Wint.’06 Sum.’07 Wint.’07 Sum.’08 Wint.’08 Sum.’09 Wint.’09 Year. gasoline sales might show a seasonal increase during the summer months, when people drive more, and a decrease during the cold winter months. These predictable highs and lows at specific times during the year indicate seasonality in data. To view seasonality in a time series, the data must be measured quarterly, monthly, weekly, or daily. Annual data will not show seasonal patterns of highs and lows. Figure 16.3 shows quarterly sales data for a major hotel chain from June 2004 through December 2009. Notice that the data exhibit a definite seasonal pattern. The local maximums occur in the spring. The recurrence period of the component in the time series is, therefore, one year. The winter quarter tends to be low, whereas the following quarter (spring) is the high quarter each year. Seasonality can be observed in time-series data measured over time periods shorter than a year. For example, the number of checks processed daily by a bank may show predictable highs and lows at certain times during a month. The pattern of customers arriving at the bank during any hour may be “seasonal” within a day, with more customers arriving near opening time, around the lunch hour, and near closing time.. Cyclical Component A wavelike pattern within the time series that repeats itself throughout the time series and has a recurrence period of more than one year.. Random Component Changes in time-series data that are unpredictable and cannot be associated with a trend, seasonal, or cyclical component.. 
Cyclical Component If you observe time-series data over a long enough time span, you may see sustained periods of high values followed by periods of lower values. If the recurrence period of these fluctuations is larger than a year, the data are said to contain a cyclical component. National economic measures such as the unemployment rate, gross national product, stock market indexes, and personal saving rates tend to cycle. The cycles vary in length and magnitude. That is, some cyclical time series may have longer runs of high and low values than others. Also, some time series may exhibit deeper troughs and higher crests than others. Figure 16.4 shows quarterly housing starts in the United States between 1995 and 2006. Note the definite cyclical pattern, with low periods in 1995, 1997, and 2000. Although the pattern resembles the shape of a seasonal component, the length of the recurrence period identifies this pattern as being the result of a cyclical component. Random Component Although not all time series possess a trend, seasonal, or cyclical component, virtually all time series will have a random component. The random component is often referred to as “noise” in the data. A time series with no identifiable pattern is completely random and contains only noise. In addition to other components, each of the time series in Figures 16.1 through 16.4 contains random fluctuations. In the following sections of this chapter, you will see how various forecasting techniques deal with the time-series components. An important first step in forecasting is to identify which components are present in the time series to be analyzed. As we have shown, constructing a time-series plot is the first step in this process..

FIGURE 16.4 | Time-Series Plot of Housing Starts
[Line chart of monthly U.S. housing starts (vertical axis labeled Housing Starts (Millions), roughly 1,200 to 2,400), plotted from January 1995 through January 2006.]

Chapter Outcome 2.

Base Period Index
The time-series value to which all other values in the time series are compared. The index number for the base period is defined as 100.

Introduction to Index Numbers
When analyzing time-series data, decision makers must often compare one value measured at one point in time with other values measured at different points in time. For example, a real estate broker may wish to compare house prices in 2009 with house prices in previous years. A common procedure for making relative comparisons is to begin by determining a base period index to which all other data values can be fairly compared. Equation 16.1 is used to make relative comparisons for data found in different periods by calculating a simple index number.

Simple Index Number

I_t = (y_t / y_0) × 100    (16.1)

where:
I_t = Index number at time period t
y_t = Value of the time series at time t
y_0 = Value of the time series at the index base period

EXAMPLE 16-1 COMPUTING SIMPLE INDEX NUMBERS

Wilson Windows, Inc. The managers at Wilson Windows, Inc., are considering the purchase of a window and door plant in Wisconsin. The current owners of the window and door plant have touted their company's rapid sales growth over the past 10 years as a reason for their asking price. Wilson executives wish to convert the company's sales data to index numbers. The following steps can be used to do this:
Step 1 Obtain the time-series data.
The company has sales data for each of the 10 years since 2000.
Step 2 Select a base period.
Wilson managers have selected 2000 as the index base period. Sales in 2000 were $14.0 million.

Step 3 Compute the simple index numbers for each year using Equation 16.1.
For instance, sales in 2001 were $15.2 million. Using Equation 16.1, the index for 2001 is

I_2001 = (y_t / y_0) × 100 = (15.2 / 14.0) × 100 = 108.6

For the 10 years, we get:

Year    Sales ($ millions)    Index
2000    14.0                  100.0
2001    15.2                  108.6
2002    17.8                  127.1
2003    21.4                  152.9
2004    24.6                  175.7
2005    30.5                  217.9
2006    29.8                  212.9
2007    32.4                  231.4
2008    37.2                  265.7
2009    39.1                  279.3

>> END EXAMPLE

TRY PROBLEM 16-8 (pg. 722)

Referring to Example 16-1, we can use the index numbers to determine the percentage change any year is from the base year. For instance, sales in 2007 have an index of 231.4. This means that sales in 2007 are 131.4% above sales in the base year of 2000. Sales in 2009 are 179.3% higher than they were in 2000.
Note that although you can use the index number to compare values between any one time period and the base period and can express the difference in percentage-change terms, you cannot compare period-to-period changes by subtracting the index numbers. For instance, in Example 16-1, when comparing sales for 2008 and 2009, we cannot say that the growth has been

279.3 − 265.7 = 13.6%

To determine the actual percentage growth, we do the following:

((279.3 − 265.7) / 265.7) × 100 = 5.1%

Thus, the sales growth rate between 2008 and 2009 has been 5.1%, not 13.6%.

Aggregate Price Indexes

Aggregate Price Index
An index that is used to measure the rate of change from a base period for a group of two or more items.

"The dollar's not worth what it once was" is a saying that everyone has heard. The problem is that nothing is worth what it used to be; sometimes it is worth more, and other times it is worth less. The simple index shown in Equation 16.1 works well for comparing prices when we wish to analyze the price of a single item over time.
For instance, we could use the simple index to analyze how apartment rents have changed over time or how college tuition has increased over time. However, if we wish to compare prices of a group of items, we might construct an aggregate price index.
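The simple index calculation, and the period-to-period growth check discussed above, can be sketched in a few lines of Python. This is a minimal sketch (not part of the text's Excel/Minitab workflow) using the Wilson Windows sales figures from Example 16-1:

```python
# Simple index numbers (Equation 16.1): I_t = (y_t / y_0) * 100,
# using the Wilson Windows sales data from Example 16-1.
sales = {2000: 14.0, 2001: 15.2, 2002: 17.8, 2003: 21.4, 2004: 24.6,
         2005: 30.5, 2006: 29.8, 2007: 32.4, 2008: 37.2, 2009: 39.1}

base = sales[2000]  # 2000 is the index base period
index = {year: round(y / base * 100, 1) for year, y in sales.items()}
print(index[2001])  # 108.6

# Percentage growth between two periods must use the ratio of the
# index values, not their difference.
growth_08_09 = (index[2009] - index[2008]) / index[2008] * 100
print(round(growth_08_09, 1))  # 5.1, not 279.3 - 265.7 = 13.6
```

The last two lines reproduce the point made above: subtracting index numbers overstates the 2008-to-2009 growth (13.6%) relative to the actual 5.1%.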

Equation 16.2 is used to compute an unweighted aggregate price index.

Unweighted Aggregate Price Index

I_t = (Σp_t / Σp_0) × 100    (16.2)

where:
I_t = Unweighted aggregate index at time period t
Σp_t = Sum of the prices for the group of items at time period t
Σp_0 = Sum of the prices for the group of items at the base time period

EXAMPLE 16-2 COMPUTING AN UNWEIGHTED AGGREGATE PRICE INDEX

College Costs There have been many news stories recently discussing the rate of growth of college and university costs. One university is interested in analyzing the growth in the total costs for students over the past five years. The university wishes to consider three main costs: tuition and fees, room and board, and books and supplies. Rather than analyzing these factors individually using three simple indexes, an unweighted aggregate price index can be developed using the following steps:
Step 1 Define the variables to be included in the index and gather the time-series data.
The university has identified three main categories of costs: tuition and fees, room and board, and books and supplies. Data for the past five years have been collected. Full-time tuition and fees for two semesters are used. The full dorm-and-meal package offered by the university is priced for the room-and-board variable, and the books-and-supplies cost for a "typical" student is used for that component of the total costs.
Step 2 Select a base period.
The base period for this study will be the 2004–2005 academic year.
Step 3 Use Equation 16.2 to compute the unweighted aggregate price index.
The equation is

I_t = (Σp_t / Σp_0) × 100

The sum of the prices for the three components during the base academic year of 2004–2005 is $13,814. The sum of the prices in the 2008–2009 academic year is $19,492. Applying Equation 16.2, the unweighted aggregate price index is

I_2008–2009 = ($19,492 / $13,814) × 100 = 141.1
This means, as a group, the components making up the cost of attending this university have increased by 41.1% since the 2004–2005 academic year. The indexes for the other years are shown as follows:

Academic Year    Tuition & Fees ($)    Room & Board ($)    Books & Supplies ($)    Σp_t ($)    Index
2004–2005        7,300                 5,650               864                     13,814      100.0
2005–2006        7,720                 5,980               945                     14,645      106.0
2006–2007        8,560                 6,350               1,067                   15,977      115.7
2007–2008        9,430                 6,590               1,234                   17,254      124.9
2008–2009        10,780                7,245               1,467                   19,492      141.1

>> END EXAMPLE

TRY PROBLEM 16-9 (pg. 722)
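The unweighted aggregate computation in Example 16-2 can be sketched the same way. A minimal Python sketch using the cost table above (each year's three prices are summed and compared to the 2004–2005 base total):

```python
# Unweighted aggregate price index (Equation 16.2):
# I_t = (sum of prices at t / sum of prices at base) * 100,
# using the university cost data from Example 16-2
# (tuition & fees, room & board, books & supplies).
costs = {
    "2004-2005": [7300, 5650, 864],
    "2005-2006": [7720, 5980, 945],
    "2006-2007": [8560, 6350, 1067],
    "2007-2008": [9430, 6590, 1234],
    "2008-2009": [10780, 7245, 1467],
}

base_total = sum(costs["2004-2005"])  # $13,814
indexes = {yr: round(sum(p) / base_total * 100, 1) for yr, p in costs.items()}
print(indexes["2008-2009"])  # 141.1
```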

Weighted Aggregate Price Indexes
Example 16-2 utilized an unweighted aggregate price index to determine the change in university costs. This was appropriate because each student would incur the same set of three costs. However, in some situations, the items composing a total cost are not equally weighted. For instance, in a consumer price study of a "market basket" of 10 food items, a typical household will not use the same number (or volume) of each item. During a week, a typical household might use three gallons of milk, but only two loaves of bread. In these types of situations, we need to compute a weighted aggregate price index to account for the different levels of use. Two common weighted indexes are the Paasche Index and the Laspeyres Index.
The Paasche Index Equation 16.3 is used to compute a Paasche Index. Note that the weighting percentage in Equation 16.3 for the Paasche Index is always the percentage for the time period for which the index is being computed. The idea is that the prices in the base period should be weighted relative to their current use, not to what that use level was in other periods.

Paasche Index

I_t = (Σq_t·p_t / Σq_t·p_0) × 100    (16.3)

where:
q_t = Weighting percentage at time t
p_t = Price in time period t
p_0 = Price in the base period

EXAMPLE 16-3 COMPUTING THE PAASCHE INDEX

Wage Rates Before a company makes a decision to locate a new manufacturing plant in a community, the managers will be interested in knowing how the wage rates have changed. Two categories of wages are to be analyzed as a package: production hourly wages and administrative/clerical hourly wages. Annual data showing the average hourly wage rates since 2000 are available. Each year, the makeup of the labor market differs in terms of the percentage of employees in the two categories.
To compute a Paasche Index, use the following steps:
Step 1 Define the variables to be included in the index and gather the time-series data.
The variables are the mean hourly wage for production workers and the mean hourly wage for administrative/clerical workers. Data are collected for the 10-year period through 2009.
Step 2 Select the base period.
Because similar data for another community are available only back to 2003, the company will use 2003 as the base period to make comparisons between the two communities easier.
Step 3 Use Equation 16.3 to compute the Paasche Index.
The equation is

I_t = (Σq_t·p_t / Σq_t·p_0) × 100

The hourly wage rate for production workers in the base year 2003 was $10.80, whereas the average hourly administrative/clerical rate was $10.25.

In 2009, the production hourly rate had increased to $15.45, and the administrative/clerical rate was $13.45. In 2009, 60% of the employees in the community were designated as working in production and 40% were administrative/clerical. Equation 16.3 is used to compute the Paasche Index for 2009, as follows:

I_2009 = [(0.60)($15.45) + (0.40)($13.45)] / [(0.60)($10.80) + (0.40)($10.25)] × 100 = 138.5

This means that, overall, the wage rates in this community have increased by 38.5% since the base year of 2003. The following table shows the Paasche Indexes for all years.

Year    Production Wage Rate ($)    Percent Production    Administrative/Clerical Wage Rate ($)    Percent Admin./Clerical    Paasche Index
2000    8.50     0.78    9.10     0.22    80.8
2001    9.10     0.73    9.45     0.27    86.3
2002    10.00    0.69    9.80     0.31    93.5
2003    10.80    0.71    10.25    0.29    100.0
2004    11.55    0.68    10.60    0.32    105.9
2005    12.15    0.67    10.95    0.33    110.7
2006    12.85    0.65    11.45    0.35    116.5
2007    13.70    0.65    11.90    0.35    123.2
2008    14.75    0.62    12.55    0.38    131.4
2009    15.45    0.60    13.45    0.40    138.5

>> END EXAMPLE

TRY PROBLEM 16-12 (pg. 723)

The Laspeyres Index The Paasche Index is computed using the logic that the index for the current period should be compared to a base period with the current period weightings. An alternate index, called the Laspeyres Index, uses the base-period weighting in its computation, as shown in Equation 16.4.

Laspeyres Index

I_t = (Σq_0·p_t / Σq_0·p_0) × 100    (16.4)

where:
q_0 = Weighting percentage at base period
p_t = Price in time period t
p_0 = Price in base period

EXAMPLE 16-4 COMPUTING THE LASPEYRES INDEX

Wage Rates Refer to Example 16-3, in which the managers of a company are interested in knowing how the wage rates have changed in the community in which they are considering building a plant.
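Equations 16.3 and 16.4 differ only in which period supplies the weights. A minimal Python sketch of both computations for 2009, using the wage figures from Example 16-3 (current-period weights 0.60/0.40 give the Paasche Index; base-period weights 0.71/0.29 from 2003 give the Laspeyres Index worked out by hand in Example 16-4):

```python
# Weighted aggregate wage index for 2009 versus the 2003 base,
# using the wage data from Examples 16-3 and 16-4.
p0 = [10.80, 10.25]   # 2003 wages: production, administrative/clerical
pt = [15.45, 13.45]   # 2009 wages

def weighted_index(weights, pt, p0):
    """Equations 16.3/16.4: ratio of weighted price sums, times 100."""
    num = sum(w * p for w, p in zip(weights, pt))
    den = sum(w * p for w, p in zip(weights, p0))
    return round(num / den * 100, 1)

# Paasche: weights from the period being indexed (2009: 60% / 40%)
print(weighted_index([0.60, 0.40], pt, p0))  # 138.5
# Laspeyres: weights from the base period (2003: 71% / 29%)
print(weighted_index([0.71, 0.29], pt, p0))  # 139.7
```

The same function computes either index; only the weight vector changes.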
Two categories of wages are to be analyzed as a package: production hourly wages and administrative/clerical hourly wages. Annual data showing the average hourly wage rate since.

2000 are available. Each year, the makeup of the labor market differs in terms of the percentage of employees in the two categories.
To compute a Laspeyres Index, use the following steps:
Step 1 Define the variables to be included in the index and gather the time-series data.
The variables are the mean hourly wage for production workers and the mean hourly wage for administrative/clerical workers. Data are collected for the 10-year period through 2009.
Step 2 Select the base period.
Because similar data for another community are available only back to 2003, the company will use 2003 as the base period to make comparisons between the two communities easier.
Step 3 Use Equation 16.4 to compute the Laspeyres Index.
The equation is

I_t = (Σq_0·p_t / Σq_0·p_0) × 100

The hourly wage rate for production workers in the base year of 2003 was $10.80, whereas the average hourly administrative/clerical rate was $10.25. In that year, 71% of the workers were classified as production. In 2009, the production hourly rate had increased to $15.45, and the administrative/clerical rate was at $13.45. Equation 16.4 is used to compute the Laspeyres Index for 2009, as follows:

I_2009 = [(0.71)($15.45) + (0.29)($13.45)] / [(0.71)($10.80) + (0.29)($10.25)] × 100 = 139.7

This means that, overall, the wage rates in this community have increased by 39.7% since the base year of 2003. The following table shows the Laspeyres Indexes for all years.

Year    Production Wage Rate ($)    Percent Production    Administrative/Clerical Wage Rate ($)    Percent Admin./Clerical    Laspeyres Index
2000    8.50     0.78    9.10     0.22    81.5
2001    9.10     0.73    9.45     0.27    86.5
2002    10.00    0.69    9.80     0.31    93.4
2003    10.80    0.71    10.25    0.29    100.0
2004    11.55    0.68    10.60    0.32    106.0
2005    12.15    0.67    10.95    0.33    110.9
2006    12.85    0.65    11.45    0.35    116.9
2007    13.70    0.65    11.90    0.35    123.8
2008    14.75    0.62    12.55    0.38    132.6
2009    15.45    0.60    13.45    0.40    139.7

>> END EXAMPLE

TRY PROBLEM 16-13 (pg. 723)

Commonly Used Index Numbers
In addition to converting time-series data to index numbers, you will encounter a variety of indexes in your professional and personal life.
Consumer Price Index To most of us, inflation has come to mean increased prices and less purchasing power for our dollar. The Consumer Price Index (CPI) attempts to measure the overall changes in retail prices for goods and services. The CPI, originally published in

1913 by the U.S. Department of Labor, uses a "market basket" of goods and services purchased by a typical wage earner living in a city. The CPI, a weighted aggregate index similar to a Laspeyres Index, is based on items grouped into seven categories, including food, housing, clothing, transportation, medical care, entertainment, and miscellaneous items. The items in the market basket have changed over time to keep pace with the buying habits of our society and as new products and services have become available. Since 1945, the base period used to construct the CPI has been updated. Currently, the base period, 1982 to 1984, has an index of 100. Table 16.2 shows the CPI index values for 1996 to 2008.

TABLE 16.2 | CPI Index (1996 to 2008)

Year    1996     1997     1998     1999     2000     2001     2002     2003     2004     2005     2006     2007     2008
CPI     156.9    157.0    160.5    166.6    172.2    177.1    179.9    184.0    188.9    195.3    201.6    207.3    215.3

Base = 1982 to 1984 (Index = 100)
Source: Bureau of Labor Statistics.

For instance, the index for 2005 is 195.3, which means that the price of the market basket of goods increased 95.3% between 1984 and 2005. Remember also that you cannot determine the inflation rate by subtracting index values for successive years. Instead, you must divide the difference by the earlier year's index. For instance, the rate of inflation between 2004 and 2005 was

Inflation rate = ((195.3 − 188.9) / 188.9) × 100 = 3.39%

Thus, in general terms, if your income did not increase by at least 3.39% between 2004 and 2005, you failed to keep pace with inflation and your purchasing power was reduced.
Producer Price Index The U.S. Bureau of Labor Statistics publishes the Producer Price Index (PPI) on a monthly basis to measure the rate of change in nonretail prices. Like the CPI, the PPI is a Laspeyres weighted aggregate index. This index is used as a leading indicator of upcoming changes in the CPI.
Table 16.3 shows the PPI between 1996 and 2005.

TABLE 16.3 | PPI Index (1996 to 2005)

Year    1996     1997     1998     1999     2000     2001     2002     2003     2004     2005
PPI     127.7    127.6    124.4    125.5    132.7    134.2    131.1    138.1    142.7    157.4

Base = 1984 (Index = 100)
Source: Bureau of Labor Statistics.

Stock Market Indexes Every night on the national and local TV news, reporters tell us what happened on the stock market that day by reporting on the Dow Jones Industrial Average (DJIA). The Dow, as this index is commonly referred to, is not the same type of index as the CPI or PPI, in that it is not a percentage of a base year. Rather, the DJIA is the sum of the stock prices for 30 large industrial companies whose stocks trade on the New York Stock Exchange divided by a factor that is adjusted for stock splits. Many analysts use the DJIA, which is computed daily, as a measure of the health of the stock market. Other analysts prefer other indexes, such as the Standard and Poor's 500 (S&P 500). The S&P 500 includes stock prices for 500 companies and is thought by some to be more representative of the broader market.

The NASDAQ is an index made up of stocks on the NASDAQ exchange and is heavily influenced by technology-based companies that are traded on this exchange. Publications such as The Wall Street Journal and Barron's publish all these indexes and others every day for investors to use in their investing decisions.

Using Index Numbers to Deflate a Time Series
A common use of index numbers is to convert values measured at different times into more directly comparable values. For instance, if your wages increase, but at a rate less than inflation, you will in fact be earning less in "real terms." A company experiencing increasing sales at a rate of increase less than inflation is actually not increasing in "real terms."

BUSINESS APPLICATION DEFLATING TIME-SERIES VALUES USING INDEX VALUES

WYMAN-GORMAN COMPANY The Wyman-Gorman Company, located in Massachusetts, designs and produces forgings, primarily for internal combustion engines. The company has recently been experiencing some financial difficulty and has discontinued its agricultural and earthmoving divisions. Table 16.4 shows sales in millions of dollars for the company since 1996. Also shown is the PPI (Producer Price Index) for the same years. Finally, sales, adjusted to 1984 dollars, are also shown. Equation 16.5 is used to determine the adjusted time-series values.

Deflation Formula

y_adj(t) = (y_t / I_t) × 100    (16.5)

where:
y_adj(t) = Deflated time-series value at time t
y_t = Actual value of the time series at time t
I_t = Index (such as CPI or PPI) at time t

For instance, in 1996 sales were $610.3 million. The PPI for that year was 127.7. The sales, adjusted to 1984 dollars, is

y_adj(1996) = (610.3 / 127.7) × 100 = $477.9

TABLE 16.4 | Deflated Sales Data—Using Producer Price Index (PPI)

Year    Sales ($ millions)    PPI (Base = 1984)    Sales ($ millions, adjusted to 1984 dollars)
1996    610.3    127.7    477.9
1997    473.1    127.6    370.8
1998    383.5    124.4    308.3
1999    425.5    125.5    339.0
2000    384.1    132.7    289.4
2001    341.1    134.2    254.2
2002    310.3    131.1    236.7
2003    271.6    138.1    196.7
2004    371.6    142.7    260.4
2005    390.2    157.4    247.9
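Equation 16.5 is a one-line computation per period. A minimal Python sketch that reproduces three of the deflated values in Table 16.4:

```python
# Deflating a time series (Equation 16.5): y_adj = (y_t / I_t) * 100,
# using selected Wyman-Gorman sales and PPI values from Table 16.4.
sales = {1996: 610.3, 2000: 384.1, 2005: 390.2}   # $ millions
ppi   = {1996: 127.7, 2000: 132.7, 2005: 157.4}   # base = 1984

deflated = {yr: round(sales[yr] / ppi[yr] * 100, 1) for yr in sales}
print(deflated[1996])  # 477.9
print(deflated[2005])  # 247.9
```

Dividing by the index and multiplying by 100 restates each year's sales in base-period (1984) dollars, which is what makes the adjusted column of Table 16.4 directly comparable across years.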

16-1: Exercises (MyStatLab)

Skill Development
16-1. What is meant by time-series data? Give an example.
16-2. Explain the difference between time-series data and cross-sectional data. Are these two types of data sets mutually exclusive? What do they have in common? How do they differ?
16-3. What are the differences between quantitative and qualitative forecasting techniques? Under what conditions is it appropriate to use a quantitative technique?
16-4. Provide an example of a business decision that requires (1) a short-term forecast, (2) a medium-term forecast, and (3) a long-term forecast.
16-5. What is meant by the trend component of a time series? How is a linear trend different from a nonlinear trend?
16-6. Must a seasonal component be associated with the seasons (fall, spring, summer, winter) of the year? Provide an example of a seasonal effect that is not associated with the seasons of the year.
16-7. A Greek entrepreneur followed the olive harvests. He noted that olives ripen in September. Each March he would try to determine if the upcoming olive harvest would be especially bountiful. If his analysis indicated it would, he would enter into agreements with the owners of all the olive oil presses in the region. In exchange for a small deposit months ahead of the harvest, he would obtain the right to lease the presses at market prices during the harvest. If he was correct about the harvest and demand for olive oil presses boomed, he could make a great deal of money. Identify the following quantities in the context of this scenario:
a. forecasting horizon
b. category that applies to the forecasting horizon identified in part a
c. forecasting period
d. forecasting interval
16-8. Consider the following median selling prices ($thousands) for homes in a community:

Year    Price
1       320
2       334
3       329
4       344
5       358
6       347
7       383
8       404
9       397
10      411

a. Use year 1 as a base year and construct a simple index number to show how the median selling price has increased.
b. Determine the actual percentage growth in the median selling price between the base year and year 10.
c. Determine the actual percentage growth in the median selling price between the base year and year 5.
d. Determine the actual percentage growth in the median selling price between year 5 and year 10.
16-9. The following values represent advertising rates paid by a regional catalog retailer that advertises either on radio or in newspapers:

Year    Radio Rates ($)    Newspaper Rates ($)
1       300                400
2       310                420
3       330                460
4       346                520
5       362                580
6       380                640
7       496                660

a. Determine a relative index for each type of advertisement using year 1 as the base year.
b. Determine an unweighted aggregate index for the two types of advertisement.
c. In year 1 the retailer spent 30% of the advertisement budget on radio advertising. Construct a Laspeyres index for the data.
d. Using year 1 as the base, construct a Paasche index for the same data.

Business Applications
Problems 16-10 through 16-13 refer to Gallup Construction and Paving, a company whose primary business has been constructing homes in planned communities in the upper Midwest. The company has kept a record of the relative cost of labor and materials in its market areas for the last 11 years. These data are as follows:

Year    Hourly Wages ($)    Average Material Cost ($)
1999    30.10    66,500
2000    30.50    68,900
2001    31.70    70,600
2002    32.50    70,900
2003    34.00    71,200
2004    35.50    71,700
2005    35.10    72,500
2006    35.05    73,700
2007    34.90    73,400
2008    33.80    74,100
2009    34.20    74,000

[Figure: Retail Forward Future Spending Index™ (December 2005 = 100), plotted monthly from Jun. 2005 through Jun. 2006; plotted values as labeled on the chart: 107.5, 104.6, 102.8, 103.5, 99.7, 99.1, 96.8, 97.3, 101.6, 95.9, 94.0, 101.3, 99.6.]

16-10. Using 1999 as the base year, construct a separate index for each component in the construction of a house.
16-11. Plot both series of data and comment on the trend you see in both plots.
16-12. Construct a Paasche index for 2004 using the data. Use 1999 as the base year and assume that in 2004 60% of the cost of a townhouse was in materials.
16-13. Construct a Laspeyres index using the data, assuming that in 1999, 40% of the cost of a townhouse was labor.
16-14. Retail Forward, Inc., is a global management consulting and market research firm specializing in retail intelligence and strategies. One of its press releases (June Consumer Outlook: Spending Plans Show Resilience, June 1, 2006) divulged the result of the Retail Forward ShopperScape™ survey conducted each month from a sample of 4,000 U.S. primary household shoppers. A measure of consumer spending is represented by the figure at the top of the page:
a. Describe the type of index used by Retail Forward to explore consumer spending.
b. Determine the actual percentage change in the Future Spending Index between December 2005 and June 2006.
c. Determine the actual percentage change in the Future Spending Index between June 2005 and June 2006.

Computer Database Exercises
16-15. The Energy Information Administration (EIA), created by Congress in 1977, is a statistical agency of the U.S. Department of Energy. It provides data, forecasts, and analyses to promote sound policymaking and public understanding regarding energy and its interaction with the economy and the environment. The price of the sources of energy is becoming more and more important as our natural resources are consumed.
The file entitled Prices contains data for the period 1993–2008 concerning the price of gasoline ($/gal.), natural gas ($/cu. ft.), and electricity (cents/kilowatt hr.).
a. Using 1993 as the base, calculate an aggregate energy price index for these three energy costs.
b. Determine the actual percentage change in the aggregate energy prices between 1993 and 2008.
c. Determine the actual percentage change in the aggregate energy prices between 1998 and 2008.
16-16. The federal funds rate is the interest rate charged by banks when banks borrow "overnight" from each other. The funds rate fluctuates according to supply and demand and is not under the direct control of the Federal Reserve Board, but is strongly influenced by the Fed's actions. The file entitled The Fed contains the federal funds rates for the period 1955–2008.
a. Construct a time-series plot for the federal funds rate for the period 1955–2008.
b. Describe the time-series components that are present in the data set.
c. Indicate the recurrence periods for any seasonal or cyclical components.
16-17. The Census Bureau of the Department of Commerce released the U.S. retail e-commerce sales for the period Fourth Quarter 1999–Fourth Quarter 2008. The file entitled E-Commerce contains that data.
a. Using the fourth quarter of 1999 as the base, calculate a Laspeyres Index for the retail sales for the period of Fourth Quarter 1999–Fourth Quarter 2008.
b. Determine the actual percentage change in the retail sales for the period Fourth Quarter 1999–First Quarter 2004.
c. Determine the actual percentage change in the retail sales for the period First Quarter 2004–First Quarter 2006.

END EXERCISES 16-1

16.2 Trend-Based Forecasting Techniques

As we discussed in Section 16.1, some time series exhibit an increasing or decreasing trend. Further, the trend may be linear or nonlinear. A plot of the data will be very helpful in identifying which, if any, of these trends exist.

Developing a Trend-Based Forecasting Model
In this section, we introduce trend-based forecasting techniques. As the name implies, these techniques are used to identify the presence of a trend and to model that trend. Once the trend model has been defined, it is used to provide forecasts for future time periods.

Chapter Outcomes 3 and 4.
Excel and Minitab Tutorial

BUSINESS APPLICATION LINEAR TREND FORECASTING

THE TAFT ICE CREAM COMPANY The Taft Ice Cream Company is a family-operated company selling gourmet ice cream to resort areas, primarily on the North Carolina coast. Figure 16.5 displays the annual sales data for the 10-year period 1997–2006 and shows the time-series plot illustrating that sales have trended up in the 10-year period. These data are in a file called Taft. Taft's owners are considering expanding their ice cream manufacturing facilities. As part of the bank's financing requirements, the managers are asked to supply a forecast of future sales. Recall from our earlier discussions that the forecasting process has three steps: (1) model specification, (2) model fitting, and (3) model diagnosis.
Step 1 Model Specification
The time-series plot in Figure 16.5 indicates that sales have exhibited a linear growth pattern. A possible forecasting tool is a linear trend (straight-line) model.
Step 2 Model Fitting
Because we have specified a linear trend model, the process of fitting can be accomplished using least squares regression analysis of a form described by Equation 16.6.

FIGURE 16.5 | Excel 2007 Output Showing Taft Ice Cream Sales Trend Line

Excel 2007 Instructions:
1. Open file: Taft.xls.
2. Select data in the Sales data column.
3. Click on Insert > Line Chart.
4. Click on Select Data.
5. Under Horizontal (categories) Axis Labels, select data in Year column.
6. Click on Layout > Chart Title and enter desired title.
7. Click on Layout > Axis Titles and enter horizontal and vertical axis titles.

Minitab Instructions (for similar results):
1. Open file: Taft.MTW.
2. Choose Graph > Time Series Plot.
3. Select Simple.
4. Under Series, enter time series' column.
5. Click Time/Scale.
6. Under Time Scale select Calendar and Year.
7. Under Start Values, insert the starting year.
8. Click OK. OK.

Linear Trend Model

y_t = β0 + β1·t + ε_t    (16.6)

where:
y_t = Value of the trend at time t
β0 = y intercept of the trend line
β1 = Slope of the trend line
t = Time period (t = 1, 2, . . .)
ε_t = Model error at time t

We let the first period in the time series be t = 1, the second period be t = 2, and so forth. The values for time form the independent variable, with sales being the dependent variable. Referring to Chapter 14, the least squares regression equations for the slope and intercept are estimated by Equations 16.7 and 16.8. Here the sums are taken over the values of t (t = 1, 2 . . .).

Least Squares Equations Estimates

b1 = (Σt·y_t − (Σt·Σy_t)/n) / (Σt² − (Σt)²/n)    (16.7)

b0 = Σy_t/n − b1·(Σt/n)    (16.8)

where:
n = Number of periods in the time series
t = Time period (independent variable)
y_t = Dependent variable at time t

The linear regression procedures in either Excel or Minitab can be used to compute the least squares trend model. Figure 16.6 shows the Excel output for the Taft Ice Cream Company example. The least squares trend model for the Taft Company is

ŷ_t = b0 + b1·t
ŷ_t = 277,333.33 + 14,575.76(t)

For a forecast, we use F_t as the forecast value or predicted value at time period t. Thus,

F_t = 277,333.33 + 14,575.76(t)

Step 3 Model Diagnosis
The linear trend regression output in Figure 16.6 offers some conclusions about the potential capabilities of our model. The R-squared = 0.9123 shows that for these 10 years of data, the linear trend model explains more than 91% of the variation in sales. The p-value for the regression slope coefficient to four decimal places is 0.0000. This means that time (t) can be used to explain a significant portion of the variation in sales. Figure 16.7 shows the plot of the trend line through the data. You can see the trend model fits the historical data quite closely.
Although these results are a good sign, the model diagnosis step requires further analysis.
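Equations 16.7 and 16.8 can also be implemented directly, without Excel or Minitab. The sketch below is a minimal Python version; since the raw Taft sales values live in the Taft data file, it is demonstrated on a small made-up series (perfectly linear, so the fit is exact) rather than on the Taft data:

```python
# Least squares trend fit (Equations 16.7 and 16.8) with t = 1, 2, ..., n.
# The y-values below are illustrative, not the Taft data.
def linear_trend(y):
    """Return (b0, b1) for the fitted trend y_hat = b0 + b1*t."""
    n = len(y)
    t = range(1, n + 1)
    sum_t, sum_y = sum(t), sum(y)
    sum_ty = sum(ti * yi for ti, yi in zip(t, y))
    sum_t2 = sum(ti ** 2 for ti in t)
    # Equation 16.7: slope
    b1 = (sum_ty - sum_t * sum_y / n) / (sum_t2 - sum_t ** 2 / n)
    # Equation 16.8: intercept
    b0 = sum_y / n - b1 * (sum_t / n)
    return b0, b1

b0, b1 = linear_trend([3, 5, 7, 9, 11])  # exactly y = 1 + 2t
print(b0, b1)        # 1.0 2.0
print(b0 + b1 * 6)   # 13.0, the trend forecast F_6
```

The same function applied to the 10 Taft sales values would reproduce the intercept 277,333.33 and slope 14,575.76 reported in Figure 16.6.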

FIGURE 16.6 | Excel 2007 Output for Taft Ice Cream Trend Model

Excel 2007 Instructions:
1. Open file: Taft.xls.
2. Select Data > Data Analysis.
3. Click on Regression.
4. Enter range for the y variable (Sales).
5. Enter range for the x variable (t = 1, 2 . . .).
6. Specify output location.
7. Click on Labels.

Linear Trend Equation: Sales = 277,333.33 + 14,575.76(t)

Minitab Instructions (for similar results):
1. Open file: Taft.MTW.
2. Choose Stat > Regression > Regression.
3. In Response, enter the time series column, Sales.
4. In Predictors, enter the time variable column, t.
5. Click OK.

FIGURE 16.7 | Excel 2007 Output for Taft Ice Cream Trend Line

Excel 2007 Instructions:
1. Open file: Taft.xls.
2. Select the Sales data.
3. Click on Insert > Line Chart.
4. Click on Select Data.
5. Under Horizontal (categories) Axis Labels, select data in Year column.
6. Click on Layout > Chart Title and enter desired title.
7. Click on Layout > Axis Titles and enter horizontal and vertical axes titles.
8. Select the data.
9. Right-click and select Add Trendline > Linear.
10. To set color, go to Trend Line Options (see step 9).

Linear trend line: F_t = 277,333.33 + 14,575.76(t)

Minitab Instructions (for similar results):
1. Open file: Taft.MTW.
2. Choose Stat > Time Series > Trend Analysis.
3. In Variable, enter the time series column.
4. Under Model Type choose Linear.
5. Click OK.

Comparing the Forecast Values to the Actual Data The slope of the trend line indicates the Taft Ice Cream Company has experienced an average increase in sales of $14,575.76 per year over the 10-year period. The linear trend model’s fitted sales values for periods t = 1 through t = 10 can be found by substituting for t in the following forecast equation:

F_t = 277,333.33 + 14,575.76(t)

For example, for t = 1, we get

F_1 = 277,333.33 + 14,575.76(1) = $291,909.09

Note that the actual sales figure, y_1, for period 1 was $300,000. The difference between the actual sales in time t and the forecast values in time t, found using the trend model, is called the forecast error or the residual. Figure 16.8 shows the forecasts for periods 1 through 10 and the forecast errors at each period. Computing the forecast error by comparing the trend-line values with actual past data is an important part of the model diagnosis step. The errors measure how closely the model fits the actual data at each point. A perfect fit would lead to residuals of 0 each time. We would like to see small residuals and an overall good fit. Two commonly used measures of fit are mean squared residual, or mean squared error (MSE), and mean absolute deviation (MAD). These measures are computed using Equations 16.9 and 16.10, respectively. MAD measures the average magnitude of the forecast errors. MSE is a measure of the variability in the forecast errors. The forecast error is the observed value, y_t, minus the predicted value, F_t.

Mean Squared Error

MSE = Σ(y_t − F_t)²/n    (16.9)

Mean Absolute Deviation

MAD = Σ|y_t − F_t|/n    (16.10)

where:
y_t = Actual value at time t
F_t = Predicted value at time t
n = Number of time periods

FIGURE 16.8 | Excel 2007 Residual Output for Taft Ice Cream (Residual = Forecast error)

Excel 2007 Instructions:
1. Follow Figure 16.6 Excel Instructions.
2.
In the Regression procedure, click on Residuals.
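Equations 16.9 and 16.10 reduce to two short functions. This is an illustrative sketch of ours (the tiny actual/forecast arrays are invented for the example, not Taft's data):

```python
def mse(actual, forecast):
    """Mean squared error (Equation 16.9): average squared forecast error."""
    n = len(actual)
    return sum((y - f) ** 2 for y, f in zip(actual, forecast)) / n

def mad(actual, forecast):
    """Mean absolute deviation (Equation 16.10): average absolute forecast error."""
    n = len(actual)
    return sum(abs(y - f) for y, f in zip(actual, forecast)) / n

# Tiny invented example: the two forecast errors are +1 and -2
print(mse([10, 12], [9, 14]))  # (1 + 4)/2 = 2.5
print(mad([10, 12], [9, 14]))  # (1 + 2)/2 = 1.5
```

Because MSE squares each error, one large miss raises MSE much more than MAD, which is why the two measures give similar but not identical pictures of fit.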

<span class='text_page_counter'>(267)</span> 728. CHAPTER 16. FIGURE 16.9. |. Analyzing and Forecasting Time-Series Data. |. Excel 2007 MSE and MAD Computations for Taft Ice Cream. Excel 2007 Instructions: 1. Follow Figure 16.6 Excel Instructions. 2. In Regression procedure, click on Residuals. 3. Create a new column of squared residuals. 4. Create a column of absolute values of the residuals using the ABS function in Excel. 5. Use Equations 16.9 and 16.10 to calculate MSE and MAD.. Excel equations: Squared residual = C25^2 Absolute value = ABS(C25). MSE . (ytFt)2 n.  168,515,151.52. MAD . ytFt n.  11,042.42. Figure 16.9 shows the MSE and MAD calculations using Excel for the Taft Ice Cream example. The MAD value of $11,042 indicates the linear trend model has an average absolute error of $11,042 per period. The MSE (in squared units) equals 168,515,151.52. The square root of the MSE (often referred to as RMSE, root mean square error) is $12,981.34, and although it is not equal to the MAD value, it does provide similar information about the relationship between the forecast values and the actual values of the time series.1 These error measures are particularly helpful when comparing two or more forecasting techniques. We can compute the MSE and/or the MAD for each forecasting technique. The forecasting technique that gives the smallest MSE or MAD is generally considered to provide the best fit.. Autocorrelation Correlation of the error terms (residuals) occurs when the residuals at points in time are related.. Autocorrelation In addition to examining the fit of the forecasts to the actual time series, the model-diagnosis step also should examine how a model meets the assumptions of the regression model. One regression assumption is that the error terms are uncorrelated, or independent. When using regression with time-series data, the assumption of independence could be violated. That is, the error terms may be correlated over time. 
We call this serial correlation, or autocorrelation. When dealing with a time-series variable, the value of y at time period t is commonly related to the value of y at previous time periods. If a relationship between yt and yt -1 exists, we conclude that first-order autocorrelation exists. If yt is related to yt -2, second-order autocorrelation exists, and so forth. If the time-series values are autocorrelated, the assumption that the error terms are independent is violated. The autocorrelation can be positive or negative. For instance, when the values are firstorder positively autocorrelated, we expect a positive residual to be followed by a positive residual in the next period, and we expect a negative residual to be followed by another negative residual. With negative first-order autocorrelation, we expect a positive residual to be followed by a negative residual, followed by a positive residual, and so on. The presence of autocorrelation can have adverse consequences on tests of statistical significance in a regression model. Thus, you need to be able to detect the presence of autocorrelation and take action to remove the problem. The Durbin-Watson statistic, which is shown in Equation 16.11, is used to test whether residuals are autocorrelated. 1Technically this is the square root of the average squared distance between the forecasts and the observed data values. Algebraically, of course, this is not the same as the average forecast error, but it is comparable..

Durbin-Watson Statistic

d = Σ_{t=2..n} (e_t − e_{t−1})² / Σ_{t=1..n} e_t²    (16.11)

where:
d = Durbin-Watson test statistic
e_t = (y_t − ŷ_t) = Residual at time t
n = Number of time periods in the time series

Figure 16.10 shows the Minitab output providing the Durbin-Watson statistic for the Taft Ice Cream data, as follows:

d = Σ_{t=2..n} (e_t − e_{t−1})² / Σ_{t=1..n} e_t² = 2.65

Examining Equation 16.11, we see that if successive values of the residual are close in value, the Durbin-Watson d statistic will be small. This situation would describe one in which residuals are positively correlated. The Durbin-Watson statistic can have a value ranging from 0 to 4. A value of 2 indicates no autocorrelation. However, like any other statistic computed from a sample, the Durbin-Watson d is subject to sampling error. We may wish to test formally to determine whether positive autocorrelation exists.

H0: ρ = 0
HA: ρ > 0

FIGURE 16.10 | Minitab Output—Durbin-Watson Statistic: Taft Ice Cream Company Example

Minitab Instructions:
1. Open file: Taft.MTW.
2. Choose Stat > Regression > Regression.
3. In Response, enter the time series column, Sales.
4. In Predictors, enter the time variable column, t.
5. Select Options.
6. Under Display, select Durbin-Watson statistic.
7. Click OK. OK.

Excel 2007 Instructions (for similar results):
1. Open file: Taft.xls.
2. Click on Add-Ins.
3. Select PHStat.
4. Select Regression > Simple Linear Regression.
5. Define y variable.
6. Define x variable.
7. Check box for Durbin-Watson statistic.
8. Click OK.
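Equation 16.11 is straightforward to compute from the residuals. A minimal sketch (the residual lists below are invented to show the two extremes of the 0-to-4 range):

```python
def durbin_watson(residuals):
    """Durbin-Watson d statistic (Equation 16.11):
    d = sum_{t=2..n} (e_t - e_{t-1})^2 / sum_{t=1..n} e_t^2."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2
              for t in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den

# Alternating residuals (negative autocorrelation) push d toward 4:
print(durbin_watson([1, -1, 1, -1]))  # 12/4 = 3.0
# Slowly drifting residuals (positive autocorrelation) push d toward 0:
print(durbin_watson([2, 2, 1, 1, -1, -1, -2, -2]))  # 6/20 = 0.3
```

With no autocorrelation at all, successive differences average out so that d lands near 2, consistent with the interpretation in the text.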

If the d statistic is too small, we will reject the null hypothesis and conclude that positive autocorrelation exists. If the d statistic is too large, we will not reject and will not be able to conclude that positive autocorrelation exists. Appendix O contains a table of one-tailed Durbin-Watson critical values for α = 0.05 and α = 0.01 levels. (Note: The critical values in Appendix O are for one-tailed tests with α = 0.05 or 0.01. For a two-tailed test, the alpha is doubled.) The Durbin-Watson table provides two critical values: dL and dU. In this test for positive autocorrelation, the decision rule is

If d < dL, reject H0 and conclude that positive autocorrelation exists.
If d > dU, do not reject H0 and conclude that no positive autocorrelation exists.
If dL ≤ d ≤ dU, the test is inconclusive.

The Durbin-Watson test is not reliable for sample sizes smaller than 15. Therefore, for the Taft Ice Cream Company application, we are unable to conduct the hypothesis test for autocorrelation. However, Example 16-5 shows a Durbin-Watson test carried out.

EXAMPLE 16-5 TESTING FOR AUTOCORRELATION

Banion Automotive, Inc. Banion Automotive, Inc., has supplied parts to General Motors since the company was founded in 1992. During this time, revenues from the General Motors account have grown steadily. Figure 16.11 displays the data in a time-series plot. The data are in a file called Banion Automotive. Recently the managers of the company developed a linear trend regression model they hope to use to forecast revenue for the next two years to determine whether they can support adding another production line to their Ohio factory. They are now interested in determining whether the linear model is subject to positive autocorrelation. To test for this, the following steps can be used:

Step 1 Specify the model.
Based on a study of the line chart, the forecasting model is to be a simple linear trend regression model, with revenue as the dependent variable and time (t) as the independent variable.
Step 2 Fit the model.
Because we have specified a linear trend model, the process of fitting can be accomplished using least squares regression analysis and Excel or Minitab to

FIGURE 16.11 | Revenue Time-Series Plot
[Line chart: Time-Series Plot of Banion Automotive Revenue Data; vertical axis Dollars (in Millions), $0 to $80.00; horizontal axis Year, 1992 to 2009]

estimate the slope and intercept for the model. Fitting the 18 data points with a least squares line, we find the following:

F_t = 5.0175 + 3.3014(t)

Step 3 Diagnose the model.
The following values were also found:

R² = 0.935
F-statistic = 230.756
Standard error = 4.78

The large F-statistic indicates that the model explains a significant amount of variation in revenue over time. However, looking at a plot of the trend line shown in Figure 16.12, we see a pattern of actual revenue values first above, and then below, the trend line. This pattern indicates possible autocorrelation among the error terms. We will test for autocorrelation by calculating the Durbin-Watson d statistic. Both Minitab and the PHStat add-in for Excel have the option to generate the Durbin-Watson statistic. The output is shown in Figure 16.13. Figure 16.13 shows the Durbin-Watson d statistic as

d = 0.661

The null and alternative hypotheses for testing for positive autocorrelation are

H0: ρ = 0
HA: ρ > 0

We next go to the Durbin-Watson table (Appendix O) for α = 0.05, sample size 18, and number of independent variables, p = 1. The values from the table for dL and dU are

dL = 1.16 and dU = 1.39

The decision rule for testing whether we have positive autocorrelation is

If d < 1.16, reject H0 and conclude that positive autocorrelation exists.
If d > 1.39, conclude that no positive autocorrelation exists.
If 1.16 ≤ d ≤ 1.39, the test is inconclusive.

FIGURE 16.12 | Revenue Time-Series Plot
[Line chart: Banion Automotive Trend Line; vertical axis Dollars (in Millions), $0 to $80.00; horizontal axis Year, 1992 to 2009]

FIGURE 16.13 | Excel 2007 (PHStat) Output—Durbin-Watson Statistic for Banion Automotive

Excel 2007 Instructions:
1. Open file: Banion Automotive.xls.
2. Select Add-Ins.
3. Select PHStat.
4. Select Regression > Simple Linear Regression.
5. Define y variable data range.
6. Specify x variable data range (time = t values).
7. Click on Durbin-Watson Statistic.
8. Click OK.

Minitab Instructions (for similar results):
1. Open file: Banion Automotive.MTW.
2. Choose Stat > Regression > Regression.
3. In Response, enter the time series column, Revenue.
4. In Predictors, enter the time variable column, t.
5. Select Options.
6. Under Display, select Durbin-Watson statistic.
7. Click OK. OK.

Because d = 0.661 < dL = 1.16, we must reject the null hypothesis and conclude that significant positive autocorrelation exists in the regression model. This means that the assumption of uncorrelated error terms has been violated in this case. Thus, the linear trend model is not the appropriate model to provide the annual revenue forecasts for the next two years. There are several techniques for dealing with the problem of autocorrelation. Some of these are beyond the scope of this text. (Refer to books by Nelson and Wonnacott.) However, one option is to attempt to fit a nonlinear trend to the data, which is discussed starting on page 734.

>> END EXAMPLE

TRY PROBLEM 16-18 (pg. 747)

True Forecasts
Although a decision maker is interested in how well a forecasting technique can fit historical data, the real test comes with how well it forecasts future values. Recall in the Taft example, we had 10 years of historical data. If we wish to forecast ice cream sales for year 11 using the linear trend model, we substitute t = 11 into the forecast equation to produce a forecast as follows:

F11 = 277,333.33 + 14,575.76(11) = $437,666.69

This method of forecasting is called trend projection.
To determine how well our trend model actually forecasts, we would have to wait until the actual sales amount for period 11 is known.
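Trend projection is simply an evaluation of the fitted equation at a future value of t. A minimal sketch (the function name is ours) using the Taft coefficients from the text:

```python
def trend_forecast(b0, b1, t):
    """Trend projection: evaluate F_t = b0 + b1*t at period t."""
    return b0 + b1 * t

# Taft Ice Cream model from the text: F_t = 277,333.33 + 14,575.76(t)
f11 = trend_forecast(277_333.33, 14_575.76, 11)
print(round(f11, 2))  # 437666.69, the $437,666.69 year-11 forecast
```

The same call with t = 12 or t = 13 produces the later forecasts used in the bank loan application discussion.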

As we just indicated, a model’s true forecasting ability is determined by how well it forecasts future values, not by how well it fits historical values. However, having to wait until after the forecast period to know how effective a forecast is doesn’t help us assess a model’s effectiveness ahead of time. This problem can be partially overcome by using split samples, which involves dividing a time series into two groups. You put the first n1 periods of historical data in the first group. These n1 periods will be used to develop the forecasting model. The second group contains the remaining n2 periods of historical data, which will be used to test the model’s forecasting ability. These data are called the holdout data. Usually, three to five periods are held out, depending on the total number of periods in the time series.
In the Taft Ice Cream business application, we have only 10 years of historical data, so we will hold out the last three periods and use the first seven periods to develop the linear trend model. The computations are performed as before, using Excel or Minitab or Equations 16.7 and 16.8. Because we are using a different data set to develop the linear equation, we get a slightly different trend line than when all 10 periods were used. The new trend line is

F_t = 277,142.85 + 14,642.85(t)

This model is now used to provide forecasts for periods 8 through 10 by using trend projection. These forecasts are

Year   Actual      Forecast       Error
t      yt          Ft             (yt − Ft)
8      400,000     394,285.65       5,714.35
9      395,000     408,928.50     −13,928.50
10     430,000     423,571.35       6,428.65

Then we can compute the MSE and the MAD values for periods 8 through 10.

MSE = [(5,714.35)² + (−13,928.50)² + (6,428.65)²]/3 = 89,328,149.67

and

MAD = (|5,714.35| + |−13,928.50| + |6,428.65|)/3 = 8,690.50
These values could be compared with those produced using other forecasting techniques or evaluated against the forecaster’s own standards. Smaller values are preferred. Other factors should also be considered. For instance, in some cases, the forecast values might tend to be higher (or lower) than the actual values. This may imply the linear trend model isn’t the best model to use. Forecasting models that tend to over- or underforecast are said to contain forecast bias. Equation 16.12 is used as an estimator of the bias.

Forecast Bias

Forecast bias = Σ(yt − Ft)/n    (16.12)

The forecast bias can be either positive or negative. A positive value indicates a tendency to underforecast. A negative value indicates a tendency to overforecast. The estimated bias taken from the forecasts for periods 8 through 10 in our example is

Forecast bias = [(5,714.35) + (−13,928.50) + (6,428.65)]/3 = −595.17

This means that, on average, the model overforecasts sales by $595.17.
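The holdout calculations above can be reproduced in a few lines. A sketch (the function name is ours) using the period 8 through 10 actuals and forecasts from the table:

```python
def forecast_bias(actual, forecast):
    """Estimate forecast bias (Equation 16.12): the average forecast error.
    Positive values indicate underforecasting; negative, overforecasting."""
    n = len(actual)
    return sum(y - f for y, f in zip(actual, forecast)) / n

# Period 8-10 holdout values from the Taft example:
actual = [400_000.00, 395_000.00, 430_000.00]
forecast = [394_285.65, 408_928.50, 423_571.35]

bias = forecast_bias(actual, forecast)
mad_holdout = sum(abs(y - f) for y, f in zip(actual, forecast)) / len(actual)
print(round(bias, 2), round(mad_holdout, 2))  # -595.17 and 8690.5
```

The negative bias reproduces the conclusion in the text: on average the holdout model overforecasts sales by about $595.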

Suppose that on the basis of our bias estimate we judge that the linear trend model does an acceptable job in forecasting. Then all available data (periods 1 through 10) would be used to develop a linear trend model (see Figure 16.6), and a trend projection would be used to forecast for future time periods by substituting appropriate values for t into the trend model.

F_t = 277,333.33 + 14,575.76(t)

However, if the linear model is judged to be unacceptable, the forecaster will need to try a different technique. For the purpose of the bank loan application, the Taft Ice Cream Company needs to forecast sales for the next three years (periods 11 through 13). Assuming the linear trend model is acceptable, these forecasts are

F11 = 277,333.33 + 14,575.76(11) = $437,666.69
F12 = 277,333.33 + 14,575.76(12) = $452,242.45
F13 = 277,333.33 + 14,575.76(13) = $466,818.21

Nonlinear Trend Forecasting
As we indicated earlier, you may encounter a time series that exhibits a nonlinear trend. Figure 16.2 showed an example of a nonlinear trend. When the historical data show a nonlinear trend, you should consider using a nonlinear trend forecasting model. A common method for dealing with nonlinear trends is to use an extension of the linear trend method. This extension calls for making a data transformation before applying the least squares regression analysis.

BUSINESS APPLICATION  FORECASTING NONLINEAR TRENDS

HARRISON EQUIPMENT COMPANY  Consider Harrison Equipment Company, which leases large construction equipment to contractors in the Southwest. The lease arrangements call for Harrison to perform all repairs and maintenance on this equipment. Figure 16.14 shows a line chart for the repair costs for a crawler tractor leased to a contractor in Phoenix for the past 20 quarters.

FIGURE 16.14
The data are contained in the file Harrison.. |. Excel 2007 Time-Series Plot for Harrison Equipment Repair Costs. Excel 2007 Instructions: 1. Open file: Harrison.xls. 2. Select data in the Repair Costs data column. 3. Click on Insert  Line Chart. 4. Click on Select Data. 5. Under Horizontal (categories) Axis Labels, select data in year and quarter columns. 6. Click on Layout  Chart Title and enter desired title. 7. Click on Layout  Axis Titles and enter horizontal and vertical axes titles.. Nonlinear trend. Minitab Instructions (for similar results): 1. Open file: Harrison.MTW. 6. 2. Select Graph  Time Series Plot. 3. Select Simple. 7. 4. Under Series, enter time series column. 5. Click Time/Scale. 8.. Under Time Scale select Calendar and Quarter Year. Under Start Values, insert the starting quarter and year. Click OK. OK..

<span class='text_page_counter'>(274)</span> CHAPTER 16 Chapter Outcome 4.. |. Analyzing and Forecasting Time-Series Data. 735. Model Specification Harrison Equipment is interested in forecasting future repair costs for the crawler tractor. Recall that the first step in forecasting is model specification. Even though the plot in Figure 16.14 indicates a sharp upward nonlinear trend, the forecaster may start by specifying a linear trend model. Model Fitting As a part of the model-fitting step, the forecaster could use Excel’s or Minitab’s regression procedure to obtain the linear forecasting model shown in Figure 16.15. As shown the linear trend model is Ft  –1,022.7  570.9(t) Model Diagnosis The fit is pretty good with an R-squared  0.8214 and a standard error of 1,618.5. But we need to look closer. Figure 16.15 shows a plot of the trend line compared with the actual data. A close inspection indicates the linear trend model may not be best for this case. Notice that the linear model underforecasts, then overforecasts, then underforecasts again. From this we might suspect positive autocorrelation. We can establish the following null and alternative hypotheses: H0:   0 HA:   0 Equation 16.11 could be used to manually compute the Durbin-Watson d statistic, or more likely, we would use either PHStat or Minitab. The calculated Durbin-Watson is d  0.505. FIGURE 16.15. |. Excel 2007 (PHStat) Output for the Harrison Equipment Company Linear Trend Model. Excel 2007 Instructions: 1. Open file: Harrison.xls. 2. Select Add-Ins. 3. Select PHStat. 4. Select Regression > Simple Linear Regression. 5. Specify y variable data range. 6. Specify x variable data range (time = t values). 7. Check box for DurbinWatson Statistic. 8. Copy Figure 16.14 line chart onto output—add linear trend line. 9. Click OK. Minitab Instructions (for similar results): 1. Open file: Harrison.MTW. 4. In Variable, enter time-series column. 2. Follow Minitab instructions in Figure 16.13. 5. 
Under Model Type, Choose Linear. 3. Choose Stat  Time Series  Trend 6. Click OK. Analysis..

The dL critical value from the Durbin-Watson table in Appendix O for α = 0.05 and a sample size of n = 20 and p = 1 independent variable is 1.20. Because d = 0.505 < dL = 1.20, we reject the null hypothesis. We conclude that the error terms are significantly positively autocorrelated. The model-building process needs to be repeated.

Model Specification
After examining Figure 16.15 and determining the results of the test for positive autocorrelation, a nonlinear trend will likely provide a better fit for these data. To account for the nonlinear growth trend, which starts out slowly and then builds rapidly, the forecaster might consider transforming the time variable by squaring t to form a model of the form

y = β0 + β1t + β2t² + ε

This transformation is suggested because the growth in costs appears to be increasing at an increasing rate. Other nonlinear trends may require different types of transformations, such as taking a square root or natural log. Each situation must be analyzed separately. (See the reference by Kutner et al. for further discussion of transformations.)

Model Fitting
Figure 16.16 shows the Excel regression results, and Figure 16.17 shows the revised time-series plot using the polynomial transformation. The resulting nonlinear trend regression model is

F_t = 2,318.7 − 340.4(t) + 43.4(t²)

FIGURE 16.16 | Excel 2007 (PHStat) Transformed Regression Model for Harrison Equipment

Excel 2007 (PHStat) Instructions:
1. Open data file: Harrison.xls.
2. Select Add-Ins.
3. Select PHStat.
4. Select Regression > Multiple Regression.
5. Specify y variable data range.
6. Specify x variable data range (time = t values and t² = squared values).
7. Check on Residuals table (output not shown here).
8. Check box for Durbin-Watson Statistic.
9. Click OK.

Minitab Instructions (for similar results):
1. Open File: Harrison.MTW.
2. Choose Calc > Calculator.
3.
In Store result in variable, enter 9. destination column Qrt square. 4. With cursor in Expressions, enter Quarter column then **2. 10. 5. Click OK.. Choose Stat  Regression  Regression. In Response, enter Repair Costs. In Predictors, enter Qrt square. Click Storage. Under Diagnostic Measures, select Residuals. Under Characteristics of Estimated Equation, select Fits. Click OK. OK..
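Once the quadratic model is fitted, forecasts again come from trend projection, now evaluating a second-degree polynomial. A minimal sketch (the function name is ours) using the fitted Harrison coefficients:

```python
def quadratic_trend_forecast(b0, b1, b2, t):
    """Evaluate the transformed trend model F_t = b0 + b1*t + b2*t^2
    at period t (trend projection for a quadratic trend)."""
    return b0 + b1 * t + b2 * t ** 2

# Fitted Harrison model: F_t = 2,318.7 - 340.4(t) + 43.4(t^2)
f21 = quadratic_trend_forecast(2_318.7, -340.4, 43.4, 21)
f22 = quadratic_trend_forecast(2_318.7, -340.4, 43.4, 22)
print(round(f21, 1), round(f22, 1))  # 14309.7 and 15835.5
```

These values match the rounded $14,310 and $15,836 forecasts for periods 21 and 22 reported in the model diagnosis discussion.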

<span class='text_page_counter'>(276)</span> CHAPTER 16. FIGURE 16.17. |. Analyzing and Forecasting Time-Series Data. 737. |. Excel 2007 Transformed Model for Harrison Equipment Company. Excel 2007 (PHStat) Instructions: 1. Open data file: Harrison.xls. 2. Follow Figure 16.14 instruction to generate line chart. 3. Select Add-Ins. 4. Select PHStat. 5. Select Regression > Multiple Regression. 6. Specify y variable data range. 7. Specify x variable data range (time = t values and t-squared values). 8. Click on Residuals table (output not shown here). 9. Paste Predicted Values on the line chart.. Fitted values. Model Diagnosis Visually, the transformed model now looks more appropriate. The fit is much better as the Rsquared value is increased to 0.9466 and the standard error is reduced to 910.35. The null and alternative hypotheses for testing whether positive autocorrelation exists are H0:   0 HA:   0 As seen in Figure 16.16, the calculated Durbin-Watson statistic is d  1.63 The dL and dU critical values from the Durbin-Watson table in Appendix O for a  0.05 and a sample size of n  20 and p  2 independent variables are 1.10 and 1.54, respectively. Because d  1.63  1.54, the Durbin-Watson test indicates that there is no positive autocorrelation. Given this result and the improvements to R-squared and the standard error of the estimate, the nonlinear model is judged superior to the original linear model. Forecasts for periods 21 and 22, using this latest model, are obtained using the trend projection method. For period t  21: F21  2,318.7 - 340.4(21)  43.4(212)  $14,310 For period t  22: F22  2,318.7 - 340.4(22)  43.4(222)  $15,836 Using transformations often provides a very effective way of improving the fit of a time series. However, a forecaster should be careful not to get caught up in an exercise of “curvefitting.” One suggestion is that only explainable terms—terms that can be justified—be used.

<span class='text_page_counter'>(277)</span> 738. CHAPTER 16. |. Analyzing and Forecasting Time-Series Data. for transforming data. For instance, in our example, we might well expect repair costs to increase at a faster rate as a tractor gets older and begins wearing out. Thus, the t2 transformation seems to make sense. Some Words of Caution The trend projection method relies on the future behaving in a manner similar to the past. In the previous example, if equipment repair costs continue to follow the pattern displayed over the past 20 quarters, these forecasts may prove acceptable. However, if the future pattern changes, there is no reason to believe these forecasts will be close to actual costs.. Adjusting for Seasonality In Section 16.1, we discussed seasonality in a time series. The seasonal component represents those changes (highs and lows) in the time series that occur at approximately the same time every period. If the forecasting model you are using does not already explicitly account for seasonality, you should adjust your forecast to take into account the seasonal component. The linear and nonlinear trend models discussed thus far do not automatically incorporate the seasonal component. Forecasts using these models should be adjusted as illustrated in the following application.. BUSINESS APPLICATION Excel and Minitab. tutorials. Excel and Minitab Tutorial. FIGURE 16.18. FORECASTING WITH SEASONAL DATA. BIG MOUNTAIN SKI RESORT Most businesses in the tourist industry know that sales are seasonal. For example, at the Big Mountain Ski Resort, business peaks at two times during the year: winter for skiing and summer for golf and tennis. These peaks can be identified in a time series if the sales data are measured on at least a quarterly basis. Figure 16.18 shows the quarterly sales data for the past four years in spreadsheet form. The line chart for these data is also shown. The data are in the file Big Mountain. 
The time-series plot clearly shows that the summer and winter quarters are the busy times. There has also been a slightly increasing linear trend in sales over the four years.. |. Excel 2007 Big Mountain Resort Quarterly Sales Data. Excel 2007 Instructions: 1. Open file: Big Mountain.xls. 2. Select data in the Sales data column. 3. Click on Insert  Line Chart. 4. Click on Select Data. 5. Under Horizontal (categories) Axis. Labels, select data in Year and Season columns. 6. Click on Layout  Chart Title and enter desired title. 7. Click on Layout  Axis Titles and enter horizontal and vertical axes titles..

<span class='text_page_counter'>(278)</span> CHAPTER 16. Seasonal Index A number used to quantify the effect of seasonality in time-series data.. |. Analyzing and Forecasting Time-Series Data. 739. Big Mountain Resort wants to forecast sales for each quarter of the coming year, and it hopes to use a linear trend model. When the historical data show a trend and seasonality, the trend-based forecasting model needs to be adjusted to incorporate the seasonality. One method for doing this involves computing seasonal indexes. For instance, when we have quarterly data, we can develop four seasonal indexes, one each for winter, spring, summer, and fall. A seasonal index below 1.00 indicates that the quarter has a value that is typically below the average value for the year. On the other hand, an index greater than 1.00 indicates that the quarter’s value is typically higher than the yearly average. Computing Seasonal Indexes Although there are several methods for computing the seasonal indexes, the procedure introduced here is the ratio-to-moving-average method. This method assumes that the actual time-series data can be represented as a product of the four time-series components—trend, seasonal, cyclical, and random—which produces the multiplicative model shown in Equation 16.13. Multiplicative Time-Series Model yt  Tt St Ct It. (16.13). where: yt  Value of the time series at time t Tt  Trend value at time t St  Seasonal value at time t Ct  Cyclical value at time t It  Irregular or random value at time t. Moving Average The successive averages of n consecutive values in a time series.. FIGURE 16.19. The ratio-to-moving-average method begins by removing the seasonal and irregular components, St and It, from the data, leaving the combined trend and cyclical components, Tt and Ct. This is done by first computing successive four-period moving averages for the time series. A moving average is the average of n consecutive values of a time series. 
Using the Big Mountain sales data in Figure 16.19, we find that the moving average using the first four quarters is

(205 + 96 + 194 + 102)/4 = 149.25

| Excel 2007 Seasonal Index—Step 1: Moving Average Values for Big Mountain Resort
(Each moving average corresponds to the midpoint between its cell and the following cell.)

Excel 2007 Instructions:
1. Open file: Big Mountain.xls.
2. Create a new column of 4-period moving averages using Excel’s AVERAGE function. The first moving average is placed in cell E3 and the equation is =AVERAGE(D2:D5).
3. Copy the equation down to cell E15.

This moving average is associated with the middle time period of the data values in the moving average. The middle period of the first four quarters is 2.5 (between quarter 2 and quarter 3). The second moving average is found by dropping the value from period 1 and adding the value from period 5, as follows:

(96 + 194 + 102 + 230)/4 = 155.50

This moving average is associated with time period 3.5, the middle period between quarters 3 and 4. Figure 16.19 shows the moving averages for the Big Mountain sales data in Excel spreadsheet form.² We selected 4 data values for the moving average because we have quarterly data; with monthly data, 12 data values would have been used.
The next step is to compute the centered moving averages by averaging each successive pair of moving averages. Centering the moving averages is necessary so that the resulting moving average will be associated with one of the data set’s original time periods. In this example, Big Mountain is interested in quarterly sales data—that is, time periods 1, 2, 3, etc. Therefore, the moving averages we have representing time periods 2.5, 3.5, and so forth are not of interest to Big Mountain. Centering these averaged time series values, however, produces moving averages for the (quarterly) time periods of interest. For example, the first two moving averages are averaged to produce the first centered moving average. We get

(149.25 + 155.5)/2 = 152.38

This centered moving average is associated with quarter 3. The centered moving averages are shown in Figure 16.20.³ These values estimate the Tt Ct value.

FIGURE 16.20 | Excel 2007 Seasonal Index—Step 2: Big Mountain Resort Centered Moving Averages

Excel 2007 Instructions:
1. Open File: Big Mountain.xls.
2. Follow instructions in Figure 16.19.
3. Create a new column of centered moving averages using Excel’s AVERAGE function.
The first centered moving average is placed in cell F4 and the equation is =AVERAGE(E3:E4). 4. Copy the equation down to cell F15.

²The Excel process illustrated by Figures 16.19 through 16.23 can also be accomplished using Minitab's Time Series > Decomposition command, which is illustrated by Figure 16.25 on page 744.

³Excel's tabular format does not allow the uncentered moving averages to be displayed with their "interquarter" time periods. That is, 149.25 is associated with time period 2.5, 155.50 with time period 3.5, and so forth.
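The two averaging steps described above can be sketched in a few lines of code. This is a minimal Python illustration, not the book's Excel procedure; the quarterly sales values are the Big Mountain figures from Table 16.5, and the variable names are ours:

```python
# Big Mountain quarterly sales, winter 2006 through fall 2009 (in $1,000s)
sales = [205, 96, 194, 102, 230, 105, 245, 120,
         272, 110, 255, 114, 296, 130, 270, 140]

# Step 1: 4-period moving averages; the i-th average sits between two quarters
moving_avg = [sum(sales[i:i + 4]) / 4 for i in range(len(sales) - 3)]

# Step 2: center each successive pair so the result lines up with a quarter
centered = [(a + b) / 2 for a, b in zip(moving_avg, moving_avg[1:])]

print(moving_avg[0])          # 149.25 (mid-period 2.5)
print(moving_avg[1])          # 155.5  (mid-period 3.5)
print(round(centered[0], 2))  # 152.38, associated with quarter 3
```

The printed values match the hand computations in the text: 149.25 and 155.50 for the first two moving averages, and 152.38 for the first centered moving average.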

If the number of data values used for a moving average is odd, the moving average will be associated with the time period of the middle observation. In such cases, we would not have to center the moving average, as we did in Figure 16.20, because the moving averages would already be associated with one of the time periods from the original time series.

Next, we estimate the St × It value. This is done by dividing the actual sales value for each quarter by the corresponding centered moving average, as in Equation 16.14. As an example, we examine the third time period: summer of 2006. The sales value of 194 is divided by the centered moving average of 152.38 to produce 1.273. This value is called the ratio-to-moving-average. Figure 16.21 shows these values for the Big Mountain data.

Ratio-to-Moving-Average

St × It = yt / (Tt × Ct)    (16.14)

The final step in determining the seasonal indexes is to compute the mean ratio-to-moving-average value for each season. Each quarter's ratio-to-moving-average is averaged over the years to produce the seasonal index for that quarter. Figure 16.22 shows the seasonal indexes. The seasonal index for the winter quarter is 1.441. This indicates that sales for Big Mountain during the winter are 44.1% above the average for the year. Also, sales in the spring quarter are only 60.8% of the average for the year. One important point about the seasonal indexes is that the sum of the indexes equals the number of seasons; that is, the average of all seasonal indexes equals 1.0. In the Big Mountain Resort example, we find

Summer 1.323 + Fall 0.626 + Winter 1.441 + Spring 0.608 = 3.998 (difference from 4 due to rounding)

Likewise, in an example with monthly data instead of quarterly data, we would generate 12 seasonal indexes, one for each month. The sum of these indexes should be 12.
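The ratio-to-moving-average and seasonal-index steps can also be sketched in Python (an illustration under our own variable names, using the Big Mountain sales from Table 16.5; the code divides by the exact centered moving averages, so the indexes agree with the text's values to three decimals):

```python
sales = [205, 96, 194, 102, 230, 105, 245, 120,
         272, 110, 255, 114, 296, 130, 270, 140]
seasons = ["winter", "spring", "summer", "fall"] * 4

moving_avg = [sum(sales[i:i + 4]) / 4 for i in range(len(sales) - 3)]
centered = [(a + b) / 2 for a, b in zip(moving_avg, moving_avg[1:])]

# Equation 16.14: St x It = yt / (Tt x Ct); centered[0] belongs to quarter 3,
# which is position 2 in 0-based indexing
ratios = {}
for t, cma in enumerate(centered, start=2):
    ratios.setdefault(seasons[t], []).append(sales[t] / cma)

# Seasonal index = mean ratio-to-moving-average for each season
indexes = {s: sum(r) / len(r) for s, r in ratios.items()}
for s in ["summer", "fall", "winter", "spring"]:
    print(s, round(indexes[s], 3))  # 1.323, 0.626, 1.441, 0.608
```

The four indexes sum to roughly 4, matching the 3.998 total (difference due to rounding) reported in the text.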
The Need to Normalize the Indexes If the sum of the seasonal indexes does not equal the number of time periods in the recurrence period of the time series, an adjustment is necessary. In the Big Mountain Resort example, the sum of the four seasonal indexes may have

FIGURE 16.21 | Excel 2007 Seasonal Index—Step 3: Big Mountain Resort Ratio-to-Moving-Averages

Excel 2007 Instructions: 1. Open File: Big Mountain.xls. 2. Follow instructions in Figures 16.19 and 16.20. 3. Create a new column of ratio-to-moving-averages using an Excel equation (e.g., =D4/F4). 4. Copy the equation down to cell G15.

FIGURE 16.22 | Excel 2007 Seasonal Index—Step 4: Big Mountain Resort Mean Ratios. Seasonal indexes: Summer = 1.323, Fall = 0.626, Winter = 1.441, Spring = 0.608

Excel 2007 Instructions: 1. Open File: Big Mountain.xls. 2. Follow instructions in Figures 16.19 through 16.21. 3. Rearrange the ratio-to-moving-average values, organizing them by season of the year (summer, fall, etc.). 4. Total and average the ratio-to-moving-averages for each season.

been something other than 4 (the recurrence period). In such cases, we must adjust the seasonal indexes by multiplying each by the number of time periods in the recurrence period over the sum of the unadjusted seasonal indexes. For quarterly data such as the Big Mountain Resort example, we would multiply each seasonal index by 4/(Sum of the unadjusted seasonal indexes). Performing this multiplication will normalize the seasonal indexes. This adjustment is necessary if the seasonal adjustments are going to even out over the recurrence period.

Deseasonalizing A strong seasonal component may partially mask a trend in the time-series data. Consequently, to identify the trend you should first remove the effect of the seasonal component. This is called deseasonalizing the time series. Again, assume that the multiplicative model shown previously in Equation 16.13 is appropriate:

yt = Tt × St × Ct × It

Deseasonalizing is accomplished by dividing yt by the appropriate seasonal index, St, as shown in Equation 16.15.

Deseasonalization

Tt × Ct × It = yt / St    (16.15)

For time period 1, which is the winter quarter, the seasonal index is 1.441. The deseasonalized value for y1 is

205/1.441 = 142.26

Figure 16.23 presents the deseasonalized values and the graph of these deseasonalized sales data for the Big Mountain example. This shows that there has been a gentle upward trend over the four years.
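Deseasonalizing is then a single division per period. A minimal Python sketch, using the rounded quarterly indexes reported in Figure 16.22 (variable names are ours):

```python
sales = [205, 96, 194, 102, 230, 105, 245, 120,
         272, 110, 255, 114, 296, 130, 270, 140]
# Rounded seasonal indexes from Figure 16.22: winter, spring, summer, fall
index_by_quarter = [1.441, 0.608, 1.323, 0.626]

# Equation 16.15: Tt x Ct x It = yt / St
deseasonalized = [y / index_by_quarter[t % 4] for t, y in enumerate(sales)]

print(round(deseasonalized[0], 2))  # 142.26, i.e., 205/1.441 for winter 2006
```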
Once the data have been deseasonalized, the next step is to determine the trend based on the deseasonalized data. As in the previous examples of trend estimation, you can use either.

FIGURE 16.23 | Excel 2007 Deseasonalized Time Series for Big Mountain Sales Data

Excel 2007 Instructions: 1. Open File: Big Mountain.xls. 2. Follow instructions for Figures 16.18 through 16.22. 3. Create a new column containing the deseasonalized values. Use Excel equations and Equation 16.15. 4. Select the new deseasonalized data and paste onto the line graph.

Excel or Minitab to develop the linear model for the deseasonalized data. The results are shown in Figure 16.24. The linear regression trend line equation is

Ft = 142.113 + 4.686(t)

You can use this trend line and the trend projection method to forecast sales for period t = 17:

F17 = 142.113 + 4.686(17) = 221.775, or $221,775

Seasonally Unadjusted Forecast A forecast made for seasonal data that does not include an adjustment for the seasonal component in the time series.

This is a seasonally unadjusted forecast, because the time-series data used in developing the trend line were deseasonalized. Now we need to adjust the forecast for period 17 to reflect the quarterly fluctuations. We do this by multiplying the unadjusted forecast values by the appropriate seasonal index. In this case, period 17 corresponds to the winter quarter. The winter quarter has a seasonal index of 1.441, indicating a high sales period. The adjusted forecast is

F17 = (221.775)(1.441) = 319.578, or $319,578

FIGURE 16.24 | Excel 2007 Regression Trend Line of Big Mountain Deseasonalized Data

Excel 2007 Instructions: 1. Open File: Big Mountain.xls. 2. Follow instructions in Figures 16.19 through 16.23. 3. Click on Data. 4. Select Data Analysis > Regression. 5. Specify y variable range (deseasonalized variable) and x variable range (time variable). 6. Click OK.

Linear trend equation: Ft = 142.113 + 4.686(t)
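The trend projection and seasonal re-adjustment just shown can be verified in a few lines. This Python sketch simply plugs in the trend coefficients and the winter index quoted above (it does not re-fit the regression):

```python
def trend(t):
    # Trend line fitted to the deseasonalized data: Ft = 142.113 + 4.686 t
    return 142.113 + 4.686 * t

winter_index = 1.441

unadjusted = trend(17)                # seasonally unadjusted forecast
adjusted = unadjusted * winter_index  # re-apply the winter seasonal index

print(round(unadjusted, 3))  # 221.775 -> $221,775
print(round(adjusted, 3))    # 319.578 -> $319,578
```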

How to do it: The Seasonal Adjustment Process (The Multiplicative Model)

We can summarize the steps for performing a seasonal adjustment to a trend-based forecast as follows:

1. Compute each moving average from the k appropriate consecutive data values, where k is the number of values in one period of the time series.
2. Compute the centered moving averages.
3. Isolate the seasonal component by computing the ratio-to-moving-average values.
4. Compute the seasonal indexes by averaging the ratio-to-moving-average values for comparable periods.
5. Normalize the seasonal indexes (if necessary).
6. Deseasonalize the time series by dividing the actual data by the appropriate seasonal index.
7. Use least squares regression to develop the trend line using the deseasonalized data.
8. Develop the unadjusted forecasts using trend projection.
9. Seasonally adjust the forecasts by multiplying the unadjusted forecasts by the appropriate seasonal index.

The seasonally adjusted forecasts for each quarter in 2010 are as follows:

Quarter (2010)   t    Unadjusted Forecast   Index   Adjusted Forecast
Winter           17   221.775               1.441   319.578 = $319,578
Spring           18   226.461               0.608   137.688 = $137,688
Summer           19   231.147               1.323   305.807 = $305,807
Fall             20   235.833               0.626   147.631 = $147,631

You can use the seasonally adjusted trend model when a time series exhibits both a trend and seasonality. This process allows for a better identification of the trend and produces forecasts that are more sensitive to seasonality in the data. Minitab contains a procedure for generating seasonal indexes and seasonally adjusted forecasts. Figure 16.25 shows the Minitab results for the Big Mountain Ski Resort example. Notice that the forecast option in Minitab gives different forecasts than we showed earlier.
This is because Minitab generates the linear trend model using the original sales data rather than the deseasonalized data. Our suggestion is to use Minitab to generate the seasonal indexes, but then follow our outline to generate seasonally adjusted forecasts.⁴

FIGURE 16.25 | Minitab Output Showing Big Mountain Seasonal Indexes. The trend model is based on the original time-series data, not on the deseasonalized data; the Minitab forecasts are based on data with the seasonal component still present. Measures of forecast error: MAPE = Mean Absolute Percentage Error.

Minitab Instructions: 1. Open file: Big Mountain.MTW. 2. Select Stat > Time Series > Decomposition. 3. In Variable, enter the time series column. 4. In Seasonal length, enter the number of time periods in a season. 5. Under Model Type, choose Multiplicative. 6. Under Model Components, choose Trend plus seasonal. 7. Select Generate forecasts; for Number of forecasts insert 4; for Starting with origin insert the last time-series time period: 16. 8. Click OK.

⁴Neither Excel nor PHStat offers a procedure for automatically generating seasonal indexes. However, as shown in the Big Mountain example, you can use spreadsheet formulas to do this. See the Excel Tutorial that accompanies this text.

Using Dummy Variables to Represent Seasonality The multiplicative model approach for dealing with seasonal data in a time-series forecasting application is one method that is commonly used by forecasters. Another method used to incorporate the seasonal component into a linear trend forecast involves the use of dummy variables. To illustrate, we again use the Big Mountain example, which had four years of quarterly data. Because the data are quarterly, start by constructing 3 dummy variables (one less than the number of data values in the

year; if you have monthly data, construct 11 dummy variables). Form the dummy variables as follows:

x1 = 1 if season is winter; x1 = 0 if not winter
x2 = 1 if season is spring; x2 = 0 if not spring
x3 = 1 if season is summer; x3 = 0 if not summer

Table 16.5 shows the revised data set for the Big Mountain Company.

TABLE 16.5 | Big Mountain Sales Output Using Dummy Variables

Season   Year   y Sales   Quarter = t   x1 Winter Dummy   x2 Spring Dummy   x3 Summer Dummy
Winter   2006   205        1            1                 0                 0
Spring          96         2            0                 1                 0
Summer          194        3            0                 0                 1
Fall            102        4            0                 0                 0
Winter   2007   230        5            1                 0                 0
Spring          105        6            0                 1                 0
Summer          245        7            0                 0                 1
Fall            120        8            0                 0                 0
Winter   2008   272        9            1                 0                 0
Spring          110       10            0                 1                 0
Summer          255       11            0                 0                 1
Fall            114       12            0                 0                 0
Winter   2009   296       13            1                 0                 0
Spring          130       14            0                 1                 0
Summer          270       15            0                 0                 1
Fall            140       16            0                 0                 0

Next, form a multiple regression model:

Ft = β0 + β1t + β2x1 + β3x2 + β4x3 + ε

Note that this model formulation is an extension of the linear trend model in which seasonality is accounted for by adding the regression coefficient for the season to the linear trend fitted value. Figure 16.26 shows the Excel multiple regression output. The regression equation is

Ft = 71.0 + 4.8t + 146.2(x1) + 0.9(x2) + 126.8(x3)

The R-square value is very high at 0.9710, indicating that the regression model fits the historical data quite well. The F-ratio of 92.07 is significant at any reasonable level of significance, indicating that the overall regression model is statistically significant. However, the p-value for x2, the spring dummy variable, is 0.9359, indicating that this variable is insignificant. Consequently, we will drop this variable and rerun the regression analysis with only three independent variables.
The resulting model is

Ft = 71.5 + 4.8t + 145.7x1 + 126.4x3

This overall model is significant, and all three variables are statistically significant at any reasonable level of alpha. The coefficients on the two dummy variables can be interpreted as the seasonal indexes for winter and summer. The indexes for spring and fall are incorporated into the intercept value. We can now use this model to develop forecasts for year 5 (periods 17–20) as follows:

Winter (t = 17): Ft = 71.5 + 4.8(17) + 145.7(1) + 126.4(0) = 298.80, or $298,800
Spring (t = 18): Ft = 71.5 + 4.8(18) + 145.7(0) + 126.4(0) = 157.90, or $157,900
Summer (t = 19): Ft = 71.5 + 4.8(19) + 145.7(0) + 126.4(1) = 289.10, or $289,100
Fall (t = 20): Ft = 71.5 + 4.8(20) + 145.7(0) + 126.4(0) = 167.50, or $167,500
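The four forecasts from the reduced dummy-variable model can be checked the same way. A Python sketch that plugs the reported coefficients into the fitted equation (the function name is ours):

```python
def forecast(t, winter, summer):
    # Reduced model: Ft = 71.5 + 4.8 t + 145.7 x1 + 126.4 x3,
    # where x1 and x3 are the winter and summer dummies (0 or 1)
    return 71.5 + 4.8 * t + 145.7 * winter + 126.4 * summer

print(round(forecast(17, 1, 0), 2))  # 298.8 -> $298,800 (winter)
print(round(forecast(18, 0, 0), 2))  # 157.9 -> $157,900 (spring)
print(round(forecast(19, 0, 1), 2))  # 289.1 -> $289,100 (summer)
print(round(forecast(20, 0, 0), 2))  # 167.5 -> $167,500 (fall)
```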

<span class='text_page_counter'>(285)</span> 746. CHAPTER 16. FIGURE 16.26. |. Analyzing and Forecasting Time-Series Data. |. Excel 2007 Regression Output with Dummy Variables Included—Big Mountain Example. Excel 2007 Instructions: 1. Open File: Big Mountain. xls. 2. Create three dummy variables for winter, spring, and summer. 3. Click on Data. 4. Select Data Analysis > Regression. 5. Specify y variable range and x variable range (time variable plus three dummies). 6. Click OK.. If you compare these forecasts to the ones we previously obtained using the multiplicative model approach, the forecasts for winter and summer are lower with the dummy variable model but higher for spring and fall. You could use the split-sample approach to test the two alternative approaches to see which, in this case, seems to provide more accurate forecasts based on MAD and MSE calculations. Both the multiplicative and the dummy variable approach have their advantages and both methods are commonly used by business forecasters.. MyStatLab. 16-2: Exercises Skill Development Problems 16-18 to 16-22 refer to Tran’s Furniture Store, which has maintained monthly sales records for the past 48 months, with the following results: Month. Sales ($). 1 (Jan.) 2 3 4 5 6 7 8 9 10. 23,500 21,700 18,750 22,000 23,000 26,200 27,300 29,300 31,200 34,200. Month 11 12 13 (Jan.) 14 15 16 17 18 19 20. Sales ($) 39,500 43,400 23,500 23,400 21,400 24,200 26,900 29,700 31,100 32,400. Month 21 22 23 24 25 (Jan.) 26 27 28 29 30 31 32 33 34. Sales ($). Month. Sales ($). 34,500 35,700 42,000 42,600 31,000 30,400 29,800 32,500 34,500 33,800 34,200 36,700 39,700 42,400. 35 36 37 (Jan) 38 39 40 41 42 43 44 45 46 47 48. 43,600 47,400 32,400 35,600 31,200 34,600 36,800 35,700 37,500 40,000 43,200 46,700 50,100 52,100.

<span class='text_page_counter'>(286)</span> CHAPTER 16. 16-18. Based on the Durbin-Watson statistic, is there evidence of autocorrelation in these data? Use a linear trend model. 16-19. Using the multiplicative model, estimate the T C portion by computing a 12-month moving average and then the centered 12-month moving average. 16-20. Estimate the S I portion of the multiplicative model by finding the ratio-to-moving-averages for the timeseries data. Determine whether these ratio-to-movingaverages are stable from year to year. 16-21. Extract the irregular component by taking the normalized average of the ratio-to-moving-averages. Make a table that shows the normalized seasonal indexes. Interpret what the index for January means relative to the index for July. 16-22. Based on your work in the previous three problems, a. Determine a seasonally adjusted linear trend forecasting model. Compare this model with an unadjusted linear trend model. Use both models to forecast Tran’s sales for period 49. b. Which of the two models developed has the lower MAD and lower MSE? 16-23. Consider the following set of sales data, given in millions of dollars:. 2006. 2008. 1st quarter 152 2nd quarter 162 3rd quarter 157 4th quarter 167. 1st quarter 217 2nd quarter 209 3rd quarter 202 4th quarter 221. |. Analyzing and Forecasting Time-Series Data. 747. 16-24. Examine the following time series: t yt. 1. 2. 3. 4. 5. 6. 7. 8. 9. 10. 52. 72. 58. 66. 68. 60. 46. 43. 17. 3. a. Produce a scatter plot of this time series. Indicate the appropriate forecasting model for this time series. b. Construct the equation for the forecasting model identified in part a. c. Produce forecasts for time periods 11, 12, 13, and 14. d. Obtain the forecast bias for the forecasts produced in part c if the actual time series values are -35, -41, -79, and -100 for periods 11–14, respectively. 16-25. Examine the following quarterly data: t yt. 1. 2. 3. 4. 5. 6. 7. 8. 9. 10. 11. 12. 2. 12. 23. 20. 18. 32. 48. 41. 35. 
52. 79. 63. a. Compute the four-period moving averages for this set of data. b. Compute the centered moving averages from the moving averages of part a. c. Compute the ratio-to-moving-averages values. d. Calculate the seasonal indexes. Normalize them if necessary. e. Deseasonalize the time series. f. Produce the trend line using the deseasonalized data. g. Produce seasonally adjusted forecasts for each of the time periods 13, 14, 15, and 16.. Business Applications 2007. 2009. 1st quarter 182 2nd quarter 192 3rd quarter 191 4th quarter 197. 1st quarter 236 2nd quarter 242 3rd quarter 231 4th quarter 224. a. Plot these data. Based on your visual observations, what time-series components are present in the data? b. Determine the seasonal index for each quarter. c. Fit a linear trend model to the data and determine the MAD and MSE values. Comment on the adequacy of the linear trend model based on these measures of forecast error. d. Provide a seasonally unadjusted forecast using the linear trend model for each quarter of the year 2010. e. Use the seasonal index values computed in part b to provide seasonal adjusted forecasts for each quarter of 2010.. 16-26. “The average college senior graduated this year with more than $19,000 in debt” was the beginning sentence of a recent article in USA Today. The majority of students have loans that are not due until the student leaves school. This can result in the student ignoring the size of debt that piles up. Federal loans obtained to finance college education are steadily mounting. The data given here show the amount of loans ($million) for the last 13 academic years, with year 20 being the most recent. Year. Amount. Year. Amount. Year. Amount. 1 2 3 4 5 6 7. 9,914 10,182 12,493 13,195 13,414 13,890 15,232. 8 9 10 11 12 13 14. 16,221 22,557 26,011 28,737 31,906 33,930 34,376. 15 16 17 18 19 20. 37,228 39,101 42,761 49,360 57,463 62,614. a. Produce a time-series plot of these data. 
Indicate the time-series components that exist in the data..

<span class='text_page_counter'>(287)</span> 748. CHAPTER 16. |. Analyzing and Forecasting Time-Series Data. b. Conduct a test of hypothesis to determine if there exists a linear trend in these data. Use a significance level of 0.10 and the p-value approach. c. Provide a 90% prediction interval for the amount of federal loans for the 26th academic year. 16-27. The average monthly price of regular gasoline in Southern California is monitored by the Automobile Club of Southern California’s monthly Fuel Gauge Report. The prices of the time period July 2004 to June 2006 are given here. Month. Price ($). Month. Price ($). Month. Price ($). 7/04 8/04 9/04 10/04 11/04 12/04 1/05 2/05 3/05 4/05. 2.247 2.108 2.111 2.352 2.374 2.192 1.989 2.130 2.344 2.642. 5/05 6/05 7/05 8/05 9/05 10/05 11/05 12/05 1/06 2/06. 2.532 2.375 2.592 2.774 3.031 2.943 2.637 2.289 2.357 2.628. 3/06 4/06 5/06 6/06. 2.626 2.903 3.417 3.301. a. Produce a time-series plot of the average price of regular gas in Southern California. Identify any time-series components that exist in the data. b. Identify the recurrence period of the time series. Determine the seasonal index for each month within the recurrence period. c. Fit a linear trend model to the deseasonalized data. d. Provide a seasonally adjusted forecast using the linear trend model for July 2006 and July 2010. 16-28. Manuel Gutierrez correctly predicted the increasing need for home health care services due to the country’s aging population. Five years ago, he started a company offering meal delivery, physical therapy, and minor housekeeping services in the Galveston area. Since that time he has opened offices in seven additional Gulf State cities. Manuel is currently analyzing the revenue data from his first location for the first five years of operation. Revenue ($10,000s). January February March April May June July August September October November December. 2005. 2006. 2007. 2008. 2009. 23 34 45 48 46 49 60 65 67 60 71 76. 
67 63 65 71 75 70 72 75 80 78 89 94. 72 64 64 77 79 72 71 77 79 78 87 92. 76 75 77 81 86 75 80 82 86 87 91 96. 81 72 71 83 85 77 79 84 91 86 94 99. a. Plot these data. Based on your visual observations, what time-series components are present in the data? b. Determine the seasonal index for each month. c. (1) Fit a linear trend model to the deseasonalized data for the years 2005–2009 and determine the MAD and MSE for forecasts for each of the months in 2010. (2) Conduct a test of hypothesis to determine if the linear trend model fits the existing data. (3) Comment on the adequacy of the linear trend model based on the measures of forecast error and the hypothesis test you conducted. d. Manuel had hoped to reach $2,000,000 in revenue by the time he had been in business for 10 years. From the results in part c, is this a feasible goal based on the historical data provided? Consider and comment on the size of the standard error for this prediction. What makes this value so large? How does it affect your conclusion? e. Use the seasonal index values computed in part b to provide seasonal adjusted forecasts for each month of the year 2010. 16-29. A major brokerage company has an office in Miami, Florida. The manager of the office is evaluated based on the number of new clients generated each quarter. The following data reflect the number of new customers added during each quarter between 2006 and 2009. 2006. 2007. 1st quarter 218 2nd quarter 190 3rd quarter 236 4th quarter 218. 1st quarter 250 2nd quarter 220 3rd quarter 265 4th quarter 241. 2008. 2009. 1st quarter 244 2nd quarter 228 3rd quarter 263 4th quarter 240. 1st quarter 229 2nd quarter 221 3rd quarter 248 4th quarter 231. a. Plot the time series and discuss the components that are present in the data. b. Referring to part a, fit the linear trend model to the data for the years 2006–2008. Then use the resulting model to forecast the number of new brokerage customers for each quarter in the year 2009. 
Compute the MAD and MSE for these forecasts and discuss the results. c. Using the data for the years 2006–2008, determine the seasonal indexes for each quarter. d. Develop a seasonally unadjusted forecast for the four quarters of year 2009. e. Using the seasonal indexes computed in part d, determine the seasonally adjusted forecast for each quarter for the year 2009. Compute the MAD and MSE for these forecasts..

<span class='text_page_counter'>(288)</span> CHAPTER 16. f. Examine the values for the MAD and MSE in parts b and e. Which of the two forecasting techniques would you recommend the manager use to forecast the number of new clients generated each quarter? Support your choice by giving your rationale.. Computer Database Exercises 16-30. Logan Pickens is a plan/build construction company specializing in resort area construction projects. Plan/build companies typically have a cash flow problem since they tend to be paid in lump sums when projects are completed or hit milestones. However, their expenses, such as payroll, must be paid regularly. Consequently, such companies need bank lines of credit to finance their initial costs, but in 2009 lines of credit were difficult to negotiate. The data file LoganPickens contains month-end cash balances for the past 16 months. a. Plot the data as a time-series graph. Discuss what the graph implies concerning the relationship between cash balance and the time variable, month. b. Fit a linear trend model to the data. Compute the coefficient of determination for this model and show the trend line on the time-series graph. Discuss the appropriateness of the linear trend model. What are the strengths and weaknesses of the model? c. Referring to part b, compute the MAD and MSE for the 16 data points. d. Use the t2 transformation approach and recompute the linear model using the transformed time variable. Plot the new trend line against the transformed data. Discuss whether this model appears to provide a better fit than did the model without the transformation. Compare the coefficients of determination for the two models. Which model seems to be superior, using the coefficient of determination as the criterion? e. Refer to part d. Compute the MAD and MSE for the 16 data values. Discuss how these compare to those that were computed in part c, prior to transformation. 
Do the measures of fit (R2, MSE, or MAD) agree on the best model to use for forecasting purposes? 16-31. Refer to Problem 16-30. a. Use the linear trend model (without transformation) for the first 15 months and provide a cash balance forecast for month 16. Then make. |. Analyzing and Forecasting Time-Series Data. 749. the t2 transformation and develop a new linear trend forecasting model based on months 1–15. Forecast the cash balance for month 16. Now compare the accuracy of the forecasts with and without the transformation. Which of the two forecast models would you prefer? Explain your answer. b. Provide a 95% prediction interval for the cash balance forecast for month 16 using the linear trend model both with and without the transformation. Which interval has the widest width? On this basis, which procedure would you choose? 16-32. The federal funds rate is the interest rate charged by banks when banks borrow “overnight” from each other. The funds rate fluctuates according to supply and demand and is not under the direct control of the Federal Reserve Board, but is strongly influenced by the Fed’s actions. The file entitled The Fed contains the federal funds rates for the period 1955–2008. a. Produce a scatter plot of the federal funds rate for the period 1955–2008. Identify any time-series components that exist in the data. b. Identify the recurrence period of the time series. Determine the seasonal index for each month within the recurrence period. c. Fit a nonlinear trend model that uses coded years and coded years squared as predictors for the deseasonalized data. d. Provide a seasonally adjusted forecast using the nonlinear trend model for 2010 and 2012. e. Diagnose the model. 16-33. The Census Bureau of the Department of Commerce released the U.S. retail e-commerce sales (“Quarterly Retail E-Commerce Sales 1st Quarter 2006,” May 18, 2006) for the period of Fourth Quarter 1999–Fourth Quarter 2008. The file entitled E-Commerce contains those data. a. 
Produce a time-series plot of this data. Indicate the time-series components that exist in the data. b. Conduct a test of hypothesis to determine if there exists a linear trend in these data. Use a significance level of 0.10 and the p-value approach. c. Provide forecasts for the e-commerce retail sales for the next four quarters. d. Presume the next four quarters exhibit e-commerce retail sales of 35,916, 36,432, 35,096, and 36,807, respectively. Produce the forecast bias. Interpret this number in the context of this exercise.. END EXERCISES 16-2.

16.3 Forecasting Using Smoothing Methods

The trend-based forecasting technique introduced in the previous section is widely used and can be very effective in many situations. However, it has a disadvantage in that it gives as much weight to the earliest data in the time series as it does to the data that are close to the period for which the forecast is required. Also, this trend approach does not provide an opportunity for the model to "learn" or "adjust" to changes in the time series. A class of forecasting techniques called smoothing models is widely used to overcome these problems and to provide forecasts in situations in which there is no pronounced trend in the data. These models attempt to "smooth out" the random or irregular component in the time series by an averaging process. In this section we introduce two frequently used smoothing-based forecasting techniques: single exponential smoothing and double exponential smoothing. Double exponential smoothing offers a modification to the single exponential smoothing model that specifically deals with trends.

Chapter Outcome 5.

Exponential Smoothing A time-series and forecasting technique that produces an exponentially weighted moving average in which each smoothing calculation or forecast is dependent on all previously observed values.

Exponential Smoothing The trend-based forecasting methods discussed in Section 16.2 are used in many forecasting situations. As we showed, the least squares trend line is computed using all available historical data. Each observation is given equal input in establishing the trend line, thus allowing the trend line to reflect all the past data. If the future pattern looks like the past, the forecast should be reasonably accurate. However, in many situations involving time-series data, the more recent the observation, the more indicative it is of possible future values.
For example, this month's sales are probably a better indicator of next month's sales than would be sales from 20 months ago. However, the regression analysis approach to trend-based forecasting does not take this fact into account. The data from 20 periods ago will be given the same weight as data from the most current period in developing a forecasting model. This equal valuation can be a drawback to the trend-based forecasting approach.

With exponential smoothing, current observations can be weighted more heavily than older observations in determining the forecast. Therefore, if in recent periods the time-series values are much higher (or lower) than those in earlier periods, the forecast can be made to reflect this difference. The extent to which the forecast reflects the current data depends on the weights assigned by the decision maker. We will introduce two classes of exponential smoothing models: single exponential smoothing and double exponential smoothing. Double smoothing is used when a time series exhibits a linear trend. Single smoothing is used when no linear trend is present in the time series. Both single and double exponential smoothing are appropriate for short-term forecasting and for time series that are not seasonal.

Single Exponential Smoothing Just as its name implies, single exponential smoothing uses a single smoothing constant. Equations 16.16 and 16.17 represent two equivalent methods for forecasting using single exponential smoothing.

Exponential Smoothing Model

Ft+1 = Ft + α(yt − Ft)    (16.16)

or

Ft+1 = αyt + (1 − α)Ft    (16.17)

where:
Ft+1 = Forecast value for period t + 1
yt = Actual value of the time series at time t
Ft = Forecast value for period t
α = Alpha (smoothing constant, 0 ≤ α ≤ 1)

The logic of the exponential smoothing model is that the forecast made for the next period will equal the forecast made for the current period, plus or minus some adjustment factor. The

adjustment factor is determined by the difference between this period's forecast and the actual value (yt − Ft), multiplied by the smoothing constant, α. The idea is that if we forecast low, we will adjust next period's forecast upward, by an amount determined by the smoothing constant.

EXAMPLE 16-6 DEVELOPING A SINGLE EXPONENTIAL SMOOTHING MODEL (Excel and Minitab Tutorial)

Dawson Graphic Design Consider the past 10 weeks of potential incoming customer sale calls for Dawson Graphic Design, located in Orlando, Florida. These data and their line graph are shown in Figure 16.27. The data showing the number of incoming calls from potential customers are in the file Dawson. Suppose the current time period is the end of week 10 and we wish to forecast the number of incoming calls for week 11 using a single exponential smoothing model. The following steps can be used:

Step 1 Specify the model.
Because the data do not exhibit a pronounced trend and because we are interested in a short-term forecast (one period ahead), the single exponential smoothing model with a single smoothing constant can be used.

Step 2 Fit the model.
We start by selecting a value for α, the smoothing constant, between 0.0 and 1.0. The closer α is to 0.0, the less influence the current observations have in determining the forecast. Small α values will result in greater smoothing of the time series. Likewise, when α is near 1.0, the current observations have greater impact in determining the forecast and less smoothing will occur. There is no firm rule for selecting the value of the smoothing constant. However, in general, if the time series is quite stable, a small α should be used to lessen the impact of random or irregular fluctuations. Because the time series shown in Figure 16.27 appears to be relatively stable, we will use α = 0.20 in this example.

FIGURE 16.27 |
Incoming Customer Sale Calls Data and Line Graph for Dawson Graphic Design

Excel 2007 Instructions:
1. Open data file: Dawson.xls.
2. Select the Calls data.
3. Click on Insert → Line.
4. Click on Layout.
5. Use Chart Titles and Axis Titles to provide appropriate labels.

The forecast value for period t = 11 is found using Equation 16.17, as follows:

    F11 = 0.20y10 + (1 − 0.20)F10

This demonstrates that the forecast for period 11 is a weighted average of the actual number of calls in period 10 and the forecast for period 10. Although we know the number of calls for period 10, we don't know the forecast for period 10. However, we can determine it by

    F10 = 0.20y9 + (1 − 0.20)F9

Again, this forecast is a weighted average of the actual number of calls in period 9 and the forecast calls for period 9. We would continue in this manner until we get to

    F2 = 0.20y1 + (1 − 0.20)F1

This requires a forecast for period 1. Because we have no data before period 1 from which to develop a forecast, a rule often used is to assume that F1 = y1: [5]

    Forecast for period 1 = Actual value in period 1

Because setting the starting value is somewhat arbitrary, you should obtain as much historical data as possible to "warm" the model and dampen out the effect of the starting value. In our example, we have 10 periods of data to warm the model before the forecast for week 11 is made. Note that when using an exponential smoothing model, the effect of the initial forecast is reduced by (1 − α) in the forecast for period 2, then reduced again for period 3, and so on. After sufficient periods, any error due to the arbitrary initial forecast should be very small.

Figure 16.28 shows the results of using the single exponential smoothing equation and Excel for weeks 1 through 10. For week 1, F1 = y1 = 400. Then, for week 2, we get

    F2 = 0.20y1 + (1 − 0.20)F1
    F2 = (0.20)400 + (1 − 0.20)400.00 = 400.00

For week 3,

    F3 = 0.20y2 + (1 − 0.20)F2
    F3 = (0.20)430 + (1 − 0.20)400.00 = 406.00

At the end of week 2, after seeing what actually happened to the number of calls in week 2, our forecast for week 3 is 406 calls. This is a 6-unit increase over the forecast for week 2 of 400 calls.
The actual number of calls in week 2 was 430, rather than 400. The number of calls for week 2 was 30 units higher than the forecast for that time period. Because the actual calls were larger than the forecast, an adjustment must be made. The 6-unit adjustment is determined by multiplying the smoothing constant by the forecast error [0.20(30) = 6], as specified in Equation 16.16. The adjustment compensates for the forecast error in week 2. Continuing for week 4, again using Equation 16.17,

    F4 = 0.20y3 + (1 − 0.20)F3
    F4 = (0.20)420 + (1 − 0.20)406.00 = 408.80

Recall that our forecast for week 3 was 406. However, actual calls were higher than forecast at 420, and we underforecast by 14 calls. The adjustment for week 4 is then 0.20(14) = 2.80, and the forecast for week 4 is 406 + 2.80 = 408.80. This process continues through the data until we are ready to forecast week 11, as shown in Figure 16.28:

    F11 = 0.20y10 + (1 − 0.20)F10
    F11 = (0.20)420 + (1 − 0.20)435.70 = 432.56

[5] Another approach for establishing the starting value, F1, is to use the mean value for some portion of the available data. Regardless of the method used, the quantity of available data should be large enough to dampen out the impact of the starting value.
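The week-by-week arithmetic above can be sketched in a few lines of code. This is a minimal illustration, not the textbook's Excel spreadsheet; only the first three weeks of the Dawson call data (400, 430, 420) appear in the worked example, so only those are used here, with the starting forecast set to the first observation (F1 = y1).

```python
def single_exp_smoothing(y, alpha):
    """One-period-ahead forecasts: F[t+1] = alpha*y[t] + (1 - alpha)*F[t], with F1 = y1."""
    forecasts = [y[0]]  # F1 is set equal to the first actual value
    for actual in y:
        forecasts.append(alpha * actual + (1 - alpha) * forecasts[-1])
    return forecasts    # forecasts[i] is the forecast for period i + 1

calls = [400, 430, 420]            # weeks 1-3 from the Dawson example
F = single_exp_smoothing(calls, alpha=0.20)
print(F[1], F[2], F[3])            # forecasts for weeks 2, 3, and 4
```

Running this reproduces the hand computations above: 400.00 for week 2, 406.00 for week 3, and 408.80 for week 4.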

FIGURE 16.28 | Dawson Graphic Design Single Exponential Smoothing—Excel Spreadsheet

Minitab Instructions (for similar results):
1. Open file: Dawson.MTW.
2. Choose Stat → Time Series → Single Exp Smoothing.
3. In Variable, enter time series column.
4. Under Weight to use in smoothing, select Use and insert α.
5. Click on Storage and select Fits (one-period-ahead forecasts).
6. Click OK, then OK.

[Spreadsheet annotation: Forecast for period 11 = 432.565, or 433 units.]

Excel 2007 Instructions:
1. Open data file: Dawson.xls.
2. Create and label two new columns.
3. Enter the smoothing constant in an empty cell (e.g., B14).
4. Enter initial forecast for period 1 in C3 (400).
5. Use Equation 16.17 to create forecast for period t + 1 in D2.
6. Forecast for period t is set equal to forecast for period t + 1 from previous period.
7. Copy equations down.

Dawson Graphic Design managers would forecast incoming customer calls for week 11 at 432. If we wished to forecast week 12 calls, we would either use the week 11 forecast or wait until the actual week 11 calls are known and then update the smoothing equations to get a new forecast for week 12.

Step 3 Diagnose the model.
However, before we actually use the exponential smoothing forecast for decision-making purposes, we need to determine how successfully the model fits the historical data. Unlike the trend-based forecast, which uses least squares regression, there is no need to use split samples to test the forecasting ability of an exponential smoothing model, because the forecasts are "true forecasts": the forecast for a given period is made before considering the actual value for that period. Figure 16.29 shows the MAD for the forecast model with α = 0.20 and a plot of the forecast values versus the actual call values. This plot shows the smoothing that has occurred.
Note that we don't include period 1 in the MAD calculation, since that forecast is set equal to the actual value. Our next step would be to try different smoothing constants and find the MAD for each new α. The forecast for period 11 would be made using the smoothing constant that generates the smallest MAD. Both Excel and Minitab have single exponential smoothing procedures, although Minitab's procedure is much more extensive. Refer to the Excel and Minitab tutorials for instructions on each. Minitab provides optional methods for determining the initial forecast value for period 1 and a variety of useful graphs. Minitab also has an option for determining the optimal smoothing constant value. [6]

[6] The Solver in Excel can be used to determine the optimal alpha level to minimize the MAD.

Figure 16.30 shows the output generated using Minitab. This shows that
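The "try different smoothing constants and keep the one with the smallest MAD" search described above is easy to automate. The sketch below uses a short hypothetical call series (the full Dawson data file is not reproduced in the text), computes the MAD excluding period 1, and grid-searches α over 0.01 to 0.99.

```python
def single_exp_forecasts(y, alpha):
    """Forecasts F1..Fn, with F1 = y1 and F[t+1] = alpha*y[t] + (1 - alpha)*F[t]."""
    F = [y[0]]
    for actual in y[:-1]:
        F.append(alpha * actual + (1 - alpha) * F[-1])
    return F

def mad(y, F):
    # Mean absolute deviation, excluding period 1 (its forecast equals the actual by construction)
    errors = [abs(a - f) for a, f in zip(y[1:], F[1:])]
    return sum(errors) / len(errors)

series = [400, 430, 420, 440, 410, 450]   # hypothetical weekly call counts
best = min((mad(series, single_exp_forecasts(series, a / 100)), a / 100)
           for a in range(1, 100))
print(best)   # (smallest MAD, corresponding alpha)
```

This mirrors what Minitab's optimal-weight option (or Excel's Solver, per footnote 6) does, except that those tools search continuously rather than over a grid.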

FIGURE 16.29 | Excel 2007 Output for Dawson Graphic Design MAD Computation for Single Exponential Smoothing, α = 0.20

[Chart annotation: Single smoothed]

Excel 2007 Instructions:
1. Open data file: Dawson.xls.
2. Select the Calls data.
3. Click on Insert → Line.
4. Click on Layout.
5. Use Chart Titles and Axis Titles to provide appropriate labels.
6. Follow directions for Figure 16.28.
7. Select the Forecast values (Ft) and copy and paste onto the line chart.
8. Create a column of forecast errors using an Excel equation.
9. Create a column of absolute forecast errors using Excel's ABS function.
10. Compute the MAD by using the AVERAGE function for the absolute errors.

FIGURE 16.30 | Minitab Output for Dawson Graphic Design Single Exponential Smoothing Model

[Output annotation: Optimal alpha = 0.524]

Minitab Instructions:
1. Open file: Humboldt.MTW.
2. Choose Stat → Time Series → Single Exp Smoothing.
3. In Variable, enter time series column.
4. Under Weight to Use in Smoothing, select Optimal ARIMA.
5. Click OK.

the best forecast is found using α = 0.524. Note that the MAD is decreased from 20.86 (when α = 0.20) to 17.321 when the optimal smoothing constant is used.

END EXAMPLE   TRY PROBLEM 16-34 (pg. 759)

A major advantage of the single exponential smoothing model is that it is easy to update. In Example 16-6, the forecast for week 12 using this model is found by simply plugging the actual data value for week 11, once it is known, into the smoothing formula:

    F12 = αy11 + (1 − α)F11

We do not need to go back and recompute the entire model, as would have been necessary with a trend-based regression model.

Chapter Outcome 5. Double Exponential Smoothing

When the time series has an increasing or decreasing trend, a modification to the single exponential smoothing model is used to explicitly account for the trend. The resulting technique is called double exponential smoothing. The double exponential smoothing model is often referred to as exponential smoothing with trend. In double exponential smoothing, a second smoothing constant, beta (β), is included to account for the trend. Equations 16.18, 16.19, and 16.20 are needed to provide the forecasts.

Double Exponential Smoothing Model

    Ct = αyt + (1 − α)(Ct−1 + Tt−1)    (16.18)
    Tt = β(Ct − Ct−1) + (1 − β)Tt−1    (16.19)
    Ft+1 = Ct + Tt                     (16.20)

where:
    yt = Value of the time series at time t
    α = Constant-process smoothing constant
    β = Trend-smoothing constant
    Ct = Smoothed constant-process value for period t
    Tt = Smoothed trend value for period t
    Ft+1 = Forecast value for period t + 1
    t = Current time period

Equation 16.18 is used to smooth the time-series data; Equation 16.19 is used to smooth the trend; and Equation 16.20 combines the two smoothed values to form the forecast for period t + 1.

EXAMPLE 16-7  DOUBLE EXPONENTIAL SMOOTHING  (Excel and Minitab Tutorial)
Billingsley Insurance Company
The Billingsley Insurance Company has maintained data on the number of automobile claims filed at its Denver office over the past 12 months. These data, which are in the file Billingsley, are listed and graphed in Figure 16.31. The claims manager wants to forecast claims for month 13. A double exponential smoothing model can be developed using the following steps:

Step 1 Specify the model.
The time series contains a strong upward trend, so a double exponential smoothing model might be selected. As was the case with single exponential smoothing, we must select starting values. In the case of the double exponential smoothing model, we must select initial values for C0, T0, and the smoothing constants α and β. The choice of smoothing constant values (α and β) depends on the same issues as those discussed earlier for single exponential smoothing: use larger smoothing constants when less smoothing is desired and values closer to 0 when more smoothing is desired. The larger the smoothing constant value, the more impact current data will have on the forecast. Suppose we use α = 0.20 and β = 0.30 in this example. There are several approaches for

FIGURE 16.31 | Excel 2007 Billingsley Insurance Company Data and Time-Series Plot

Excel 2007 Instructions:
1. Open file: Billingsley.xls.
2. Select Claims data.
3. Click on Insert → Line Chart.
4. Click on Layout → Chart Title and enter desired title.
5. Click on Layout → Axis Titles and enter horizontal and vertical axes titles.

selecting starting values for C0 and T0. The method we use here is to fit the least squares trend to the historical data,

    ŷt = b0 + b1t

where the y intercept, b0, is used as the starting value, C0, and the slope, b1, is used as the starting value for the trend, T0. We can use the regression procedure in Excel or Minitab to perform these calculations, giving

    ŷt = 34.273 + 4.1119(t)

So,

    C0 = 34.273  and  T0 = 4.1119

Keep in mind that these are arbitrary starting values, and as with single exponential smoothing, their effect will be dampened out as you proceed through the sample data to the current period. The more historical data you have, the less impact the starting values will have on the forecast.

Step 2 Fit the model.
The forecast for period 1, made at the beginning of period 1, is

    F1 = C0 + T0
    F1 = 34.273 + 4.1119 = 38.385

At the close of period 1, in which actual claims were 38, the smoothing equations are updated as follows:

    C1 = 0.20(38) + (1 − 0.20)(34.273 + 4.1119) = 38.308
    T1 = 0.30(38.308 − 34.273) + (1 − 0.30)(4.1119) = 4.089

Next, the forecast for period 2 is

    F2 = 38.308 + 4.089 = 42.397

We then repeat the process through period 12 to find the forecast for period 13.
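The least squares step used to obtain C0 and T0 can be done without Excel or Minitab's regression tool, using the normal equations for a straight-line fit against t = 1, 2, …, n. The sketch below is illustrative only; the claims series shown is hypothetical, not the actual Billingsley data file.

```python
def trend_start_values(y):
    """Fit y_t = b0 + b1*t by least squares (t = 1..n); returns (C0, T0) = (b0, b1)."""
    n = len(y)
    t = range(1, n + 1)
    sum_t, sum_y = sum(t), sum(y)
    sum_ty = sum(ti * yi for ti, yi in zip(t, y))
    sum_t2 = sum(ti * ti for ti in t)
    b1 = (sum_ty - sum_t * sum_y / n) / (sum_t2 - sum_t ** 2 / n)
    b0 = sum_y / n - b1 * sum_t / n
    return b0, b1

# Hypothetical upward-trending monthly claims counts
c0, t0 = trend_start_values([38, 41, 47, 50, 55])
print(round(c0, 2), round(t0, 2))   # b0 = 33.3, b1 = 4.3 for this series
```

The intercept becomes the starting constant-process value C0 and the slope becomes the starting trend value T0, exactly as described in Step 1.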

Step 3 Diagnose the model.
Figures 16.32 and 16.33 show the results of the computations and the MAD value. The forecast for period 13 is

    F13 = C12 + T12
    F13 = 83.867 + 3.908 = 87.775

Based on this double exponential smoothing model, the number of claims for period 13 is forecast to be about 88. However, before settling on this forecast, we should try different smoothing constants to determine whether a smaller MAD can be found.

END EXAMPLE   TRY PROBLEM 16-39 (pg. 760)

As you can see, the computations required for double exponential smoothing are somewhat tedious and are ideally suited for your computer. Although Excel does not have a double exponential smoothing procedure, in Figure 16.32 we have used Excel formulas to develop

FIGURE 16.32 | Excel 2007 Double Exponential Smoothing Spreadsheet for Billingsley Insurance

[Spreadsheet annotation: Month 13 forecast]

Excel 2007 Instructions:
1. Open file: Billingsley.xls.
2. Create five new column headings as shown in Figure 16.32.
3. Place the smoothing constants (alpha and beta) in empty cells (B17 and B18).
4. Place the starting values for the constant process and the trend in empty cells (D17 and D18).
5. Use Equations 16.18 and 16.19 to create the process columns.
6. Use Equation 16.20 to create the forecast values in the forecast column.
7. Calculate the forecast error by subtracting the forecast column values from the y column values.
8. Calculate the absolute forecast errors using the Excel ABS function.
9. Calculate the MAD by using the Excel AVERAGE function.
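The updating arithmetic in Step 2 can be sketched directly from Equations 16.18 through 16.20. This is an illustrative implementation, not the textbook spreadsheet; the starting values C0 = 34.273 and T0 = 4.1119 and the period-1 actual of 38 claims are taken from the Billingsley example, but the full 12-month series lives in the data file and is not reproduced here.

```python
def double_exp_smoothing(y, alpha, beta, c0, t0):
    """One-step-ahead forecasts using:
    C_t = alpha*y_t + (1 - alpha)*(C_{t-1} + T_{t-1})   (16.18)
    T_t = beta*(C_t - C_{t-1}) + (1 - beta)*T_{t-1}     (16.19)
    F_{t+1} = C_t + T_t                                 (16.20)"""
    c, t = c0, t0
    forecasts = [c + t]             # F1 = C0 + T0
    for actual in y:
        c_new = alpha * actual + (1 - alpha) * (c + t)
        t = beta * (c_new - c) + (1 - beta) * t
        c = c_new
        forecasts.append(c + t)     # forecast for the next period
    return forecasts

F = double_exp_smoothing([38], alpha=0.20, beta=0.30, c0=34.273, t0=4.1119)
print(round(F[0], 3), round(F[1], 3))   # F1 = 38.385, F2 = 42.397
```

Feeding in all 12 months of claims would carry the recursion through C12 and T12 and yield the month-13 forecast of about 88 shown in Figure 16.32.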

FIGURE 16.33 | Minitab Double Exponential Smoothing Spreadsheet for Billingsley Insurance

[Output annotations: Forecast for month 13; MAD = 3.395]

Minitab Instructions:
1. Open file: Billingsley.MTW.
2. Choose Stat → Time Series → Double Exp Smoothing.
3. In Variable, enter time series column.
4. Check Generate forecasts and enter 1 in Number of forecasts and 12 in Starting from origin.
5. Click OK.

our model in conjunction with the regression tool for determining the starting values. Minitab does have a double exponential smoothing routine, as illustrated in Figure 16.33. The MAPE on the Minitab output is the Mean Absolute Percent Error, which is computed using Equation 16.21. The MAPE = 5.7147, indicating that, on average, the double exponential smoothing model produced a forecast that differed from the actual claims by 5.7%.

Mean Absolute Percent Error

    MAPE = [ Σ ( |yt − Ft| / yt ) / n ] (100)    (16.21)

where:
    yt = Value of time series in time t
    Ft = Forecast value for time period t
    n = Number of periods of available data
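Equation 16.21 is simple to compute by hand or in code. The sketch below uses made-up actual and forecast values purely to illustrate the formula; it is not the Billingsley output.

```python
def mape(actuals, forecasts):
    """Mean Absolute Percent Error: average of |y_t - F_t| / y_t, expressed as a percentage."""
    pct_errors = [abs(y - f) / y for y, f in zip(actuals, forecasts)]
    return 100 * sum(pct_errors) / len(pct_errors)

# Hypothetical claims and forecasts, for illustration only:
# percent errors are 10%, 5%, and 0%, so the MAPE is their average
print(round(mape([100, 200, 400], [90, 210, 400]), 2))   # 5.0
```

Unlike the MAD, which is in the units of the series, the MAPE is unit-free, which is why Minitab reports it alongside MAD as a relative measure of fit.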

16-3: Exercises  (MyStatLab)

Skill Development

16-34. The following table represents two years of data:

    Quarter        Year 1   Year 2
    1st quarter      242      272
    2nd quarter      252      267
    3rd quarter      257      276
    4th quarter      267      281

a. Prepare a single exponential smoothing forecast for the first quarter of year 3 using an alpha value of 0.10. Let the initial forecast value for quarter 1 of year 1 be 250.
b. Prepare a single exponential smoothing forecast for the first quarter of year 3 using an alpha value of 0.25. Let the initial forecast value for quarter 1 of year 1 be 250.
c. Calculate the MAD value for the forecasts you generated in parts a and b. Which alpha value provides the smaller MAD value at the end of the 4th quarter in year 2?

16-35. The following data represent enrollment in a major at your university for the past six semesters (Note: semester 1 is the oldest data; semester 6 is the most recent data):

    Semester   Enrollment
    1              87
    2             110
    3             123
    4             127
    5             145
    6             160

a. Prepare a graph of enrollment for the six semesters.
b. Based on the graph you prepared in part a, does it appear that a trend is present in the enrollment figures?
c. Prepare a single exponential smoothing forecast for semester 7 using an alpha value of 0.35. Assume that the initial forecast for semester 1 is 90.
d. Prepare a double exponential smoothing forecast for semester 7 using an alpha value of 0.20 and a beta value of 0.25. Assume that the initial smoothed constant value for semester 1 is 80 and the initial smoothed trend value for semester 1 is 10.
e. Calculate the MAD values for the simple exponential smoothing model and the double exponential smoothing model at the end of semester 6. Which model appears to be doing the better job of forecasting course enrollment? Don't include period 1 in the calculation.

16-36.
The following data represent the average number of employees in outlets of a large consumer electronics retailer:

    Year    2001  2002  2003  2004  2005  2006  2007  2008  2009  2010
    Number  20.6  17.3  18.6  21.5  23.2  19.9  18.7  15.6  19.7  20.4

a. Construct a time-series plot of this time series. Does it appear that a linear trend exists in the time series?
b. Calculate forecasts for each of the years in the time series. Use a smoothing constant of 0.25 and single exponential smoothing.
c. Calculate the MAD value for the forecasts you generated in part b.
d. Construct a single exponential smoothing forecast for 2011. Use a smoothing constant of 0.25.

16-37. A brokerage company is interested in forecasting the number of new accounts the office will obtain next month. It has collected the following data for the past 12 months:

    Month     1   2   3   4   5   6   7   8   9  10  11  12
    Accounts  19  20  21  25  26  24  24  21  27  30  24  30

a. Produce a time-series plot for these data. Specify the exponential forecasting model that should be used to obtain next month's forecast.
b. Assuming a double exponential smoothing model, fit the least squares trend to the historical data to determine the smoothed constant-process value and the smoothed trend value for period 0.
c. Produce the forecasts for periods 1 through 12 using α = 0.15 and β = 0.25. Indicate the number of new accounts the company may expect to receive next month based on the forecast model.
d. Calculate the MAD for this model.

Business Applications

16-38. With tax revenues declining in many states, school districts have been searching for methods of cutting costs without affecting classroom academics. One district has been looking at the cost of extracurricular activities ranging from band trips to athletics. The district business manager has gathered the past six months' costs for these activities as shown here.

    Month       Expenditures ($)
    September      23,586.41
    October        23,539.22
    November       23,442.06
    December       23,988.71
    January        23,727.13
    February       23,799.69

Using this past history, prepare a single exponential smoothing forecast for March using an α value of 0.25.

16-39. "The average college senior graduated this year with more than $19,000 in debt" was the beginning sentence of a recent article in USA Today. The majority of students have loans that are not due until the student leaves school. This can result in the student ignoring the size of debt that piles up. Federal loans obtained to finance college education are steadily mounting. The data given here show the amount of loans ($million) for the last 20 academic years, with year 20 being the most recent.

    Year  Amount   Year  Amount   Year  Amount
     1     9,914    8    16,221   15    37,228
     2    10,182    9    22,557   16    39,101
     3    12,493   10    26,011   17    42,761
     4    13,195   11    28,737   18    49,360
     5    13,414   12    31,906   19    57,463
     6    13,890   13    33,930   20    62,614
     7    15,232   14    34,376

a. Produce a time-series plot for these data. Specify the exponential forecasting model that should be used to obtain next year's forecast.
b. Assuming a double exponential smoothing model, fit the least squares trend to the historical data to determine the smoothed constant-process value and the smoothed trend value for period 0.
c. Using data for periods 1 through 20 and using α = 0.20 and β = 0.30, forecast the total student loan volume for year 21.
d. Calculate the MAD for this model.

16-40. The human resources manager for a medium-sized business is interested in predicting the dollar value of medical expenditures filed by employees of her company for the year 2011. From her company's database she has collected the following information showing the dollar value of medical expenditures made by employees for the previous seven years:

    Year   Medical Claims
    2004    $405,642.43
    2005    $407,180.60
    2006    $408,203.30
    2007    $410,088.03
    2008    $411,085.64
    2009    $412,200.39
    2010    $414,043.90

a. Prepare a graph of medical expenditures for the years 2004–2010. Which forecasting technique do you think is most appropriate for this time series, single exponential smoothing or double exponential smoothing? Why?
b. Use an α value of 0.25 and a β value of 0.15 to produce a double exponential forecast for the medical claims data. Use linear trend analysis to obtain the starting values for C0 and T0.
c. Compute the MAD value for your model for the years 2004 to 2010. Also produce a graph of your forecast values.

16-41. Retail Forward, Inc., is a global management consulting and market research firm specializing in retail intelligence and strategies. One of its press releases (June Consumer Outlook: Spending Plans Show Resilience, June 1, 2006) divulged the result of the Retail Forward ShopperScape™ survey conducted each month from a sample of 4,000 U.S. primary household shoppers. A measure of consumer spending is represented by the following figure:

[Line chart: Retail Forward Future Spending Index™ (December 2005 = 100), plotted monthly from Jun-05 through Jun-06; the plotted monthly values include 107.5, 104.6, 102.8, 103.5, 99.7, 99.1, 96.8, 97.3, 101.6, 95.9, 94.0, 101.3, and 99.6.]

a. Construct a time-series plot of these data. Does it appear that a linear trend exists in the time series?
b. Calculate forecasts for each of the months in the time series. Use a smoothing constant of 0.25.
c. Calculate the MAD value for the forecasts you generated in part b.
d. Construct a single exponential smoothing forecast for July 2006. Use a smoothing constant of 0.25.

Computer Database Exercises

16-42. The National Association of Theatre Owners is the largest exhibition trade organization in the world, representing more than 26,000 movie screens in all 50 states and in more than 20 countries worldwide. Its membership includes the largest cinema chains and hundreds of independent theater owners. It publishes statistics concerning the movie sector of the economy. The file entitled Flicks contains data on average U.S. ticket prices ($). One concern is the rapidly increasing price of tickets.
a. Produce a time-series plot for these data. Specify the exponential forecasting model that should be used to obtain next year's forecast.
b. Assuming a double exponential smoothing model, fit the least squares trend to the historical data to determine the smoothed constant-process value and the smoothed trend value for period 0.
c. Use α = 0.20 and β = 0.30 to forecast the average yearly ticket price for the year 2010.
d. Calculate the MAD for this model.

16-43. Inflation is a fall in the market value or purchasing power of money. Measurements of inflation are prepared and published by the Bureau of Labor Statistics of the Department of Labor, which measures average changes in prices of goods and services. The file entitled CPI contains the monthly CPI and inflation rate for the period January 2000–December 2005.
a. Construct a plot of this time series. Does it appear that a linear trend exists in the time series? Specify the exponential forecasting model that should be used to obtain next month's forecast.
b.
Assuming a single exponential smoothing model, calculate forecasts for each of the months in the time series. Use a smoothing constant of 0.15.
c. Calculate the MAD value for the forecasts you generated in part b.
d. Construct a single exponential smoothing forecast for January 2006. Use a smoothing constant of 0.15.

16-44. The sales manager at Grossmieller Importers in New York City needs to determine a monthly forecast for the number of men's golf sweaters that will be sold so that he can order an appropriate amount of packing boxes. Grossmieller ships sweaters to retail stores throughout the United States and Canada. Shirts are packed six to a box. Data for the past 12 months are contained in the data file called Grossmieller.
a. Plot the sales data using a time-series plot. Based on the graph, what time-series components are present? Discuss.
b. (1) Use a single exponential smoothing model with α = 0.30 to forecast sales for month 17. Assume that the initial forecast for period 1 is 36,000. (2) Compute the MAD for this model. (3) Graph the smoothing-model-fitted values on the time-series plot.
c. (1) Referring to part b, try different alpha levels to determine which smoothing constant value you would recommend. (2) Indicate why you have selected this value and then develop the forecast for month 17. (3) Compare this to the forecast you got using α = 0.30 in part b.

16-45. Referring to Problem 16-44, in which the sales manager for Grossmieller Imports of New York City needs to forecast monthly sales,
a. Discuss why a double exponential smoothing model might be preferred over a single exponential smoothing model.
b. (1) Develop a double exponential smoothing model using α = 0.20 and β = 0.30 as smoothing constants. To obtain the starting values, use the regression trend line approach discussed in this section. (2) Determine the forecast for month 17. (3) Also compute the MAD for this model. (4) Graph the fitted values on the time-series graph.
c. Compare the results for this double exponential smoothing model with the "best" single exponential smoothing model developed in part c of Exercise 16-44. Discuss which model is preferred.
d. Referring to part b, try different alpha and beta values in an attempt to determine an improved forecast model for monthly sales. For each model, show the forecast for period 17 and the MAD. Write a short report that compares the different models.
e. Referring to part d and to part c for Exercise 16-44, write a report to the Grossmieller sales manager that indicates your choice for the forecasting model, complete with your justification for the selection.

END EXERCISES 16-3

<span class='text_page_counter'>(301)</span> 762. CHAPTER 16. |. Analyzing and Forecasting Time-Series Data. Visual Summary Chapter 16: Organizations must operate effectively in the environment they face today, but also plan to continue to effectively operate in the future. To plan for the future organizations must forecast. This chapter introduces the two basic types of forecasting: qualitative forecasting and quantitative forecasting. Qualitative forecasting techniques are based on expert opinion and judgment. Quantitative forecasting techniques are based on statistical methods for analyzing quantitative historical data. The chapter focuses on quantitative forecasting techniques. Numerous techniques exist, often determined by the forecasting horizon. Forecasts are often divided into four phases, immediate forecasts of one month or less, short term of one to three months, medium term of three months to two years and long term of two years of more. The forecasting technique used is often determined by the length of the forecast, called the forecasting horizon. The model building issues discussed in Chapter 15 involving model specification, model fitting, and model diagnosis also apply to forecasting models.. 16.1 Introduction to Forecasting, Time-Series Data, and Index Numbers (pg. 710–723) Summary Quantitative forecasting techniques rely on data gathered in the past to forecast what will happen in the future. Time series analysis is a commonly used quantitative forecasting technique. Time series analysis involves looking for patterns in the past data that will hopefully continue into the future. It involves looking for four components, trend, seasonal, cyclical and random. A trend is the long-term increase or decrease in a variable being measured over time and can be linear or nonlinear. A seasonal component is present if the data shows a repeating pattern over time. 
If when observing time-series data you see sustained periods of high values followed by periods of lower values and the recurrence period of these fluctuations is larger than a year, the data are said to contain a cyclical component. Although not all time series possess a trend, seasonal, or cyclical component, virtually all time series will have a random component.The random component is often referred to as “noise” in the data. When analyzing time-series data, you will often compare one value measured at one point in time with other values measured at different points in time. A common procedure for making relative comparisons is to begin by determining a base period index to which all other data values can be fairly compared. The simplest index is an unweighted aggregate index. More complicated weighted indexes include the Paasche and Lespeyres indexes.. Outcome 1. Identify the components present in a time series. Outcome 2. Understand and compute basic index numbers.. 16.2 Trend-Based Forecasting Techniques (pg. 724–749) Summary Trend-based forecasting techniques begin by identifying and modeling that trend. Once the trend model has been defined, it is used to provide forecasts for future time periods. Regression analysis is often used to identify the trend component. How well the trend fits the actual data can be determined by the Mean Squared Error (MSE) or Mean Absolute Deviation (MAD). In general the smaller the MSE and MAD the better the model fits the actual data. Using regression analysis to determine the trend carries some risk, one of which is that the error terms in the analysis are not independent. Related error terms indicate autocorrelation in the data and is tested for using the Durbin-Watson Statistic. Seasonality is often found in trend based forecasting models and if found is dealt with by computing seasonal indexes. While alternate methods are used to compute seasonal indexes this section concentrates on the ratio-to-moving-average method. 
Once the seasonal indexes are determined, they are used to deseasonalize the data to allow for a better trend forecast. The indexes are then used to determine a seasonally adjusted forecast. Determining the trend and seasonal components to a time series model allows the cyclical and random components to be better determined. Outcome 3. Apply the fundamental steps in developing and implementing forecasting models. Outcome 4. Apply trend-based forecasting models, including linear trend, nonlinear trend, and seasonally adjusted trend.. Conclusion 16.3 Forecasting Using Smoothing Methods (pg. 750–761) Summary A disadvantage of trend based forecasting is that it gives as much weight to the earliest data in the time series as it does to the data that are close to the period for which the forecast is required. It does not therefore allow model to “learn” or “adjust” to changes in the time series. This section introduces exponential smoothing models. With exponential smoothing, current observations can be weighted more heavily than older observations in determining the forecast. Therefore, if in recent periods the time-series values are much higher (or lower) than those in earlier periods, the forecast can be made to reflect this difference. The section discusses single exponential smoothing models and double exponential smoothing models. Single exponential smoothing models are used when only random fluctuations are seen in the data while double exponential smoothing models are used if the data seems to combine both random variations with a trend. Both models weigh recent data more heavily than past data. As with all forecast models the basic steps of model building: specification, fitting and diagnosing the model are followed.. Outcome 5. Use smoothing-based forecasting models, including single and double exponential smoothing. While both qualitative and quantitative forecasting techniques are used, this chapter has emphasized quantitative techniques. 
Quantitative forecasting techniques require historical data for the variable to be forecast. The success of a quantitative model is determined by how well the model fits the historical time-series data and how closely the future resembles the past. Forecasting is as much an art as it is a science. The more experience you have in a given situation, the more effective you are likely to be at identifying and applying the appropriate forecasting tool. You will find that the techniques introduced in this chapter are used frequently as an initial basis for a forecast. However, in most cases, the decision maker will modify the forecast based on personal judgment and other qualitative inputs that are not considered by the quantitative model.
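The smoothing recursions summarized above (Equations 16.16 and 16.18 through 16.20 in the equation list) can be sketched in a few lines of Python. The series, starting values, and smoothing constants here are hypothetical illustrations, not data from the text.

```python
def single_exp_smooth(y, alpha, f_start):
    """Single smoothing: F(t+1) = F(t) + alpha * (y(t) - F(t))."""
    forecasts = [f_start]  # forecasts[0] is the starting forecast
    for obs in y:
        forecasts.append(forecasts[-1] + alpha * (obs - forecasts[-1]))
    return forecasts

def double_exp_smooth(y, alpha, beta, c_start, t_start):
    """Double smoothing: level C(t) and trend T(t); F(t+1) = C(t) + T(t)."""
    level, trend = c_start, t_start
    forecasts = []
    for obs in y:
        prev_level = level
        level = alpha * obs + (1 - alpha) * (prev_level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
        forecasts.append(level + trend)  # forecast for the next period
    return forecasts

series = [100, 104, 109, 115, 120]  # hypothetical, trending upward
print(single_exp_smooth(series, 0.20, 100)[-1])           # about 107.96
print(double_exp_smooth(series, 0.20, 0.40, 100, 5)[-1])  # about 123.42
```

Because the double smoothing model carries an explicit trend term, its next-period forecast runs well ahead of the single smoothing forecast on this trending series, which illustrates why single smoothing is reserved for series showing only random fluctuation.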

CHAPTER 16 | Analyzing and Forecasting Time-Series Data

Equations

(16.1) Simple Index Number pg. 714
I_t = (y_t / y_0)(100)

(16.2) Unweighted Aggregate Price Index pg. 716
I_t = (Σp_t / Σp_0)(100)

(16.3) The Paasche Index pg. 717
I_t = (Σq_t p_t / Σq_t p_0)(100)

(16.4) Laspeyres Index pg. 718
I_t = (Σq_0 p_t / Σq_0 p_0)(100)

(16.5) Deflation Formula pg. 721
y_adj(t) = (y_t / I_t)(100)

(16.6) Linear Trend Model pg. 725
y_t = β_0 + β_1 t + ε_t

(16.7) Least Squares Equations Estimates pg. 725
b_1 = [Σt y_t − (Σt)(Σy_t)/n] / [Σt² − (Σt)²/n]

(16.8)
b_0 = (Σy_t)/n − b_1 (Σt)/n

(16.9) Mean Squared Error pg. 727
MSE = Σ(y_t − F_t)² / n

(16.10) Mean Absolute Deviation pg. 727
MAD = Σ|y_t − F_t| / n

(16.11) Durbin-Watson Statistic pg. 729
d = Σ_{t=2..n} (e_t − e_{t−1})² / Σ_{t=1..n} e_t²

(16.12) Forecast Bias pg. 733
Forecast bias = Σ(y_t − F_t) / n

(16.13) Multiplicative Time-Series Model pg. 739
y_t = T_t × S_t × C_t × I_t

(16.14) Ratio-to-Moving-Average pg. 741
S_t × I_t = y_t / (T_t × C_t)

(16.15) Deseasonalization pg. 742
T_t × C_t × I_t = y_t / S_t

(16.16) Exponential Smoothing Model pg. 750
F_{t+1} = F_t + α(y_t − F_t)

(16.17) or
F_{t+1} = αy_t + (1 − α)F_t

(16.18) Double Exponential Smoothing Model pg. 755
C_t = αy_t + (1 − α)(C_{t−1} + T_{t−1})

(16.19)
T_t = β(C_t − C_{t−1}) + (1 − β)T_{t−1}

(16.20)
F_{t+1} = C_t + T_t

(16.21) Mean Absolute Percent Error pg. 758
MAPE = [Σ(|y_t − F_t| / y_t) / n](100)

Key Terms

Aggregate price index pg. 715
Autocorrelation pg. 728
Base period index pg. 714
Cyclical component pg. 713
Exponential smoothing pg. 752
Forecasting horizon pg. 710
Forecasting interval pg. 710
Forecasting period pg. 710
Linear trend pg. 711
Model diagnosis pg. 710
Model fitting pg. 710
Model specification pg. 710
Moving average pg. 739
Random component pg. 713
Seasonal component pg. 712
Seasonal index pg. 739
Seasonally unadjusted forecast pg. 743

763
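Several of the formulas above translate directly into code. The sketch below, with hypothetical numbers that are not from the text, mirrors Equations 16.1 through 16.4 for index numbers and Equations 16.9 through 16.12 and 16.21 for forecast-error diagnostics.

```python
def simple_index(y_t, y_0):                          # Eq. 16.1
    return y_t / y_0 * 100

def unweighted_aggregate_index(p_t, p_0):            # Eq. 16.2
    return sum(p_t) / sum(p_0) * 100

def paasche_index(p_t, p_0, q_t):                    # Eq. 16.3: current-period weights
    return (sum(q * p for q, p in zip(q_t, p_t))
            / sum(q * p for q, p in zip(q_t, p_0)) * 100)

def laspeyres_index(p_t, p_0, q_0):                  # Eq. 16.4: base-period weights
    return (sum(q * p for q, p in zip(q_0, p_t))
            / sum(q * p for q, p in zip(q_0, p_0)) * 100)

def mse(y, f):                                       # Eq. 16.9
    return sum((a - b) ** 2 for a, b in zip(y, f)) / len(y)

def mad(y, f):                                       # Eq. 16.10
    return sum(abs(a - b) for a, b in zip(y, f)) / len(y)

def durbin_watson(e):                                # Eq. 16.11, e = residuals
    return (sum((e[t] - e[t - 1]) ** 2 for t in range(1, len(e)))
            / sum(v * v for v in e))

def forecast_bias(y, f):                             # Eq. 16.12
    return sum(a - b for a, b in zip(y, f)) / len(y)

def mape(y, f):                                      # Eq. 16.21
    return sum(abs(a - b) / a for a, b in zip(y, f)) / len(y) * 100

# Hypothetical actuals, forecasts, and residuals:
y = [100, 110, 120, 130]
f = [98, 112, 118, 133]
e = [a - b for a, b in zip(y, f)]                    # residuals: 2, -2, 2, -3
print(mse(y, f), mad(y, f), forecast_bias(y, f))     # 5.25 2.25 -0.25
print(durbin_watson(e))                              # about 2.71
```

The negative bias here signals that the forecasts overshoot the actuals on average, while the MSE and MAD summarize overall fit; a Durbin-Watson value far from 2 would point to autocorrelated residuals.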

Chapter Exercises

Conceptual Questions

16-46. Go to the library or use the Internet to find data showing your state's population for the past 20 years. Plot these data and indicate which of the time-series components are present.

16-47. A time series exhibits the patterns stated below. Indicate the type of time-series component described.
a. The pattern is "wavelike" with a recurrence period of nine months.
b. The time series is steadily increasing.
c. The pattern is "wavelike" with a recurrence period of two years.
d. The pattern is unpredictable.
e. The pattern steadily decreases, with a "wavelike" shape that recurs every 10 years.

16-48. Identify the businesses in your community that might be expected to have sales that exhibit a seasonal component. Discuss.

16-49. Discuss the difference between a cyclical component and a seasonal component. Which component is more predictable, seasonal or cyclical? Discuss and illustrate with examples.

16-50. In the simple linear regression model, confidence and prediction intervals are utilized to provide interval estimates for an average and a particular value, respectively, of the dependent variable. The linear trend model in time series is an application of simple linear regression. This being said, discuss whether a confidence or a prediction interval is the relevant interval estimate for a linear trend model's forecast.

Business Applications

Problems 16-51 through 16-54 refer to Malcar Autoparts Company, which has started producing replacement control microcomputers for automobiles. This has been a growth industry since the first control units were introduced in 1985. Sales data since 1994 are as follows:

Year    Sales ($)
1994    240,000
1995    218,000
1996    405,000
1997    587,000
1998    795,000
1999    762,000
2000    998,000
2001    1,217,000
2002    1,570,000
2003    1,947,000
2004    2,711,000
2005    3,104,000
2006    2,918,000
2007    4,606,000
2008    5,216,000
2009    5,010,000

MyStatLab

16-51. As a start in analyzing these data,
a. Graph these data and indicate whether they appear to have a linear trend.
b. Develop a simple linear regression model with time as the independent variable. Using this regression model, describe the trend and the strength of the linear trend over the 16 years. Is the trend line statistically significant? Plot the trend line against the actual data.
c. Compute the MAD value for this model.
d. Provide the Malcar Autoparts Company an estimate of its expected sales for the next 5 years.
e. Provide the maximum and minimum sales Malcar can expect with 90% confidence for the year 2014.

16-52. Develop a single exponential smoothing model using α = 0.20. Use as a starting value the average of the first 6 years' data. Determine the forecasted value for year 2010.
a. Compute the MAD for this model.
b. Plot the forecast values against the actual data.
c. Use the same starting value but try different smoothing constants (say, 0.05, 0.10, 0.25, and 0.30) in an effort to reduce the MAD value.
d. Is it possible to answer part d of Problem 16-51 using this forecasting technique? Explain your answer.

16-53. Develop a double exponential smoothing model using smoothing constants α = 0.20 and β = 0.40. As starting values, use the least squares trend line slope and intercept values.
a. Compute the MAD for this model.
b. Plot the forecast values against the actual data.
c. Use the same starting values but try different smoothing constants [say, (α, β) = (0.10, 0.50), (0.30, 0.30), and (0.40, 0.20)] in an effort to reduce the MAD value.

16-54. Using whatever diagnostic tools you are familiar with, determine which of the three forecasting methods utilized to forecast sales for Malcar Autoparts Company in the previous three problems provides superior forecasts. Explain the reasons for your choice.

16-55. Amazon.com has become one of the most successful online merchants.
Two measures of its success are sales and net income/loss figures. They are given here.

Year    Net Income/Loss    Sales
1995    -0.3        0.5
1996    -5.7        15.7
1997    -27.5       147.7
1998    -124.5      609.8
1999    -719.9      1,639.8
2000    -1,411.2    2,761.9
2001    -567.3      3,122.9
2002    -149.1      3,933
2003    35.3        5,263.7
2004    588.5       6,921
2005    359         8,490
2006    190         10,711
2007    476         14,835

a. Produce a time-series plot for these data. Specify the exponential forecasting model that should be used to obtain the following years' forecasts.
b. Assuming a double exponential smoothing model, fit the least squares trend to the historical data to determine the smoothed constant-process value and the smoothed trend value for period 0.
c. Produce the forecasts for periods 1 through 13 using α = 0.10 and β = 0.20. Indicate the sales Amazon should expect for 2008 based on the forecast model.
d. Calculate the MAD for this model.

16-56. College tuition has risen at a pace faster than inflation for more than two decades, according to an article in USA Today. The following data indicate the average college tuition (in 2003 dollars) for public colleges:

Period    1983–1984    1988–1989    1993–1994    1998–1999    2003–2004    2008–2009
Public    2,074        2,395        3,188        3,632        4,694        5,652

a. Produce a time-series plot of these data. Indicate the time-series components that exist in the data.
b. Provide a forecast for the average tuition for public colleges in the academic year 2013–2014. (Hint: One time-series time period represents five academic years.)
c. Provide an interval of plausible values for the average tuition change after five academic periods have gone by. Use a confidence level of 0.90.

Computer Database Exercises

16-57. HSH® Associates, financial publishers, is the nation's largest publisher of mortgage and consumer loan information. Every week it collects current data from 2,000 mortgage lenders across the nation. It tracks a variety of adjustable rate mortgage (ARM) indexes and makes them available on its Web site. The file ARM contains the national monthly average one-year ARM for the time period January 2004 to December 2008.
a. Produce a scatter plot of the federal ARM for the time period January 2004 to December 2008. Identify any time-series components that exist in the data.
b. Identify the recurrence period of the time series. Determine the seasonal index for each month within the recurrence period.
c. Fit a nonlinear trend model containing coded years and coded years squared as predictors for the deseasonalized data.
d. Provide a seasonally adjusted forecast using the nonlinear trend model for January 2009.
e. Diagnose the model.

16-58. DataNet is an Internet service where clients can find information and purchase various items such as airline tickets, stereo equipment, and listed stocks. DataNet has been in operation for four years. Data on monthly calls for service for the time that the company has been in business are in the data file called DataNet.
a. Plot these data in a time-series graph. Based on the graph, what time-series components are present in the data?
b. Develop the seasonal indexes for each month. Describe what the seasonal index for August means.
c. Fit a linear trend model to the deseasonalized data for months 1–48 and determine the MAD value. Comment on the adequacy of the linear trend model based on these measures of forecast error.
d. Provide a seasonally unadjusted forecast using the linear trend model for each month of the year.
e. Use the seasonal index values computed in part b to provide seasonally adjusted forecasts for months 49–52.

16-59. Referring to Problem 16-58, the managers of DataNet, the Internet company where users can purchase products like airline tickets, need to forecast monthly call volumes in order to have sufficient capacity. Develop a single exponential smoothing model using α = 0.30. Use as a starting value the average of the first six months' data.
a. Compute the MAD for this model.
b.
Plot the forecast values against the actual data.
c. Use the same starting value but try different smoothing constants (say, 0.10, 0.20, 0.40, and 0.50) in an effort to reduce the MAD value.
d. Reflect on the type of time series for which the single exponential smoothing model is designed to provide forecasts. Does it surprise you that the MAD for this method is relatively large for these data? Explain your reasoning.

16-60. Continuing with the DataNet forecasting problems, develop a double exponential smoothing model using smoothing constants α = 0.20 and β = 0.20. As

starting values, use the least squares trend line slope and intercept values.
a. Compute the MAD for this model.
b. Plot the forecast values against the actual data.
c. Compare this with a linear trend model. Which forecast method would you use? Explain your rationale.
d. Use the same starting values but try different smoothing constants [say, (α, β) = (0.10, 0.30), (0.15, 0.25), and (0.30, 0.10)] in an effort to reduce the MAD value. Prepare a short report that summarizes your efforts.

16-61. The College Board, administrator of the SAT test for college entrants, has made several changes to the test in recent years. One recent change occurred between years 2005 and 2006. In a press release the College Board announced SAT scores for students in the class of 2005, the last to take the former version of the SAT featuring math and verbal sections. The board indicated that for the class of 2005, the average SAT math scores continued their strong upward trend, increasing from 518 in 2004 to 520 in 2005, 14 points higher than 10 years ago and an all-time high. The file entitled MathSAT contains the math SAT scores for the interval 1967 to 2005.
a. Produce a time-series plot for the combined gender math SAT scores for the period 1980 to 2005. Indicate the time-series components that exist in the data.
b. Conduct a test of hypothesis to determine if the average SAT math scores of students continued to increase in the period indicated in part a. Use a significance level of 0.10 and the test statistic approach.
c. Produce a forecast for the average SAT math scores for 2010.
d. Beginning with the March 12, 2005, administration of the exam, the SAT Reasoning Test was modified and lengthened. How does this affect the forecast produced in part c? What statistical concept is exhibited by producing the forecast in part c?

Video Case 2
Restaurant Location and Re-imaging Decisions @ McDonald’s In the early days of his restaurant company’s growth, McDonald’s founder Ray Kroc knew that finding the right location was key. He had a keen eye for prime real estate locations. Today, the company is more than 30,000 restaurants strong. When it comes to picking prime real estate locations for its restaurants and making the most of them, McDonald’s is way ahead of the competition. In fact, when it comes to global real estate holdings, no corporate entity has more. From urban office and airport locations, to Wal-Mart stores and the busiest street corner in your town, McDonald’s has grown to become one of the world’s most recognized brands. Getting there hasn’t been just a matter of buying all available real estate on the market. Instead, the company has used the basic principles and process Ray Kroc believed in to investigate and secure the best possible sites for its restaurants. Factors such as neighborhood demographics, traffic patterns, competitor proximity, workforce, and retail shopping center locations all play a role. Many of the company’s restaurant locations have been in operation for decades. And although the restaurants have adapted to changing times—including diet fads and reporting nutrition information, staff uniform updates, and menu innovations such as Happy Meals, Chicken McNuggets, and premium salads—there’s more to bringing customers back time and again than an updated menu and a good location. Those same factors that played a role in the original location decision need to be periodically examined to learn what’s changed and, as a result, what changes the local McDonald’s needs to consider. Beginning in 2003, McDonald’s started work on “re-imaging” its existing restaurants while continuing to expand the brand. globally. More than 6,000 restaurants have been re-imaged to date. Sophia Galassi, vice president of U.S. Restaurant Development, is responsible for the new look nationwide. 
According to Sophia, reimaging is more than new landscaping and paint. In some cases, the entire store is torn down and rebuilt with redesigned drive-thru lanes to speed customers through faster, interiors with contemporary colors and coffee-house seating, and entertainment zones with televisions, and free Wi-Fi. “We work very closely with our owner/operators to collect solid data about their locations, and then help analyze them so we can present the business case to them,” says Sophia. Charts and graphs, along with the detailed statistical results, are vital to the decision process. One recent project provides a good example of how statistics supported the re-imaging decision. Dave Traub, owner/operator, had been successfully operating a restaurant in Midlothian, Virginia, for more than 30 years. The location was still prime, but the architecture and décor hadn’t kept up with changing times. After receiving the statistical analysis on the location from McDonald’s, Dave had the information he needed to make the decision to invest in re-imaging the restaurant. With revenues and customer traffic up, he has no regrets. “We’ve become the community’s gathering place. The local senior citizens group now meets here regularly in the mornings,” he says. The re-imaging effort doesn’t mean the end to new restaurant development for the company. “As long as new communities are developed and growth continues in neighborhoods across the country, we’ll be analyzing data about them to be sure our restaurants are positioned in the best possible locations,” states Sophia. Ray Kroc would be proud..

<span class='text_page_counter'>(306)</span> CHAPTER 16. Discussion Questions: 1. Sophia Galassi, vice president of U.S. Restaurant Development for McDonald’s, indicated that she and her staff work very closely with owner/operators to collect data about McDonald’s restaurant locations. Describe some of the kinds of data that Sophia’s staff would collect and the respective types of charts that could be used to present their findings to the owner/operators. 2. At the end of 2001, Sophia Galassi and her team led a remodel and re-imaging effort for the McDonald’s franchises in a major U.S. city. This entailed a total change in store layout and design and a renewed emphasis on customer service. Once this work had been completed, the company put in place a comprehensive customer satisfaction data collection and tracking system. The data in the file called McDonald’s Customer Satisfaction consist of the overall percentage of customers at the franchise McDonald’s in this city who have rated the customer service as Excellent or Very Good during each quarter since the re-imaging and remodeling was completed. Develop a line chart and discuss what time-series components appear to be contained in these data. 3. Referring to question 2, based on the available historical data, develop a seasonally adjusted forecast for the percentage of customers who will rate the stores as Excellent or Very Good for Quarter 3 and Quarter 4 of 2006. Discuss the process you used to arrive at these forecasts.. |. Analyzing and Forecasting Time-Series Data. 767. 4. Referring to questions 2 and 3, use any other forecasting method discussed in this chapter to arrive at a forecast for Quarters 3 and 4 of 2006. Compare your chosen model with the seasonally adjusted forecast model specified in question 3. Use appropriate measures of forecast error. Prepare a short report outlining your forecasting attempts along with your recommendation of which method McDonald’s should use in this case. 5. 
Prior to remodeling or re-imaging a McDonald's store, extensive research is conducted. This includes the use of "mystery shoppers," who are people hired by McDonald's to go to stores as customers to observe various attributes of the store and the service being provided. The file called McDonald's Mystery Shopper contains data pertaining to the "cleanliness" rating provided by the mystery shoppers who visited a particular McDonald's location each month between January 2004 and June 2006. The values represent the average rating on a 0–100 percent scale provided by five shoppers. A score of 100% is considered perfect. Using these time-series data, develop a line chart and discuss what time-series components are present in these data.
6. Referring to question 5, develop a double exponential smoothing model to forecast the rating for July 2006 (use alpha = 0.20 and beta = 0.30 smoothing constants). Compare the results of this forecasting approach with a simple linear trend forecasting approach. Write a short report describing the methods you have used and the results. Use linear trend analysis to obtain the starting values for C0 and T0.

Case 16.1 Park Falls Chamber of Commerce

Masao Sugiyama is the recently elected president of the Chamber of Commerce in Park Falls, Wisconsin. He is the long-time owner of the only full-service hardware store in this small farming town. Being president of the Chamber of Commerce has been considered largely a ceremonial post because business conditions have not changed in Park Falls for as long as anyone can remember. However, Masao has just read an article in The Wall Street Journal that has made him think he needs to take a more active interest in the business conditions of his town. The article concerned Wal-Mart, the largest retailer in the United States. Wal-Mart has expanded primarily by locating in small towns and avoiding large suburban areas.
The Park Falls merchants have not had to deal with either Lowes or Home Depot because these companies have located primarily in large urban centers. In addition, a supplier has recently told Masao that both Lowes and Home Depot are considering locating stores in smaller towns. Sugiyama knows that Wal-Mart has moved into the outskirts of metropolitan areas and now is considering stores for smaller, untapped markets. He also has heard that Lowes and Home Depot have recently had difficulty. Masao decided he needs to know more about all three retailers. He asked the son of a friend to locate the following sales data, which are also in a file called Park Falls.. Quarterly Sales Values in Millions of Dollars. Fiscal 1999. Fiscal 2000. Fiscal 2001. Fiscal 2002. Q1 Q2 Q3 Q4 Q1 Q2 Q3 Q4 Q1 Q2 Q3 Q4 Q1 Q2 Q3 Q4. Lowes. Home Depot. Wal-Mart. $ 3,772 $ 4,435 $ 3,909 $ 3,789 $ 4,467 $ 5,264 $ 4,504 $ 4,543 $ 5,276 $ 6,127 $ 5,455 $ 5,253 $ 6,470 $ 7,488 $ 6,415 $ 6,118. $ 8,952 $10,431 $ 9,877 $ 9,174 $11,112 $12,618 $11,545 $10,463 $12,200 $14,576 $13,289 $13,488 $14,282 $16,277 $14,475 $13,213. $29,819 $33,521 $33,509 $40,785 $34,717 $38,470 $40,432 $51,394 $42,985 $46,112 $45,676 $56,556 $48,052 $52,799 $52,738 $64,210. (continued ).

<span class='text_page_counter'>(307)</span> 768. CHAPTER 16. |. Analyzing and Forecasting Time-Series Data. Quarterly Sales Values in Millions of Dollars Lowes. Home Depot. Wal-Mart. Fiscal 2003. Q1 Q2 Q3 Q4. $ $ $ $. 7,118 8,666 7,802 7,252. $15,104 $17,989 $16,598 $15,125. $51,705 $56,271 $55,241 $66,400. Fiscal 2004. Q1 Q2 Q3 Q4. $ 8,681 $10,169 $ 9,064 $ 8,550. $17,550 $19,960 $18,772 $16,812. $56,718 $62,637 $62,480 $74,494. Quarterly Sales Values in Millions of Dollars. Fiscal 2005. Q1 Q2 Q3 Q4. Lowes. Home Depot. Wal-Mart. $ 9,913 $11,929 $10,592 $10,808. $18,973 $22,305 $30,744 $19,489. $64,763 $69,722 $68,520 $82,216. Masao is interested in what all these data tell him. How much faster has Wal-Mart grown than the other two firms? Is there any evidence Wal-Mart’s growth has leveled off? Does Lowes seem to be rebounding, based on sales? Are seasonal fluctuations an issue in these sales figures? Is there any evidence that one firm is more affected by the cyclical component than the others? He needs some help in analyzing these data.. Case 16.2 The St. Louis Companies An irritated Roger Hatton finds himself sitting in the St. Louis airport after hearing that his flight to Chicago has been delayed— and, if the storm in Chicago continues, possibly cancelled. Because he must get to Chicago if at all possible, Roger is stuck at the airport. He decides he might as well try to get some work done, so he opens his laptop computer and calls up the Claimnum file. Roger was recently assigned as an analyst in the worker’s compensation section of the St. Louis Companies, one of the biggest issuers of worker’s compensation insurance in the country. Until this year, the revenues and claim costs for all parts of the company were grouped together to determine any yearly profit or loss. Therefore, no one really knew if an individual department was profitable. Now, however, the new president is looking at each part of the company as a profit center. 
The clear implication is that money-losing departments may not have a future unless they develop a clear plan to become profitable. When Roger asked the accounting department for a listing, by client, of all policy payments and claims filed and paid, he was told that the information is available but he may have to wait two or three months to get it. He was able to determine, however, that the department has been keeping track of the clients who file frequent (at least one a month) claims and the total number of firms that purchase workers' compensation insurance. Using the data from this report, Roger divides the number of clients filing frequent claims by the corresponding number of clients. These ratios, in the file Claimnum, are as follows:

Year    Ratio (%)
1       3.8
2       3.6
3       3.5
4       4.9
5       5.9
6       5.6
7       4.9
8       5.6
9       8.5
10      7.7
11      7.1
12      6.1
13      7.8
14      7.1
15      7.6
16      9.7
17      9.6
18      7.5
19      7.9
20      8.3
21      8.4

Staring at these figures, Roger feels there should be some way to use them to project what the next several years may hold if the company doesn't change its underwriting policies.

Case 16.3 Wagner Machine Works

Mary Lindsey has recently agreed to leave her upper-level management job at a major paper manufacturing firm and return to her hometown to take over the family machine-products business. The U.S. machine-products industry had a strong position of world dominance until recently, when it was devastated by foreign competition, particularly from Germany and Japan. Among the many problems facing the American industry is that it is made up of many small firms that must compete with foreign industrial giants. Wagner Machine Works, the company Mary is taking over, is one of the few survivors in its part of the state, but it, too, faces increasing competitive pressure. Mary's father let the business slide as he approached retirement, and Mary sees the need for an immediate modernization of their plant.
She has arranged for a loan from the local bank, but now she must forecast sales for the next three years to ensure that the company has enough cash flow to repay the debt. Surprisingly, Mary finds that her father has no.

<span class='text_page_counter'>(308)</span> CHAPTER 16. forecasting system in place, and she cannot afford the time, or money, to install a system like that used at her previous company. Wagner Machine Works’ quarterly sales (in millions of dollars) for the past 15 years are as follows: Quarter Year. 1. 2. 3. 4. 1995 1996. 10,490. 11,130. 10,005. 11,058. 11,424. 12,550. 10,900. 12,335. 1997. 12,835. 13,100. 11,660. 13,767. 1998. 13,877. 14,100. 12,780. 14,738. 1999. 14,798. 15,210. 13,785. 16,218. 2000. 16,720. 17,167. 14,785. 17,725. 2001. 18,348. 18,951. 16,554. 19,889. 2002. 20,317. 21,395. 19,445. 22,816. 2003. 23,335. 24,179. 22,548. 25,029. 2004. 25,729. 27,778. 23,391. 27,360. 2005. 28,886. 30,125. 26,049. 30,300. 2006. 30,212. 33,702. 27,907. 31,096. 2007. 31,715. 35,720. 28,554. 34,326. 2008. 35,533. 39,447. 30,046. 37,587. 2009. 39,093. 44,650. 30,046. 37,587. |. Analyzing and Forecasting Time-Series Data. 769. While looking at these data, Mary wonders whether they can be used to forecast sales for the next three years. She wonders how much, if any, confidence she can have in a forecast made with these data. She also wonders if the recent increase in sales is due to growing business or just to inflationary price increases in the national economy.. Required Tasks: 1. Identify the central issue in the case. 2. Plot the quarterly sales for the past 15 years for Wagner Machine Works. 3. Identify any patterns that are evident in the quarterly sales data. 4. If a seasonal pattern is identified, estimate quarterly seasonal factors. 5. Deseasonalize the data using the quarterly seasonal factors developed. 6. Run a regression model on the deseasonalized data using the time period as the independent variable. 7. Develop a seasonally adjusted forecast for the next three years. 8. Prepare a report that includes graphs and analysis.. References Armstrong, J. Scott, “Forecasting by Extrapolation: Conclusions from 25 Years of Research.” Interfaces, 14, no. 6 (1984). 
Bails, Dale G., and Larry C. Peppers, Business Fluctuations: Forecasting Techniques and Applications, 2nd ed. (Englewood Cliffs, NJ: Prentice Hall, 1992).
Berenson, Mark L., and David M. Levine, Basic Business Statistics: Concepts and Applications, 11th ed. (Upper Saddle River, NJ: Prentice Hall, 2008).
Bowerman, Bruce L., and Richard T. O'Connell, Forecasting and Time Series: An Applied Approach, 4th ed. (North Scituate, MA: Duxbury Press, 1993).
Brandon, Charles, R. Fritz, and J. Xander, "Econometric Forecasts: Evaluation and Revision." Applied Economics, 15, no. 2 (1983).
Cryer, Jonathan D., Time Series Analysis (Boston: Duxbury Press, 1986).
Frees, Edward W., Data Analysis Using Regression Models: The Business Perspective (Englewood Cliffs, NJ: Prentice Hall, 1996).
Granger, C. W. G., Forecasting in Business and Economics, 2nd ed. (New York: Academic Press, 1989).
Kutner, Michael H., Christopher J. Nachtsheim, John Neter, and William Li, Applied Linear Statistical Models, 5th ed. (New York: McGraw-Hill Irwin, 2005).
Makridakis, Spyros, Steven C. Wheelwright, and Rob J. Hyndman, Forecasting: Methods and Applications, 3rd ed. (New York: John Wiley & Sons, 1998).
McLaughlin, Robert L., "Forecasting Models: Sophisticated or Naive?" Journal of Forecasting, 2, no. 3 (1983).
Microsoft Excel 2007 (Redmond, WA: Microsoft Corp., 2007).
Minitab for Windows Version 15 (State College, PA: Minitab, 2007).
Montgomery, Douglas C., and Lynwood A. Johnson, Forecasting and Time Series Analysis, 2nd ed. (New York: McGraw-Hill, 1990).
Nelson, C. R., Applied Time Series Analysis for Managerial Forecasting (San Francisco: Holden-Day, 1983).
The Ombudsman: "Research on Forecasting—A Quarter-Century Review, 1960–1984." Interfaces, 16, no. 1 (1986).
Willis, R. E., A Guide to Forecasting for Planners (Englewood Cliffs, NJ: Prentice Hall, 1987).
Wonnacott, T. H., and R. J. Wonnacott, Econometrics, 2nd ed. (New York: John Wiley & Sons, 1979).

<span class='text_page_counter'>(309)</span> chapter 17. Chapter 17 Quick Prep Links • Review the concepts associated with hypothesis testing for a single population mean using the t-distribution in Chapter 9.. • Make sure you are familiar with the steps involved in testing hypotheses for the difference between two population means discussed in Chapter 10.. • Review the concepts and assumptions associated with analysis of variance in Chapter 12.. Introduction to Nonparametric Statistics 17.1 The Wilcoxon Signed Rank Test for One Population Median. Outcome 1. Recognize when and how to use the Wilcoxon signed rank test for a population median.. (pg. 771–776). 17.2 Nonparametric Tests for Two Population Medians (pg. 776–789). Outcome 2. Recognize the situations for which the Mann–Whitney U-test for the difference between two population medians applies and be able to use it in a decision-making context. Outcome 3. Know when to apply the Wilcoxon matchedpairs signed rank test for related samples.. 17.3 Kruskal–Wallis One-Way Analysis of Variance. Outcome 4. Perform nonparametric analysis of variance using the Kruskal–Wallis one-way ANOVA.. (pg. 789–796). Why you need to know Housing prices are particularly important when a company considers potential locations for a new manufacturing plant because the company would like affordable housing to be available for employees who transfer to the new location. A company that is in the midst of relocation has taken a sample of real estate listings from the four cities in contention for the new plant and would like to make a statistically valid comparison of home prices based on this sample information. Another company is considering changing to a group-based, rather than an individual-based, employee evaluation system. 
As a part of its analysis, the firm has gathered questionnaire data from employees who were asked to rate their satisfaction with the evaluation system on a five-point scale: very satisfied, satisfied, no opinion, dissatisfied, or very dissatisfied. In previous chapters, you were introduced to a wide variety of statistical techniques that would seem to be useful tools for these companies. However, many of the techniques discussed earlier may not be appropriate for these situations. For instance, in the plant relocation situation, the analysis of variance (ANOVA) F-test introduced in Chapter 12 would seem appropriate. However, this test is based on the assumptions that all populations are normally distributed and have equal variances. Unfortunately, housing prices are generally not normally distributed: most cities have home prices that are highly right skewed, with most home prices clustered around the median and a few very expensive houses that pull the mean value up. In the employee questionnaire situation, answers were measured on an ordinal, not an interval or ratio, scale, and interval or ratio data are required to use a t or F test. To handle cases where interval or ratio data are not available, a class of statistical tools called nonparametric statistics has been developed.

770

CHAPTER 17 | Introduction to Nonparametric Statistics 771

17.1 The Wilcoxon Signed Rank Test for One Population Median

Up to this point, the text has presented a wide array of statistical tools for describing data and for drawing inferences about a population based on sample information from that population. These tools are widely used in decision-making situations. However, you will also encounter decision situations in which major departures from the required assumptions exist. For example, many populations, such as family income levels and house prices, are highly skewed. In other instances, the level of data measurement will be too low (ordinal or nominal) to warrant use of the techniques presented earlier. In such cases, the alternative is to employ a nonparametric statistical procedure that has been developed to meet specific inferential needs. Such procedures have fewer restrictive assumptions concerning data level and underlying probability distributions. There are a great many nonparametric procedures that cover a wide range of applications. The purpose of this chapter is to introduce you to the concept of nonparametric statistics and illustrate some of the more frequently used methods.

Chapter Outcome 1.

The Wilcoxon Signed Rank Test—Single Population
Chapter 9 introduced examples that involved testing hypotheses about a single population mean. Recall that if the data were interval or ratio level and the population was normally distributed, a t-test was used to test whether a population mean had a specified value. However, the t-test is not appropriate in cases in which the data level is ordinal or when populations are not believed to be approximately normally distributed. To overcome data limitation issues, a nonparametric statistical technique known as the Wilcoxon signed rank test can be used. This test makes no highly restrictive assumption about the shape of the population distribution.
The Wilcoxon test is used to test hypotheses about a population median rather than a population mean. The basic logic of the Wilcoxon test is straightforward. Because the median is the midpoint in a population, allowing for sampling error, we would expect approximately half the data values in a random sample to be below the hypothesized median and about half to be above it. The hypothesized median will be rejected if the actual data distribution shows too large a departure from this expectation.

BUSINESS APPLICATION: APPLYING THE WILCOXON SIGNED RANK TEST

UNIVERSITY UNDERGRADUATE STARTING SALARIES The university placement office is interested in testing whether the median of the starting salary distribution for undergraduates exceeds $35,000. People in the office believe the salary distribution is highly skewed to the right, so the center of the distribution should be measured by the median. Therefore, the t-test from Chapter 9, which requires that the population be normally distributed, is not appropriate. A simple random sample of n = 10 graduates is selected. The Wilcoxon signed rank test can be used to test whether the population median exceeds $35,000. As with all tests, we start by stating the appropriate null and alternative hypotheses. The null and alternative hypotheses for the one-tailed test are

    H0: μ̃ ≤ $35,000
    HA: μ̃ > $35,000

The test will be conducted using α = 0.05. For small samples, the hypothesis is tested using a W-test statistic determined by the following steps:

Step 1 Collect the sample data.
Step 2 Compute di, the deviation between each value and the hypothesized median.

TABLE 17.1 | Wilcoxon Ranking Table for Starting Salaries Example

Salary = xi ($)   di = xi − $35,000   |di|     Rank    R+      R−
36,400              1,400             1,400    2       2
38,500              3,500             3,500    3       3
27,000             −8,000             8,000    8               8
35,000                  0                 0    —
29,000             −6,000             6,000    6.5             6.5
40,000              5,000             5,000    5       5
52,000             17,000            17,000    9       9
34,000             −1,000             1,000    1               1
38,900              3,900             3,900    4       4
41,000              6,000             6,000    6.5     6.5
                                 Total = W = 29.5             15.5

Step 3 Convert the di values to absolute differences.
Step 4 Determine the ranks for each di value, eliminating any zero di values. The lowest di value receives a rank of 1. If observations are tied, assign the average rank of the tied observations to each of the tied values.
Step 5 For any data value greater than the hypothesized median, place the rank in an R+ column. For data values less than the hypothesized median, place the rank in an R− column.
Step 6 The test statistic W is the sum of the ranks in the R+ column. For a lower-tail test, use the sum in the R− column. For a two-tailed (equal-to) hypothesis, use either sum.

Table 17.1 shows the results for a random sample of 10 starting salaries. The hypothesis is tested by comparing the calculated W-value with the critical values for the Wilcoxon signed rank test that are shown in Appendix P. Both upper and lower critical values are shown, corresponding to n = 5 to n = 20 for various levels of alpha. Note that n equals the number of nonzero di values. In this example, we have n = 9 nonzero di values. The lower critical value for n = 9 and a one-tailed α = 0.05 is 8. The corresponding upper-tailed critical value is 37. Because this is an upper-tail test, we are interested only in the upper critical value W0.05 = 37. Therefore, the decision rule is:

    If W ≥ 37, reject H0.

Because W = 29.5 < 37, we do not reject the null hypothesis and are unable to conclude that the median starting salary for university graduates exceeds $35,000.
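The six steps above can be sketched in a few lines of Python. This sketch, including the function name and data layout, is our own, not part of the text; it reproduces the Table 17.1 result of W = 29.5.

```python
def signed_rank_sums(data, hyp_median):
    """Wilcoxon signed rank Steps 1-6: returns (R_plus, R_minus), the
    rank sums for values above and below the hypothesized median.
    Zero differences are dropped; tied |d| values share the average rank."""
    d = [x - hyp_median for x in data if x != hyp_median]   # Steps 1-2, drop zeros
    abs_sorted = sorted(abs(v) for v in d)                  # Step 3

    def avg_rank(v):
        # Step 4: average of the 1-based positions occupied by this |d| value
        pos = [i + 1 for i, a in enumerate(abs_sorted) if a == abs(v)]
        return sum(pos) / len(pos)

    r_plus = sum(avg_rank(v) for v in d if v > 0)           # Step 5
    r_minus = sum(avg_rank(v) for v in d if v < 0)
    return r_plus, r_minus                                  # Step 6: W is one of these

salaries = [36400, 38500, 27000, 35000, 29000,
            40000, 52000, 34000, 38900, 41000]
print(signed_rank_sums(salaries, 35000))  # (29.5, 15.5)
```

For this upper-tail test, W is the R+ sum, 29.5, which matches the total in Table 17.1.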
Although neither Excel nor PHStat has a Wilcoxon signed rank test, Minitab does. Figure 17.1 illustrates the Minitab output for this example. Note that the p-value = 0.221 > α = 0.05, which reinforces the conclusion that the null hypothesis should not be rejected. The starting salary example illustrates how the Wilcoxon signed rank test is used when the sample sizes are small. The W-test statistic approaches a normal distribution as n increases. Therefore, for sample sizes > 20, the Wilcoxon test can be approximated using the normal distribution, where the test statistic is a z-value, as shown in Equation 17.1.

Large-Sample Wilcoxon Signed Rank Test Statistic

    z = (W − n(n + 1)/4) / √[n(n + 1)(2n + 1)/24]     (17.1)

where:
    W = Sum of the R+ ranks
    n = Number of nonzero di values
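Equation 17.1 translates directly into code. The following sketch is our own, not from the text; with n = 25 nonzero differences and a positive rank sum of W = 70 (the figures that arise in Example 17-1), it returns approximately −2.49.

```python
import math

def wilcoxon_z(w, n):
    """Equation 17.1: large-sample normal approximation for the
    Wilcoxon signed rank statistic. w is the sum of the R+ ranks;
    n is the number of nonzero differences."""
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    return (w - mean) / sd

print(round(wilcoxon_z(70, 25), 2))  # -2.49
```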

FIGURE 17.1 | Minitab Output—Wilcoxon Signed Rank Test for Starting Salaries

Minitab Instructions:
1. Enter data in column.
2. Choose Stat > Nonparametrics > 1-Sample Wilcoxon.
3. In Variables, enter the data column.
4. Choose Test median and enter the median value tested.
5. From Alternative, select the appropriate hypothesis.
6. Click OK.

EXAMPLE 17-1  WILCOXON SIGNED RANK TEST, ONE SAMPLE, n > 20

Executive Salaries A recent article in the business section of a regional newspaper indicated that the median salary for C-level executives (CEO, CFO, CIO, etc.) in the United States is less than $276,200. A shareholder advocate group has decided to test this assertion. A random sample of 25 C-level executives was selected. Since we would expect that executive salaries are highly right-skewed, a t-test is not appropriate. Instead, a large-sample Wilcoxon signed rank test can be conducted using the following steps:

Step 1 Specify the null and alternative hypotheses.
In this case, the null and alternative hypotheses are

    H0: μ̃ ≥ $276,200
    HA: μ̃ < $276,200 (claim)

Step 2 Determine the significance level for the test.
The test will be conducted using α = 0.01.

Step 3 Collect the sample data and compute the W-test statistic.
Using the steps outlined on pages 771–772, we manually compute the W-test statistic as shown in Table 17.2.

Step 4 Compute the z-test statistic.
The z-test statistic using the sum of the positive ranks is

    z = (W − n(n + 1)/4) / √[n(n + 1)(2n + 1)/24]
      = (70 − 25(25 + 1)/4) / √[25(25 + 1)(2(25) + 1)/24]
      = −2.49

TABLE 17.2 | Wilcoxon Ranking Table for Executive Salaries Example

Salary = xi ($)   di        |di|     Rank   R+     R−
273,000           −3,200     3,200    1             1
269,900           −6,300     6,300    2             2
263,500          −12,700    12,700    3             3
260,600          −15,600    15,600    4             4
259,200          −17,000    17,000    5             5
257,200          −19,000    19,000    6             6
256,500          −19,700    19,700    7             7
255,400          −20,800    20,800    8             8
255,200          −21,000    21,000    9             9
297,750           21,550    21,550   10     10
254,200          −22,000    22,000   11            11
300,750           24,550    24,550   12     12
249,500          −26,700    26,700   13            13
303,000           26,800    26,800   14     14
304,900           28,700    28,700   15     15
245,900          −30,300    30,300   16            16
243,500          −32,700    32,700   17            17
237,650          −38,550    38,550   18            18
316,250           40,050    40,050   19     19
234,500          −41,700    41,700   20            20
228,900          −47,300    47,300   21            21
217,000          −59,200    59,200   22            22
212,400          −63,800    63,800   23            23
204,500          −71,700    71,700   24            24
202,600          −73,600    73,600   25            25
                                     Totals  70   255

Step 5 Reach a decision.
The critical value for a one-tailed test at α = 0.01 from the standard normal distribution is −2.33. Because z = −2.49 < −2.33, we reject the null hypothesis.

Step 6 Draw a conclusion.
Thus, based on the sample data, the shareholder group should conclude the median executive salary is less than $276,200.

>>END EXAMPLE

TRY PROBLEM 17-1 (pg. 774)

MyStatLab

17-1: Exercises

Skill Development

17-1. Consider the following set of observations:
10.21 13.65 12.30 9.51 11.32 12.77 6.16 8.55 11.78 12.32
You should not assume these data come from a normal distribution. Test the hypothesis that these data come from a distribution with a median less than or equal to 10.

17-2. Consider the following set of observations:
9.0 15.6 21.1 11.1 13.5 9.2 13.6 15.8 12.5 18.7 18.9
You should not assume these data come from a normal distribution. Test the hypothesis that the median of these data is greater than or equal to 14.

17-3. Consider the following set of observations:
3.1 4.8 2.3 5.6 2.8 2.9 4.4
You should not assume these data come from a normal distribution. Test the hypothesis that these data come from a distribution with a median equal to 4. Use α = 0.10.

Business Applications

17-4. Sigman Corporation makes batteries that are used in highway signals in rural areas. The company managers claim that the median life of a battery exceeds 4,000 hours. To test this claim, they have selected a random sample of n = 12 batteries and have traced their life spans between installation and failure. The following data were obtained:
1,973 4,459
4,838 4,098
3,805 4,722
4,494 5,894
4,738 3,322
5,249 4,800
a. State the appropriate null and alternative hypotheses.
b. Assuming that the test is to be conducted using a 0.05 level of significance, what conclusion should be reached based on these sample data? Be sure to examine the required normality assumption.

17-5. A cable television customer call center has a goal that states that the median time for each completed call should not exceed four minutes. If calls take too long, productivity is reduced and other customers have to wait too long on hold. The operations manager does not want to incorrectly conclude that the goal isn't being satisfied unless sample data justify that conclusion. A sample of 12 calls was selected, and the following times (in seconds) were recorded:
194 278 302 140 245 234 268 208 102 190 220 255
a. Construct the appropriate null and alternative hypotheses.
b. Based on the sample data, what should the operations manager conclude? Test at the 0.05 significance level.

17-6. A recent trade newsletter reported that during the initial 6-month period of employment, new sales personnel in an insurance company spent a median of 119 hours per month in the field. A random sample of 20 new salespersons was selected. The numbers of hours spent in the field by members in a randomly chosen month are listed here:
163 147 189 142
103 102 126 111
112 95 135 103
96 134 114 89
134 126 129 115
Do the data support the trade newsletter's claim? Conduct the appropriate hypothesis test with a significance level of 0.05.

17-7. At Hershey's, the chocolate maker, a particular candy bar is supposed to weigh 11 ounces. However, the company has received complaints that the bars are underweight. To assess this situation, the company has conducted a statistical study that concluded that the average weight of the candy is indeed 11 ounces. However, a consumer organization, while acknowledging the finding that the mean weight is 11 ounces, claims that more than 50% of the candy bars weigh less than 11 ounces and that a few heavy bars pull the mean up, thereby cheating a majority of customers. A sample of 20 candy bars was selected. The data obtained follow:
10.9 11.5 10.6 10.7
11.7 10.8 10.9 10.8
10.5 11.2 11.6 10.5
11.8 11.8 11.2 11.3
10.2 10.7 11.0 10.1
Test the consumer organization's claim at a significance level of 0.05.

17-8. Sylvania's quality control division is constantly monitoring various parameters related to its products. One investigation addressed the life of incandescent light bulbs (in hours). Initially, they were satisfied with examining the average length of life. However, a recent sample taken from the production floor gave them pause for thought. The data follow:
1,100 1,460 1,150 1,770
1,140 1,940 1,260 1,270
1,550 2,080 1,760 1,210
1,210 1,350 1,250 1,230
1,280 1,150 1,500 1,230
840 730 1,560 2,100
1,620 2,410 1,210 1,630
1,500 1,060 1,440 500
Their initial efforts indicated that the average length of life of the light bulbs was 1,440 hours.
a. Construct a box and whisker plot of these data. On this basis, draw a conclusion concerning the distribution of the population from which this sample was drawn.
b. Conduct a hypothesis test to determine if the median length of life of the light bulbs is longer than the average length of life. Use α = 0.05.

17-9. The Penn Oil Company wished to verify the viscosity of its premium 30-weight oil. A simple random sample of specimens taken from automobiles running at normal temperatures was obtained. The viscosities observed were as follows:
25 25 35 27
24 35 29 31
21 38 30 32
35 32 27 30
25 36 28 30

Determine if the median viscosity at normal running temperatures is equal to 30 as advertised for Penn's premium 30-weight oil. (Use α = 0.05.)

Computer Database Exercises

17-10. The Cell Tone Company sells cellular phones and airtime in several states. At a recent meeting, the marketing manager stated that the median age of its customers is less than 40. This came up in conjunction with a proposed advertising plan that is to be directed toward a young audience. Before actually completing the advertising plan, Cell Tone decided to randomly sample customers. Among the questions asked in a survey of 50 customers in the Jacksonville, Florida, area was the customers' ages. The data are in the file Cell Phone Survey.
a. Examine the sample data. Is the variable being measured a discrete or a continuous variable? Does it seem feasible that these data could have come from a normal distribution?
b. The marketing manager must support his statement concerning customer age in an upcoming board meeting. Using a significance level of 0.10, provide this support for the marketing manager.

17-11. The Wilson Company uses a great deal of water in the process of making industrial milling equipment. To comply with federal clean-water laws, it has a water purification system that all wastewater goes through before being discharged into a settling pond on the company's property. To determine whether the company is complying with federal requirements, sample measures are taken every so often. One requirement is that the median pH level must be less than 7.4. A sample of 95 pH measures has been taken. The data for these measures are shown in the file Wilson Water.
a. Carefully examine the data. Use an appropriate procedure to determine if the data could have been sampled from a normal distribution. (Hint: Review the goodness-of-fit test in Chapter 13.)
b. Based on the sample data of pH level, what should the company conclude about its current status on meeting federal requirements? Test the hypothesis at the 0.05 level.

END EXERCISES 17-1

17.2 Nonparametric Tests for Two Population Medians

Chapters 9 through 12 introduced a variety of hypothesis-testing tools and techniques. Included were tests involving two or more population means. These tests carried with them several assumptions and requirements. For some situations in which you are testing about the difference between two population means, the Student's t-distribution is employed. One of the assumptions for the t-distribution is that the two populations are normally distributed. Another is that the data are interval or ratio level. Although in many situations these assumptions and data requirements will be satisfied, you will often encounter situations in which this is not the case. In this section we introduce two nonparametric techniques that do not require such stringent assumptions and data requirements: the Mann–Whitney U-test¹ and the Wilcoxon matched-pairs signed rank test. Both tests can be used with ordinal (ranked) data, and neither requires that the populations be normally distributed. The Mann–Whitney U-test is used when the samples are independent, whereas the Wilcoxon matched-pairs signed rank test is used when the design has paired samples.

Chapter Outcome 2.

The Mann–Whitney U-Test

BUSINESS APPLICATION: TESTING TWO POPULATION MEDIANS

BLAINE COUNTY HIGHWAY DISTRICT The workforce of the Blaine County Highway District (BCHD) is made up of the rural and urban divisions. A few months ago, several rural division supervisors began claiming that the urban division employees waste gravel from the county gravel pit. The supervisors claimed the urban division uses more gravel per mile of road maintenance than the rural division. In response to these

¹An equivalent test to the Mann–Whitney U-test is the Wilcoxon rank-sum test.

claims, the BCHD materials manager performed a test. He selected a random sample from the district's job-cost records of jobs performed by the urban (U) division and another sample of jobs performed by the rural (R) division. The yards of gravel per mile for each job are recorded. Even though the data are ratio-level, the manager is not willing to make the normality assumptions necessary to employ the two-sample t-test (discussed in Chapter 10). However, the Mann–Whitney U-test will allow him to compare the gravel use of the two divisions. The Mann–Whitney U-test is one of the most commonly used nonparametric tests to compare samples from two populations in those cases when the following assumptions are satisfied:

Assumptions
1. The two samples are independent and random.
2. The value measured is a continuous variable.
3. The measurement scale used is at least ordinal.
4. If they differ, the distributions of the two populations will differ only with respect to central location.

The fourth point is instrumental in setting your null and alternative hypotheses. We are interested in determining whether two populations have the same or different medians. The test can be performed using the following steps:

Step 1 State the appropriate null and alternative hypotheses.
In this situation, the variable of interest is cubic yards of gravel used. This is a ratio-level variable. However, the populations are suspected to be skewed, so the materials manager has decided to test the following hypotheses, stated in terms of the population medians:

    H0: μ̃U ≤ μ̃R (Median urban gravel use is less than or equal to median rural use.)
    HA: μ̃U > μ̃R (Urban median exceeds rural median.)

Step 2 Specify the desired level of significance.
The decision makers have determined that the test will be conducted using α = 0.05.

Step 3 Select the sample data and compute the appropriate test statistic.
Computing the test statistic manually requires several steps:
1. Combine the raw data from the two samples into one set of numbers, keeping track of the sample from which each value came.
2. Rank the numbers in this combined set from low to high. Note that we expect no ties to occur because the values are considered to have come from continuous distributions. However, in actual situations ties will sometimes occur. When they do, we give tied observations the average of the rank positions for which they are tied. For instance, if the lowest four data points were each 460, each of the four 460s would receive a rank of (1 + 2 + 3 + 4)/4 = 10/4 = 2.5.²
3. Separate the two samples, listing each observation with the rank it has been assigned. This leads to the rankings shown in Table 17.3.

The logic of the Mann–Whitney U-test is based on the idea that if the sum of the rankings of one sample differs greatly from the sum of the rankings of the second sample, we should conclude that there is a difference in the population medians.

²Noether provides an adjustment when ties occur. He, however, points out that using the adjustment has little effect unless a large proportion of the observations are tied or there are ties of considerable extent. See the References at the end of this chapter.

TABLE 17.3 | Ranking of Yards of Gravel per Mile for the Blaine County Highway District Example

Urban (n1 = 12)               Rural (n2 = 12)
Yards of Gravel   Rank        Yards of Gravel   Rank
460                 2         600                 6
830                16         652                 9
720                12         603                 7
930                20         594                 5
500                 4         1,402              23
620                 8         1,111              21
703                11         902                18
407                 1         700                10
1,521              24         827                15
900                17         490                 3
750                13         904                19
800                14         1,400              22
           R1 = 142                      R2 = 158

4. Calculate a U-value for each sample, as shown in Equations 17.2 and 17.3.

U Statistics

    U1 = n1n2 + n1(n1 + 1)/2 − ΣR1     (17.2)

    U2 = n1n2 + n2(n2 + 1)/2 − ΣR2     (17.3)

where:
    n1 and n2 = Sample sizes from populations 1 and 2
    R1 and R2 = Sum of ranks for samples 1 and 2

For our example, using the ranks in Table 17.3,

    U1 = 12(12) + 12(13)/2 − 142 = 80
    U2 = 12(12) + 12(13)/2 − 158 = 64

Note that U1 + U2 = n1n2. This is always the case, and it provides a good check on the correctness of the rankings in Table 17.3.

5. Select the U-value to be the test statistic. The Mann–Whitney U tables in Appendices L and M give the lower tail of the U-distribution. For one-tailed tests such as our Blaine County example, you need to look at the alternative hypothesis to determine whether U1 or U2 should be selected as the test statistic. Recall that

    HA: μ̃U > μ̃R

If the alternative hypothesis indicates that population 1 has a higher median, as in this case, then U1 is selected as the test statistic. If population 2 is expected to have a higher median, then U2 should be selected as the test statistic. The reason is that the population with the larger median should have the larger sum of ranked values, thus producing the smaller U-value.

It is very important to note that this choice must be made in terms of the alternative hypothesis and not on the basis of the U-values obtained from the samples. Now, we select the U-value that the alternative hypothesis indicates should be the smaller and call this U. Because population 1 (Urban) should have the smaller U-value (larger median) if the alternative hypothesis is true, the sample data give

    U = 80

This is actually larger than the U-value for the rural population, but we still use it as the test statistic because the alternative hypothesis indicates that μ̃U > μ̃R.³

Step 4 Determine the critical value for the Mann–Whitney U-test.
For sample sizes less than 9, use the Mann–Whitney U table in Appendix L for the appropriate sample size. For sample sizes from 9 to 20, as in this example, the null hypothesis can be tested by comparing U with the appropriate critical value given in the Mann–Whitney U table in Appendix M. We begin by locating the part of the table associated with the desired significance level. In this case, we have a one-tailed test with α = 0.05. Go across the top of the Mann–Whitney U table to locate the value corresponding to the sample size from population 2 (Rural) and down the left side of the table to the sample size from population 1 (Urban). In the Blaine County example, both sample sizes are 12, so we will use the Mann–Whitney table in Appendix M for α = 0.05. Go across the top of the table to n2 = 12 and down the left-hand side to n1 = 12. The intersection of these column and row values gives a critical value of U0.05 = 42. We can now form the decision rule as follows:

    If U ≤ 42, reject H0. Otherwise, do not reject H0.

Step 5 Reach a decision.
Now, because U = 80 > 42, we do not reject the null hypothesis.

Step 6 Draw a conclusion.
Therefore, based on the sample data, there is not sufficient evidence to conclude that the median yards of gravel per mile used by the urban division is greater than that for the rural division.

Neither Excel nor the PHStat add-ins contains a Mann–Whitney U-test. PHStat does have the equivalent Wilcoxon Rank Sum Test. The Wilcoxon test uses, as its test statistic, the sum of the ranks from the population that is supposed to have the larger median. Referring to Table 17.3, Urban is supposed to have the larger median, and the sum of the ranks is 142. Minitab, on the other hand, contains the Mann–Whitney test but not the Wilcoxon Rank Sum Test. The test statistic used in the Mann–Whitney test is a function of the rank sums produced in the Wilcoxon test, and the two tests are equivalent.

³For a two-tailed test, you should select the smaller U-value as your test statistic. This will force you toward the lower tail. If the U-value is smaller than the critical value in the Mann–Whitney U table, you will reject the null hypothesis.

FIGURE 17.2 | Minitab Output—Mann–Whitney U-Test for the Blaine County Example

Minitab Instructions:
1. Enter data in columns.
2. Choose Stat > Nonparametrics > Mann–Whitney.
3. In First Sample, enter one data column.
4. In Second Sample, enter the other data column.
5. In Alternative, select greater than.
6. Click OK.

Mann–Whitney U-Test—Large Samples

When you encounter a situation with sample sizes in excess of 20, the previous approaches to the Mann–Whitney U-test cannot be used because of table limitations. However, the U statistic approaches a normal distribution as the sample sizes increase, and the Mann–Whitney U-test can be conducted using a normal approximation approach, where the mean and standard deviation for the U statistic are as given in Equations 17.4 and 17.5, respectively.

Mean and Standard Deviation for U Statistic

    μ = n1n2/2     (17.4)

    σ = √[(n1)(n2)(n1 + n2 + 1)/12]     (17.5)

where:
    n1 and n2 = Sample sizes from populations 1 and 2

Equations 17.4 and 17.5 are used to form the U-test statistic in Equation 17.6.

Mann–Whitney U-Test Statistic

    z = (U − n1n2/2) / √[(n1)(n2)(n1 + n2 + 1)/12]     (17.6)
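As a sketch of how Equations 17.2 through 17.6 fit together (the code and function names are ours, not from the text), the following ranks two samples, computes U1 and U2, and applies the large-sample normal approximation. It reproduces the Blaine County values U1 = 80 and U2 = 64; for the large-sample case, the figures used in the Future-Vision application that follows (n1 = 144, n2 = 404, U = 27,412) give z of about −1.03.

```python
import math

def mann_whitney_u(sample1, sample2):
    """Equations 17.2 and 17.3: U statistics from two independent samples.
    Tied values share the average of their rank positions."""
    combined = sorted(sample1 + sample2)

    def avg_rank(v):
        pos = [i + 1 for i, x in enumerate(combined) if x == v]
        return sum(pos) / len(pos)

    n1, n2 = len(sample1), len(sample2)
    r1 = sum(avg_rank(v) for v in sample1)
    r2 = sum(avg_rank(v) for v in sample2)
    u1 = n1 * n2 + n1 * (n1 + 1) / 2 - r1
    u2 = n1 * n2 + n2 * (n2 + 1) / 2 - r2
    return u1, u2

def mann_whitney_z(u, n1, n2):
    """Equations 17.4-17.6: large-sample normal approximation."""
    mean = n1 * n2 / 2                              # Equation 17.4
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)    # Equation 17.5
    return (u - mean) / sd                          # Equation 17.6

urban = [460, 830, 720, 930, 500, 620, 703, 407, 1521, 900, 750, 800]
rural = [600, 652, 603, 594, 1402, 1111, 902, 700, 827, 490, 904, 1400]
print(mann_whitney_u(urban, rural))               # (80.0, 64.0)
print(round(mann_whitney_z(27412, 144, 404), 2))  # -1.03
```

Note that the returned pair satisfies the U1 + U2 = n1n2 check mentioned above (80 + 64 = 144).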

BUSINESS APPLICATION: LARGE SAMPLE TEST OF TWO POPULATION MEDIANS (Excel and Minitab Tutorial)

FUTURE-VISION Consider an application involving the managers for a local network television affiliate who are preparing for a national television advertising conference. The theme of the presentation is to center around the advantage for businesses to advertise on network television rather than on cable. The managers believe that median household income for cable subscribers is less than the median for those who do not subscribe. Therefore, by advertising on network stations, businesses could reach a higher-income audience. The managers are concerned with the median (as opposed to the mean) income because data such as household income are notorious for having large outliers. In such cases, the median, which is not sensitive to outliers, is a preferable measure of the center of the data. The large outliers are also an indication that the data do not have a symmetric (such as the normal) distribution, another reason to use a nonparametric procedure such as the Mann–Whitney test. The managers can use the Mann–Whitney U-test and the following steps to conduct a test about the median incomes for cable subscribers versus nonsubscribers.

Step 1 Specify the null and alternative hypotheses.
Given that the managers believe that median household income for cable subscribers (C) is less than the median for those who do not subscribe (NC), the null and alternative hypotheses to be tested are

    H0: μ̃C ≥ μ̃NC
    HA: μ̃C < μ̃NC (claim)

Step 2 Specify the desired level of significance.
The test is to be conducted using α = 0.05.

Step 3 Select the random sample and compute the test statistic.
In the spirit of friendly cooperation, the network managers joined forces with the local cable provider, Future-Vision, to survey a total of 548 households (144 nonsubscribers and 404 cable subscribers) in the market area. The results of the survey are contained in the file Future-Vision. Because of the sample size, we can use the large-sample approach to the Mann–Whitney U-test. To compute the test statistic shown in Equation 17.6, use the following steps:
1. The income data must be converted to ranks. The sample data and ranks are in a file called Future-Vision-Ranks. Note that when data are tied in value, they share the same average rank. For example, if four values are tied for the fifth position, each one is assigned the average of rankings 5, 6, 7, and 8, or (5 + 6 + 7 + 8)/4 = 6.5.
2. Next, we compute the U-value. The sum of the ranks for noncable subscribers is R1 = 41,204, and the sum of the ranks for cable subscribers is R2 = 109,222.
3. Based on sample sizes of n1 = 144 noncable subscribers and n2 = 404

cable subscribers, we compute the U-values using Equations 17.2 and 17.3:

    U1 = 144(404) + 144(145)/2 − 41,204 = 27,412
    U2 = 144(404) + 404(405)/2 − 109,222 = 30,764

Because the alternative hypothesis predicts that noncable subscribers will have a higher median, U1 is selected to be U. Thus,

    U = 27,412

4. We now substitute the appropriate values into Equations 17.4 and 17.5:

    μ = n1n2/2 = (144)(404)/2 = 29,088

and

    σ = √[(n1)(n2)(n1 + n2 + 1)/12] = √[(144)(404)(144 + 404 + 1)/12] = 1,631.43

5. The test statistic is computed using Equation 17.6:

    z = (U − n1n2/2) / √[(n1)(n2)(n1 + n2 + 1)/12] = (27,412 − 29,088)/1,631.43 = −1.027

Step 4 Determine the critical value for the test.
Based on a one-tailed test with α = 0.05, the critical value from the standard normal distribution table is −z0.05 = −1.645.

Step 5 Reach a decision.
Since z = −1.027 > −1.645, the null hypothesis cannot be rejected.

Step 6 Draw a conclusion.
This means that the claim that noncable families have higher median incomes than cable families is not supported by the sample data.

Chapter Outcome 3.

The Wilcoxon Matched-Pairs Signed Rank Test

The Mann–Whitney U-test is a very useful nonparametric technique. However, as discussed in the Blaine County Highway District example, its use is limited to those situations in which the samples from the two populations are independent. As we discussed in Chapter 10, you will encounter decision situations in which the samples will be paired and, therefore, are not independent. The Wilcoxon matched-pairs signed rank test has been developed for situations in which you have related samples and are unwilling or unable (due to data-level limitations) to use the paired-sample t-test.
It is useful when the two related samples have a measurement scale that allows us to determine not only whether the pairs of observations differ but also the magnitude of any difference. The Wilcoxon matched-pairs test can be used in those cases in which the following assumptions are satisfied:

1. The differences are measured on a continuous variable.
2. The measurement scale used is at least interval.
3. The distribution of the population differences is symmetric about their median.
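Before moving on, the large-sample Mann–Whitney computation from the Future-Vision application (Equations 17.2 through 17.6) can be sketched in a few lines of Python. This is an illustrative sketch, not part of the text's software; the function name mann_whitney_z and the convention of passing the two rank sums as inputs are choices made here for clarity.

```python
from math import sqrt

def mann_whitney_z(n1, n2, r1, r2):
    """Large-sample Mann-Whitney statistics from the rank sums r1 and r2.

    u1 and u2 follow Equations 17.2 and 17.3; the z-value uses the mean
    and standard deviation of Equations 17.4 and 17.5, with U = u1."""
    u1 = n1 * n2 + n1 * (n1 + 1) // 2 - r1       # Equation 17.2
    u2 = n1 * n2 + n2 * (n2 + 1) // 2 - r2       # Equation 17.3
    mu = n1 * n2 / 2                             # Equation 17.4
    sigma = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)   # Equation 17.5
    z = (u1 - mu) / sigma                        # Equation 17.6
    return u1, u2, z

# Future-Vision example: 144 nonsubscribers (rank sum 41,204),
# 404 cable subscribers (rank sum 109,222)
u1, u2, z = mann_whitney_z(144, 404, 41204, 109222)
print(u1, u2, round(z, 3))   # 27412 30764 -1.027
```

Because n1(n1 + 1) is always even, integer division keeps the U-values whole; the z-value of −1.027 matches the hand computation in the example.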

EXAMPLE 17-2 SMALL-SAMPLE WILCOXON TEST

Financial Systems Associates Financial Systems Associates develops and markets financial planning software. To differentiate its products from the other packages on the market, Financial Systems has built many macros into its software. According to Financial Systems, once a user learns the macro keystrokes, complicated financial computations become much easier to perform. As part of its product-development testing program, software engineers at Financial Systems have selected a focus group of seven people who frequently use spreadsheet packages. Each person is given complicated financial and accounting data and is asked to prepare a detailed analysis. The software tracks the amount of time each person takes to complete the task. Once the analysis is complete, these same seven individuals are given a training course in Financial Systems add-ons. After the training course, they are given a similar set of data and are asked to do the same analysis. Again, the systems software determines the time needed to complete the analysis. You should recognize that the samples in this application are not independent because the same subjects are used in both cases. If the software engineers performing the analysis are unwilling to make the normal distribution assumption required of the paired-sample t-test, they can use the Wilcoxon matched-pairs signed rank test. This test can be conducted using the following steps:
Step 1 Specify the appropriate null and alternative hypotheses.
The null and alternative hypotheses being tested are

H0: μ̃b ≤ μ̃a
HA: μ̃b > μ̃a (Median time will be less after the training.)

Step 2 Specify the desired level of significance.
The test will be conducted using α = 0.025.
Step 3 Collect the sample data and compute the test statistic.
The data are shown in Table 17.4.
First, we convert the data in Table 17.4 to differences. The column of differences, d, gives the “before minus after” differences. The next column is the rank of the d-values from low to high. Note that the ranks are determined without considering the sign on the d-value. However, once the rank is determined, the original sign on the d-value is attached to the rank. For example, d = 13 is given a rank of 7, whereas d = −4 has a rank of −3. The final column is titled “Ranks with Smallest Expected Sum.” To determine the values in this column, we take the absolute values of either the positive or the negative ranks, depending on which group has the smallest expected sum of absolute-valued ranks. We look to the alternative hypothesis, which is

HA: μ̃b > μ̃a

TABLE 17.4 | Financial Systems Associates Ranked Data

Subject   Before Training   After Training    d    Rank of d    Ranks with Smallest Expected Sum
1         24                11               13        7
2         20                18                2        1
3         19                23               −4       −3         3
4         20                15                5        4
5         13                16               −3       −2         2
6         28                22                6        5
7         15                 8                7        6
                                                                 T = 5

Because the before median is predicted to exceed the after median, we would expect the positive differences to exceed the negative differences. Therefore, the negative ranks should have the smaller sum and should be used in the final column, as shown in Table 17.4. The test statistic, T, is equal to the sum of the absolute values of these negative ranks. Thus, T = 5.
Step 4 Determine the critical value.
To determine whether T is sufficiently small to reject the null hypothesis, we consult the Wilcoxon table of critical T-values in Appendix N. If the calculated T is less than or equal to the critical T from the table, the null hypothesis is rejected. For instance, with α = 0.025 for our one-tailed test and n = 7, we get a critical value of T0.025 = 2.
Step 5 Reach a decision.
Because T = 5 > 2, do not reject H0.
Step 6 Draw a conclusion.
Based on these sample data, Financial Systems Associates does not have a statistical basis for stating that its product will reduce the median time required to perform complicated financial analyses.
>>END EXAMPLE
TRY PROBLEM 17-20 (pg. 786)

Ties in the Data If the two measurements of an observed data pair have the same values and, therefore, a d-value of 0, that case is dropped from the analysis and the sample size is reduced accordingly. You should note that this procedure favors rejecting the null hypothesis because we are eliminating cases in which the two sample points have exactly the same values. If two or more d-values have the same absolute values, we assign the same average rank to each one, using the same approach as with the Mann–Whitney U-test. For example, if we have two d-values that tie for ranks 4 and 5, we average them as (4 + 5)/2 = 4.5 and assign both a rank of 4.5. Studies have shown that this method of assigning ranks to ties has little effect on the Wilcoxon test results.
For a more complete discussion of the effect of ties on the Wilcoxon matched-pairs signed rank test, please see the text by Marascuilo and McSweeney referenced at the end of this chapter.

Large-Sample Wilcoxon Test If the sample size (number of matched pairs) exceeds 25, the Wilcoxon table of critical T-values in Appendix N cannot be used. However, it can be shown that for large samples, the distribution of T-values is approximately normal, with a mean and standard deviation given by Equations 17.7 and 17.8, respectively.

Wilcoxon Mean and Standard Deviation

μ = n(n + 1)/4   (17.7)

σ = sqrt[n(n + 1)(2n + 1)/24]   (17.8)

where:
n = Number of paired values
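The ranking procedure of Example 17-2, together with the large-sample mean and standard deviation of Equations 17.7 and 17.8, can be sketched in Python. This is an illustrative sketch, not the text's software; the function name and the decision to return both rank sums (so that the alternative hypothesis can dictate which one becomes T, as described above) are choices made here.

```python
from math import sqrt

def signed_rank_sums(before, after):
    """Sums of the absolute values of the negative and positive ranks for
    the Wilcoxon matched-pairs signed rank test. Zero differences are
    dropped; tied |d| values share their average rank, as in the text."""
    d = [b - a for b, a in zip(before, after) if b != a]
    order = sorted(range(len(d)), key=lambda i: abs(d[i]))
    ranks = [0.0] * len(d)
    i = 0
    while i < len(d):
        j = i
        while j < len(d) and abs(d[order[j]]) == abs(d[order[i]]):
            j += 1
        for k in range(i, j):              # tie group shares the average rank
            ranks[order[k]] = (i + 1 + j) / 2
        i = j
    neg = sum(r for r, di in zip(ranks, d) if di < 0)
    pos = sum(r for r, di in zip(ranks, d) if di > 0)
    return neg, pos

# Example 17-2 data (Financial Systems Associates)
before = [24, 20, 19, 20, 13, 28, 15]
after = [11, 18, 23, 15, 16, 22, 8]
neg, pos = signed_rank_sums(before, after)
print(neg, pos)   # 5.0 23.0 -> T = 5, matching Table 17.4

# Large-sample mean and standard deviation (Equations 17.7 and 17.8),
# shown for an illustrative n = 30 (any n > 25)
n = 30
mu = n * (n + 1) / 4
sigma = sqrt(n * (n + 1) * (2 * n + 1) / 24)
```

With the one-tailed alternative of Example 17-2, the negative ranks have the smaller expected sum, so T = 5, in agreement with the hand computation.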

The Wilcoxon test statistic is given by Equation 17.9.

Wilcoxon Test Statistic

z = (T − n(n + 1)/4) / sqrt[n(n + 1)(2n + 1)/24]   (17.9)

Then, the z-value is compared to the critical value from the standard normal table in the usual manner.

MyStatLab

17-2: Exercises

Skill Development

17-12. For each of the following tests, determine which of the two U statistics (U1 or U2) you would choose, the appropriate test statistic, and the rejection region for the Mann–Whitney test:
a. HA: μ̃1 > μ̃2, α = 0.05, n1 = 5, and n2 = 10
b. HA: μ̃1 < μ̃2, α = 0.05, n1 = 15, and n2 = 12
c. HA: μ̃1 ≠ μ̃2, α = 0.10, n1 = 12, and n2 = 17
d. HA: μ̃1 > μ̃2, α = 0.05, n1 = 22, and n2 = 25
e. HA: μ̃1 ≠ μ̃2, α = 0.10, n1 = 44, and n2 = 15

17-13. The following sample data have been collected from two independent samples from two populations. Test the claim that the second population median will exceed the median of the first population.

Sample 1: 12 21 15 10 11 14 12 8
Sample 2: 9 18 16 17 20 7 12 19

a. State the appropriate null and alternative hypotheses.
b. If you are unwilling to assume that the two populations are normally distributed, based on the sample data, what should you conclude about the null hypothesis? Test using α = 0.05.

17-14. The following sample data have been collected from independent samples from two populations. The claim is that the first population median will be larger than the median of the second population.

Sample 1: 4.4 2.7 1.0 3.5 2.8 2.6 2.4 2.0 2.8
Sample 2: 3.7 3.5 4.0 4.9 3.1 4.2 5.2 4.4 4.3

a. State the appropriate null and alternative hypotheses.
b. Using the Mann–Whitney U-test, based on the sample data, what should you conclude about the null hypothesis? Test using α = 0.05.

17-15. The following sample data have been collected from two independent random samples from two populations.
Test the claim that the first population median will exceed the median of the second population.

Sample 1: 50 47 44 48 40 36 43 46 72 40 55 38
Sample 2: 38 44 38 37 43 44 31 38 39 54 41 40

a. State the appropriate null and alternative hypotheses.
b. Using the Mann–Whitney U-test, based on the sample data, what should you conclude about the null hypothesis? Test using a significance level of 0.01.

17-16. Determine the rejection region for the Mann–Whitney U-test in each of the following cases:
a. HA: μ̃1 > μ̃2, α = 0.05, n1 = 3, and n2 = 15
b. HA: μ̃1 ≠ μ̃2, α = 0.10, n1 = 5, and n2 = 20
c. HA: μ̃1 > μ̃2, α = 0.025, n1 = 9, and n2 = 12
d. HA: μ̃1 ≠ μ̃2, α = 0.10, n1 = 124, and n2 = 25

17-17. The following sample data have been collected from independent samples from two populations. Do the populations have different medians? Test at a significance level of 0.05. Use the Mann–Whitney U-test.

Sample 1:
550 483 379 438 398 582 528 502 352 488 400 451 571 382 588 465 492 384 563 506
489 480 433 436 540 415 532 412 572 579 556 383 515 501 353 369 475 470 595 361

Sample 2:
594 542 447 466 560 447 526 446 573 542 473 418 511 510 577 585 436 461 545 441
538 505 486 425 497 511 576 558 500 467 556 383 515 501 353

17-18. For each of the following tests, determine which of the two sums of absolute ranks (negative or positive) you would choose, the appropriate test statistic, and the rejection region for the Wilcoxon matched-pairs signed rank test:
a. HA: μ̃1 > μ̃2, α = 0.025, n = 15
b. HA: μ̃1 < μ̃2, α = 0.01, n = 12
c. HA: μ̃1 ≠ μ̃2, α = 0.05, n = 9
d. HA: μ̃1 > μ̃2, α = 0.05, n = 26
e. HA: μ̃1 ≠ μ̃2, α = 0.10, n = 44

17-19. You are given two paired samples with the following information:

Item   Sample 1   Sample 2
1      3.4        2.8
2      2.5        3.0
3      7.0        5.5
4      5.9        6.7
5      4.0        3.5
6      5.0        5.0
7      6.2        7.5
8      5.3        4.2

a. Based on these paired samples, test at the α = 0.05 level whether the true median paired difference is 0.
b. Answer part a assuming the data given here were sampled from normal distributions with equal variances.

17-20. You are given two paired samples with the following information:

Item   Sample 1   Sample 2
1      19.6       21.3
2      22.1       17.4
3      19.5       19.0
4      20.0       21.2
5      21.5       20.1
6      20.2       23.5
7      17.9       18.9
8      23.0       22.4
9      12.5       14.3
10     19.0       17.8

Based on these paired samples, test at the α = 0.05 level whether the true median paired difference is 0.

17-21. You are given two paired samples with the following information:

Item   Sample 1   Sample 2
1      1,004      1,045
2      1,245      1,145
3      1,360      1,400
4      1,150      1,000
5      1,300      1,350
6      1,450      1,350
7      900        1,140

Based on these paired samples, test at the α = 0.05 level whether the true median paired difference is 0.

17-22. From a recent study we have collected the following data from two independent random samples:
Sample 1: 405 450 290 370 345 460 425 275 380 330 500 215
Sample 2: 300 340 400 250 270 410 435 390 225 210 395 315

Suppose we do not wish to assume normal distributions. Use the appropriate nonparametric test to determine whether the populations have equal medians. Test at α = 0.05.

17-23. You are given two paired samples with the following information:

Item   Sample 1   Sample 2
1      234        245
2      221        224
3      196        194
4      245        267
5      234        230
6      204        198

Based on these paired samples, test at the α = 0.05 level whether the true median paired difference is 0.

17-24. Consider the following data for two paired samples:

Case #   Sample 1   Sample 2
1        258        304
2        197        190
3        400        500
4        350        340
5        237        250
6        400        358
7        370        390
8        130        100

a. Test the following null and alternative hypotheses at an α = 0.05 level:
H0: There is no difference between the two population distributions.
HA: There is a difference between the two populations.
b. Answer part a as if the samples were independent samples from normal distributions with equal variances.

Business Applications

17-25. National Reading Academy claims that graduates of its program have a higher median reading speed per minute than people who do not take the course. An independent agency conducted a study to determine whether this claim was justified. Researchers from the agency selected a random sample of people who had taken the speed reading course and another random sample of people who had not taken the course. The agency was unwilling to make the assumption that the populations were normally distributed. Therefore, a nonparametric test was needed. The following summary data were observed:

With Course: n = 7, Sum of ranks = 42
Without Course: n = 5, Sum of ranks = 36

Assuming that higher ranks imply more words per minute being read, what should the testing agency conclude based on the sample data? Test at an α = 0.05 level.

17-26. The makers of the Plus 20 Hardcard, a plug-in hard disk unit on a PC board, have recently done a marketing research study in which they asked two independently selected groups to rate the Hardcard on a scale of 1 to 100, with 100 being perfect satisfaction.
The first group consisted of professional computer programmers. The second group consisted of home computer users. The company hoped to be able to say that the product would receive the same median ranking from each group. The following summary data were recorded:

Professionals: n = 10, Sum of ranks = 92
Home Users: n = 8, Sum of ranks = 79

Based on these data, what should the company conclude? Test at the α = 0.02 level.

17-27. Property taxes are based on assessed values of property. In most states, the law requires that assessed values be “at or near” the market value of the property. In one Washington county, a tax protest group has claimed that assessed values are higher than market values. To address this claim, the county tax assessor, together with representatives from the protest group, has selected 15 properties at random that have sold within the past six months. Both parties agree that the sales price was the market value at the time of the sale. The assessor then listed the assessed values and the sales values side by side, as shown.

House   Assessed Value ($)   Market Value ($)
1       302,000              198,000
2       176,000              182,400
3       149,000              154,300
4       198,500              198,500
5       214,000              218,000
6       235,000              230,000
7       305,000              298,900
8       187,500              190,000
9       150,000              149,800
10      223,000              222,000
11      178,500              180,000
12      245,000              250,900
13      167,000              165,200
14      219,000              220,700
15      334,000              320,000

a. Assuming that the population of assessed values and the population of market values have the same distribution shape and that they may differ only with respect to

medians, state the appropriate null and alternative hypotheses.
b. Test the hypotheses using an α = 0.01 level.
c. Discuss why one would not assume that the samples were obtained from normal distributions for this problem. What characteristic of the market values of houses would lead you to conclude that these data were not normally distributed?

17-28. The Kansas Tax Commission recently conducted a study to determine whether there is a difference in median deductions taken for charitable contributions depending on whether a tax return is filed as a single or a joint return. A random sample from each category was selected, with the following results:

Single: n = 6, Sum of ranks = 43
Joint: n = 8, Sum of ranks = 62

Based on these data, what should the tax commission conclude? Use an α = 0.05 level.

17-29. A cattle feedlot operator has collected data for 40 matched pairs of cattle showing weight gain on two different feed supplements. His purpose in collecting the data is to determine whether there is a difference in the median weight gain for the two supplements. He has no preconceived idea about which supplement might produce higher weight gain. He wishes to test using an α = 0.05 level.
Assuming that the T-value for these data is 480, what should be concluded concerning which supplement might produce higher weight gain? Use the large-sample Wilcoxon matched-pairs signed rank test normal approximation. Conduct the test using a p-value approach.

17-30. Radio advertisements have been stressing the virtues of an audiotape program to help children learn to read. To test whether this tape program can cause a quick improvement in reading ability, 10 children were given a nationally recognized reading test that measures reading ability. The same 10 children were then given the tapes to listen to for 4 hours spaced over a 2-day period. The children then were tested again. The test scores were as follows:

Child   Before   After
1       60       63
2       40       38
3       78       77
4       53       50
5       67       74
6       88       96
7       77       80
8       60       70
9       64       65
10      75       75

If higher scores are better, use the Wilcoxon matched-pairs signed rank test to test whether this tape program produces quick improvement in reading ability. Use α = 0.025.

17-31. The Montgomery Athletic Shoe Company has developed a new shoe-sole material it thinks provides superior wear compared with the old material the company has been using for its running shoes. The company selected 10 cross-country runners and supplied each runner with a pair of shoes. Each pair had one sole made of the old material and the other made of the new material. The shoes were monitored until the soles wore out. The following lifetimes (in hours) were recorded for each material:

Runner   Old Material   New Material
1        45.5           47.0
2        50.0           51.0
3        43.0           42.0
4        45.5           46.0
5        58.5           58.0
6        49.0           50.5
7        29.5           39.0
8        52.0           53.0
9        48.0           48.0
10       57.5           61.0

a. If the populations from which these samples were taken could be considered to have normal distributions, determine if the soles made of the new material have a longer mean lifetime than those made from the old material. Use a significance level of 0.025.
b. Suppose you were not willing to consider that the populations have normal distributions. Make the determination requested in part a.
c. Given only the information in this problem, which of the two procedures indicated in parts a and b would you choose to use? Give reasons for your answer.

Computer Database Exercises

17-32. For at least the past 20 years, there has been a debate over whether children who are placed in child-care facilities while their parents work suffer as a result. A recent study of 6,000 children discussed in the March 1999 issue of Developmental Psychology found “no permanent negative effects caused by their mothers’ absence.” In fact, the study indicated that there might be some positive benefits from the day-care experience.
To investigate this premise, a nonprofit organization called Child Care Connections conducted a small study in which children were observed playing in neutral settings (not at home or at a day-care center). Over a period of 20 hours of observation, 15 children who did not go to day care and 21 children who had spent much time in day care were observed. The.

variable of interest was the total minutes of play in which each child was actively interacting with other students. Child Care Connections leaders hoped to show that the children who had been in day care would have a higher median time in interactive situations than the stay-at-home children. The file Children contains the results of the study.
a. Conduct a hypothesis test to determine if the hopes of the Child Care Connections leaders can be substantiated. Use a significance level of 0.05, and write a short statement that describes the results of the test.
b. Based on the outcome of the hypothesis test, which statistical error might have been committed?

17-33. The California State Highway Patrol recently conducted a study on a stretch of interstate highway south of San Francisco to determine whether the mean speed for California vehicles exceeded the mean speed for Nevada vehicles. A total of 140 California cars were included in the study, and 75 Nevada cars were included. Radar was used to measure the speed. The file Speed-Test contains the data collected by the California Highway Patrol.
a. Past studies have indicated that the speeds at which both Nevada and California drivers drive have normal distributions. Using a significance level equal to 0.10, obtain the results desired by the California Highway Patrol. Use a p-value approach to conduct the relevant hypothesis test. Discuss the results of this test in a short written statement.
b. Describe, in the context of this problem, what a Type I error would be.

17-34. The Sunbeam Corporation makes a wide variety of appliances for the home. One product is a digital blood pressure gauge. For obvious reasons, the blood pressure readings made by the monitor need to be accurate. When a new model is being designed, one of the steps is to test it. To do this, a sample of people is selected. Each person has his or her systolic blood pressure taken by a highly respected physician. They then immediately have their systolic blood pressure
Each person has his or her systolic blood pressure taken by a highly respected physician. They then immediately have their systolic blood pressure. |. Introduction to Nonparametric Statistics. 789. taken using the Sunbeam monitor. If the mean blood pressure is the same for the monitor as it is as determined by the physician, the monitor is determined to pass the test. In a recent test, 15 people were randomly selected to be in the sample. The blood pressure readings for these people using both methods are contained in the file Sunbeam. a. Based on the sample data and a significance level equal to 0.05, what conclusion should the Sunbeam engineers reach regarding the latest blood pressure monitor? Discuss your answer in a short written statement. b. Conduct the test as a paired t-test. c. Discuss which of the two procedures in parts a and b is more appropriate to analyze the data presented in this problem. 17-35. The Hersh Corporation is considering two wordprocessing systems for its computers. One factor that will influence its decision is the ease of use in preparing a business report. Consequently, nine typists were selected from the clerical pool and asked to type a typical report using both word-processing systems. The typists then rated the systems on a scale of 0 to 100. The resulting ratings are in the file Hersh. a. Which measurement level describes the data collected for this analysis? b. (1) Could a normal distribution describe the population distribution from which these data were sampled? (2) Which measure of central tendency would be appropriate to describe the center of the populations from which these data were sampled? c. Choose the appropriate hypothesis procedure to determine if there is a difference in the measures of central tendency you selected in part b between these two word-processing systems. Use a significance level of 0.01. d. Which word-processing system would you recommend the Hersh Corporation adopt? 
Support your answer with statistical reasoning.

END EXERCISES 17-2

Chapter Outcome 4.

17.3 Kruskal–Wallis One-Way Analysis of Variance

Section 17.2 showed that the Mann–Whitney U-test is a useful nonparametric procedure for determining whether two independent samples are from populations with the same median. However, as discussed in Chapter 12, many decisions involve comparing more than two populations. Chapter 12 introduced one-way analysis of variance and showed how, if the assumptions of normally distributed populations with equal variances are satisfied, the F-distribution can be used to test the hypothesis of equal population means. However, what if decision makers are not willing to assume normally distributed populations? In that case, they can turn to a

nonparametric procedure to compare the populations. Kruskal–Wallis one-way analysis of variance is the nonparametric counterpart to the one-way ANOVA procedure. It is applicable any time the variables in question satisfy the following conditions:

Assumptions

1. They have a continuous distribution.
2. The data are at least ordinal.
3. The samples are independent.
4. The samples come from populations whose only possible difference is that at least one may have a different central location than the others.

BUSINESS APPLICATION USING THE KRUSKAL–WALLIS ONE-WAY ANOVA TEST

WESTERN STATES OIL AND GAS Western States Oil and Gas is considering outsourcing its information systems activities, including demand-supply analysis, general accounting, and billing. On the basis of cost and performance standards, the company’s information systems manager has reduced the possible suppliers to three, each using different computer systems. One critical factor in the decision is downtime (the time when the system is not operational). When the system goes down, the online applications stop and normal activities are interrupted. The information systems manager received from each supplier a list of firms using its service. From these lists, the manager selected random samples of nine users of each service. In a telephone interview, she found the number of hours of downtime in the previous month for each service. At issue is whether the three computer downtime populations have the same or different centers. If the manager is unwilling to make the assumptions of normality and equal variances required for the one-way ANOVA technique introduced in Chapter 12, she can implement the Kruskal–Wallis nonparametric test using the following steps.
Step 1 Specify the appropriate null and alternative hypotheses to be tested.
In this application, the information systems manager is interested in determining whether a difference exists between median downtime for the three systems. Thus, the null and alternative hypotheses are

H0: μ̃A = μ̃B = μ̃C
HA: Not all population medians are equal

Step 2 Specify the desired level of significance for the test.
The test will be conducted using a significance level equal to α = 0.10.
Step 3 Collect the sample data and compute the test statistic.
The data represent a random sample of downtimes from each service. The samples are independent. To use the Kruskal–Wallis ANOVA, first replace each downtime measurement by its relative ranking within all groups combined. The smallest downtime is given a rank of 1, the next smallest a rank of 2, and so forth, until all downtimes for the three services have been replaced by their relative rankings. Table 17.5 shows the sample data and the rankings for the 27 observations. Notice that the rankings are summed for each service. The Kruskal–Wallis test will determine whether these sums are so different that it is not likely that they came from populations with equal medians. If the samples actually do come from populations with equal medians (that is, the three services have the same per-month median downtime), then the H statistic, calculated by Equation 17.10, will be approximately distributed as a chi-square variable with k − 1 degrees of freedom, where k equals the number of populations (systems in this application) under study.

TABLE 17.5 | Sample Data and Rankings of System Downtimes for the Western States Oil and Gas Example

Service A           Service B            Service C
Data   Ranking      Data   Ranking       Data   Ranking
4.0    11           6.9    19            0.5    1
3.7    10           11.3   23            1.4    4
5.1    15           21.7   27            1.0    2
2.0    6            9.2    20            1.7    5
4.6    12           6.5    17            3.6    9
9.3    21           4.9    14            5.2    16
2.7    8            12.2   25            1.3    3
2.5    7            11.7   24            6.8    18
4.8    13           10.5   22            14.1   26
Sum of ranks = 103  Sum of ranks = 191   Sum of ranks = 84

H Statistic

H = [12/(N(N + 1))] Σ(i=1 to k) Ri²/ni − 3(N + 1)   (17.10)

where:
N = Sum of the sample sizes from all populations
k = Number of populations
Ri = Sum of ranks in the sample from the ith population
ni = Size of the sample from the ith population

Using Equation 17.10, the H statistic is

H = [12/(27(27 + 1))] [103²/9 + 191²/9 + 84²/9] − 3(27 + 1) = 11.50

Step 4 Determine the critical value from the chi-square distribution.
If H is larger than χ² from the chi-square distribution with k − 1 degrees of freedom in Appendix G, the hypothesis of equal medians should be rejected. The critical value for α = 0.10 and k − 1 = 3 − 1 = 2 degrees of freedom is

χ²0.10 = 4.6052

Step 5 Reach a decision.
Because H = 11.50 > 4.6052, reject the null hypothesis based on these sample data.
Step 6 Draw a conclusion.
The Kruskal–Wallis one-way ANOVA shows the information systems manager should conclude, based on the sample data, that the three services do not have equal median downtimes. From this analysis, the supplier with system B would most likely be eliminated from consideration unless other factors such as price or service offset the apparent longer downtimes.
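The H statistic of Equation 17.10 can be checked with a short, self-contained Python sketch. The function name and layout here are illustrative choices, not the text's software; no tie correction is applied because the 27 downtimes in Table 17.5 are all distinct.

```python
def kruskal_wallis_h(*samples):
    """H statistic of Equation 17.10 (no tie correction).

    Pools all observations, assigns average ranks (so ties, if any,
    share a rank), and combines the per-sample rank sums."""
    pooled = sorted(x for s in samples for x in s)
    positions = {}
    for i, x in enumerate(pooled, start=1):
        positions.setdefault(x, []).append(i)
    avg_rank = {x: sum(p) / len(p) for x, p in positions.items()}
    n = len(pooled)
    total = sum(sum(avg_rank[x] for x in s) ** 2 / len(s) for s in samples)
    return 12 / (n * (n + 1)) * total - 3 * (n + 1)

# Table 17.5 downtimes for the three services
a = [4.0, 3.7, 5.1, 2.0, 4.6, 9.3, 2.7, 2.5, 4.8]
b = [6.9, 11.3, 21.7, 9.2, 6.5, 4.9, 12.2, 11.7, 10.5]
c = [0.5, 1.4, 1.0, 1.7, 3.6, 5.2, 1.3, 6.8, 14.1]
print(round(kruskal_wallis_h(a, b, c), 2))   # 11.5, matching H = 11.50
```

The result exceeds the chi-square critical value of 4.6052 with 2 degrees of freedom, in agreement with the decision reached above.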

Excel and Minitab Tutorial

EXAMPLE 17-3 USING KRUSKAL–WALLIS ONE-WAY ANOVA

Amalgamated Sugar Amalgamated Sugar has recently begun a new effort called total productive maintenance (TPM). The TPM concept is to increase the overall operating effectiveness of the company’s equipment. One component of the TPM process attempts to reduce unplanned machine downtime. The first step is to gain an understanding of the current downtime situation. To do this, a sample of 20 days has been collected for each of the three shifts (day, swing, and graveyard). The variable of interest is the minutes of unplanned downtime per shift per day. The minutes are tabulated by summing the downtime minutes for all equipment in the plant. The Kruskal–Wallis test can be performed using the following steps:
Step 1 State the appropriate null and alternative hypotheses.
The Kruskal–Wallis one-way ANOVA procedure can test whether the medians are equal, as follows:

H0: μ̃1 = μ̃2 = μ̃3
HA: Not all population medians are equal

Step 2 Specify the desired significance level for the test.
The test will be conducted using α = 0.05.
Step 3 Collect the sample data and compute the test statistic.
The sample data are in the Amalgamated file. Both Minitab and Excel (using the PHStat add-ins) can be used to perform Kruskal–Wallis nonparametric ANOVA tests.4 Figures 17.3a and 17.3b illustrate the Excel and Minitab outputs for these sample data. The calculated H statistic is

H = 0.1859

FIGURE 17.3A | Excel 2007 (PHStat) Kruskal–Wallis ANOVA Output for Amalgamated Sugar
(The output reports the test statistic, the p-value, and the conclusion.)

Excel 2007 Instructions:
1. Open File: Amalgamated.xls.
2. Select Add-Ins.
3. Select PHStat.
4. Select Multiple Sample Test > Kruskal–Wallis Rank Test.
5. Specify significance level (0.05).
6. Define data range including labels.
7. Click OK.
4 In Minitab, the variable of interest must be in one column. A second column contains the population identifier. In Excel, the data are placed in separate columns by population. The column headings identify the population.

FIGURE 17.3B | Minitab Kruskal–Wallis ANOVA Output for Amalgamated Sugar
(The output reports the p-value, the test statistic, and the conclusion: do not reject H0.)

Minitab Instructions:
1. Open file: Amalgamated.MTW.
2. Choose Data > Stack > Columns.
3. In Stack the following columns, enter the data columns.
4. In Store the stacked data in, select Column of current worksheet, and enter column name: Downtime.
5. In Store subscripts in, enter column name: Shifts. Click OK.
6. Choose Stat > Nonparametrics > Kruskal–Wallis.
7. In Response, enter data column: Downtime.
8. In Factor, enter factor levels column: Shifts.
9. Click OK.

Step 4 Determine the critical value from the chi-square distribution.
The critical value for α = 0.05 and k − 1 = 2 degrees of freedom is

χ²0.05 = 5.9915

Step 5 Reach a decision.
Because H = 0.1859 < 5.9915, we do not reject the null hypothesis. Both the PHStat and Minitab outputs provide the p-value associated with the H statistic. The p-value of 0.9112 far exceeds an alpha of 0.05.
Step 6 Draw a conclusion.
Based on the sample data, the three shifts do not appear to differ with respect to median equipment downtime. The company can now begin to work on steps that will reduce the downtime across the three shifts.
>>END EXAMPLE
TRY PROBLEM 17-37 (pg. 795)

Limitations and Other Considerations The Kruskal–Wallis one-way ANOVA does not require the assumption of normality and is, therefore, often used instead of the ANOVA technique discussed in Chapter 12. However, the Kruskal–Wallis test as discussed here applies only if the sample size from each population is at least 5, the samples are independently selected, and each population has the same distribution except for a possible difference in central location.

When ranking observations, you will sometimes encounter ties. When ties occur, each observation is given the average rank for which it is tied. The H statistic is influenced by ties and should be corrected by dividing Equation 17.10 by Equation 17.11.

Correction for Tied Rankings—Kruskal–Wallis Test

1 − [Σ (t_i³ − t_i), i = 1 to g] / (N³ − N)    (17.11)

where:
g = Number of different groups of ties
t_i = Number of tied observations in the ith tied group of scores
N = Total number of observations

The correct formula for calculating the Kruskal–Wallis H statistic when ties are present is Equation 17.12. Correcting for ties increases H. This makes rejecting the null hypothesis more likely than if the correction is not used. A rule of thumb is that if no more than 25% of the observations are involved in ties, the correction factor is not required. Note that if you use Minitab to perform the Kruskal–Wallis test, the adjusted H statistic is provided. The PHStat add-in to Excel for performing the Kruskal–Wallis test does not provide the adjusted H statistic. However, the adjustment is only necessary when the null hypothesis is not rejected and the H statistic is "close" to the rejection region. In that case, making the proper adjustment could lead to rejecting the null hypothesis.

H Statistic Corrected for Tied Rankings

H = { [12 / (N(N + 1))] Σ (R_i² / n_i, i = 1 to k) − 3(N + 1) } / { 1 − [Σ (t_i³ − t_i), i = 1 to g] / (N³ − N) }    (17.12)

17-3: Exercises

Skill Development

17-36. Given the following sample data:

Group 1: 21 25 36 35 33 23 31 32
Group 2: 17 15 34 22 16 19 30 20
Group 3: 29 38 28 27 14 26 39 36

a. State the appropriate null and alternative hypotheses to test whether there is a difference in the medians of the three populations.
b.
Based on the sample data and a significance level of 0.05, what conclusion should be reached about the medians of the three populations if you are not willing to make the assumption that the populations are normally distributed?
c. Test the hypothesis stated in part a, assuming that the populations are normally distributed with equal variances.
d. Which of the procedures described in parts b and c would you select to analyze the data? Explain your reasoning.
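The tie-correction procedure of Equations 17.11 and 17.12 described above reduces to a few lines of code. This is an illustrative sketch only; the tie-group sizes and the H value below are made-up numbers, not data from any exercise:

```python
def tie_correction(tie_group_sizes, n_total):
    """Correction factor of Equation 17.11:
    1 - sum(t_i**3 - t_i) / (N**3 - N),
    where t_i is the size of the i-th group of tied observations."""
    return 1 - sum(t ** 3 - t for t in tie_group_sizes) / (n_total ** 3 - n_total)

def h_corrected(h, tie_group_sizes, n_total):
    """Equation 17.12: divide the uncorrected H by the correction factor.
    Because the factor is below 1 when ties exist, this always increases H."""
    return h / tie_correction(tie_group_sizes, n_total)

# Hypothetical case: N = 18 observations containing two tied pairs and one
# tied triple (group sizes 2, 2, 3); H = 5.80 is likewise a made-up value.
factor = tie_correction([2, 2, 3], 18)        # slightly below 1
adjusted = h_corrected(5.80, [2, 2, 3], 18)   # slightly above 5.80
```

Consistent with the rule of thumb in the text, the adjustment matters only when H sits just below the critical value, since the correction nudges H upward.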

17-37. Given the following sample data:

Group 1: 10 9 11 12 13 12
Group 2: 8 6 8 9 10 10
Group 3: 13 12 12 11 13 15

a. State the appropriate null and alternative hypotheses for determining whether a difference exists in the median value for the three populations.
b. Based on the sample data, use the Kruskal–Wallis ANOVA procedure to test the null hypothesis using α = 0.05. What conclusion should be reached?

17-38. Given the following data:

Group 1: 20 27 26 22 25 30 23
Group 2: 28 26 21 29 30 25
Group 3: 17 15 18 20 14
Group 4: 21 23 19 17 20

a. State the appropriate null and alternative hypotheses for determining whether a difference exists in the median value for the four populations.
b. Based on the sample data, use the Kruskal–Wallis one-way ANOVA procedure to test the null hypothesis. What conclusion should be reached using a significance level of 0.10? Discuss.
c. Determine the H-value adjusted for ties.
d. Given the results in part b, is it necessary to use the H-value adjusted for ties? If it is, conduct the hypothesis test using this adjusted value of H. If not, explain why not.

17-39. A study was conducted in which samples were selected independently from four populations. The sample size from each population was 20. The data were converted to ranks. The sum of the ranks for the data from each sample is as follows:

                Sample 1   Sample 2   Sample 3   Sample 4
Sum of ranks      640        780        460       1,360

a. State the appropriate null and alternative hypotheses if we wish to determine whether the populations have equal medians.
b. Use the information in this exercise to perform a Kruskal–Wallis one-way ANOVA.

Business Applications

17-40. The American Beef Growers Association is trying to promote the consumption of beef products. The organization performs numerous studies, the results of which are often used in advertising campaigns.
One such study involved a quality perception test. Three grades of beef were involved: choice, standard, and economy. A random sample of people was provided pieces of choice-grade beefsteak and was asked to rate its quality on a scale of 1 to 100. A second sample of people was given pieces of standard-grade beefsteak, and a third sample was given pieces of economy-grade beefsteak, with instructions to rate the beef on the 100-point scale. The following data were obtained:

Choice:   78 87 90 87 89 90
Standard: 67 80 78 80 67 70
Economy:  65 62 70 66 70 73

a. What measurement level do these data possess? Would it be appropriate to assume that such data could be obtained from a normal distribution? Explain your answers.
b. Based on the sample data, what conclusions should be reached concerning the median quality perception scores for the three grades of beef? Test using α = 0.01.

17-41. A study was conducted by the sports department of a national network television station in which the objective was to determine whether a difference exists between median annual salaries of National Basketball Association (NBA) players, National Football League (NFL) players, and Major League Baseball (MLB) players. The analyst in charge of the study believes that the normal distribution assumption is violated in this study. Thus, she thinks that a nonparametric test is in order. The following summary data have been collected:

NBA: n = 20, ΣRi = 1,655
NFL: n = 30, ΣRi = 1,100
MLB: n = 40, ΣRi = 1,340

a. Why would the sports department address the median as the parameter of interest in this analysis, as opposed to the mean? Explain your answer.
b. What characteristics of the salaries of professional athletes suggest that such data are not normally distributed? Explain.
c. Based on these data, what can be concluded about the median salaries for the three sports? Test at α = 0.05. Assume no ties.
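When only rank sums and sample sizes are reported, as in some of the exercises above, Equation 17.10 can be applied directly without re-ranking the raw data. A minimal sketch; the rank sums below are hypothetical values chosen only so that the ranks 1 through N total N(N + 1)/2, not the numbers from any exercise:

```python
def h_from_rank_sums(rank_sums, sizes):
    """Kruskal-Wallis H (Equation 17.10) computed directly from each
    group's rank sum R_i and sample size n_i."""
    n_total = sum(sizes)
    term = sum(r * r / m for r, m in zip(rank_sums, sizes))
    return 12.0 / (n_total * (n_total + 1)) * term - 3 * (n_total + 1)

# Hypothetical rank sums for three groups of 6 observations each.
# Sanity check: with N = 18, the rank sums must total N(N + 1)/2 = 171.
rank_sums = [70, 45, 56]
assert sum(rank_sums) == 171
H = h_from_rank_sums(rank_sums, [6, 6, 6])
```

The resulting H is then compared with the chi-square critical value for k − 1 degrees of freedom, exactly as in the worked example earlier in the section.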

17-42. Referring to Exercise 17-41, suppose that there were 40 ties at eight different salary levels. The following shows how many scores were tied at each salary level:

Level:  1  2  3  4  5  6  7  8
t:      2  3  2  4  8 10  6  5

a. Given the results in the previous exercise, is it necessary to use the H-value adjusted for ties?
b. If your answer to part a is yes, conduct the test of hypothesis using this adjusted value of H. If it is not, explain why not.

17-43. Suppose as part of your job you are responsible for installing emergency lighting in a series of state office buildings. Bids have been received from four manufacturers of battery-operated emergency lights. The costs are about equal, so the decision will be based on the length of time the lights last before failing. A sample of four lights from each manufacturer has been tested, with the following values (time in hours) recorded for each manufacturer:

Type A: 1,024  1,121  1,250  1,022
Type B: 1,270  1,325  1,426  1,322
Type C: 1,121  1,201  1,190  1,122
Type D:   923    983  1,087  1,121

Using α = 0.01, what conclusion for the four manufacturers should you reach about the median length of time the lights last before failing? Explain.

Computer Database Exercises

17-44. As purchasing agent for the Horner-Williams Company, you have primary responsibility for securing high-quality raw materials at the best possible prices. One particular material the Horner-Williams Company uses a great deal of is aluminum. After careful study, you have been able to reduce the prospective vendors to three. It is unclear whether these three vendors produce aluminum that is equally durable. To compare durability, the recommended procedure is to put pressure on aluminum until it cracks. The vendor whose aluminum requires the highest median pressure will be judged to provide the most durable product.
To carry out this test, 14 pieces from each vendor have been selected. These data are in the file Horner-Williams. (The data are pounds per square inch pressure.) Using α = 0.05, what should the company conclude about whether there is a difference in the median strength of the three vendors' aluminum?

17-45. A large metropolitan police force is considering changing from full-size to mid-size cars. The police force sampled cars from each of three manufacturers. The number sampled represents the number that the manufacturer was able to provide for the test. Each car was driven for 5,000 miles, and the operating cost per mile was computed. The operating costs, in cents per mile, for the 12 cars are provided in the file Police. Perform the appropriate ANOVA test on these data. Assume a significance level of 0.05. State the appropriate null and alternative hypotheses. Do the experimental data provide evidence that the median operating costs per mile for the three types of police cars are different?

17-46. A nationwide moving company is considering five different types of nylon tie-down straps. The purchasing department randomly selected straps from each company and determined their breaking strengths in pounds. The sample data are contained in the file Nylon. Based on your analysis, with a Type I error rate of 0.05, can you conclude that a difference exists among the median breaking strengths of the types of nylon ropes?

END EXERCISES 17-3

Visual Summary

Chapter 17: Previous chapters introduced a wide variety of commonly used statistical techniques, all of which rely on underlying assumptions about the data. The t-distribution assumes the population from which the sample is selected is normally distributed. Analysis of variance is based on the assumptions that all populations are normally distributed and have equal variances. In addition, each of these techniques requires the data measurement level for the variables of interest to be either interval or ratio. In decision-making situations, you will encounter cases in which either the level of data measurement is too low or the distribution assumptions are clearly violated. To handle such cases, a class of statistical tools called nonparametric statistics has been developed. Although many different nonparametric tests exist, this chapter introduces some of the more commonly used ones: the Wilcoxon signed rank test for one population median; two nonparametric tests for two population medians, the Mann–Whitney U-test and the Wilcoxon matched-pairs signed rank test; and the Kruskal–Wallis one-way analysis of variance test.

17.1 The Wilcoxon Signed Rank Test for One Population Median (pg. 771–776)

Summary Chapter 9 introduced the t-test, which is used to test whether a population mean has a specified value. However, the t-test is not appropriate if the data are ordinal or when the populations are not believed to be approximately normally distributed. In these cases, the Wilcoxon signed rank test can be used. This test makes no highly restrictive assumption about the shape of the population distribution. The Wilcoxon test is used to test hypotheses about a population median rather than a population mean.
The logic of the Wilcoxon test is as follows: because the median is the midpoint of a population, approximately half the data values in a random sample should lie below the hypothesized median and about half should lie above it. The hypothesized median is rejected if the actual data distribution shows too large a departure from this 50-50 split.

Outcome 1. Recognize when and how to use the Wilcoxon signed rank test for a population median.

17.2 Nonparametric Tests for Two Population Medians (pg. 776–789)

Summary Chapter 10 discussed testing the difference between two population means using Student's t-distribution. Again, the t-distribution assumes the two populations are normally distributed, and the data are restricted to being interval or ratio level. Although in many situations these assumptions and data requirements will be satisfied, you will often encounter situations in which they are not. This section introduces two nonparametric techniques that do not require the distribution and data-level assumptions of the t-test: the Mann–Whitney U-test and the Wilcoxon matched-pairs signed rank test. Both tests can be used with ordinal (ranked) data, and neither requires that the populations be normally distributed. The Mann–Whitney U-test is used when the samples are independent, whereas the Wilcoxon matched-pairs signed rank test is used when the design has paired samples.

Outcome 2. Recognize the situations for which the Mann–Whitney U-test for the difference between two population medians applies and be able to use it in a decision-making context.
Outcome 3. Know when to apply the Wilcoxon matched-pairs signed rank test for related samples.

17.3 Kruskal-Wallis One-Way Analysis of Variance (pg. 789–796)

Summary Decision makers are often faced with deciding between three or more alternatives.
Chapter 12 introduced one-way analysis of variance and showed how, if the assumptions of normally distributed populations with equal variances are satisfied, the F-distribution can be used to test the hypothesis of equal population means. If the assumption of normally distributed populations cannot be made, the Kruskal–Wallis one-way analysis of variance is the nonparametric counterpart to the one-way ANOVA procedure presented in Chapter 12. However, it has its own set of assumptions:

1. The distributions are continuous.
2. The data are at least ordinal.
3. The samples are independent.
4. The samples come from populations whose only possible difference is that at least one may have a different central location than the others.

Outcome 4. Perform nonparametric analysis of variance using the Kruskal–Wallis one-way ANOVA.

Conclusion

Many statistical techniques discussed in this book are based on the assumptions that the data being analyzed are interval or ratio level and that the underlying populations are normal. If these assumptions come close to being satisfied, many of the tools discussed before this chapter apply and are useful. However, in many practical situations these assumptions simply do not apply. In such cases, nonparametric statistical tests may be appropriate. Although this chapter introduced some common nonparametric tests, many other nonparametric statistical techniques have been developed for specific applications. Many are aimed at situations involving small samples. Figure 17.4 may help you determine which nonparametric test to use in different situations.
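The large-sample Wilcoxon logic summarized in Section 17.1 comes down to Equation 17.1. A short Python sketch; W and n below are made-up illustrative values, not data from the chapter:

```python
import math

def wilcoxon_signed_rank_z(w, n):
    """Large-sample Wilcoxon signed rank statistic (Equation 17.1):
    z = (W - n(n+1)/4) / sqrt(n(n+1)(2n+1)/24),
    where W is the sum of the positive signed ranks and n is the
    number of nonzero differences."""
    mean = n * (n + 1) / 4
    std = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    return (w - mean) / std

# Hypothetical values: n = 25 nonzero differences with W = 210.
z = wilcoxon_signed_rank_z(210, 25)
# Compare |z| with the critical z value (1.96 for a two-tailed test
# at alpha = 0.05) to decide whether to reject the hypothesized median.
```

Note that when W equals its expected value n(n + 1)/4, the statistic is exactly zero, matching the 50-50 split logic described in the summary of Section 17.1.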

FIGURE 17.4 | Nonparametric Tests Introduced in Chapter 17

[Decision tree, organized by number of samples:]
One sample: Wilcoxon signed rank test
Two independent samples: Mann–Whitney U test
Two paired samples: Wilcoxon signed rank test (matched pairs)
Three or more independent samples: Kruskal–Wallis one-way ANOVA

Other commonly used nonparametric tests:
Friedman Test: randomized block ANOVA
Sign Test: test about a population median
Runs Test: test for randomness
Spearman Rank Correlation: measure of the linear relationship between two variables

Equations

(17.1) Large-Sample Wilcoxon Signed Rank Test Statistic (pg. 772)

z = [W − n(n + 1)/4] / √[n(n + 1)(2n + 1)/24]

(17.2) and (17.3) U Statistics (pg. 778)

U1 = n1·n2 + n1(n1 + 1)/2 − ΣR1
U2 = n1·n2 + n2(n2 + 1)/2 − ΣR2

(17.4) and (17.5) Mean and Standard Deviation for U Statistic (pg. 780)

μ = n1·n2 / 2
σ = √[n1·n2(n1 + n2 + 1)/12]

(17.6) Mann–Whitney U-Test Statistic (pg. 780)

z = (U − n1·n2/2) / √[n1·n2(n1 + n2 + 1)/12]

(17.7) and (17.8) Wilcoxon Mean and Standard Deviation (pg. 784)

μ = n(n + 1)/4
σ = √[n(n + 1)(2n + 1)/24]

(17.9) Wilcoxon Test Statistic (pg. 785)

z = [T − n(n + 1)/4] / √[n(n + 1)(2n + 1)/24]
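The Mann–Whitney calculations of Equations 17.2 through 17.6 can be sketched in a few lines. The two samples below are made-up illustrative values, and using the smaller of U1 and U2 in the z formula is one common convention; check the convention used in Section 17.2:

```python
import math

def mann_whitney_u_z(sample1, sample2):
    """Mann-Whitney large-sample test (Equations 17.2-17.6).
    Returns (U, z); U here is the smaller of U1 and U2."""
    n1, n2 = len(sample1), len(sample2)
    pooled = sorted(sample1 + sample2)
    positions = {}
    for pos, value in enumerate(pooled, start=1):
        positions.setdefault(value, []).append(pos)
    avg_rank = {v: sum(p) / len(p) for v, p in positions.items()}

    r1 = sum(avg_rank[v] for v in sample1)            # sum of sample-1 ranks
    u1 = n1 * n2 + n1 * (n1 + 1) / 2 - r1             # Equation 17.2
    u2 = n1 * n2 - u1                                 # algebraically equals Eq. 17.3
    u = min(u1, u2)
    mu = n1 * n2 / 2                                  # Equation 17.4
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)   # Equation 17.5
    return u, (u - mu) / sigma                        # Equation 17.6

# Hypothetical independent samples -- illustrative only.
x = [24, 31, 27, 40, 18, 33]
y = [22, 16, 29, 20, 25, 21]
U, z = mann_whitney_u_z(x, y)
```

The identity U1 + U2 = n1·n2 is used here as a shortcut for Equation 17.3; it follows because the two rank sums together must total N(N + 1)/2.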

(17.10) H Statistic (pg. 791)

H = [12 / (N(N + 1))] Σ (R_i² / n_i, i = 1 to k) − 3(N + 1)

(17.11) Correction for Tied Rankings—Kruskal–Wallis Test (pg. 794)

1 − [Σ (t_i³ − t_i), i = 1 to g] / (N³ − N)

(17.12) H Statistic Corrected for Tied Rankings (pg. 794)

H = { [12 / (N(N + 1))] Σ (R_i² / n_i, i = 1 to k) − 3(N + 1) } / { 1 − [Σ (t_i³ − t_i), i = 1 to g] / (N³ − N) }

Chapter Exercises

Conceptual Questions

17-47. Find an organization you think would be interested in data that would violate the measurement scale or known distribution assumptions necessary to use the statistical tools found in Chapters 10–12 (retail stores are a good candidate). Determine to what extent this organization considers these problems and whether it uses any of the techniques discussed in this chapter.

17-48. Discuss the data conditions that would lead you to use the Kruskal–Wallis test as opposed to the ANOVA procedure introduced in Chapter 12. Present an example illustrating these conditions.

17-49. In the library, locate two journal articles that use one of the nonparametric tests discussed in this chapter. Prepare a brief outline of the articles, paying particular attention to the reasons given for using the particular test.

17-50. As an example of how the sampling distribution for the Mann–Whitney test is derived, consider two samples with sample sizes n1 = 2 and n2 = 3. The distribution is obtained under the assumption that the two variables, say x and y, are identically distributed. Under this assumption, each measurement is equally likely to obtain one of the ranks between 1 and n1 + n2.
a. List all the possible sets of two ranks that could be obtained from five ranks. Calculate the Mann–Whitney U-value for each of these sets of two ranks.
b. The number of ways in which we may choose n1 ranks from n1 + n2 is given by (n1 + n2)! / (n1! n2!). Calculate this value for n1 = 2 and n2 = 3.
Now calculate the probability of any one of the possible Mann–Whitney U-values.
c. List all the possible Mann–Whitney U-values you obtained in part a. Then, using part b, calculate the probability that each of these U-values occurs, thereby producing the sampling distribution for the Mann–Whitney U statistic when n1 = 2 and n2 = 3.

17-51. Let us examine how the sampling distribution of the Wilcoxon test statistic is obtained. Consider the sampling distribution of the positive ranks from a sample size of 4. The ranks to be considered are, therefore, 1, 2, 3, and 4. Under the null hypothesis, the differences to be ranked are distributed symmetrically about zero. Thus, each difference is just as likely to be positively as negatively ranked.
a. For a sample size of four, there are 2⁴ = 16 possible sets of signs associated with the four ranks. List the 16 possible sets of ranks that could be positive—that is, (none), (1), (2), . . . , (1, 2, 3, 4). Each of these sets of positive ranks (under the null hypothesis) has the same probability of occurring.
b. Calculate the sum of the ranks of each set specified in part a.
c. Using parts a and b, produce the sampling distribution for the Wilcoxon test statistic when n = 4.

Business Applications

17-52. Students attending West Valley Community College buy their textbooks online from one of two different booksellers because the college does not have a bookstore. The following data represent sample amounts that students spend on books per term:

Company 1 ($): 246 211 235 270 411 310 450 502 311 200
Company 2 ($): 300 305 308 325 340 295 320 330 240 360

a. Do these data indicate a difference in mean textbook prices for the two companies? Apply the Mann–Whitney U test with a significance level of 0.10.
b. Apply the t-test to determine whether the data indicate a difference between the mean amount spent on books at the two companies. Use a significance level of 0.10. Indicate what assumptions must be made to apply the t-test.

17-53. The Hunter Family Corporation owns roadside diners in numerous locations across the country. For the past few months, the company has undertaken a new advertising study. Initially, company executives selected 22 of its retail outlets that were similar with respect to sales volume, profitability, location, climate, economic status of customers, and experience of store management. Each of the outlets was randomly assigned one of two advertising plans promoting a new sandwich product. The accompanying data represent the number of new sandwiches sold during the specific test period at each retail outlet. Hunter executives want you to determine which of the two advertising plans leads to the largest average sales levels for the new product. They are not willing to make the assumptions necessary for you to use the t-test. They do not wish to have an error rate of more than 0.05.

Advertising Plan 1 ($): 1,711 1,915 1,905 2,153 1,504 1,195 2,103 1,601 1,580 1,475 1,588
Advertising Plan 2 ($): 2,100 2,210 1,950 3,004 2,725 2,619 2,483 2,520 1,904 1,875 1,943

17-54. The Miltmore Corporation performs consulting services for companies that think they have image problems. Recently, the Bluedot Beer Company approached Miltmore. Bluedot executives were concerned that the company's image, relative to its two closest competitors, had diminished. Miltmore conducted an image study in which a random sample of 8 people was asked to rate Bluedot's image.
Five people were asked to rate competitor A's image, and 10 people were asked to rate competitor B's image. The image ratings were made on a 100-point scale, with 100 being the best possible rating. Here are the results of the sampling:

Bluedot:      40 60 70 40 55 90 20 20
Competitor A: 95 53 55 92 90
Competitor B: 50 80 82 87 93 51 63 72 96 88

a. Based on these sample results, should Bluedot conclude there is an image difference among the three companies? Use a significance level equal to 0.05.
b. Should Bluedot infer that its image has been damaged by last year's federal government recall of its product? Discuss why or why not.
c. Why might the decision maker wish to use parametric ANOVA rather than the corresponding nonparametric test? Discuss.

17-55. The Style-Rite Company of Atlanta makes windbreaker jackets for people who play golf and who are active outdoors during the spring and fall months. The company recently developed a new material and is in the process of test-marketing jackets made from the material. As part of this test-marketing effort, 10 people were each supplied with a jacket made from the original material and were asked to wear it for two months, washing it at least twice during that time. A second group of 10 people was each given a jacket made from the new material and asked to wear it for two months with the same washing requirements. Following the two-month trial period, the individuals were asked to rate the jackets on a scale of 0 to 100, with 0 being the worst performance rating and 100 being the best. The ratings for each material are shown as follows:

Original Material: 76 34 70 23 45 80 10 46 67 75
New Material:      55 90 72 17 56 69 91 95 86 74

The company expects that, on the average, the performance ratings will be superior for the new material.
a. Examine the data given. What characteristics of these data sets would lead you to reject the assumption that the data came from populations that had normal distributions and equal variances?
b. Do the sample data support this belief at a significance level of 0.05? Discuss.

17-56. A study was recently conducted by the Bonneville Power Association (BPA) to determine attitudes regarding the association's policies in western U.S. states. One part of the study asked respondents to rate the performance of the BPA on its responsiveness to environmental issues. The following responses were obtained for a sample of 12 urban residents and 10 rural residents. The ratings are on a 1 to 100 scale, with 100 being perfect.

Urban: 76 90 86 60 43 96 50 20 30 82 75 84
Rural: 55 80 94 40 85 92 77 68 35 59

a. Based on the sample data, should the BPA conclude that there is no difference between the urban and rural residents with respect to median environmental rating? Test using a significance level of 0.02.
b. Perform the appropriate parametric statistical test and indicate the assumptions necessary to use this test that were not required by the Mann–Whitney test. Use a significance level of 0.02. (Refer to Chapter 10, if needed.)

17-57. The manager of credit card operations for a small regional bank has determined that last year's median credit card balance was $1,989.32. A sample of 18 customer balances this year revealed the following figures, in dollars:

1,827.85  1,992.75  2,012.35  1,955.64  2,023.19  1,998.52
2,003.75  1,752.55  1,865.32  2,013.13  2,225.35  2,100.35
2,002.02  1,850.37  1,995.35  2,001.18  2,252.54  2,035.75

Based on the 18 customer balances sampled, is there enough evidence to allow you to conclude the median balance has changed? Test at the 0.05 level of significance.

17-58. During the production of a textbook, there are many steps between when the author begins preparing the manuscript and when the book is finally printed and bound. Tremendous effort is made to minimize the number of errors of any type in the text. One type of error that is especially difficult to eliminate is the typographical error that can creep in when the book is typeset. The Prolythic Type Company does contract work for many publishers. As part of its quality control efforts, it charts the number of corrected errors per page in its manuscripts. In one particularly difficult to typeset book, the following data were observed for a sample of 15 pages (in sequence):

Page:   1  2  3  4  5  6  7  8  9  10  11  12  13  14  15
Errors: 2  4  1  0  6  7  4  2  9   4   3   6   2   4   2

Is there sufficient evidence to conclude the median number of errors per page is greater than 6?

17-59. A Vermont company is monitoring a process that fills maple syrup bottles. When the process is filling correctly, the median fill in an 8-ounce bottle of syrup is 8.03 ounces. The last 15 bottles sampled revealed the following levels of fill:

7.95  8.04  8.02  8.07  8.06  8.05  8.04  7.97  8.05  8.08  8.11  7.99  8.00  8.02  8.01

a. Formulate the null and alternative hypotheses needed in this situation.
b. Do the sample values support the null or alternative hypothesis?

Computer Database Exercises

17-60. A major car manufacturer is experimenting with three new methods of pollution control. The testing lab must determine whether the three methods produce equal pollution reductions.
Readings from a calibrated carbon monoxide meter are taken from groups of engines randomly equipped with one of the three control units. The data are in the file Pollution. Determine whether the three pollution-control methods will produce equal results.

17-61. A business statistics instructor at State University has been experimenting with her testing procedure. This term, she has taken the approach of giving two tests over each section of material. The first test is a problem-oriented exam, in which students have to set up and solve applications problems. The exam is worth 50 points. The second test, given a day later, is a

multiple-choice test, covering the concepts introduced in the section of the text covered by the exam. This exam is also worth 50 points. In one class of 15 students, the observed test scores over the first section of material in the course are contained in the file State University.
a. If the instructor is unwilling to make the assumptions for the paired-sample t-test, what should she conclude based on these data about the distribution of scores for the two tests if she tests at a significance level of 0.05?
b. In the context of this problem, define a Type II error.

17-62. Two brands of tires are being tested for tread wear. To control for vehicle and driver variation, one tire of each brand is put on the front wheels of 10 cars. The cars are driven under normal driving conditions for a total of 15,000 miles. The tread wear is then measured using a very sophisticated instrument. The data that were observed are in the file Tread Wear. (Note that the larger the number, the less wear in the tread.)
a. What would be the possible objection in this case for employing the paired-sample t-test? Discuss.
b. Assuming that the decision makers in this situation are not willing to make the assumptions required to perform the paired-sample t-test, what decision should be reached using the appropriate nonparametric test if a significance level of 0.05 is used? Discuss.

17-63. High Fuel Company markets a gasoline additive for automobiles that it claims will increase a car's miles per gallon (mpg) performance. In an effort to determine whether High Fuel's claim is valid, a consumer testing agency randomly selected eight makes of automobiles. Each car's tank was filled with gasoline and driven around a track until empty. Then the car's tank was refilled with gasoline and the additive, and the car was driven until the gas tank was empty again.
The miles per gallon were measured for each car with and without the additive. The results are reported in the file High Fuel. The testing agency is unwilling to accept the assumption that the underlying probability distribution is normally distributed, but it would still like to perform a statistical test to determine the validity of High Fuel's claim.
a. What statistical test would you recommend the testing agency use in this case? Why?
b. Conduct the test that you believe to be appropriate. Use a significance level of 0.025.
c. State your conclusions based on the test you have just conducted. Is High Fuel's claim supported by the test's findings?

17-64. A company assembles remote controls for television sets. The company's design engineers have developed a revised design that they think will make it faster to assemble the controls. To test whether the new design leads to faster assembly, 14 assembly workers were randomly selected, and each worker was asked to assemble a control using the current design and then asked to assemble a control using the revised design. The times in seconds to assemble the controls are shown in the file Remote Control. The company's engineers are unable to assume that the assembly times are normally distributed, but they would like to test whether assembly times are lower using the revised design.
a. What statistical test do you recommend the company use? Why?
b. State the null and alternative hypotheses of interest to the company.
c. At the 0.025 level of significance, is there any evidence to support the engineers' belief that the revised design reduces assembly time?
d. How might the results of the statistical test be used by the company's management?

Case 17.A: Bentford Electronics—Part 2

On Saturday morning, Jennifer Bentford received a call at her home from the production supervisor at Bentford Electronics Plant 1.
The supervisor indicated that she and the supervisors from Plants 2, 3, and 4 had agreed that something must be done to improve company morale and, thereby, increase the production output of their plants. Jennifer Bentford, president of Bentford Electronics, agreed to set up a Monday morning meeting with the supervisors to see if they could arrive at a plan for accomplishing these objectives.. By Monday each supervisor had compiled a list of several ideas, including a 4-day work week and interplant competition of various kinds. After listening to the discussion for some time, Jennifer Bentford asked if anyone knew if there was a difference in average daily output for the four plants. When she heard no positive response, she told the supervisors to select a random sample of daily production reports from each plant and test whether there was a difference. They were to meet again on Wednesday afternoon with test results. By Wednesday morning, the supervisors had collected the following data on units produced:.

CHAPTER 17 | Introduction to Nonparametric Statistics

Plant 1    Plant 2    Plant 3    Plant 4
 4,306      1,853      2,700      1,704
 2,852      1,948      2,705      2,320
 1,900      2,702      2,721      4,150
 4,711      4,110      2,900      3,300
 2,933      3,950      2,650      3,200
 3,627      2,300      2,480      2,975

The supervisors had little trouble collecting the data, but they were at a loss about how to determine whether there was a difference in the output of the four plants. Jerry Gibson, the company’s research analyst, told the supervisors that there were statistical procedures that could be used to test hypotheses regarding multiple samples if the daily output was distributed in a bell shape (normal distribution) at each plant. The supervisors expressed dismay because no one thought his or her output was normal. Jerry Gibson indicated that there were techniques that did not require the normality assumption, but he did not know what they were. The meeting with Jennifer Bentford was scheduled to begin in 3 hours, so he needed some statistical-analysis help immediately.
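One distribution-free technique of the kind Jerry Gibson needs is the Kruskal–Wallis test, which compares several independent samples without assuming normality. The sketch below applies it to the case data in pure Python; it is an illustration, not the textbook’s Minitab/Excel procedure, and it skips the average-rank tie correction because these 24 values happen to have no ties.

```python
from itertools import chain

# Daily output data from Case 17.A (units produced).
plants = {
    "Plant 1": [4306, 2852, 1900, 4711, 2933, 3627],
    "Plant 2": [1853, 1948, 2702, 4110, 3950, 2300],
    "Plant 3": [2700, 2705, 2721, 2900, 2650, 2480],
    "Plant 4": [1704, 2320, 4150, 3300, 3200, 2975],
}

# Rank all observations jointly (these 24 values have no ties; with
# ties, average ranks and a tie correction would be needed).
all_values = sorted(chain.from_iterable(plants.values()))
rank_of = {v: i + 1 for i, v in enumerate(all_values)}

n_total = len(all_values)  # N = 24
rank_sums = {p: sum(rank_of[v] for v in vals) for p, vals in plants.items()}

# Kruskal-Wallis statistic: H = 12/(N(N+1)) * sum(R_i^2 / n_i) - 3(N+1)
h_stat = (12.0 / (n_total * (n_total + 1))
          * sum(r ** 2 / len(plants[p]) for p, r in rank_sums.items())
          - 3 * (n_total + 1))

print(f"H = {h_stat:.2f}")  # compare with the chi-square critical value, df = 3
```

With H ≈ 2.91 below the chi-square critical value of 7.815 (α = 0.05, df = 3), the supervisors would fail to reject the hypothesis that the four plants’ daily output distributions are the same. In practice, `scipy.stats.kruskal` computes the same statistic with tie handling built in.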

chapter 18

Chapter 18 Quick Prep Links
Modern quality control is based on many of the statistical concepts you have covered up to now, so to adequately understand the material, you need to review many previous topics.

• Review how to construct and interpret line charts, covered in Chapter 2.
• Make sure you are familiar with the steps involved in determining the mean and standard deviation of the binomial and Poisson distributions, covered in Chapter 5.
• Review how to determine the mean and standard deviation of samples and the meaning of the Central Limit Theorem from Chapter 7.
• Finally, become familiar again with how to determine a confidence interval estimate and test a hypothesis of a single population parameter as covered in Chapters 8 and 9.

Introduction to Quality and Statistical Process Control

18.1 Quality Management and Tools for Process Improvement (pg. 805–808)
  Outcome 1. Use the seven basic tools of quality.

18.2 Introduction to Statistical Process Control Charts (pg. 808–830)
  Outcome 2. Construct and interpret x̄-charts and R-charts.
  Outcome 3. Construct and interpret p-charts.
  Outcome 4. Construct and interpret c-charts.

Why you need to know
Organizations across the United States and around the world have turned to quality management in an effort to meet the competitive challenges of the international marketplace. Although there is no set approach for implementing quality management, a commonality among most organizations is for employees at all levels to be brought into the effort as members of process improvement teams. Successful organizations, such as General Electric and Hewlett-Packard, realize that thrusting people together in teams and then expecting process improvement to occur will generally lead to disappointment. They know that their employees need to understand how to work together as a team.
In many instances, teams are formed to improve a process so that product or service quality is enhanced. However, teamwork and team building must be combined with training and education in the proper tools if employees are to be successful at making lasting process improvements. Over the past several decades, a number of techniques and methods for process improvement have been developed and used by organizations. As a group, these are referred to as the Tools of Quality. Many of these tools are based on statistical procedures and data analysis. One set of quality tools known as statistical process control charts is so prevalent in business today, and is so closely linked to the material in Chapters 5 through 9, that its coverage merits a separate chapter. Today, successful managers must have an appreciation of, and familiarity with, the role of quality in process improvement activities. This chapter is designed to introduce you to the fundamental tools and techniques of quality management and to show you how to construct and interpret statistical process control charts.. 804.

<span class='text_page_counter'>(346)</span> CHAPTER 18. |. Introduction to Quality and Statistical Process Control. 805. 18.1 Quality Management and Tools. for Process Improvement. Total Quality Management A journey to excellence in which everyone in the organization is focused on continuous process improvement directed toward increased customer satisfaction.. Pareto Principle 80% of the problems come from 20% of the causes.. From the end of World War II through the mid-1970s, industry in the United States was kept busy meeting a pent-up demand for its products both at home and abroad. The emphasis in most companies was on getting the “product out the door.” The U.S. effort to produce large quantities of goods and services led to less emphasis on quality. During this same time, Dr. W. Edwards Deming and Dr. Joseph Juran were consulting with Japanese business leaders to help them rebuild their economic base after World War II. Deming, a statistician, and Juran, an engineer, emphasized that quality was the key to being competitive and that quality could be best achieved by improving the processes that produced the products and delivered the services. Employing the process improvement approach was a slow, but effective, method for improving quality, and by the early 1970s Japanese products began to exceed those of the United States in terms of quality. The impact was felt by entire industries, such as the automobile and electronics industries. Whereas Juran focused on quality planning and helping businesses drive costs down by eliminating waste in processes, Deming preached a new management philosophy, which has become known as Total Quality Management, or TQM. There are about as many definitions of TQM as there are companies who have attempted to implement it. In the early 1980s, U.S. business leaders began to realize the competitive importance of providing high-quality products and services, and a quality revolution was under way in the United States. 
Deming’s 14 points (see Table 18.1) reflected a new philosophy of management that emphasized the importance of leadership. The numbers attached to each point do not indicate an order of importance; rather, the 14 points collectively are seen as necessary steps to becoming a world-class company. Juran’s role in the quality movement was also important. Juran is noted for many contributions to TQM, including his 10 steps to quality improvement, which are outlined in Table 18.2. Note that Juran and Deming differed with respect to the use of goals and targets. Juran is also credited with applying the Pareto principle to quality. Juran urges managers to use the Pareto principle to focus on the vital few sources of problems and to separate the vital few from the trivial many. A form of a bar chart, a Pareto chart is used to display data in a way that helps managers find the most important problem issues. There have been numerous other individuals who have played significant roles in the quality movement. Among these are Philip B. Crosby, who is probably best known for his

TABLE 18.1 | Deming’s 14 Points

1. Create a constancy of purpose toward the improvement of products and services in order to become competitive, stay in business, and provide jobs.
2. Adopt the new philosophy. Management must learn that it is in a new economic age and awaken to the challenge, learn its responsibilities, and take on leadership for change.
3. Stop depending on inspection to achieve quality. Build in quality from the start.
4. Stop awarding contracts on the basis of low bids.
5. Improve continuously and forever the system of production and service to improve quality and productivity, and thus constantly reduce costs.
6. Institute training on the job.
7. Institute leadership. The purpose of leadership should be to help people and technology work better.
8. Drive out fear so that everyone may work effectively.
9. Break down barriers between departments so that people can work as a team.
10. Eliminate slogans, exhortations, and targets for the workforce. They create adversarial relationships.
11. Eliminate quotas and management by objectives. Substitute leadership.
12. Remove barriers that rob employees of their pride of workmanship.
13. Institute a vigorous program of education and self-improvement.
14. Make the transformation everyone’s job and put everyone to work on it.
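Juran’s “vital few versus trivial many” idea is easy to make concrete. The sketch below tallies hypothetical defect counts (the categories and numbers are invented for illustration, not from the text), sorts them in Pareto order, and flags the categories that account for roughly 80% of all problems.

```python
# Hypothetical defect tallies for a Pareto analysis (illustrative only).
defects = {
    "Scratches": 12, "Wrong labels": 9, "Loose fittings": 86,
    "Missing screws": 51, "Paint runs": 23, "Dents": 7,
}

# Pareto ordering: sort categories by count, largest first.
ordered = sorted(defects.items(), key=lambda kv: kv[1], reverse=True)
total = sum(defects.values())

cumulative = 0
vital_few = []
for category, count in ordered:
    cumulative += count
    pct = 100.0 * cumulative / total
    print(f"{category:15s} {count:4d}  cumulative {pct:5.1f}%")
    # A category belongs to the "vital few" if the running total had
    # not yet reached 80% before this category was added.
    if (cumulative - count) / total < 0.80:
        vital_few.append(category)

print("Vital few:", vital_few)
```

Plotting the sorted bars with a cumulative-percentage line on a second axis gives the classic Pareto chart; the loop above already produces both series.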

TABLE 18.2 | Juran’s 10 Steps to Quality Improvement

1. Build awareness of both the need for improvement and the opportunity for improvement.
2. Set goals for improvement.
3. Organize to meet the goals that have been set.
4. Provide training.
5. Implement projects aimed at solving problems.
6. Report progress.
7. Give recognition.
8. Communicate the results.
9. Keep score.
10. Maintain momentum by building improvement into the company’s regular systems.

book Quality Is Free, in which he emphasized that in the long run, the costs of improving quality are more than offset by the reductions in waste, rework, returns, and unsatisfied customers. Kauro Ishikawa is credited with developing and popularizing the application of the fishbone diagram that we will discuss shortly in the section on the Basic 7 Tools of Quality. There is also the work of Masaaki Imai, who popularized the philosophy of kaizen, or people-based continuous improvement. Finally, we must not overlook the contributions of many different managers at companies such as Hewlett-Packard, General Electric, Motorola, Toyota, and Federal Express. These leaders synthesized and applied many different quality ideas and concepts to their organizations in order to create world-class corporations. By sharing their successes with other firms, they have inspired and motivated others to continually seek opportunities in which the tools of quality can be applied to improve business processes.

Chapter Outcome 1.

The Tools of Quality for Process Improvement

Once U.S. managers realized that their businesses were engaged in a competitive battle with companies around the world, they reacted in many ways. Some managers ignored the challenge and continued to see their market presence erode. Other managers realized that they needed a system or approach for improving their firms’ operations and processes.
The Deming Cycle, which is illustrated in Figure 18.1, has been effectively used by many organizations as a guide to their quality improvement efforts. The approach taken by the Deming Cycle is that problems should be identified and solved based on data.

FIGURE 18.1 | The Deming Cycle — a continuous loop: Plan → Do → Study → Act.

<span class='text_page_counter'>(348)</span> CHAPTER 18. |. Introduction to Quality and Statistical Process Control. 807. Over time, a collection of tools and techniques known as the Basic 7 Tools has been developed for quality and process improvement. Some of these tools have already been introduced at various points throughout this text. However, we will briefly discuss all Basic 7 Tools in this section. Section 18.2 will explore one of these tools—Statistical Process Control Charts—in greater depth. Process Flowcharts A flowchart is a diagram that illustrates the steps in a process. Flowcharts provide a visualization of the process and are good beginning points in planning a process improvement effort. Brainstorming Brainstorming is a tool that is used to generate ideas from the members of the team. Employees are encouraged to share any idea that comes to mind, and all ideas are listed with no ideas being evaluated until all ideas are posted. Brainstorming can be either structured or unstructured. In structured brainstorming, team members are asked for their ideas, in order, around the table. Members may pass if they have no further ideas. With unstructured brainstorming, members are free to interject ideas at any point. Fishbone Diagram Kauro Ishikawa from Japan is credited with developing the fishbone diagram, which is also called the cause-and-effect diagram or the Ishikawa diagram. The fishbone diagram can be applied as a simple graphical brainstorming tool in which team members are given a problem and several categories of possible causes. They then brainstorm possible causes in any or all of the cause categories. Histograms You were introduced to histograms in Chapter 2 as a method for graphically displaying quantitative data. Recall that histograms are useful for identifying the center, spread, and shape of a distribution of measurements. 
As a tool of quality, histograms are used to display measurements to determine whether the output of a process is centered on the target value and whether the process is capable of meeting specifications. Trend Charts In Chapter 2 we illustrated the use of a line chart to display time-series data. A trend chart is a line chart that is used to track output from a process over time. Scatter Plots There are many instances in quality improvement efforts in which you will want to examine the relationship between two quantitative variables. A scatter plot is an excellent tool for doing this. You were first introduced to scatter plots in Section 2.3 in Chapter 2. Scatter plots were also used in developing regression models in Chapters 14 and 15. Statistical Process Control Charts One of the most frequently used Basic 7 Tools is the statistical process control (SPC) chart. SPC charts are a special type of trend chart. In addition to the data, the charts display the process average and the upper and lower control limits. These control limits define the range of random variation expected in the output of a process. SPC charts are used to provide early warnings when a process has gone out of control. There are several types of control charts depending on the type of data generated by the process of interest. Section 18.2 presents an introductory discussion of why control charts work and how to develop and interpret some of the most commonly used SPC charts. You will have the opportunity to use the techniques presented in this chapter and throughout this text as you help your organization meet its quality challenges.. MyStatLab. 18-1: Exercises Skill Development 18-1. Discuss the similarities and differences between Dr. Deming’s 14 points and Dr. Juran’s 10 steps to quality improvement.. 18-2. Deming is opposed to setting quotas or specific targets for workers. a. Use the library or the Internet to locate information that explains his reasoning..

<span class='text_page_counter'>(349)</span> 808. CHAPTER 18. |. Introduction to Quality and Statistical Process Control. b. Discuss whether you agree or disagree with him. c. If possible, cite one or more examples based on your own experience to support your position. 18-3. Philip Crosby wrote the book Quality Is Free. In it he argues that it is possible to have high quality and low price. Do you agree? Provide examples that support your position.. Business Applications 18-4. Develop a process flowchart that describes the registration process at your college or university. 18-5. Generate a process flowchart showing the steps you have to go through to buy a concert ticket at your school. 18-6. Assume that you are a member of a team at your university charged with the task of improving student, faculty, and staff parking. Use brainstorming to generate a list of problems with the current parking plan. After you have generated the list, prioritize the list of problems based on their importance to you.. 18-7. Refer to Exercise 18-6. a. Use brainstorming to generate a list of solutions for the top-rated parking problem. b. Order these possible solutions separately according to each of the following factors: cost to implement, time to implement, and easiest to gain approval. c. Did your lists come out in a different order? Why? 18-8. Suppose the computer lab manager at your school has been receiving complaints from students about long waits to use a computer. The “effect” is long wait times. The categories of possible causes are people, equipment, methods/rules, and the environment. Brainstorm specific ideas for each of these causes. 18-9. The city bus line is consistently running behind schedule. Brainstorm possible causes organized by such cause categories as people, methods, equipment, and the environment. Once you have finished, develop a priority order based on which cause is most likely to be the root cause of the problem. END EXERCISES 18-1. 
18.2 Introduction to Statistical Process. Control Charts As we stated in Section 18.1, one of the most important tools for quality improvement is the statistical process control (SPC) chart. In this section we provide an overview of SPC charts. As you will see, SPC is actually an application of hypothesis testing.. The Existence of Variation After studying the material in Chapters 1 through 17, you should be well aware of the importance of variation in business decision making. Variation exists naturally in the world around us. In any process or activity, the day-to-day outcomes are rarely the same. As a practical example, think about the time it takes you to travel to the university each morning. You know it’s about a 15-minute trip, and even though you travel the same route, your actual time will vary somewhat from day to day. You will notice this variation in many other daily occurrences. The next time you renew your car license plates, notice that some people seem to get through faster than others. The same is true at a bank, where the time to cash your payroll check varies each payday. Even in instances when variation is hard to detect, it is present. For example, when you measure a stack of 4-foot by 8-foot sheets of plywood using a tape measure, they will all appear to be 4 feet wide. However, when the stack is measured using an engineer’s scale, you may be able to detect slight variations among sheets, and using a caliper you can detect even more (see Figure 18.2). Therefore, three concepts to remember about variation are 1. Variation is natural; it is inherent in the world around us. 2. No two products or service experiences are exactly the same. 3. With a fine-enough gauge, all things can be seen to differ. Sources of Variation What causes variation? Variation in the output of a process comes from variation in the inputs to the process. Let’s go back to your travel time to school. Why isn’t it always the same? 
Your travel time depends on many factors, such as what route you take, how much traffic you encounter, whether you are in a hurry, how your car is running, and so on..

FIGURE 18.2 | Plywood Variation — the same four 4' × 8' sheets all measure 4' wide with a tape measure, but show increasing detectable variation with finer instruments: an engineer’s scale (4.01', 3.99', 4.01', 4.0'), a caliper (4.009', 3.987', 4.012', 4.004'), and an electronic microscope (4.00913', 3.98672', 4.01204', 4.00395').

The six most common sources of variation are

1. People
2. Machines
3. Materials
4. Methods
5. Measurement
6. Environment

Types of Variation Although variation is always present, we can define two major types that occur. The first is called common cause variation, which means it is naturally occurring or expected in the system. Other terms people use for common cause variation include normal, random, chance occurrence, inherent, and stable variation. The other type of variation is called special cause variation. This type of variation is abnormal, indicating that something out of the ordinary has happened. This type of variation is also called nonrandom, unstable, and assignable cause variation. In our example of travel time to school, there are common causes of variation such as traffic lights, traffic patterns, weather, and departure time. On the days when it takes you significantly more or less time to arrive at school, there are also special causes of variation occurring. These may be factors such as accidents, road construction detours, or needing to stop for gas. Examples of the two types of variation and some sources are as follows:

Sources of Common Cause Variation        Sources of Special Cause Variation
Weather conditions                       Equipment not maintained and cleaned
Inconsistent work methods                Poor training
Machine wear                             Worker fatigue
Temperature                              Procedures not followed
Employee skill levels                    Misuse of tools
Computer response times                  Incorrect data entry

<span class='text_page_counter'>(351)</span> 810. CHAPTER 18. |. Introduction to Quality and Statistical Process Control. In any process or system, the total process variation is a combination of common cause and special cause factors. This can be expressed by Equation 18.1. Variation Components Total process variation  Common cause variation  Special cause variation. (18.1). In process improvement efforts, the goal is first to remove the special cause variation and then to reduce the common cause variation in a system. Removing special cause variation requires the source of a variation be identified and its cause eliminated. The Predictability of Variation: Understanding the Normal Distribution Dr. W. Edwards Deming said there is no such thing as consistency. However, there is such a thing as a constant-cause system. A system that contains only common cause variations is very predictable. Though the outputs vary, they exhibit an important feature called stability. This means that some percentage of the output will continue to lie within given limits hour after hour, day after day, so long as the constant-cause system is operating. When a process exhibits stability, it is in control. The reason that the outputs vary in a predictable manner is because measurable data, when subgrouped and pictured in a histogram, tend to cluster around the average and spread out symmetrically on both sides. This tendency is a function of the Central Limit Theorem that you first encountered in Chapter 7. This means that the frequency distribution of most processes will begin to resemble the shape of the normal distribution as the values are collected and grouped into classes. The Concept of Stability We showed in Chapter 6 that the normal distribution can be divided into six sections, the sum of which includes 99.7% of the data values. The width of each of these sections is called the standard deviation. 
The standard deviation is the primary way the spread (or dispersion) of the distribution is measured. Thus, we expect virtually all (99.7%) of the data in a stable process to fall within plus or minus 3 standard deviations of the mean. Generally speaking, as long as the measurements fall within the 3-standard-deviation boundary, we consider the process to be stable. This concept provides the basis for statistical process control charts.

Introducing Statistical Process Control Charts

Most cars are equipped with a temperature gauge that measures engine temperature. We come to rely on the gauge to let us know if “everything is all right.” As long as the gauge points to the normal range, we conclude that there is no problem. However, if the gauge moves outside the normal range toward the hot mark, it’s a signal that the engine is overheating and something is wrong. If the gauge moves out of the normal range toward the cold mark, it’s also a signal of potential problems. Under typical driving conditions, engine temperature will fluctuate. The normal range on the car’s gauge defines the expected temperature variation when the car is operating properly. Over time, we come to know what to expect. If something changes, the gauge is designed to give us a signal. The engine temperature gauge is analogous to a process control chart. Like the engine gauge, process control charts are used in business to define the boundaries that represent the amount of variation that can be considered normal. Figure 18.3 illustrates the general format of a process control chart. The upper and lower control limits define the normal operating region for the process. The horizontal axis reflects the passage of time, or order of production. The vertical axis corresponds to the variable of interest. There are a number of different types of process control charts. In this section, we introduce four of the most commonly used process control charts:

1. x̄-chart
2. R-chart (range chart)
3. p-chart
4. c-chart
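The stability idea behind all of these charts — roughly 99.7% of the output of an in-control, constant-cause process falls within ±3 standard deviations of the mean — can be checked with a quick simulation. This is a sketch using artificially generated normal data, not any dataset from the text:

```python
import random
import statistics

random.seed(42)

# Simulate an in-control process: common cause variation only,
# normally distributed around a target of 15 minutes.
values = [random.gauss(mu=15.0, sigma=2.0) for _ in range(100_000)]

mean = statistics.fmean(values)
stdev = statistics.stdev(values)
lcl, ucl = mean - 3 * stdev, mean + 3 * stdev

# Fraction of output falling inside the 3-standard-deviation limits.
inside = sum(lcl <= v <= ucl for v in values) / len(values)
print(f"fraction within mean ± 3 stdev: {inside:.4f}")  # close to 0.997
```

A point landing outside such limits is rare enough under stable operation that it is treated as a signal of special cause variation rather than chance.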

FIGURE 18.3 | Process Control Chart Format — the centerline is the process average; the upper control limit (UCL = Average + 3 Standard Deviations) and lower control limit (LCL = Average − 3 Standard Deviations) bound the common cause variation region (99.7% of values); points beyond the limits (only 0.3% expected) signal special cause variation.

Each of these charts is designed for a special purpose. However, as you will see, the underlying logic is the same for each. The x̄-chart and R-chart are almost always used in tandem. The x̄-charts are used to monitor a process average. R-charts are used to monitor the variation of individual process values. They require the variable of interest to be quantitative. The following Business Application shows how these two charts are developed and used.

Chapter Outcome 2.

x̄-Chart and R-Chart

BUSINESS APPLICATION: MONITORING A PROCESS USING x̄- AND R-CHARTS (Excel and Minitab Tutorial)

CATTLEMEN’S BAR AND GRILL The Cattlemen’s Bar and Grill in Kansas City, Missouri, has developed a name for its excellent food and service. To maintain this reputation, the owners have established key measures of product and service quality, and they monitor these regularly. One measure is the amount of time customers wait from the time they are seated until they are served. Every day, each hour that the business is open, four tables are randomly selected. The elapsed time from when customers are seated at these tables until their orders arrive is recorded. The owners wish to use these data to construct an x̄-chart and an R-chart. These control charts can be developed using the following steps:

Step 1 Collect the initial sample data from which the control charts will be developed.
Four measurements during each hour for 30 hours are contained in the file Cattlemen. The four values recorded each hour make up a subgroup.
The x̄-charts and R-charts are typically generated from small subgroups (three to six observations), and the general recommendation is that data from a minimum of 20 subgroups be gathered before a chart is constructed. Once the subgroup size is determined, all subgroups must be the same size. In this case, the subgroup size is four tables.

Step 2 Calculate subgroup means and ranges.
Figure 18.4 shows the Excel worksheet with a partial listing of the data after the means and ranges have been computed for each subgroup.

FIGURE 18.4 | Excel 2007 Worksheet of Cattlemen’s Service-Time Data, Including Subgroup Means and Ranges

Excel 2007 Instructions:
1. Open file: Cattlemen.xls.
2. Note: Mean and Range values have been computed using Excel functions.

Minitab Instructions (for similar results):
1. Open file: Cattlemen.MTW.
2. Choose Calc > Row Statistics.
3. Under Statistics, choose Mean.
4. In Input variables, enter data columns.
5. In Store result in, enter storage column.
6. Click OK.
7. Repeat 3 through 6, choosing Range under Statistics.

Step 3 Compute the average of the subgroup means and the average range value.
The average of the subgroup means and the average range value are computed using Equations 18.2 and 18.3.

Average Subgroup Mean

    x̿ = (Σ x̄ᵢ) / k    (18.2)

where:
    x̄ᵢ = ith subgroup average
    k = Number of subgroups

Average Subgroup Range

    R̄ = (Σ Rᵢ) / k    (18.3)

where:
    Rᵢ = ith subgroup range
    k = Number of subgroups
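Equations 18.2 and 18.3 amount to averaging per-subgroup statistics. A minimal sketch with three hypothetical subgroups of n = 4 service times (invented values, not the actual Cattlemen data):

```python
# Hypothetical subgroups of n = 4 service times, in minutes.
subgroups = [
    [19, 21, 20, 18],
    [22, 20, 19, 21],
    [18, 19, 21, 20],
]

k = len(subgroups)

# Equation 18.2: grand mean = average of the subgroup means.
subgroup_means = [sum(sg) / len(sg) for sg in subgroups]
x_double_bar = sum(subgroup_means) / k

# Equation 18.3: average range = average of the subgroup ranges.
subgroup_ranges = [max(sg) - min(sg) for sg in subgroups]
r_bar = sum(subgroup_ranges) / k

print("subgroup means:", subgroup_means, "grand mean:", x_double_bar)
print("subgroup ranges:", subgroup_ranges, "average range:", r_bar)
```

The same two centerline values, computed from all 30 Cattlemen subgroups, are what the text plots as x̿ and R̄ in Figures 18.5 and 18.6.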

FIGURE 18.5 | Line Chart for x̄-Values for Cattlemen’s Data (subgroup means plotted by hour, 1–30, with the centerline at x̿)

FIGURE 18.6 | Line Chart for R-Values for Cattlemen’s Data (subgroup ranges plotted by hour, 1–30, with the centerline at R̄)

Using Equations 18.2 and 18.3, we get:

    x̿ = (Σ x̄ᵢ) / k = (19.5 + 21 + … + 20.75) / 30 = 19.24

    R̄ = (Σ Rᵢ) / k = (7 + 7 + … + 3) / 30 = 5.73

Step 4 Prepare graphs of the subgroup means and ranges as a line chart.
On one graph, plot the x̄-values in time order across the graph and draw a line across the graph at the value corresponding to x̿. This is shown in Figure 18.5. Likewise, graph the R-values and R̄ as a line chart, as shown in Figure 18.6. The x̿ and R̄ values in Figures 18.5 and 18.6 are called the “process control centerlines.” The centerline is a graph of the mean value of the sample data. We use x̿ as the notation for the centerline, which represents the

current process average. For these sample data, the average time people in the subgroups wait between being seated and being served is 19.24 minutes. However, as seen in Figure 18.5, there is variation around the centerline. The next step is to establish the boundaries that define the limits for what is considered normal variation in the process.

Step 5 Compute the upper and lower control limits for the x̄-chart.
For a normal distribution with mean μ and standard deviation σ, approximately 99.7% of the values will fall within μ ± 3σ. Most process control charts are developed as 3-sigma control charts, meaning that the range of inherent variation is ±3 standard deviations from the mean. Because the x̄-chart is a graph of subgroup (sample) means, the control limits are established at points 3 standard errors from the centerline, x̿. The control limits are analogous to the critical values we establish in hypothesis-testing problems. Using this analogy, the null hypothesis is that the process is in control. We will reject this null hypothesis whenever we obtain a subgroup mean beyond 3 standard errors from the centerline in either direction. Because the control chart is based on sample data, our conclusions are subject to error. Approximately 3 times in 1,000 (0.003), a subgroup mean will be outside the control limits when, in fact, the process is still in control. If this happens, we will have committed a Type I error. The 0.003 value is the significance level for the test. This small alpha level implies that 3-sigma control charts are very conservative when it comes to saying that a process is out of control. We might also conclude that the process is in control when in fact it isn’t. If this happens, we have committed a Type II error. To construct the control limits, we must determine the standard error of the sample means, σ/√n.
Based on what you have learned in previous chapters, you might suspect that we would use s/√n. However, in most applications this is not done. In the 1930s, when process control charts were first introduced, there was no such thing as pocket calculators. To make control charts usable by people without calculators and without statistical training, a simpler approach was needed. An unbiased estimator of the standard error of the sample means, σ/√n, that was relatively easy to calculate was developed by Walter Shewhart.¹ The unbiased estimator is

    (A₂/3) R̄

where:
    R̄ = The mean of the subgroups’ ranges
    A₂ = A Shewhart factor that makes (A₂/3) R̄ an unbiased estimator of the standard error of the sample means, σ/√n

Thus, 3 standard errors of the sample means can be estimated by

    3 (A₂/3) R̄ = A₂ R̄

Appendix Q displays the Shewhart factors for various subgroup sizes. Equations 18.4 and 18.5 are used to compute the upper and lower control limits for the x̄-chart.²

Upper Control Limit, x̄-Chart

    UCL = x̿ + A₂(R̄)    (18.4)

¹The leader of a group at the Bell Telephone Laboratories that did much of the original work in SPC, Shewhart is credited with developing the idea of control charts.
²When A₂/3 is multiplied by R̄, this product becomes an unbiased estimator of the standard error, which is the reason for A₂’s use here.

Lower Control Limit, x̄-Chart
LCL = x̿ − A2(R̄)    (18.5)

where:
A2 = Shewhart factor for subgroup size n

For the Cattlemen's Bar and Grill example, the subgroup size is 4. Thus, the A2 factor from the Shewhart table (Appendix Q) is 0.729. We can compute the upper and lower control limits as follows:

UCL = x̿ + A2(R̄) = 19.24 + 0.729(5.73) = 23.42
LCL = x̿ − A2(R̄) = 19.24 − 0.729(5.73) = 15.06

Step 6 Compute the upper and lower control limits for the R-chart.
The D3 and D4 factors in the Shewhart table presented in Appendix Q are used to compute the 3-sigma control limits for the R-chart. The control limits are established at points ±3 standard errors from the centerline, R̄. However, unlike the case for the x̄-chart, the unbiased estimator of the standard error of the sample ranges is a constant multiplied by R̄. The constant for the lower control limit is the D3 value from the Shewhart table. The D4 value from the Shewhart table is the constant for the upper control limit. Equations 18.6 and 18.7 are used to find the UCL and LCL values. Because the subgroup size is 4 in our example, D3 = 0.0 and D4 = 2.282.3

Upper Control Limit, R-Chart
UCL = D4(R̄)    (18.6)

Lower Control Limit, R-Chart
LCL = D3(R̄)    (18.7)

where:
D3 and D4 are taken from Appendix Q, the Shewhart table, for subgroup size n

Using Equations 18.6 and 18.7, we get:

UCL = D4(R̄) = 2.282(5.73) = 13.08
LCL = D3(R̄) = 0.0(5.73) = 0.0

Step 7 Finish constructing the control chart by locating the control limits on the x̄- and R-charts.
Graph the UCL and LCL values on the x̄-chart and R-chart, as shown in Figures 18.7a and 18.7b and Figures 18.8a and 18.8b, which were constructed using the PHStat add-in to Excel and Minitab.4

3 Because a range cannot be negative, the constant is adjusted to indicate that the lower boundary for the range must equal 0.
4 See the Excel and Minitab Tutorial for the specific steps required to obtain the x̄- and R-charts. Minitab provides a much more extensive set of SPC options than PHStat.
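The limit calculations in Equations 18.4 through 18.7 reduce to a few lines of code. The snippet below is an illustrative sketch rather than part of the text: the function names are my own, and the Shewhart factors are hard-coded for a subgroup size of n = 4 (A2 = 0.729, D3 = 0.0, D4 = 2.282, as read from Appendix Q).

```python
# Illustrative sketch of Equations 18.4-18.7 (not from the text).
# Shewhart factors below are for subgroup size n = 4 (Appendix Q).
A2, D3, D4 = 0.729, 0.0, 2.282

def xbar_chart_limits(grand_mean, r_bar, a2=A2):
    """x-bar chart: UCL = x-double-bar + A2*R-bar, LCL = x-double-bar - A2*R-bar."""
    return grand_mean + a2 * r_bar, grand_mean - a2 * r_bar

def r_chart_limits(r_bar, d3=D3, d4=D4):
    """R-chart: UCL = D4*R-bar, LCL = D3*R-bar."""
    return d4 * r_bar, d3 * r_bar

# Cattlemen's Bar and Grill: grand mean 19.24 minutes, mean range 5.73 minutes
ucl_x, lcl_x = xbar_chart_limits(19.24, 5.73)
ucl_r, lcl_r = r_chart_limits(5.73)
print(round(ucl_x, 2), round(lcl_x, 2))  # 23.42 15.06
print(round(ucl_r, 2), round(lcl_r, 2))  # 13.08 0.0
```

Run against the Cattlemen's values, the sketch reproduces the limits computed in the text; for any other subgroup size the three factors must be replaced with the Appendix Q values for that n.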

FIGURE 18.7A | Excel 2007 (PHStat) Cattlemen's x̄-Chart Output

Excel 2007 Instructions:
1. Open File: Cattlemen.xls.
2. Select Add-Ins.
3. Select PHStat.
4. Select Control Charts > R and X-Bar Charts.
5. Specify Subgroup Size.
6. Define cell range for the subgroup ranges.
7. Check R and X-Bar Chart option.
8. Define cell range for subgroup means.
9. Click OK.

FIGURE 18.7B | Minitab Cattlemen's x̄-Chart Output

Minitab Instructions:
1. Open file: Cattlemen.MTW.
2. Choose Stat > Control Charts > Variables Charts for Subgroups > X-bar.
3. Select Observations for a subgroup are in one row of columns.
4. Enter data columns in box.
5. Because the file contains additional data, select Data Options, Specify which rows to exclude. Select Row Numbers and enter 31, 32, 33, 34, 35, 36. Click OK.
6. Click on Xbar Options and select the Tests tab.
7. Select Perform the following tests for special causes and select the first four tests.
8. Click OK. OK.

FIGURE 18.8A | Excel 2007 (PHStat) Cattlemen's R-Chart Output

Excel 2007 Instructions:
1. Open File: Cattlemen.xls.
2. Select Add-Ins.
3. Select PHStat.
4. Select Control Charts > R and X-Bar Charts.
5. Specify Subgroup Size.
6. Define cell range for the subgroup ranges.
7. Check R and X-Bar Chart option.
8. Define cell range for subgroup means.
9. Click OK.

FIGURE 18.8B | Minitab Cattlemen's R-Chart Output (UCL = 13.08, R̄ = 5.733, LCL = 0)

Minitab Instructions:
1. Open file: Cattlemen.MTW.
2. Choose Stat > Control Charts > Variables Charts for Subgroups > R-bar.
3. Select Observations for a subgroup are in one row of columns.
4. Enter data columns in box.
5. Because the file contains additional data, select Data Options, Specify which rows to exclude. Select Row Numbers and enter 31, 32, 33, 34, 35, 36. Click OK.
6. Click on R-bar Options and select the Tests tab.
7. Select Perform the following tests for special causes and select the first four tests.
8. Click OK. OK.

Both students and people in industry sometimes confuse control limits and specification limits. Specification limits are arbitrary and are defined by a customer, by an industry standard, or by engineers who designed the item. The specification limits are defined as values above and below the "target" value for the item. The specification limits pertain to individual

items—an item either meets specifications or it does not. Process control limits are computed from actual data from the process. These limits define the range of inherent variation that is actually occurring in the process. The control limits are values above and below the current process average (which may be higher or lower than the "target"). Therefore, a process may be operating in a state of control, but it may be producing individual items that do not meet the specifications. Companies interested in improving quality must first bring the process under control before attempting to make changes in the process to reduce the defect level.

Using the Control Charts
Once control charts for Cattlemen's service time have been developed, they can be used to determine whether the time it takes to serve customers remains in control. The concept involved is essentially a hypothesis test in which the null and alternative hypotheses can be stated as

H0: The process is in control; the variation around the centerline is a result of common causes inherent in the process.
HA: The process is out of control; the variation around the centerline is due to some special cause and is beyond what is normal for the process.

In the Cattlemen's Bar and Grill example, the hypothesis is tested every hour, when four tables are selected and the service time is recorded for each table. The x̄- and R-values for the new subgroup are computed and plotted on their respective control charts. There are three main process changes that can be detected with a process control chart:

1. The process average has shifted up or down from normal.
2. The process average is trending up or down from normal.
3. The process is behaving in such a manner that the existing variation is not random in nature.
If any of these has happened, the null hypothesis is considered false and the process is considered to be out of control. The control charts are used to provide signals that something has changed. There are four primary signals that indicate a change and that, if observed, will cause us to reject the null hypothesis.5 These are

Signals
1. One or more points outside the upper or lower control limits
2. Nine or more points in a row above (or below) the centerline
3. Six or more consecutive points moving in the same direction (increasing or decreasing)
4. Fourteen points in a row, alternating up and down

These signals were derived such that the probability of a Type I error is less than 0.01. Thus, there is a very small chance that we will conclude the process has changed when, in fact, it has not. If we examine the control charts in Figures 18.7a and 18.7b and 18.8a and 18.8b, we find that none of these signals occur. Thus, the process is deemed in control during the period in which the initial sample data were collected.

Suppose that the Cattlemen's owners monitor the process for the next 5 hours. Table 18.3 shows these new values, along with the mean and range for each hour. The means are plotted on the x̄-chart, and the R-values are plotted on the R-chart, as shown in Figures 18.9 and 18.10.

TABLE 18.3 | Data for Hours 31 to 35 for Cattlemen's Bar and Grill

Hour   Table 1   Table 2   Table 3   Table 4   Mean    Range
31     20        21        24        22        21.75   4
32     17        22        18        20        19.25   5
33     23        20        22        22        21.75   3
34     24        23        19        20        21.50   5
35     24        25        26        27        25.50   3

5 There is some minor disagreement on the signals, depending on which process control source you refer to. Minitab actually lets the user define the signals under the option Define Tests.
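The four signals can also be checked mechanically against a list of plotted subgroup statistics. The sketch below is one possible implementation, not the text's; the function names and the run-counting conventions are my own (a "trend" of k points is treated as k − 1 consecutive moves in the same direction, and an alternating run of k points as k − 1 sign changes).

```python
# Illustrative sketch of the four out-of-control signals (names are my own).

def _outside_limits(points, ucl, lcl):           # Signal 1
    return any(p > ucl or p < lcl for p in points)

def _run_same_side(points, center, k=9):         # Signal 2
    run = 0
    for p in points:
        if p > center:
            run = run + 1 if run > 0 else 1
        elif p < center:
            run = run - 1 if run < 0 else -1
        else:
            run = 0
        if abs(run) >= k:
            return True
    return False

def _trend(points, k=6):                         # Signal 3
    length, direction = 1, 0
    for a, b in zip(points, points[1:]):
        d = 1 if b > a else (-1 if b < a else 0)
        length = length + 1 if d != 0 and d == direction else (2 if d != 0 else 1)
        direction = d
        if d != 0 and length >= k:
            return True
    return False

def _alternating(points, k=14):                  # Signal 4
    length, last = 1, 0
    for a, b in zip(points, points[1:]):
        d = 1 if b > a else (-1 if b < a else 0)
        length = length + 1 if d != 0 and last != 0 and d == -last else (2 if d != 0 else 1)
        last = d
        if length >= k:
            return True
    return False

def out_of_control_signals(points, center, ucl, lcl):
    """Return the signal numbers (1-4) present in the plotted points."""
    checks = [_outside_limits(points, ucl, lcl),
              _run_same_side(points, center),
              _trend(points),
              _alternating(points)]
    return [i + 1 for i, hit in enumerate(checks) if hit]

# Hours 31-35 from Table 18.3 against the x-bar chart limits:
print(out_of_control_signals([21.75, 19.25, 21.75, 21.50, 25.50],
                             19.24, 23.42, 15.06))  # [1]
```

Applied to the five new subgroup means, only signal 1 fires, which matches the discussion of hour 35 later in this section.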

FIGURE 18.9 | Cattlemen's x̄-Chart (subgroup means by hour; UCL = 23.42, LCL = 15.06; the hour-35 subgroup mean, x̄ = 25.5, is out of control)

When x̄- and R-charts are used, we first look at the R-chart. Figure 18.10 shows the range (R) has been below the centerline (R̄ = 5.733) for seven consecutive hours. Although this doesn't quite come up to the nine points of signal 2, the owners should begin to suspect something unusual might be happening to cause a downward shift in the variation in service time between tables. Although the R-chart does not indicate the reason for the shift, the owners should be pleased, because this might indicate greater consistency in service times. This change may represent a quality improvement. If this trend continues, the owners will want to study the situation so they will be able to retain these improvements in service-time variability.

The x̄-chart in Figure 18.9 indicates that the average service time is out of control because in hour 35 the mean service time exceeded the upper control limit of 23.42 minutes. The mean wait time for the four tables during this hour was 25.5 minutes. The chance of this happening is extremely low unless something has changed in the process. This should be a signal to

FIGURE 18.10 | Cattlemen's R-Chart (subgroup ranges by hour; UCL = 13.06, LCL = 0.0; seven consecutive points below the centerline)

the owners that a special cause exists. They should immediately investigate possible problems to determine if there has been a system change (e.g., training issue) or if this is truly a one-time event (e.g., fire in the kitchen). They could use a brainstorming technique or a fishbone diagram to identify possible causes of the problem.

An important point is that analysis of each of the control charts should not be done in isolation. A moment's consideration will lead you to see that if the variation of the process has gotten out of control (above the upper control limit), then trying to interpret the x̄-chart can be very misleading. Widely fluctuating variation could make it much more probable that an x̄-value would exceed the control limits even though the process mean had not changed. Adding (or subtracting) a given number from all of the numbers in a data set does not change the variance of that data set, so a shift in the mean of a process can occur without that shift affecting the variation of the process. However, a change in the variation almost always affects the x̄ control chart. For this reason, the general advice is to control the variation of a process before the mean of the process is subjected to control chart analysis.

The process control charts signal the user when something in the process has changed. For process control charts to be effective, they must be updated immediately after data have been collected, and action must be taken when the charts signal that a change has occurred.

Chapter Outcome 3.

p-Charts
The previous example illustrated how x̄-charts and R-charts can be developed and used. They are used in tandem and are applicable when the characteristic being monitored is a variable measured on a continuous scale (e.g., time, weight, length, etc.).
However, there are instances when the process issue involves an attribute rather than a quantitative variable. An attribute is a quality characteristic that is either present or not present. In many quality control situations, an attribute is whether an item is good (meets specifications) or defective, and in those cases a p-chart can be used to monitor the proportion of defects.

BUSINESS APPLICATION: CONSTRUCTING p-CHARTS

Excel and Minitab Tutorial

HILDER'S PUBLISHING COMPANY Hilder's Publishing Company sells books and records through a catalog, processing hundreds of mail and phone orders each day. Each customer order requires numerous data-entry steps. Mistakes made in data entry can be costly, resulting in shipping delays, incorrect prices, or the wrong items being shipped. As part of its ongoing efforts to improve quality, Hilder's managers and employees want to reduce errors. The manager of the order-entry department has developed a process control chart to monitor order-entry errors. For each of the past 30 days she has selected a random sample of 100 orders. These orders were examined, with the attribute being
● Order entry is correct.
● Order entry is incorrect.

In developing a p-chart, the sample size should be large enough such that np ≥ 5 and n(1 − p) ≥ 5. Unlike the x̄- and R-chart cases, the sample size may differ from sample to sample. However, this complicates the development of the p-chart. The p-chart can be developed using the following steps:

Step 1 Collect the sample data.
The sample size is 100 orders. This size sample was selected for each of 30 days. The proportion of incorrect orders, called nonconformances, is displayed in the file Hilders. The proportions are given the notation p̄.
Step 2 Plot the subgroup proportions as a line chart.
Figures 18.11a and 18.11b show the line chart for the 30 days.

FIGURE 18.11A | Excel 2007 (PHStat) p-Chart for Hilder's Publishing

Excel 2007 (PHStat) Instructions:
1. Open File: Hilders.xls.
2. Select Add-Ins.
3. Select PHStat.
4. Select Control Charts > p-Charts.
5. Define cell range for number of nonconformances (note: do not enter cell range for proportions of nonconformances).
6. Check Sample/Subgroup Size does not vary.
7. Enter sample/subgroup size.
8. Click OK.

FIGURE 18.11B | Minitab p-Chart for Hilder's Publishing (UCL = 0.1604, p̄ = 0.0793, LCL = 0)

Minitab Instructions:
1. Open file: Hilders.MTW.
2. Choose Stat > Control Charts > Attribute Charts > P.
3. In Variable, enter number of orders with errors column.
4. In Subgroup Sizes enter size of subgroup: 100.
5. Click on P Chart Options and select the Tests tab.
6. Select Perform all tests for special causes.
7. Click OK. OK.

Step 3 Compute the mean subgroup proportion for all samples using Equation 18.8 or 18.9, depending on whether the sample sizes are equal.

Mean Subgroup Proportion
For equal-size samples:

p̄ = Σ p̄ᵢ / k    (18.8)

where:
p̄ᵢ = Sample proportion for subgroup i
k = Number of samples of size n

For unequal sample sizes:

p̄ = Σ nᵢ p̄ᵢ / Σ nᵢ    (18.9)

where:
nᵢ = The number of items in sample i
p̄ᵢ = Sample proportion for subgroup i
Σ nᵢ = Total number of items sampled in k samples
k = Number of samples

Because we have equal sample sizes, we use Equation 18.8, as follows:

p̄ = Σ p̄ᵢ / k = (0.10 + 0.06 + 0.06 + 0.07 + . . .) / 30 = 2.38 / 30 = 0.0793

Thus, the average proportion of orders with errors is 0.0793.
Step 4 Compute the standard error of p̄ using Equation 18.10.

Standard Error for the Subgroup Proportions
For equal sample sizes:

sp = √( p̄(1 − p̄) / n )    (18.10)

where:
p̄ = Mean subgroup proportion
n = Common sample size

For unequal sample sizes:
Option 1: Compute sp using the largest sample size and sp using the smallest sample size. Construct control limits using each value.
Option 2: Compute a unique value of sp = √( p̄(1 − p̄) / nᵢ ) for each different sample size, nᵢ. Construct control limits for each sp value, producing "wavy" control limits.
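Equation 18.9 is just a weighted average of the subgroup proportions, weighted by subgroup size. The sketch below illustrates it with hypothetical sample counts and proportions (the numbers are not from the Hilder's data):

```python
# Illustration of Equation 18.9 (mean subgroup proportion, unequal sample
# sizes). The subgroup sizes and proportions here are hypothetical.

def mean_proportion_unequal(ns, ps):
    """Weighted mean proportion: sum(n_i * p_i) / sum(n_i)."""
    return sum(n * p for n, p in zip(ns, ps)) / sum(ns)

# Three subgroups of 100, 100, and 200 items with 8, 6, and 10 nonconformances:
print(round(mean_proportion_unequal([100, 100, 200], [0.08, 0.06, 0.05]), 3))  # 0.06
```

Note that this is equivalent to dividing the total number of nonconformances (24 here) by the total number of items inspected (400), which is why equal-size samples reduce to the simple average in Equation 18.8.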

We compute sp using Equation 18.10, as follows:

sp = √( p̄(1 − p̄) / n ) = √( (0.0793)(1 − 0.0793) / 100 ) = 0.027

Step 5 Compute the 3-sigma control limits using Equations 18.11 and 18.12.

Control Limits for p-Chart
UCL = p̄ + 3sp    (18.11)
LCL = p̄ − 3sp    (18.12)

where:
p̄ = Mean subgroup proportion
sp = Estimated standard error of p̄ = √( p̄(1 − p̄) / n )

Using Equations 18.11 and 18.12, we get the following control limits:

UCL = p̄ + 3sp = 0.079 + 3(0.027) = 0.160
LCL = p̄ − 3sp = 0.079 − 3(0.027) = −0.002 → 0.0

Because a proportion of nonconforming items cannot be negative, the lower control limit is set to 0.0.
Step 6 Plot the centerline and control limits on the control chart.
Both upper and lower control limits are plotted on the control charts in Figures 18.11a and 18.11b.

Using the p-Chart
Once the control chart is developed, the same rules are used as for the x̄- and R-charts6:

Signals
1. One or more points outside the upper or lower control limits
2. Nine or more points in a row above (or below) the centerline
3. Six or more consecutive points moving in the same direction (increasing or decreasing)
4. Fourteen points in a row, alternating up and down

The p-chart shown in Figure 18.11a indicates the process is in control. None of the signals are present in these data. The variation in the nonconformance rates is assumed to be due to common cause issues. For future days, the managers would select random samples of 100 orders, count the number with errors, and compute the proportion. This value would be plotted on the p-chart. For each day, the managers would use the control chart to test the hypotheses:

H0: The process is in control. The variation around the centerline is a result of common causes and is inherent in the process.
HA: The process is out of control.
The variation around the centerline is due to some special cause and is beyond what is normal for the process. The signals mentioned previously would be used to test the null hypothesis. Remember, control charts are most useful when the charts are updated as soon as the new sample data become available. When a signal of special cause variation is present, you should take action to determine the source of the problem and address it as quickly as possible. 6Minitab. allows the user to specify the signals. This is done in the Define Tests feature under Stat—Control Charts. Minitab also allows unequal sample sizes. See the Excel and Minitab Tutorial for specifics in developing a p-chart..
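Equations 18.10 through 18.12 can be combined into one short calculation. This sketch (illustrative function name, standard library only) reproduces the Hilder's limits from p̄ = 0.0793 and n = 100:

```python
import math

def p_chart_limits(p_bar, n):
    """3-sigma p-chart limits from the mean proportion and common sample size."""
    s_p = math.sqrt(p_bar * (1 - p_bar) / n)   # Equation 18.10
    ucl = p_bar + 3 * s_p                      # Equation 18.11
    lcl = max(0.0, p_bar - 3 * s_p)            # Equation 18.12, floored at zero
    return ucl, lcl

ucl, lcl = p_chart_limits(0.0793, 100)
print(round(ucl, 4), lcl)  # 0.1604 0.0
```

Carrying full precision gives UCL = 0.1604, matching the Minitab output in Figure 18.11b; the text's 0.160 comes from rounding sp to 0.027 first.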

Chapter Outcome 4.

c-Charts
The p-chart just discussed is used when you select a sample of items and you determine the number of the sampled items that possess a specific attribute of interest. Each item either has or does not have that attribute. You will encounter other situations that involve attribute data but differ from the p-chart applications. In these situations, you have what is defined as a sampling unit (or experimental unit), which could be a sheet of plywood, a door panel on a car, a textbook page, an hour of service, or any other defined unit of space, volume, time, and so on. Each sampling unit could have one or more of the attributes of interest, and you would be able to count the number of attributes present in each sampling unit. In cases in which the sampling units are the same size, the appropriate control chart is a c-chart.

BUSINESS APPLICATION: CONSTRUCTING c-CHARTS

Excel and Minitab Tutorial

CHANDLER TILE COMPANY The Chandler Tile Company makes ceramic tile. In recent years, there has been a big demand for tile products in private residences for kitchens and bathrooms and in commercial establishments for decorative counter and wall covering. Although the demand has increased, so has the competition. The senior management at Chandler knows that three factors are key to winning business from contractors: price, quality, and service. One quality issue is scratches on a tile. The production managers wish to set up a control chart to monitor the level of scratches per tile to determine whether the production process remains in control.

Special Note: Control charts monitor a process as it currently operates, not necessarily how you would like it to operate. Thus, a process that is in control might still yield a higher number of scratches per tile than the managers would like.
The managers believe that the numbers of scratches per tile are independent of each other and that the prevailing operating conditions are consistent from tile to tile. In this case, the proper control chart is a c-chart. Here the tiles being sampled are the same size, and the managers will tally the number of scratches per tile. However, if we asked, "How many opportunities were there to scratch each tile?" we probably would not be able to answer the question. There are more opportunities than we could count. For this reason, the c-chart is based on the Poisson probability distribution introduced in Chapter 5, rather than the binomial distribution. You might recall that the Poisson distribution is defined by the mean number of successes per interval, or sampling unit, as shown in Equation 18.13. A success can be regarded as a defect, a nonconformance, or any other characteristic of interest. In the Chandler example, a success is a scratch on a tile.

Mean for c-Chart

c̄ = Σ xᵢ / k    (18.13)

where:
xᵢ = Number of successes per sampling unit
k = Number of sampling units

Because the Poisson distribution is skewed when the mean of the sampling unit is small, we must define the sampling unit so that it is large enough to provide an average of at least 5 successes per sampling unit (c̄ ≥ 5). This may require that you combine smaller sampling units into a larger unit size. In this case we combine six tiles to form a sampling unit. The mean and the variance of the Poisson distribution are identical. Therefore, the standard deviation of the Poisson distribution is the square root of its mean. For this reason, the estimator of the standard deviation for the Poisson distribution is computed as the square root of the sample mean, as shown in Equation 18.14.

Standard Deviation for c-Chart

sc = √c̄    (18.14)

Then Equations 18.15 and 18.16 are used to compute the 3-sigma (3 standard deviation) control limits for the c-chart.

c-Chart Control Limits

UCL = c̄ + 3√c̄    (18.15)
LCL = c̄ − 3√c̄    (18.16)

You can use the following steps to construct a c-chart:

Step 1 Collect the sample data.
The original plan called for the Chandler Tile Company to select six tiles each hour from the production line and to perform a thorough inspection to count the number of scratches per tile. Like all control charts, at least 20 samples are desired in developing the initial control chart. After collecting 40 sampling units of six tiles each, the total number of scratches found was 228. The data set is contained in the file Chandler.
Step 2 Plot the subgroup number of occurrences as a line chart.
Figures 18.12a and 18.12b show the line chart for the 40 sampling units.
Step 3 Compute the average number of occurrences per sampling unit using Equation 18.13.
The mean is c̄ = Σx / k = 228 / 40 = 5.70
Step 4 Compute the standard deviation, sc, using Equation 18.14.
The standard deviation is sc = √c̄ = √5.70 = 2.387
Step 5 Construct the 3-sigma control limits, using Equations 18.15 and 18.16.
The upper and lower 3-sigma control limits are
UCL = c̄ + 3√c̄ = 5.7 + 3(2.387) = 12.86
LCL = c̄ − 3√c̄ = 5.7 − 3(2.387) = −1.46 → 0.0
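Steps 3 through 5 above can be sketched in a few lines. The function name below is my own; the numbers are the Chandler Tile totals (228 scratches over 40 sampling units):

```python
import math

def c_chart_limits(total_count, k):
    """Centerline and 3-sigma c-chart limits (Equations 18.13-18.16)."""
    c_bar = total_count / k            # Equation 18.13
    s_c = math.sqrt(c_bar)             # Equation 18.14
    ucl = c_bar + 3 * s_c              # Equation 18.15
    lcl = max(0.0, c_bar - 3 * s_c)    # Equation 18.16, floored at zero
    return c_bar, ucl, lcl

c_bar, ucl, lcl = c_chart_limits(228, 40)
print(c_bar, round(ucl, 2), lcl)  # 5.7 12.86 0.0
```

The floor at zero reflects the fact that a count of scratches cannot be negative, just as the text replaces −1.46 with 0.0.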

FIGURE 18.12A | Excel 2007 c-Chart for Chandler Tile Company

Excel 2007 Instructions:
1. Open File: Chandler.xls.
2. Calculate the c-bar value using Excel's AVERAGE function and copy this value in a new column.
3. Calculate the standard deviation as the square root of the mean using Excel's SQRT function. Then copy that value in a new column.
4. Select the three columns.
5. Click Insert > Line Chart.
6. Click Layout to add axis labels and chart titles.

How to do it: Constructing SPC charts
The following steps are employed when constructing statistical quality control charts:
1. Collect the sample data.
2. Plot the subgroup statistics as a line chart.
3. Compute the average subgroup statistic, i.e., the centerline value. The centerline on the control chart is the average value for the sample data. This is the current process average.
4. Compute the appropriate standard error.
5. Compute the upper and lower control limits.
6. Plot the appropriate data on the control chart, along with the centerline and control limits.

Step 6 Plot the centerline and control limits on the control chart.
Both upper and lower control limits are plotted on the control charts in Figures 18.12a and 18.12b. As with the p-chart, the lower control limit can't be negative. We change it to zero, which is the fewest possible scratches on a tile. The completed c-chart is shown in Figures 18.12a and 18.12b. Note in both figures that four samples of six tiles each had a total number of scratches that fell outside the upper control limit of 12.86. The managers need to consult production records and other information to determine what special cause might have generated this level of scratches. If they can determine the cause, these data points should be removed and the control limits should be recomputed from the remaining 36 values. You might also note that the graph changes beginning with about sample 22. The process seems more stable from sample 22 onward. Managers might consider inspecting for another 13 to 15 hours and recomputing the control limits using data from hours 22 and higher.

Other Control Charts
Our purpose in this chapter has been to introduce SPC charts. We have illustrated a few of the most frequently used charts. However, there are many other types of control charts that can be used in special situations. You are encouraged to consult several of the references listed at the end of the chapter for information about these other charts. Regardless of the type of statistical quality control chart you are using, the same general steps are used.

FIGURE 18.12B | Minitab c-Chart for Chandler Tile Company (four points out of control)

Minitab Instructions:
1. Open file: Chandler.xls.
2. Choose Stat > Control Chart > Attributes Charts > C.
3. In Variable, enter number of defectives column.
4. Click on C Chart Options and select the Tests tab.
5. Select Perform all tests for special causes.
6. Click OK. OK.

MyStatLab

18-2: Exercises

Skill Development
18-10. Fifty sampling units of equal size were inspected, and the number of nonconforming situations was recorded. The total number of instances was 449.
a. Determine the appropriate control chart to use for this process.
b. Compute the mean value for the control chart.
c. Compute the upper and lower control limits.
18-11. Data were collected on a quantitative measure with a subgroup size of five observations. Thirty subgroups were collected, with the following results:
x̿ = 44.52    R̄ = 5.6
a. Determine the Shewhart factors that will be needed if x̄- and R-charts are to be developed.
b. Compute the upper and lower control limits for the R-chart.
c. Compute the upper and lower control limits for the x̄-chart.
18-12. Data were collected from a process in which the factor of interest was whether the finished item contained a particular attribute. The fraction of items that did not contain the attribute was recorded. A total of 30 samples were selected. The common sample size was 100 items. The total number of nonconforming items was 270. Based on these data, compute the upper and lower control limits for the p-chart.
18-13. Explain why it is important to update the control charts as soon as new data become available.

Computer Database Exercises
18-14. Grandfoods, Inc., makes energy supplement bars for use by athletes and others who need an energy boost. One of the critical quality characteristics is the weight of the bars.
Too much weight implies that too many liquids have been added to the mix and the bar will be too chewy. If the bars are light, the implication is that the bars are too dry. To monitor the weights, the production manager wishes to use process control charts. Data for 30 subgroups of size 4 bars are contained in the file Grandfoods. Note that a subgroup is selected every 15 minutes as bars come off the manufacturing line. a. Use these data to construct the appropriate process control chart(s). b. Discuss what each chart is used for. Why do we need both charts? c. Examine the control charts and indicate which, if any, of the signals are present. Is the process currently in control?.

<span class='text_page_counter'>(372)</span> 828. CHAPTER 18. |. Introduction to Quality and Statistical Process Control. d. Develop a histogram for the energy bar weights. Discuss the shape of the distribution and the implications of this toward the validity of the control chart procedures you have used. 18-15. The Haines Lumber Company makes plywood for residential and commercial construction. One of the key quality measures is plywood thickness. Every hour, five pieces of plywood are selected and the thicknesses are measured. The data (in inches) for the first 20 subgroups are in the file Haines. a. Construct an x -chart based on these data. Make sure you plot the centerline and both 3-sigma upper and lower control limits. b. Construct an R-chart based on these data. c. Examine both control charts and determine if there are any special causes of variation that require attention in this process. 18-16. Referring to Exercise 18-15, suppose the process remained in control for the next 40 hours. The thickness measurements for hours 41 through 43 are as follows: Hour 41 Hour 42 Hour 43. 0.764 0.766 0.812. 0.737 0.785 0.774. 0.724 0.777 0.767. 0.716 0.790 0.799. 0.752 0.799 0.821. a. Based on these data and the two control charts, what should you conclude about the process? Has the process gone out of control? Discuss. b. Was it necessary to obtain your answer to Exercise 18-15 before part a could be answered? Explain your reasoning. 18-17. Referring to the process control charts developed in Exercise 18-14, data for periods 31 to 40 are contained in the file Grandfoods-Extra. a. Based on these data, what would you conclude about the energy bar process? b. Write a report discussing the results, and show the control charts along with the new data. 18-18. Wilson, Ryan, and Reed is a large certified public accounting (CPA) firm in Charleston, South Carolina. 
It has been monitoring the accuracy of its employees and wishes to get the number of accounts with errors under statistical control. It has sampled 100 accounts for each of the last 30 days and has examined them for errors. The data are presented in the file Accounts. a. Construct the relevant control chart for the account process. b. What does the chart indicate about the statistical stability of the process? Give reasons for your answers. c. Suppose that for the next 3 days, sample sizes of 100 accounts are examined with the following results: Number of Errors. 6. 7. 9. Plot the appropriate data on the control chart and indicate whether any of the control chart signals are present. Discuss your results. 18-19. Trinkle & Sons performs subcontract body paint work for one of the “Big Three” automakers. One of its recent contracts called for the company to paint 12,500 door panels. Several quality characteristics are very important to the manufacturer, one of which is blemishes in the paint. The manufacturer has required Trinkle & Sons to have control charts to monitor the number of paint blemishes per door panel. The panels are all for the same model car and are the same size. To initially develop the control chart, data for 88 door panels were collected and are provided in the file CarPaint. a. Determine the appropriate type of process control chart to develop. b. Develop a 3-sigma control chart. c. Based on the control chart and the standard signals discussed in this chapter, what conclusions can you reach about whether the paint process is in control? Discuss. 18-20. Tony Perez is the manager of one of the largest chains of service stores that specialize in oil and lubrication of automobiles, Fastlube, Inc. One of the company’s stated goals is to provide a lube and oil change for anyone’s automobile in 15 minutes. 
Tony has thought for some time now that there is a growing disparity among his workers in the time it takes to lube and change the oil of an automobile. To monitor this aspect of Fastlube, Tony has selected a sample of 20 days and has recorded the time it took five randomly selected employees to service an automobile. The data are located in the file Fastlube. a. Tony glanced through the data and noticed that the longest time it took to service a car was 25.33 minutes. Suppose the distribution of times to service a car was normal, with a mean of 15. Use your knowledge of a normal distribution to let Tony know what the standard deviation is for the time it takes to service a car. b. Use the Fastlube data to construct an x and an R-chart. c. Based on these data, what would you conclude about the service process? d. Based on your findings on the R-chart, would it be advisable to draw conclusions based on the x -chart? 18-21. The Ajax Taxi company in Manhattan, New York, wishes to set up an x -chart and an R-chart to monitor the number of miles driven per day by its taxi drivers. Each week, the scheduler selects four taxis and (without the drivers’ knowledge) monitors the number of miles driven. He has done this for the past 40 weeks. The data are in the file Ajax. a. Construct the R-chart for these 40 subgroups. b. Construct the x -chart for these 40 subgroups. Be sure to label the chart correctly..

c. Look at both control charts and determine if any of the control chart signals are present to indicate that the process is not in control. Explain the implications of what you have found for the Ajax Taxi Company.
18-22. Referring to Exercise 18-21, assume the Ajax managers determine any issues identified by the control charts were caused by one-time events. The data for weeks 41 through 45 are in the Ajax-Extra file.
a. Using the control limits developed from the first 40 weeks, do these data indicate that the process is now out of control? Explain.
b. If a change has occurred, brainstorm some of the possible reasons.
c. What will be the impact on the control charts when the new data are included?
d. Use the data in the files Ajax and Ajax-Extra to develop the new control charts.
e. Are any of the typical control chart signals present? Discuss.
18-23. The Kaiser Corporation makes aluminum at various locations around the country. One of the key factors in being profitable is keeping the machinery running. One particularly troublesome machine is a roller that flattens the sheets to the appropriate thickness. This machine tends to break down for various reasons. Consequently, the maintenance manager has decided to develop a process control chart. Over a period of 10 weeks, 20 subgroups consisting of 5 downtime measures (in minutes) were collected (one measurement at the end of each of the two shifts). The subgroup means and ranges are shown as follows and are contained in the file called Kaiser.

Subgroup Mean:   104.8   85.9   78.6   72.8  102.6   84.8   67.0
Subgroup Range:    9.6   14.3    8.6   10.6   11.2   13.5   10.8
Subgroup Mean:    91.1   79.5   71.9   47.6  106.7   80.7   81.0
Subgroup Range:    5.2   14.2   14.1   14.9   12.7   13.3   15.4
Subgroup Mean:    57.0   98.9   87.9   64.9  101.6   83.9
Subgroup Range:   15.5   13.8   16.6   11.2    9.6   11.5

a. Explain why the x̄- and R-charts would be appropriate in this case.
b.
Find the centerline value for the x̄-chart.
c. Calculate the centerline value for the R-chart.
d. Compute the upper and lower control limits for the R-chart, and construct the chart with appropriate labels.
e. Compute the upper and lower control limits for the x̄-chart, and construct the chart with appropriate labels.
f. Examine the charts constructed in parts d and e and determine whether the process was in control during the period for which the control charts were developed. Explain.
18-24. Referring to Exercise 18-23, if necessary delete any out-of-control points and construct the appropriate x̄- and R-charts. Now, suppose the process stays in control for the next six weeks (subgroups 18 through 23). The subgroup means and ranges for subgroups 33 to 38 are as follows:

Subgroup:         33    34    35    36    37     38
Subgroup Mean:  89.0  88.4  85.2  89.3  97.2  105.3
Subgroup Range: 11.4   5.4  14.2  11.7   9.5   10.2

a. Plot the ranges on the R-chart. Is there evidence based on the range chart that the process has gone out of control? Discuss.
b. Plot the subgroup means on the x̄-chart. Is there evidence to suggest that the process has gone out of control with respect to the process mean? Discuss.
18-25. Regis Printing Company performs printing services for individuals and business customers. Many of the jobs require that brochures be folded for mailing. The company has a machine that does the folding. It generally does a good job, but it can have problems that cause it to do improper folds. To monitor this process, the company selects a sample of 50 brochures from every order and counts the number of incorrectly folded items in each sample. Until now, nothing has been done with the 300 samples that have been collected. The data are located in the file Regis.
a. What is the appropriate control chart to use to monitor this process?
b. Using the data in this file, construct the appropriate control chart and label it properly.
c.
Suppose that for the next three orders, sample sizes of 50 brochures are examined with the following results:

Sample Number:        301  302  303
Number of Bad Folds:    6    9    7

Plot the appropriate data on the control chart and indicate whether any of the control chart signals are present. Discuss your results.
d. Suppose that the next sample of 50 has 14 improperly folded brochures. What conclusion should be reached based on the control chart? Discuss.
18-26. Recall from Exercise 18-20 that Tony Perez is the manager of one of the largest chains of service stores that specialize in oil and lubrication of automobiles, Fastlube, Inc. One of the company’s stated goals is to provide a lube and oil change for anyone’s automobile in 15 minutes. Tony has thought for some time now that there is a growing disparity among his workers in the time it takes to lube and change the oil of an automobile. To monitor this aspect of Fastlube, Tony has selected a sample of 24 days and has recorded the time it took to service 100 automobiles each day. The

<span class='text_page_counter'>(374)</span> 830. CHAPTER 18. |. Introduction to Quality and Statistical Process Control. number of times the service was performed in 15 minutes or less (≤15) is given in the file Lubeoil. a. (1) Convert the sample data to proportions and plot the data as a line graph. (2) Compute p and plot this value on the line graph. (3) Compute sp and interpret what it measures. b. Construct a p-chart and determine if the process of the time required for oil and lube jobs is in control. c. Specify the signals that are used to indicate an outof-control situation on a p-chart. 18-27. Susan Booth is the director of operations for National Skyways, a small commuter airline with headquarters in Cedar Rapids, Iowa. She has become increasingly concerned about the amount of carry-on luggage passengers have been carrying on board National Skyways’ planes. She collected data concerning the number of pieces of baggage that were taken on board over a one-month period. The data collected are provided in the file Carryon. Hint: Consider a U-chart from the optional topics. a. Set up a control chart for the number of carry-on bags per day. b. Is the process in a state of statistical control? Explain your answer. c. Suppose that National Skyways’ aircraft were full for each of the 30 days. Each Skyways aircraft. holds 40 passengers. Describe the control chart you would use. Is it necessary that you use this latter alternative or is it just a preference? Explain your answer. 18-28. Sid Luka is the service manager for Brakes Unlimited, a franchise corporation that specializes in servicing automobile brakes. He wants to study the length of time required to replace the rear drum brakes of automobiles. A subgroup of 10 automobiles needing their brakes replaced was selected on each day for a period of 20 days. The subgroup times required (in hours) for this service were recorded and are presented in the file Brakes. (This problem cannot be done using Minitab.) a. 
Sid has been trying to get the average time required to replace the rear drum brakes of an automobile to be under 1.65 hours. Use the data Sid has collected to determine if he has reached his goal. b. Set up the appropriate control charts to determine if this process is under control. c. Determine whether the process is under control. If the process is not under control, brainstorm suggestions that might help Sid bring it under control. What tools of quality might Sid find useful? END EXERCISES 18-2.
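Several of the exercises above (for example, 18-15, 18-20, 18-21, and 18-23) ask for x̄- and R-charts built from subgroup data. The following Python sketch shows the arithmetic behind the centerlines and 3-sigma control limits. The Shewhart factors are the values commonly tabulated for small subgroup sizes (see Appendix Q; published tables can differ slightly in the last digit), and the demo subgroups are invented for illustration; they are not taken from the exercise data files.

```python
# Sketch of x-bar and R control-limit calculations for subgrouped data.
# Factors (A2, D3, D4) below are commonly tabulated values; D3 = 0 for n <= 6.
SHEWHART = {
    2: (1.880, 0.0, 3.267),
    3: (1.023, 0.0, 2.574),
    4: (0.729, 0.0, 2.282),
    5: (0.577, 0.0, 2.114),
}

def xbar_r_limits(subgroups):
    """Return (centerline, LCL, UCL) for the x-bar chart and the R-chart."""
    n = len(subgroups[0])                           # subgroup size
    a2, d3, d4 = SHEWHART[n]
    xbars = [sum(s) / n for s in subgroups]         # subgroup means
    ranges = [max(s) - min(s) for s in subgroups]   # subgroup ranges
    xbarbar = sum(xbars) / len(subgroups)           # grand mean (x-bar chart centerline)
    rbar = sum(ranges) / len(subgroups)             # average range (R-chart centerline)
    return {
        "xbar": (xbarbar, xbarbar - a2 * rbar, xbarbar + a2 * rbar),
        "R": (rbar, d3 * rbar, d4 * rbar),
    }

# Hypothetical subgroups of n = 4 measurements each (made-up numbers)
demo = [[10.1, 9.8, 10.0, 10.3], [9.9, 10.2, 10.1, 9.7], [10.0, 10.4, 9.9, 10.1]]
limits = xbar_r_limits(demo)
```

In practice the R-chart is examined first; if the within-subgroup variation is out of control, the x̄-chart limits (which depend on R̄) are not trustworthy, which is the point of Exercise 18-20, part d.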

Visual Summary
Chapter 18: Organizations across the United States and around the world have turned to quality management in an effort to meet the competitive challenges of the international marketplace. Their efforts have generally followed two distinct but complementary tracks. The first track involves a change in managerial philosophy following principles set out by W. Edwards Deming and Joseph Juran, two pioneers in the quality movement. It generally brings employees at all levels into the effort as members of process improvement teams, assisted by training in the Tools of Quality. The second track involves a process of continual improvement using a set of statistically based tools built around process control charts. This chapter presents a brief introduction to the contributions of Deming, Juran, and other leaders in the quality movement. It then discusses four of the most commonly used statistical process control charts.

18.1 Quality Management and Tools for Process Improvement (pg. 805–808)
Summary Although Deming is better known, both Deming and Juran were instrumental in the worldwide quality movement. Deming, a statistician, and Juran, an engineer, emphasized that quality was the key to being competitive and that quality could best be achieved by improving the processes that produced the products and delivered the services. Deming, also known as the “Father of Japanese Quality and Productivity,” preached a new management philosophy, which has become known as Total Quality Management, or TQM. His philosophy is spelled out in his 14 points, which emphasize the importance of managerial leadership and continual improvement. Juran is noted for his 10 steps to quality improvement. Juran and Deming differed in some areas, including the use of goals and targets.
Juran is also credited with applying the Pareto principle to quality, using it to focus on the vital few sources of problems and to separate the vital few from the trivial many. Pareto charts display data in a way that helps managers find the most important problem issues. Deming, Juran, and others contributed to the Tools of Quality: process flowcharts, brainstorming, fishbone diagrams, histograms, trend charts, scatter plots, and statistical process control charts.
Outcome 1. Use the seven basic tools of quality.

18.2 Introduction to Statistical Process Control Charts (pg. 808–830)
Summary Process control charts are based on the idea that all processes exhibit variation. Although variation is always present, two major types occur. The first is common cause variation, which is naturally occurring or expected in the system. Other terms for common cause variation include normal, random, chance occurrence, inherent, and stable variation. The other type is special cause variation, which indicates that something out of the ordinary has happened. This type of variation is also called nonrandom, unstable, and assignable cause variation. Process control charts are used to separate special cause variation from common cause variation. They are constructed by determining a centerline (a process average) and constructing upper and lower control limits around the centerline (three-standard-deviation lines above and below the average). The charts considered in this chapter are the most commonly used: the x̄- and R-charts (used in tandem), p-charts, and c-charts. Data are gathered continually from the process being monitored and plotted on the chart. Data points between the upper and lower control limits generally, but not always, indicate that the process is stable. Process control charts are continually monitored for signs of going “out of control.”
Common signals include: one or more points outside the upper or lower control limits; nine or more points in a row above (or below) the centerline; six or more consecutive points moving in the same direction (increasing or decreasing); and fourteen points in a row alternating up and down. Generally, continual process improvement procedures involve identifying and addressing the reason for special cause variation and then working to reduce the common cause variation.
Outcome 2. Construct and interpret x̄-charts and R-charts.
Outcome 3. Construct and interpret p-charts.
Outcome 4. Construct and interpret c-charts.

Conclusion
The quality movement throughout the United States and much of the rest of the world has created great expectations among consumers. Ideas such as continuous process improvement and customer focus have become central to raising these expectations. Statistics has played a key role in increasing expectations of quality products. The enemy of quality is variation, which exists in everything. Through the use of appropriate statistical tools and the concept of statistical reasoning, managers and employees have developed better understandings of their processes. Although they haven’t yet figured out how to eliminate variation, statistics has helped reduce it and has shown how to operate more effectively when it exists. Statistical process control (SPC) has played a big part in the understanding of variation. SPC is quite likely the most frequently used of the Basic 7 Tools. This chapter has introduced SPC. Hopefully, you realize that these tools are merely extensions of the hypothesis-testing and estimation concepts presented in Chapters 8–10. You will very likely have the opportunity to use SPC in one form or another after you leave this course and enter the workforce. Figure 18.13 summarizes some of the key SPC charts and the conditions under which each is developed.
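The out-of-control signals listed in the summary can be expressed as simple checks on the sequence of plotted points. The Python sketch below follows the chapter's run lengths (9, 6, and 14); other references use slightly different counts, and points exactly on the centerline are treated as below it for simplicity.

```python
# Sketch of the four common "out of control" signal checks on a control chart.

def beyond_limits(points, lcl, ucl):
    """One or more points outside the upper or lower control limits."""
    return any(p < lcl or p > ucl for p in points)

def long_run_one_side(points, centerline, length=9):
    """Nine or more points in a row above (or below) the centerline."""
    run, last_side = 0, None
    for p in points:
        side = p > centerline                 # ties count as "below"
        run = run + 1 if side == last_side else 1
        last_side = side
        if run >= length:
            return True
    return False

def long_trend(points, length=6):
    """Six or more consecutive points moving in the same direction."""
    for i in range(len(points) - length + 1):
        w = points[i:i + length]
        if all(a < b for a, b in zip(w, w[1:])) or all(a > b for a, b in zip(w, w[1:])):
            return True
    return False

def alternating(points, length=14):
    """Fourteen points in a row, alternating up and down."""
    for i in range(len(points) - length + 1):
        d = [(b > a) - (b < a) for a, b in zip(points[i:i + length], points[i + 1:i + length])]
        if 0 not in d and all(x == -y for x, y in zip(d, d[1:])):
            return True
    return False
```

These checks are applied to both charts of a pair (for example, the x̄- and R-charts) each time new subgroup statistics are plotted.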

FIGURE 18.13 | Statistical Process Control Chart Option

Type of Data

Variables Data
  R-Chart: UCL = D4(R̄), LCL = D3(R̄), where R̄ = (1/k) Σ_{i=1}^{k} Ri
  x̄-Chart: UCL = x̿ + A2(R̄), LCL = x̿ − A2(R̄), where x̿ = (1/k) Σ_{i=1}^{k} x̄i
  Other Variables Charts: x-Chart, MR Chart, Zone Chart

Attribute Data
  p-Chart: UCL = p̄ + 3sp, LCL = p̄ − 3sp, where p̄ = (1/k) Σ_{i=1}^{k} pi and sp = √[p̄(1 − p̄)/n]
  c-Chart: UCL = c̄ + 3√c̄, LCL = c̄ − 3√c̄, where c̄ = (1/k) Σ_{i=1}^{k} xi
  Other Attribute Charts: U-Chart, np-Chart

Equations

(18.1) Variation Components pg. 810
  Total process variation = Common cause variation + Special cause variation

(18.2) Average Subgroup Means pg. 812
  x̿ = (1/k) Σ_{i=1}^{k} x̄i

(18.3) Average Subgroup Range pg. 812
  R̄ = (1/k) Σ_{i=1}^{k} Ri

(18.4) Upper Control Limit, x̄-Chart pg. 814
  UCL = x̿ + A2(R̄)

(18.5) Lower Control Limit, x̄-Chart pg. 815
  LCL = x̿ − A2(R̄)

(18.6) Upper Control Limit, R-Chart pg. 815
  UCL = D4(R̄)

(18.7) Lower Control Limit, R-Chart pg. 815
  LCL = D3(R̄)

(18.8) Mean Subgroup Proportion pg. 822
  For equal-size samples: p̄ = (1/k) Σ_{i=1}^{k} pi

(18.9) For unequal sample sizes:
  p̄ = (Σ_{i=1}^{k} ni pi) / (Σ_{i=1}^{k} ni)

(18.10) Estimate for the Standard Error for the Subgroup Proportions pg. 822
  For equal sample sizes: sp = √[p̄(1 − p̄)/n]

(18.11) Control Limits for p-Chart pg. 823
  UCL = p̄ + 3sp

(18.12)
  LCL = p̄ − 3sp

(18.13) Mean for c-Chart pg. 824
  c̄ = (1/k) Σ_{i=1}^{k} xi

(18.14) Standard Deviation for c-Chart pg. 825
  sc = √c̄

(18.15) c-Chart Control Limits pg. 825
  UCL = c̄ + 3√c̄

(18.16)
  LCL = c̄ − 3√c̄

Key Terms
Pareto principle pg. 805
Total quality management (TQM) pg. 805

Chapter Exercises

Conceptual Questions
18-29. Data were collected on a quantitative measure with a subgroup size of three observations. Thirty subgroups were collected, with the following results: x̿ = 1,345.4 and R̄ = 209.3.
a. Determine the Shewhart factors that will be needed if x̄- and R-charts are to be developed.
b. Compute the upper and lower control limits for the R-chart.
c. Compute the upper and lower control limits for the x̄-chart.
18-30. Data were collected on a quantitative measure with subgroups of four observations. Twenty-five subgroups were collected, with the following results: x̿ = 2.33 and R̄ = 0.80.
a. Determine the Shewhart factors that will be needed if x̄- and R-charts are to be developed.
b. Compute the upper and lower control limits for the R-chart.
c. Compute the upper and lower control limits for the x̄-chart.
18-31. Data were collected from a process in which the factor of interest was whether a finished item contained a particular attribute. The fraction of items that did not contain the attribute was recorded. A total of 20 samples were selected. The common sample size was 150 items. The total number of nonconforming items was 720. Based on these data, compute the upper and lower control limits for the p-chart.
18-32. Data were collected from a process in which the factor of interest was whether a finished item contained a particular attribute.
The fraction of items that did not contain the attribute was recorded. A total of 30 samples were selected. The common sample size was 100 items. The average number of nonconforming items per sample was 14. Based on these data, construct the upper and lower control limits for the p-chart.. Computer Database Exercises 18-33. CC, Inc., provides billing services for the health care industry. To ensure that its processes are operating as intended, CC selects 100 billing records at random every day and inspects each record to determine if it is free of errors. A billing record is classified as defective whenever there is an error that requires that the bill be reprocessed and mailed again. Such errors can occur for a variety of reasons. For example, a defective bill could have an incorrect mailing address, a wrong insurance identification number, or an improper doctor or hospital reference. The sample data taken from the most recent five weeks of billing records are contained in the file CC Inc. Use the sample information to construct the appropriate 3-sigma control chart. Does CC’s billing process appear to be in control? What, if any, comments can you make regarding the performance of its billing process? 18-34. A & A Enterprises ships integrated circuits to companies that assemble computers. Because computer manufacturing operations run on little inventory, parts must be available when promised..

<span class='text_page_counter'>(378)</span> 834. CHAPTER 18. |. Introduction to Quality and Statistical Process Control. Thus, a critical element of A & A’s customer satisfaction is on-time delivery performance. To ensure that the delivery process is performing as intended, a quality improvement team decided to monitor the firm’s distribution and delivery process. From the A & A corporate database, 100 monthly shipping records were randomly selected for the previous 21 months, and the number of on-time shipments was counted. This information is contained in the file A & A On Time Shipments. Develop the appropriate 3-sigma control chart(s) for monitoring this process. Does it appear that the delivery process is in control? If not, can you suggest some possible assignable causes? 18-35. Fifi Carpets, Inc., produces carpet for homes and offices. Fifi has recently opened a new production process dedicated to the manufacture of a special type of carpet used by firms who want a floor covering for high-traffic spaces. As a part of their ongoing quality improvement activities, the managers of Fifi regularly monitor their production processes using statistical process control. For their new production process, Fifi managers would like to develop control charts to help them in their monitoring activities. Thirty samples of carpet sections, with each section having an area of 50 square meters, were randomly selected, and the numbers of stains, cuts, snags, and tears were counted on each section. The sample data are contained in the file Fifi Carpets. Use the sample data to construct the appropriate 3-sigma control chart(s) for monitoring the production process. Does the process appear to be in statistical control? 18-36. The order-entry, order-processing call center for PS Industries is concerned about the amounts of time that customers must wait before their calls are handled. A quality improvement consultant suggests that it monitor call-wait times using control charts. 
Using call center statistics maintained by the company’s database system, the consultant randomly selects four calls every hour for 30 different hours and examines the wait time, in seconds, for each call. This information is contained in the file PS Industries. Use the sample data to construct the appropriate control chart(s). Does the process appear to be in statistical control? What other information concerning the call center’s process should the consultant be aware of?
18-37. Varians Controls manufactures a variety of different electric motors and drives. One step in the manufacturing process involves cutting copper wire from large reels into smaller lengths. For a particular motor, there is a dedicated machine for cutting wire to the required length. As a part of its regular quality improvement activities, the continuous process improvement team at Varians took a sample of 5 cuttings every hour for 30 consecutive hours of operation. At the time the samples were taken, Varians had every reason to believe that its process was working as intended. The automatic cutting machine records the length of each cut, and the results are reported in the file Varians Controls.
a. Develop the appropriate 3-sigma control chart(s) for this process. Does the process appear to be working as intended (in control)?
b. A few weeks after the previous data were sampled, a new operator was hired to calibrate the company’s cutting machines. The first 5 samples taken from the machine after the calibration adjustments (samples 225 to 229) are shown as follows:

            Cutting 1  Cutting 2  Cutting 3  Cutting 4  Cutting 5
Sample 225  0.7818     0.7760     0.7814     0.7824     0.7702
Sample 226  0.7694     0.7838     0.7675     0.7834     0.7730
Sample 227  0.7875     0.7738     0.7737     0.7594     0.7837
Sample 228  0.7762     0.7711     0.7700     0.7823     0.7673
Sample 229  0.7805     0.7724     0.7748     0.7823     0.7924

Based on these sample values, what can you say about the cutting process? Does it appear to be in control?
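As a numerical check, Exercise 18-31 can be worked directly from Equations 18.8 and 18.10 through 18.12: with 20 equal samples of n = 150 items and 720 nonconforming items in total, p̄ is the pooled fraction nonconforming (for equal sample sizes this equals the average of the sample proportions). A minimal Python sketch:

```python
# Worked sketch of the p-chart limits asked for in Exercise 18-31:
# 20 samples of n = 150 items with 720 nonconforming items in total.
import math

def p_chart_limits(total_nonconforming, k, n):
    p_bar = total_nonconforming / (k * n)        # Equation 18.8 (equal sample sizes)
    s_p = math.sqrt(p_bar * (1 - p_bar) / n)     # Equation 18.10
    lcl = max(p_bar - 3 * s_p, 0.0)              # Equation 18.12, floored at 0
    ucl = p_bar + 3 * s_p                        # Equation 18.11
    return p_bar, lcl, ucl

p_bar, lcl, ucl = p_chart_limits(720, k=20, n=150)
# p_bar = 0.24, LCL ≈ 0.1354, UCL ≈ 0.3446
```

The same function covers Exercise 18-32 by replacing the pooled count with the average of 14 nonconforming items per sample of 100.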
Case 18.1 Izbar Precision Casters, Inc. Izbar Precision Casters, Inc., manufactures a variety of structural steel products for the construction trade. Currently, there is a strong demand for its I-beam product produced at a mill outside Memphis. Beams at this facility are shipped throughout the Midwest and mid-South, and demand for the product is high due to the strong economy in the regions served by the plant. Angie Schneider, the mill’s manager, wants to ensure that the plant’s operations are in control, and she has selected several characteristics to monitor. Specifically, she collects data on the number of weekly accidents at the plant, the number of orders shipped on time, and the thickness of the steel I-beams produced.. For the number of reported accidents, Angie selected 30 days at random from the company’s safety records. Angie and all the plant employees are very concerned about workplace safety, and management, labor, and government officials have worked together to help create a safe work environment. As a part of the safety program, the company requires employees to report every accident regardless of how minor it may be. In fact, most accidents are very minor, but Izbar still records them and works to prevent them from recurring. Because of Izbar’s strong reporting requirement, Angie was able to get a count of the number of reported accidents for each of the 30 sampled days. These data are shown in Table 18.4. To monitor the percentage of on-time shipments, Angie randomly selected 50 records from the firm’s shipping and billing.

TABLE 18.4 | Accident Data

Day  Number of Reported Accidents    Day  Number of Reported Accidents
 1   9                               16   4
 2   11                              17   11
 3   9                               18   7
 4   9                               19   7
 5   11                              20   10
 6   10                              21   11
 7   10                              22   6
 8   10                              23   6
 9   4                               24   7
10   7                               25   4
11   7                               26   9
12   8                               27   11
13   11                              28   9
14   10                              29   6
15   7                               30   5

system every day for 20 different days over the past six months. These records contain the actual and promised shipping dates for each order. Angie used a spreadsheet to determine the number of shipments that were made after the promised shipment dates. The number of late shipments from the 50 sampled records was then reported. These data are shown in Table 18.5. Finally, to monitor the thickness of the I-beam produced at the plant, Angie randomly selected six I-beams every day for 30 days and had each sampled beam measured. The thickness of each beam, in inches, was recorded. All of the data collected by Angie are contained in the file Izbar. She wants to use the information she has collected to construct and analyze the appropriate control charts for the plant’s production processes. She intends to present this information at the next manager’s meeting on Monday morning.

Required Tasks:
a. Use the data that Angie has collected to develop and analyze the appropriate control charts for this process. Be sure to label each control chart carefully and also to identify the type of control chart used.
b. Do the processes analyzed appear to be in control? Why or why not? What would you suggest that Angie do?
c. Does Angie need to continue to monitor her processes on a regular basis? How should she do this? Also, are there other variables that might be of interest to her in monitoring the plant’s performance? If so, what do you think they might be?

TABLE 18.5 | Late Shipments

Day  Number of Late Shipments    Day  Number of Late Shipments
 1   5                           11   8
 2   3                           12   2
 3   1                           13   5
 4   6                           14   6
 5   5                           15   2
 6   8                           16   7
 7   5                           17   7
 8   6                           18   3
 9   4                           19   2
10   4                           20   7

References
Crosby, Philip B., Quality Is Free: The Art of Making Quality Certain (New York: McGraw-Hill, 1979).
Deming, W. Edwards, Out of the Crisis (Cambridge, MA: MIT Center for Advanced Engineering Study, 1986).
Evans, James R., and William M. Lindsay, Managing for Quality and Performance Excellence (Cincinnati, OH: South-Western College Publishing, 2007).
Foster, S. Thomas, Managing Quality: Integrating the Supply Chain and Student CD PKG, 3rd ed. (Upper Saddle River, NJ: Prentice Hall, 2007).
Juran, Joseph M., Quality Control Handbook, 5th ed. (New York: McGraw-Hill, 1999).
Microsoft Excel 2007 (Redmond, WA: Microsoft Corp., 2007).
Minitab for Windows Version 15 (State College, PA: Minitab, 2007).
Mitra, Amitava, Fundamentals of Quality Control and Improvement, 2nd ed. (Upper Saddle River, NJ: Prentice Hall, 1998).
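For Required Task (a) of Case 18.1, the daily accident counts in Table 18.4 are a natural fit for a c-chart, since each observation is a count of occurrences per fixed unit (one day). A minimal Python sketch applying Equations 18.13, 18.15, and 18.16 to those counts:

```python
# Sketch of the c-chart computation for the accident counts in Table 18.4.
import math

accidents = [9, 11, 9, 9, 11, 10, 10, 10, 4, 7, 7, 8, 11, 10, 7,
             4, 11, 7, 7, 10, 11, 6, 6, 7, 4, 9, 11, 9, 6, 5]

c_bar = sum(accidents) / len(accidents)        # Equation 18.13: centerline
ucl = c_bar + 3 * math.sqrt(c_bar)             # Equation 18.15
lcl = max(c_bar - 3 * math.sqrt(c_bar), 0.0)   # Equation 18.16, floored at 0

# Days whose count falls outside the 3-sigma limits
out_of_control = [d + 1 for d, c in enumerate(accidents) if c > ucl or c < lcl]
# c_bar = 8.2, UCL ≈ 16.79, LCL = 0; no daily count falls outside the limits
```

Points inside the limits are only part of the story; the run-based signals discussed in the chapter should still be checked before concluding the accident process is in control.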

List of Appendix Tables
APPENDIX A Random Numbers Table 837
APPENDIX B Cumulative Binomial Distribution Table 838
APPENDIX C Cumulative Poisson Probability Distribution Table 851
APPENDIX D Standard Normal Distribution Table 856
APPENDIX E Exponential Distribution Table 857
APPENDIX F Values of t for Selected Probabilities 858
APPENDIX G Values of χ² for Selected Probabilities 859
APPENDIX H F-Distribution Table 860
APPENDIX I Critical Values of Hartley’s Fmax Test 866
APPENDIX J Distribution of the Studentized Range (q-values) 867
APPENDIX K Critical Values of r in the Runs Test 869
APPENDIX L Mann-Whitney U Test Probabilities (n < 9) 870
APPENDIX M Mann-Whitney U Test Critical Values (9 ≤ n ≤ 20) 872
APPENDIX N Critical Values of T in the Wilcoxon Matched-Pairs Signed-Ranks Test (n ≤ 25) 874
APPENDIX O Critical Values dL and dU of the Durbin-Watson Statistic D 875
APPENDIX P Lower and Upper Critical Values W of Wilcoxon Signed-Ranks Test 877
APPENDIX Q Control Chart Factors 878

<span class='text_page_counter'>(381)</span> APPENDIX A. APPENDIX A Random Numbers Table. 1511 6249 2587 0168 9664 1384 6390 6944 3610 9865 7044 9304 1717 2461 8240 1697 4695 3056 6887 1267 4369 2888 9893 8927 2676 0775 3828 3281 0328 8406 7076 0446 3719 5648 3694 3554 4934 7835 1098 1186 4618 5529 0754 5865 6168 7479 4608 0654 3000 2686 4713 9281 5736 2383 8740. 4745 7073 4800 1379 9021 4981 8953 8134 3119 0028 6712 4857 8278 3598 9856 6805 2251 8558 9035 8824 9267 0333 7251 3977 7064 7316 9178 3419 7471 1826 8418 8641 9712 0563 8582 9876 8446 1506 2113 2685 1522 4173 5808 0806 8963 4144 6576 2483 9694 3675 4121 6522 9419 0408 8038. 8716 0460 3455 7838 4990 2708 4292 0704 7442 1783 7530 5476 0072 5173 0075 1386 8962 3020 8520 5588 9377 5347 6243 6054 2198 2249 3726 6660 5352 8437 6778 3249 7472 6346 3434 4249 4646 0019 8287 7225 0627 5711 8458 2070 0235 6697 9422 6001 6616 5760 5144 7916 5022 2821 7284. 2793 0819 7565 7487 5570 6437 7372 8500 6218 9029 0018 8386 2636 9666 7599 2340 5638 7509 6571 2821 8205 4849 4617 5979 3234 5606 0743 7968 2019 3078 1292 5431 1517 1981 4052 9473 2054 5011 3487 8311 0448 7419 2218 7986 1514 2255 4198 4486 5599 2918 5164 8941 6955 7313 6054. 9142 0729 1196 7420 4697 2298 7197 6996 7623 2858 0945 1540 3217 6165 8468 6694 9459 5105 3233 1247 6479 5526 9256 8566 3796 9411 4075 1238 5842 9068 2019 4068 8850 9512 8392 9085 1136 0563 8250 3835 0669 2535 9180 4800 7875 5465 2578 4941 7759 0185 8104 6710 3356 5781 2246. 4958 6806 7768 5285 7939 6230 2121 3492 0546 8737 8803 5760 1693 7438 7653 9786 5578 4283 7175 0967 7002 2975 4039 8120 5506 3818 3560 2246 1665 1425 3506 6045 6862 0659 3883 6594 1023 4450 2269 8059 4086 5876 6213 3076 2176 7233 1701 1500 1581 7364 0403 1670 5732 6951 1674. 
5245 2713 6137 8045 5842 7443 6538 4397 8394 7023 4467 9815 6081 6805 6272 0536 0676 5390 2859 4355 0649 5295 4800 2566 4462 5268 9542 2164 5939 1232 7474 1939 6990 5694 5126 2434 6295 1466 1876 9163 4083 8435 5280 2866 3095 4981 4764 3502 9896 9985 4984 1399 1042 7181 9984. 8312 6595 4941 6679 5353 9425 2093 8802 3286 0444 0979 7191 1330 2357 0573 6423 2276 5715 1615 1385 4731 5071 9393 4449 5121 7652 3922 4567 6337 0573 0141 5626 5475 6668 0477 9453 6483 6334 3684 2539 0881 2564 4753 0515 1171 3553 7460 9693 2312 5930 3877 5961 0527 0608 0355. 837 8925 5149 0488 1361 7503 5384 7629 3253 4463 8575 1342 3291 3458 6994 4344 1083 4724 8405 3349 0727 7086 6011 3263 2414 9052 6098 7688 1801 9102 7751 6544 1867 6227 2563 4034 8883 9915 2606 8856 6487 4270 3031 0696 7417 7892 8144 3509 1956 8140 9869 8772 4714 7441 2864 0775.

APPENDIX B

Cumulative Binomial Distribution Table

P(x ≤ X) = Σ (i = 0 to X) [n! / (i!(n − i)!)] p^i (1 − p)^(n − i)

Tables are given for n = 1 through n = 15. Within each table, rows correspond to X = 0, 1, …, n and columns to the success probability p, over the grid p = 0.01, 0.02, …, 0.09; 0.10, 0.15, …, 0.50; 0.55, 0.60, …, 0.90; and 0.91, 0.92, …, 1.00. Each entry is the cumulative probability P(x ≤ X) rounded to four decimal places, so the final row (X = n) of every column is 1.0000.

[Table bodies for n = 1 through n = 15 not reproduced: the original multi-column page layout was flattened during text extraction, and the individual cell values can no longer be reliably matched to their (X, p) positions.]

<span class='text_page_counter'>(391)</span> APPENDIX B 4 5 6 7 8 9 10 11 12 13 14 15 x 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 x 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 x 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15. 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 p  0.10 0.2059 0.5490 0.8159 0.9444 0.9873 0.9978 0.9997 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 p  0.55 0.0000 0.0001 0.0011 0.0063 0.0255 0.0769 0.1818 0.3465 0.5478 0.7392 0.8796 0.9576 0.9893 0.9983 0.9999 1.0000 p  0.92 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0001 0.0007 0.0050 0.0273 0.1130 0.3403 0.7137 1.0000. 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 p  0.15 0.0874 0.3186 0.6042 0.8227 0.9383 0.9832 0.9964 0.9994 0.9999 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 p  0.60 0.0000 0.0000 0.0003 0.0019 0.0093 0.0338 0.0950 0.2131 0.3902 0.5968 0.7827 0.9095 0.9729 0.9948 0.9995 1.0000 p  0.93 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0003 0.0028 0.0175 0.0829 0.2832 0.6633 1.0000. 0.9999 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 p  0.20 0.0352 0.1671 0.3980 0.6482 0.8358 0.9389 0.9819 0.9958 0.9992 0.9999 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 p  0.65 0.0000 0.0000 0.0001 0.0005 0.0028 0.0124 0.0422 0.1132 0.2452 0.4357 0.6481 0.8273 0.9383 0.9858 0.9984 1.0000 p  0.94 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0001 0.0014 0.0104 0.0571 0.2262 0.6047 1.0000. 0.9998 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 p  0.25 0.0134 0.0802 0.2361 0.4613 0.6865 0.8516 0.9434 0.9827 0.9958 0.9992 0.9999 1.0000 1.0000 1.0000 1.0000 1.0000 p  0.70 0.0000 0.0000 0.0000 0.0001 0.0007 0.0037 0.0152 0.0500 0.1311 0.2784 0.4845 0.7031 0.8732 0.9647 0.9953 1.0000 p  0.95 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0001 0.0006 0.0055 0.0362 0.1710 0.5367 1.0000. 
0.9994 0.9999 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 p  0.30 0.0047 0.0353 0.1268 0.2969 0.5155 0.7216 0.8689 0.9500 0.9848 0.9963 0.9993 0.9999 1.0000 1.0000 1.0000 1.0000 p  0.75 0.0000 0.0000 0.0000 0.0000 0.0001 0.0008 0.0042 0.0173 0.0566 0.1484 0.3135 0.5387 0.7639 0.9198 0.9866 1.0000 p  0.96 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0002 0.0024 0.0203 0.1191 0.4579 1.0000. 0.9986 0.9999 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 p  0.35 0.0016 0.0142 0.0617 0.1727 0.3519 0.5643 0.7548 0.8868 0.9578 0.9876 0.9972 0.9995 0.9999 1.0000 1.0000 1.0000 p  0.80 0.0000 0.0000 0.0000 0.0000 0.0000 0.0001 0.0008 0.0042 0.0181 0.0611 0.1642 0.3518 0.6020 0.8329 0.9648 1.0000 p  0.97 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0001 0.0008 0.0094 0.0730 0.3667 1.0000. 0.9972 0.9997 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 p  0.40 0.0005 0.0052 0.0271 0.0905 0.2173 0.4032 0.6098 0.7869 0.9050 0.9662 0.9907 0.9981 0.9997 1.0000 1.0000 1.0000 p  0.85 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0001 0.0006 0.0036 0.0168 0.0617 0.1773 0.3958 0.6814 0.9126 1.0000 p  0.98 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0002 0.0030 0.0353 0.2614 1.0000. 0.9950 0.9993 0.9999 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 p  0.45 0.0001 0.0017 0.0107 0.0424 0.1204 0.2608 0.4522 0.6535 0.8182 0.9231 0.9745 0.9937 0.9989 0.9999 1.0000 1.0000 p  0.90 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0003 0.0022 0.0127 0.0556 0.1841 0.4510 0.7941 1.0000 p  0.99 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0004 0.0096 0.1399 1.0000. 
0.9918 0.9987 0.9998 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 p  0.50 0.0000 0.0005 0.0037 0.0176 0.0592 0.1509 0.3036 0.5000 0.6964 0.8491 0.9408 0.9824 0.9963 0.9995 1.0000 1.0000 p  0.91 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0002 0.0013 0.0082 0.0399 0.1469 0.3965 0.7570 1.0000 p  1.00 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 1.0000 (continued ). 847.

<span class='text_page_counter'>(392)</span> 848. APPENDIX B. n  20 x 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 x 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 x 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20. p  0.01 0.8179 0.9831 0.9990 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 p  0.10 0.1216 0.3917 0.6769 0.8670 0.9568 0.9887 0.9976 0.9996 0.9999 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 p  0.55 0.0000 0.0000 0.0000 0.0003 0.0015 0.0064 0.0214 0.0580 0.1308 0.2493 0.4086 0.5857 0.7480 0.8701 0.9447 0.9811 0.9951 0.9991 0.9999 1.0000 1.0000. p  0.02 0.6676 0.9401 0.9929 0.9994 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 p  0.15 0.0388 0.1756 0.4049 0.6477 0.8298 0.9327 0.9781 0.9941 0.9987 0.9998 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 p  0.60 0.0000 0.0000 0.0000 0.0000 0.0003 0.0016 0.0065 0.0210 0.0565 0.1275 0.2447 0.4044 0.5841 0.7500 0.8744 0.9490 0.9840 0.9964 0.9995 1.0000 1.0000. p  0.03 0.5438 0.8802 0.9790 0.9973 0.9997 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 p  0.20 0.0115 0.0692 0.2061 0.4114 0.6296 0.8042 0.9133 0.9679 0.9900 0.9974 0.9994 0.9999 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 p  0.65 0.0000 0.0000 0.0000 0.0000 0.0000 0.0003 0.0015 0.0060 0.0196 0.0532 0.1218 0.2376 0.3990 0.5834 0.7546 0.8818 0.9556 0.9879 0.9979 0.9998 1.0000. 
p  0.04 0.4420 0.8103 0.9561 0.9926 0.9990 0.9999 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 p  0.25 0.0032 0.0243 0.0913 0.2252 0.4148 0.6172 0.7858 0.8982 0.9591 0.9861 0.9961 0.9991 0.9998 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 p  0.70 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0003 0.0013 0.0051 0.0171 0.0480 0.1133 0.2277 0.3920 0.5836 0.7625 0.8929 0.9645 0.9924 0.9992 1.0000. p  0.05 0.3585 0.7358 0.9245 0.9841 0.9974 0.9997 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 p  0.30 0.0008 0.0076 0.0355 0.1071 0.2375 0.4164 0.6080 0.7723 0.8867 0.9520 0.9829 0.9949 0.9987 0.9997 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 p  0.75 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0002 0.0009 0.0039 0.0139 0.0409 0.1018 0.2142 0.3828 0.5852 0.7748 0.9087 0.9757 0.9968 1.0000. p  0.06 0.2901 0.6605 0.8850 0.9710 0.9944 0.9991 0.9999 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 p  0.35 0.0002 0.0021 0.0121 0.0444 0.1182 0.2454 0.4166 0.6010 0.7624 0.8782 0.9468 0.9804 0.9940 0.9985 0.9997 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 p  0.80 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0001 0.0006 0.0026 0.0100 0.0321 0.0867 0.1958 0.3704 0.5886 0.7939 0.9308 0.9885 1.0000. p  0.07 0.2342 0.5869 0.8390 0.9529 0.9893 0.9981 0.9997 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 p  0.40 0.0000 0.0005 0.0036 0.0160 0.0510 0.1256 0.2500 0.4159 0.5956 0.7553 0.8725 0.9435 0.9790 0.9935 0.9984 0.9997 1.0000 1.0000 1.0000 1.0000 1.0000 p  0.85 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0002 0.0013 0.0059 0.0219 0.0673 0.1702 0.3523 0.5951 0.8244 0.9612 1.0000. 
p  0.08 0.1887 0.5169 0.7879 0.9294 0.9817 0.9962 0.9994 0.9999 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 p  0.45 0.0000 0.0001 0.0009 0.0049 0.0189 0.0553 0.1299 0.2520 0.4143 0.5914 0.7507 0.8692 0.9420 0.9786 0.9936 0.9985 0.9997 1.0000 1.0000 1.0000 1.0000 p  0.90 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0001 0.0004 0.0024 0.0113 0.0432 0.1330 0.3231 0.6083 0.8784 1.0000. p  0.09 0.1516 0.4516 0.7334 0.9007 0.9710 0.9932 0.9987 0.9998 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 p  0.50 0.0000 0.0000 0.0002 0.0013 0.0059 0.0207 0.0577 0.1316 0.2517 0.4119 0.5881 0.7483 0.8684 0.9423 0.9793 0.9941 0.9987 0.9998 1.0000 1.0000 1.0000 p  0.91 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0002 0.0013 0.0068 0.0290 0.0993 0.2666 0.5484 0.8484 1.0000.

<span class='text_page_counter'>(393)</span> APPENDIX B. x 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20. p  0.92 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0001 0.0006 0.0038 0.0183 0.0706 0.2121 0.4831 0.8113 1.0000. p  0.93 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0003 0.0019 0.0107 0.0471 0.1610 0.4131 0.7658 1.0000. p  0.94 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0001 0.0009 0.0056 0.0290 0.1150 0.3395 0.7099 1.0000. p  0.95 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0003 0.0026 0.0159 0.0755 0.2642 0.6415 1.0000. x 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 x 0 1 2 3 4 5 6 7 8 9 10 11 12 13. p  0.01 0.7778 0.9742 0.9980 0.9999 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 p  0.10 0.0718 0.2712 0.5371 0.7636 0.9020 0.9666 0.9905 0.9977 0.9995 0.9999 1.0000 1.0000 1.0000 1.0000. p  0.02 0.6035 0.9114 0.9868 0.9986 0.9999 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 p  0.15 0.0172 0.0931 0.2537 0.4711 0.6821 0.8385 0.9305 0.9745 0.9920 0.9979 0.9995 0.9999 1.0000 1.0000. p  0.03 0.4670 0.8280 0.9620 0.9938 0.9992 0.9999 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 p  0.20 0.0038 0.0274 0.0982 0.2340 0.4207 0.6167 0.7800 0.8909 0.9532 0.9827 0.9944 0.9985 0.9996 0.9999. 
p  0.04 0.3604 0.7358 0.9235 0.9835 0.9972 0.9996 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 p  0.25 0.0008 0.0070 0.0321 0.0962 0.2137 0.3783 0.5611 0.7265 0.8506 0.9287 0.9703 0.9893 0.9966 0.9991. p  0.96 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0001 0.0010 0.0074 0.0439 0.1897 0.5580 1.0000. p  0.97 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0003 0.0027 0.0210 0.1198 0.4562 1.0000. p  0.98 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0006 0.0071 0.0599 0.3324 1.0000. p  0.99 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0010 0.0169 0.1821 1.0000. p  1.00 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 1.0000. p  0.06 0.2129 0.5527 0.8129 0.9402 0.9850 0.9969 0.9995 0.9999 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 p  0.35 0.0000 0.0003 0.0021 0.0097 0.0320 0.0826 0.1734 0.3061 0.4668 0.6303 0.7712 0.8746 0.9396 0.9745. p  0.07 0.1630 0.4696 0.7466 0.9064 0.9726 0.9935 0.9987 0.9998 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 p  0.40 0.0000 0.0001 0.0004 0.0024 0.0095 0.0294 0.0736 0.1536 0.2735 0.4246 0.5858 0.7323 0.8462 0.9222. p  0.08 0.1244 0.3947 0.6768 0.8649 0.9549 0.9877 0.9972 0.9995 0.9999 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 p  0.45 0.0000 0.0000 0.0001 0.0005 0.0023 0.0086 0.0258 0.0639 0.1340 0.2424 0.3843 0.5426 0.6937 0.8173. 
p  0.09 0.0946 0.3286 0.6063 0.8169 0.9314 0.9790 0.9946 0.9989 0.9998 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 p  0.50 0.0000 0.0000 0.0000 0.0001 0.0005 0.0020 0.0073 0.0216 0.0539 0.1148 0.2122 0.3450 0.5000 0.6550. n  25 p  0.05 0.2774 0.6424 0.8729 0.9659 0.9928 0.9988 0.9998 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 p  0.30 0.0001 0.0016 0.0090 0.0332 0.0905 0.1935 0.3407 0.5118 0.6769 0.8106 0.9022 0.9558 0.9825 0.9940. (continued ). 849.

<span class='text_page_counter'>(394)</span> 850. APPENDIX B 14 15 16 17 18 19 20 21 22 23 24 25 x 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 x 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25. 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 p  0.55 0.0000 0.0000 0.0000 0.0000 0.0001 0.0004 0.0016 0.0058 0.0174 0.0440 0.0960 0.1827 0.3063 0.4574 0.6157 0.7576 0.8660 0.9361 0.9742 0.9914 0.9977 0.9995 0.9999 1.0000 1.0000 1.0000 p  0.92 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0001 0.0005 0.0028 0.0123 0.0451 0.1351 0.3232 0.6053 0.8756 1.0000. 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 p  0.60 0.0000 0.0000 0.0000 0.0000 0.0000 0.0001 0.0003 0.0012 0.0043 0.0132 0.0344 0.0778 0.1538 0.2677 0.4142 0.5754 0.7265 0.8464 0.9264 0.9706 0.9905 0.9976 0.9996 0.9999 1.0000 1.0000 p  0.93 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0002 0.0013 0.0065 0.0274 0.0936 0.2534 0.5304 0.8370 1.0000. 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 p  0.65 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0002 0.0008 0.0029 0.0093 0.0255 0.0604 0.1254 0.2288 0.3697 0.5332 0.6939 0.8266 0.9174 0.9680 0.9903 0.9979 0.9997 1.0000 1.0000 p  0.94 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0001 0.0005 0.0031 0.0150 0.0598 0.1871 0.4473 0.7871 1.0000. 
0.9998 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 p  0.70 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0001 0.0005 0.0018 0.0060 0.0175 0.0442 0.0978 0.1894 0.3231 0.4882 0.6593 0.8065 0.9095 0.9668 0.9910 0.9984 0.9999 1.0000 p  0.95 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0002 0.0012 0.0072 0.0341 0.1271 0.3576 0.7226 1.0000. 0.9982 0.9995 0.9999 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 p  0.75 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0002 0.0009 0.0034 0.0107 0.0297 0.0713 0.1494 0.2735 0.4389 0.6217 0.7863 0.9038 0.9679 0.9930 0.9992 1.0000 p  0.96 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0004 0.0028 0.0165 0.0765 0.2642 0.6396 1.0000. 0.9907 0.9971 0.9992 0.9998 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 p  0.80 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0001 0.0004 0.0015 0.0056 0.0173 0.0468 0.1091 0.2200 0.3833 0.5793 0.7660 0.9018 0.9726 0.9962 1.0000 p  0.97 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0001 0.0008 0.0062 0.0380 0.1720 0.5330 1.0000. 0.9656 0.9868 0.9957 0.9988 0.9997 0.9999 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 p  0.85 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0001 0.0005 0.0021 0.0080 0.0255 0.0695 0.1615 0.3179 0.5289 0.7463 0.9069 0.9828 1.0000 p  0.98 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0001 0.0014 0.0132 0.0886 0.3965 1.0000. 
0.9040 0.9560 0.9826 0.9942 0.9984 0.9996 0.9999 1.0000 1.0000 1.0000 1.0000 1.0000 p  0.90 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0001 0.0005 0.0023 0.0095 0.0334 0.0980 0.2364 0.4629 0.7288 0.9282 1.0000 p  0.99 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0001 0.0020 0.0258 0.2222 1.0000. 0.7878 0.8852 0.9461 0.9784 0.9927 0.9980 0.9995 0.9999 1.0000 1.0000 1.0000 1.0000 p  0.91 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0002 0.0011 0.0054 0.0210 0.0686 0.1831 0.3937 0.6714 0.9054 1.0000 p  1.00 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 1.0000.

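Any entry in the cumulative binomial table can be regenerated from the defining formula above. A minimal Python sketch using only the standard library (the function name `binom_cdf` is illustrative, not from the text):

```python
from math import comb

def binom_cdf(x, n, p):
    """Cumulative binomial probability P(X <= x) for n trials with success probability p."""
    return sum(comb(n, i) * (p ** i) * ((1 - p) ** (n - i)) for i in range(x + 1))

# Spot-checks against the tabulated values:
print(f"{binom_cdf(0, 14, 0.06):.4f}")   # n = 14, p = 0.06, x = 0  -> 0.4205
print(f"{binom_cdf(10, 20, 0.50):.4f}")  # n = 20, p = 0.50, x = 10 -> 0.5881
```

Printed tables round to four decimal places, so computed values may differ from table entries in the last digit.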
<span class='text_page_counter'>(395)</span> APPENDIX C. X. APPENDIX C. P( x ≤ X ) =. ∑ i =0. Cumulative Poisson Probability Distribution Table. 851. (t)i e − t. i!. t x 0 1 2 3. 0.005 0.9950 1.0000 1.0000 1.0000. 0.01 0.9900 1.0000 1.0000 1.0000. 0.02 0.9802 0.9998 1.0000 1.0000. 0.03 0.9704 0.9996 1.0000 1.0000. 0.04 0.9608 0.9992 1.0000 1.0000. 0.05 0.9512 0.9988 1.0000 1.0000. 0.06 0.9418 0.9983 1.0000 1.0000. 0.07 0.9324 0.9977 0.9999 1.0000. 0.08 0.9231 0.9970 0.9999 1.0000. 0.09 0.9139 0.9962 0.9999 1.0000. x. 0.10. 0.20. 0.30. 0.40. 0.50. 0.60. 0.70. 0.80. 0.90. 1.00. 0. 0.9048. 0.8187. 0.7408. 0.6703. 0.6065. 0.5488. 0.4966. 0.4493. 0.4066. 0.3679. 1 2 3 4 5 6 7. 0.9953 0.9998 1.0000 1.0000 1.0000 1.0000 1.0000. 0.9825 0.9989 0.9999 1.0000 1.0000 1.0000 1.0000. 0.9631 0.9964 0.9997 1.0000 1.0000 1.0000 1.0000. 0.9384 0.9921 0.9992 0.9999 1.0000 1.0000 1.0000. 0.9098 0.9856 0.9982 0.9998 1.0000 1.0000 1.0000. 0.8781 0.9769 0.9966 0.9996 1.0000 1.0000 1.0000. 0.8442 0.9659 0.9942 0.9992 0.9999 1.0000 1.0000. 0.8088 0.9526 0.9909 0.9986 0.9998 1.0000 1.0000. 0.7725 0.9371 0.9865 0.9977 0.9997 1.0000 1.0000. 0.7358 0.9197 0.9810 0.9963 0.9994 0.9999 1.0000. x 0 1 2 3 4 5 6 7 8 9. 1.10 0.3329 0.6990 0.9004 0.9743 0.9946 0.9990 0.9999 1.0000 1.0000 1.0000. 1.20 0.3012 0.6626 0.8795 0.9662 0.9923 0.9985 0.9997 1.0000 1.0000 1.0000. 1.30 0.2725 0.6268 0.8571 0.9569 0.9893 0.9978 0.9996 0.9999 1.0000 1.0000. 1.40 0.2466 0.5918 0.8335 0.9463 0.9857 0.9968 0.9994 0.9999 1.0000 1.0000. 1.50 0.2231 0.5578 0.8088 0.9344 0.9814 0.9955 0.9991 0.9998 1.0000 1.0000. 1.60 0.2019 0.5249 0.7834 0.9212 0.9763 0.9940 0.9987 0.9997 1.0000 1.0000. 1.70 0.1827 0.4932 0.7572 0.9068 0.9704 0.9920 0.9981 0.9996 0.9999 1.0000. 1.80 0.1653 0.4628 0.7306 0.8913 0.9636 0.9896 0.9974 0.9994 0.9999 1.0000. 1.90 0.1496 0.4337 0.7037 0.8747 0.9559 0.9868 0.9966 0.9992 0.9998 1.0000. 2.00 0.1353 0.4060 0.6767 0.8571 0.9473 0.9834 0.9955 0.9989 0.9998 1.0000. x 0 1 2 3 4 5 6 7 8 9 10 11 12. 
2.10 0.1225 0.3796 0.6496 0.8386 0.9379 0.9796 0.9941 0.9985 0.9997 0.9999 1.0000 1.0000 1.0000. 2.20 0.1108 0.3546 0.6227 0.8194 0.9275 0.9751 0.9925 0.9980 0.9995 0.9999 1.0000 1.0000 1.0000. 2.30 0.1003 0.3309 0.5960 0.7993 0.9162 0.9700 0.9906 0.9974 0.9994 0.9999 1.0000 1.0000 1.0000. 2.40 0.0907 0.3084 0.5697 0.7787 0.9041 0.9643 0.9884 0.9967 0.9991 0.9998 1.0000 1.0000 1.0000. 2.50 0.0821 0.2873 0.5438 0.7576 0.8912 0.9580 0.9858 0.9958 0.9989 0.9997 0.9999 1.0000 1.0000. 2.60 0.0743 0.2674 0.5184 0.7360 0.8774 0.9510 0.9828 0.9947 0.9985 0.9996 0.9999 1.0000 1.0000. 2.70 0.0672 0.2487 0.4936 0.7141 0.8629 0.9433 0.9794 0.9934 0.9981 0.9995 0.9999 1.0000 1.0000. 2.80 0.0608 0.2311 0.4695 0.6919 0.8477 0.9349 0.9756 0.9919 0.9976 0.9993 0.9998 1.0000 1.0000. 2.90 0.0550 0.2146 0.4460 0.6696 0.8318 0.9258 0.9713 0.9901 0.9969 0.9991 0.9998 0.9999 1.0000. 3.00 0.0498 0.1991 0.4232 0.6472 0.8153 0.9161 0.9665 0.9881 0.9962 0.9989 0.9997 0.9999 1.0000. t. t. t. (continued ).

<span class='text_page_counter'>(396)</span> 852. APPENDIX C t. x 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14. 3.10 0.0450 0.1847 0.4012 0.6248 0.7982 0.9057 0.9612 0.9858 0.9953 0.9986 0.9996 0.9999 1.0000 1.0000 1.0000. 3.20 0.0408 0.1712 0.3799 0.6025 0.7806 0.8946 0.9554 0.9832 0.9943 0.9982 0.9995 0.9999 1.0000 1.0000 1.0000. 3.30 0.0369 0.1586 0.3594 0.5803 0.7626 0.8829 0.9490 0.9802 0.9931 0.9978 0.9994 0.9998 1.0000 1.0000 1.0000. 3.40 0.0334 0.1468 0.3397 0.5584 0.7442 0.8705 0.9421 0.9769 0.9917 0.9973 0.9992 0.9998 0.9999 1.0000 1.0000. 3.50 0.0302 0.1359 0.3208 0.5366 0.7254 0.8576 0.9347 0.9733 0.9901 0.9967 0.9990 0.9997 0.9999 1.0000 1.0000. 3.60 0.0273 0.1257 0.3027 0.5152 0.7064 0.8441 0.9267 0.9692 0.9883 0.9960 0.9987 0.9996 0.9999 1.0000 1.0000. 3.70 0.0247 0.1162 0.2854 0.4942 0.6872 0.8301 0.9182 0.9648 0.9863 0.9952 0.9984 0.9995 0.9999 1.0000 1.0000. 3.80 0.0224 0.1074 0.2689 0.4735 0.6678 0.8156 0.9091 0.9599 0.9840 0.9942 0.9981 0.9994 0.9998 1.0000 1.0000. 3.90 0.0202 0.0992 0.2531 0.4532 0.6484 0.8006 0.8995 0.9546 0.9815 0.9931 0.9977 0.9993 0.9998 0.9999 1.0000. 4.00 0.0183 0.0916 0.2381 0.4335 0.6288 0.7851 0.8893 0.9489 0.9786 0.9919 0.9972 0.9991 0.9997 0.9999 1.0000. t x. 4.10. 4.20. 4.30. 4.40. 4.50. 4.60. 4.70. 4.80. 4.90. 5.00. 0. 0.0166. 0.0150. 0.0136. 0.0123. 0.0111. 0.0101. 0.0091. 0.0082. 0.0074. 0.0067. 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16. 0.0845 0.2238 0.4142 0.6093 0.7693 0.8786 0.9427 0.9755 0.9905 0.9966 0.9989 0.9997 0.9999 1.0000 1.0000 1.0000. 0.0780 0.2102 0.3954 0.5898 0.7531 0.8675 0.9361 0.9721 0.9889 0.9959 0.9986 0.9996 0.9999 1.0000 1.0000 1.0000. 0.0719 0.1974 0.3772 0.5704 0.7367 0.8558 0.9290 0.9683 0.9871 0.9952 0.9983 0.9995 0.9998 1.0000 1.0000 1.0000. 0.0663 0.1851 0.3594 0.5512 0.7199 0.8436 0.9214 0.9642 0.9851 0.9943 0.9980 0.9993 0.9998 0.9999 1.0000 1.0000. 0.0611 0.1736 0.3423 0.5321 0.7029 0.8311 0.9134 0.9597 0.9829 0.9933 0.9976 0.9992 0.9997 0.9999 1.0000 1.0000. 
0.0563 0.1626 0.3257 0.5132 0.6858 0.8180 0.9049 0.9549 0.9805 0.9922 0.9971 0.9990 0.9997 0.9999 1.0000 1.0000. 0.0518 0.1523 0.3097 0.4946 0.6684 0.8046 0.8960 0.9497 0.9778 0.9910 0.9966 0.9988 0.9996 0.9999 1.0000 1.0000. 0.0477 0.1425 0.2942 0.4763 0.6510 0.7908 0.8867 0.9442 0.9749 0.9896 0.9960 0.9986 0.9995 0.9999 1.0000 1.0000. 0.0439 0.1333 0.2793 0.4582 0.6335 0.7767 0.8769 0.9382 0.9717 0.9880 0.9953 0.9983 0.9994 0.9998 0.9999 1.0000. 0.0404 0.1247 0.2650 0.4405 0.6160 0.7622 0.8666 0.9319 0.9682 0.9863 0.9945 0.9980 0.9993 0.9998 0.9999 1.0000. x. 5.10. 5.20. 5.30. 5.40. 5.50. 5.60. 5.70. 5.80. 5.90. 6.00. 0. 0.0061. 0.0055. 0.0050. 0.0045. 0.0041. 0.0037. 0.0033. 0.0030. 0.0027. 0.0025. 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18. 0.0372 0.1165 0.2513 0.4231 0.5984 0.7474 0.8560 0.9252 0.9644 0.9844 0.9937 0.9976 0.9992 0.9997 0.9999 1.0000 1.0000 1.0000. 0.0342 0.1088 0.2381 0.4061 0.5809 0.7324 0.8449 0.9181 0.9603 0.9823 0.9927 0.9972 0.9990 0.9997 0.9999 1.0000 1.0000 1.0000. 0.0314 0.1016 0.2254 0.3895 0.5635 0.7171 0.8335 0.9106 0.9559 0.9800 0.9916 0.9967 0.9988 0.9996 0.9999 1.0000 1.0000 1.0000. 0.0289 0.0948 0.2133 0.3733 0.5461 0.7017 0.8217 0.9027 0.9512 0.9775 0.9904 0.9962 0.9986 0.9995 0.9998 0.9999 1.0000 1.0000. 0.0266 0.0884 0.2017 0.3575 0.5289 0.6860 0.8095 0.8944 0.9462 0.9747 0.9890 0.9955 0.9983 0.9994 0.9998 0.9999 1.0000 1.0000. 0.0244 0.0824 0.1906 0.3422 0.5119 0.6703 0.7970 0.8857 0.9409 0.9718 0.9875 0.9949 0.9980 0.9993 0.9998 0.9999 1.0000 1.0000. 0.0224 0.0768 0.1800 0.3272 0.4950 0.6544 0.7841 0.8766 0.9352 0.9686 0.9859 0.9941 0.9977 0.9991 0.9997 0.9999 1.0000 1.0000. 0.0206 0.0715 0.1700 0.3127 0.4783 0.6384 0.7710 0.8672 0.9292 0.9651 0.9841 0.9932 0.9973 0.9990 0.9996 0.9999 1.0000 1.0000. 0.0189 0.0666 0.1604 0.2987 0.4619 0.6224 0.7576 0.8574 0.9228 0.9614 0.9821 0.9922 0.9969 0.9988 0.9996 0.9999 1.0000 1.0000. 
0.0174 0.0620 0.1512 0.2851 0.4457 0.6063 0.7440 0.8472 0.9161 0.9574 0.9799 0.9912 0.9964 0.9986 0.9995 0.9998 0.9999 1.0000. t.

<span class='text_page_counter'>(397)</span> APPENDIX C t x. 6.10. 6.20. 6.30. 6.40. 6.50. 6.60. 6.70. 6.80. 6.90. 7.00. 0. 0.0022. 0.0020. 0.0018. 0.0017. 0.0015. 0.0014. 0.0012. 0.0011. 0.0010. 0.0009. 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20. 0.0159 0.0577 0.1425 0.2719 0.4298 0.5902 0.7301 0.8367 0.9090 0.9531 0.9776 0.9900 0.9958 0.9984 0.9994 0.9998 0.9999 1.0000 1.0000 1.0000. 0.0146 0.0536 0.1342 0.2592 0.4141 0.5742 0.7160 0.8259 0.9016 0.9486 0.9750 0.9887 0.9952 0.9981 0.9993 0.9997 0.9999 1.0000 1.0000 1.0000. 0.0134 0.0498 0.1264 0.2469 0.3988 0.5582 0.7017 0.8148 0.8939 0.9437 0.9723 0.9873 0.9945 0.9978 0.9992 0.9997 0.9999 1.0000 1.0000 1.0000. 0.0123 0.0463 0.1189 0.2351 0.3837 0.5423 0.6873 0.8033 0.8858 0.9386 0.9693 0.9857 0.9937 0.9974 0.9990 0.9996 0.9999 1.0000 1.0000 1.0000. 0.0113 0.0430 0.1118 0.2237 0.3690 0.5265 0.6728 0.7916 0.8774 0.9332 0.9661 0.9840 0.9929 0.9970 0.9988 0.9996 0.9998 0.9999 1.0000 1.0000. 0.0103 0.0400 0.1052 0.2127 0.3547 0.5108 0.6581 0.7796 0.8686 0.9274 0.9627 0.9821 0.9920 0.9966 0.9986 0.9995 0.9998 0.9999 1.0000 1.0000. 0.0095 0.0371 0.0988 0.2022 0.3406 0.4953 0.6433 0.7673 0.8596 0.9214 0.9591 0.9801 0.9909 0.9961 0.9984 0.9994 0.9998 0.9999 1.0000 1.0000. 0.0087 0.0344 0.0928 0.1920 0.3270 0.4799 0.6285 0.7548 0.8502 0.9151 0.9552 0.9779 0.9898 0.9956 0.9982 0.9993 0.9997 0.9999 1.0000 1.0000. 0.0080 0.0320 0.0871 0.1823 0.3137 0.4647 0.6136 0.7420 0.8405 0.9084 0.9510 0.9755 0.9885 0.9950 0.9979 0.9992 0.9997 0.9999 1.0000 1.0000. 0.0073 0.0296 0.0818 0.1730 0.3007 0.4497 0.5987 0.7291 0.8305 0.9015 0.9467 0.9730 0.9872 0.9943 0.9976 0.9990 0.9996 0.9999 1.0000 1.0000. 7.60 0.0005 0.0043 0.0188 0.0554 0.1249 0.2307 0.3646 0.5100 0.6482 0.7649 0.8535 0.9148 0.9536 0.9762 0.9886 0.9948 0.9978 0.9991 0.9996 0.9999 1.0000 1.0000. 7.70 0.0005 0.0039 0.0174 0.0518 0.1181 0.2203 0.3514 0.4956 0.6343 0.7531 0.8445 0.9085 0.9496 0.9739 0.9873 0.9941 0.9974 0.9989 0.9996 0.9998 0.9999 1.0000. 
7.80 0.0004 0.0036 0.0161 0.0485 0.1117 0.2103 0.3384 0.4812 0.6204 0.7411 0.8352 0.9020 0.9454 0.9714 0.9859 0.9934 0.9971 0.9988 0.9995 0.9998 0.9999 1.0000. 7.90 0.0004 0.0033 0.0149 0.0453 0.1055 0.2006 0.3257 0.4670 0.6065 0.7290 0.8257 0.8952 0.9409 0.9687 0.9844 0.9926 0.9967 0.9986 0.9994 0.9998 0.9999 1.0000. 8.00 0.0003 0.0030 0.0138 0.0424 0.0996 0.1912 0.3134 0.4530 0.5925 0.7166 0.8159 0.8881 0.9362 0.9658 0.9827 0.9918 0.9963 0.9984 0.9993 0.9997 0.9999 1.0000. t x 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21. 7.10 0.0008 0.0067 0.0275 0.0767 0.1641 0.2881 0.4349 0.5838 0.7160 0.8202 0.8942 0.9420 0.9703 0.9857 0.9935 0.9972 0.9989 0.9996 0.9998 0.9999 1.0000 1.0000. 7.20 0.0007 0.0061 0.0255 0.0719 0.1555 0.2759 0.4204 0.5689 0.7027 0.8096 0.8867 0.9371 0.9673 0.9841 0.9927 0.9969 0.9987 0.9995 0.9998 0.9999 1.0000 1.0000. 7.30 0.0007 0.0056 0.0236 0.0674 0.1473 0.2640 0.4060 0.5541 0.6892 0.7988 0.8788 0.9319 0.9642 0.9824 0.9918 0.9964 0.9985 0.9994 0.9998 0.9999 1.0000 1.0000. 7.40 0.0006 0.0051 0.0219 0.0632 0.1395 0.2526 0.3920 0.5393 0.6757 0.7877 0.8707 0.9265 0.9609 0.9805 0.9908 0.9959 0.9983 0.9993 0.9997 0.9999 1.0000 1.0000. 7.50 0.0006 0.0047 0.0203 0.0591 0.1321 0.2414 0.3782 0.5246 0.6620 0.7764 0.8622 0.9208 0.9573 0.9784 0.9897 0.9954 0.9980 0.9992 0.9997 0.9999 1.0000 1.0000. (continued ). 853.

<span class='text_page_counter'>(398)</span> APPENDIX C. 854. t x 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23. 8.10 0.0003 0.0028 0.0127 0.0396 0.0940 0.1822 0.3013 0.4391 0.5786 0.7041 0.8058 0.8807 0.9313 0.9628 0.9810 0.9908 0.9958 0.9982 0.9992 0.9997 0.9999 1.0000 1.0000 1.0000. 8.20 0.0003 0.0025 0.0118 0.0370 0.0887 0.1736 0.2896 0.4254 0.5647 0.6915 0.7955 0.8731 0.9261 0.9595 0.9791 0.9898 0.9953 0.9979 0.9991 0.9997 0.9999 1.0000 1.0000 1.0000. 8.30 0.0002 0.0023 0.0109 0.0346 0.0837 0.1653 0.2781 0.4119 0.5507 0.6788 0.7850 0.8652 0.9207 0.9561 0.9771 0.9887 0.9947 0.9977 0.9990 0.9996 0.9998 0.9999 1.0000 1.0000. 8.40 0.0002 0.0021 0.0100 0.0323 0.0789 0.1573 0.2670 0.3987 0.5369 0.6659 0.7743 0.8571 0.9150 0.9524 0.9749 0.9875 0.9941 0.9973 0.9989 0.9995 0.9998 0.9999 1.0000 1.0000. 8.50 0.0002 0.0019 0.0093 0.0301 0.0744 0.1496 0.2562 0.3856 0.5231 0.6530 0.7634 0.8487 0.9091 0.9486 0.9726 0.9862 0.9934 0.9970 0.9987 0.9995 0.9998 0.9999 1.0000 1.0000. 8.60 0.0002 0.0018 0.0086 0.0281 0.0701 0.1422 0.2457 0.3728 0.5094 0.6400 0.7522 0.8400 0.9029 0.9445 0.9701 0.9848 0.9926 0.9966 0.9985 0.9994 0.9998 0.9999 1.0000 1.0000. 8.70 0.0002 0.0016 0.0079 0.0262 0.0660 0.1352 0.2355 0.3602 0.4958 0.6269 0.7409 0.8311 0.8965 0.9403 0.9675 0.9832 0.9918 0.9962 0.9983 0.9993 0.9997 0.9999 1.0000 1.0000. 8.80 0.0002 0.0015 0.0073 0.0244 0.0621 0.1284 0.2256 0.3478 0.4823 0.6137 0.7294 0.8220 0.8898 0.9358 0.9647 0.9816 0.9909 0.9957 0.9981 0.9992 0.9997 0.9999 1.0000 1.0000. 8.90 0.0001 0.0014 0.0068 0.0228 0.0584 0.1219 0.2160 0.3357 0.4689 0.6006 0.7178 0.8126 0.8829 0.9311 0.9617 0.9798 0.9899 0.9952 0.9978 0.9991 0.9996 0.9998 0.9999 1.0000. 9.00 0.0001 0.0012 0.0062 0.0212 0.0550 0.1157 0.2068 0.3239 0.4557 0.5874 0.7060 0.8030 0.8758 0.9261 0.9585 0.9780 0.9889 0.9947 0.9976 0.9989 0.9996 0.9998 0.9999 1.0000. x. 9.10. 9.20. 9.30. 9.40. 9.50. 9.60. 9.70. 9.80. 9.90. 10.00. 0. 0.0001. 0.0001. 0.0001. 0.0001. 0.0001. 0.0001. 0.0001. 0.0001. 
0.0001. 0.0000. 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24. 0.0011 0.0058 0.0198 0.0517 0.1098 0.1978 0.3123 0.4426 0.5742 0.6941 0.7932 0.8684 0.9210 0.9552 0.9760 0.9878 0.9941 0.9973 0.9988 0.9995 0.9998 0.9999 1.0000 1.0000. 0.0010 0.0053 0.0184 0.0486 0.1041 0.1892 0.3010 0.4296 0.5611 0.6820 0.7832 0.8607 0.9156 0.9517 0.9738 0.9865 0.9934 0.9969 0.9986 0.9994 0.9998 0.9999 1.0000 1.0000. 0.0009 0.0049 0.0172 0.0456 0.0986 0.1808 0.2900 0.4168 0.5479 0.6699 0.7730 0.8529 0.9100 0.9480 0.9715 0.9852 0.9927 0.9966 0.9985 0.9993 0.9997 0.9999 1.0000 1.0000. 0.0009 0.0045 0.0160 0.0429 0.0935 0.1727 0.2792 0.4042 0.5349 0.6576 0.7626 0.8448 0.9042 0.9441 0.9691 0.9838 0.9919 0.9962 0.9983 0.9992 0.9997 0.9999 1.0000 1.0000. 0.0008 0.0042 0.0149 0.0403 0.0885 0.1649 0.2687 0.3918 0.5218 0.6453 0.7520 0.8364 0.8981 0.9400 0.9665 0.9823 0.9911 0.9957 0.9980 0.9991 0.9996 0.9999 0.9999 1.0000. 0.0007 0.0038 0.0138 0.0378 0.0838 0.1574 0.2584 0.3796 0.5089 0.6329 0.7412 0.8279 0.8919 0.9357 0.9638 0.9806 0.9902 0.9952 0.9978 0.9990 0.9996 0.9998 0.9999 1.0000. 0.0007 0.0035 0.0129 0.0355 0.0793 0.1502 0.2485 0.3676 0.4960 0.6205 0.7303 0.8191 0.8853 0.9312 0.9609 0.9789 0.9892 0.9947 0.9975 0.9989 0.9995 0.9998 0.9999 1.0000. 0.0006 0.0033 0.0120 0.0333 0.0750 0.1433 0.2388 0.3558 0.4832 0.6080 0.7193 0.8101 0.8786 0.9265 0.9579 0.9770 0.9881 0.9941 0.9972 0.9987 0.9995 0.9998 0.9999 1.0000. 0.0005 0.0030 0.0111 0.0312 0.0710 0.1366 0.2294 0.3442 0.4705 0.5955 0.7081 0.8009 0.8716 0.9216 0.9546 0.9751 0.9870 0.9935 0.9969 0.9986 0.9994 0.9997 0.9999 1.0000. 0.0005 0.0028 0.0103 0.0293 0.0671 0.1301 0.2202 0.3328 0.4579 0.5830 0.6968 0.7916 0.8645 0.9165 0.9513 0.9730 0.9857 0.9928 0.9965 0.9984 0.9993 0.9997 0.9999 1.0000. t.

<span class='text_page_counter'>(399)</span> APPENDIX C t x. 11.00. 12.00. 13.00. 14.00. 15.00. 16.00. 17.00. 18.00. 19.00. 20.00. 0. 0.0000. 0.0000. 0.0000. 0.0000. 0.0000. 0.0000. 0.0000. 0.0000. 0.0000. 0.0000. 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40. 0.0002 0.0012 0.0049 0.0151 0.0375 0.0786 0.1432 0.2320 0.3405 0.4599 0.5793 0.6887 0.7813 0.8540 0.9074 0.9441 0.9678 0.9823 0.9907 0.9953 0.9977 0.9990 0.9995 0.9998 0.9999 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000. 0.0001 0.0005 0.0023 0.0076 0.0203 0.0458 0.0895 0.1550 0.2424 0.3472 0.4616 0.5760 0.6815 0.7720 0.8444 0.8987 0.9370 0.9626 0.9787 0.9884 0.9939 0.9970 0.9985 0.9993 0.9997 0.9999 0.9999 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000. 0.0000 0.0002 0.0011 0.0037 0.0107 0.0259 0.0540 0.0998 0.1658 0.2517 0.3532 0.4631 0.5730 0.6751 0.7636 0.8355 0.8905 0.9302 0.9573 0.9750 0.9859 0.9924 0.9960 0.9980 0.9990 0.9995 0.9998 0.9999 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000. 0.0000 0.0001 0.0005 0.0018 0.0055 0.0142 0.0316 0.0621 0.1094 0.1757 0.2600 0.3585 0.4644 0.5704 0.6694 0.7559 0.8272 0.8826 0.9235 0.9521 0.9712 0.9833 0.9907 0.9950 0.9974 0.9987 0.9994 0.9997 0.9999 0.9999 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000. 0.0000 0.0000 0.0002 0.0009 0.0028 0.0076 0.0180 0.0374 0.0699 0.1185 0.1848 0.2676 0.3632 0.4657 0.5681 0.6641 0.7489 0.8195 0.8752 0.9170 0.9469 0.9673 0.9805 0.9888 0.9938 0.9967 0.9983 0.9991 0.9996 0.9998 0.9999 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000. 
0.0000 0.0000 0.0001 0.0004 0.0014 0.0040 0.0100 0.0220 0.0433 0.0774 0.1270 0.1931 0.2745 0.3675 0.4667 0.5660 0.6593 0.7423 0.8122 0.8682 0.9108 0.9418 0.9633 0.9777 0.9869 0.9925 0.9959 0.9978 0.9989 0.9994 0.9997 0.9999 0.9999 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000. 0.0000 0.0000 0.0000 0.0002 0.0007 0.0021 0.0054 0.0126 0.0261 0.0491 0.0847 0.1350 0.2009 0.2808 0.3715 0.4677 0.5640 0.6550 0.7363 0.8055 0.8615 0.9047 0.9367 0.9594 0.9748 0.9848 0.9912 0.9950 0.9973 0.9986 0.9993 0.9996 0.9998 0.9999 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000. 0.0000 0.0000 0.0000 0.0001 0.0003 0.0010 0.0029 0.0071 0.0154 0.0304 0.0549 0.0917 0.1426 0.2081 0.2867 0.3751 0.4686 0.5622 0.6509 0.7307 0.7991 0.8551 0.8989 0.9317 0.9554 0.9718 0.9827 0.9897 0.9941 0.9967 0.9982 0.9990 0.9995 0.9998 0.9999 0.9999 1.0000 1.0000 1.0000 1.0000. 0.0000 0.0000 0.0000 0.0000 0.0002 0.0005 0.0015 0.0039 0.0089 0.0183 0.0347 0.0606 0.0984 0.1497 0.2148 0.2920 0.3784 0.4695 0.5606 0.6472 0.7255 0.7931 0.8490 0.8933 0.9269 0.9514 0.9687 0.9805 0.9882 0.9930 0.9960 0.9978 0.9988 0.9994 0.9997 0.9998 0.9999 1.0000 1.0000 1.0000. 0.0000 0.0000 0.0000 0.0000 0.0001 0.0003 0.0008 0.0021 0.0050 0.0108 0.0214 0.0390 0.0661 0.1049 0.1565 0.2211 0.2970 0.3814 0.4703 0.5591 0.6437 0.7206 0.7875 0.8432 0.8878 0.9221 0.9475 0.9657 0.9782 0.9865 0.9919 0.9953 0.9973 0.9985 0.9992 0.9996 0.9998 0.9999 0.9999 1.0000. 855.
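Each entry in the Appendix C table is a cumulative Poisson probability P(X ≤ x) for mean λ, and can be reproduced directly from the Poisson formula. A minimal sketch in Python (the function name is illustrative, not from the text):

```python
import math

def poisson_cdf(lam, x):
    """Cumulative Poisson probability P(X <= x) for mean lam."""
    return sum(math.exp(-lam) * lam**k / math.factorial(k) for k in range(x + 1))

# Reproduce the table entry for lambda = 9.00, x = 5:
print(round(poisson_cdf(9.0, 5), 4))  # 0.1157, matching Appendix C
```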

<span class='text_page_counter'>(400)</span> APPENDIX D. 856. APPENDIX D Standard Normal Distribution Table 0.3944. 0. z = 1.25. z. z. 0. 0.01. 0.02. 0.03. 0.04. 0.05. 0.06. 0.07. 0.08. 0.09. 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0 1.1 1.2 1.3 1.4 1.5 1.6 1.7 1.8 1.9 2.0 2.1 2.2 2.3 2.4 2.5 2.6 2.7 2.8 2.9 3.0. 0.0000 0.0398 0.0793 0.1179 0.1554 0.1915 0.2257 0.2580 0.2881 0.3159 0.3413 0.3643 0.3849 0.4032 0.4192 0.4332 0.4452 0.4554 0.4641 0.4713 0.4772 0.4821 0.4861 0.4893 0.4918 0.4938 0.4953 0.4965 0.4974 0.4981 0.4987. 0.0040 0.0438 0.0832 0.1217 0.1591 0.1950 0.2291 0.2611 0.2910 0.3186 0.3438 0.3665 0.3869 0.4049 0.4207 0.4345 0.4463 0.4564 0.4649 0.4719 0.4778 0.4826 0.4864 0.4896 0.4920 0.4940 0.4955 0.4966 0.4975 0.4982 0.4987. 0.0080 0.0478 0.0871 0.1255 0.1628 0.1985 0.2324 0.2642 0.2939 0.3212 0.3461 0.3686 0.3888 0.4066 0.4222 0.4357 0.4474 0.4573 0.4656 0.4726 0.4783 0.4830 0.4868 0.4898 0.4922 0.4941 0.4956 0.4967 0.4976 0.4982 0.4987. 0.0120 0.0517 0.0910 0.1293 0.1664 0.2019 0.2357 0.2673 0.2967 0.3238 0.3485 0.3708 0.3907 0.4082 0.4236 0.4370 0.4484 0.4582 0.4664 0.4732 0.4788 0.4834 0.4871 0.4901 0.4925 0.4943 0.4957 0.4968 0.4977 0.4983 0.4988. 0.0160 0.0557 0.0948 0.1331 0.1700 0.2054 0.2389 0.2704 0.2995 0.3264 0.3508 0.3729 0.3925 0.4099 0.4251 0.4382 0.4495 0.4591 0.4671 0.4738 0.4793 0.4838 0.4875 0.4904 0.4927 0.4945 0.4959 0.4969 0.4977 0.4984 0.4988. 0.0199 0.0596 0.0987 0.1368 0.1736 0.2088 0.2422 0.2734 0.3023 0.3289 0.3531 0.3749 0.3944 0.4115 0.4265 0.4394 0.4505 0.4599 0.4678 0.4744 0.4798 0.4842 0.4878 0.4906 0.4929 0.4946 0.4960 0.4970 0.4978 0.4984 0.4989. 0.0239 0.0636 0.1026 0.1406 0.1772 0.2123 0.2454 0.2764 0.3051 0.3315 0.3554 0.3770 0.3962 0.4131 0.4279 0.4406 0.4515 0.4608 0.4686 0.4750 0.4803 0.4846 0.4881 0.4909 0.4931 0.4948 0.4961 0.4971 0.4979 0.4985 0.4989. 
0.0279 0.0675 0.1064 0.1443 0.1808 0.2157 0.2486 0.2794 0.3078 0.3340 0.3577 0.3790 0.3980 0.4147 0.4292 0.4418 0.4525 0.4616 0.4693 0.4756 0.4808 0.4850 0.4884 0.4911 0.4932 0.4949 0.4962 0.4972 0.4979 0.4985 0.4989. 0.0319 0.0714 0.1103 0.1480 0.1844 0.2190 0.2517 0.2823 0.3106 0.3365 0.3599 0.3810 0.3997 0.4162 0.4306 0.4429 0.4535 0.4625 0.4699 0.4761 0.4812 0.4854 0.4887 0.4913 0.4934 0.4951 0.4963 0.4973 0.4980 0.4986 0.4990. 0.0359 0.0753 0.1141 0.1517 0.1879 0.2224 0.2549 0.2852 0.3133 0.3389 0.3621 0.3830 0.4015 0.4177 0.4319 0.4441 0.4545 0.4633 0.4706 0.4767 0.4817 0.4857 0.4890 0.4916 0.4936 0.4952 0.4964 0.4974 0.4981 0.4986 0.4990.
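The Appendix D table gives the area under the standard normal curve between 0 and z (the heading example: z = 1.25 gives 0.3944). Each entry can be checked with the error function; a sketch:

```python
import math

def area_0_to_z(z):
    """Area under the standard normal curve between 0 and z,
    as tabulated in Appendix D."""
    return 0.5 * math.erf(z / math.sqrt(2.0))

print(round(area_0_to_z(1.25), 4))  # 0.3944, the example shown above the table
```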

<span class='text_page_counter'>(401)</span> APPENDIX E. Values of e−la. APPENDIX E Exponential Distribution Table. 857. la. e−la. la. e−la. la. e−la. la. e−la. la. e−la. 0.00 0.05 0.10 0.15 0.20 0.25 0.30 0.35 0.40 0.45 0.50 0.55 0.60 0.65 0.70 0.75 0.80 0.85 0.90 0.95 1.00 1.05 1.10 1.15 1.20 1.25 1.30 1.35 1.40 1.45 1.50 1.55 1.60 1.65 1.70 1.75 1.80 1.85 1.90 1.95 2.00. 1.0000 0.9512 0.9048 0.8607 0.8187 0.7788 0.7408 0.7047 0.6703 0.6376 0.6065 0.5769 0.5488 0.5220 0.4966 0.4724 0.4493 0.4274 0.4066 0.3867 0.3679 0.3499 0.3329 0.3166 0.3012 0.2865 0.2725 0.2592 0.2466 0.2346 0.2231 0.2122 0.2019 0.1920 0.1827 0.1738 0.1653 0.1572 0.1496 0.1423 0.1353. 2.05 2.10 2.15 2.20 2.25 2.30 2.35 2.40 2.45 2.50 2.55 2.60 2.65 2.70 2.75 2.80 2.85 2.90 2.95 3.00 3.05 3.10 3.15 3.20 3.25 3.30 3.35 3.40 3.45 3.50 3.55 3.60 3.65 3.70 3.75 3.80 3.85 3.90 3.95 4.00. 0.1287 0.1225 0.1165 0.1108 0.1054 0.1003 0.0954 0.0907 0.0863 0.0821 0.0781 0.0743 0.0707 0.0672 0.0639 0.0608 0.0578 0.0550 0.0523 0.0498 0.0474 0.0450 0.0429 0.0408 0.0388 0.0369 0.0351 0.0334 0.0317 0.0302 0.0287 0.0273 0.0260 0.0247 0.0235 0.0224 0.0213 0.0202 0.0193 0.0183. 4.05 4.10 4.15 4.20 4.25 4.30 4.35 4.40 4.45 4.50 4.55 4.60 4.65 4.70 4.75 4.80 4.85 4.90 4.95 5.00 5.05 5.10 5.15 5.20 5.25 5.30 5.35 5.40 5.45 5.50 5.55 5.60 5.65 5.70 5.75 5.80 5.85 5.90 5.95 6.00. 0.0174 0.0166 0.0158 0.0150 0.0143 0.0136 0.0129 0.0123 0.0117 0.0111 0.0106 0.0101 0.0096 0.0091 0.0087 0.0082 0.0078 0.0074 0.0071 0.0067 0.0064 0.0061 0.0058 0.0055 0.0052 0.0050 0.0047 0.0045 0.0043 0.0041 0.0039 0.0037 0.0035 0.0033 0.0032 0.0030 0.0029 0.0027 0.0026 0.0025. 6.05 6.10 6.15 6.20 6.25 6.30 6.35 6.40 6.45 6.50 6.55 6.60 6.65 6.70 6.75 6.80 6.85 6.90 6.95 7.00 7.05 7.10 7.15 7.20 7.25 7.30 7.35 7.40 7.45 7.50 7.55 7.60 7.65 7.70 7.75 7.80 7.85 7.90 7.95 8.00. 
0.0024 0.0022 0.0021 0.0020 0.0019 0.0018 0.0017 0.0017 0.0016 0.0015 0.0014 0.0014 0.0013 0.0012 0.0012 0.0011 0.0011 0.0010 0.0010 0.0009 0.0009 0.0008 0.0008 0.0007 0.0007 0.0007 0.0006 0.0006 0.0006 0.0006 0.0005 0.0005 0.0005 0.0005 0.0004 0.0004 0.0004 0.0004 0.0004 0.0003. 8.05 8.10 8.15 8.20 8.25 8.30 8.35 8.40 8.45 8.50 8.55 8.60 8.65 8.70 8.75 8.80 8.85 8.90 8.95 9.00 9.05 9.10 9.15 9.20 9.25 9.30 9.35 9.40 9.45 9.50 9.55 9.60 9.65 9.70 9.75 9.80 9.85 9.90 9.95 10.00. 0.0003 0.0003 0.0003 0.0003 0.0003 0.0002 0.0002 0.0002 0.0002 0.0002 0.0002 0.0002 0.0002 0.0002 0.0002 0.0002 0.0001 0.0001 0.0001 0.0001 0.0001 0.0001 0.0001 0.0001 0.0001 0.0001 0.0001 0.0001 0.0001 0.0001 0.0001 0.0001 0.0001 0.0001 0.0001 0.0001 0.0001 0.0001 0.0000 0.0000.
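Appendix E simply tabulates e^(−λa), so any entry is a single exponential evaluation:

```python
import math

# Each Appendix E entry is e^(-lambda*a) for the listed value of lambda*a:
for la in (0.50, 2.00, 5.00):
    print(la, round(math.exp(-la), 4))  # 0.6065, 0.1353, 0.0067
```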

<span class='text_page_counter'>(402)</span> 858. APPENDIX F. APPENDIX F df = 10. Values of t for Selected Probabilities 0.05. 0.05. 0. t = –1.8125. t = 1.8125. t. PROBABILITIES (OR AREAS UNDER t-DISTRIBUTION CURVE). Conf. Level One Tail Two Tails. 0.1 0.45 0.9. 0.3 0.35 0.7. 0.5 0.25 0.5. 0.7 0.15 0.3. 0.1584 0.1421 0.1366 0.1338 0.1322 0.1311 0.1303 0.1297 0.1293 0.1289 0.1286 0.1283 0.1281 0.1280 0.1278 0.1277 0.1276 0.1274 0.1274 0.1273 0.1272 0.1271 0.1271 0.1270 0.1269 0.1269 0.1268 0.1268 0.1268 0.1267 0.1265 0.1263 0.1262 0.1261 0.1261 0.1260 0.1260 0.1258 0.1257 0.1257. 0.5095 0.4447 0.4242 0.4142 0.4082 0.4043 0.4015 0.3995 0.3979 0.3966 0.3956 0.3947 0.3940 0.3933 0.3928 0.3923 0.3919 0.3915 0.3912 0.3909 0.3906 0.3904 0.3902 0.3900 0.3898 0.3896 0.3894 0.3893 0.3892 0.3890 0.3881 0.3875 0.3872 0.3869 0.3867 0.3866 0.3864 0.3858 0.3855 0.3853. 1.0000 0.8165 0.7649 0.7407 0.7267 0.7176 0.7111 0.7064 0.7027 0.6998 0.6974 0.6955 0.6938 0.6924 0.6912 0.6901 0.6892 0.6884 0.6876 0.6870 0.6864 0.6858 0.6853 0.6848 0.6844 0.6840 0.6837 0.6834 0.6830 0.6828 0.6807 0.6794 0.6786 0.6780 0.6776 0.6772 0.6770 0.6755 0.6750 0.6745. 1.9626 1.3862 1.2498 1.1896 1.1558 1.1342 1.1192 1.1081 1.0997 1.0931 1.0877 1.0832 1.0795 1.0763 1.0735 1.0711 1.0690 1.0672 1.0655 1.0640 1.0627 1.0614 1.0603 1.0593 1.0584 1.0575 1.0567 1.0560 1.0553 1.0547 1.0500 1.0473 1.0455 1.0442 1.0432 1.0424 1.0418 1.0386 1.0375 1.0364. df 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 40 50 60 70 80 90 100 250 500 ∞. 0.8 0.1 0.2. 0.9 0.05 0.1. 0.95 0.025 0.05. 0.98 0.01 0.02. 0.99 0.005 0.01. 6.3137 2.9200 2.3534 2.1318 2.0150 1.9432 1.8946 1.8595 1.8331 1.8125 1.7959 1.7823 1.7709 1.7613 1.7531 1.7459 1.7396 1.7341 1.7291 1.7247 1.7207 1.7171 1.7139 1.7109 1.7081 1.7056 1.7033 1.7011 1.6991 1.6973 1.6839 1.6759 1.6706 1.6669 1.6641 1.6620 1.6602 1.6510 1.6479 1.6449. 
12.7062 4.3027 3.1824 2.7765 2.5706 2.4469 2.3646 2.3060 2.2622 2.2281 2.2010 2.1788 2.1604 2.1448 2.1315 2.1199 2.1098 2.1009 2.0930 2.0860 2.0796 2.0739 2.0687 2.0639 2.0595 2.0555 2.0518 2.0484 2.0452 2.0423 2.0211 2.0086 2.0003 1.9944 1.9901 1.9867 1.9840 1.9695 1.9647 1.9600. 31.8210 6.9645 4.5407 3.7469 3.3649 3.1427 2.9979 2.8965 2.8214 2.7638 2.7181 2.6810 2.6503 2.6245 2.6025 2.5835 2.5669 2.5524 2.5395 2.5280 2.5176 2.5083 2.4999 2.4922 2.4851 2.4786 2.4727 2.4671 2.4620 2.4573 2.4233 2.4033 2.3901 2.3808 2.3739 2.3685 2.3642 2.3414 2.3338 2.3263. 63.6559 9.9250 5.8408 4.6041 4.0321 3.7074 3.4995 3.3554 3.2498 3.1693 3.1058 3.0545 3.0123 2.9768 2.9467 2.9208 2.8982 2.8784 2.8609 2.8453 2.8314 2.8188 2.8073 2.7970 2.7874 2.7787 2.7707 2.7633 2.7564 2.7500 2.7045 2.6778 2.6603 2.6479 2.6387 2.6316 2.6259 2.5956 2.5857 2.5758. Values of t 3.0777 1.8856 1.6377 1.5332 1.4759 1.4398 1.4149 1.3968 1.3830 1.3722 1.3634 1.3562 1.3502 1.3450 1.3406 1.3368 1.3334 1.3304 1.3277 1.3253 1.3232 1.3212 1.3195 1.3178 1.3163 1.3150 1.3137 1.3125 1.3114 1.3104 1.3031 1.2987 1.2958 1.2938 1.2922 1.2910 1.2901 1.2849 1.2832 1.2816.
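A useful cross-check between Appendices F and H: a squared two-tailed t critical value equals the F critical value with 1 numerator degree of freedom. For example, with 10 degrees of freedom the two-tail 0.05 value is t = 2.2281, and the upper-5% F value for D1 = 1, D2 = 10 is 4.965:

```python
t_crit = 2.2281   # Appendix F: df = 10, two tails 0.05
f_crit = 4.965    # Appendix H (upper 5%): D1 = 1, D2 = 10

# t squared should agree with F to table rounding:
print(round(t_crit**2, 2))  # 4.96
```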

<span class='text_page_counter'>(403)</span> APPENDIX G. APPENDIX G. 859. f (χ2). Values of χ for Selected Probabilities 2. df = 5 0.10. χ2. χ2 = 9.2363. PROBABILITIES (OR AREAS UNDER CHI-SQUARE DISTRIBUTION CURVE ABOVE GIVEN CHI-SQUARE VALUES). 0.995. 0.99. 0.975. 0.95. df 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30. 0.90. 0.10. 0.05. 0.025. 0.01. 0.005. 5.0239 7.3778 9.3484 11.1433 12.8325 14.4494 16.0128 17.5345 19.0228 20.4832 21.9200 23.3367 24.7356 26.1189 27.4884 28.8453 30.1910 31.5264 32.8523 34.1696 35.4789 36.7807 38.0756 39.3641 40.6465 41.9231 43.1945 44.4608 45.7223 46.9792. 6.6349 9.2104 11.3449 13.2767 15.0863 16.8119 18.4753 20.0902 21.6660 23.2093 24.7250 26.2170 27.6882 29.1412 30.5780 31.9999 33.4087 34.8052 36.1908 37.5663 38.9322 40.2894 41.6383 42.9798 44.3140 45.6416 46.9628 48.2782 49.5878 50.8922. 7.8794 10.5965 12.8381 14.8602 16.7496 18.5475 20.2777 21.9549 23.5893 25.1881 26.7569 28.2997 29.8193 31.3194 32.8015 34.2671 35.7184 37.1564 38.5821 39.9969 41.4009 42.7957 44.1814 45.5584 46.9280 48.2898 49.6450 50.9936 52.3355 53.6719. Values of Chi-Squared 0.0000 0.0100 0.0717 0.2070 0.4118 0.6757 0.9893 1.3444 1.7349 2.1558 2.6032 3.0738 3.5650 4.0747 4.6009 5.1422 5.6973 6.2648 6.8439 7.4338 8.0336 8.6427 9.2604 9.8862 10.5196 11.1602 11.8077 12.4613 13.1211 13.7867. 0.0002 0.0201 0.1148 0.2971 0.5543 0.8721 1.2390 1.6465 2.0879 2.5582 3.0535 3.5706 4.1069 4.6604 5.2294 5.8122 6.4077 7.0149 7.6327 8.2604 8.8972 9.5425 10.1957 10.8563 11.5240 12.1982 12.8785 13.5647 14.2564 14.9535. 0.0010 0.0506 0.2158 0.4844 0.8312 1.2373 1.6899 2.1797 2.7004 3.2470 3.8157 4.4038 5.0087 5.6287 6.2621 6.9077 7.5642 8.2307 8.9065 9.5908 10.2829 10.9823 11.6885 12.4011 13.1197 13.8439 14.5734 15.3079 16.0471 16.7908. 
0.0039 0.1026 0.3518 0.7107 1.1455 1.6354 2.1673 2.7326 3.3251 3.9403 4.5748 5.2260 5.8919 6.5706 7.2609 7.9616 8.6718 9.3904 10.1170 10.8508 11.5913 12.3380 13.0905 13.8484 14.6114 15.3792 16.1514 16.9279 17.7084 18.4927. 0.0158 0.2107 0.5844 1.0636 1.6103 2.2041 2.8331 3.4895 4.1682 4.8652 5.5778 6.3038 7.0415 7.7895 8.5468 9.3122 10.0852 10.8649 11.6509 12.4426 13.2396 14.0415 14.8480 15.6587 16.4734 17.2919 18.1139 18.9392 19.7677 20.5992. 2.7055 4.6052 6.2514 7.7794 9.2363 10.6446 12.0170 13.3616 14.6837 15.9872 17.2750 18.5493 19.8119 21.0641 22.3071 23.5418 24.7690 25.9894 27.2036 28.4120 29.6151 30.8133 32.0069 33.1962 34.3816 35.5632 36.7412 37.9159 39.0875 40.2560. 3.8415 5.9915 7.8147 9.4877 11.0705 12.5916 14.0671 15.5073 16.9190 18.3070 19.6752 21.0261 22.3620 23.6848 24.9958 26.2962 27.5871 28.8693 30.1435 31.4104 32.6706 33.9245 35.1725 36.4150 37.6525 38.8851 40.1133 41.3372 42.5569 43.7730.
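For an even number of degrees of freedom, the chi-square upper-tail area has a closed form (a Poisson partial sum), which lets you verify Appendix G entries with elementary arithmetic. A sketch under that identity:

```python
import math

def chi2_upper_tail_even_df(x, df):
    """Upper-tail area P(chi-square > x) for even df,
    using the Poisson partial-sum identity."""
    k = df // 2
    lam = x / 2.0
    return sum(math.exp(-lam) * lam**j / math.factorial(j) for j in range(k))

# Appendix G gives 9.4877 as the upper 5% point for df = 4:
print(round(chi2_upper_tail_even_df(9.4877, 4), 3))  # 0.05
```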

<span class='text_page_counter'>(404)</span> 860. APPENDIX H. APPENDIX H. f(F). df = D1 = 5 D2 = 10. F-Distribution Table: Upper 5% Probability (or 5% Area) under F-Distribution Curve. 0.05. F. F = 3.326. DENOMINATOR df  D2. NUMERATOR df  D1. 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 24 30 40 50 100 200 300. 1 161.446 18.513 10.128 7.709 6.608 5.987 5.591 5.318 5.117 4.965 4.844 4.747 4.667 4.600 4.543 4.494 4.451 4.414 4.381 4.351 4.260 4.171 4.085 4.034 3.936 3.888 3.873. 2 199.499 19.000 9.552 6.944 5.786 5.143 4.737 4.459 4.256 4.103 3.982 3.885 3.806 3.739 3.682 3.634 3.592 3.555 3.522 3.493 3.403 3.316 3.232 3.183 3.087 3.041 3.026. 3 215.707 19.164 9.277 6.591 5.409 4.757 4.347 4.066 3.863 3.708 3.587 3.490 3.411 3.344 3.287 3.239 3.197 3.160 3.127 3.098 3.009 2.922 2.839 2.790 2.696 2.650 2.635. 4 224.583 19.247 9.117 6.388 5.192 4.534 4.120 3.838 3.633 3.478 3.357 3.259 3.179 3.112 3.056 3.007 2.965 2.928 2.895 2.866 2.776 2.690 2.606 2.557 2.463 2.417 2.402. 11 242.981 19.405 8.763 5.936 4.704 4.027 3.603 3.313 3.102 2.943 2.818 2.717 2.635 2.565 2.507 2.456. 12 243.905 19.412 8.745 5.912 4.678 4.000 3.575 3.284 3.073 2.913 2.788 2.687 2.604 2.534 2.475 2.425. 13 244.690 19.419 8.729 5.891 4.655 3.976 3.550 3.259 3.048 2.887 2.761 2.660 2.577 2.507 2.448 2.397. 14 245.363 19.424 8.715 5.873 4.636 3.956 3.529 3.237 3.025 2.865 2.739 2.637 2.554 2.484 2.424 2.373. 5 230.160 19.296 9.013 6.256 5.050 4.387 3.972 3.688 3.482 3.326 3.204 3.106 3.025 2.958 2.901 2.852 2.810 2.773 2.740 2.711 2.621 2.534 2.449 2.400 2.305 2.259 2.244. 6 233.988 19.329 8.941 6.163 4.950 4.284 3.866 3.581 3.374 3.217 3.095 2.996 2.915 2.848 2.790 2.741 2.699 2.661 2.628 2.599 2.508 2.421 2.336 2.286 2.191 2.144 2.129. 7 236.767 19.353 8.887 6.094 4.876 4.207 3.787 3.500 3.293 3.135 3.012 2.913 2.832 2.764 2.707 2.657 2.614 2.577 2.544 2.514 2.423 2.334 2.249 2.199 2.103 2.056 2.040. 
8 238.884 19.371 8.845 6.041 4.818 4.147 3.726 3.438 3.230 3.072 2.948 2.849 2.767 2.699 2.641 2.591 2.548 2.510 2.477 2.447 2.355 2.266 2.180 2.130 2.032 1.985 1.969. 9 240.543 19.385 8.812 5.999 4.772 4.099 3.677 3.388 3.179 3.020 2.896 2.796 2.714 2.646 2.588 2.538 2.494 2.456 2.423 2.393 2.300 2.211 2.124 2.073 1.975 1.927 1.911. 10 241.882 19.396 8.785 5.964 4.735 4.060 3.637 3.347 3.137 2.978 2.854 2.753 2.671 2.602 2.544 2.494 2.450 2.412 2.378 2.348 2.255 2.165 2.077 2.026 1.927 1.878 1.862. 17 246.917 19.437 8.683 5.832 4.590 3.908 3.480 3.187 2.974 2.812 2.685 2.583 2.499 2.428 2.368 2.317. 18 247.324 19.440 8.675 5.821 4.579 3.896 3.467 3.173 2.960 2.798 2.671 2.568 2.484 2.413 2.353 2.302. 19 247.688 19.443 8.667 5.811 4.568 3.884 3.455 3.161 2.948 2.785 2.658 2.555 2.471 2.400 2.340 2.288. 20 248.016 19.446 8.660 5.803 4.558 3.874 3.445 3.150 2.936 2.774 2.646 2.544 2.459 2.388 2.328 2.276. DENOMINATOR df  D2. NUMERATOR df  D1. 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16. 15 245.949 19.429 8.703 5.858 4.619 3.938 3.511 3.218 3.006 2.845 2.719 2.617 2.533 2.463 2.403 2.352. 16 246.466 19.433 8.692 5.844 4.604 3.922 3.494 3.202 2.989 2.828 2.701 2.599 2.515 2.445 2.385 2.333.

<span class='text_page_counter'>(405)</span> APPENDIX H DENOMINATOR df  D2. 17 18 19 20 24 30 40 50 100 200 300. NUMERATOR df  D1. 11 2.413 2.374 2.340 2.310 2.216 2.126 2.038 1.986 1.886 1.837 1.821. 12 2.381 2.342 2.308 2.278 2.183 2.092 2.003 1.952 1.850 1.801 1.785. 13 2.353 2.314 2.280 2.250 2.155 2.063 1.974 1.921 1.819 1.769 1.753. 14 2.329 2.290 2.256 2.225 2.130 2.037 1.948 1.895 1.792 1.742 1.725. DENOMINATOR df  D2. 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 24 30 40 50 100 200 300. 15 2.308 2.269 2.234 2.203 2.108 2.015 1.924 1.871 1.768 1.717 1.700. 16 2.289 2.250 2.215 2.184 2.088 1.995 1.904 1.850 1.746 1.694 1.677. 17 2.272 2.233 2.198 2.167 2.070 1.976 1.885 1.831 1.726 1.674 1.657. 18 2.257 2.217 2.182 2.151 2.054 1.960 1.868 1.814 1.708 1.656 1.638. 19 2.243 2.203 2.168 2.137 2.040 1.945 1.853 1.798 1.691 1.639 1.621. 20 2.230 2.191 2.155 2.124 2.027 1.932 1.839 1.784 1.676 1.623 1.606. NUMERATOR df  D1. 24 249.052 19.454 8.638 5.774 4.527 3.841 3.410 3.115 2.900 2.737 2.609 2.505 2.420 2.349 2.288 2.235 2.190 2.150 2.114 2.082 1.984 1.887 1.793 1.737 1.627 1.572 1.554. 30 250.096 19.463 8.617 5.746 4.496 3.808 3.376 3.079 2.864 2.700 2.570 2.466 2.380 2.308 2.247 2.194 2.148 2.107 2.071 2.039 1.939 1.841 1.744 1.687 1.573 1.516 1.497. 40 251.144 19.471 8.594 5.717 4.464 3.774 3.340 3.043 2.826 2.661 2.531 2.426 2.339 2.266 2.204 2.151 2.104 2.063 2.026 1.994 1.892 1.792 1.693 1.634 1.515 1.455 1.435. 50 251.774 19.476 8.581 5.699 4.444 3.754 3.319 3.020 2.803 2.637 2.507 2.401 2.314 2.241 2.178 2.124 2.077 2.035 1.999 1.966 1.863 1.761 1.660 1.599 1.477 1.415 1.393. 100 253.043 19.486 8.554 5.664 4.405 3.712 3.275 2.975 2.756 2.588 2.457 2.350 2.261 2.187 2.123 2.068 2.020 1.978 1.940 1.907 1.800 1.695 1.589 1.525 1.392 1.321 1.296. 200 253.676 19.491 8.540 5.646 4.385 3.690 3.252 2.951 2.731 2.563 2.431 2.323 2.234 2.159 2.095 2.039 1.991 1.948 1.910 1.875 1.768 1.660 1.551 1.484 1.342 1.263 1.234. 
300 253.887 19.492 8.536 5.640 4.378 3.683 3.245 2.943 2.723 2.555 2.422 2.314 2.225 2.150 2.085 2.030 1.981 1.938 1.899 1.865 1.756 1.647 1.537 1.469 1.323 1.240 1.210 (continued ). 861.
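One special case of the F distribution is simple enough to verify by hand: with 2 numerator degrees of freedom, the upper-tail area is P(F > f) = (1 + 2f/D2)^(−D2/2). Checking the tabled upper-5% point for D1 = 2, D2 = 10 (4.103):

```python
# Closed-form upper-tail area for the F distribution when D1 = 2:
d2 = 10
f = 4.103  # upper 5% point for D1 = 2, D2 = 10 from the table above
p = (1 + 2 * f / d2) ** (-d2 / 2)
print(round(p, 3))  # 0.05
```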

<span class='text_page_counter'>(406)</span> 862. APPENDIX H. APPENDIX H (continued). f(F). df = D1 = 9 D2 = 15. F-Distribution Table: Upper 2.5% Probability (or 2.5% Area) under F-Distribution Curve. 0.025. F = 3.123. DENOMINATOR df  D2. NUMERATOR df  D1. 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 24 30 40 50 100 200 300. 1 647.793 38.506 17.443 12.218 10.007 8.813 8.073 7.571 7.209 6.937 6.724 6.554 6.414 6.298 6.200 6.115 6.042 5.978 5.922 5.871 5.717 5.568 5.424 5.340 5.179 5.100 5.075. 2 799.482 39.000 16.044 10.649 8.434 7.260 6.542 6.059 5.715 5.456 5.256 5.096 4.965 4.857 4.765 4.687 4.619 4.560 4.508 4.461 4.319 4.182 4.051 3.975 3.828 3.758 3.735. 3 864.151 39.166 15.439 9.979 7.764 6.599 5.890 5.416 5.078 4.826 4.630 4.474 4.347 4.242 4.153 4.077 4.011 3.954 3.903 3.859 3.721 3.589 3.463 3.390 3.250 3.182 3.160. 4 899.599 39.248 15.101 9.604 7.388 6.227 5.523 5.053 4.718 4.468 4.275 4.121 3.996 3.892 3.804 3.729 3.665 3.608 3.559 3.515 3.379 3.250 3.126 3.054 2.917 2.850 2.829. 5 921.835 39.298 14.885 9.364 7.146 5.988 5.285 4.817 4.484 4.236 4.044 3.891 3.767 3.663 3.576 3.502 3.438 3.382 3.333 3.289 3.155 3.026 2.904 2.833 2.696 2.630 2.609. 6 937.114 39.331 14.735 9.197 6.978 5.820 5.119 4.652 4.320 4.072 3.881 3.728 3.604 3.501 3.415 3.341 3.277 3.221 3.172 3.128 2.995 2.867 2.744 2.674 2.537 2.472 2.451. 7 948.203 39.356 14.624 9.074 6.853 5.695 4.995 4.529 4.197 3.950 3.759 3.607 3.483 3.380 3.293 3.219 3.156 3.100 3.051 3.007 2.874 2.746 2.624 2.553 2.417 2.351 2.330. 8 956.643 39.373 14.540 8.980 6.757 5.600 4.899 4.433 4.102 3.855 3.664 3.512 3.388 3.285 3.199 3.125 3.061 3.005 2.956 2.913 2.779 2.651 2.529 2.458 2.321 2.256 2.234. 9 963.279 39.387 14.473 8.905 6.681 5.523 4.823 4.357 4.026 3.779 3.588 3.436 3.312 3.209 3.123 3.049 2.985 2.929 2.880 2.837 2.703 2.575 2.452 2.381 2.244 2.178 2.156. 
10 968.634 39.398 14.419 8.844 6.619 5.461 4.761 4.295 3.964 3.717 3.526 3.374 3.250 3.147 3.060 2.986 2.922 2.866 2.817 2.774 2.640 2.511 2.388 2.317 2.179 2.113 2.091. 11 973.028 39.407 14.374 8.794 6.568 5.410 4.709 4.243 3.912 3.665 3.474 3.321 3.197 3.095 3.008 2.934 2.870 2.814 2.765 2.721 2.586 2.458 2.334 2.263 2.124 2.058 2.036. 18 990.345 39.442 14.196 8.592 6.362 5.202 4.501 4.034 3.701 3.453 3.261 3.108 2.983 2.879 2.792 2.717. 19 991.800 39.446 14.181 8.575 6.344 5.184 4.483 4.016 3.683 3.435 3.243 3.090 2.965 2.861 2.773 2.698. 20 993.081 39.448 14.167 8.560 6.329 5.168 4.467 3.999 3.667 3.419 3.226 3.073 2.948 2.844 2.756 2.681. 24 30 997.272 1001.405 39.457 39.465 14.124 14.081 8.511 8.461 6.278 6.227 5.117 5.065 4.415 4.362 3.947 3.894 3.614 3.560 3.365 3.311 3.173 3.118 3.019 2.963 2.893 2.837 2.789 2.732 2.701 2.644 2.625 2.568. DENOMINATOR df  D2. NUMERATOR df  D1. 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16. 12 976.725 39.415 14.337 8.751 6.525 5.366 4.666 4.200 3.868 3.621 3.430 3.277 3.153 3.050 2.963 2.889. 13 979.839 39.421 14.305 8.715 6.488 5.329 4.628 4.162 3.831 3.583 3.392 3.239 3.115 3.012 2.925 2.851. 14 982.545 39.427 14.277 8.684 6.456 5.297 4.596 4.130 3.798 3.550 3.359 3.206 3.082 2.979 2.891 2.817. 15 984.874 39.431 14.253 8.657 6.428 5.269 4.568 4.101 3.769 3.522 3.330 3.177 3.053 2.949 2.862 2.788. 16 986.911 39.436 14.232 8.633 6.403 5.244 4.543 4.076 3.744 3.496 3.304 3.152 3.027 2.923 2.836 2.761. 17 988.715 39.439 14.213 8.611 6.381 5.222 4.521 4.054 3.722 3.474 3.282 3.129 3.004 2.900 2.813 2.738.

<span class='text_page_counter'>(407)</span> APPENDIX H DENOMINATOR df  D2. 17 18 19 20 24 30 40 50 100 200 300. NUMERATOR df  D1. 12 2.825 2.769 2.720 2.676 2.541 2.412 2.288 2.216 2.077 2.010 1.988. 13 2.786 2.730 2.681 2.637 2.502 2.372 2.248 2.176 2.036 1.969 1.947. 14 2.753 2.696 2.647 2.603 2.468 2.338 2.213 2.140 2.000 1.932 1.910. 15 2.723 2.667 2.617 2.573 2.437 2.307 2.182 2.109 1.968 1.900 1.877. 16 2.697 2.640 2.591 2.547 2.411 2.280 2.154 2.081 1.939 1.870 1.848. 17 2.673 2.617 2.567 2.523 2.386 2.255 2.129 2.056 1.913 1.844 1.821. 18 2.652 2.596 2.546 2.501 2.365 2.233 2.107 2.033 1.890 1.820 1.797. 19 2.633 2.576 2.526 2.482 2.345 2.213 2.086 2.012 1.868 1.798 1.775. 20 2.616 2.559 2.509 2.464 2.327 2.195 2.068 1.993 1.849 1.778 1.755. 24 2.560 2.503 2.452 2.408 2.269 2.136 2.007 1.931 1.784 1.712 1.688. 30 2.502 2.445 2.394 2.349 2.209 2.074 1.943 1.866 1.715 1.640 1.616. DENOMINATOR df  D2. 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 24 30 40 50 100 200 300. NUMERATOR df  D1. 40 50 100 200 300 1005.596 1008.098 1013.163 1015.724 1016.539 39.473 39.478 39.488 39.493 39.495 14.036 14.010 13.956 13.929 13.920 8.411 8.381 8.319 8.288 8.278 6.175 6.144 6.080 6.048 6.037 5.012 4.980 4.915 4.882 4.871 4.309 4.276 4.210 4.176 4.165 3.840 3.807 3.739 3.705 3.693 3.505 3.472 3.403 3.368 3.357 3.255 3.221 3.152 3.116 3.104 3.061 3.027 2.956 2.920 2.908 2.906 2.871 2.800 2.763 2.750 2.780 2.744 2.671 2.634 2.621 2.674 2.638 2.565 2.526 2.513 2.585 2.549 2.474 2.435 2.422 2.509 2.472 2.396 2.357 2.343 2.442 2.405 2.329 2.289 2.275 2.384 2.347 2.269 2.229 2.215 2.333 2.295 2.217 2.176 2.162 2.287 2.249 2.170 2.128 2.114 2.146 2.107 2.024 1.981 1.966 2.009 1.968 1.882 1.835 1.819 1.875 1.832 1.741 1.691 1.673 1.796 1.752 1.656 1.603 1.584 1.640 1.592 1.483 1.420 1.397 1.562 1.511 1.393 1.320 1.293 1.536 1.484 1.361 1.285 1.255 (continued ). 863.

<span class='text_page_counter'>(408)</span> 864. APPENDIX H. APPENDIX H (continued). f(F). df = D1 = 7 D2 = 14. F-Distribution Table: Upper 1% Probability (or 1% Area) under F-Distribution Curve. 0.01. F = 4.278. F. DENOMINATOR df  D2. NUMERATOR df  D1. 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 24 30 40 50 100 200 300. 1 2 3 4 5 6 7 8 9 10 11 4052.185 4999.340 5403.534 5624.257 5763.955 5858.950 5928.334 5980.954 6022.397 6055.925 6083.399 98.502 99.000 99.164 99.251 99.302 99.331 99.357 99.375 99.390 99.397 99.408 34.116 30.816 29.457 28.710 28.237 27.911 27.671 27.489 27.345 27.228 27.132 21.198 18.000 16.694 15.977 15.522 15.207 14.976 14.799 14.659 14.546 14.452 16.258 13.274 12.060 11.392 10.967 10.672 10.456 10.289 10.158 10.051 9.963 13.745 10.925 9.780 9.148 8.746 8.466 8.260 8.102 7.976 7.874 7.790 12.246 9.547 8.451 7.847 7.460 7.191 6.993 6.840 6.719 6.620 6.538 11.259 8.649 7.591 7.006 6.632 6.371 6.178 6.029 5.911 5.814 5.734 10.562 8.022 6.992 6.422 6.057 5.802 5.613 5.467 5.351 5.257 5.178 10.044 7.559 6.552 5.994 5.636 5.386 5.200 5.057 4.942 4.849 4.772 9.646 7.206 6.217 5.668 5.316 5.069 4.886 4.744 4.632 4.539 4.462 9.330 6.927 5.953 5.412 5.064 4.821 4.640 4.499 4.388 4.296 4.220 9.074 6.701 5.739 5.205 4.862 4.620 4.441 4.302 4.191 4.100 4.025 8.862 6.515 5.564 5.035 4.695 4.456 4.278 4.140 4.030 3.939 3.864 8.683 6.359 5.417 4.893 4.556 4.318 4.142 4.004 3.895 3.805 3.730 8.531 6.226 5.292 4.773 4.437 4.202 4.026 3.890 3.780 3.691 3.616 8.400 6.112 5.185 4.669 4.336 4.101 3.927 3.791 3.682 3.593 3.518 8.285 6.013 5.092 4.579 4.248 4.015 3.841 3.705 3.597 3.508 3.434 8.185 5.926 5.010 4.500 4.171 3.939 3.765 3.631 3.523 3.434 3.360 8.096 5.849 4.938 4.431 4.103 3.871 3.699 3.564 3.457 3.368 3.294 7.823 5.614 4.718 4.218 3.895 3.667 3.496 3.363 3.256 3.168 3.094 7.562 5.390 4.510 4.018 3.699 3.473 3.305 3.173 3.067 2.979 2.906 7.314 5.178 4.313 3.828 3.514 3.291 3.124 2.993 2.888 2.801 2.727 7.171 5.057 4.199 3.720 3.408 3.186 3.020 
2.890 2.785 2.698 2.625 6.895 4.824 3.984 3.513 3.206 2.988 2.823 2.694 2.590 2.503 2.430 6.763 4.713 3.881 3.414 3.110 2.893 2.730 2.601 2.497 2.411 2.338 6.720 4.677 3.848 3.382 3.079 2.862 2.699 2.571 2.467 2.380 2.307. DENOMINATOR df  D2. NUMERATOR df  D1. 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16. 12 13 14 15 16 17 18 19 20 24 30 6106.682 6125.774 6143.004 6156.974 6170.012 6181.188 6191.432 6200.746 6208.662 6234.273 6260.350 99.419 99.422 99.426 99.433 99.437 99.441 99.444 99.448 99.448 99.455 99.466 27.052 26.983 26.924 26.872 26.826 26.786 26.751 26.719 26.690 26.597 26.504 14.374 14.306 14.249 14.198 14.154 14.114 14.079 14.048 14.019 13.929 13.838 9.888 9.825 9.770 9.722 9.680 9.643 9.609 9.580 9.553 9.466 9.379 7.718 7.657 7.605 7.559 7.519 7.483 7.451 7.422 7.396 7.313 7.229 6.469 6.410 6.359 6.314 6.275 6.240 6.209 6.181 6.155 6.074 5.992 5.667 5.609 5.559 5.515 5.477 5.442 5.412 5.384 5.359 5.279 5.198 5.111 5.055 5.005 4.962 4.924 4.890 4.860 4.833 4.808 4.729 4.649 4.706 4.650 4.601 4.558 4.520 4.487 4.457 4.430 4.405 4.327 4.247 4.397 4.342 4.293 4.251 4.213 4.180 4.150 4.123 4.099 4.021 3.941 4.155 4.100 4.052 4.010 3.972 3.939 3.910 3.883 3.858 3.780 3.701 3.960 3.905 3.857 3.815 3.778 3.745 3.716 3.689 3.665 3.587 3.507 3.800 3.745 3.698 3.656 3.619 3.586 3.556 3.529 3.505 3.427 3.348 3.666 3.612 3.564 3.522 3.485 3.452 3.423 3.396 3.372 3.294 3.214 3.553 3.498 3.451 3.409 3.372 3.339 3.310 3.283 3.259 3.181 3.101.

<span class='text_page_counter'>(409)</span> APPENDIX H DENOMINATOR df  D2. 17 18 19 20 24 30 40 50 100 200 300. NUMERATOR df  D1. 12 3.455 3.371 3.297 3.231 3.032 2.843 2.665 2.563 2.368 2.275 2.244. 13 3.401 3.316 3.242 3.177 2.977 2.789 2.611 2.508 2.313 2.220 2.190. 14 3.353 3.269 3.195 3.130 2.930 2.742 2.563 2.461 2.265 2.172 2.142. 15 3.312 3.227 3.153 3.088 2.889 2.700 2.522 2.419 2.223 2.129 2.099. 16 3.275 3.190 3.116 3.051 2.852 2.663 2.484 2.382 2.185 2.091 2.061. 17 3.242 3.158 3.084 3.018 2.819 2.630 2.451 2.348 2.151 2.057 2.026. DENOMINATOR df  D2. 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 24 30 40 50 100 200 300. NUMERATOR df  D1. 40 50 100 200 300 6286.427 6302.260 6333.925 6349.757 6355.345 99.477 99.477 99.491 99.491 99.499 26.411 26.354 26.241 26.183 26.163 13.745 13.690 13.577 13.520 13.501 9.291 9.238 9.130 9.075 9.057 7.143 7.091 6.987 6.934 6.916 5.908 5.858 5.755 5.702 5.685 5.116 5.065 4.963 4.911 4.894 4.567 4.517 4.415 4.363 4.346 4.165 4.115 4.014 3.962 3.944 3.860 3.810 3.708 3.656 3.638 3.619 3.569 3.467 3.414 3.397 3.425 3.375 3.272 3.219 3.202 3.266 3.215 3.112 3.059 3.040 3.132 3.081 2.977 2.923 2.905 3.018 2.967 2.863 2.808 2.790 2.920 2.869 2.764 2.709 2.691 2.835 2.784 2.678 2.623 2.604 2.761 2.709 2.602 2.547 2.528 2.695 2.643 2.535 2.479 2.460 2.492 2.440 2.329 2.271 2.251 2.299 2.245 2.131 2.070 2.049 2.114 2.058 1.938 1.874 1.851 2.007 1.949 1.825 1.757 1.733 1.797 1.735 1.598 1.518 1.490 1.694 1.629 1.481 1.391 1.357 1.660 1.594 1.441 1.346 1.309. 18 3.212 3.128 3.054 2.989 2.789 2.600 2.421 2.318 2.120 2.026 1.995. 19 3.186 3.101 3.027 2.962 2.762 2.573 2.394 2.290 2.092 1.997 1.966. 20 3.162 3.077 3.003 2.938 2.738 2.549 2.369 2.265 2.067 1.971 1.940. 24 3.083 2.999 2.925 2.859 2.659 2.469 2.288 2.183 1.983 1.886 1.854. 30 3.003 2.919 2.844 2.778 2.577 2.386 2.203 2.098 1.893 1.794 1.761. 865.

<span class='text_page_counter'>(410)</span> 866. APPENDIX I. APPENDIX I. Fmax =. Critical Values of Hartley’s Fmax Test. 2 S largest. ∼ Fmax. 2 Ssmallest. 1a(c,v). UPPER 5% POINTS (a  0.05). c. v 2 3 4 5 6 7 8 9 10 12 15 20 30 60 ∞. 2. 3. 4. 5. 6. 7. 8. 9. 10. 11. 12. 39.0 15.4 9.60 7.15 5.82 4.99 4.43 4.03 3.72 3.28 2.86 2.46 2.07 1.67 1.00. 87.5 27.8 15.5 10.8 8.38 6.94 6.00 5.34 4.85 4.16 3.54 2.95 2.40 1.85 1.00. 142 39.2 20.6 13.7 10.4 8.44 7.18 6.31 5.67 4.79 4.01 3.29 2.61 1.96 1.00. 202 50.7 25.2 16.3 12.1 9.70 8.12 7.11 6.34 5.30 4.37 3.54 2.78 2.04 1.00. 266 62.0 29.5 18.7 13.7 10.8 9.03 7.80 6.92 5.72 4.68 3.76 2.91 2.11 1.00. 333 72.9 33.6 20.8 15.0 11.8 9.78 8.41 7.42 6.09 4.95 3.94 3.02 2.17 1.00. 403 83.5 37.5 22.9 16.3 12.7 10.5 8.95 7.87 6.42 5.19 4.10 3.12 2.22 1.00. 475 93.9 41.1 24.7 17.5 13.5 11.1 9.45 8.28 6.72 5.40 4.24 3.21 2.26 1.00. 550 104 44.6 26.5 18.6 14.3 11.7 9.91 8.66 7.00 5.59 4.37 3.29 2.30 1.00. 626 114 48.0 28.2 19.7 15.1 12.2 10.3 9.01 7.25 5.77 4.49 3.36 2.33 1.00. 704 124 51.4 29.9 20.7 15.8 12.7 10.7 9.34 7.48 5.93 4.59 3.39 2.36 1.00. 9. 10. 11. 12. UPPER 1% POINTS (a  0.01). c. 2 3 v 2 199 448 3 47.5 85 4 23.2 37 5 14.9 22 6 11.1 15.5 7 8.89 12.1 8 7.50 9.9 9 6.54 8.5 10 5.85 7.4 12 4.91 6.1 15 4.07 4.9 20 3.32 3.8 30 2.63 3.0 60 1.96 2.2 ∞ 1.00 1.0. 4 729 120 49 28 19.1 14.5 11.7 9.9 8.6 6.9 5.5 4.3 3.3 2.3 1.0. 5 1036 151 59 33 22 16.5 13.2 11.1 9.6 7.6 6.0 4.6 3.4 2.4 1.0. 6 1362 184 69 38 25 18.4 14.5 12.1 10.4 8.2 6.4 4.9 3.6 2.4 1.0. 7. 8. 1705 2063 2432 2813 3204 3605 21(6) 24(9) 28(1) 31(0) 33(7) 36(1) 79 89 97 106 113 120 42 46 50 54 57 60 27 30 32 34 36 37 20 22 23 24 26 27 15.8 16.9 17.9 18.9 19.8 21 13.1 13.9 14.7 15.3 16.0 16.6 11.1 11.8 12.4 12.9 13.4 13.9 8.7 9.1 9.5 9.9 10.2 10.6 6.7 7.1 7.3 7.5 7.8 8.0 5.1 5.3 5.5 5.6 5.8 5.9 3.7 3.8 3.9 4.0 4.1 4.2 2.5 2.5 2.6 2.6 2.7 2.7 1.0 1.0 1.0 1.0 1.00 1.0. 
Note: s²largest is the largest and s²smallest the smallest in a set of c independent mean squares, each based on v degrees of freedom. Source: Reprinted from E. S. Pearson and H. O. Hartley, eds., Biometrika Tables for Statisticians, 3rd ed. (New York: Cambridge University Press, 1966), by permission of the Biometrika Trustees.
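Hartley's test statistic is simply the ratio of the largest to the smallest sample variance among the c groups, compared against the tabled critical value for (c, v). A minimal sketch (the sample data are illustrative, not from the text):

```python
from statistics import variance

# Three groups of n = 5 observations, so c = 3 and v = n - 1 = 4:
groups = [[1, 2, 3, 4, 5], [2, 4, 6, 8, 10], [1, 1, 2, 2, 3]]
variances = [variance(g) for g in groups]
f_max = max(variances) / min(variances)
print(round(f_max, 2))  # 14.29

# Upper 5% critical value for c = 3, v = 4 from the table is 15.5:
print(f_max < 15.5)  # True -> do not reject equal variances
```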

<span class='text_page_counter'>(411)</span> APPENDIX J. APPENDIX J Distribution of the Studentized Range (q-values). 867. p  0.95. D2 D1. 2. 3. 4. 5. 6. 7. 8. 9. 10. 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 24 30 40 60 120 ∞. 17.97 6.08 4.50 3.93 3.64 3.46 3.34 3.26 3.20 3.15 3.11 3.08 3.06 3.03 3.01 3.00 2.98 2.97 2.96 2.95 2.92 2.89 2.86 2.83 2.80 2.77. 26.98 8.33 5.91 5.04 4.60 4.34 4.16 4.04 3.95 3.88 3.82 3.77 3.73 3.70 3.67 3.65 3.63 3.61 3.59 3.58 3.53 3.49 3.44 3.40 3.36 3.31. 32.82 9.80 6.82 5.76 5.22 4.90 4.68 4.53 4.41 4.33 4.26 4.20 4.15 4.11 4.08 4.05 4.02 4.00 3.98 3.96 3.90 3.85 3.79 3.74 3.68 3.63. 37.08 10.88 7.50 6.29 5.67 5.30 5.06 4.89 4.76 4.65 4.57 4.51 4.45 4.41 4.37 4.33 4.30 4.28 4.25 4.23 4.17 4.10 4.04 3.98 3.92 3.86. 40.41 11.74 8.04 6.71 6.03 5.63 5.36 5.17 5.02 4.91 4.82 4.75 4.69 4.64 4.59 4.56 4.52 4.49 4.47 4.45 4.37 4.30 4.23 4.16 4.10 4.03. 43.12 12.44 8.48 7.05 6.33 5.90 5.61 5.40 5.24 5.12 5.03 4.95 4.88 4.83 4.78 4.74 4.70 4.67 4.65 4.62 4.54 4.46 4.39 4.31 4.24 4.17. 45.40 13.03 8.85 7.35 6.58 6.12 5.82 5.60 5.43 5.30 5.20 5.12 5.05 4.99 4.94 4.90 4.86 4.82 4.79 4.77 4.68 4.60 4.52 4.44 4.36 4.29. 47.36 13.54 9.18 7.60 6.80 6.32 6.00 5.77 5.59 5.46 5.35 5.27 5.19 5.13 5.08 5.03 4.99 4.96 4.92 4.90 4.81 4.72 4.63 4.55 4.47 4.39. 49.07 13.99 9.46 7.83 6.99 6.49 6.16 5.92 5.74 5.60 5.49 5.39 5.32 5.25 5.20 5.15 5.11 5.07 5.04 5.01 4.92 4.82 4.73 4.65 4.56 4.47. D2. D1. 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 24 30 40 60 120 ∞. 11. 12. 13. 14. 15. 16. 17. 18. 19. 20. 50.59 14.39 9.72 8.03 7.17 6.65 6.30 6.05 5.87 5.72 5.61 5.51 5.43 5.36 5.31 5.26 5.21 5.17 5.14 5.11 5.01 4.92 4.82 4.73 4.64 4.55. 51.96 14.75 9.95 8.21 7.32 6.79 6.43 6.18 5.98 5.83 5.71 5.61 5.53 5.46 5.40 5.35 5.31 5.27 5.23 5.20 5.10 5.00 4.90 4.81 4.71 4.62. 53.20 15.08 10.15 8.37 7.47 6.92 6.55 6.29 6.09 5.93 5.81 5.71 5.63 5.55 5.49 5.44 5.39 5.35 5.31 5.28 5.18 5.08 4.98 4.88 4.78 4.68. 
54.33 15.38 10.35 8.52 7.60 7.03 6.66 6.39 6.19 6.03 5.90 5.80 5.71 5.64 5.57 5.52 5.47 5.43 5.39 5.36 5.25 5.15 5.04 4.94 4.84 4.74. 55.36 15.65 10.52 8.66 7.72 7.14 6.76 6.48 6.28 6.11 5.98 5.88 5.79 5.71 5.65 5.59 5.54 5.50 5.46 5.43 5.32 5.21 5.11 5.00 4.90 4.80. 56.32 15.91 10.69 8.79 7.83 7.24 6.85 6.57 6.36 6.19 6.06 5.95 5.86 5.79 5.72 5.66 5.61 5.57 5.53 5.49 5.38 5.27 5.16 5.06 4.95 4.85. 57.22 16.14 10.84 8.91 7.93 7.34 6.94 6.65 6.44 6.27 6.13 6.02 5.93 5.85 5.78 5.73 5.67 5.63 5.59 5.55 5.44 5.33 5.22 5.11 5.00 4.89. 58.04 16.37 10.98 9.03 8.03 7.43 7.02 6.73 6.51 6.34 6.20 6.09 5.99 5.91 5.85 5.79 5.73 5.69 5.65 5.61 5.49 5.38 5.27 5.15 5.04 4.93. 58.83 16.57 11.11 9.13 8.12 7.51 7.10 6.80 6.58 6.40 6.27 6.15 6.05 5.97 5.90 5.84 5.79 5.74 5.70 5.66 5.55 5.43 5.31 5.20 5.09 4.97. 59.56 16.77 11.24 9.23 8.21 7.59 7.17 6.87 6.64 6.47 6.33 6.21 6.11 6.03 5.96 5.90 5.84 5.79 5.75 5.71 5.59 5.47 5.36 5.24 5.13 5.01. Note: D1  K populations and D2  N − K..

<span class='text_page_counter'>(412)</span> 868. APPENDIX J p  0.99. D2 D1. 2. 3. 4. 5. 6. 7. 8. 9. 10. 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 24 30 40 60 120 ∞. 90.03 14.04 8.26 6.51 5.70 5.24 4.95 4.75 4.60 4.48 4.39 4.32 4.26 4.21 4.17 4.13 4.10 4.07 4.05 4.02 3.96 3.89 3.82 3.76 3.70 3.64. 135.0 19.02 10.62 8.12 6.98 6.33 5.92 5.64 5.43 5.27 5.15 5.05 4.96 4.89 4.84 4.79 4.74 4.70 4.67 4.64 4.55 4.45 4.37 4.28 4.20 4.12. 164.3 22.29 12.17 9.17 7.80 7.03 6.54 6.20 5.96 5.77 5.62 5.50 5.40 5.32 5.25 5.19 5.14 5.09 5.05 5.02 4.91 4.80 4.70 4.59 4.50 4.40. 185.6 24.72 13.33 9.96 8.42 7.56 7.01 6.62 6.35 6.14 5.97 5.84 5.73 5.63 5.56 5.49 5.43 5.38 5.33 5.29 5.17 5.05 4.93 4.82 4.71 4.60. 202.2 26.63 14.24 10.58 8.91 7.97 7.37 6.96 6.66 6.43 6.25 6.10 5.98 5.88 5.80 5.72 5.66 5.60 5.55 5.51 5.37 5.24 5.11 4.99 4.87 4.76. 215.8 28.20 15.00 11.10 9.32 8.32 7.68 7.24 6.91 6.67 6.48 6.32 6.19 6.08 5.99 5.92 5.85 5.79 5.73 5.69 5.54 5.40 5.26 5.13 5.01 4.88. 227.2 29.53 15.64 11.55 9.67 8.61 7.94 7.47 7.13 6.87 6.67 6.51 6.37 6.26 6.16 6.08 6.01 5.94 5.89 5.84 5.69 5.54 5.39 5.25 5.12 4.99. 237.0 30.68 16.20 11.93 9.97 8.87 8.17 7.68 7.33 7.05 6.84 6.67 6.53 6.41 6.31 6.22 6.15 6.08 6.02 5.97 5.81 5.65 5.50 5.36 5.21 5.08. 245.6 31.69 16.69 12.27 10.24 9.10 8.37 7.86 7.49 7.21 6.99 6.81 6.67 6.54 6.44 6.35 6.27 6.20 6.14 6.09 5.92 5.76 5.60 5.45 5.30 5.16. D2. D1. 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 24 30 40 60 120 ∞. 11. 12. 13. 14. 15. 16. 17. 18. 19. 20. 253.2 32.59 17.13 12.57 10.48 9.30 8.55 8.03 7.65 7.36 7.13 6.94 6.79 6.66 6.55 6.46 6.38 6.31 6.25 6.19 6.02 5.85 5.69 5.53 5.37 5.23. 260.0 33.40 17.53 12.84 10.70 9.48 8.71 8.18 7.78 7.49 7.25 7.06 6.90 6.77 6.66 6.56 6.48 6.41 6.34 6.28 6.11 5.93 5.76 5.60 5.44 5.28. 266.2 34.13 17.89 13.09 10.89 9.65 8.86 8.31 7.91 7.60 7.36 7.17 7.01 6.87 6.76 6.66 6.57 6.50 6.43 6.37 6.19 6.01 5.83 5.67 5.50 5.35. 
271.8 34.81 18.22 13.32 11.08 9.81 9.00 8.44 8.03 7.71 7.46 7.26 7.10 6.96 6.84 6.74 6.66 6.58 6.51 6.45 6.26 6.08 5.90 5.73 5.56 5.40. 277.0 35.43 18.52 13.53 11.24 9.95 9.12 8.55 8.13 7.81 7.56 7.36 7.19 7.05 6.93 6.82 6.73 6.65 6.58 6.52 6.33 6.14 5.96 5.78 5.61 5.45. 281.8 36.00 18.81 13.73 11.40 10.08 9.24 8.66 8.23 7.91 7.65 7.44 7.27 7.13 7.00 6.90 6.81 6.73 6.65 6.59 6.39 6.20 6.02 5.84 5.66 5.49. 286.3 36.53 19.07 13.91 11.55 10.21 9.35 8.76 8.33 7.99 7.73 7.52 7.35 7.20 7.07 6.97 6.87 6.79 6.72 6.65 6.45 6.26 6.07 5.89 5.71 5.54. 290.4 37.03 19.32 14.08 11.68 10.32 9.46 8.85 8.41 8.08 7.81 7.59 7.42 7.27 7.14 7.03 6.94 6.85 6.78 6.71 6.51 6.31 6.12 5.93 5.75 5.57. 294.3 37.50 19.55 14.24 11.81 10.43 9.55 8.94 8.49 8.15 7.88 7.66 7.48 7.33 7.20 7.09 7.00 6.91 6.84 6.77 6.56 6.36 6.16 5.97 5.79 5.61. 298.0 37.95 19.77 14.40 11.93 10.54 9.65 9.03 8.57 8.23 7.95 7.73 7.55 7.39 7.26 7.15 7.05 6.97 6.89 6.82 6.61 6.41 6.21 6.01 5.83 5.65. Source: Reprinted with permission from E. S. Pearson and H. O. Hartley, Biometrika Tables for Statisticians (New York: Cambridge University Press, 1954)..
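These q-values are the inputs to post-ANOVA pairwise comparison procedures such as Tukey-Kramer (Outcome 3 in the chapter opener). A minimal sketch under that usage, where q = 3.77 is read from the p = 0.95 table at D1 = 3 and D2 = 12, and the MSW and group sizes are illustrative:

```python
import math

# Sketch: Tukey-Kramer critical range built from a studentized-range q-value.
# Two group means whose absolute difference exceeds this range are declared
# significantly different. q, MSW, and group sizes here are illustrative.

def tukey_kramer_range(q, msw, n_i, n_j):
    """Critical range for comparing the means of groups i and j."""
    return q * math.sqrt((msw / 2) * (1 / n_i + 1 / n_j))

critical_range = tukey_kramer_range(3.77, 12.0, 5, 5)  # about 5.84
```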

<span class='text_page_counter'>(413)</span> APPENDIX K. APPENDIX K Critical Values of r in the Runs Test a. Lower Tail: Too Few Runs. n1. 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20. n1. b. Upper Tail: Too Many Runs. n2. n2. 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20. 2. 3. 4. 5. 6. 2 2 2 2 2 2 2 2 2. 2 2 2 2 2 2 2 2 2 3 3 3 3 3 3. 2 2 2 3 3 3 3 3 3 3 3 4 4 4 4 4. 2 2 3 3 3 3 3 4 4 4 4 4 4 4 5 5 5. 2 2 3 3 3 3 4 4 4 4 5 5 5 5 5 5 6 6. 2. 3. 4. 5. 6. 9 9 9 10 10 9 10 11 11 12 11 12 13 13 13 13. 7. 8. 9. 10. 869. 11. 12. 13. 14. 15. 16. 17. 18. 19. 20. 2 2 3 4 5 5 6 6 7 7 8 8 9 9 9 10 10 10 10. 2 2 3 4 5 5 6 7 7 8 8 9 9 9 10 10 10 11 11. 2 3 3 4 5 6 6 7 7 8 8 9 9 10 10 11 11 11 12. 2 3 4 4 5 6 6 7 8 8 9 9 10 10 11 11 11 12 12. 2 3 4 4 5 6 7 7 8 9 9 10 10 11 11 11 12 12 13. 2 3 4 5 5 6 7 8 8 9 9 10 10 11 11 12 12 13 13. 2 3 4 5 6 6 7 8 8 9 10 10 11 11 12 12 13 13 13. 2 3 4 5 6 6 7 8 9 9 10 10 11 12 12 13 13 13 14. 2 2 3 3 3 4 4 5 5 5 5 5 6 6 6 6 6 6. 2 3 3 3 4 4 5 5 5 6 6 6 6 6 7 7 7 7. 2 3 3 4 4 5 5 5 6 6 6 7 7 7 7 8 8 8. 2 3 3 4 5 5 5 6 6 7 7 7 7 8 8 8 8 9. 2 3 4 4 5 5 6 6 7 7 7 8 8 8 9 9 9 9. 2 2 3 4 4 5 6 6 7 7 7 8 8 8 9 9 9 10 10. 7. 8. 9. 10. 11. 12. 13. 14. 15. 16. 17. 18. 19. 20. 11 12 13 13 14 14 14 14 15 15 15. 11 12 13 14 14 15 15 16 16 16 16 17 17 17 17 17. 13 14 14 15 16 16 16 17 17 18 18 18 18 18 18. 13 14 15 16 16 17 17 18 18 18 19 19 19 20 20. 13 14 15 16 17 17 18 19 19 19 20 20 20 21 21. 13 14 16 16 17 18 19 19 20 20 21 21 21 22 22. 15 16 17 18 19 19 20 20 21 21 22 22 23 23. 15 16 17 18 19 20 20 21 22 22 23 23 23 24. 15 16 18 18 19 20 21 22 22 23 23 24 24 25. 17 18 19 20 21 21 22 23 23 24 25 25 25. 17 18 19 20 21 22 23 23 24 25 25 26 26. 17 18 19 20 21 22 23 24 25 25 26 26 27. 17 18 20 21 22 23 23 24 25 26 26 27 27. 17 18 20 21 22 23 24 25 25 26 27 27 28. Source: Adapted from Frieda S. Swed and C. Eisenhart, “Tables for testing randomness of grouping in a sequence of alternatives,” Ann. Math. Statist. 
14 (1943): 83–86, with the permission of the publisher.
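The statistic compared against these lower and upper critical values is simply the number of runs r (maximal stretches of one category) in a two-category sequence. A minimal sketch with an illustrative sequence:

```python
# Sketch: counting runs in a two-category sequence. The count r is compared
# with the tabled lower/upper critical values for the category counts n1, n2.

def count_runs(sequence):
    runs = 1 if sequence else 0
    for prev, cur in zip(sequence, sequence[1:]):
        if cur != prev:
            runs += 1
    return runs

seq = "AABBBABBAAAB"                       # illustrative sequence
n1, n2 = seq.count("A"), seq.count("B")    # 6 and 6
r = count_runs(seq)                        # 6 runs
```

Too few runs suggests clustering; too many suggests systematic alternation; either pattern argues against randomness.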

<span class='text_page_counter'>(414)</span> 870. APPENDIX L. APPENDIX L U. Mann-Whitney U Test Probabilities (n 9). n1 0 1 2 3 4 5. U. n1. 0 1 2 3 4 5 6 7 8 9 10 11 12 13. U. n1. 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25. n2  3 1. 2. 3. .250 .500 .750. .100 .200 .400 .600. .050 .100 .200 .350 .500 .650. U. n1 0 1 2 3 4 5 6 7 8. n2  5 1. 2. 3. 4. 5. .167 .333 .500 .667. .047 .095 .190 .286 .429 .571. .018 .036 .071 .125 .196 .286 .393 .500 .607. .008 .016 .032 .056 .095 .143 .206 .278 .365 .452 .548. .004 .008 .016 .028 .048 .075 .111 .155 .210 .274 .345 .421 .500 .579. U. n1. 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18. n2  7 1. 2. 3. 4. 5. 6. 7. .125 .250 .375 .500 .625. .028 .056 .111 .167 .250 .333 .444 .556. .008 .017 .033 .058 .092 .133 .192 .258 .333 .417 .500 .583. .003 .006 .012 .021 .036 .055 .082 .115 .158 .206 .264 .324 .394 .464 .538. .001 .003 .005 .009 .015 .024 .037 .053 .074 .101 .134 .172 .216 .265 .319 .378 .438 .500 .562. .001 .001 .002 .004 .007 .011 .017 .026 .037 .051 .069 .090 .117 .147 .183 .223 .267 .314 .365 .418 .473 .527. .000 .001 .001 .002 .003 .006 .009 .013 .019 .027 .036 .049 .064 .082 .104 .130 .159 .191 .228 .267 .310 .355 .402 .451 .500 .549. n2  4 1. 2. 3. 4. .200 .400 .600. .067 .133 .267 .400 .600. .028 .057 .114 .200 .314 .429 .571. .014 .029 .057 .100 .171 .243 .343 .443 .557. n2  6 1. 2. 3. 4. 5. 6. .143 .286 .428 .571. .036 .071 .143 .214 .321 .429 .571. .012 .024 .048 .083 .131 .190 .274 .357 .452 .548. .005 .010 .019 .033 .057 .086 .129 .176 .238 .305 .381 .457 .545. .002 .004 .009 .015 .026 .041 .063 .089 .123 .165 .214 .268 .331 .396 .465 .535. .001 .002 .004 .008 .013 .021 .032 .047 .066 .090 .120 .155 .197 .242 .294 .350 .409 .469 .531.

<span class='text_page_counter'>(415)</span> APPENDIX L. U. n1. 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32. 871. n2  8 1. 2. 3. 4. 5. 6. 7. 8. t. Normal. .111 .222 .333 .444 .556. .022 .044 .089 .133 .200 .267 .356 .444 .556. .006 .012 .024 .042 .067 .097 .139 .188 .248 .315 .387 .461 .539. .002 .004 .008 .014 .024 .036 .055 .077 .107 .141 .184 .230 .285 .341 .404 .467 .533. .001 .002 .003 .005 .009 .015 .023 .033 .047 .064 .085 .111 .142 .177 .217 .262 .311 .362 .416 .472 .528. .000 .001 .001 .002 .004 .006 .010 .015 .021 .030 .041 .054 .071 .091 .114 .141 .172 .207 .245 .286 .331 .377 .426 .475 .525. .000 .000 .001 .001 .002 .003 .005 .007 .010 .014 .020 .027 .036 .047 .060 .076 .095 .116 .140 .168 .198 .232 .268 .306 .347 .389 .433 .478 .522. .000 .000 .000 .001 .001 .001 .002 .003 .005 .007 .010 .014 .019 .025 .032 .041 .052 .065 .080 .097 .117 .139 .164 .191 .221 .253 .287 .323 .360 .399 .439 .480 .520. 3.308 3.203 3.098 2.993 2.888 2.783 2.678 2.573 2.468 2.363 2.258 2.153 2.048 1.943 1.838 1.733 1.628 1.523 1.418 1.313 1.208 1.102 .998 .893 .788 .683 .578 .473 .368 .263 .158 .052. .001 .001 .001 .001 .002 .003 .004 .005 .007 .009 .012 .016 .020 .026 .033 .041 .052 .064 .078 .094 .113 .135 .159 .185 .215 .247 .282 .318 .356 .396 .437 .481. Source: Reproduced from H. B. Mann and D. R. Whitney, “On a test of whether one of two random variables is stochastically larger than the other,” Ann. Math. Statist. 18 (1947): 52–54, with the permission of the publisher..

<span class='text_page_counter'>(416)</span> 872. APPENDIX M. APPENDIX M Mann-Whitney U Test Critical Values (9 ≤ n ≤ 20) Critical Values of U for a One-Tailed Test at a  0.001 or for a Two-Tailed Test at a  0.002. Critical Values of U for a One-Tailed Test at a  0.01 or for a Two-Tailed Test at a  0.02. n1. n2. 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20. n1. n2. 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20. 9. 10. 11. 12. 13. 14. 15. 16. 17. 18. 19. 20. 1 2 3 5 7 8 10 12 14 15 17 19 21 23 25 26. 0 1 3 5 6 8 10 12 14 17 19 21 23 25 27 29 32. 0 2 4 6 8 10 12 15 17 20 22 24 27 29 32 34 37. 0 2 4 7 9 12 14 17 20 23 25 28 31 34 37 40 42. 1 3 5 8 11 14 17 20 23 26 29 32 35 38 42 45 48. 1 3 6 9 12 15 19 22 25 29 32 36 39 43 46 50 54. 1 4 7 10 14 17 21 24 28 32 36 40 43 47 51 55 59. 2 5 8 11 15 19 23 27 31 35 39 43 48 52 56 60 65. 0 2 5 9 13 17 21 25 29 34 38 43 47 52 57 61 66 70. 0 3 6 10 14 18 23 27 32 37 42 46 51 56 61 66 71 76. 0 3 7 11 15 20 25 29 34 40 45 50 55 60 66 71 77 82. 0 3 7 12 16 21 26 32 37 42 48 54 59 65 70 76 82 88. 9. 10. 11. 12. 13. 14. 15. 16. 17. 18. 19. 20. 1 3 5 7 9 11 14 16 18 21 23 26 28 31 33 36 38 40. 1 3 6 8 11 13 16 19 22 24 27 30 33 36 38 41 44 47. 1 4 7 9 12 15 18 22 25 28 31 34 37 41 44 47 50 53. 2 5 8 11 14 17 21 24 28 31 35 38 42 46 49 53 56 60. 0 2 5 9 12 16 20 23 27 31 35 39 43 47 51 55 59 63 67. 0 2 6 10 13 17 22 26 30 34 38 43 47 51 56 60 65 69 73. 0 3 7 11 15 19 24 28 33 37 42 47 51 56 61 66 70 75 80. 0 3 7 12 16 21 26 31 36 41 46 51 56 61 66 71 76 82 87. 0 4 8 13 18 23 28 33 38 44 49 55 60 66 71 77 82 88 93. 0 4 9 14 19 24 30 36 41 47 53 59 65 70 76 82 88 94 100. 1 4 9 15 20 26 32 38 44 50 56 63 69 75 82 88 94 101 107. 1 5 10 16 22 28 34 40 47 53 60 67 73 80 87 93 100 107 114.

<span class='text_page_counter'>(417)</span> APPENDIX M Critical Values of U for a One-Tailed Test at a  0.025 or for a Two-Tailed Test at a  0.05. Critical Values of U for a One-Tailed Test at a  0.05 or for a Two-Tailed Test at a  0.10. n1. n2. 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20. n1. n2. 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20. 873. 9. 10. 11. 12. 13. 14. 15. 16. 17. 18. 19. 20. 0 2 4 7 10 12 15 17 20 23 26 28 31 34 37 39 42 45 48. 0 3 5 8 11 14 17 20 23 26 29 33 36 39 42 45 48 52 55. 0 3 6 9 13 16 19 23 26 30 33 37 40 44 47 51 55 58 62. 1 4 7 11 14 18 22 26 29 33 37 41 45 49 53 57 61 65 69. 1 4 8 12 16 20 24 28 33 37 41 45 50 54 59 63 67 72 76. 1 5 9 13 17 22 26 31 36 40 45 50 55 59 64 67 74 78 83. 1 5 10 14 19 24 29 34 39 44 49 54 59 64 70 75 80 85 90. 1 6 11 15 21 26 31 37 42 47 53 59 64 70 75 81 86 92 98. 2 6 11 17 22 28 34 39 45 51 57 63 67 75 81 87 93 99 105. 2 7 12 18 24 30 36 42 48 55 61 67 74 80 86 93 99 106 112. 2 7 13 19 25 32 38 45 52 58 65 72 78 85 92 99 106 113 119. 2 8 13 20 27 34 41 48 55 62 69 76 83 90 98 105 112 119 127. 9. 10. 11. 12. 13. 14. 15. 16. 17. 18. 19. 20. 4 9 16 22 28 35 41 48 55 61 68 75 82 88 95 102 109 116 123. 0 4 10 17 23 30 37 44 51 58 65 72 80 87 94 101 109 116 123 130. 0 4 11 18 25 32 39 47 54 62 69 77 84 92 100 107 115 123 130 138. 1 3 6 9 12 15 18 21 24 27 30 33 36 39 42 45 48 51 54. 1 4 7 11 14 17 20 24 27 31 34 37 41 44 48 51 55 58 62. 1 5 8 12 16 19 23 27 31 34 38 42 46 50 54 57 61 65 69. 2 5 9 13 17 21 26 30 34 38 42 47 51 55 60 64 68 72 77. 2 6 10 15 19 24 28 33 37 42 47 51 56 61 65 70 75 80 84. 2 7 11 16 21 26 31 36 41 46 51 56 61 66 71 77 82 87 92. 3 7 12 18 23 28 33 39 44 50 55 61 66 72 77 83 88 94 100. 3 8 14 19 25 30 36 42 48 54 60 65 71 77 83 89 95 101 107. 3 9 15 20 26 33 39 45 51 57 64 70 77 83 89 96 102 109 115. Source: Adapted and abridged from Tables 1, 3, 5, and 7 of D. 
Auble, “Extended tables for the Mann-Whitney statistic,” Bulletin of the Institute of Educational Research at Indiana University 1, No. 2 (1953), with the permission of the publisher.
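Both Mann-Whitney appendices are entered with the U statistic, computed from the rank sum of one sample over the pooled, ordered data. A minimal sketch (tied values share their average rank; the data are illustrative):

```python
# Sketch: the Mann-Whitney U statistics. The smaller of U1 and U2 is compared
# with the tabled critical value (or, for small samples, used to look up the
# exact tail probability in the preceding appendix).

def mann_whitney_u(sample1, sample2):
    pooled = sorted(sample1 + sample2)
    ranks = {}
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j] == pooled[i]:
            j += 1
        ranks[pooled[i]] = (i + 1 + j) / 2   # average of positions i+1 .. j
        i = j
    r1 = sum(ranks[v] for v in sample1)      # rank sum of sample 1
    n1, n2 = len(sample1), len(sample2)
    u1 = n1 * n2 + n1 * (n1 + 1) / 2 - r1
    return u1, n1 * n2 - u1

u1, u2 = mann_whitney_u([9, 11, 15, 16], [8, 10, 12, 13, 14])
# u1 = 7.0, u2 = 13.0; min(u1, u2) = 7 is compared with the tabled value
```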

<span class='text_page_counter'>(418)</span> 874. APPENDIX N

APPENDIX N  Critical Values of T in the Wilcoxon Matched-Pairs Signed-Ranks Test (n ≤ 25)

       LEVEL OF SIGNIFICANCE FOR ONE-TAILED TEST
  n      0.025     0.01      0.005
       LEVEL OF SIGNIFICANCE FOR TWO-TAILED TEST
         0.05      0.02      0.01
  6        0         —         —
  7        2         0         —
  8        4         2         0
  9        6         3         2
 10        8         5         3
 11       11         7         5
 12       14        10         7
 13       17        13        10
 14       21        16        13
 15       25        20        16
 16       30        24        20
 17       35        28        23
 18       40        33        28
 19       46        38        32
 20       52        43        38
 21       59        49        43
 22       66        56        49
 23       73        62        55
 24       81        69        61
 25       89        77        68

Source: Adapted from Table 1 of F. Wilcoxon, Some Rapid Approximate Statistical Procedures (New York: American Cyanamid Company, 1949), 13, with the permission of the publisher.
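The T statistic looked up in this table comes from the paired differences: nonzero differences are ranked by absolute value (average ranks for ties), and T is the smaller of the positive-rank and negative-rank sums. A minimal sketch with illustrative paired data:

```python
# Sketch: the Wilcoxon matched-pairs T statistic. Zero differences are
# dropped; tied absolute differences share their average rank.

def wilcoxon_t(before, after):
    diffs = [b - a for b, a in zip(before, after) if b != a]
    ordered = sorted(diffs, key=abs)
    n = len(ordered)
    ranks = []
    i = 0
    while i < n:
        j = i
        while j < n and abs(ordered[j]) == abs(ordered[i]):
            j += 1
        ranks.extend([(i + 1 + j) / 2] * (j - i))  # average rank for the tie group
        i = j
    positive = sum(r for d, r in zip(ordered, ranks) if d > 0)
    negative = sum(r for d, r in zip(ordered, ranks) if d < 0)
    return min(positive, negative)

t_stat = wilcoxon_t([10, 12, 9, 15], [8, 11, 10, 13])  # T = 1.5
```

A computed T at or below the tabled value leads to rejecting the hypothesis of no difference between the paired populations.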

<span class='text_page_counter'>(419)</span> APPENDIX O a  .05. APPENDIX O Critical Values dL and dU of the Durbin-Watson Statistic D (Critical Values Are One-Sided). 875. P1. P2. P3. P4. P5. n. dL. dU. dL. dU. dL. dU. dL. dU. dL. dU. 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 45 50 55 60 65 70 75 80 85 90 95 100. 1.08 1.10 1.13 1.16 1.18 1.20 1.22 1.24 1.26 1.27 1.29 1.30 1.32 1.33 1.34 1.35 1.36 1.37 1.38 1.39 1.40 1.41 1.42 1.43 1.43 1.44 1.48 1.50 1.53 1.55 1.57 1.58 1.60 1.61 1.62 1.63 1.64 1.65. 1.36 1.37 1.38 1.39 1.40 1.41 1.42 1.43 1.44 1.45 1.45 1.46 1.47 1.48 1.48 1.49 1.50 1.50 1.51 1.51 1.52 1.52 1.53 1.54 1.54 1.54 1.57 1.59 1.60 1.62 1.63 1.64 1.65 1.66 1.67 1.68 1.69 1.69. .95 .98 1.02 1.05 1.08 1.10 1.13 1.15 1.17 1.19 1.21 1.22 1.24 1.26 1.27 1.28 1.30 1.31 1.32 1.33 1.34 1.35 1.36 1.37 1.38 1.39 1.43 1.46 1.49 1.51 1.54 1.55 1.57 1.59 1.60 1.61 1.62 1.63. 1.54 1.54 1.54 1.53 1.53 1.54 1.54 1.54 1.54 1.55 1.55 1.55 1.56 1.56 1.56 1.57 1.57 1.57 1.58 1.58 1.58 1.59 1.59 1.59 1.60 1.60 1.62 1.63 1.64 1.65 1.66 1.67 1.68 1.69 1.70 1.70 1.71 1.72. .82 .86 .90 .93 .97 1.00 1.03 1.05 1.08 1.10 1.12 1.14 1.16 1.18 1.20 1.21 1.23 1.24 1.26 1.27 1.28 1.29 1.31 1.32 1.33 1.34 1.38 1.42 1.45 1.48 1.50 1.52 1.54 1.56 1.57 1.59 1.60 1.61. 1.75 1.73 1.71 1.69 1.68 1.68 1.67 1.66 1.66 1.66 1.66 1.65 1.65 1.65 1.65 1.65 1.65 1.65 1.65 1.65 1.65 1.65 1.66 1.66 1.66 1.66 1.67 1.67 1.68 1.69 1.70 1.70 1.71 1.72 1.72 1.73 1.73 1.74. .69 .74 .78 .82 .86 .90 .93 .96 .99 1.01 1.04 1.06 1.08 1.10 1.12 1.14 1.16 1.18 1.19 1.21 1.22 1.24 1.25 1.26 1.27 1.29 1.34 1.38 1.41 1.44 1.47 1.49 1.51 1.53 1.55 1.57 1.58 1.59. 1.97 1.93 1.90 1.87 1.85 1.83 1.81 1.80 1.79 1.78 1.77 1.76 1.76 1.75 1.74 1.74 1.74 1.73 1.73 1.73 1.73 1.73 1.72 1.72 1.72 1.72 1.72 1.72 1.72 1.73 1.73 1.74 1.74 1.74 1.75 1.75 1.75 1.76. 
.56 .62 .67 .71 .75 .79 .83 .86 .90 .93 .95 .98 1.01 1.03 1.05 1.07 1.09 1.11 1.13 1.15 1.16 1.18 1.19 1.21 1.22 1.23 1.29 1.34 1.38 1.41 1.44 1.46 1.49 1.51 1.52 1.54 1.56 1.57. 2.21 2.15 2.10 2.06 2.02 1.99 1.96 1.94 1.92 1.90 1.89 1.88 1.86 1.85 1.84 1.83 1.83 1.82 1.81 1.81 1.80 1.80 1.80 1.79 1.79 1.79 1.78 1.77 1.77 1.77 1.77 1.77 1.77 1.77 1.77 1.78 1.78 1.78 (continued ). n  Number of observations; P  Number of independent variables. Source: This table is reproduced from Biometrika 41 (1951): 173 and 175, with the permission of the Biometrika Trustees..

<span class='text_page_counter'>(420)</span> 876. APPENDIX O a  .01 P1. P2. P3. P4. P5. n. dL. dU. dL. dU. dL. dU. dL. dU. dL. dU. 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 45 50 55 60 65 70 75 80 85 90 95 100. .81 .84 .87 .90 .93 .95 .97 1.00 1.02 1.04 1.05 1.07 1.09 1.10 1.12 1.13 1.15 1.16 1.17 1.18 1.19 1.21 1.22 1.23 1.24 1.25 1.29 1.32 1.36 1.38 1.41 1.43 1.45 1.47 1.48 1.50 1.51 1.52. 1.07 1.09 1.10 1.12 1.13 1.15 1.16 1.17 1.19 1.20 1.21 1.22 1.23 1.24 1.25 1.26 1.27 1.28 1.29 1.30 1.31 1.32 1.32 1.33 1.34 1.34 1.38 1.40 1.43 1.45 1.47 1.49 1.50 1.52 1.53 1.54 1.55 1.56. .70 .74 .77 .80 .83 .86 .89 .91 .94 .96 .98 1.00 1.02 1.04 1.05 1.07 1.08 1.10 1.11 1.13 1.14 1.15 1.16 1.18 1.19 1.20 1.24 1.28 1.32 1.35 1.38 1.40 1.42 1.44 1.46 1.47 1.49 1.50. 1.25 1.25 1.25 1.26 1.26 1.27 1.27 1.28 1.29 1.30 1.30 1.31 1.32 1.32 1.33 1.34 1.34 1.35 1.36 1.36 1.37 1.38 1.38 1.39 1.39 1.40 1.42 1.45 1.47 1.48 1.50 1.52 1.53 1.54 1.55 1.56 1.57 1.58. .59 .63 .67 .71 .74 .77 .80 .83 .86 .88 .90 .93 .95 .97 .99 1.01 1.02 1.04 1.05 1.07 1.08 1.10 1.11 1.12 1.14 1.15 1.20 1.24 1.28 1.32 1.35 1.37 1.39 1.42 1.43 1.45 1.47 1.48. 1.46 1.44 1.43 1.42 1.41 1.41 1.41 1.40 1.40 1.41 1.41 1.41 1.41 1.41 1.42 1.42 1.42 1.43 1.43 1.43 1.44 1.44 1.45 1.45 1.45 1.46 1.48 1.49 1.51 1.52 1.53 1.55 1.56 1.57 1.58 1.59 1.60 1.60. .49 .53 .57 .61 .65 .68 .72 .75 .77 .80 .83 .85 .88 .90 .92 .94 .96 .98 1.00 1.01 1.03 1.04 1.06 1.07 1.09 1.10 1.16 1.20 1.25 1.28 1.31 1.34 1.37 1.39 1.41 1.43 1.45 1.46. 1.70 1.66 1.63 1.60 1.58 1.57 1.55 1.54 1.53 1.53 1.52 1.52 1.51 1.51 1.51 1.51 1.51 1.51 1.51 1.51 1.51 1.51 1.51 1.52 1.52 1.52 1.53 1.54 1.55 1.56 1.57 1.58 1.59 1.60 1.60 1.61 1.62 1.63. .39 .44 .48 .52 .56 .60 .63 .66 .70 .72 .75 .78 .81 .83 .85 .88 .90 .92 .94 .95 .97 .99 1.00 1.02 1.03 1.05 1.11 1.16 1.21 1.25 1.28 1.31 1.34 1.36 1.39 1.41 1.42 1.44. 
1.96 1.90 1.85 1.80 1.77 1.74 1.71 1.69 1.67 1.66 1.65 1.64 1.63 1.62 1.61 1.61 1.60 1.60 1.59 1.59 1.59 1.59 1.59 1.58 1.58 1.58 1.58 1.59 1.59 1.60 1.61 1.61 1.62 1.62 1.63 1.64 1.64 1.65. n = Number of observations; P = Number of independent variables. Source: This table is reproduced from Biometrika 41 (1951): 173 and 175, with the permission of the Biometrika Trustees.
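The d statistic compared with these dL and dU bounds is computed from the regression residuals as d = Σ(et − et−1)² / Σet². A minimal sketch with illustrative residuals:

```python
# Sketch: the Durbin-Watson statistic. Values of d near 2 suggest no
# first-order autocorrelation; d below dL indicates positive autocorrelation,
# d above dU means fail to reject. Residuals here are illustrative.

def durbin_watson(residuals):
    numerator = sum((residuals[t] - residuals[t - 1]) ** 2
                    for t in range(1, len(residuals)))
    denominator = sum(e * e for e in residuals)
    return numerator / denominator

d = durbin_watson([0.5, -0.3, 0.8, -0.6, 0.2, -0.1, 0.4, -0.7])
```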

<span class='text_page_counter'>(421)</span> APPENDIX P

APPENDIX P  Lower and Upper Critical Values W of Wilcoxon Signed-Ranks Test

       One-Tailed:  α = .05    α = .025   α = .01    α = .005
       Two-Tailed:  α = .10    α = .05    α = .02    α = .01
  n                        (Lower, Upper)
  5        0,15       —,—        —,—        —,—
  6        2,19       0,21       —,—        —,—
  7        3,25       2,26       0,28       —,—
  8        5,31       3,33       1,35       0,36
  9        8,37       5,40       3,42       1,44
 10       10,45       8,47       5,50       3,52
 11       13,53      10,56       7,59       5,61
 12       17,61      13,65      10,68       7,71
 13       21,70      17,74      12,79      10,81
 14       25,80      21,84      16,89      13,92
 15       30,90      25,95      19,101     16,104
 16       35,101     29,107     23,113     19,117
 17       41,112     34,119     27,126     23,130
 18       47,124     40,131     32,139     27,144
 19       53,137     46,144     37,153     32,158
 20       60,150     52,158     43,167     37,173

Source: Adapted from Table 2 of F. Wilcoxon and R. A. Wilcox, Some Rapid Approximate Statistical Procedures (Pearl River, NY: Lederle Laboratories, 1964), with permission of the American Cyanamid Company. 877

<span class='text_page_counter'>(422)</span> 878. APPENDIX Q

APPENDIX Q  Control Chart Factors

Number of Observations
in Subgroup        d2      d3      D3      D4      A2
 2               1.128   0.853   0       3.267   1.880
 3               1.693   0.888   0       2.575   1.023
 4               2.059   0.880   0       2.282   0.729
 5               2.326   0.864   0       2.114   0.577
 6               2.534   0.848   0       2.004   0.483
 7               2.704   0.833   0.076   1.924   0.419
 8               2.847   0.820   0.136   1.864   0.373
 9               2.970   0.808   0.184   1.816   0.337
10               3.078   0.797   0.223   1.777   0.308
11               3.173   0.787   0.256   1.744   0.285
12               3.258   0.778   0.283   1.717   0.266
13               3.336   0.770   0.307   1.693   0.249
14               3.407   0.763   0.328   1.672   0.235
15               3.472   0.756   0.347   1.653   0.223
16               3.532   0.750   0.363   1.637   0.212
17               3.588   0.744   0.378   1.622   0.203
18               3.640   0.739   0.391   1.609   0.194
19               3.689   0.733   0.404   1.596   0.187
20               3.735   0.729   0.415   1.585   0.180
21               3.778   0.724   0.425   1.575   0.173
22               3.819   0.720   0.435   1.565   0.167
23               3.858   0.716   0.443   1.557   0.162
24               3.895   0.712   0.452   1.548   0.157
25               3.931   0.708   0.459   1.541   0.153

Source: Reprinted from ASTM-STP 15D by kind permission of the American Society for Testing and Materials.
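A minimal sketch of how these factors set X-bar and R chart limits. The n = 5 factors (A2 = 0.577, D3 = 0, D4 = 2.114) come from the table; the grand mean and average range are illustrative:

```python
# Sketch: X-bar and R control chart limits from the tabled factors.
# xbar_bar is the grand mean of the subgroup means; r_bar is the average
# subgroup range.

def xbar_r_limits(xbar_bar, r_bar, a2, d3, d4):
    xbar_limits = (xbar_bar - a2 * r_bar, xbar_bar + a2 * r_bar)
    r_limits = (d3 * r_bar, d4 * r_bar)
    return xbar_limits, r_limits

# factors for subgroups of n = 5: A2 = 0.577, D3 = 0, D4 = 2.114
(x_lcl, x_ucl), (r_lcl, r_ucl) = xbar_r_limits(50.0, 4.0, 0.577, 0, 2.114)
# X-bar limits ~ (47.692, 52.308); R limits ~ (0, 8.456)
```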

<span class='text_page_counter'>(423)</span> Answers to Selected Odd-Numbered Problems

This section contains summary answers to most of the odd-numbered problems in the text. The Student Solutions Manual contains fully developed solutions to all odd-numbered problems and shows clearly how each answer is determined.

Chapter 1
1-1. Descriptive; use charts, graphs, tables, and numerical measures.
1-3. A bar chart is used whenever you want to display data that has already been categorized, while a histogram is used to display data over a range of values for the factor under consideration.
1-5. Hypothesis testing uses statistical techniques to validate a claim.
1-13. statistical inference, particularly estimation
1-17. written survey or telephone survey
1-19. An experiment is any process that generates data as its outcome.
1-23. internal and external validity
1-27. Advantages—low cost, speed of delivery, instant updating of data analysis; disadvantages—low response and potential confusion about questions
1-29. personal observation data gathering
1-33. Part range = Population size/Sample size = 18,000/100 = 180. Thus, the first person selected will come from employees 1–180. Once that person is randomly selected, the second person will be the one numbered 180 higher than the first, and so on.
1-37. The census would consist of all items produced on the line in a defined period of time.
1-41. parameters, since it would include all U.S. colleges
1-43. a. stratified random sampling b. simple random sampling or possibly cluster random sampling c. systematic random sampling d. stratified random sampling
1-49. a. time-series b. cross-sectional c. time-series d. cross-sectional
1-51. a. ordinal—categories with defined order b. nominal—categories with no defined order c. ratio d. nominal—categories with no defined order
1-53. ordinal data
1-55. a. nominal data b. ratio data c. nominal data d. ratio data e. ratio data f. nominal data g. ratio data
1-61. interval or ratio data
1-67. a. Use a random sample or systematic random sample. b. The product is going to be ruined after testing it. You would not want to ruin the entire product that comes off the assembly line.

Chapter 2
2-3. a. 2^k
ratio data interval or ratio data a. Use a random sample or systematic random sample. b. The product is going to be ruined after testing it. You would not want to ruin the entire product that comes off the assembly line.. 2-3. a. 2k

<span class='text_page_counter'>(424)</span> n or 210  1,024

<span class='text_page_counter'>(425)</span> 1,000 Thus, use k  10 classes. b. w . High  Low Classes. . 2,900  300 10. . 2,600 10.  260 (round to 300). 2-5. a. 2.833, which is rounded to 3. b. Divide the number of occurrences (frequency) in each class by the total number of occurrences. c. Compute a running sum for each class by adding the frequency for that class to the frequencies for all classes above it. d. Classes form the horizontal axis, and the frequency forms the vertical axis. Bars corresponding to the frequency of each class are developed. 2-7. a. 1  0.24  0.76 b. 0.56  0.08  0.48 c. 0.96  0.08  0.86 2-9. a. Class Frequency Relative Frequency 2–3. 2. 0.0333. 4–5. 25. 0.4167. 6–7. 26. 0.4333. 8–9. 6. 0.1000. 10–11. 1. 0.0167. b. cumulative frequencies: 2; 27; 53; 59; 60 c. cumulative relative frequencies: 0.0333; 0.4500; 0.8833; 0.9833; 1.000 d. ogive 2-13. a. The weights are sorted from smallest to largest to create the data array. b. Weight (Classes) Frequency 77–81. 3. 82–86. 9. 87–91. 16. 92–96. 16. 97–101. 5. Total =. 49. c. The histogram can be created from the frequency distribution. d. 10.20% Largest  smallest. 214.4 105.0  9.945 → w  10.  11 number of classes b. 8 of the 25, or 0.32 of the salaries at least $175,000 c. 18 of the 25, or 0.72 having salaries that are at most $205,000 but at least $135,000. 2-15. a. w . 879.

<span class='text_page_counter'>(426)</span> 880. ANSWERS TO SELECTED ODD-NUMBERED PROBLEMS. 2-19. a. 9 classes High  Low 32 10 22 b. w     2.44 (round up p to 3.0) Classes 9 9 c. The frequency distribution with nine classes and a class width of 3.0 will depend on the starting point for the first class. This starting value must be at or below the minimum value of 10. d. The distribution is mound shaped and fairly symmetrical. It appears that the center is between 19 and 22 rounds per year, but the rounds played are quite spread out around the center. 2-21. a. 25  32 and 26  64. Therefore, 6 classes are chosen. b. The class width is 690,189/6 classes = 115,031.5. Rounding up to the nearest 1,000 passengers results in a class width of 116,000. c. Classes Frequency 0 116,000. 3. 232,000 347,000. 4. 347,000 462,000. 1. 462,000 577,000. 0. 577,000 692,000. 1. 2-43. a.. 50. $6,398. 2-67. 2-73.. 3-1. Q1 = 4,423; Median = 5,002; Q3 = 5,381 3-3. Q1 Q3 .  24.28. University Related. 2-55. 2-61.. Chapter 3. d. More airlines have fewer than 116,000 passengers. 2-23. a. Order the observations (coffee consumption) from smallest to largest. b. Using the 2k  n guideline, the number of classes, k, would be 0.9 and w  (10.1  3.5)/8  0.821, which is rounded to 0. Most observations fall in the class of 5.3–7.9 kg of coffee. c. The histogram can be created from the frequency distribution. The classes are shown on the horizontal axis and the frequency on the vertical axis. 2-29. a. The pie chart categories are the regions and the measure is the region’s percentage of total income. b. The bar chart categories are the regions and the measure for each category is the region’s percentage of total income. c. The bar chart, however, makes it easier to compare percentages across regions. 2-31. b.  1  0.0985  0.9015 2-33. The bar chart is skewed below indicating that the number of $1 million houses is growing rapidly. It also appears that that growth is exponential rather than linear. 2-35. 
A bar chart can be used to make the comparison. 2-37. a. The stem unit is 10 and the leaf unit is 1. b. between 70 and 79 seconds 2-41. a. Leaf unit  1.0 b. Slightly skewed to the left. The center is between 24 and 26. 2, 428. 2-51.. 29. 116,000 232,000. c. x . 2-47. 2-49.. c. A pie chart showing how that total is divided among the four hospital types would not be useful or appropriate. The sales have trended upward over the past 12 months. The line chart shows that year-end deposits have been increasing since 1997, but have increased more sharply since 2002 and have leveled off between 2006 and 2007. b. curvilinear c. The largest difference in sales occurred between 2006 and 2007. That difference was 14.835  10.711  4.124 ($billions). positive linear relationship b. Both relationships seem to be linear in nature. c. This occurred in 1998, 1999, and 2001. b. It appears that there is a positive linear relationship between the attendance and the year. a. The independent variable is hours and the dependent variable is sales. b. It appears that there is a positive linear relationship.. Municipally Owned. Privately Held. $3,591. $4,613. $5,191. b. The largest average charges occurs at university-related hospitals and the lowest average appears to be in Religious Affiliated hospitals.. 2 15.5 15.9 2.  13.55  15.7. (31.2  32.2)/2 = 31.7 (26.7  31.2)/2  28.95 (20.8  22.8)/2  21.8 Mean  19; Median  (19  19)/2  19; Mode  19 Symmetrical; Mean  Median  Mode 11,213.48 Use weighted average. Mean  114.21; Median  107.50; Mode  156 skewed right 562.99 551.685 562.90 FDIC  768,351,603.83 Bank of America average  113,595,000 b. 768,351,603.83/113,595,000  6.76 3-19. Mean  0.33 minutes; Median  0.31 minutes; Mode  0.24 minutes; slightly right skewed; 80th percentile  0.40 minutes. 3-7. a. b. c. 3-9. a. b. 3-11. a. b. 3-13. a. b. 3-15. a. b. c. 3-17. a.. 3-21. a. x . 3-25.. 3-27.. Religious Affiliated. 13.5 13.6. 3-29. 3-31.. 2, 448.30 20.  122.465; Median  (123.2 + 123.7) 2  123.45;. 
left-skewed b. 0.925 c. 126.36 d. weighted average a. Range  8 - 0  8 b. 3.99 c. 1.998 a. 16.87 b. 4.11 Standard deviation  2.8 a. Standard deviation  7.21; IQR  1212 12 b. Range  Largest  Smallest  30 - 6  24; Standard deviation  6.96; IQR  12 c. s2 is smaller than s2 by a factor of ( N 1)/N. s is smaller than s by a factor of affected.. ( N 1) / N. The range is not.

<span class='text_page_counter'>(427)</span> ANSWERS TO SELECTED ODD-NUMBERED PROBLEMS 3-33. a. The variance is 815.79 and the standard deviation is 28.56. b. Interquartile range overcomes the susceptibility of the range to being highly influenced by extreme values. 3-35. a. Range  33 - 21  12 n. ∑ xi. x  i =1 n.  261/10  26.1. n. S2 = S= b. 3-37. a. b. c. 3-39. a.. b.. ∑(x − x ) i =1. 2. n −1. 3-55. a. x =. b. 51 22.60, 51 2(22.60), 51 3(22.60), i.e., (28.4, 73.6), (5.8, 96.2), and (16.8, 118.8). There are (19/30)100%  63.3% of the data within (28.4, 73.6), (30/30)100%  100% of the data within (5.8, 96.2), (30/30)100%  100% of the data within (16.8, 118.8). c. bell-shaped population 3-57. a.. Mean. S 2 = 16.5444 = 4.0675. c. s1  1.000 and s2  0.629 3-41. a. Men spent an average of $117, whereas women spent an average of $98 for their phones. The standard deviation for men was nearly twice that for women. b. Business users spent an average of $166.67 on their phone, whereas home users spent an average of $105.74. The variation in phone costs for the two groups was about equal. 3-43. a. The population mean is ∑x m= = $178, 465 N b. The population median is ~ $173,000 m c. The range is: R  High  Low R  $361,100  $54,100  $307,000 d. The population standard deviation is ∑ (x − ) = $63,172 N. 3-47. a. at least 75% in the range 2,600 to 3,400; m 2(s). b. The range 2,400 to 3,600 should contain at least 89% of the data values. c. less than 11%. 3-49. a. 25.008 b. CV  23.55% c. The range from 31.19 to 181.24 should contain at least 89% of the data values. s 100 (100 ) = 20% 3-51. For Distribution A: CV = (100 ) = m 500 s 4.0 For Distribution B: CV = (100 ) = (100 ) = 40% 10.0  800 − x 800 − 1, 000 = = − 0.80 s 250 b. z  0.80 c. z  0.00. 3-53. a. z =. Drug A. Drug B. 234.75. 270.92. 13.92. 19.90. Standard Deviation. b. Based on the sample means of the time each drug is effective, Drug B appears to be effective longer than Drug A. c. 
Based on the standard deviation of effect time, Drug B exhibits a higher variability in effect time than Drug A. d. Drug A, CV  5.93%; Drug B, CV  7.35%. Drug B has a higher coefficient of variation and the greater relative spread. 0.078 3-59. Existing supplier: CV = (100 ) = 2.08% 3.75 New supplier: CV =. 0.135 (100 ) = 0.75% 18.029. 3-61. Anyone scoring below 61.86 (rounded to 62) will be rejected without an interview. Anyone scoring higher than 91.98 (rounded to 92) will be sent directly to the company. 3-63. CV =. 3, 083.45 (100 ) = 27.67% 11,144.48. At least 75% of CPA firms will compute a tax owed between $4,977.58 ————— $17,311.38 3-65. a. Varibale. Mean StDev. Variance. Scores. 94.780 4.130. 17.056. 2. s=. 1, 530 = 51; Variance = 510.76; Standard deviatiion = 22.60 30. = 148.9 / (10 − 1) = 16.5444;. Interquartile range  28 - 23  5 Ages are lower at Whitworth than for the U.S. colleges and universities as a group. The range is 113.0, the IQR is 62.25, the variance is 1,217.14, and the standard deviation is 34.89. No Adding a constant to all the data values leaves the variance unchanged. 2004: Mean  0.422; Variance  0.999; Standard deviation  1.000; IQR  1.7 2005: Mean  1.075; Variance  0.396; Standard deviation  0.629; IQR  0.75 x 2  1.075 and x1 0.422. 881. 3-73. 3-75.. 3-77. 3-81.. Q1 Median. Q3. IQR. 93.000 96.000 98.000 5.000. b. Tchebysheff’s Theorem would be preferable. c. 99 The mode is a useful measure of location of a set of data if the data set is large and involves nominal or ordinal data. a. 0.34 b. 0.34 c. 0.16 a. 364.42 b. Variance  16,662.63; Standard deviation  129.08 a. Comparing only the mean bushels/acre you would say that Seed Type C produces the greatest average yield per acre. b. CV of Seed Type A  25/88  0.2841 or 28.41% CV of Seed Type B  15/56  0.2679 or 26.79% CV of Seed Type C  16/100  0.1600 or 16% Seed Type C shows the least relative variability. c. 
Seed Type A: 68% between 63 and 113, 95% between 38 and 138, approximately 100% between 13 and 163. Seed Type B: 68% between 41 and 71, 95% between 26 and 86, approximately 100% between 11 and 101.

<span class='text_page_counter'>(428)</span> 882. ANSWERS TO SELECTED ODD-NUMBERED PROBLEMS. Seed Type C: 68% between 84 to 116 95% between 68 to 132 approximately 100% between 52 to 148 d. Seed Type A e. Seed type C 3-87. a. Variable Mean StDev Price. 22.000. -0.0354. 0.2615. -0.0600. b. It means that the closing price for GE stock is an average of approximately four ($0.0354) cents lower than the opening price. c. Variable Mean StDev Median Open. 33.947. 0.503. 33.980. Close-Open. -0.0354. 0.2615. -0.0600. Chapter 4 4-1. independent events 4-3. V, V V, C V, S C, V C, C C, S S, V S, C S, S 4-5. a. subjective probability based on expert opinion b. relative frequency based on previous customer return history c. 1/5  0.20 4-7. 1/3  0.333333 4-9. a. P(Brown)  # Brown/Total  310/982  0.3157 b. P(YZ-99)  # YZ-99/Total  375/982  0.3819 c. P(YZ-99 and Brown)  205/982  0.2088 d. not mutually exclusive since their joint probability is 0.1324 4-11. 0.375 4-15. Type of Ad Occurrences Help Wanted Ad Real Estate Ad Other Ad Total. Electrical. Mechanical. Total. 28 64 92. 39 69 108. 67 133 200. Lincoln Tyler Total. 3.813. b. x 1s  22 (3.813)  (18.187, 25.813); x 2s  (14.374, 29.626); x 3s  (10.561, 33.439) c. The Empirical Rule indicates that 95% of the data is contained within x 2s. This would mean that each tail has (1  0.95)/2  0.025 of the data. Therefore, the costume should be priced at $14.37. 3-89. a. Variable Mean StDev Median Close-Open. 4-23. The following joint frequency table (developed using Excel’s pivot table feature) summarizes the data.. 204 520 306 1,030. a. b. c. 4-17. a.. 0.1981 relative frequency yes relative frequency assessment method 4, 000  0.69 b. P(# 1)  5, 900. 4-19. a. 3,122 / 21, 768  0.1434 b. relative frequency assessment method 4-21. a. P (Caesarean)  22  0.44 50 b. New births may not exactly match the 50 in this study.. a. 133 200  0.665 b. 108 200  0.54 c. 28 200  0.14 4-25. a. b.. 43  0.43 100 56 6 = 0.17 100. c. 
For Pepsi, Probability  For Coke, Probability . 56 6 17   0.486 12 12 11 35. 6 6 6 18   0.514 12 12 11 35. d. For Pepsi, Probability . 7  6 8  5 26   0.4 19 16 14 16 65. For Coke, Probability . 12 10  6 11 39   0.6 19 16 14 16 65. 4-27. a. (0.9)(1  0.5)  0.45 b. (0.6)(0.8)  0.48 ⎛ 5 ⎞ ⎛ 4 ⎞ 20  0.22 4-29. P(senior1 and senior2)  ⎜ ⎟ ⎜ ⎟  ⎝ 10 ⎠ ⎝ 9 ⎠ 90 4-31. a. P(E1 and B)  P(E1|B)P(B)  0.25(0.30)  0.075 b. P(E1 or B)  P(E1)  P(B)  P(E1 and B)  0.35  0.30  0.075  0.575 c. P(E1 and E2 and E3)  P(E1)P(E2) P(E3)  0.35(0.15)(0.40)  0.021 4-33. a. P ( B) . Number of drives from B 195   0.28 Total drives 700. b. P(Defect ) . 50 Number of defective drives   0.07 700 Total drives. c. P (Defect B) . P ( Defect and B ) 0.02   0.076 P ( B) 0.28. Number of defective drives from B Number of drives from B 15   0.076 195. P (Defect B) . 4-35. a. 0.61; 0.316 b. 0.42; 0.202; 0.518 c. 0.39 4-37. They cannot get to 99.9% on color copies. 4-39. P(Free gas)  0.00015  0.00585  0.0005  0.0065 4-40. a. 0.1667 b. 0.0278 c. 0.6944 d. (1/6)(5/6)  (5/6)(1/6)  0.2778.
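The 4-40d calculation above, (1/6)(5/6) + (5/6)(1/6) = 0.2778, is the probability that exactly one of two independent trials succeeds; a quick check using exact fraction arithmetic:

```python
from fractions import Fraction

p = Fraction(1, 6)   # success probability on one trial (a given face of a die)
q = 1 - p            # failure probability

# Exactly one success in two independent trials:
# success on the first trial only, or on the second only.
exactly_one = p * q + q * p

assert exactly_one == Fraction(5, 18)
assert round(float(exactly_one), 4) == 0.2778
print(float(exactly_one))
```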

<span class='text_page_counter'>(429)</span> ANSWERS TO SELECTED ODD-NUMBERED PROBLEMS. 4-41. a. b. c. d.. P(NFL)  105/200  0.5250 P(College degree and NBA)  40/200  0.20 10/50  0.20 The two events are not independent.. 4-43. P (Line 1 | Defective)  (0.05)(0.4)/0.0725  0.2759 P (Line 2 | Defective)  (0.10)(0.35)/0.0725  0.4828 P (Line 3 | Defective)  (0.07)(0.25)/0.0725  0.2413 The unsealed cans probably came from Line 2. 4-45. P (Supplier A | Defective)  (0.15)(0.3)/0.115  0.3913 P (Supplier B | Defective)  (0.10)(0.7)/0.115  0.6087 Supplier B is the most likely to have supplied the defective parts. 4-47. a. P(E1 and E 2)  P(E1| E2)P(E2)  0.508(0.607)  0.308 b. P(E1 and E 3)  P(E1| E3)  0.607/0.853  0.712 4-49. a. b. c. d. 4-51. a. b. c. d. 4-53. a. b. c. 4-55. a. b. c. 4-61. a. b.. 0.76 0.988 0.024 0.9999 0.1856 0.50 0.0323 0.3653 0.119 0.148 0.3814 0.50 0.755 0.269 0.80; 0.40; 0.20; 0.60 A B and A B are complements.. 4-63. a. 0.0156 b. 0.1563 c. 0.50 4-65. a. 0.149 b. 0.997 c. 0.449 4-67. a. the relative frequency assessment approach b. 0.028 c. 0.349 d. yes 4-69. Clerk 1 is most likely responsible for the boxes that raised the complaints. 100 4-71. a.  0.33 300 30 b.  0.10 300 c. P(East or C)  P(East)  P(C)  P(East and C)  0.25  0.333  0.103  0.48 d. P (C East )  P (C and East)/P (East)  0.103 / 0.25  0.41 4-73. a. 0.3333 b. Boise will get 70.91% of the cost, Salt Lake will get 21.82%, and Toronto will get 7.27% regardless of production volume.. 5-5. 3.7 days 5-7. a. 130 b. 412.50 c. 412.50  20.31 5-9. a. b. c. d. e. 5-11. a. b. 5-13. a.. s  1.4931  1.22 1.65 days to 4.09 days $58,300 $57,480 Small firm profits  $135,000 Mid-sized profits  $155,000 Large firm profits  $160,000 b. Small firm: s  $30,000 Mid-sized firm: s  $90,000 Large firm: s  $156,604.60 c. The large firm has the largest expected profit. 5-21. a. x P(x) x P(x). b. c. 5-15. a. b. 5-17. a.. 5-23. 5-25.. 5-27. 5-29.. 5-31.. 5-33.. Chapter 5 5-1. a. b. 5-3. a. b.. 
discrete random variable The possible values for x are x  {0, 1, 2, 3, 4, 5, 6} number of children under 22 living in a household discrete. 15.75 20.75 78.75 increases both the expected value by an amount equal to the constant added both the expected value being multiplied by that same constant 3.51 s2  1.6499; s  1.2845 2.87 days. 5-35.. 14. 0.008. 19. 0.216. 15. 0.024. 20. 0.240. 16. 0.064. 21. 0.128. 17. 0.048. 22. 0.072. 18. 0.184. 23. 0.016. b. 19.168; s  2  3.1634  1.7787 c. Median is 19 and the quality control department is correct. 0.2668 a. P(x  5)  0.0746 b. P(x

≥ 7) = 0.2143 c. 4 d. σ = √(npq) = √(20(.20)(.80)) = 1.7889 0.1029 a. 0.1442 b. 0.8002 c. 4.55 a. 0.0688 b. 0.0031 c. 0.1467 d. 0.8470 e. 0.9987 a. 3.2 b. 1.386 c. 0.4060 d. 0.9334 a. 3.08 b. 0.0068 c. 0.0019 d. It is quite unlikely.
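The binomial standard deviation used above, σ = √(npq) with n = 20 and p = 0.20, can be reproduced directly:

```python
import math

n, p = 20, 0.20          # binomial parameters from the Chapter 5 answer
q = 1 - p

mean = n * p             # binomial mean
sd = math.sqrt(n * p * q)

assert mean == 4.0
assert round(sd, 4) == 1.7889
print(mean, round(sd, 4))
```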

<span class='text_page_counter'>(431)</span> 884. ANSWERS TO SELECTED ODD-NUMBERED PROBLEMS. 5-37. a. b. c. 5-39. a. b. c. 5-41. a. b. c. 5-43. a. b. 5-45. a. b. c. d. e. 5-49.. 5-51. 5-53. 5-55.. 5-57.. 5-59. 5-61. 5-63.. 5-65.. 5-67. 5-69.. 5-71.. 5-75. 5-77.. 0.5580 0.8784 An increase in sample size would be required. 2.96 Variance  1.8648; Standard deviation  1.3656 0.3811 0.3179 0.2174 0.25374 0.051987 0.028989 0.372 12 estimate may be too high. 0.0832 0.0003 Redemption rate is lower than either Vericours or TCA Fulfillment estimate. a. 9 corporations b. 0.414324 c. 70th percentile is 12. a. 0.0498 b. 0.1512 0.175. a. 0.4242 b. 0.4242 c. 0.4696 a. P(x  3)  0.5 b. P(x  5)  0 c. 0.6667 d. Since 0.6667  0.25, then x’  2. P(x

<span class='text_page_counter'>(432)</span> 10)  1  0.8305  0.1695 0.0015 a. P(x  4)  0.4696 b. P(x  3)  0.2167 c. 0.9680 a. 0.0355 b. 0.0218 c. 0.0709 a. 0.0632 b. 120 Spicy Dogs a. 0.0274 b. 0.0000 c. 0.0001 a. 8 b. lt  1(3)  3 c. 0.0119 d. It is very unlikely. Therefore, we believe that the goal has not been met. a. This means the trials are dependent. b. does not imply that the trials are independent a. X P(x) xP(x) 0. 0.56. 0.00. 1. 0.21. 0.21. 2. 0.13. 0.26. 3. 0.07. 0.21. 4. 0.03. 0.12 0.80. b. Standard deviation  1.0954; Variance  1.20 5-79. 0.0020 5-81. 0.6244. 5-83. a. b. c. 5-85. a. b. c. 5-87. a. b. 5-89. a. b. c. d.. 2.0 1.4142 because outcomes equally likely E(x)  750; E(y)  100 StDev(x)  844.0972; StDev(y)  717.635 CV(x)  844.0972/750  1.1255 CV(y)  717.635/100  7.1764 0.3501 0.3250 0.02 E(X)  0.6261 (0, 2.2783) 1  0.9929  0.0071. Chapter 6 6-1. a. b.. 6-5.. 6-7.. 6-9. 6-11.. 6-13.. 6-15.. 6-17.. 6-19.. 6-21. 6-23.. 6-25.. 190 − 200 10  0.50 20 20. 240  200 40   2.00 20 20 a. 0.4901 b. 0.6826 c. 0.0279 a. 0.4750 b. 0.05 c. 0.0904 d. 0.97585 e. 0.8513 a. 0.9270 b. 0.6678 c. 0.9260 d. 0.8413 e. 0.3707 a. x  1.29(0.50)  5.5  6.145 b. m  6.145  (1.65)(0.50)  5.32 a. 0.0027 b. 0.2033 c. 0.1085 a. 0.0668 b. 0.228 c. 0.7745 a. 0.3446 b. 0.673 c. 51.30 d. 0.9732 a. 0.1762 b. 0.3446 c. 0.4401 d. 0.0548 The mean and standard deviation of the random variable are 15,000 and 1,250, respectively. a. 0.0548 b. 0.0228 c. m  15,912 (approximately) about $3,367.35 a. 0.1949 b. 0.9544 c. Mean  Median; symmetric distribution P(x 1.0)  0.5000  0.4761  0.0239 c.. 6-3.. 225  200 25   1.25 20 20.
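Tail areas such as 6-25's 0.5000 - 0.4761 = 0.0239 come from the standard normal table; the z value of 1.98 below is the one implied by the table area 0.4761, and the same result follows from the CDF:

```python
from statistics import NormalDist

z = 1.98                   # z-score whose printed table area is 0.4761
phi = NormalDist().cdf(z)  # P(Z <= 1.98)

table_area = phi - 0.5     # area between 0 and z, as in the printed table
upper_tail = 1 - phi       # P(Z > z)

assert round(table_area, 4) == 0.4761
assert round(upper_tail, 4) == 0.0239
print(round(upper_tail, 4))
```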

<span class='text_page_counter'>(433)</span> ANSWERS TO SELECTED ODD-NUMBERED PROBLEMS 6-27. a. P(0.747  x  0.753)  0.6915  0.1587  0.5328 0.753  0.75  0.001 b.  2.33 6-31. a. b. c. d. 6-33. a. b.. 6-35.. 6-37. 6-39.. 6-41. 6-43.. c. d. a. b. c. d. a. b. a. b. c. a. b. a. b.. 6-45. a. b. 6-47. a. b. c. 6-49. a. b.. skewed right approximate normal distribution 0.1230 2.034% 0.75 Q1  4.25/0.0625  8; Q2  4.50/0.0625  12; Q3  4.75/0.0625  16 14.43 0.92 0.9179 0.0498 0.0323 0.9502 0.3935 0.2865 0.7143 0.1429 0.0204 0.4084; yes 40,840 0.2939 0.4579 0.1455 0.0183 0.3679 0.0498 0.4493 approximately,   0.08917 positively skewed Descriptive Statistics: ATM FEES Variable. Mean. StDev. ATM FEES. 2.907. 2.555. c. 1  0.6433  0.3567 6-55. a. 0.1353 b. 0.1353 6-57. 0.5507 6-59. Machine #1: 0.4236 Machine #2: 0.4772 6-61. a. 0.0498 b. 0.0971 c. 0.1354 6-63. d. 5 e. 0.25 f. 0.125 g. 0.167 6-65. P(x  74)  0.1271 P(x  90)  0.011 6-67. a. approximately normally distributed b. Mean  2.453; Standard deviation  4.778 c. 0.0012 d. No 6-69. a. The histogram seems to be “bell shaped.” b. The 90th percentile is 540.419. c. 376.71 is the 43rd percentile. 6-71. a. Uniform distribution. Sampling error could account for differences in this sample. 1 b. f ( x )  1   0.098 b  a 35  24.8 c. 0.451. 885. Chapter 7 7-1. 18.50 7-3. x  m  10.17  11.38  1.21 7-5. a. – 4.125 b. –13.458 to 11.042 c. –9.208 to 9.208 7-9. 0.64 ∑ x 864 = = 43.20 days 7-11. a.  = N 20 b. x . ∑ x 206   41.20 days; n 5. Sampling error  41.20  43.20  2 28.4 days to 40.4 days $3,445.30 $29.70 1,432.08 87.12 175.937 to 178.634 Mean of Sales Mean of sales  2,764.83 b. Mean of Sample Mean of sample  2,797.16 c. $32.33 million d. Smallest  $170.47; Largest  $218.41 7-21. a. Sampling error  x  m  $15.84  $20.00  $4.16 b. Random and nonrandom samples can produce sampling error and the error is computed the same way. 7-23. P( x  2,100)  0.5000  0.3907  0.1093. c. 7-13. a. b. 7-15. a. b. c. 7-17. a.. 7-25. x . n. . 40 25. 8. 7-27. 
0.0016 7-29. a. 0.3936 b. 0.2119 c. 0.1423 d. 0.0918 7-31. a. 0.8413 b. 0.1587 7-33. a. 0.3830 b. 0.9444 c. 0.9736 7-35. P(x̄

≥ 4.2) = 0.5000 - 0.4989 = 0.0011 7-37. P(x̄ ≥ 33.5) = 0.5000 - 0.2019 = 0.2981 7-39. a. Descriptive Statistics (Video Price): Mean = 45.580, StDev = 2.528 b. Cumulative Distribution Function: Normal with mean = 46 and standard deviation = 0.179; P(X ≤ 45.58) = 0.00948 c. Cumulative Distribution Function: Normal with mean = 45.75 and standard deviation = 0.179; P(X ≤ 45.58) = 0.171127 7-43. a. Mean = 55.68; Standard deviation = 6.75.
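The sampling-distribution results above rest on the standard error of the mean, σx̄ = σ/√n; 7-25's 40/√25 = 8 checks out directly:

```python
import math

sigma, n = 40, 25          # population standard deviation and sample size (7-25)
se = sigma / math.sqrt(n)  # standard error of the sample mean

assert se == 8.0
print(se)

# Probabilities about the sample mean then use z = (x_bar - mu) / se
# before reading the standard normal table.
```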

7-45. a. 0.8621 b. 0.0146 c. 0.8475 d. 0.7422 7-47. a. Sampling error = p̂ - p = 0.65 - 0.70 = -0.05 b. 0.1379 7-49. a. 0.8221 b. 0.6165 7-51. a. 0.9015 b. 0.0049 7-53. a. p̂ = x/n = 27/60 = 0.45 b. P(p̂

≥ 0.45) = 0.5000 - 0.2852 = 0.2148 7-55. P(p̂

≤ 0.09) = 0.5000 - 0.4838 = 0.0162 7-57. a. 0.1020 b. 0.4522 c. ≈ 0.0 7-59. a. 0.0749 b. 0.0359 c. essentially 0 7-61. a. 0.72 b. 0.9448 c. The proportion of on-time arrivals is smaller than in 2004. 7-63. a. 131 over $100,000 and 65 of $100,000 or less b. 0.668 c. 0.2981 7-67. Sample averages would be less variable than the population. 7-69. a. 405.55 b. 159.83 7-71. A sample size of 1 would be sufficient. 7-73. a. 0.2643 b. 0.3752 c. 0.0951 7-75. a. Right-skewed distribution; a normal distribution cannot be used. b. The sampling distribution of the sample means cannot be approximated by a normal distribution. c. 0.50 7-77. a. 0.8660 b. 0.9783 7-79. a. P(x > 16.10) = 0.5000 - 0.1554 = 0.3446 b. P(x̄ > 16.10) = 0.5000 - 0.4177 = 0.0823 7-81. Note: because of the small population, the finite population correction factor is used. a. 0.1112 b. Either the mean or the standard deviation or both may have changed. c. 0.2483 7-83. a. 0 b. highly unlikely c. 0.4999 7-85. a. 0 b. 0.117 7-89. a. 0.216 b. 0.3275 c. Reducing the warranty is a judgment call. Chapter 8 8-1. 15.86 to 20.94 8-3. 293.18 to 306.82 8-5. 1,180.10 to 1,219.90 8-7. a. 1.69 to 4.31 b. 1.38 to 4.62 8-9. 97.62 to 106.38 8-11. a. 11.6028 to 15.1972 b. 29.9590 to 32.4410 c. 2.9098 to 6.0902 d. 18.3192 to 25.0808 8-13. a. $13.945 to $14.515 b. $7,663.92; $7,362.96 8-15. a. (4,780.25; 5,219.75) b. 219.75 c. 715.97 ≈ 716 8-17. a. $5.29 to $13.07 b. Sample data do not dispute the American Express study. 8-19. a. 83.785 b. (505.415, 567.985) c. increased since 2007 8-21. a. 163.5026 to 171.5374 b. Increase the sample size, decrease the level of confidence, decrease the standard deviation. 8-23. a. 6.3881 to 6.6855 b. 33.281 8-25. a. x̄ = 256.01; s = 80.68 b. 130 c. 242.01 to 270.01; 14.00 seconds 8-27. n = z²σ²/e² = (1.96²)(350²)/50² = 188.24, so n = 189 8-29. n = (zσ/e)² = ((1.96)(680)/44)²
= 917.54, so n = 918 8-31. 3,684.21 8-33. a. 61.47; n = 62 b. 5,725.95; n = 5,726 c. 1.658; n = 2 d. 305.38; n = 306 e. 61.47; n = 62 8-35. n = 249 8-37. a. 883 b. 1. Reduce the confidence level to something less than 95%. 2. Increase the margin of error beyond 0.25 pounds. 3. Some combination of decreasing the confidence level and increasing the margin of error. 8-39. n = (zσ/e)² = ((2.575)(1.4)/0.2)² = 324.9 ≅ 325 8-41. a. n = 246 b. n = 6,147 c. Margin of error is from $0.44 to $0.51. 8-43. a. n = 60 b. n = 239 8-45. a. 429 - 137 = 292 additional b. 302 - 137 = 165 additional 8-47. a. Net required sample is 1,749 - 150 = 1,599. b. Reduce the confidence level (lowers the z-value) or increase the margin of error or some combination of the two. 8-49. 1,698 8-51. 0.224 to 0.336 8-53. a. yes b. (0.286, 0.414) c. 0.286 to 0.414 d. 0.064 8-55. a. p̂ = x/n = 7/40 = 0.175 b. (0.057, 0.293) c. n = 888.
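Both sample-size answers above (8-27 and 8-29) apply n = (zσ/e)² and round up to the next whole unit:

```python
import math

def required_n(z, sigma, e):
    """Smallest n giving a margin of error of at most e."""
    return math.ceil((z * sigma / e) ** 2)

# 8-27: z = 1.96, sigma = 350, e = 50 -> 188.24, rounded up to 189
assert required_n(1.96, 350, 50) == 189
# 8-29: z = 1.96, sigma = 680, e = 44 -> 917.54, rounded up to 918
assert required_n(1.96, 680, 44) == 918
print(required_n(1.96, 350, 50), required_n(1.96, 680, 44))
```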

<span class='text_page_counter'>(438)</span> ANSWERS TO SELECTED ODD-NUMBERED PROBLEMS 8-57. a. 0.324 —– 0.436 b. 9,604 8-59. a. 0.3155 —– 0.3745 b. 179.20 —– 212.716 c. 0.3692 —– 0.4424 d. 1,224.51  1,225 8-61. 0.895 ——————————— 0.925 8-63. a. 0.6627 ——— 0.7173 b. 2,401 c. 0.4871(2)  0.9742 8-65. a. 0.1167 b. 0.1131 c. (0.0736, 0.1598) 8-67. a. 0.7444 b. 0.6260 ——— 0.8628 c. The sample size could be increased. 8-75. a. 0.7265 —– 0.7935 b. 25,427.50 —– 27,772.50 8-77. a. n  62  40  22 more b. $620 without pilot; savings of $1,390  $620  $770 8-79. a. 5.21 b. n  390 c. 0.25 work days  2.00 work hours 8-81. a. 0.7003 —– 0.7741 b. 32,279.4674 —– 33,322.7227 8-83. a. 45.23 to 45.93 b. is plausible c. n  25. Chapter 9 9-1. a. b. c. d. 9-3. a. b. c. d. e. 9-5. a.. b. c. 9-7. a.. b. c. 9-9. a. b. c. 9-11. a. b. c. d. e. f. 9-13. a. b. c. d.. z  1.96 t  1.6991 t  2.4033 z  1.645 za  1.645 t/2  2.5083 za/2  2.575 ta  1.5332 Invalid Reject the null hypothesis if the calculated value of the test statistic, z, is greater than 2.575 or less than 2.575. Otherwise, do not reject. z  3.111 Reject the null hypothesis. Reject the null hypothesis if the calculated value of the test statistic, t, is less than the critical value of 2.0639. Otherwise, do not reject. t  1.875 Do not reject the null hypothesis. Reject the null hypothesis if the calculated value of the test statistic, t, is greater than 1.3277. Otherwise, do not reject. t  0.78 Do not reject the null hypothesis. Type I error Type II error Type I error No error Type II error No error H0: m

≤ 30,000 HA: m > 30,000 b. $30,411.25 Do not reject. Type II. 9-15. a. H0: m

≤ 3,600 HA: m > 3,600 b. Since t = 0.85 < 1.8331, the null hypothesis is not rejected. 9-17. a. H0: m

≤ 55 HA: m > 55 b. Because t = 0.93 < 2.4620, the null hypothesis is not rejected. 9-19. The annual average consumer unit spending for food at home in Detroit is less than the 2006 national consumer unit average. 9-21. a. Since t = 0.74 < 2.1604, we do not reject the null hypothesis. b. Type II error 9-23. a. z = 1.96 b. z = 1.645 c. z = 2.33 d. z = 1.645 9-25. Since 2.17 < 2.33, don't reject. 9-27. a. Reject the null hypothesis if the calculated value of the test statistic, z, is less than the critical value of the test statistic z = -1.96. Otherwise, do not reject. b. z = -2.0785 c. reject 9-29. a. p-value = 0.05 b. p-value = 0.5892 c. p-value = 0.1902 d. p-value = 0.0292 9-31. Because z = -3.145 is less than -2.055, reject H0. 9-33. Since z = 0.97 < 1.645, we do not reject the null hypothesis. 9-35. Because z = 1.543 is less than 1.96, do not reject H0; p-value = 0.50 - 0.4382 = 0.0618 9-37. a. H0: p ≤ 0.40 HA: p > 0.40 b. Since z = 1.43 < 1.645, we do not reject the null hypothesis. 9-39. a. H0: p ≤ 0.10 HA: p > 0.10 b. Since the p-value = 0.1736 is greater than 0.05, don't reject. 9-41. a. Since z = 3.8824 > 1.96, reject H0. b. p-value = 2(0.5 - 0.5) ≈ 0.0 9-43. Because z = 0.5155 is neither less than -1.645 nor greater than 1.645, do not reject H0. 9-45. a. H0: p

≤ 0.50 HA: p > 0.50 b. Since z = 6.08 > 2.05, we reject the null hypothesis. 9-47. a. The appropriate null and alternative hypotheses are H0: p

≤ 0.95 HA: p > 0.95 b. Since z = 4.85 > 1.645, we reject the null hypothesis. 9-49. a. 0.80 b. 0.20 c. The power increases, and beta decreases. d. Since x̄ = 1.23 and 1.0398 < 1.23 < 1.3062, do not reject H0. 9-51. 0.8888 9-53. 0.3228 9-55. a. 0.0084 b. 0.2236 c. 0.9160.
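One-tailed p-values like the 0.0618 in 9-35 (0.50 - 0.4382) are upper-tail normal areas; z = 1.54 below is the value implied by the table area 0.4382:

```python
from statistics import NormalDist

z = 1.54                           # |test statistic| implied by table area 0.4382
p_value = 1 - NormalDist().cdf(z)  # one-tailed p-value

assert round(p_value, 4) == 0.0618

alpha = 0.05
reject = p_value < alpha           # False: do not reject H0, matching the answer
assert reject is False
print(round(p_value, 4), reject)
```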

9-57. a. 0.1685 b. 0.1446 c. 0.1190 9-59. a. H0: m

≤ 243 HA: m > 243 b. 0.0537. 9-61. a. H0: m

≤ 15 HA: m > 15 b. 0.0606 9-63. a. H0: m

≥ $47,413 HA: m < $47,413 b. 0.1446 9-65. 0.0495 9-67. a. Since t = 3.97 > 1.6991, we reject H0. b. 0.3557 9-77. a. If a is decreased, the rejection region is smaller, making it easier to accept H0, so b is increased. b. If n is increased, the test statistic is also increased, making it harder to accept H0, so b is decreased. c. If n is increased, the test statistic is also increased, making it harder to accept H0, so b is decreased and power is increased. d. If a is decreased, the rejection region is smaller, making it easier to accept H0, so b is increased and power is decreased. 9-79. a. z = (x̄ - μ)/(σ/√n) b. t = (x̄ - μ)/(s/√n) c. z = (p̂ - π)/√(π(1 - π)/n) 9-81. a. H0: m ≤ 4,000 HA: m > 4,000 b. Since t = 1.2668 < 1.7959, do not reject. 9-83. a. H0: p

<span class='text_page_counter'>(448)</span> 0.50 HA: p 0.50 b. Since z  5.889 1.645, reject the null hypothesis. Since z  5.889, the p-value is approximately zero. 9-85. a. Since z  1.5275  1.645, do not reject. b. Type II error 9-87. a. yes b. p-value  0.001, reject 9-89. a. yes b. Since z  1.1547  1.96, do not reject H0. 9-91. a. H0: m  6 inches HA: m  6 inches b. Reject H0 if z  2.58 or z 2.58; otherwise do not reject H0. c. Since x  6.03  6.0182, reject the null hypothesis. 9-93. a. Because z  4.426  1.96 we reject H0. b. 50,650.33 ——— 51,419.67 9-95. p-value  0, so reject H0. 9-97. a. H0: m  0.75 inch HA: m  0.75 inch b. Since t  0.9496 2.6264, do not reject H0.. Since the p-value is less than a, we reject H0. 0.1170 Since the p-value is greater than a, we do not reject H0. 0.5476. 10-29. 10-31.. 10-33.. 10-35.. 10-37. 10-39.. 10-41.. 10-43.. 10-45. 10-47.. 10-49. 10-51.. 6.54  (m1  m2)  0.54 13.34  m1  m2  6.2 0.07  (m1  m2)  50.61 19.47  m1  m2  7.93 a. 0.05 b. 0.0974  (m1  m2)  0.0026 c. The two lines do not fill bags with equal average amounts. a. 0.1043 —— 2.7043; no b. no a. highly unlikely b. (36.3551, 37.4449) 0.10  m1  m2  0.30 a. 2.35% b. 1,527.32 ——— 4,926.82 c. plausible that there is no difference d. 3,227 Since t  4.80 2.1199, we reject. Since z  5.26  1.645, reject the null hypothesis. a. If t  1.9698 or t 1.9698, reject H0. b. Since 5.652  1.9698, reject H0. Because t  0.896 is neither less than t  2.0167, nor greater than t  2.0167, do not reject. a. Since 0.9785  1.677, do not reject H0. b. Type II error Since t  4.30 1.6510, we reject the null hypothesis. a. 2,084/2,050  1.02 b. Because p-value  P(t  5.36) ⬵ 1.00, the null hypothesis is not rejected. a. The ratio of the two means is 9.60/1.40  6.857. b. The ratio of the two standard deviations is 8.545/1.468  5.821. c. Since p-value  0.966  0.25, do not reject the null. a. It is plausible to conjecture the goal has been met. b. 
Since the p-value is greater than 0.025, we do not reject the null hypothesis. -674.41 ≤ md ≤ -191.87 a. H0: md

≥ 0 HA: md < 0 b. Since -3.64 < -1.3968, reject H0. c. -2.1998 to -0.7122 a. (7.6232, 13.4434) b. Because t = 0.37 < 1.459, the null hypothesis cannot be rejected. a. The samples were matched pairs. b. Because 0.005 < p-value < 0.01 < a, the null hypothesis should be rejected. a. Since t = -7.35 < -1.98, reject H0. b. -100.563 to -57.86; yes. a. Because t = 9.24 > 2.3646, reject. b. The ratio of the two standard errors is 3.84 (= 0.5118/0.13313). Since 1.068 < 2.136, do not reject the null hypothesis. a. no b. no c. yes d. no.
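The paired-sample answers above test a mean difference with t = d̄ / (s_d/√n) on n - 1 degrees of freedom. A minimal sketch; the before/after values are hypothetical, not from the exercises:

```python
import math
import statistics

# Hypothetical paired observations -- illustration only.
before = [12.1, 11.4, 13.0, 12.7, 11.9, 12.5]
after  = [11.6, 11.2, 12.4, 12.1, 11.8, 12.0]

d = [b - a for b, a in zip(before, after)]  # paired differences
n = len(d)
d_bar = statistics.mean(d)
s_d = statistics.stdev(d)

t = d_bar / (s_d / math.sqrt(n))            # paired t statistic, df = n - 1
print(round(t, 3))
```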

<span class='text_page_counter'>(450)</span> 889. ANSWERS TO SELECTED ODD-NUMBERED PROBLEMS. 10-53. Because the p-value  0.367 is greater than a  0.05, we do not reject the null hypothesis. 10-55. a. Since z  2.538  1.96, reject H0. b. Since z  2.538  1.645, fail to reject H0. c. Since z  2.538  1.96, reject H0. d. Since z  2.08 2.33, fail to reject H0. 10-57. Since the test statistic, 0.4111  2.575, do not reject the null hypothesis. 10-59. a. Since p-value  0.0244 0.05 reject H0. b. 0.00597 10-61. a. yes b. Since the p-value  0.039 0.05, the null hypothesis is rejected. 10-63. a. yes b. Since the p-value  0.095  0.01, the null hypothesis is not rejected. c. Larger sample sizes would be required. 10-67. a. (2.1064, 7.8936) b. t  2.0687 c. t  2.0687 10-69. 120.8035 —— 146.0761 10-71. a. yes b. Since 0.7745 2.17, do not reject H0. 10-73. 26.40  md  0.36 10-75. a. paired samples experiment b. Since the p-value  0.000 0.05  a, the null hypothesis is rejected. c. (0.5625, 0.8775) 10-77. a. Since t  5.25  2.3499, reject. b. Type I error. Chapter 11 11-1. 74,953.7  s2  276,472.2 11-3. Since 2  12.39  10.1170, do not reject the null hypothesis. 11-5. a. Because 2  17.82 20.05  19.6752 and because. 2  17.82  20.95  4.5748, do not reject. b. Because 2  12.96 20.025  31.5264 and because. 2  12.96  20.975  8.2307, we do not reject the null hypothesis. 11-7. a. 0.01 p-value 0.05; since p-value a, reject H0. b. Since the test statistic  1.591 the 2 critical value  1.6899, reject the null hypothesis. c. p-value  0.94; do not reject. 11-9. 22.72  s2  81.90 11-11. 353.38  s2  1,745.18 11-13. a. H0: m  10 HA: m  10 b. Since 1.2247 1.383, do not reject H0. c. Since 3.75 14.6837, do not reject H0. 11-15. a. s2  4.884 b. Since the test statistic  10.47 the 2 critical value  13.8484, do not reject. Since t  17.57 1.7109, reject H0. 11-17. a. s2  0.000278 b. Since p-value  0.004 0.01, we reject. 11-19. a. If the calculated F  2.278, reject H0, otherwise do not reject H0. b. 
Since 1.0985 < 2.278, do not reject H0. 11-21. a. F = 3.619 b. F = 3.106 c. F = 3.051 11-23. Since F = 1.154 < 6.388 = F0.05, fail to reject H0. 11-25. Since 3.4807 > 1.984, reject H0. 11-27. a. Since 2.0818 < 2.534, do not reject. b. Type I error. Decrease the alpha or increase the sample sizes. 11-29. Since F = 3.035 > 2.526 = F0.05, reject H0. 11-31. Since 1.4833 < 3.027, do not reject H0. 11-33. a. The F-test approach is the appropriate one. b. 33.90 11-39. 0.753 ≤ σ² ≤ 2.819 11-41. Since 1.2129 < 2.231, do not reject. 11-43. a. Since χ² = 37.24 > χ²U = 30.1435, reject H0. b. P(x

<span class='text_page_counter'>(451)</span> 3)  0.0230  0.0026  0.0002  0.0256 11-45. a. Since F  3.817  1.752  F0.025, reject H0. b. yes 11-47. Since 202.5424 224.9568, do not reject. 11-49. Since 2.0609  1.4953, reject the null hypothesis. 11-51. a. Since the p-value  0.496  0.05  a  0.05, do not reject. b. yes. Chapter 12 12-1. a. SSW  SST  SSB  9,271,678,090  2,949,085,157  6,322,592,933 MSB  2,949,085,157/2  1,474,542,579 MSW  6,322,592,933/72  87,813,791 F  1,474,542,579/87,813,791  16.79 b. Because the F test statistic  16.79  Fa  2.3778, we reject. 12-3. a. The appropriate null and alternative hypotheses are H0: m1  m2  m3 HA: Not all mj are equal. b. Because F  9.84  critical F  3.35, we reject. Because p-value  0.0006 a  0.05, we reject. c. Critical range  6.0; m1  m2 and m3  m2 12-5. a. dfB  3 b. F  11.1309 c. H0: m1  m2  m3  m4 HA: At least two population means are different. d. Since 11.1309  2.9467, reject H0. 12-7. a. Because 2.729 15.5, we conclude that the population variances could be equal; since F  5.03  3.885, we reject H0. b. Critical range  2.224; pop 1 and 2 means differ  no other differences. 12-9. a. Since F  7.131  Fa0.01  5.488, reject. b. CR  1,226.88; Venetti mean is greater than Edison mean 12-11. a. Because 3.125 15.5, we conclude that the population variances could be equal. Since F  1,459.78  3.885, we reject H0. b. Critical range  9.36 12-13. a. Since 10.48326  5.9525 reject H0 and conclude that at least two populations means are different. b. Critical range  222.02; eliminate Type D and A. 12-15. a. Since 0.01 0.05, reject H0 and conclude that at least two populations means are different. b.. Mini 1  Mini 2. Absolute Differences. Critical Range. Significant?. 0.633. 1.264. no. Mini 1  Mini 3. 1.175. 1.161. yes. Mini 2  Mini 3. 1.808. 1.322. yes.
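The 12-1 arithmetic above uses the one-way ANOVA partition SST = SSB + SSW with MSB = SSB/(k - 1), MSW = SSW/(N - k), and F = MSB/MSW. A sketch on hypothetical groups (not the textbook data):

```python
import statistics

# Hypothetical samples from k = 3 populations -- illustration only.
groups = [[10, 12, 11, 13], [14, 15, 13, 16], [9, 8, 10, 9]]

all_obs = [x for g in groups for x in g]
N, k = len(all_obs), len(groups)
grand_mean = statistics.mean(all_obs)

sst = sum((x - grand_mean) ** 2 for x in all_obs)                           # total
ssb = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)  # between
ssw = sst - ssb                                                             # within = total - between

# The subtraction matches a direct within-group computation.
direct_ssw = sum((x - statistics.mean(g)) ** 2 for g in groups for x in g)
assert abs(ssw - direct_ssw) < 1e-9

msb = ssb / (k - 1)
msw = ssw / (N - k)
f_stat = msb / msw
print(round(f_stat, 2))
```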

<span class='text_page_counter'>(452)</span> 890. 12-17.. 12-19.. 12-21.. 12-23.. 12-25.. 12-27.. 12-29.. 12-31.. 12-33.. 12-35.. 12-37.. 12-39.. 12-41.. 12-49.. 12-51. 12-53.. ANSWERS TO SELECTED ODD-NUMBERED PROBLEMS Student reports will vary, but they should recommend either 1 or 2 since there is no statistically significant difference between them. c. 0.978 —— 2.678 cents per mile; $293.40 —— $803.40 annual savings e. p-value  0.000 a  0.05 f. Average length of life differs between Delphi and Exide and also between Delphi and Johnson. There is not enough evidence to indicate that the average lifetime for Exide and Johnson differ. a. H0: m1  m2  m3  m4 HA: At least two population means are different. H0: mb1  mb2  mb3  mb4  mb5  mb6  mb7  mb8 HA: Not all block means are equal. b. F Blocks  2.487 F Groups  3.072 c. Since 46.876  2.487, reject H0. d. Since p-value 0.0000326 0.05, reject. e. LSD  5.48 Because F  14.3  Critical F  3.326, we reject and conclude that blocking is effective. Because F  0.1515 Critical F  4.103, we do not reject. a. Because F  32.12  Fa0.01  9.78, reject the null hypothesis. b. Because F  1.673 Fa0.01  10.925, do not reject the null hypothesis. a. Because F  22.32  Fa0.05  6.944, reject the null hypothesis. b. Because F  14.185  Fa0.05  6.944, reject the null hypothesis. c. LSD  8.957 a. p-value  0.000 a  0.05 b. p-value  0.004 a  0.05 c. LSD  1.55; m1 m3 and m2 m3 a. p-value  0.628 a  0.05 b. p-value  0.000 a  0.05 c. LSD  372.304; m1  m2  m3 a. p-value  0.854  a  0.05. Therefore, fail to reject H0. b. Since F  47.10  F0.05  5.143, reject H0. c. p-value  0.039 a  0.05. Therefore, reject H0. a. Since 0.4617 3.8853, do not reject H0. b. Since 2.3766 3.8853, do not reject H0. c. Since 5.7532  4.7472, reject H0. a. Since F  39.63  F0.05  3.633, reject H0. b. Since F  2.90 F0.05  9.552, fail to reject H0. c. Since F  3.49 F0.05  9.552, fail to reject H0. Since F  57.73  F0.05  9.552, reject H0. a. 
Because F  1.016 Fa0.05  2.728, do not reject. b. Because F  1.157 Fa0.05  3.354, do not reject. c. Because F  102.213  Fa0.05  3.354, reject. a. Since p-value  0.0570  0.01, do not reject. b. Since 2.9945 6.0129, do not reject. c. Since p-value  0.4829  0.1, do not reject a. a  0.025 p-value  0.849. Therefore, fail to reject. b. Since F  15.61  F0.025  3.353, reject. c. Since F  3.18 F0.025  3.353, do not reject. d. Since 2.0595 t  0.69 2.0595, fail to reject. a. a  0.05 p-value  0.797. Therefore, fail to reject. b. Since F  25.55  F0.05  3.855, reject. c. Since F  0.82 F0.05  3.940, do not reject. a. Since F  3.752  2.657, we reject. b. Critical range  9.5099; m1  mB, m1  m1F, and m1  mG a. Since (F1,200,0.05  3.888 F1,998,0.05 F1,100,0.05  3.936) F  89.53, reject.. b. Since t  9.46 (t250,0.05  1.9695 t998,0.05 t100,0.05  1.9840), reject. c. Note that (t-value)2  (9.46)2  89.49 ≈ 89.53. 12-55. a. randomized block design b. H0: m1  m2  m3  m4 HA: At least two population means are different. c. Since 3.3785 4.0150, do not reject. d. H0: Since 20.39312  1.9358, reject. e. no difference 12-57. a. Since F  5.37  F0.05  2.642, reject. b. Since F  142.97  F0.05  5.192, reject. c. Since F  129.91  F0.05  5.192, reject; since F  523.33  F0.05  5.192, reject.. Chapter 13 13-1. Because 2  6.4607 13.2767, do not reject. 13-3. Because 2  218.62  18.3070, we reject. 13-5. Because the calculated value of 595.657  13.2767, we reject. 13-7. Because the calculated value of 4.48727 is less than 12.8345, do not reject. 13-9. a. Since chi-square statistic  3.379443 11.3449, do not reject. b. Based on the test, we have no reason to conclude that the company is not meeting its product specification. 13-11. Chi-square value  0.3647; do not reject. 13-13. Since chi-square  3.6549 7.3778, do not reject. 13-15. a. Because the calculated value of 1.97433 is less than the critical value of 14.0671, we do not reject. b. Since z  16.12 1.645, do not reject. 13-17. a. 
Since the calculated value of 27.9092  3.8415, we reject. b. 0.000000127 13-19. a. H0: Gender is independent of drink preference. HA: There is a relationship between gender and drink preference. b. 12.331  3.8415; reject. 13-21. a. 2  7.783 9.2104, do not reject. b. 0.0204 13-23. a. 8.3584  6.6349; reject. b. p-value  0.00384 13-25. The p-value  0.00003484; reject. 13-27. a. Chi-square  0.932; do not reject b. A decision could be made for other reasons, like cost. 13-29. Because 2  24.439  12.5916, reject the null hypothesis. 13-31. Collapse cells, chi-square  11.6  5.9915, reject. 13-33. Since 0.308 5.991, do not reject. 13-41. a. Chi-square  3,296.035 b. 402.3279  4.6052; reject. 13-43. Chi-square  0.172; do not reject. 13-45. a. H0: Airline usage pattern has not changed from a previous study. HA: Airline usage pattern has changed from a previous study. b. Chi-square test statistic is 66.4083  15.08632; reject. 13-47. b. 2  37.094  3.8415; reject.. Chapter 14 14-1. H0: r  0.0 HA: r  0.0 a  0.05, t  2.50, 2.50  1.7709; reject. 14-3. a. r  0.206 b. H0: r  0.

<span class='text_page_counter'>(453)</span> ANSWERS TO SELECTED ODD-NUMBERED PROBLEMS. 14-5. a. b. c.. 14-7. a. b.. 14-9. a. b. c.. 14-11. a. b.. 14-13. a. b. c.. 14-15. a.. b. c.. 14-17. a. b.. c. d.. e.. 14-19. a. b. c. d.. 14-21. a. b.. 14-23. a. b.. c. 14-25. a. b.. HA: r  0 a  0.10, 1.8595 t  0.59 1.8595; do not reject H0. There appears to be a positive linear relationship. r  0.9239 H0: r  0 HA: r  0 d.f.  10  2  8 Since 6.8295  2.8965, reject H0. fairly strong positive correlation H0: r  0 HA: r  0 a  0.01, t  7.856  2.4066; reject the null hypothesis. The dependent is the average credit card balance. The independent variable is the income variable. does not appear to be a strong relationship H0: r  0.0 HA: r  0.0 a  0.05, t  1.56  2.1604; do not reject. r  0.979 H0: r  0 HA: r  0 a  0.05, 6.791  t  2.9200 as it was; we reject H0. positive linear relationship using Excel, r  0.75644. H0: r  0 HA: r  0. If t  2.0096, reject the null hypothesis; Because t  8.0957  2.0096, reject the null hypothesis. As 2001 revenue increases there is an increase in the 2004 revenue, which can be seen as the upward “tilt” of the scatter plot. r  0.186 H0: r  0 HA: r  0 a  0.05, 1.325 t  1.67655; do not reject H0. There appears to be a positive linear relationship between x and y. yˆ  15.31  4.92(x); b0 of 15.31 would be the value of y if x were 0; b1 of 4.92 means that for every one unit increase in x, y will increase by 4.92. R2  0.8702 H0: r  0 HA: r  0 Because t  5.7923  4.0321, reject H0. H0: b1  0 HA: b1  0 Because t  5.79  4.0321, reject H0. yˆ  26.830  0.4923x when x  10, yˆ  26.830  0.4923(10)  21.907 10(0.4923)  4.923 H0: b1

<span class='text_page_counter'>(454)</span> 0 HA: b1 0 a  0.025, 3.67 2.4469; reject. 10.12 H0: b1

<span class='text_page_counter'>(455)</span> 0 HA: b1 0 Since 3.2436 2.407, reject H0. t test statistic  10.86  2.1318, reject An increase of the average public college tuition of $1 would accompany an increase in the average private college tuition of $3.36. yˆ  3,372  3.36(7,500)  $28,572 R-square  0.0395 Se  8,693.43. 14-27.. 14-29.. 14-31.. 14-33. 14-35.. 14-37.. 14-39.. 14-41.. 14-43. 14-45. 14-47. 14-49. 14-51. 14-53.. 14-55.. 891. c. H0: b1  0.0 HA: b1  0.0 a  0.05, t  1.11 2.0423; do not reject. d. insignificant b. yˆ  1,995  1.25x c. R2  94.0% H0: b1

<span class='text_page_counter'>(456)</span> 1.4 HA: b1 1.4 a  0.05; t test statistic  1.301 is greater than the t-critical value of 1.8595. Do not reject. a. yˆ  145,000,000  1.3220x b. The regression equation is yˆ  145,000,000  1.3220x and the coefficient of determination is R2  83.1%. H0: b1  0 HA: b1  0 a  0.05; the t test statistic of 7.02  2.2281. Therefore, reject. c. yˆ  1,000,000(1.322)  $1,322,000 b. yˆ  10.1  (0.7)(x) c. 0.9594 ——————— 0.4406 d. 2.98165 ——— 7.41835 0.0847 ———————————————— 0.2733 a. yˆ  $58,766.11  $943.58(x) b. H0: b1  0.0 H0: b1  0.0 a  0.05 Because t  3.86  2.1315, reject. c. $422.58 ——— $1,464.58 a. yˆ  44.3207  0.9014x b. H0: 1  0 HA: 1  0 a  0.05 The p-value 0.05 (t  6.717 is greater than any value for 13 degrees of freedom shown in the table) and the null hypothesis is rejected. c. (0.6637, 1.1391) a. yˆ  6.2457  (9.8731)(x) b. yˆ  6.2457  (9.8731)(6)  52.9929 minutes c. 78.123 ——— 87.102 d. 47.96866 ——— 77.76391 b. yˆ  11,991  5.92x and the coefficient of determination is R2  89.9%. c. (76.47  115.40) b. (3.2426, 6.4325) c. (3.78 to 6.5) The value of 0.45 would indicate a relatively weak positive correlation. no a. no b. cannot assume cause and effect The answer will vary depending on the article the students select. a. The regression equation is yˆ  3,372  3.36x. b. H0: b1  0 HA: b1  0 a  0.05, t  10.86, p-value  0.000 c. ($23,789, $30,003) d. Since $35,000 is larger than $30,966, it does not seem to be a plausible value. a. There appears to be a weak positive linear relationship. b. 1. r  0.6239 2. H0: r  0 HA: r  0.

<span class='text_page_counter'>(457)</span> 892. ANSWERS TO SELECTED ODD-NUMBERED PROBLEMS. c.. 14-57. a. b.. 14-59. b. c. d. 14-61. a. b. 14-63. a. b.. c. 14-65. b. c. d. e.. d.f.  10  2  8 Since 2.2580 3.3554, do not reject H0. 1. yˆ  0.9772  0.0034(x) 2. The y-intercept has no interpretation in this case. The slope indicates that the average university GPA increases by 0.0034 for each increase of 1 unit in the SAT score. There seems to be a random pattern in the relationship between the typing speed using the standard and ergonomic. H0: r  0 HA: r  0 a  0.05, r  0.071. t calc  0.2013 t  1.8595; do not reject H0. yˆ  1,219.8035  (9.1196)(x) 568.285  112.305 Since 1.5 5.117, do not reject H0. There appears to be a possible positive linear relationship between time (in hours) and rating. yˆ  66.7111  10.6167(x) yˆ  3,786  1.35x H0: b1  0 HA: b1  0 a  0.05, p-value  0.000. Since the p-value is less than a  0.05, we reject the null hypothesis. ($265,597, $268,666) 378.365  592.08 no 2,115.458  2,687.465 $100 is outside of the range of the sample.. Chapter 15 15-1. a. yˆ  87.7897  0.9705x1  0.0023x2  8.7233x3 b. F  5.3276  F0.05  3.0725; also, p-value  0.00689 any reasonable alpha. Therefore, reject H0: 1  2  3  0. c. R 2 . SSR 16, 646.09124   0.432 SST 38, 517.76. d. x1 ( p-value  0.1126  a  0.05; fail to reject H0: b1  0) and x3 (p-value  0.2576  a  0.05; fail to reject H0: b3  0) are not significant. e. b2  0.0023; yˆ increases 0.0023 for each one-unit increase of x2. b3  8.7233; yˆ decreases 8.7233 for each one-unit increase of x3. f. The confidence intervals for b1 and b3 contain 0. 15-3. a. b1  412; b2  818; b3  93; b4  71 b. yˆ  22,167  412(5.7)  818(61)  93(14)  71(1.39)  68,315.91 15-5. a. yi  5.05  0.051x1  0.888x2 b. yi x1 x1. 0.206. x2. 0.919. d. Predictor. Coef. SE Coef. Constant. 5.045. 8.698. 0.58. P. VIF. 0.580. x1. 0.0513. 0.2413. 0.21. 0.838. 1.1. x2. 0.8880. 0.1475. 6.02. 0.001. 1.1. 15-7. a. yˆ  977.1  11.252(WK HI)  117.72(P-E) b. 
H0: b1  b2  0 HA: at least one bi  0 a  0.05, F  39.80  3.592; we reject. c. yˆ  1,607 15-9. a. yˆ  503  10.5x1  2.4x2  0.165x3  1.90x4 b. H0: b1  b2  b3  b4  0 HA: at least one bi  0 a  0.05 Since F  2.44 3.056, fail to reject H0. c. H0: b3  0 HA: b3  0 a  0.05, p-value  0.051  0.05; fail to reject H0. d. yˆ  344  0.186x1. The p-value  0.004. Since the p-value  0.004 0.05, reject. 15-11. a. There is a positive linear relationships between team win/loss percentage and game attendance, opponent win/loss percentage and game attendance, games played and game attendance. There is no relationship between temperature and game attendance. b. There is a significant relationship between game attendance and team win/loss percentage and games played. c. Attendance  14,122.24  63.15(Win/loss%) 10.10 (Opponent win/loss)  31.51(Games played)  55.46 (Temperature) d. R2  0.7753, so 77.53% is explained. e. H0: b1  b2  b3  b4  0 HA: at least one bi does not equal 0 significance F  0.00143; reject H0. f. For team win/loss % the p-value  0.0014 0.08 For opponent win/loss % the p-value  0.4953  0.08 For games played the p-value  0.8621  0.08 For temperature the p-value  0.3909  0.08 g. 1,184.1274; interval of 2(1,184.1274) h. VIF Team win/loss percentage and all other X. 1.569962033. Temperature and all other X. 1.963520336. Games played and all other X. 1.31428258. Opponent win/loss percentage and all other X. 1.50934547. 0.257. H0: r  0 HA: r  0 a  0.05, t  0.5954, 2.306 t  0.5954 2.306; we fail to reject H0. c. H0: b1  b2  0 HA: at least one bi  0 a  0.05, F  19.07 Since F  19.07  4.737, reject H0.. T. 15-15. a. b. c. 15-17. a. b. c.. Multicollinearity is not a problem since no VIF is greater than 5. x2  1, yˆ  145  1.2(1,500)  300(1)  2,245 x2  0, yˆ  145  1.2(1,500)  300(0)  1,945 b2 indicates the average premium paid for living in the city’s town center. As the vehicle weight increases by 1 pound, the average highway mileage rating would decrease by 0.003. 
If the car has a standard transmission, the highway mileage rating will increase by 4.56, holding the weight constant: ŷ = 34.2 − 0.003x1 + 4.56(1) = 38.76 − 0.003x1.
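The dummy-variable interpretation above amounts to two parallel lines with different intercepts. A minimal sketch using the coefficients from this fitted equation (the function name is illustrative, not from the text):

```python
def mileage(weight, standard=0):
    # y-hat = 34.2 - 0.003*weight + 4.56*dummy
    # dummy = 1 for standard transmission, 0 for automatic
    return 34.2 - 0.003 * weight + 4.56 * standard

# Same slope, different intercepts: 34.2 (automatic) vs. 38.76 (standard)
print(round(mileage(0, standard=0), 2))  # 34.2
print(round(mileage(0, standard=1), 2))  # 38.76
```

Evaluating at weight 0 isolates the two intercepts; the weight coefficient is the same in both equations.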

<span class='text_page_counter'>(458)</span> ANSWERS TO SELECTED ODD-NUMBERED PROBLEMS. 15-19.. 15-21.. 15-23.. 15-25.. 15-27.. 15-29.. 15-31.. 15-33.. d. yˆ  34.2  0.003(4,394)  4.56(0)  21.02 e. Incorporating the dummy variable essentially gives two regression equations with the same slope but different intercepts depending on whether the automobile is an automatic or standard transmission. a. yˆ  197  43.6x1  51x2 b. b1  The difference in the average PP100 between Domestic and Korean vehicles b2  The difference in the average PP100 between European and Korean vehicles c. H0: b1  b2  0 HA: at least one bi  0 a  0.05, F  4.53  3.555; reject H0. a. There appears to be a weak positive linear relationship between hours and net profit. There appears to be a weak negative linear relationship between client type and net profit. b. yˆ  1,012.0542  69.1471(x1) c. The p-value  0.0531. The R2 is only 0.3549. a. yˆ  390  37.0x1  0.263x2 H0: b1  0 HA: b1  0 c. a  0.05. Since t  20.45 1.9921, we reject H0. d. yˆ  390  37.0x1  0.263x2  390  37.0(1)  0.263(500)  484.5 ≈ 485 a. A linear line is possible, nonlinear is more likely. c. yˆ  4.937  1.2643x; the p-value  0.015 a  0.05, reject. yˆ  25.155  18.983ln x b. two quadratic models; interaction between x2 and the quadratic relationship between y and x2 yˆ i  4.9  3.58x1  0.014x12  1.42x1x2  0.528x21x2 c. b3x1x2 and b4x21 x2. So you must conduct two hypothesis tests: i. H0: b3  0 HA: b3  0 a  0.05, p-value  0.488; we fail to reject H0. ii. For b4  0, the p-value  0.001. d. Conclude that there is interaction between x2 and the quadratic relationship between x1 and y. a. The complete model is yi  b0  b1x1  b2x2  b3x3  b4x4  i. The reduced model is yi  b0  b1x1  b2x2  i. H0: b3  b4  0, HA: at least one bi  0. SSEC  201.72. So MSEC  SSEC /(n  c  1)  201.72/(10  4  1)  40.344 and SSER  1,343. a  0.05, F  14.144; 14.144  5.786; we reject H0. b. The complete model is yi  b0  b1x1  b2x2  b3x3  b4x4  i. 
The reduced model is yi = b0 + b1x3 + b2x4 + εi. SSEC = 201.72, so MSEC = SSEC/(n − c − 1) = 201.72/(10 − 4 − 1) = 40.344, and SSER = 494.6. H0: b1 = b2 = 0; HA: at least one bi ≠ 0; a = 0.05, F = 3.63. The numerator degrees of freedom are c − r = 4 − 2 = 2 and the denominator degrees of freedom are n − c − 1 = 10 − 4 − 1 = 5. The p-value = P(F ≥

<span class='text_page_counter'>(459)</span> 3.630)  0.1062. Fail to reject. a. two dummy variables x2  1 if manufacturing, 0 otherwise x3  1 if service, 0 otherwise Net profit  586.256  22.862x1  2,302.267x2  1,869.813x3 b. Net profit  5,828.692  334.406x1  4.577x1 sq  2,694.801x2  12,874,953x3 a. Create scatter plot. b. Second-order polynomial seems to be the correct model. yˆ  8,083  0.273x  0.000002x2.. 15-35.. 15-39.. 15-41.. 15-43.. 15-45.. 15-47. 15-49.. 15-51. 15-55.. 15-57.. 893. c. y  b0  b1x1  b2x12  b3x2  b4x1x2  b5x12 x2   The two interaction terms are b4x1x2 and b5x12x2. So you must conduct two hypothesis tests: i. Test for b4  0. Since the p-value  0.252  0.05, we fail to reject H0. ii. Test for b5  0. Since the p-value for b5 0.273  0.05, we fail to reject H0. a. Create scatter plot. b. fourth order polynomial c. The regression equation is Admissions  30.0  24.5(Average prices)  7.3(AP2)  0.98(AP3) – 0.0499(AP4). d. Test the hypothesis b4  0. Since p-value  0.572  0.05, we fail to reject H0; there is sufficient evidence to remove the fourth order component. Similarly, there is evidence to remove the third order component. yˆ i  2.68  1.47x1  0.129x21 a. x2 and x4 only; x1 and x3 did not have high enough coefficients of partial determination to add significantly to the model. b. would be identical c. Stepwise regression cannot have a larger R2 than the full model. a. None are significant. b. Alpha-to-enter: 0.25; alpha-to-remove: 0.25 yˆ  26.19  0.42x3, R2  14.68 c. yˆ  27.9  0.035x1  0.002x2  0.42x3, R2  14.8 The adjusted R2 is 0% for the full model and 7.57% for the standard selection model. Neither model offers a good approach to fitting this data. a. yˆ  32.08  0.76x1  5x3  0.95x4 b. yˆ  32.08  0.76x1  5x3  0.95x4 c. yˆ  32.08  0.76x1  5x3  0.95x4 d. yˆ  32.08  0.76x1  5x3  0.95x4 a. yˆ  18.33  1.07x2 b. one independent variable (x2) and one dependent variable (y) c. x1 was the first variable removed, p-value (0.817)  0.05. 
x3 was the last variable removed, p-value = 0.094 > 0.05.
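The removal order in a backward-elimination run follows directly from the p-values: at each pass, drop the variable with the largest p-value above the cutoff. A schematic sketch of that logic with hypothetical p-values (not the textbook data; in practice the p-values are re-estimated after each removal):

```python
def backward_eliminate(p_values, alpha=0.05):
    """Repeatedly drop the predictor with the largest p-value above alpha.

    p_values: dict mapping variable name -> p-value (held fixed here
    purely for illustration). Returns (removal order, kept variables).
    """
    remaining = dict(p_values)
    removed = []
    while remaining:
        worst = max(remaining, key=remaining.get)
        if remaining[worst] <= alpha:
            break  # everything left is significant at alpha
        removed.append(worst)
        del remaining[worst]
    return removed, sorted(remaining)

# Hypothetical p-values: x1 clearly insignificant, x3 borderline, x2 significant
order, kept = backward_eliminate({"x1": 0.817, "x2": 0.004, "x3": 0.094})
print(order, kept)  # ['x1', 'x3'] ['x2']
```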

<span class='text_page_counter'>(460)</span> 894. ANSWERS TO SELECTED ODD-NUMBERED PROBLEMS. d. e. 15-59. a. b.. 15-63.. 15-65.. 15-69.. 15-71.. 15-73. 15-75.. 15-77.. 15-79.. 15-81.. 15-83.. The linear model appears to be insufficient. The error terms are normally distributed. yˆ  6.81  5.29x1  1.51x2  0.000033x3 Plot the residuals versus the independent variable (x) or the fitted value (ˆyi). c. yˆ  0.97  3.20x1  0.285x2  0.000029x3  3.12x12  0.103x22  0.000000x32 d. The residual plot does not display any nonrandom pattern. e. The error terms are normally distributed. a. The relationship between the dependent and each independent variable is linear. b. The residuals are independent. c. The variances of the residuals are constant over the range of the independent variables. d. The residuals are normally distributed. a. The average y increases by three units holding x2 constant. b. x2, since x2 only affects the y-intercept of this model. c. The coefficient of x1 indicates that the average y increases by 7 units when x2  1. d. The coefficient of x1 indicates that the average y increases by 11 units when x2  1. e. Those coefficients affected by the interaction terms have conditional interpretations. a. The critical t for all pairs would be 2.1604, correlated pairs. Volumes sold (y)  Production expenditures Volumes sold (y)  Number of reviewers Volumes sold (y)  Pages Volumes sold ( y)  Advertising budget b. All p-values  0.05 c. Critical F  3.581; since F  9.1258  3.581, conclude that the overall model is significant. e. 2(24,165.9419)  48,331.8 f. Constant variance assumption is satisfied. g. The residuals appear to be approximately normally distributed. h. The model satisfies the normal distribution assumption. The t-critical for all pairs would be 2.0687, correlated pairs are For family size and age For purchase volume and age For purchase volume and family income The significance F  0.0210 Age entered the model. 
The R2 at step 1 was 0.2108 and the standard error at step 1 was 36.3553. The R2 at step 2 is 0.3955 and the standard error at step 2 is 32.5313. Other variables that enter into the model partially overlap with the other included variables in its ability to explain the variation in the dependent variable. a. Normal distribution of the residuals. b. The selected independent variables are not highly correlated with the dependent variable. a. yˆ  2,857  26.4x1  80.6x2  0.115x21  2.31x22  0.542x1x2 b. The residual plot supports the choice of the linear model. c. The residuals do have constant variances. d. The linear model appears to be insufficient. The addition of an independent variable representing time is indicated. e. A transformation of the independent or dependent variables is required. a. Quadratic relationship exists between cost and weight. b. r  0.963 H0: r  0. HA: r  0 a  0.05; Since the p-value  0.000  0.05, we reject H0. c. Cost  64.06  14.92(Weight) d. Cost  113.8  9.22(Weight)  1.44(Weight2) Comparing the R2adj for the quadratic equation (95.6%) and the R2 for the simple linear equation (94.5%), the quadratic equation appears to fit the data better. 15-85. Vehicle year  73.18  9.1(Gender)  1.39(Years education)  24(Not seat belt) R-squared  14.959%. Chapter 16 16-3. Generally, quantitative forecasting techniques can be used whenever historical data related to the variable of interest exist and we believe that the historical patterns will continue into the future. 16-7. a. The forecasting horizon is 6 months. b. a medium term forecast c. a month d. 12 months 16-9. c. Year Radio % radio Newspaper Laspeyres. d.. 1. 300. 0.3. 400. 100. 2. 310. 0.42. 420. 104.59. 3. 330. 0.42. 460. 113.78. 4. 346. 0.4. 520. 126.43. 5. 362. 0.38. 580. 139.08. 6. 380. 0.37. 640. 151.89. 7. 496. 0.43. 660. 165.08. Year. Radio. % radio. 1. 300. 2 3. Newspaper. Paasche. 0.3. 400. 100. 310. 0.42. 420. 104.41. 330. 0.42. 460. 113.22. 4. 346. 0.4. 520. 124.77. 5. 362. 0.38. 580. 
136.33
6      380      0.37      640      148.12
7      496      0.43      660      165.12

16-13.
Year    Labor Costs    Material Costs    % Materials    % Labor    Laspeyres Index
1999    44,333         66,500            60             40         100
2000    49,893         68,900            58             42         106.36
2001    57,764         70,600            55             45         113.59
2002    58,009         70,900            55             45         114.07
2003    55,943         71,200            56             44         112.95
2004    61,078         71,700            54             46         117.03
2005    67,015         72,500            52             48         122.09
2006    73,700         73,700            50             50         127.88
2007    67,754         73,400            52             48         123.44
2008    74,100         74,100            50             50         128.57
2009    83,447         74,000            47             53         134.95
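The aggregate Laspeyres indexes in these exercises weight current-period prices by base-period quantities. A small sketch of the formula with toy prices and quantities (not the exercise data):

```python
def laspeyres_index(p_t, p_0, q_0):
    """Aggregate Laspeyres price index: 100 * sum(p_t*q_0) / sum(p_0*q_0)."""
    numer = sum(p * q for p, q in zip(p_t, q_0))
    denom = sum(p * q for p, q in zip(p_0, q_0))
    return 100 * numer / denom

p_0 = [2.0, 5.0]   # base-period prices
q_0 = [10, 4]      # base-period quantities
p_1 = [2.2, 5.5]   # period-1 prices (both up 10%)

print(round(laspeyres_index(p_0, p_0, q_0), 1))  # 100.0 in the base period
print(round(laspeyres_index(p_1, p_0, q_0), 1))  # 110.0
```

Because the quantity weights are frozen at the base period, a uniform 10% price rise moves the index from 100 to exactly 110.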

<span class='text_page_counter'>(461)</span> ANSWERS TO SELECTED ODD-NUMBERED PROBLEMS 16-15. a. The sum ∑p1993  1.07  6.16  8.32  15.55. b. 170.87 170.87 103.86 c. 103.86; 100  64.52% 103.86 16-17. a. 102.31 b. 15.41% c. 14.43% 16-21. January  0.849; July  0.966 16-23. a. upward linear trend with seasonal component as a slight drop in the 3rd quarter b. Normalize to get the following values: Quarter. Seasonal Index. 1. 1.035013. 2. 1.020898. 3. 0.959934. 4. 0.984154. d.. Quarter. Period. Quarter 1 2010 Quarter 2 2010 Quarter 3 2010 Quarter 4 2010. 17 18 19 20. Seasonal Index. 250.15 256.17 262.18 268.20. 1. 1.02349. 2. 1.07969. 3. 1.16502. 4. 1.12147. 5. 0.98695. 6. 0.83324. 7. 0.86807. 8. 0.91287. 9. 0.97699. 10. 1.07311. 11. 1.01382. 12. 0.94529. Seasonally Adjusted Forecast. 48.27804189. Prediction Interval Lower Limit. 161.7600534. Prediction Interval Upper Limit. 258.3161371. Model with transformation: For Individual Response y. 258.91 261.52 251.68 263.95. Interval Half Width. 16-33. a.. b.. c. d.. 16-35. b. c. d. e. 16-37. a.. c. F  1.98  0.0459(Month) d. F25  1.98  0.0459(25)  3.12. Adjusted F25  (1.02349)(3.12)  3.19. F73  1.98  0.0458589(73)  5.32. Adjusted F73  (1.02349)(5.32)  5.44. 16-29. a. seasonal component to the data b. MSE  976.34 and MAD  29.887 c. Quarter Index. 256.5620033 260.0884382 263.614873 267.1413079. Interval Half Width. 1.0350 1.0209 0.9599 0.9842. Index. Forecast. 13 14 15 16. For Individual Response y. 16-27. b. The seasonal indexes generated by Minitab are: Month. Period. e. MSE  926.1187, MAD  29.5952 f. The adjusted model has a lower MSE and MAD. 16-31. a. Forecast without transformation  36.0952  10.8714(16)  210.0376 Forecast with transformation  65.2986  0.6988(16)2  244.1914 Actual cash balance for Month 16 was 305. The transformed model had a smaller error than the model without the transformation. b. Model without transformation:. c. MSE  36.955 and MAD  4.831 d. and e. Seasonally Unadjusted Forecast. 2009 Qtr. 1 Qtr. 
2 Qtr. 3 Qtr. 4. 895. b. c.. 23.89550188. Prediction Interval Lower Limit. 220.29634337. Prediction Interval Upper Limit. 268.08734713. The model without the transformation has the wider interval. Linear trend evidenced by the slope from small to large values. Randomness is exhibited since not all of the data points would lie on a straight line. H0: b1  0 HA: b1  0 a  0.10, Minitab lists the p-value as 0.000. The fitted values are F38  36,051, F39  36,955, F40  37,858, and F41  38,761. The forecast bias is 1,343.5. On average, the model over forecasts the e-commerce retail sales an average of $1,343.5 million. A trend is present. Forecast  136.78, MAD  23.278 Forecast  163.69, MAD  7.655 The double exponential smoothing forecast has a lower MAD. The time series contains a strong upward trend, so a double exponential smoothing model is selected. The equation is yˆ t  19.364  0.7517t. Since C0  b0, C0  19.364. T0  b1  0.7517. Forecasts Period. Forecast. Lower. Upper. 13. 29.1052. 23.9872. 34.2231. d. MAD as calculated by Minitab: Accuracy Measures. 1. 1.0290. 2. 0.9207. MAPE. 8.58150. 3. 1.0789. MAD. 2.08901. 4. 0.9714. MSE. 6.48044.
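The accuracy measures reported by Minitab throughout this chapter (MAPE, MAD, MSE) are all simple functions of the forecast errors. A minimal sketch with made-up actual and forecast values:

```python
def accuracy_measures(actual, forecast):
    """Return (MAPE, MAD, MSE) for paired actual and forecast values."""
    errors = [a - f for a, f in zip(actual, forecast)]
    n = len(errors)
    mape = 100 * sum(abs(e) / a for e, a in zip(errors, actual)) / n
    mad = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    return mape, mad, mse

actual = [100.0, 110.0, 120.0]
forecast = [98.0, 113.0, 119.0]
mape, mad, mse = accuracy_measures(actual, forecast)
print(round(mad, 2), round(mse, 2))  # 2.0 4.67
```

MAD penalizes all errors linearly, while MSE squares them, which is why the two measures can rank competing smoothing models differently.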

<span class='text_page_counter'>(462)</span> 896. ANSWERS TO SELECTED ODD-NUMBERED PROBLEMS. 16-39. a. The time series contains a strong upward trend, so a double exponential smoothing model is selected. b. yˆ t  990  2,622.8t. Since C0  b0, C0  990. T0  b1  2,622.8. c. Forecast  58,852.1 d. MAD  3,384 16-41. a. There does not appear to be any trend component in this time series. c. MAD  3.2652 d. F14  0.25y13  (1  0.25)F13  0.25(101.3)  0.75(100.22)  100.49 16-43. a. Single exponential smoothing model is selected. b. The forecast is calculated as an example F1  F2  0.296. Then F3  0.15y2  (1  0.15)F2  0.15(0.413)  0.85(0.296)  0.3136. c. MAD  15.765/71  0.222 d. F73  0.15y72  (1  0.15)F72  0.15(0.051)  0.85(0.259)  0.212 16-45. a. The double exponential smoothing model will incorporate the trend effect. b. From regression output, Initial constant  28,848; Initial trend  2,488.96. Forecast for period 17  72,450.17. MAD  5,836.06. c. The MAD produced by the double exponential smoothing model at the end of Month 16 is smaller than the MAD produced by the single exponential smoothing model. d. and e. Of the combinations considered the minimum MAD at the end of Month 16 occurs when alpha  0.05 and beta  0.05. The forecast for Month 17 with alpha  0.05 and beta  0.05 is 71,128.45. 16-47. a. a seasonal component b. The pattern is linear with a positive slope. c. a cyclical component d. a random component e. a cyclical component 16-49. A seasonal component is one that is repeated throughout a time series and has a recurrence period of at most one year. A cyclical component is one that is represented by wavelike fluctuations that has a recurrence period of more than one year. Seasonal components are more predictable. 16-51. a. There does appear to be an upward linear trend. b. Forecast  682,238,010.3  342,385.3(Year) Since F  123.9719  4.6001, conclude that there is a significant relationship. c. MAD  461,216.7279 d.. Year. Forecast. 2010. 4,929,275.00. 2011. 5,271,660.29. 
2012. 5,614,045.59. 2013. 5,956,430.88. 2014. 6,298,816.18. Period. Index. 1. 0.98230. 2. 1.01378. 3. 1.00906. 4. 1.00979. 5. 0.99772. 6. 1.01583. 7. 0.99739. 8. 1.00241. 9. 0.98600. 10. 0.98572. c. The nonlinear trend model (using t and t2) fitted to the deseasonalized data. ARM  3.28  0.114(Month)  0.00177(Monthsq) d. The unadjusted forecast: F61  3.28  0.114(61)  0.00117(61)2  5.8804. The adjusted forecast is F61  (0.9823)(5.8804)  5.7763. e. The following values have been computed: R2  92.9%, F-statistic  374.70, and standard error  0.226824 The model explains a significant amount of variation in the ARM. Durbin-Watson d statistic  0.378224. Because d  0.378224 dL  1.35, conclude that significant positive autocorrelation exists. 16-59. a. MAD  4,767.2 c. Alpha MAD 0.1. 5,270.7. 0.2. 4,960.6. 0.3. 4,767.2. 0.4. 4,503.3. 0.5. 4,212.4. 16-61. a. A strong trend component is evident in the data. b. Using 1980  Year 1, the estimated regression equation is yˆt  490.249  1.09265t, R2  95.7%. e. For Individual Response y Interval Half Width. c. Forecast(2008)  740.073 d. MAD  89.975 16-57. a. A cyclical component is evidenced by the wave form, which recurs approximately every 10 months. b. If recurrence period, as explained in part a, is 10 months, the seasonal indexes generated by Minitab are. 1,232,095.322. Prediction Interval Lower Limit. 5,066,720.854. Prediction Interval Upper Limit. 7,530,911.499. 16-55. a. The time series contains a strong upward trend, so a double exponential smoothing model is selected. b. Since C0  b0, C0  2,229.9; T0  b1  1.12.. H0: b1  0 HA: b1  0 a  0.10, t  23.19, n  2  26  2  24, critical value is t0.10  1.3178 c. yˆt  490.249  1.09265(31)  524.1212. Chapter 17 ~

17-1. The hypotheses are H0: m̃ ≥ 14, HA: m̃ < 14. W = 36; n = 11, a = .05; reject if W ≤ 13. Since W = 36 > 13, do not reject.
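The W statistic behind these one-sample answers is computed by ranking the absolute differences from the hypothesized median and summing ranks by sign. A bare-bones sketch (toy data; zero differences dropped, tied absolute differences share an average rank):

```python
def wilcoxon_w(data, hypothesized_median):
    """Return (W_plus, W_minus): signed-rank sums for a one-sample Wilcoxon test."""
    diffs = [x - hypothesized_median for x in data if x != hypothesized_median]
    diffs.sort(key=abs)
    n = len(diffs)
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j < n and abs(diffs[j]) == abs(diffs[i]):
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # average of ranks i+1 .. j for ties
        for k in range(i, j):
            ranks[k] = avg_rank
        i = j
    w_plus = sum(r for d, r in zip(diffs, ranks) if d > 0)
    w_minus = sum(r for d, r in zip(diffs, ranks) if d < 0)
    return w_plus, w_minus

# differences +2, -1, +3, -4, +5 -> ranks 2, 1, 3, 4, 5
print(wilcoxon_w([16, 13, 17, 10, 19], 14))  # (10.0, 5.0)
```

Whichever rank sum serves as the test statistic is then compared against the tabled critical value for n, as in the answers above.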

17-3. The hypotheses are H0: m̃ = 4, HA: m̃ ≠ 4. W⁻ = 9, W⁺ = 19. Critical values for n = 7, assuming a = 0.1, are 3 and 25. Cannot reject.
17-5. a. The hypotheses are H0: m̃ ≤ 4, HA: m̃ > 4. b. Using the Wilcoxon Signed Rank test, W = 26. Upper-tail test with n = 12; letting a = .05, reject if W ≥ 61. So, cannot reject.

17-7. H0: m̃ ≤ 11, HA: m̃ > 11. Using the Wilcoxon Signed Rank test, W = 92; reject if W ≥ 53.
17-9. H0: m̃ = 30, HA: m̃ ≠ 30. Using the Wilcoxon Signed Rank test, W⁻ = 71.5, W⁺ = 81.5. Because some of the differences are 0, n = 17. The upper and lower values for the Wilcoxon test are 34 and 119 for a = 0.05. Do not reject.
17-11. a. Using data classes one standard deviation wide, with the data mean of 7.6306 and a standard deviation of 0.2218:

e          o     (o − e)²/e
14.9440    21    2.45417
32.4278    31    0.06287
32.4278    27    0.90851
14.9440    16    0.07462
                 Sum = 3.5002

Testing at the a = 0.05 level, χ²0.05 = 5.9915. Since 3.5002 < 5.9915, do not reject normality. b. Since we concluded the data come from a normal distribution we test the following: H0: m

≤ 7.4, HA: m > 7.4. Decision rule: If z > 1.645, reject H0; otherwise do not reject. z = 10.13 > 1.645, so reject H0. 17-13. a. Putting the claim in the alternative hypothesis:

<span class='text_page_counter'>(467)</span> 0 H0: m 1 2 ~ ~ 0 HA: m1  m 2 b. Test using the Mann-Whitney U Test. U1  40, U2  24 Use U2 as the test statistic. For n1  8 and n2  8 and U  24, p-value  0.221. ~ m ~ 0 17-15. a. H0: m 1 2 ~ m ~ 0 HA: m 1 2 b. Since the alternate hypothesis indicates Population 1 should have the larger median, U1  40. n1  12 and n2  12. Reject if U  31. ~ m ~ 0 17-17. H0: m 1 2 ~ ~ 0 HA: m1  m 2 Mann-Whitney Test and CI: C1, C2 2. C1 C2. N  40, N  35,. Median  481.50 Median  505.00. Point estimate for ETA1  ETA2 is 25.00 95.1% CI for ETA1  ETA2 is (62.00, 9.00) W  1,384.0 Test of ETA1  ETA2 vs. ETA1 not  ETA2 is significant at 0.1502. ~ m ~ 0 17-19. a. H0: m 1 2 ~ m ~ 0 HA: m 1 2. 897. With n  8, reject if T  4. Since T  11.5, we do not reject the null hypothesis. b. Use the paired sample t test. p-value  0.699. ~ m ~ 0 17-21. H0: m 1 2 ~ ~ 0 HA: m1  m 2 With n  7, reject if T  2, T  13. ~ m ~ 17-23. H0: m 2 1 ~ m ~ H :m A. 2. 1. If T  0 reject H0; T  8. ~ m ~ 0 17-25. H0: m W WO ~ m ~ 0 HA: m W WO U1  (7)(5)  (7)(7  1)/2  42  21 U2  (7)(5)  (5)(5  1)/2  36  14 Utest  21 Since 21 is not in the table you cannot determine the exact p-value, but you know that the p-value will be greater than 0.562. ~ m ~ 17-27. a. H0: m 1 2 ~ m ~ H :m A. 1. 2. b. Since T  51 is greater than 16, do not reject H0. c. Housing values are typically skewed. ~ m ~ 17-29. H0: m 1 2 ~ ~ HA: m1  m 2 m  40(40  1)/4  410 s. 40(40 1)(80 1)/ 24  74.3976. z  (480  410)/74.3976  0.94 p-value  (0.5  0.3264)2  (0.1736)(2)  0.3472. Do not reject H0. 17-31. a. A paired-t test. H0: md

= 0, HA: md ≠ 0. t = (−1.7)/(3.011091/√10) = −1.785. Since |t| = 1.785 < t critical = 2.2622, do not reject H0.
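The paired-t statistic used here is t = d̄/(s_d/√n). A quick check of the arithmetic, taking d̄ = −1.7 (the sign is garbled in the extracted text, so it is assumed from the parenthesized 1.7), s_d = 3.011091, and n = 10:

```python
import math

def paired_t(d_bar, s_d, n):
    """Paired-sample t statistic: t = d_bar / (s_d / sqrt(n))."""
    return d_bar / (s_d / math.sqrt(n))

t = paired_t(-1.7, 3.011091, 10)
print(round(t, 3))  # -1.785
```

With |t| = 1.785 below the two-tailed critical value 2.2622 (df = 9), the decision is "do not reject," matching the answer above.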

b. H0: m̃O = m̃N, HA: m̃O ≠ m̃N. T = 5.5. Since 5.5 < 6, reject H0 and conclude that the medians are not the same. c. Because you cannot assume the underlying populations are normal, you must use the technique from part b.

17-33. a. H0: m̃N ≥ m̃C, HA: m̃N < m̃C. b. U1 = 4,297, U2 = 6,203, m = 5,250, s = 434.7413, z = −2.19, p-value = 0.0143; the possible error is a Type I error.
17-35. a. The data are ordinal. b. The median would be the best measure. c. H0: m̃1 = m̃2, HA: m̃1 ≠ m̃2. Using a = 0.01, if T ≤ 2, reject H0. Since 12.5 > 2, do not reject H0. d. The decision could be made based on some other factor, such as cost.
17-37. a. H0: m̃1 = m̃2 = m̃3, HA: Not all population medians are equal. H = 10.98. Since, with a = 0.05, χ²0.05 = 5.9915 and H = 10.98 > 5.9915, we reject.
17-39. a. H0: m̃1 = m̃2 = m̃3 = m̃4, HA: Not all population medians are equal. b. Use Equation 17-10. Selecting a = 0.05, χ²0.05 = 7.8147; since H = 42.11 > 7.8147, we reject the null hypothesis of equal medians.
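The Kruskal-Wallis statistic behind these answers is H = 12/(N(N+1)) · Σ(R_i²/n_i) − 3(N+1), where R_i is the rank sum of group i in the pooled ranking. A tiny sketch with toy samples (no tied values, so no tie correction):

```python
def kruskal_wallis_h(groups):
    """H = 12/(N(N+1)) * sum(R_i^2 / n_i) - 3(N+1); assumes no tied values."""
    pooled = sorted(x for g in groups for x in g)
    rank = {v: i + 1 for i, v in enumerate(pooled)}  # rank 1 = smallest
    big_n = len(pooled)
    rank_sums = [sum(rank[x] for x in g) for g in groups]
    return (12.0 / (big_n * (big_n + 1))
            * sum(r * r / len(g) for r, g in zip(rank_sums, groups))
            - 3 * (big_n + 1))

# ranks 1-3 vs 4-6: R1 = 6, R2 = 15, N = 6
print(round(kruskal_wallis_h([[1.2, 1.5, 1.9], [2.4, 2.8, 3.1]]), 3))  # 3.857
```

The resulting H is compared against a chi-square critical value with (number of groups − 1) degrees of freedom, as in 17-37 and 17-39.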

17-41. a. Salaries in general are usually thought to be skewed. b. The top-salaried players get extremely high salaries compared to the other players. c. H0: m̃1 = m̃2 = m̃3, HA: Not all population medians are equal. H = 52.531. If a = 0.05, χ²0.05 = 5.9915. Reject.
17-43. H0: m̃1 = m̃2 = m̃3 = m̃4, HA: Not all population medians are equal. Using PHStat, the H test statistic = 11.13971. Adjusting for ties, the test statistic is 11.21, which is smaller than the critical value (11.34488). Do not reject.
17-45. H0: m̃1 = m̃2 = m̃3, HA: Not all population medians are equal. H = 13.9818; testing at a = 0.05, χ²0.05 = 5.9915. Since 13.9818 > 5.9915, reject H0.
17-53. H0: m̃1 − m̃2 ≥ 0, HA: m̃1 − m̃2 < 0. U1 = 107, U2 = 14; U test = 14. With a = 0.05, Ua = 30. Since 14 < 30, reject H0.
17-55. a. A nonparametric test.

b. H0: m̃O − m̃N ≥ 0, HA: m̃O − m̃N < 0. U1 = 71, U2 = 29; U test = 29. If a = 0.05, Ua = 27. Since 29 > 27, do not reject H0.
17-57. The hypotheses being tested are H0: m̃ = 1,989.32, HA: m̃ ≠ 1,989.32. Find W⁻ = 103, W⁺ = 68. With a = 0.05, reject if W ≤ 40 or W ≥ 131.
17-59. a. H0: m̃ = 8.03, HA: m̃ ≠ 8.03. b. W⁻ = 62.5, W⁺ = 57.5. This is a two-tailed test with n = 15. If a = 0.05, reject if W ≤ 25 or W ≥ 95.
17-61. H0: m̃1 = m̃2, HA: m̃1 ≠ m̃2. Constructing the paired difference table, T = 44.5. With a = 0.05, reject if T ≤ 21 or if T ≥

84. b. a Type II error.
17-63. a. They should use the Wilcoxon Matched-Pairs Signed Rank test. b. H0: m̃w/oA

<span class='text_page_counter'>(474)</span> mA ~ ~ H :m m A. w/oA. A. T6 Using a  0.025, Ta  4. c. Do not reject H0.. Chapter 18 18-9. Some possible causes by category are: People: Too Few Drivers, High Driver Turnover Methods: Poor Scheduling, Improper Route Assignments Equipment: Buses Too Small, Bus Reliability, Too Few Buses Environment: Weather, Traffic Congestion, Road Construction 18-11. a. A2  0.577; D3  0 and D4  2.114 b. The R-chart upper control limit is 2.114 × 5.6  11.838. The R-chart lower control limit is 0 × 5.6  0. c. X-bar chart upper control limit  44.52  (0.577 × 5.6)  47.751; Lower control limit  44.52  (0.577 × 5.6)  41.289 18-15. a. x-bar chart centerline  0.753 UCL  0.753  (0.577 0.074)  0.7957 LCL  0.753  (0.577 0.074)  0.7103. b. R-chart centerline  0.074 UCL  2.114 0.074  0.1564 LCL  0 0.074  0.000 c. There are no subgroup means outside of the upper or lower control limits on the x-bar chart. For the R-chart, there are no subgroup ranges outside the control limits. 18-17. a. The process has gone out of control since all but two observations and the 1st eight in sequence are below the LCL. 18-19. a. c-chart b. c  3.2841 UCL  6.5682  3( 6.5682 )  14.2568 LCL  6.5682  3( 6.5682 )  1.1204, so set to 0 c. in statistical control 18-21. a. For R-chart UCL  2.282 100.375  229.056 CL  100.375 LCL  0 100.375  0 b. For x-bar chart UCL  415.3  0.729(100.375)  488.473 CL  415.3 LCL  415.3  0.729(100.375)  342.127 c. out of control 18-23. b. 82.46 c. 12.33 d. UCL  2.114 12.33  26.07 and LCL  0 12.33  0 e. UCL  82.46  (0.577 12.33)  89.57 LCL  82.46  (0.577 12.33)  75.35 f. There is a run of nine values above the centerline. 18-25. b. p  441/(300 50)  0.0294 s  (0.0294)(1  0.0294)/50  0.0239 UCL  0.0294  3(0.0239)  0.1011 LCL  0.0294  3(0.0239)  0.0423, so set to 0 c. Sample Number. 301. 302. 303. p-bar. 0.12. 0.18. 0.14. The process has gone out of control. b. 0.28, which is again above the UCL 18-27. a. 
c  29.3333 UCL  29.3333  3( 29.3333 )  45.5814 LCL  29.3333  3( 29.3333 )  13.0852 b. The process seems to be out of control. c. Need to convert the data to bags per passenger by dividing bags by 40 and then developing a u-chart based on the explanation in optional topics. CL  29.333/40  0.7333 UCL  0.7333  3. d. 18-29. a. b. c.. 0.7333/40  1.1395. LCL  0.7333  3 0.7333/40  0.3271 Process is out of control. A2  1.023 D3  0.0 and D4  2.575 UCL  2.575 0.80  2.06 LCL  0 0.80  0 UCL  2.33  (1.023 0.80)  3.1468 LCL  2.33  (1.023 0.80)  1.512.
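The variables-chart limits in the answers above all follow the same pattern: x̄-chart limits are x̄ ± A2·R̄, and R-chart limits are D4·R̄ (upper) and D3·R̄ (lower), with the A2, D3, D4 constants read from the factor table for the subgroup size. A minimal sketch reproducing the 18-11 numbers (the function name is illustrative, not from the text):

```python
def xbar_r_limits(x_bar, r_bar, a2, d3, d4):
    """Return ((UCL, LCL) for the x̄-chart, (UCL, LCL) for the R-chart)."""
    xbar_chart = (x_bar + a2 * r_bar, x_bar - a2 * r_bar)  # x̄ ± A2·R̄
    r_chart = (d4 * r_bar, d3 * r_bar)                     # D4·R̄ and D3·R̄
    return xbar_chart, r_chart

# Problem 18-11: x̄ = 44.52, R̄ = 5.6, with A2 = 0.577, D3 = 0, D4 = 2.114
xbar_chart, r_chart = xbar_r_limits(44.52, 5.6, a2=0.577, d3=0, d4=2.114)
print(xbar_chart)  # ≈ (47.751, 41.289)
print(r_chart)     # ≈ (11.838, 0.0)
```

The same helper, called with the constants from 18-15, 18-21, 18-23, 18-29, or 18-37, reproduces those limits as well.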

ANSWERS TO SELECTED ODD-NUMBERED PROBLEMS

18-31. The centerline of the control chart is the average proportion defective = 720/(20 × 150) = 0.240. For 3-sigma control chart limits we find
  UCL = 0.240 + 3√(0.240(1 − 0.240)/150) = 0.345
  LCL = 0.240 − 3√(0.240(1 − 0.240)/150) = 0.135
18-33. p-chart; p̄ = 0.0524
  sp = √(0.0524(1 − 0.0524)/100) = 0.0223
  Lower control limit = 0.0524 − 3(0.0223) = −0.0145, so set to 0
  Centerline = 0.0524
  Upper control limit = 0.0524 + 3(0.0223) = 0.1193
  in control
18-35. The appropriate control chart for monitoring this process is a c-chart. The 3-sigma upper control limit is 23.00 + (3 × 4.7958) = 37.3875. The 3-sigma lower control limit is 23.00 − (3 × 4.7958) = 8.6125. Note that sample number 5, with 38 defects, is above the upper control limit.
18-37. a. x̄-chart: CL = 0.7499
  UCL = 0.7499 + (0.577)(0.0115) = 0.7565
  LCL = 0.7499 − (0.577)(0.0115) = 0.7433
  R-chart: CL = 0.0115
  UCL = (0.0115)(2.114) = 0.0243
  LCL = (0.0115)(0) = 0
  b. It now appears that the process is out of control.
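The attributes-chart answers above use two formulas: p-chart limits p̄ ± 3√(p̄(1 − p̄)/n) and c-chart limits c̄ ± 3√c̄, with a negative lower limit set to 0. A minimal sketch reproducing the 18-31 and 18-35 numbers (function names are illustrative, not from the text):

```python
import math

def p_chart_limits(p_bar, n):
    """3-sigma p-chart limits; a negative LCL is truncated to 0."""
    s = math.sqrt(p_bar * (1 - p_bar) / n)
    return p_bar + 3 * s, max(0.0, p_bar - 3 * s)

def c_chart_limits(c_bar):
    """3-sigma c-chart limits; a negative LCL is truncated to 0."""
    s = math.sqrt(c_bar)
    return c_bar + 3 * s, max(0.0, c_bar - 3 * s)

# Problem 18-31: p̄ = 720/(20 × 150) = 0.240 with subgroup size n = 150
print(p_chart_limits(0.240, 150))  # ≈ (0.345, 0.135)
# Problem 18-35: c̄ = 23.00, so √c̄ = 4.7958
print(c_chart_limits(23.00))       # ≈ (37.3875, 8.6125)
```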

Glossary

Adjusted R-squared A measure of the percentage of explained variation in the dependent variable in a multiple regression model that takes into account the relationship between the sample size and the number of independent variables in the regression model.
Aggregate Price Index An index that is used to measure the rate of change from a base period for a group of two or more items.
All-Inclusive Classes A set of classes that contains all the possible data values.
Alternative Hypothesis The hypothesis that includes all population values not included in the null hypothesis. The alternative hypothesis will be selected only if there is strong enough sample evidence to support it. The alternative hypothesis is deemed to be true if the null hypothesis is rejected.
Arithmetic Average or Mean The sum of all values divided by the number of values.
Autocorrelation Correlation of the error terms (residuals) occurs when the residuals at points in time are related.
Balanced Design An experiment has a balanced design if the factor levels have equal sample sizes.
Box and Whisker Plot A graph that is composed of two parts: a box and the whiskers. The box has a width that ranges from the first quartile (Q1) to the third quartile (Q3). A vertical line through the box is placed at the median. Limits are located at a value that is 1.5 times the difference between Q1 and Q3 below Q1 and above Q3. The whiskers extend to the left to the lowest value within the limits and to the right to the highest value within the limits.
Bar Chart A graphical representation of a categorical data set in which a rectangle or bar is drawn over each category or class. The length or height of each bar represents the frequency or percentage of observations or some other measure associated with the category. The bars may be vertical or horizontal.
The bars may all be the same color or they may be different colors depicting different categories. Additionally, multiple variables can be graphed on the same bar chart.
Base Period Index The time-series value to which all other values in the time series are compared. The index number for the base period is defined as 100.
Between-Sample Variation Dispersion among the factor sample means is called the between-sample variation.
Bias An effect which alters a statistical result by systematically distorting it; different from a random error, which may distort on any one occasion but balances out on the average.
Binomial Probability Distribution Characteristics A distribution that gives the probability of x successes in n trials in a process that meets the following conditions:
1. A trial has only two possible outcomes: a success or a failure.
2. There is a fixed number, n, of identical trials.
3. The trials of the experiment are independent of each other. This means that if one outcome is a success, this does not influence the chance of another outcome being a success.
4. The process must be consistent in generating successes and failures. That is, the probability, p, associated with a success remains constant from trial to trial.
5. If p represents the probability of a success, then 1 − p = q is the probability of a failure.
Business Statistics A collection of procedures and techniques that are used to convert data into meaningful information in a business environment.
Census An enumeration of the entire set of measurements taken from the whole population.
The Central Limit Theorem For simple random samples of n observations taken from a population with mean μ and standard deviation σ, regardless of the population's distribution, provided the sample size is sufficiently large, the distribution of the sample means, x̄, will be approximately normal with a mean equal to the population mean (μx̄ = μ) and a standard deviation equal to the population standard deviation divided by the square root of the sample size (σx̄ = σ/√n). The larger the sample size, the better the approximation to the normal distribution.
Class Boundaries The upper and lower values of each class.
Class Width The distance between the lowest possible value and the highest possible value for a frequency class.
Classical Probability Assessment The method of determining probability based on the ratio of the number of ways an outcome or event of interest can occur to the number of ways any outcome or event can occur when the individual outcomes are equally likely.
Closed-End Questions Questions that require the respondent to select from a short list of defined choices.
Cluster Sampling A method by which the population is divided into groups, or clusters, that are each intended to be minipopulations. A simple random sample of m clusters is selected. The items chosen from a cluster can be selected using any probability sampling technique.
Coefficient of Determination The portion of the total variation in the dependent variable that is explained by its relationship with the independent variable. The coefficient of determination is also called R-squared and is denoted as R2.
Coefficient of Partial Determination The measure of the marginal contribution of each independent variable, given that other independent variables are in the model.
Coefficient of Variation The ratio of the standard deviation to the mean expressed as a percentage. The coefficient of variation is used to measure variation relative to the mean.
Complement The complement of an event E is the collection of all possible outcomes not contained in event E.
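The five binomial conditions in the glossary entry above lead to the usual probability formula P(x) = C(n, x)·pˣ·q^(n−x) with q = 1 − p. A minimal sketch (the function name is illustrative, not from the text):

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(exactly x successes in n independent trials with success probability p)."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

# e.g. the probability of 2 successes in 5 trials with p = 0.5
print(binomial_pmf(2, 5, 0.5))  # 0.3125
```

Summing the function over x = 0, …, n returns 1, which is a quick check that the conditions describe a complete probability distribution.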

Completely Randomized Design An experiment is completely randomized if it consists of the independent random selection of observations representing each level of one factor.
Composite Model The model that contains both the basic terms and the interaction terms.
Conditional Probability The probability that an event will occur given that some other event has already happened.
Confidence Interval An interval developed from sample values such that if all possible intervals of a given width were constructed, a percentage of these intervals, known as the confidence level, would include the true population parameter.
Confidence Level The percentage of all possible confidence intervals that will contain the true population parameter.
Consistent Estimator An unbiased estimator is said to be a consistent estimator if the difference between the estimator and the parameter tends to become smaller as the sample size becomes larger.
Contingency Table A table used to classify sample observations according to two or more identifiable characteristics. It is also called a crosstabulation table.
Continuous Data Data whose possible values are uncountable and which may assume any value in an interval.
Continuous Random Variables Random variables that can assume an uncountably infinite number of values.
Convenience Sampling A sampling technique that selects the items from the population based on accessibility and ease of selection.
Correlation Coefficient A quantitative measure of the strength of the linear relationship between two variables. The correlation ranges from −1.0 to +1.0. A correlation of ±1.0 indicates a perfect linear relationship, whereas a correlation of 0 indicates no linear relationship.
Correlation Matrix A table showing the pairwise correlations between all variables (dependent and independent).
Critical Value The value corresponding to a significance level that determines those test statistics that lead to rejecting the null hypothesis and those that lead to a decision not to reject the null hypothesis.
Cross-Sectional Data A set of data values observed at a fixed point in time.
Cumulative Frequency Distribution A summary of a set of data that displays the number of observations with values less than or equal to the upper limit of each of its classes.
Cumulative Relative Frequency Distribution A summary of a set of data that displays the proportion of observations with values less than or equal to the upper limit of each of its classes.
Cyclical Component A wavelike pattern within the time series that repeats itself throughout the time series and has a recurrence period of more than one year.
Data Array Data that have been arranged in numerical order.
Degrees of Freedom The number of independent data values available to estimate the population's standard deviation. If k parameters must be estimated before the population's standard deviation can be calculated from a sample of size n, the degrees of freedom are equal to n − k.
Demographic Questions Questions relating to the respondents' characteristics, backgrounds, and attributes.
Dependent Events Two events are dependent if the occurrence of one event impacts the probability of the other event occurring.
Dependent Variable A variable whose values are thought to be a function of, or dependent on, the values of another variable called the independent variable. On a scatter plot, the dependent variable is placed on the y axis and is often called the response variable.
Discrete Data Data that can take on a countable number of possible values.
Discrete Random Variable A random variable that can only assume a finite number of values or an infinite sequence of values such as 0, 1, 2, . . .
Dummy Variable A variable that is assigned a value equal to either 0 or 1, depending on whether the observation possesses a given characteristic.
Empirical Rule If the data distribution is bell-shaped, then the interval μ ± 1σ contains approximately 68% of the values, μ ± 2σ contains approximately 95% of the values, and μ ± 3σ contains virtually all of the data values.
Equal-Width Classes The distance between the lowest possible value and the highest possible value in each class is equal for all classes.
Event A collection of experimental outcomes.
Expected Value The mean of a probability distribution. The average value when the experiment that generates values for the random variable is repeated over the long run.
Experiment A process that produces a single outcome whose result cannot be predicted with certainty.
Experimental Design A plan for performing an experiment in which the variable of interest is defined. One or more factors are identified to be manipulated, changed, or observed so that the impact (or influence) on the variable of interest can be measured or observed.
Experiment-Wide Error Rate The proportion of experiments in which at least one of the set of confidence intervals constructed does not contain the true value of the population parameter being estimated.
Exponential Smoothing A time-series and forecasting technique that produces an exponentially weighted moving average in which each smoothing calculation or forecast is dependent on all previous observed values.
External Validity A characteristic of an experiment whose results can be generalized beyond the test environment so that the outcomes can be replicated when the experiment is repeated.
Factor A quantity under examination in an experiment as a possible cause of variation in the response variable.
Forecasting Horizon The number of future periods covered by a forecast. It is sometimes referred to as forecast lead time.
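The expected-value entry above is simply the probability-weighted average E(x) = Σ x·P(x) over the values of a discrete random variable. A minimal sketch (the function name is illustrative, not from the text):

```python
def expected_value(dist):
    """E(x) = sum of x * P(x) over a discrete distribution given as (value, prob) pairs."""
    assert abs(sum(p for _, p in dist) - 1.0) < 1e-9  # probabilities must sum to 1
    return sum(x * p for x, p in dist)

# A fair six-sided die: each face 1..6 has probability 1/6
die = [(x, 1 / 6) for x in range(1, 7)]
print(expected_value(die))  # ≈ 3.5
```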

Forecasting Interval The frequency with which new forecasts are prepared.
Forecasting Period The unit of time for which forecasts are to be made.
Frequency Distribution A summary of a set of data that displays the number of observations in each of the distribution's distinct categories or classes.
Frequency Histogram A graph of a frequency distribution with the horizontal axis showing the classes, the vertical axis showing the frequency count, and (for equal class widths) the rectangles having a height equal to the frequency in each class.
Hypergeometric Distribution The hypergeometric distribution is formed by the ratio of the number of ways an event of interest can occur over the total number of ways any event can occur.
Independent Events Two events are independent if the occurrence of one event in no way influences the probability of the occurrence of the other event.
Independent Samples Samples selected from two or more populations in such a way that the occurrence of values in one sample has no influence on the probability of the occurrence of values in the other sample(s).
Independent Variable A variable whose values are thought to impact the values of the dependent variable. The independent variable, or explanatory variable, is often within the direct control of the decision maker. On a scatter plot, the independent variable, or explanatory variable, is graphed on the x axis.
Interaction The case in which one independent variable (such as x2) affects the relationship between another independent variable (x1) and a dependent variable (y).
Internal Validity A characteristic of an experiment in which data are collected in such a way as to eliminate the effects of variables within the experimental environment that are not of interest to the researcher.
Interquartile Range The interquartile range is a measure of variation that is determined by computing the difference between the third and first quartiles.
Least Squares Criterion The criterion for determining a regression line that minimizes the sum of squared prediction errors.
Left-Skewed Data A data distribution is left skewed if the mean for the data is smaller than the median.
Levels The categories, measurements, or strata of a factor of interest in the current experiment.
Line Chart A two-dimensional chart showing time on the horizontal axis and the variable of interest on the vertical axis.
Linear Trend A long-term increase or decrease in a time series in which the rate of change is relatively constant.
Margin of Error The amount that is added and subtracted to the point estimate to determine the endpoints of the confidence interval. Also, a measure of how close we expect the point estimate to be to the population parameter with the specified level of confidence.
Mean A numerical measure of the center of a set of quantitative measures computed by dividing the sum of the values by the number of values in the data.
Median The median is a center value that divides a data array into two halves. We use μ̃ to denote the population median and Md to denote the sample median.
Mode The mode is the value in a data set that occurs most frequently.
Model A representation of an actual system using either a physical or a mathematical portrayal.
Model Diagnosis The process of determining how well a model fits past data and how well the model's assumptions appear to be satisfied.
Model Fitting The process of estimating the specified model's parameters to achieve an adequate fit of the historical data.
Model Specification The process of selecting the forecasting technique to be used in a particular situation.
Moving Average The successive averages of n consecutive values in a time series.
Multicollinearity A high correlation between two independent variables such that the two variables contribute redundant information to the model. When highly correlated independent variables are included in the regression model, they can adversely affect the regression results.
Multiple Coefficient of Determination The proportion of the total variation of the dependent variable in a multiple regression model that is explained by its relationship to the independent variables. It is, as is the case in the simple linear model, called R-squared and is denoted as R2.
Mutually Exclusive Classes Classes that do not overlap so that a data value can be placed in only one class.
Mutually Exclusive Events Two events are mutually exclusive if the occurrence of one event precludes the occurrence of the other event.
Nonstatistical Sampling Techniques Those methods of selecting samples using convenience, judgment, or other nonchance processes.
Normal Distribution The normal distribution is a bell-shaped distribution with the following properties:
1. It is unimodal; that is, the normal distribution peaks at a single value.
2. It is symmetrical; this means that the two areas under the curve between the mean and any two points equidistant on either side of the mean are identical. One side of the distribution is the mirror image of the other side.
3. The mean, median, and mode are equal.
4. The normal approaches the horizontal axis on either side of the mean toward plus and minus infinity (±∞). In more formal terms, the normal distribution is asymptotic to the x axis.
5. The amount of variation in the random variable determines the height and spread of the normal distribution.
Null Hypothesis The statement about the population parameter that will be assumed to be true during the conduct of the hypothesis test. The null hypothesis will be rejected only if the sample data provide substantial contradictory evidence.
Ogive The graphical representation of the cumulative relative frequency. A line is connected to points plotted above the

upper limit of each class at a height corresponding to the cumulative relative frequency.
One-Tailed Test A hypothesis test in which the entire rejection region is located in one tail of the sampling distribution. In a one-tailed test, the entire alpha level is located in one tail of the distribution.
One-Way Analysis of Variance An analysis of variance design in which independent samples are obtained from two or more levels of a single factor for the purpose of testing whether the levels have equal means.
Open-End Questions Questions that allow respondents the freedom to respond with any value, words, or statements of their own choosing.
Paired Samples Samples that are selected in such a way that values in one sample are matched with the values in the second sample for the purpose of controlling for extraneous factors. Another term for paired samples is dependent samples.
Parameter A measure computed from the entire population. As long as the population does not change, the value of the parameter will not change.
Pareto Principle 80% of the problems come from 20% of the causes.
Percentiles The pth percentile in a data array is a value that divides the data set into two parts. The lower segment contains at least p% and the upper segment contains at least (100 − p)% of the data. The 50th percentile is the median.
Pie Chart A graph in the shape of a circle. The circle is divided into "slices" corresponding to the categories or classes to be displayed. The size of each slice is proportional to the magnitude of the displayed variable associated with each category or class.
Pilot Sample A sample taken from the population of interest of a size smaller than the anticipated sample size that is used to provide an estimate for the population standard deviation.
Point Estimate A single statistic, determined from a sample, that is used to estimate the corresponding population parameter.
Population The set of all objects or individuals of interest or the measurements obtained from all objects or individuals of interest.
Population Mean The average for all values in the population computed by dividing the sum of all values by the population size.
Population Proportion The fraction of values in a population that have a specific attribute.
Power The probability that the hypothesis test will correctly reject the null hypothesis when the null hypothesis is false.
Power Curve A graph showing the probability that the hypothesis test will correctly reject a false null hypothesis for a range of possible "true" values for the population parameter.
Probability The chance that a particular event will occur. The probability value will be in the range 0 to 1. A value of 0 means the event will not occur. A probability of 1 means the event will occur. Anything between 0 and 1 reflects the uncertainty of the event occurring. The definition given is for a countable number of events.
p-Value The probability (assuming the null hypothesis is true) of obtaining a test statistic at least as extreme as the test statistic we calculated from the sample. The p-value is also known as the observed significance level.
Qualitative Data Data whose measurement scale is inherently categorical.
Quantitative Data Measurements whose values are inherently numerical.
Quartiles Quartiles in a data array are those values that divide the data set into four equal-sized groups. The median corresponds to the second quartile.
Random Component Changes in time-series data that are unpredictable and cannot be associated with a trend, seasonal, or cyclical component.
Random Variable A variable that takes on different numerical values based on chance.
Range The range is a measure of variation that is computed by finding the difference between the maximum and minimum values in a data set.
Regression Hyperplane The multiple regression equivalent of the simple regression line. The plane typically has a different slope for each independent variable.
Regression Slope Coefficient The average change in the dependent variable for a unit change in the independent variable. The slope coefficient may be positive or negative, depending on the relationship between the two variables.
Relative Frequency The proportion of total observations that are in a given category. Relative frequency is computed by dividing the frequency in a category by the total number of observations. The relative frequencies can be converted to percentages by multiplying by 100.
Relative Frequency Assessment The method that defines probability as the number of times an event occurs divided by the total number of times an experiment is performed in a large number of trials.
Research Hypothesis The hypothesis the decision maker attempts to demonstrate to be true. Because this is the hypothesis deemed to be the most important to the decision maker, it will be declared true only if the sample data strongly indicate that it is true.
Residual The difference between the actual value of y and the predicted value ŷ for a given level of the independent variable, x.
Right-Skewed Data A data distribution is right skewed if the mean for the data is larger than the median.
Sample A subset of the population.
Sample Mean The average for all values in the sample computed by dividing the sum of all sample values by the sample size.
Sample Proportion The fraction of items in a sample that have the attribute of interest.
Sample Space The collection of all outcomes that can result from a selection, decision, or experiment.

Sampling Distribution The distribution of all possible values of a statistic for a given sample size that has been randomly selected from a population.
Sampling Error The difference between a measure computed from a sample (a statistic) and the corresponding measure computed from the population (a parameter).
Scatter Diagram, or Scatter Plot A two-dimensional graph of plotted points in which the vertical axis represents values of one quantitative variable and the horizontal axis represents values of the other quantitative variable. Each plotted point has coordinates whose values are obtained from the respective variables.
Scatter Plot A two-dimensional plot showing the values for the joint occurrence of two quantitative variables. The scatter plot may be used to graphically represent the relationship between two variables. It is also known as a scatter diagram.
Seasonal Component A wavelike pattern that is repeated throughout a time series and has a recurrence period of at most one year.
Seasonal Index A number used to quantify the effect of seasonality in time-series data.
Seasonally Unadjusted Forecast A forecast made for seasonal data that does not include an adjustment for the seasonal component in the time series.
Significance Level The maximum allowable probability of committing a Type I statistical error. The probability is denoted by the symbol α.
Simple Linear Regression The method of regression analysis in which a single independent variable is used to predict the dependent variable.
Simple Random Sample A sample selected in such a manner that each possible sample of a given size has an equal chance of being selected.
Simple Random Sampling A method of selecting items from a population such that every possible sample of a specified size has an equal chance of being selected.
Skewed Data Data sets that are not symmetric. For skewed data, the mean will be larger or smaller than the median.
Standard Deviation The standard deviation is the positive square root of the variance.
Standard Error A value that measures the spread of the sample means around the population mean. The standard error is reduced when the sample size is increased.
Standard Normal Distribution A normal distribution that has a mean = 0.0 and a standard deviation = 1.0. The horizontal axis is scaled in z-values that measure the number of standard deviations a point is from the mean. Values above the mean have positive z-values. Values below the mean have negative z-values.
Standardized Data Values The number of standard deviations a value is from the mean. Standardized data values are sometimes referred to as z scores.
Statistic A measure computed from a sample that has been selected from a population. The value of the statistic will depend on which sample is selected.
Statistical Inference Procedures Procedures that allow a decision maker to reach a conclusion about a set of data based on a subset of that data.
Statistical Sampling Techniques Those sampling methods that use selection techniques based on chance selection.
Stratified Random Sampling A statistical sampling method in which the population is divided into subgroups called strata so that each population item belongs to only one stratum. The objective is to form strata such that the population values of interest within each stratum are as much alike as possible. Sample items are selected from each stratum using the simple random sampling method.
Structured Interview Interviews in which the questions are scripted.
Student's t-Distributions A family of distributions that is bell-shaped and symmetric like the standard normal distribution but with greater area in the tails. Each distribution in the t-family is defined by its degrees of freedom. As the degrees of freedom increase, the t-distribution approaches the normal distribution.
Subjective Probability Assessment The method that defines probability of an event as reflecting a decision maker's state of mind regarding the chances that the particular event will occur.
Symmetric Data Data sets whose values are evenly spread around the center. For symmetric data, the mean and median are equal.
Systematic Random Sampling A statistical sampling technique that involves selecting every kth item in the population after a randomly selected starting point between 1 and k. The value of k is determined as the ratio of the population size over the desired sample size.
Tchebysheff's Theorem Regardless of how data are distributed, at least (1 − 1/k²) of the values will fall within k standard deviations of the mean. For example:
At least (1 − 1/1²) = 0 = 0% of the values will fall within k = 1 standard deviation of the mean.
At least (1 − 1/2²) = 3/4 = 75% of the values will lie within k = 2 standard deviations of the mean.
At least (1 − 1/3²) = 8/9 ≈ 89% of the values will lie within k = 3 standard deviations of the mean.
Test Statistic A function of the sampled observations that provides a basis for testing a statistical hypothesis.
Time-Series Data A set of consecutive data values observed at successive points in time.
Total Quality Management A journey to excellence in which everyone in the organization is focused on continuous process improvement directed toward increased customer satisfaction.
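The standardized-value and Tchebysheff entries above fit together: z = (x − μ)/σ counts how many standard deviations a value sits from the mean, and Tchebysheff's Theorem guarantees that at least 1 − 1/k² of the values lie within k standard deviations regardless of the distribution's shape. A sketch checking the k = 2 bound on a small data set, using population formulas (function name illustrative, not from the text):

```python
import math

def z_scores(data):
    """Standardize each value: z = (x - mean) / population standard deviation."""
    mu = sum(data) / len(data)
    sigma = math.sqrt(sum((x - mu) ** 2 for x in data) / len(data))
    return [(x - mu) / sigma for x in data]

data = [2, 4, 4, 4, 5, 5, 7, 9]        # mean 5, population std. dev. 2
z = z_scores(data)
within_2 = sum(1 for v in z if abs(v) <= 2) / len(z)
print(within_2 >= 1 - 1 / 2**2)        # Tchebysheff guarantees at least 75%: True
```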

Total Variation The aggregate dispersion of the individual data values across the various factor levels is called the total variation in the data.
Two-Tailed Test A hypothesis test in which the entire rejection region is split into the two tails of the sampling distribution. In a two-tailed test, the alpha level is split evenly between the two tails.
Type I Error Rejecting the null hypothesis when it is, in fact, true.
Type II Error Failing to reject the null hypothesis when it is, in fact, false.
Unbiased Estimator A characteristic of certain statistics in which the average of all possible values of the sample statistic equals a parameter, no matter the value of the parameter.
Unstructured Interview Interviews that begin with one or more broadly stated questions, with further questions being based on the responses.
Variance The population variance is the average of the squared distances of the data values from the mean.
Variance Inflation Factor A measure of how much the variance of an estimated regression coefficient increases if the independent variables are correlated. A VIF equal to 1.0 for a given independent variable indicates that this independent variable is not correlated with the remaining independent variables in the model. The greater the multicollinearity, the larger the VIF.
Variation A set of data exhibits variation if all the data are not the same value.
Weighted Mean The mean value of data values that have been weighted according to their relative importance.
Within-Sample Variation The dispersion that exists among the data values within a particular factor level is called the within-sample variation.
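The weighted-mean entry above corresponds to x̄w = Σ(w·x)/Σw, where each value x is multiplied by its weight w before averaging. A minimal sketch (the grading scenario and function name are illustrative, not from the text):

```python
def weighted_mean(values, weights):
    """Mean of values weighted by their relative importance: sum(w*x) / sum(w)."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)

# e.g. a course grade where an exam score of 90 counts three times
# as much as a homework score of 80:
print(weighted_mean([90, 80], [3, 1]))  # 87.5
```

With all weights equal, the result reduces to the ordinary arithmetic mean.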

Index

A
Addition rule
  Individual outcomes, 160–162
  Mutually exclusive events, 167
  Two events, 163–167
Adjusted R-square
  Equation, 644, 701
Aggregate price index
  Defined, 715
  Unweighted, 716, 763
All-inclusive classes, 39
Alpha, controlling, 378
Alternative hypothesis
  Defined, 347
Analysis of variance
  Assumptions, 477, 478–481, 498, 512
  Between-sample variation, 477
  Experiment-wide error rate, 488
  Fisher's Least Significant Difference test, 505–506, 522
  Fixed effects, 493
  Hartley's F-test statistic, 480, 485, 522
  Kruskal-Wallis one-way, 789–794
  One-way ANOVA, 476–493
  One-way ANOVA table, 483
  Random effects, 493
  Randomized block ANOVA, 497–505
  Total variation, 477
  Tukey-Kramer, 488–493, 522
  Two-factor ANOVA, 509–517
  Within-sample variation, 477
Arithmetic mean, 4, 97
Autocorrelation
  Defined, 728
  Durbin-Watson statistic, 728–729, 763
Average, 4. See also Mean
  Equation, 4
  Moving average, 739, 740
  Ratio-to-moving-average, 763
  Sample equation, 265
Average subgroup range, 812, 832

B
Backward elimination stepwise, 679–683
Balanced design, 476, 480
  Defined, 476
Bar chart, 3
  Cluster, 58
  Column, 55
  Defined, 57
  Excel examples, 59
  Horizontal, 55–56
  Minitab example, 59
  Pie chart versus, 61
  Summary steps, 56
Base period index
  Defined, 714
  Simple index number, 714, 763
Bayes' Theorem, 175–179
  Equation, 176
Best subsets regression, 683–686
Beta
  Calculating, 376–377, 379–382
  Controlling, 378–382
  Power, 382
  Proportion, 380–381
  Summary steps, 378
  Two-tailed test, 379–380
Between-sample variation
  Defined, 477
Bias
  Interviewer, 12
  Nonresponse, 12
  Observer, 12–13
  Selection, 12
Binomial distribution
  Characteristics, 199–200
  Defined, 199
  Excel example, 206
  Formula, 202
  Mean, 205–207
  Minitab example, 207
  Shapes, 208
  Standard deviation, 207–208
  Table, 204–205
Binomial formula, 202–203
Bivariate normal distribution, 585
Box and whisker plots
  ANOVA assumptions, 479
  Defined, 100
  Summary steps, 100
Brainstorming, 807
Business statistics
  Defined, 2

C
c-charts, 824–827, 833
  Control limits, 825
  Excel example, 826
  Minitab example, 826
  Standard deviation, 825, 833
Census
  Defined, 15
Centered moving average, 740
Central limit theorem, 282–286
  Examples, 284–285
  Theorem 4, 283
Central tendency, applying measures, 94–97
Charts, 3
  Bar chart, 54–60, 61
  Box and whisker, 100–101
  Histogram, 41–46
  Line, 66–69
  Pie, 60–61
  Scatter diagram, 70–72, 581, 639–640
  Scatter plot, 580
  Stem and leaf diagrams, 62–63
Chi-square
  Assumptions, 450
  Confidence interval, 454–455
  Contingency analysis, 564
  Contingency test statistic, 564, 574
  Degrees of freedom, 450, 550, 554, 564
  Goodness-of-fit, 548–559
  Goodness-of-fit test statistic, 550, 574
  Sample size, 550
  Single variance, 449–455, 471
  Summary steps, 452
  Test for single population variance, 450, 452
  Test limitations, 569
Class boundaries, 39
Classes
  All-inclusive, 39
  Boundaries, 39
  Equal-width, 39
  Mutually exclusive, 39
Class width
  Equation, 39
Classical probability assessment
  Defined, 152
  Equation, 152
Closed-end questions, 9
Cluster sampling, 18–19
  Primary clusters, 19
Coefficient of determination
  Adjusted R-square, 644, 701
  Defined, 602
  Equation, 602, 625
  Hypothesis test, 603, 643–644
  Multiple regression, 642–643
  Single independent variable case, 602
  Test statistic, 603
Coefficient of partial determination, 678
Coefficient of variation
  Defined, 119
  Population equation, 119
  Sample equation, 119
Combinations
  Counting rule equation, 201
Complement
  Defined, 162
  Rule, 162–163
Completely randomized design
  Defined, 476
Composite polynomial model
  Defined, 669
  Excel example, 669–670
  Minitab example, 670–671
Conditional probability
  Bayes' theorem, 175–179
  Defined, 167
  Independent events, 171–172
  Rule for independent events, 171
  Tree diagrams, 170–171
  Two events, 168
Confidence interval
  Average y, given x, 616
  Critical value, 309, 340
  Defined, 306
  Difference between means, 398
  Estimate, 399, 441, 471
  Excel example, 307, 318
  Flow diagram, 340
  General format, 309, 340, 398, 441
  Impact of sample size, 314
  Larger sample sizes, 320
  Margin of error, 311–312
  Minitab example, 307, 318
  Paired samples, 423–427
  Population mean, 308–314
  Population mean estimate, 317
  Proportion, 331–333
  Regression slope, 614, 625, 649–651, 701
  Sample size requirements, 324–327
  Standard error of mean, 308
  Summary steps, 310, 319
  t-distribution, 314–320, 400–406
  Two proportions, 432–433
  Unequal variances, 404
  Variance, 455
Consistent estimator
  Defined, 280
Contingency analysis, 562–569
  Chi-square test statistic, 564, 574
  Contingency table, 562–569
  Excel example, 568
  Expected cell frequencies, 567, 574

Marginal frequencies, 563 Minitab, 568 r × c contingency analysis, 566–569 2 × 2 contingency analysis, 564–566 Contingency table Defined, 563 Continuous data, 36 Continuous probability distributions Exponential distribution, 252–254 Normal distribution, 234–245 Uniform distribution, 249–252 Continuous random variables Defined, 192 Convenience sampling, 15 Correlation coefficient Assumptions, 585 Cause-and-effect, 586 Defined, 580, 638 Equation, 580–581, 625, 638, 701 Excel example, 582–583, 638 Hypothesis test, 584 Minitab example, 582–583, 639 Test statistic, 584, 625 Correlation matrix, 638 Counting rule Combinations, 201–202 Critical value Calculating, 352–353 Commonly used values, 309, 340 Confidence interval estimate, 310 Defined, 352 Hypothesis testing, 351–352 Crosby, Philip B., 805–806 Cumulative frequency distribution Defined, 40 Relative frequency, 40 Cyclical component Defined, 713. D Data Categorizing, 23–24 Classification, 27 Discrete, 33 Hierarchy, 21–23 Interval, 22 Measurement levels, 21–24 Nominal, 21–22 Ordinal, 22 Qualitative, 21 Quantitative, 21 Ratio, 22–23 Skewed, 92–93 Symmetric, 92–93 Time-series, 21 Data array, 37, 98 Data collection methods, 7, 27 Array, 37 Bar codes, 11–12 Direct observation, 7, 11 Experiments, 7, 8 Issues, 12–13 Personal interview, 11 Telephone surveys, 7–9 Written questionnaires, 7, 9–11 Data frequency distribution. See Grouped data frequency distribution Decision rule Hypothesis testing, 351–352, 354–357 Deflating time-series data, 721, 763 Formula, 721, 763 Degrees of freedom Chi-square, 450 One sample, 315 Student’s t-distribution, 314–315 Unequal means, 441 Unequal variances, 419, 442. Deming, W.
Edwards Cycle, 806 Fourteen points, 805 Variation, 810 Demographic Questions, 9 Dependent events Defined, 150 Dependent variable, 70, 634 Descriptive statistical techniques Data-level issues, 102–103 Deseasonalization Equation, 742, 763 Excel examples, 743 Direct Observation, 7, 11 Discrete probability distributions, 192–195 Binomial distribution, 199–208 Hypergeometric distribution, 217, 219–223 Poisson distribution, 213–217 Discrete random variable Defined, 192 Displaying graphically, 192–193 Expected value equation, 194 Mean, 193–194 Standard deviation, 194–195 Dummy variables, 654–657 Defined, 654 Excel example, 746 Seasonality, 744–746 Durbin-Watson statistic Equation, 729, 763 Test for autocorrelation, 730–732. E Empirical rule Defined, 120 Empty classes, 39 Equal-width classes, 39 Error. See also Standard error Experimental-wide error rate, 488 Forecast, 727, 763 Margin of error, 311–312, 340 Mean absolute percent error, 758, 763 Measurement, 13 Standard error of mean, 308 Sum of squares, 522 Type I, 350 Type II, 350, 376–383 Estimate Confidence interval, 306, 398 Difference between means, 398 Paired difference, 423–427 Point, 306 Testing flow diagram, 441 Estimation, 5, 306 Sample size for population, 326–327 Event Defined, 149–150 Dependent, 150–151 Independent, 150–151 Mutually exclusive, 150 Expected cell frequencies Equation, 567, 574 Expected value Binomial distribution, 205–207 Defined, 193 Equation, 194 Experimental design, 7 Experimental-wide error rate Defined, 488 Experiments, 7, 8 Exponential probability distribution Density function, 252 Excel example, 253–254 Minitab example, 254 Probability, 253. Exponential smoothing Defined, 750 Double smoothing, 755–758, 763 Equation, 763 Excel examples, 751, 753–754, 756–757 Minitab examples, 754, 758 Single smoothing, 750–755 Smoothing constant, 750 External validity, 13. 
F Factor Defined, 476 Finite population correction factor, 280 Fishbone diagram, 807 Fisher’s Least Significant Difference test, 505–506, 522 Fixed effects, 493 Flowcharts, 807 Forecast bias Equation, 733, 763 Forecasting Autocorrelation, 728–732 Bias, 733, 763 Cyclical component, 713 Dummy variables, 744–746 Durbin-Watson statistic, 728–730, 763 Error, 727, 763 Excel example, 734–737 Exponential smoothing, 750–758 Horizon, 710 Interval, 710 Linear trend, 711, 725, 763 Mean absolute deviation, 727, 763 Mean absolute percent error, 758, 763 Mean squared error, 727, 763 Minitab example, 735–736 Model diagnosis, 710 Model fitting, 710 Model specification, 710 Nonlinear trend, 734–738 Period, 710 Random component, 713 Residual, 727–728 Seasonal adjustment, 738–746 Seasonal component, 712–713 Seasonally unadjusted, 743 Trend-base technique, 724–746 True forecasts, 732–734 Forward selection stepwise, 678 Forward stepwise regression, 683 Frequency distribution, 32–41 Classes, 39 Data array, 37 Defined, 33 Discrete data, 33 Grouped data, 36–41 Joint, 47–50 Qualitative, 36 Quantitative, 35 Relative, 33–35 Tables, 32–34 Frequency histogram Defined, 41 Issues with Excel, 44 Relative frequency, 45–46 Summary steps, 44 F-test Assumptions, 459 Coefficient of determination, 603 Excel example, 464–465 Minitab, 464–465 Multiple regression, 643, 701 Test statistic, 459 Two variances, 458–467.

G Goodness-of-fit tests, 548–559 Chi-square test, 548–559 Chi-square test statistic, 550, 574 Degrees of freedom, 550, 554 Excel example, 553–554 Minitab, 554–555 Sample size, 550 Grouped data frequency distribution All-inclusive classes, 39 Class boundaries, 39 Classes, 39 Class width, 39 Continuous data, 36 Cumulative frequency, 40 Data array, 37 Empty classes, 39 Equal-width classes, 39 Excel example, 37 Minitab example, 37 Mutually exclusive classes, 39 Number of classes, 39 Steps, 39–40. H Hartley’s F-test statistic, 480, 485, 522 Histogram, 3, 41–46 Examples, 42 Excel example, 43 Issues with Excel, 44 Minitab example, 44 Quality, 807 Relative frequency, 45–46 Summary steps, 44 Types of information, 41–42 Hypergeometric distribution Multiple possible outcomes, 222–223 Two possible outcomes, 220–221 Hypothesis Alternative, 347 ANOVA, 477 Null, 347 Research, 348 Summary steps, 349 Hypothesis testing, 5, 347–372 Alternative hypothesis, 347 Calculating beta, 376–377 Chi-square test, 449–455 Controlling alpha and beta, 378–382 Correlation coefficient, 584, 638–639 Critical value, 351–353 Decision rule, 351–352, 354–357 Difference between two population proportions, 433–437 Excel example, 364, 415–416, 435 Flow diagram, 441 F-test, 458–467 Median, 771–785 Minitab, 365, 416–417, 436 Multiple regression analysis, 643–644 Nonparametric tests, 770–803 Null hypothesis, 347 One-tailed test, 358 Paired samples, 427–428 Population mean, 347–365 Population proportion, 368–372 Power, 382–383 Power curve, 382 Procedures, deciding among, 388 p-value, 357–358, 412 Significance level, 352 Simple regression coefficient, 606 Single population variance, 452 Summary steps, 355, 362, 370, 410, 452 t-test statistic, 361–362, 412, 427 Two means, 409–419 Two-tailed test, 358. Two variances, 458–467 Type I error, 350 Type II error, 350, 376–383 Types of tests, 358–359 z-test statistic, 354.
I Imai, Masaaki, 806 Independent events Conditional probability rule, 171 Defined, 150 Independent samples, 398 Defined, 459 Independent variable, 70, 634 Index numbers, 714–721, 763 Aggregate price index, 715–717 Base period index, 714 Consumer price, 719–720 Deflating time-series data, 721, 763 Laspeyres, 718–719, 763 Paasche, 717–718, 763 Producer price, 720 Simple index number, 714, 763 Stock market, 720–721 Unweighted aggregate price, 716, 763 Inferences, 2 Interaction Cautions, 517 Defined, 669 Explained, 512, 514–517 Partial F-test, 671–674, 701 Polynomial regression model, 667–671 Internal validity, 13 Interquartile Range Defined, 108 Equation, 108 Interval data, 22 Interviewer bias, 12 Interviews Structured, 11 Unstructured, 11 Ishikawa, Kauro, 806, 807. J Joint frequency distribution, 47–50 Excel example, 48–49 Minitab example, 49–50 Judgment sampling, 15 Juran, Joseph, 805 Ten steps, 806. K kaizen, 806 Kruskal-Wallis one-way ANOVA Assumptions, 790 Correction, 799 Correction for ties, 794, 799 Excel example, 792 H-statistic, 791, 794, 799 H-statistic corrected for tied rankings, 799 Hypotheses, 790 Limitations, 793–794 Minitab example, 793 Steps, 790–793. L Laspeyres index, 718–719, 763 Equation, 718, 763 Least squares criterion Defined, 592 Equations, 594, 725, 763 Linear trend Defined, 711 Model, 725, 763. Line charts, 66–69 Excel examples, 67–69 Minitab examples, 67–69 Summary steps, 68 Location measures Percentiles, 98–99 Quartiles, 99–100. 
M MAD, 727–728 Mann-Whitney U-test Assumptions, 777 Critical value, 779 Equations, 798 Hypotheses, 777 Large sample test, 780–785 Minitab example, 780 Steps, 777–779 Test statistic, 785, 798 U-statistics, 778, 780–782, 798 MAPE, 758, 763 Margin of error Defined, 311 Equation, 312, 340 Proportions, 340 Mean Advantages and disadvantages, 103 Binomial distribution, 205–207 c-charts, 824–827, 833 Defined, 86 Discrete random variable, 193–194 Excel example, 89, 95–96 Expected value, 194 Extreme values, 90–91 Hypothesis test, 347–365 Minitab example, 89–90 Poisson distribution, 217 Population, determining required sample size, 325–326 Population equation, 86, 265 Sample equation, 90, 265, 266 Sampling distribution of a proportion, 292 Summary steps, 87 Uniform distribution, 251 U-statistics, 778, 798 Weighted, 97–98 Wilcoxon, 784 Mean absolute deviation Equation, 727, 763 Mean absolute percent error Equation, 758, 763 Mean squared error Equation, 727, 763 Mean subgroup proportion, 822, 832 Measurement error, 13 Median Advantages and disadvantages, 103 Data array, 91 Defined, 91 Excel example, 95–96 Hypothesis test, 771–785 Index point, 91 Issues with Excel, 96–97 Mode Advantages and disadvantages, 103 Defined, 93 Model Building, 636, 637 Diagnosis, 637, 643, 710 Model fitting, 710 Specification, 636, 637, 710 Model building concepts, 636 Summary steps, 637 Moving average Centered, 740 Defined, 739.

Multicollinearity Defined, 647 Face validity, 647 Variance inflation factor, 648–649, 701 Multiple coefficient of determination, 642–643, 701 Equation, 642 Multiple regression analysis, 634–686 Aptness of the model, 689–697 Assumptions, 634, 689 Coefficient of determination, 642–643, 701 Correlation coefficient, 638–639 Dependent variable, 634 Diagnosis, 637, 643 Dummy variables, 654–657 Estimated model, 634, 701 Excel example, 640–641 Hyperplane, 635 Independent variable, 634 Interval estimate for slope, 649–651 Minitab example, 642 Model building, 636–651 Multicollinearity, 647–649 Nonlinear relationships, 661–667 Partial F-test, 671–674, 701 Polynomial, 662–663, 701 Population model, 634, 701 Scatter plots, 639–640 Significance test, 643–644 Standard error of the estimate, 646, 701 Stepwise regression, 678–686 Summary steps, 637 Multiplication probability rule, 172 Independent events, 174–175 Tree diagram, 173–174 Two events, 172–173 Multiplicative time-series model Equation, 739, 763 Seasonal indexes, 738–746 Summary steps, 744 Mutually exclusive classes, 39 Mutually exclusive events Defined, 150. N Nominal data, 21–22 Nonlinear trend, 712, 734–738 Nonparametric statistics, 770–803 Kruskal-Wallis one-way ANOVA, 789–794 Mann-Whitney U test, 776–785, 798 Wilcoxon matched pairs test, 782–785, 798 Wilcoxon signed rank test, 771–774, 798 Nonresponse bias, 12 Nonstatistical sampling, 15 Convenience sampling, 15 Judgment sampling, 15 Ratio sampling, 15 Normal distribution, 234–245 Approximate areas under normal curve, 245 Defined, 234 Empirical rule, 245 Excel example, 242–243 Function, 235 Minitab example, 242, 244 Standard normal, 235–245 Standard normal table, 237–242 Steps, 237 Summary steps, 237 Null hypothesis Claim, 348–349 Defined, 347 Research hypothesis, 348 Status quo, 347–348 Numerical statistical measures Summary, 129.
O Observer bias, 12–13 Ogive, 45–46 One-tailed hypothesis test Defined, 358 One-way ANOVA Assumptions, 477 Balanced design, 476 Between-sample variation, 477 Completely randomized design, 476 Defined, 476 Excel example, 486–487, 492 Experimental-wide error rate, 488 Factor, 476 Fixed effects, 493 Hartley’s F-test statistic, 480, 485, 522 Levels, 476 Logic, 476 Minitab, 486–487, 492 Partitioning sums of squares, 477–478 Random effects, 493 Sum of squares between, 482, 522 Sum of squares within, 482, 522 Table, 483 Total sum of squares, 481, 522 Total variation, 477 Tukey-Kramer, 488–493, 522 Within-sample variation, 477 Open-end questions, 10–11 Ordinal data, 22. P Paasche index Equation, 717, 763 Paired sample Confidence interval estimation, 425 Defined, 423 Equation, 424 Point estimate, 426 Population mean, 426 Standard deviation, 425 Why use, 423–424 Parameters, 15 Defined, 86, 266 Unbiased estimator, 276 Pareto principle, 805 Partial F-test, 671–674, 701 Statistic formula, 672, 701 p-charts, 820–823 Control limits, 823, 833 Pearson product moment correlation, 581 Percentiles Defined, 98 Location index, 98 Summary steps, 99 Personal interviews, 11 Pie chart bar chart versus, 61 Defined, 60 Summary steps, 60 Pilot sample Defined, 326 Proportions, 340 Point estimate Defined, 306 Paired difference, 424 Poisson distribution, 213–217 Assumptions, 213 Equation, 214 Excel example, 218 Mean, 217 Minitab example, 218 Standard deviation, 217 Summary steps, 216 Table, 214–217 Polynomial regression model Composite model, 669 Equation, 662, 701.
Excel example, 664, 666, 669–670 Interaction, 667–671 Minitab example, 665, 666, 670–671 Second order model, 662–663, 666 Third order model, 663 Population Defined, 14 Mean, 86–89, 265 Proportion, 289–290 Population model, multiple regression analysis, 634, 701 Power of the test Curve, 382 Defined, 382 Equation, 382, 388 Prediction interval for y given x, 616–618, 625 Probability Addition rule, 159–162 Classical assessment, 152–153 Conditional, 167 Defined, 147 Experiment, 147 Methods of assigning, 152–156 Relative frequency assessment, 153–155 Rules, 159–179 Rules summary and equations, 186 Sample space, 147 Subjective assessment, 155–156 Probability sampling, 16 Producer price index, 720 Proportions Confidence interval, 340 Estimation, 333–335 Hypothesis tests, 368–372 Pooled estimator, 434, 442 Population, 289–290 Sample proportion, 330 Sampling distribution, 289–294 Sampling error, 290 Standard error, 331 z-test statistic equation, 388 p-value, 357–358. Q Qualitative Data Defined, 21 Dummy variables, 654–657 Frequency distribution, 36 Qualitative forecasting, 710 Quality Basic tools, 806–807 Brainstorming, 807 Control charts, 807 Deming, 805 Fishbone diagram, 807 Flowcharts, 807 Juran, 805, 806 Scatter plots, 807 SPC, 807, 808–827 Total quality management, 805 Trend charts, 807 Quantitative Data Defined, 21 Frequency distribution, 35 Quantitative forecasting, 710 Quartiles Defined, 99 Issues with Excel, 100 Questions Closed-end, 9 Demographic, 9 Leading, 10 Open-end, 10–11 Poorly worded, 10–11. R Random component Defined, 713.

Randomized complete block ANOVA, 497–505 Assumptions, 498 Excel example, 499–500, 501 Fisher’s Least Significant Difference test, 505–506, 522 Minitab, 499–500, 501 Partitioning sums of squares, 499, 522 Sum of squares blocking, 499, 522 Sum of squares within, 499, 522 Table, 500 Type II error, 502 Random sample, 16–17 Excel example, 17 Random variable Continuous, 192 Defined, 192 Discrete, 192 Range Defined, 107 Equation, 107 Interquartile, 109 Ratio data, 22–23 Ratio sampling, 15 Ratio-to-moving-average method, 739–740 Equation, 741, 763 Regression analysis Aptness, 689–697 Assumptions, 590 Coefficient of determination, 602, 625, 642–643, 701 Confidence interval estimate, 614, 626, 649–651, 701 Descriptive purposes, 612–615 Dummy variables, 654–657, 671 Equations, 625 Excel examples, 595–598, 599, 600, 613, 656, 658 Exponential relationship, 661 Hyperplane, 635 Least squares criterion, 592 Least squares equations, 594, 625 Least squares regression properties, 596–599 Minitab examples, 598, 599, 601, 613, 657, 658 Multicollinearity, 647–649 Multiple regression, 634–686 Nonlinear relationships, 661–667 Partial F-test, 671–674, 701 Polynomial, 662–663 Prediction, 615–618, 625 Problems using, 618–620 Residual, 592, 597, 625, 727 R-squared, 602, 625, 642–643 Sample model, 592 Significance tests, 599–609 Simple linear model, 590, 625 Slope coefficients, 591 Standard error, 646, 701 Stepwise, 678–686 Summary steps, 608 Sum of squares error, 593 Sum of squares regression, 602 Test statistic for the slope, 607 Total sum of squares, 600, 625 Regression slope coefficient Defined, 591 Excel example, 605 Intercept, 591 Interval estimate, 614, 626, 649–651 Minitab example, 606 Significance, 604–605, 645–646, 701 Slope, 591 Standard error, 604 Relative frequency, 33–35 Distributions, 36–41 Equation, 34 Histogram, 45–46.
Relative frequency assessment Defined, 153 Equation, 153 Issues, 155 Research hypothesis Defined, 348 Residual Assumptions, 689 Checking for linearity, 690 Corrective actions, 697 Defined, 592, 689 Equal variances, 692–693 Equation, 689, 701 Excel examples, 690–691 Forecasting error, 727 Independence, 693 Minitab examples, 690–691, 694–696 Normality, 693, 695 Plots, 691–694 Standardized residual, 695–697, 701 Sum of squared residuals, 597, 625 Review Sections Chapters 1–3, 139–142 Chapters 8–12, 530–543. S Sample Defined, 14 Mean, 89–90, 265, 266 Proportion, 290 Size, 324–327 Sample size requirements Equation, 325 Estimating sample mean, 324–327 Estimating sample proportion, 340 Pilot sample, 326–327 Sample space, 147–148 Tree Diagrams, 148–149 Sampling distribution of a proportion Mean, 292 Standard error, 292 Summary steps, 294 Theorem 5, 292 Sampling distribution of the mean, 273–282 Central limit theorem, 282–286 Defined, 273 Excel example, 274–275 Minitab example, 275 Normal populations, 277–280 Proportions, 289–294 Steps, 285 Theorem 1, 276 Theorem 2, 276 Theorem 3, 278–279 Sampling error Computing, 267 Defined, 265, 306 Equation, 265 Role of sample size, 268–269 Sampling techniques, 15–19, 27 Nonstatistical, 15 Statistical, 16 Scatter diagram/plot Defined, 70, 580 Dependent variable, 70, 580 Examples, 580, 581, 640 Excel example, 71 Independent variables, 70, 580 Minitab example, 71 Multiple regression, 640 Quality, 807 Summary steps, 71 Seasonal component Defined, 712 Seasonal index Adjustment process steps, 744 Computing, 739–740. 
Defined, 739 Deseasonalization, 743, 763 Dummy variables, 744–746 Excel example, 740–742 Minitab example, 744 Multiplicative model, 739 Normalize, 741–742 Ratio-to-moving-average, 739–740 Selection bias, 12 Significance level Defined, 352 Significance tests, 599–609 Simple index number Equation, 714, 716 Simple linear regression Assumptions, 590 Defined, 590 Equations, 625 Least squares criterion, 592 Summary steps, 608 Simple random sample, 16–17 Defined, 266 Skewed data Defined, 92 Left-skewed, 93 Right-skewed, 93 Standard deviation, 112–115 Binomial distribution, 207–208 c-charts, 825, 833 Defined, 109 Discrete random variable, 194–195 Excel example, 114–115, 121 Minitab example, 114–115, 121 Poisson distribution, 217 Population standard deviation equation, 111 Population variance equation, 110 Regression model, 646–647 Sample equation, 112 Summary steps, 111 Uniform distribution, 251 U-statistics, 778, 798 Wilcoxon, 784 Standard error Defined, 308 Difference between two means, 398, 441 Proportion, 331 Sampling distribution of a proportion, 292 Statistical process control, 822, 833 Standard error of regression slope Equation, 604, 605 Graphed, 606 Standard error of the estimate Equation, 604 Multiple regression equation, 646 Standardized data values, 122 Population equation, 122 Sample equation, 122 Summary steps, 123 Standardized residuals Equation, 695, 701 Standard normal distribution, 235–245 Table, 237–239 States of nature, 350 Statistical inference procedures, 5 Statistical inference tools Nonstatistical sampling, 15 Statistical sampling techniques, 16 Statistical process control Average subgroup means, 812, 832 Average subgroup range, 812, 832 c-charts, 824–827 Control limits, 810, 814–815, 818, 823, 825, 827, 832 Excel examples, 812, 820–821 Mean subgroup proportion, 822, 832 Minitab example, 812, 820–821 p-charts, 820–823 R-charts, 811–820.

Signals, 818 Stability, 810 Standard deviation, 825 Standard error, 822, 833 Summary steps, 827 Variation, 807, 808–810 x-bar charts, 811–820 Statistical sampling Cluster sampling, 18–19 Simple random sampling, 16–17 Stratified random sampling, 17–18 Systematic random sampling, 18 Statistics, 15 Defined, 86 Stem and leaf diagrams, 62–63 Summary steps, 62 Stepwise regression, 678–686 Backward elimination, 679–683 Best subsets, 683–686 Forward selection, 678 Standard, 683 Stratified random sampling, 17–18 Subjective probability assessment Defined, 155 Sum of squares between Equation, 478, 482, 522 Sum of squares blocking Equation, 499, 522 Sum of squares error Equation, 596, 601, 625 Interaction, 672–674 Sum of squares regression Equation, 602 Sum of squares within Equation, 482, 499, 522 Symmetric Data, 92–93 Systematic random sampling, 18. T Tchebysheff’s Theorem, 121–122 t-distribution assumptions, 315, 412 defined, 314 degrees of freedom, 314–315 equation, 315 table, 316 two means, 400–406 unequal variances, 404 Telephone surveys, 7–9 Test statistic Correlation coefficient, 584, 625 Defined, 354 R-squared, 602 t-test, 361–362, 412 z-test, 354, 409–410 Time-series data Components, 711–714 Defined, 21 Deseasonalization, 743, 763 Index numbers, 714–721.
Laspeyres index, 718–719, 763 Linear trend, 711 Nonlinear trend, 712 Paasche index, 717–718, 763 Random component, 713 Seasonal component, 712–713 Trend, 711–712 Total quality management Defined, 805 Total sum of squares Equation, 481, 600, 625 Total variation Defined, 477 Tree diagrams, 148–149, 170–171 Trend Defined, 711 Excel example, 724, 726, 727–728 Forecasting technique, 724–746 Linear, 711, 725, 763 Minitab example, 724, 726 Nonlinear, 712 Quality chart, 807 t-test statistic assumption, 361 correlation coefficient, 584, 638 equation, 361, 388, 412, 427, 442 paired samples, 427, 442 Population variances unknown and not assumed equal, 419, 442 Regression coefficient significance, 645, 701 Tukey-Kramer multiple comparisons, 488–493 Critical range equation, 522 Equation, 488 Two-factor ANOVA, 510–517 Assumptions, 512 Equations, 513, 522 Excel example, 514–516 Interaction, 512, 514–517 Minitab, 516 Partitioning sum of squares, 510–511, 522 Replications, 509–517 Two-tailed hypothesis test Defined, 358 p-value, 359–361 summary steps, 362 Type I error Defined, 350 Type II error Calculating beta, 376–377 Defined, 350. U Unbiased estimator, 276 Uniform probability distribution Density function, 250 Mean, 251 Standard deviation, 251 Unweighted aggregate price index Equation, 716, 763. V Validity External, 13 Internal, 13 Variable Dependent, 70 Independent, 70 Variance Defined, 109 F-test statistic, 459 Population variance equation, 110 Sample equation, 112, 461, 471 Sample shortcut equation, 112 Shortcut equation, 110 Summary steps, 111 Variance inflation factor Equation, 648, 701 Excel example, 648–649 Minitab example, 648–649 Variation, 107, 808 Components, 810 Sources, 808–810. 
W Weighted aggregate price index, 717–719 Laspeyres index, 718–719, 763 Paasche index, 717–718, 763 Weighted Mean Defined, 97 Population equation, 97 Sample equation, 97 Wilcoxon matched-pairs signed rank, 782–785 Assumptions, 782 Large sample test, 784–785 Test statistic, 785, 798 Ties, 784 Wilcoxon signed rank test, 771–774 Equation, 798 Hypotheses, 771 Large sample test statistic, 772 Minitab example, 773 Steps, 771–772 Within-sample variation Defined, 477 Written questionnaires, 7, 9–11 Steps, 9. Z z-scores Finite population correction, 280 Sampling distribution of mean, 280 Sampling distribution of p, 293 Standardized, 236 Standard normal distribution, 237, 245 z-test statistic Defined, 354 Equation, proportion, 370, 388 Equation, sigma known, 354, 388 Equation, two means, 409, 441 Equation, two proportions, 434, 442.

Values of t for Selected Probabilities

(Figure: a t-distribution with df = 10; an area of 0.05 lies in each tail, beyond t = -1.8125 and t = 1.8125.)

Probabilities (or areas under the t-distribution curve):

Conf. Level:  0.1   0.3   0.5   0.7   0.8   0.9   0.95   0.98  0.99
One Tail:     0.45  0.35  0.25  0.15  0.1   0.05  0.025  0.01  0.005
Two Tails:    0.9   0.7   0.5   0.3   0.2   0.1   0.05   0.02  0.01

Values of t. Each line below is one column of the table; its 40 entries run down the d.f. column in the order d.f. = 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 40, 50, 60, 70, 80, 90, 100, 250, 500, ∞.

Conf. level 0.1 (one tail 0.45): 0.1584 0.1421 0.1366 0.1338 0.1322 0.1311 0.1303 0.1297 0.1293 0.1289 0.1286 0.1283 0.1281 0.1280 0.1278 0.1277 0.1276 0.1274 0.1274 0.1273 0.1272 0.1271 0.1271 0.1270 0.1269 0.1269 0.1268 0.1268 0.1268 0.1267 0.1265 0.1263 0.1262 0.1261 0.1261 0.1260 0.1260 0.1258 0.1257 0.1257

Conf. level 0.3 (one tail 0.35): 0.5095 0.4447 0.4242 0.4142 0.4082 0.4043 0.4015 0.3995 0.3979 0.3966 0.3956 0.3947 0.3940 0.3933 0.3928 0.3923 0.3919 0.3915 0.3912 0.3909 0.3906 0.3904 0.3902 0.3900 0.3898 0.3896 0.3894 0.3893 0.3892 0.3890 0.3881 0.3875 0.3872 0.3869 0.3867 0.3866 0.3864 0.3858 0.3855 0.3853

Conf. level 0.5 (one tail 0.25): 1.0000 0.8165 0.7649 0.7407 0.7267 0.7176 0.7111 0.7064 0.7027 0.6998 0.6974 0.6955 0.6938 0.6924 0.6912 0.6901 0.6892 0.6884 0.6876 0.6870 0.6864 0.6858 0.6853 0.6848 0.6844 0.6840 0.6837 0.6834 0.6830 0.6828 0.6807 0.6794 0.6786 0.6780 0.6776 0.6772 0.6770 0.6755 0.6750 0.6745

Conf. level 0.7 (one tail 0.15): 1.9626 1.3862 1.2498 1.1896 1.1558 1.1342 1.1192 1.1081 1.0997 1.0931 1.0877 1.0832 1.0795 1.0763 1.0735 1.0711 1.0690 1.0672 1.0655 1.0640 1.0627 1.0614 1.0603 1.0593 1.0584 1.0575 1.0567 1.0560 1.0553 1.0547 1.0500 1.0473 1.0455 1.0442 1.0432 1.0424 1.0418 1.0386 1.0375 1.0364

Conf. level 0.8 (one tail 0.1): 3.0777 1.8856 1.6377 1.5332 1.4759 1.4398 1.4149 1.3968 1.3830 1.3722 1.3634 1.3562 1.3502 1.3450 1.3406 1.3368 1.3334 1.3304 1.3277 1.3253 1.3232 1.3212 1.3195 1.3178 1.3163 1.3150 1.3137 1.3125 1.3114 1.3104 1.3031 1.2987 1.2958 1.2938 1.2922 1.2910 1.2901 1.2849 1.2832 1.2816

Conf. level 0.9 (one tail 0.05): 6.3137 2.9200 2.3534 2.1318 2.0150 1.9432 1.8946 1.8595 1.8331 1.8125 1.7959 1.7823 1.7709 1.7613 1.7531 1.7459 1.7396 1.7341 1.7291 1.7247 1.7207 1.7171 1.7139 1.7109 1.7081 1.7056 1.7033 1.7011 1.6991 1.6973 1.6839 1.6759 1.6706 1.6669 1.6641 1.6620 1.6602 1.6510 1.6479 1.6449

Conf. level 0.95 (one tail 0.025): 12.7062 4.3027 3.1824 2.7765 2.5706 2.4469 2.3646 2.3060 2.2622 2.2281 2.2010 2.1788 2.1604 2.1448 2.1315 2.1199 2.1098 2.1009 2.0930 2.0860 2.0796 2.0739 2.0687 2.0639 2.0595 2.0555 2.0518 2.0484 2.0452 2.0423 2.0211 2.0086 2.0003 1.9944 1.9901 1.9867 1.9840 1.9695 1.9647 1.9600

Conf. level 0.98 (one tail 0.01): 31.8210 6.9645 4.5407 3.7469 3.3649 3.1427 2.9979 2.8965 2.8214 2.7638 2.7181 2.6810 2.6503 2.6245 2.6025 2.5835 2.5669 2.5524 2.5395 2.5280 2.5176 2.5083 2.4999 2.4922 2.4851 2.4786 2.4727 2.4671 2.4620 2.4573 2.4233 2.4033 2.3901 2.3808 2.3739 2.3685 2.3642 2.3414 2.3338 2.3263

Conf. level 0.99 (one tail 0.005): 63.6559 9.9250 5.8408 4.6041 4.0321 3.7074 3.4995 3.3554 3.2498 3.1693 3.1058 3.0545 3.0123 2.9768 2.9467 2.9208 2.8982 2.8784 2.8609 2.8453 2.8314 2.8188 2.8073 2.7970 2.7874 2.7787 2.7707 2.7633 2.7564 2.7500 2.7045 2.6778 2.6603 2.6479 2.6387 2.6316 2.6259 2.5956 2.5857 2.5758
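Any entry in the t-table can be regenerated with software rather than looked up. The sketch below uses SciPy's t distribution; this is an illustration we have added (the book itself relies on Excel and Minitab), assuming SciPy is available.

```python
from scipy import stats

# The highlighted example: df = 10 with an upper-tail area of 0.05.
t_upper = stats.t.ppf(1 - 0.05, df=10)
print(round(t_upper, 4))   # 1.8125, the df = 10, one-tail 0.05 entry

# 95% confidence (0.025 per tail) with df = 20:
print(round(stats.t.ppf(1 - 0.025, df=20), 4))   # 2.086
```

`ppf` is the inverse cumulative distribution function, so passing `1 - (tail area)` returns the critical value tabulated above.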

Standard Normal Distribution Table

(Figure: standard normal curve; the shaded area between 0 and z = 1.25 is 0.3944.)

Each entry gives the area under the standard normal curve between 0 and z, where z is the row value (first decimal) plus the column value (second decimal). Each line below is one column of the table; its 31 entries run down the z column from z = 0.0 to z = 3.0 in steps of 0.1.

Column 0.00: 0.0000 0.0398 0.0793 0.1179 0.1554 0.1915 0.2257 0.2580 0.2881 0.3159 0.3413 0.3643 0.3849 0.4032 0.4192 0.4332 0.4452 0.4554 0.4641 0.4713 0.4772 0.4821 0.4861 0.4893 0.4918 0.4938 0.4953 0.4965 0.4974 0.4981 0.4987

Column 0.01: 0.0040 0.0438 0.0832 0.1217 0.1591 0.1950 0.2291 0.2611 0.2910 0.3186 0.3438 0.3665 0.3869 0.4049 0.4207 0.4345 0.4463 0.4564 0.4649 0.4719 0.4778 0.4826 0.4864 0.4896 0.4920 0.4940 0.4955 0.4966 0.4975 0.4982 0.4987

Column 0.02: 0.0080 0.0478 0.0871 0.1255 0.1628 0.1985 0.2324 0.2642 0.2939 0.3212 0.3461 0.3686 0.3888 0.4066 0.4222 0.4357 0.4474 0.4573 0.4656 0.4726 0.4783 0.4830 0.4868 0.4898 0.4922 0.4941 0.4956 0.4967 0.4976 0.4982 0.4987

Column 0.03: 0.0120 0.0517 0.0910 0.1293 0.1664 0.2019 0.2357 0.2673 0.2967 0.3238 0.3485 0.3708 0.3907 0.4082 0.4236 0.4370 0.4484 0.4582 0.4664 0.4732 0.4788 0.4834 0.4871 0.4901 0.4925 0.4943 0.4957 0.4968 0.4977 0.4983 0.4988

Column 0.04: 0.0160 0.0557 0.0948 0.1331 0.1700 0.2054 0.2389 0.2704 0.2995 0.3264 0.3508 0.3729 0.3925 0.4099 0.4251 0.4382 0.4495 0.4591 0.4671 0.4738 0.4793 0.4838 0.4875 0.4904 0.4927 0.4945 0.4959 0.4969 0.4977 0.4984 0.4988

Column 0.05: 0.0199 0.0596 0.0987 0.1368 0.1736 0.2088 0.2422 0.2734 0.3023 0.3289 0.3531 0.3749 0.3944 0.4115 0.4265 0.4394 0.4505 0.4599 0.4678 0.4744 0.4798 0.4842 0.4878 0.4906 0.4929 0.4946 0.4960 0.4970 0.4978 0.4984 0.4989

Column 0.06: 0.0239 0.0636 0.1026 0.1406 0.1772 0.2123 0.2454 0.2764 0.3051 0.3315 0.3554 0.3770 0.3962 0.4131 0.4279 0.4406 0.4515 0.4608 0.4686 0.4750 0.4803 0.4846 0.4881 0.4909 0.4931 0.4948 0.4961 0.4971 0.4979 0.4985 0.4989

Column 0.07: 0.0279 0.0675 0.1064 0.1443 0.1808 0.2157 0.2486 0.2794 0.3078 0.3340 0.3577 0.3790 0.3980 0.4147 0.4292 0.4418 0.4525 0.4616 0.4693 0.4756 0.4808 0.4850 0.4884 0.4911 0.4932 0.4949 0.4962 0.4972 0.4979 0.4985 0.4989

Column 0.08: 0.0319 0.0714 0.1103 0.1480 0.1844 0.2190 0.2517 0.2823 0.3106 0.3365 0.3599 0.3810 0.3997 0.4162 0.4306 0.4429 0.4535 0.4625 0.4699 0.4761 0.4812 0.4854 0.4887 0.4913 0.4934 0.4951 0.4963 0.4973 0.4980 0.4986 0.4990

Column 0.09: 0.0359 0.0753 0.1141 0.1517 0.1879 0.2224 0.2549 0.2852 0.3133 0.3389 0.3621 0.3830 0.4015 0.4177 0.4319 0.4441 0.4545 0.4633 0.4706 0.4767 0.4817 0.4857 0.4890 0.4916 0.4936 0.4952 0.4964 0.4974 0.4981 0.4986 0.4990
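The body of this table is simply Φ(z) − 0.5, the cumulative normal probability minus the lower half, so any entry can be reproduced with the standard library's error function. The sketch below is our own addition; the function name is invented for the example.

```python
import math

def area_0_to_z(z):
    """Area under the standard normal curve between 0 and z,
    the quantity tabulated above: Phi(z) - 0.5 = erf(z / sqrt(2)) / 2."""
    return 0.5 * math.erf(z / math.sqrt(2.0))

print(round(area_0_to_z(1.25), 4))   # 0.3944, the highlighted entry
print(round(area_0_to_z(1.96), 4))   # 0.475, the familiar 95% two-tail value
```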
