Data Set Exercise – Each Data Set Exercise is worth 20 points. Use the data sets found in Appendix A in the back of your book to complete the following exercises.
1. Refer to the Baseball 2012 data, which reports information on the 2012 Major League Baseball season. Let attendance be the dependent variable and total team salary, in millions of dollars, be the independent variable. Determine the regression equation and answer the following questions.
a. Draw a scatter diagram. From the diagram, does there seem to be a direct relationship between the two variables?
b. What is the expected attendance for a team with a salary of $80.0 million?
c. If the owners pay an additional $30 million, how many more people could they expect to attend?
d. At the .05 significance level, can we conclude that the slope of the regression line is positive? Conduct the appropriate test of hypothesis.
e. What percentage of the variation in attendance is accounted for by salary?
f. Determine the correlation between attendance and team batting average and between attendance and team ERA. Which is stronger? Conduct an appropriate test of hypothesis for each set of variables.
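For readers who want to check their software output by hand, the calculations behind parts (b) through (e) can be sketched in plain Python. This is only an illustrative sketch: the function name `simple_regression`, the variable names, and all of the numbers below are made up for the example and are not taken from the Baseball 2012 data set. The attendance unit (millions of fans) is also an assumption.

```python
import math

def simple_regression(x, y):
    """Least-squares fit of y = a + b*x, plus r^2 and the t statistic for the slope."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sst = sum((yi - mean_y) ** 2 for yi in y)           # total variation in y
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    r_squared = 1 - sse / sst                           # part (e)
    std_error = math.sqrt(sse / (n - 2))                # standard error of estimate
    t_slope = slope / (std_error / math.sqrt(sxx))      # part (d), n - 2 degrees of freedom
    return {
        "slope": slope,
        "intercept": intercept,
        "r_squared": r_squared,
        "t_slope": t_slope,
        "df": n - 2,
    }

# Made-up placeholder values, NOT the Baseball 2012 data.
salary = [60.0, 80.0, 100.0, 120.0, 140.0]   # team salary, $ millions (hypothetical)
attendance = [2.0, 2.4, 2.5, 2.4, 2.5]       # attendance, millions (hypothetical)

fit = simple_regression(salary, attendance)
expected_at_80 = fit["intercept"] + fit["slope"] * 80.0   # part (b)
extra_for_30 = fit["slope"] * 30.0                        # part (c)
print(fit)
print(expected_at_80, extra_for_30)
```

To finish part (d), compare `t_slope` with the one-tailed critical value from the t table at the .05 level with `df` degrees of freedom. The same t statistic form, r·sqrt(n − 2)/sqrt(1 − r²), applies to the correlation tests in part (f).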
2. Refer to the Baseball 2012 data, which report information on the 30 Major League Baseball teams for the 2012 season. Let the number of games won be the dependent variable and the following variables be independent variables: team batting average, number of stolen bases, number of errors committed, team ERA, number of home runs, and whether the team plays in the American or the National League.
a. Use a statistical software package to determine the multiple regression equation. Discuss each of the variables. For example, are you surprised that the regression coefficient for ERA is negative? Is the number of wins affected by whether the team plays in the National or the American League?
b. Find the coefficient of determination for this set of independent variables.
c. Develop a correlation matrix. Which independent variables have strong or weak correlations with the dependent variable? Do you see any problems with multicollinearity?
d. Conduct a global test on the set of independent variables. Interpret.
e. Conduct a test of hypothesis on each of the independent variables. Would you consider deleting any of the variables? If so, which ones?
f. Rerun the analysis until only significant regression coefficients remain in the analysis. Identify these variables.
g. Develop a histogram or a stem-and-leaf display of the residuals from the final regression equation developed in part (f). Is it reasonable to conclude that the normality assumption has been met?
h. Plot the residuals against the fitted values from the final regression equation developed in part (f). Plot the residuals on the vertical axis and the fitted values on the horizontal axis.
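The quantities a statistics package reports for parts (a), (b), (d), (g), and (h) can be reproduced with a short sketch like the one below, which solves the normal equations directly. It is not how MegaStat or any other package is implemented. The function names, variable names, and data values are hypothetical placeholders, with only two of the six independent variables shown to keep the example small.

```python
def solve_linear_system(a, b):
    """Solve a * coef = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]

def multiple_regression(columns, y):
    """Fit y = b0 + b1*x1 + ... + bk*xk using the normal equations (X'X) b = X'y."""
    n, k = len(y), len(columns)
    rows = [[1.0] + [col[i] for col in columns] for i in range(n)]  # leading 1 = intercept
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k + 1)] for i in range(k + 1)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k + 1)]
    coef = solve_linear_system(xtx, xty)
    fitted = [sum(c * v for c, v in zip(coef, r)) for r in rows]
    residuals = [yi - fi for yi, fi in zip(y, fitted)]
    mean_y = sum(y) / n
    sst = sum((yi - mean_y) ** 2 for yi in y)
    sse = sum(e ** 2 for e in residuals)
    return {
        "coef": coef,                                     # [b0, b1, ..., bk]
        "r_squared": 1 - sse / sst,                       # part (b)
        "f_stat": ((sst - sse) / k) / (sse / (n - k - 1)),  # part (d), df = (k, n - k - 1)
        "fitted": fitted,                                 # horizontal axis in part (h)
        "residuals": residuals,                           # vertical axis in part (h)
    }

# Made-up placeholder values, NOT the Baseball 2012 data.
era = [3.4, 3.8, 4.1, 4.5, 3.6, 4.9, 4.0, 3.3]
home_runs = [180, 150, 165, 140, 200, 130, 170, 190]
wins = [94, 85, 81, 72, 93, 61, 83, 97]

result = multiple_regression([era, home_runs], wins)
print(result["coef"], result["r_squared"], result["f_stat"])
```

The `residuals` list is what would go into the histogram or stem-and-leaf display, and pairing it with `fitted` gives the points for the residual plot. A qualitative variable such as league would enter as one more column coded 0 or 1.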
3. Refer to the Buena School District bus data. First, add a variable to change the type of bus (diesel or gasoline) to a qualitative variable. If the bus type is diesel, then set the qualitative variable to 0. If the bus type is gasoline, then set the qualitative variable to 1. Develop a regression equation using statistical software with maintenance as the dependent variable and age, miles, and bus type as the independent variables.
a. Write out the multiple regression equation. Discuss each of the variables.
b. Determine the value of R². Interpret.
c. Develop a correlation matrix. Which independent variables have strong or weak correlations with the dependent variable? Do you see any problems with multicollinearity?
d. Conduct the global test on the set of independent variables. Interpret.
e. Conduct a test of hypothesis on each of the independent variables. Would you consider deleting any of the variables? If so, which ones?
f. Rerun the analysis until only significant regression coefficients remain in the analysis. Identify these variables.
g. Develop a histogram or a stem-and-leaf display of the residuals from the final regression equation developed in part (f). Is it reasonable to conclude that the normality assumption has been met?
h. Plot the residuals against the fitted values of Y from the final regression equation developed in part (f). Plot the residuals on the vertical axis and the fitted values on the horizontal axis.
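The recoding step described at the start of this exercise, and the correlation matrix asked for in part (c), can be sketched as follows. The labels "Diesel" and "Gasoline", the helper names, and every data value here are assumptions made for illustration. The actual Buena School District file may label or code the bus type differently, so adjust the mapping to match it.

```python
import math

def encode_bus_type(bus_types):
    """Map diesel -> 0 and gasoline -> 1, as the exercise specifies."""
    codes = {"diesel": 0, "gasoline": 1}
    return [codes[t.strip().lower()] for t in bus_types]

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

def correlation_matrix(variables):
    """variables: dict of name -> list of values. Returns a nested dict of r values."""
    names = list(variables)
    return {a: {b: pearson_r(variables[a], variables[b]) for b in names} for a in names}

# Made-up placeholder values, NOT the Buena School District data.
bus_type = ["Diesel", "Gasoline", "Gasoline", "Diesel", "Diesel", "Gasoline"]
variables = {
    "maintenance": [329, 503, 505, 466, 359, 546],
    "age": [5, 10, 9, 8, 6, 11],
    "miles": [853, 883, 822, 865, 751, 870],
    "bus_type": encode_bus_type(bus_type),
}

matrix = correlation_matrix(variables)
for name, row in matrix.items():
    print(name, [round(r, 3) for r in row.values()])
```

Reading down the "maintenance" row shows how strongly each independent variable relates to the dependent variable. Large correlations between two independent variables are the usual first sign of possible multicollinearity.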
The recommended software for these exercises is MegaStat for Microsoft Excel. It was developed by J.B. Orris and is a full-featured Excel add-in that is available at www.mhhe.com/megastat. It works with Excel 2003, 2007, and 2010. After you access the website, you have 10 days to successfully download MegaStat to your local computer. Once installed, MegaStat will remain active in Excel with no expiration date or time limitations. The software performs statistical analysis within an Excel workbook. It handles basic functions, such as descriptive statistics, frequency distributions, and probability calculations, as well as hypothesis testing, ANOVA, and regression. Screencam tutorials are included that provide a walkthrough of major business statistics topics. Help files are built in, and an introductory user’s manual is also included.
However, if you have access to Minitab, SPSS, or JMP, you can use these software tools to solve the business statistics exercises in the text.
Submit the completed Assignment to your Faculty Mentor. Make sure to label the files correctly.
