Research Paper on Multiple Regression Analysis

Introduction

A multiple regression model contains more than one independent variable. It explains the relationship between one response variable, which may be continuous, and two or more explanatory variables (Rencher and Schaalje 3). Many applications of regression analysis involve situations in which there is more than one predictor variable. In this case, the three independent variables in use are Eliminations, Deaths, and OffAssists, and the dependent variable is AvgTeamLvl; all are quantitative with 60 data points. This model can be described as

Yi = v0 + v1X1 + v2X2 + v3X3 + εi

Where Yi is the dependent variable and represents AvgTeamLvl. X1, X2, and X3 are the independent variables representing Eliminations, Deaths, and OffAssists respectively. εi is the random error term, assumed to be independently and identically normally distributed. The parameters v0, v1, v2, and v3 are termed regression coefficients, with v0 the constant and intercept of the model. Such a model can be used for various purposes, such as prediction of trends and future values, data description, parameter estimation, variable selection, and control of output.
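
As a concrete illustration, a model of this form could be fit by ordinary least squares in Python, for instance with the statsmodels library. This is only a sketch, not the paper's own computation; the data frame below is hypothetical stand-in data that borrows the paper's variable names.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)

# Hypothetical stand-in for the paper's 60 observations;
# replace with the actual data set.
df = pd.DataFrame({
    "Eliminations": rng.normal(20, 5, 60),
    "Deaths": rng.normal(10, 3, 60),
    "OffAssists": rng.normal(15, 4, 60),
})
df["AvgTeamLvl"] = (28.8 + 2.39 * df["Eliminations"]
                    + 4.94 * df["Deaths"] + 0.16 * df["OffAssists"]
                    + rng.normal(0, 51, 60))

# Fit Yi = v0 + v1*X1 + v2*X2 + v3*X3 + ei by ordinary least squares.
fit = smf.ols("AvgTeamLvl ~ Eliminations + Deaths + OffAssists", data=df).fit()
print(fit.summary())
```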

According to the analysis as depicted in Table 1, the estimated regression coefficients are 28.8166 (the intercept), 2.3867, 4.9396, and 0.1628; each slope represents the mean change in the response variable per unit change in its predictor. Hence the model is

Yi = 28.8166 + 2.3867X1 + 4.9396X2 + 0.1628X3 + εi, where v0 = 28.8166 (the constant), v1 = 2.3867, v2 = 4.9396, and v3 = 0.1628.

The P-values of the coefficients from Table 1 are all greater than the significance level, indicating that the predictor variables Eliminations, Deaths, and OffAssists are not individually statistically significant.
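
Continuing the sketch above, the coefficient estimates and their P-values can be read directly off the fitted model:

```python
# Continuing from the fit above: estimated coefficients and their P-values.
print(fit.params)    # v0, v1, v2, v3 (Intercept first)
print(fit.pvalues)   # compare each against the 0.05 significance level

insignificant = fit.pvalues[fit.pvalues > 0.05]
print("Not significant at 5%:", list(insignificant.index))
```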

Assumptions of This Model

The model rests on the following assumptions:

- The error terms are independently and identically distributed random variables; that is, the covariance between any two error terms is zero.
- For small-sample properties to hold, the residuals are normally distributed with mean zero and variance sigma squared.
- The mean of the error terms tends to zero, an assumption best satisfied when the sample size is large.
- The error variance is constant (homoscedastic), requiring that all Yi come from distributions with the same variance.
- The independent variables are non-random and independent of the error term, since the error term does not follow a systematic pattern.
- A linear relationship holds between the independent variables and the dependent variable.
- The residuals are not autocorrelated, and the independent variables are not too highly correlated with one another.
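
Several of these assumptions can be checked informally from the residuals of the fit sketched earlier. The specific tests below (Shapiro-Wilk, Durbin-Watson, variance inflation factors) are common choices, not ones named in the paper.

```python
from scipy import stats
from statsmodels.stats.stattools import durbin_watson
from statsmodels.stats.outliers_influence import variance_inflation_factor

resid = fit.resid

# Normality of the error terms (Shapiro-Wilk test).
print("Shapiro-Wilk p-value:", stats.shapiro(resid).pvalue)

# Independence / no autocorrelation (Durbin-Watson near 2 is good).
print("Durbin-Watson:", durbin_watson(resid))

# Multicollinearity: variance inflation factors for each predictor.
exog = fit.model.exog  # includes the intercept column
for i, name in enumerate(fit.model.exog_names):
    print(name, variance_inflation_factor(exog, i))
```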

Goodness of the Fit

The goodness of fit measures how well the observed data correspond to the fitted model. Both R squared and adjusted R squared are useful measures of goodness of fit. According to Montgomery, Peck, and Vining, R squared, also termed the coefficient of multiple determination, measures how close the data lie to the fitted regression line. It gives the percentage of the variance in the response variable that the model explains and ranges between 0 and 100 percent. The higher the R squared, the better the model is considered to fit the data. In our case, R squared is 0.1654 (as depicted in Table 2), which means that the fitted model explains 16.54% of the variability in the data.

For this reason, the model can be said to be inadequate. R squared can be biased upward as predictors are added, so adjusted R squared modifies it based on the degrees of freedom. Adjusted R squared from Table 2 is 0.1207; it adjusts for the number of predictors relative to the number of observations. Its value increases when a new term improves the fitted model and decreases when it does not. The correlation between Y and Y-hat is 0.4067 and indicates how strong the linear relationship is.
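
Continuing the earlier sketch, these quantities are available directly from the fitted model:

```python
import numpy as np

print("R-squared:", fit.rsquared)              # 0.1654 in the paper
print("Adjusted R-squared:", fit.rsquared_adj)  # 0.1207 in the paper

# Correlation between observed Y and fitted Y-hat; for OLS with an
# intercept this equals sqrt(R-squared), 0.4067 in the paper.
corr = np.corrcoef(df["AvgTeamLvl"], fit.fittedvalues)[0, 1]
print("corr(Y, Y-hat):", corr)
```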

The standard error from the regression statistics table is 51.1021 and refers to the estimated standard deviation of the residuals. It provides an absolute measure of how far, on average, the data points lie from the regression line, and it expresses the precision of the model's predictions in the units of the response variable. Lower values of the standard error signify that the distance between the fitted values and the data points is small. In our case, the standard error of 51.1021 indicates that this distance is large. The standard errors of the regression coefficients measure the uncertainty associated with the coefficients and give the standard deviation of the least squares estimates. They help in building confidence intervals around the regression coefficients and in evaluating how statistically significant each independent variable is within the model.
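
The regression standard error and the coefficient standard errors can likewise be pulled from the fit sketched earlier:

```python
import numpy as np

# Residual standard error (standard error of the regression).
print("Residual std. error:", np.sqrt(fit.mse_resid))  # 51.1021 in the paper

print(fit.bse)             # standard errors of v0..v3
print(fit.conf_int(0.05))  # 95% confidence intervals around each coefficient
```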

The P-value associated with the F-test from Table 3 is 0.0168. Comparing this with the 0.05 significance level, the P-value is less than 0.05. The overall F statistic is 3.6996. We therefore reject the null hypothesis that all regression coefficients are zero; the sample data indicate that the regression model fits the data better than a model with no explanatory variables.
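
The overall F-test is also exposed by the fitted model from the earlier sketch:

```python
# Overall F-test of H0: v1 = v2 = v3 = 0.
print("F statistic:", fit.fvalue)  # 3.6996 in the paper
print("P-value:", fit.f_pvalue)    # 0.0168 < 0.05, so reject H0
```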

Model Refinement

Adding interaction terms to the model expands understanding of the relationship between the variables in the model and allows more hypotheses to be tested. It also changes the interpretation of all the coefficients. We could write the model as Yi = v0 + v1X1 + v2X2 + v12X1X2 + v123X1X2X3 + εi. The model becomes better than the initial model when interaction terms are added, since R squared rises to 0.2152, indicating that 21.52% of the variability in the data is explained.
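
In formula notation, the interaction model could be refit as below, continuing the earlier sketch; the interaction terms mirror those written above, which are themselves a reconstruction of the paper's notation.

```python
# Refit with interaction terms; in formula syntax a:b denotes an
# interaction between a and b.
fit_int = smf.ols(
    "AvgTeamLvl ~ Eliminations + Deaths"
    " + Eliminations:Deaths"
    " + Eliminations:Deaths:OffAssists",
    data=df,
).fit()
print("R-squared with interactions:", fit_int.rsquared)  # 0.2152 in the paper
```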

When there is a large number of potential predictor variables that can be used to model the dependent variable Y, the usual approach is to use the fewest independent variables possible (Seber and Lee). Any variable whose P-value is greater than the significance level can be eliminated using stepwise regression or backward elimination. Eliminating X3 from the model and refitting with X1 and X2, as depicted in Table 4, the model becomes

Yi = 28.9784 + 2.5420X1 + 4.9332X2.

The P-value for X1 becomes significant because it is less than 0.05, while X2 can also be eliminated since its P-value is greater than the significance level. When X2 is eliminated, the model becomes

Yi = 39.2715 + 3.2175X1, as shown in Table 5, with the P-value of X1 less than the significance level. The remaining independent variable is statistically significant in the model. Stepwise regression thus does not produce the same model as the one we started with.
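
A minimal backward-elimination sketch, continuing from the earlier hypothetical data: it drops the predictor with the largest P-value until every remaining P-value is below 0.05.

```python
import statsmodels.formula.api as smf

predictors = ["Eliminations", "Deaths", "OffAssists"]
while True:
    formula = "AvgTeamLvl ~ " + " + ".join(predictors)
    f = smf.ols(formula, data=df).fit()
    pvals = f.pvalues.drop("Intercept")
    worst = pvals.idxmax()
    if pvals[worst] <= 0.05 or len(predictors) == 1:
        break
    # In the paper this removes OffAssists first, then Deaths.
    predictors.remove(worst)

print(f.summary())
```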

Confidence and Prediction Intervals

The confidence interval is obtained from the statistics of the observed data and shows how precisely the mean response is determined. From Table 6, the confidence interval is 78.0528 ≤ Ŷ ≤ 104.4846. We are 95% certain that the mean response lies between 78.0528 and 104.4846.

A prediction interval is an estimate of an interval in which a future observation will fall with a certain probability, given the observations already made (De Brabanter et al.). It forecasts the value and the nature of new observations based on the current model. A prediction interval is always wider than the corresponding confidence interval. The prediction interval for Y at the given values of the X's from Table 6 is

-11.1011 ≤ Ŷ ≤ 193.6385,

where -11.1011 is the lower bound and 193.6385 is the upper bound. Based on this interval, we estimate with 95% probability that a future observation will lie between -11.1011 and 193.6385.
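
Both intervals can be computed for a new observation from the fit sketched earlier; the predictor values below are hypothetical.

```python
import pandas as pd

# Confidence and prediction intervals at a hypothetical new observation.
new = pd.DataFrame({"Eliminations": [20], "Deaths": [10], "OffAssists": [15]})
frame = fit.get_prediction(new).summary_frame(alpha=0.05)

print(frame[["mean_ci_lower", "mean_ci_upper"]])  # 95% CI for the mean response
print(frame[["obs_ci_lower", "obs_ci_upper"]])    # wider 95% prediction interval
```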

Conclusion

The multiple regression conducted has enabled analysis of the relationship between the independent variables Eliminations, Deaths, and OffAssists and the dependent variable AvgTeamLvl. The resulting model was inadequate, since it explains only 16.54% of the variability in the data. We therefore refined the model using stepwise regression and eliminated the insignificant variables Deaths and OffAssists, whose P-values exceeded 5%. Eliminating them gives a better model, as shown. Even so, we are not entirely comfortable with the model. To build a better one, we could increase the number of observations and add independent variables that are more strongly related to AvgTeamLvl.

Works Cited

De Brabanter, Kris, et al. "Approximate confidence and prediction intervals for least squares support vector regression." IEEE Transactions on Neural Networks 22.1 (2011): 110-120.

Draper, Norman R., and Harry Smith. Applied regression analysis. Vol. 326. John Wiley & Sons, 2014.

Montgomery, Douglas C., Elizabeth A. Peck, and G. Geoffrey Vining. Introduction to linear regression analysis. Vol. 821. John Wiley & Sons, 2012.

Rencher, Alvin C., and G. Bruce Schaalje. Linear models in statistics. John Wiley & Sons, 2008.

Seber, George AF, and Alan J. Lee. Linear regression analysis. Vol. 329. John Wiley & Sons, 2012.
