Creating Multiple Linear Regression Models for Zappos Company

Paper Type:  Research paper
Pages:  7
Wordcount:  1889 Words
Date:  2022-05-17


The purpose of this project is to demonstrate how to build multiple linear regression models that fit data from Zappos Company and to arrive at a more predictive model for the number of orders of the company's products. The company should be able to know who visits its website and what visitors do there. The objective is to refine the regression model and establish confidence in it. In this study, we analyze the data to construct regression models that estimate how strongly each independent characteristic affects orders of shoes and clothing on the company website.

The independent variables in the model are:

  • The number of visits
  • The number of visits that only viewed one image (bounces)
  • The number of product pages viewed
  • The number of search pages viewed
  • The number of distinct sessions
  • The platform (dummy variable)

The number of orders of shoes and clothing typically depends on many explanatory variables. For example, while economic theory teaches that the quantity of a good demanded depends on the frequency of website product visits, the theory also tells us that the quantity depends on other factors, such as product page visits and the number of search views. Multiple regression analysis allows us to assess such scenarios. Using multiple regression, the characteristics listed above are analyzed to determine which of them have the most significant direct impact on orders of shoes and clothing at Zappos Company.

The purpose of model 1 is to run a regression analysis that includes all the independent variables. The aim is to identify which variables are most likely to affect the dependent variable and to remove those that are less significant. Significance is judged by the T-statistic: the larger its absolute value, the more significant the influence of that variable on the dependent variable. Variables with excessively high P-values will be removed in a stepwise manner. The initial hypothesis is that the number of visits will have the most significant influence on the number of orders of shoes and clothing at the company.

The purpose of model 2 is to repeat the regression analysis with five variables, excluding the one with the highest P-value. The main interest in this model is to find a better fit. The initial hypothesis is that repeating the regression without the high P-value variable will improve the fit of the model.

The purpose of model 3 is to test a regression model with four of the independent X-variables. The variable to drop is again chosen by its P-value, with the aim of creating a more accurate and predictive model. The assumption is that removing variables with high P-values and concentrating on the remaining four independent variables will give a more precise model for predicting the number of orders at Zappos Company.

The purpose of model 4 is to eliminate one more X-variable to find the best fit for the model. The aim is to arrive at a more accurate final model. We hypothesized that all remaining variables would be significant (P < 0.05).
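The model sequence described above amounts to backward elimination by P-value. The following is a minimal sketch of that loop, not the procedure actually run in Minitab or Excel. The `fit` callback is a hypothetical stand-in for whatever tool returns one P-value per predictor, and the 0.05 cutoff is the alpha level used in this study.

```python
from typing import Callable, Dict, List

# Hypothetical helper type: `fit` stands in for any tool (Minitab, Excel,
# a statistics library) that returns one p-value per predictor in the model.
FitFn = Callable[[List[str]], Dict[str, float]]


def backward_eliminate(predictors: List[str], fit: FitFn,
                       alpha: float = 0.05) -> List[str]:
    """Drop the predictor with the largest p-value until all are below alpha."""
    remaining = list(predictors)
    while remaining:
        p_values = fit(remaining)  # refit with the current predictor set
        worst = max(remaining, key=lambda name: p_values[name])
        if p_values[worst] < alpha:
            break  # every remaining predictor is significant
        remaining.remove(worst)  # remove the least significant predictor
    return remaining
```

Each pass refits the model before choosing the next variable to drop, because P-values change once a predictor is removed.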

Model Refinement Procedure

To determine whether the regression variables have plausible explanatory power, we examined each variable that could affect the dependent variable. Separating out the unusual characteristics reinforced the understanding that the variables are independently distributed. From the correlation output, it is evident that the platform and the number of visits have a stronger correlation with the number of orders than the other variables.

Generated Intuitions

Because the number of visits and the type of platform show a strong correlation with orders, it is evident that the number of web visits is a key determinant of the number of orders of the company's products through the online platform. The closest runners-up were the number of distinct sessions, product page views and the number of searches. The correlations are also logically sensible, since the number of web visits is likely to have a large impact on the number of orders, although we would have expected that correlation to be stronger.
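The ranking of predictors by correlation can be sketched as follows. This is an illustrative example only: the function names are hypothetical, and the study's correlation output came from statistical software rather than from this code.

```python
import math
from typing import Dict, List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def rank_by_correlation(columns: Dict[str, List[float]],
                        target: List[float]) -> List[str]:
    """Order predictor names from strongest to weakest |correlation| with target."""
    return sorted(columns,
                  key=lambda name: abs(pearson(columns[name], target)),
                  reverse=True)
```

Ranking by absolute value treats strong negative and strong positive correlations alike, which matches the question of which variables move most closely with orders.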

We summarized the results of the multiple regression models to understand the impact of web visits, distinct visits, search page views and product page views on the overall number of orders of shoes and clothing. The multiple regression models have six independent variables. The reliability and performance of the models were assessed through the standard error of the estimate, S(e), and the R-squared value.

The platform was used as a dummy variable that takes the value 1 for Windows and 0 for MacOSX. A dummy variable indicates the presence or absence of an effect that is expected to shift the outcome. When the platform is MacOSX, the dummy equals zero, so the platform coefficient adds nothing to the predicted number of orders.

The regression model for the variables is defined as:

Orders = v1 + v2(platform) + v3(visits) + v4(bounces) + v5(product page views) + v6(distinct sessions) + v7(search page views)

The fitted model is:

Orders = -972.363 + 262.63(platform) + 0.449(visits) + 0.009(bounces) + 0.07(product page views) - 0.559(distinct sessions) - 0.0108(search page views)

When a visitor uses MacOSX, the dummy switches off because the variable is zero, so the platform term has no effect. When a visitor uses Windows, the dummy equals 1 and the predicted number of orders increases by 262.63, holding the other variables constant. However, the effect of using Windows is not statistically significant at the 5 percent level, since its P-value of 0.096704 is greater than 0.05.
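A short sketch makes the dummy-variable shift concrete. It uses the coefficients of the full model reported above. The function name and the input values in the usage note are hypothetical and chosen only for illustration.

```python
def predict_orders_full(platform: int, visits: float, bounces: float,
                        product_page_views: float, distinct_sessions: float,
                        search_page_views: float) -> float:
    """Predicted orders from the six-variable model (platform: 1 = Windows, 0 = MacOSX)."""
    return (-972.363
            + 262.63 * platform
            + 0.449 * visits
            + 0.009 * bounces
            + 0.07 * product_page_views
            - 0.559 * distinct_sessions
            - 0.0108 * search_page_views)
```

Calling the function twice with identical traffic figures, once with `platform=1` and once with `platform=0`, gives predictions that differ by exactly the platform coefficient, 262.63.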

With all variables included, the model yields observations whose independent values give them large leverage and observations with large standardized residuals. Observations 38, 70, 79 and 111 are unusual: their standardized residuals are greater than two in absolute value, which indicates that they are outliers. To make the model more precise, the P-values of the independent variables are assessed, and the final model is based on the significance of each variable's P-value. Using what we learned in class as well as the data we collected, we conducted a multiple regression analysis to finalize the model, using Minitab and Excel to compute it. The first regression output gave the P-values for all variables, and not all of them were less than the alpha level of 0.05. Under the significance test used here, a variable with a P-value below 0.05 is significant, while a P-value above 0.05 means the variable is insignificant.
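The outlier rule mentioned above (a standardized residual larger than two in absolute value) can be sketched as follows. This is a simplified illustration under stated assumptions: the function name is hypothetical, `s` is the standard error of the regression, and leverage values are optional. When leverages are omitted, each residual is simply divided by `s`.

```python
import math
from typing import List, Optional, Sequence


def flag_outliers(residuals: Sequence[float], s: float,
                  leverages: Optional[Sequence[float]] = None,
                  cutoff: float = 2.0) -> List[int]:
    """Return 1-based observation numbers whose |standardized residual| exceeds cutoff."""
    flagged = []
    for i, e in enumerate(residuals):
        h = leverages[i] if leverages is not None else 0.0
        # Standardized residual: e_i / (s * sqrt(1 - h_ii))
        standardized = e / (s * math.sqrt(1.0 - h))
        if abs(standardized) > cutoff:
            flagged.append(i + 1)
    return flagged
```

Accounting for leverage matters for high-leverage points: the same raw residual produces a larger standardized residual when the observation's leverage is high.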

Results show that four variables were significant at the 5 percent significance level (P < 0.05); the exceptions were platform (dummy variable, P = 0.096704) and bounces (P = 0.9325). The R-squared value is 0.88024, indicating that 88.024% of the variation in the number of orders is explained by the model, chiefly by the four significant independent variables: the number of visits, the number of distinct sessions, the number of product page views and the number of search page views. We found that the number of product page views has the greatest impact on the number of orders. The number of search page views, on the other hand, has the least effect and contributes little to estimating the overall number of orders.

The variance of the model is estimated from the value of SS (Residual). Including all six independent variables in the model results in S2 = 31457441997.4761, and the value of S is reported as √41760770.59 = 6462.257. This very high value makes the model unfit for predicting values. The smaller the value, the better and more precise the independent variables are for predicting the number of orders using the model.

Stepwise Regression

In the stepwise regression output, the steps shown across the output indicate which independent variables are added as predictors at each stage. In the resulting model, all variables are added, with an average R2 of 87.12. The best-subsets information indicates that the stepwise regression yields a better fit and a better set of predictors when the number of predictor variables is set at 4.

According to the T-statistics output, product page views have the highest impact on the number of orders of shoes and clothing. They are followed, in order, by the number of visits, search page views, and distinct sessions.

Best Fit Model

The best model was selected based on the behavior of the predictor variables in the model. The final model includes independent variables that give high values of both R2 and adjusted R2, and a Mallows' Cp that is close to the number of independent variables in the overall model.
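For reference, Mallows' Cp can be computed from quantities that regression software reports. The sketch below uses the standard textbook definition rather than output from this study. The function and parameter names are hypothetical, and `p` counts the parameters in the subset model including the intercept.

```python
def mallows_cp(sse_subset: float, mse_full: float, n: int, p: int) -> float:
    """Mallows' Cp = SSE_p / MSE_full - n + 2p.

    sse_subset: residual sum of squares of the candidate subset model
    mse_full:   mean squared error of the model with all predictors
    n:          number of observations
    p:          number of parameters in the subset model (intercept included)
    """
    return sse_subset / mse_full - n + 2 * p
```

A subset model with little bias has a Cp close to `p`, which is why a Cp near the model size is taken as a sign of a good fit.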

Counter-Intuitive Findings

In this case, relying on the R2 value, under which the model explains 88.024% of the variation in the number of orders, might be misleading. Hence, variables with P > 0.05 are eliminated with the aim of creating a better model. The platform dummy variable was removed because of its high P-value of 0.096704. Although in theory eliminating a variable with a high P-value is expected to improve the model, the adjusted R-squared value decreased slightly from 0.8752 to 0.8736, a drop of 0.0016 in the explained variance of the dependent variable. Furthermore, removing the number of bounces (P-value of 0.932506) caused R-squared to drop to 0.8711. Because the coefficients of these two variables were insignificant, we decided to remove both from the model. The final model includes the number of visits, distinct web visits, product page views and search page views.
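The link between R-squared and adjusted R-squared can be checked with the standard formula. The sketch below assumes n = 150 observations, the sample size quoted in the Durbin-Watson discussion, and six predictors for the full model. The function name is hypothetical.

```python
def adjusted_r2(r2: float, n: int, k: int) -> float:
    """Adjusted R-squared for a model with k predictors fitted on n observations."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


# Full model: R-squared 0.88024, assumed n = 150, six predictors.
full_model_adjusted = adjusted_r2(0.88024, 150, 6)
```

Under these assumptions the result is about 0.8752, in line with the adjusted value reported for the full model. Because the formula penalizes each extra predictor, adjusted R-squared can fall when a variable is removed only if that variable was contributing more than the penalty saved.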

The Final Model

The regression equation of the model has the general form

y = v0 + v1Y1 + v2Y2 + v3Y3 + ... + vPYP

and the fitted final model is

Orders = -905.76 + 0.175(visits) - 0.18882(distinct sessions) + 0.0769(product page views) - 0.00878(search page views)

Durbin-Watson Analysis

The Durbin-Watson statistic was computed using Minitab statistical software. The test checks for autocorrelation in the residuals of a regression model. The statistic ranges from 0 to 4. A value of 2 indicates no autocorrelation, values between 0 and 2 point to positive autocorrelation, and values between 2 and 4 point to negative autocorrelation. For this regression model, the Durbin-Watson statistic is 1.89586. For 150 observations and 4 variables, the critical values are dL = 1.68 and dU = 1.79. Although the statistic is slightly below 2, it lies between dU = 1.79 and 4 - dU = 2.21, so the test does not indicate significant autocorrelation.

The prediction interval output assumes a weight of 1. An adjustment must be made if a weight other than 1 is used. The Durbin-Watson statistic of 1.89586 is slightly below 2, so any positive autocorrelation in the residuals is weak.
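The Durbin-Watson statistic and its usual decision rule can be sketched as follows. This is an illustration of the standard bounds test, not the Minitab computation itself. The function names are hypothetical, and the bounds dL = 1.68 and dU = 1.79 are the ones quoted above.

```python
from typing import Sequence


def durbin_watson(residuals: Sequence[float]) -> float:
    """Sum of squared successive differences divided by the sum of squared residuals."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2
              for t in range(1, len(residuals)))
    den = sum(e * e for e in residuals)
    return num / den


def dw_decision(d: float, d_lower: float, d_upper: float) -> str:
    """Classify a Durbin-Watson statistic using the lower and upper bounds."""
    if d < d_lower:
        return "positive autocorrelation"
    if d <= d_upper:
        return "inconclusive"
    if d < 4 - d_upper:
        return "no evidence of autocorrelation"
    if d <= 4 - d_lower:
        return "inconclusive"
    return "negative autocorrelation"
```

With the quoted bounds, a statistic of 1.89586 falls between dU and 4 - dU, which is the region where the test finds no evidence of autocorrelation.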

Coefficient Interpretation

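The estimated intercept equals -905.76; that is, when the number of web visits, distinct visits, product page views and search page views are all zero, the model predicts -905.76 orders. A negative number of orders has no practical meaning, so the intercept mainly anchors the regression line. Each slope coefficient gives the change in predicted orders for a one-unit increase in that variable with the others held constant. For example, each additional visit is associated with 0.175 more orders.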
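The reading of the final model's coefficients can be illustrated with a small sketch. The coefficients come from the final regression equation. The dictionary keys, function name and input values are hypothetical.

```python
# Coefficients of the final four-variable model.
INTERCEPT = -905.76
FINAL_COEFFICIENTS = {
    "visits": 0.175,
    "distinct_sessions": -0.18882,
    "product_page_views": 0.0769,
    "search_page_views": -0.00878,
}


def predict_orders(inputs: dict) -> float:
    """Predicted orders; any variable missing from `inputs` is treated as zero."""
    return INTERCEPT + sum(coef * inputs.get(name, 0.0)
                           for name, coef in FINAL_COEFFICIENTS.items())


# Marginal effect of one extra visit, other variables held constant.
one_more_visit = predict_orders({"visits": 1001}) - predict_orders({"visits": 1000})
```

With every input at zero the prediction equals the intercept, and the difference between the two predictions above equals the visits coefficient, 0.175.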


Cite this page

Creating Multiple Linear Regression Models for Zappos Company. (2022, May 17). Retrieved from
