Excel is a popular tool for regression analysis because it has a user-friendly interface and a variety of built-in features. Here’s the dataset we’ll use to demonstrate how to perform the analysis.
Regression analysis is an important statistical tool. It estimates the relationship between a dependent variable and one or more independent variables. It can be used to predict future trends by examining the direction and slope of the regression line.
Simple linear regression shows the relationship between a single independent variable and a dependent variable. It is described by the following equation:
Y = mX + C + E
The variables are:
Y = Dependent variable
m = Slope of the regression line
X = Independent variable
C = Intercept, the value of Y where the line crosses the Y-axis
E = Error term, the difference between the actual value and the predicted value
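The equation can also be worked through outside Excel. The following is a minimal Python sketch of the least-squares calculation for m and C. The data values and the function name fit_line are hypothetical and are not taken from the practice workbook.

```python
# Hypothetical data: x is the independent variable, y the dependent variable.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]


def fit_line(x, y):
    """Return (m, c) for the least-squares line Y = m*X + C."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope: co-variation of x and y divided by the variation of x.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    m = sxy / sxx
    # Intercept: the fitted line passes through the point of means.
    c = mean_y - m * mean_x
    return m, c


m, c = fit_line(x, y)
# Predict Y for a new X value using the fitted line.
prediction = m * 6.0 + c
print(f"Y = {m:.3f} * X + {c:.3f}; predicted Y at X = 6: {prediction:.3f}")
```

With this sample data the slope is about 0.6 and the intercept about 2.2.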
Multiple linear regression shows the relationship between a dependent variable and several independent variables. The equation is as follows:
Y = b + b1X1 + b2X2 + … + bnXn
Where:
Y is the dependent variable
b is the intercept
X1, X2, …, Xn are the independent variables
b1, b2, …, bn are the coefficients of the corresponding independent variables
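As an illustration of how the coefficients b, b1, …, bn can be estimated, here is a minimal Python sketch that solves the least-squares normal equations with plain Gaussian elimination. The data is hypothetical and was built from Y = 1 + 2·X1 + 3·X2 so the result is easy to check. The helper names solve and fit_multiple are made up for this example, and this is not necessarily how Excel computes the result internally.

```python
def solve(a, b):
    """Solve the linear system a*x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs stay unchanged.
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for j in range(col, n + 1):
                aug[r][j] -= factor * aug[col][j]
    # Back substitution.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x


def fit_multiple(xs, y):
    """Return [b, b1, ..., bn] for Y = b + b1*X1 + ... + bn*Xn."""
    rows = [[1.0] + list(r) for r in xs]  # leading 1 stands for the intercept
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve(xtx, xty)


# Hypothetical data: each row holds (X1, X2); Y was built as 1 + 2*X1 + 3*X2.
xs = [(1.0, 2.0), (2.0, 1.0), (3.0, 4.0), (4.0, 3.0), (5.0, 6.0)]
y = [9.0, 8.0, 19.0, 18.0, 29.0]

coefficients = fit_multiple(xs, y)
print(coefficients)
```

Because the sample Y values contain no noise, the fit recovers the intercept 1 and the coefficients 2 and 3 up to rounding.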
We’ll use the following dataset to perform regression analysis using the LINEST function.
Formula Explanation:
D5:D14 is the range of independent variable data points.
C5:C14 is the range of the dependent variable data points.
Note:
The LINEST function returns an array. If your version of Excel is older than Microsoft 365, you have to press Ctrl + Shift + Enter instead of Enter to enter it as an array formula. Excel then shows the formula wrapped in curly braces.
To enable Analysis ToolPak:
To perform multiple linear regression analysis, we have the following dataset. It shows how the price of a car varies depending on the Maximum Speed, Peak Power, and Range.
Performing multiple linear regression analysis using Analysis ToolPak is essentially the same as simple linear regression analysis. The only difference is in the Input X Range, which must cover all the columns of independent variables.
Performing regression analysis is quite easy. However, understanding the output may seem difficult if you do not know what the terms mean.
Summary Output
Multiple R: The correlation between the actual and predicted values of the dependent variable. In the Excel output it ranges from 0 to 1. The larger the value, the stronger the linear relationship.
R Square: The Coefficient of Determination. It indicates how well the regression model fits the data, expressed as the proportion of the variation in the dependent variable that the model explains. A value above 0.95 is often regarded as a very good fit, although the threshold depends on the field. In our example, the value of 0.997 means the model explains about 99.7% of the variation in the dependent variable, so it fits the data well.
Adjusted R Square: This value is used instead of R Square when the regression has multiple independent variables. It adjusts R Square for the number of independent variables included in the model, so adding a variable that barely improves the fit lowers it.
Standard Error: The standard error of the regression, which measures how precisely the model fits the data. It is roughly the typical distance of the data points from the regression line, so a smaller number means the predictions are more precise.
Observations: The number of observations (data points) in the model.
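To make the Summary Output terms concrete, the following minimal Python sketch computes them from their standard formulas for a simple regression. The data is hypothetical, so the numbers will not match the workbook's output.

```python
import math

# Hypothetical data: x is the independent variable, y the dependent variable.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
n = len(x)  # Observations
k = 1       # number of independent variables

# Least-squares fit of y = m*x + c.
mean_x = sum(x) / n
mean_y = sum(y) / n
sxx = sum((xi - mean_x) ** 2 for xi in x)
m = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx
c = mean_y - m * mean_x
predicted = [m * xi + c for xi in x]

# Variation of y around its mean, and variation left unexplained by the line.
ss_total = sum((yi - mean_y) ** 2 for yi in y)
ss_residual = sum((yi - pred) ** 2 for yi, pred in zip(y, predicted))

r_square = 1 - ss_residual / ss_total
multiple_r = math.sqrt(r_square)
adjusted_r_square = 1 - (1 - r_square) * (n - 1) / (n - k - 1)
standard_error = math.sqrt(ss_residual / (n - k - 1))

print(f"Multiple R: {multiple_r:.4f}")
print(f"R Square: {r_square:.4f}")
print(f"Adjusted R Square: {adjusted_r_square:.4f}")
print(f"Standard Error: {standard_error:.4f}")
print(f"Observations: {n}")
```

For this small sample R Square is about 0.6, while Adjusted R Square is noticeably lower because it is penalized for the small number of observations.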
ANOVA
ANOVA means Analysis of Variance. It is the second part of the analysis result.
df: The Degrees of Freedom. For the Residual row it is calculated as df = N - k - 1, where N is the sample size and k is the number of independent variables. The Regression row has k degrees of freedom, and the Total row has N - 1.
SS: The Sum of Squares, which is the sum of the squared differences between values and a reference value such as the mean. A higher Sum of Squares means a higher variation in the values, and vice versa.
MS: The Mean Square, which is the Sum of Squares divided by its degrees of freedom.
F: The F statistic. It tests the overall significance of the regression model against the null hypothesis that all slope coefficients are zero. If you divide the MS of Regression by the MS of Residual, you’ll get the F statistic.
Significance F: The P-value of F. It tells you whether the model as a whole is statistically significant. When the Significance F is not greater than 0.05, the independent variables have a statistically significant relationship with the dependent variable.
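The ANOVA columns follow from one another, which a short calculation can show. This is a minimal Python sketch on hypothetical data. Significance F is left out because it needs the F distribution, which the Python standard library does not provide.

```python
# Hypothetical data: x is the independent variable, y the dependent variable.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
n = len(x)  # sample size N
k = 1       # number of independent variables

# Least-squares fit of y = m*x + c.
mean_x = sum(x) / n
mean_y = sum(y) / n
sxx = sum((xi - mean_x) ** 2 for xi in x)
m = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx
c = mean_y - m * mean_x
predicted = [m * xi + c for xi in x]

# df column
df_regression = k
df_residual = n - k - 1
df_total = n - 1

# SS column
ss_regression = sum((pred - mean_y) ** 2 for pred in predicted)
ss_residual = sum((yi - pred) ** 2 for yi, pred in zip(y, predicted))
ss_total = sum((yi - mean_y) ** 2 for yi in y)

# MS column: each Sum of Squares divided by its degrees of freedom.
ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual

# F statistic: MS of Regression divided by MS of Residual.
f_statistic = ms_regression / ms_residual

print(f"Regression: df={df_regression}, SS={ss_regression:.2f}, MS={ms_regression:.2f}, F={f_statistic:.2f}")
print(f"Residual:   df={df_residual}, SS={ss_residual:.2f}, MS={ms_residual:.2f}")
print(f"Total:      df={df_total}, SS={ss_total:.2f}")
```

Note that the Regression and Residual rows add up to the Total row in both the df and SS columns.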
Coefficients
The coefficients are the estimated intercept and slopes. You can build the linear regression equation from them and use it to calculate Y values.
Standard Error: The standard deviation of the least-squares estimate of each coefficient.
t Stat: The coefficient divided by its standard error. It tests the null hypothesis that the coefficient is equal to zero.
P-value: The P-value shows whether the relationship between an independent variable and the dependent variable is statistically significant. Here, the P-value for Unit Price is 0.000003, which is below 0.05. So, Unit Price has a statistically significant relationship with Sales.
Lower 95%: The lower limit of the 95% confidence interval for the coefficient.
Upper 95%: The upper limit of the 95% confidence interval for the coefficient.
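The columns of the coefficients table can be reproduced by hand for the slope of a simple regression. The following is a minimal Python sketch on hypothetical data. The critical value T_CRIT is an assumption taken from a standard t-table for 3 degrees of freedom, and the P-value is omitted because it needs the t distribution, which the Python standard library does not provide.

```python
import math

# Hypothetical data: x is the independent variable, y the dependent variable.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
n = len(x)

# Least-squares fit of y = slope*x + intercept.
mean_x = sum(x) / n
mean_y = sum(y) / n
sxx = sum((xi - mean_x) ** 2 for xi in x)
slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx
intercept = mean_y - slope * mean_x

# Standard error of the regression, with n - 2 residual degrees of freedom.
ss_residual = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
regression_se = math.sqrt(ss_residual / (n - 2))

# Standard Error of the slope coefficient.
slope_se = regression_se / math.sqrt(sxx)

# t Stat: the coefficient divided by its standard error.
t_stat = slope / slope_se

# Assumption: two-sided 95% critical value of the t distribution
# for n - 2 = 3 degrees of freedom, read from a t-table.
T_CRIT = 3.182

lower_95 = slope - T_CRIT * slope_se
upper_95 = slope + T_CRIT * slope_se

print(f"Coefficient: {slope:.4f}")
print(f"Standard Error: {slope_se:.4f}")
print(f"t Stat: {t_stat:.4f}")
print(f"Lower 95%: {lower_95:.4f}, Upper 95%: {upper_95:.4f}")
```

In this sample the 95% interval for the slope includes zero, so the slope would not be considered statistically significant at the 5% level, which is the opposite of the Unit Price result described above.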
Residual Output
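This compares the actual value of the dependent variable with the value predicted by the regression equation for each observation. The difference between the two is the residual.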
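A residual table of this kind can be sketched in a few lines of Python. The data below is hypothetical, and the layout only loosely mirrors Excel's Residual Output.

```python
# Hypothetical data: x is the independent variable, y the dependent variable.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
n = len(x)

# Least-squares fit of y = m*x + c.
mean_x = sum(x) / n
mean_y = sum(y) / n
sxx = sum((xi - mean_x) ** 2 for xi in x)
m = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx
c = mean_y - m * mean_x

# Residual = actual Y minus predicted Y, one row per observation.
predicted = [m * xi + c for xi in x]
residuals = [yi - pred for yi, pred in zip(y, predicted)]

print("Observation  Predicted Y  Residual")
for i, (pred, res) in enumerate(zip(predicted, residuals), start=1):
    print(f"{i:>11}  {pred:>11.2f}  {res:>8.2f}")
```

For a least-squares fit that includes an intercept, the residuals sum to zero up to rounding, which is a quick sanity check on the output.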
Charts can visually represent the relationship between the variables of linear regression.
Excel supports various types of regression analysis, including simple linear regression and multiple linear regression, among others.
The R-squared value, also known as the coefficient of determination, represents the proportion of variance in the dependent variable that is explained by the independent variables. A higher R-squared value indicates a better fit of the regression model.
Yes, Excel has limitations in terms of handling very large datasets and complex regression models. Specialized statistical software may be more suitable for advanced analyses.
Md. Maruf Niaz, BSc, Textile Engineering, Bangladesh University of Textiles, has worked with the ExcelDemy project for 11 months. He works as an Excel and VBA Content Developer who provides easy solutions to Excel-related problems and has published almost 20 articles in ExcelDemy.