Academic Resource Center

Fitted Models


The Slope Formula, Correlation Coefficient, and the Coefficient of Determination

This guide focuses on foundational concepts needed for Applied Statistics and Intro to Statistics. It assumes a general knowledge of how to read a graph, perform basic algebra, and navigate and use equations in Excel. See the Microsoft Office tech tutorial links for Excel tutorials.

Sometimes a graph is all you have to find your regression equation. You can find the slope of a line from two points on the graph by dividing the rise (the change in y) by the run (the change in x). For an introduction to linear regression, see the Scatterplots and Linear Regression guide.

The correlation coefficient (r) and the coefficient of determination (r²) are related, but they represent different things. Both are useful for judging the strength of the linear relationship between two variables. We use the correlation coefficient to determine whether there is a linear relationship between two variables, how strong that relationship is, and which direction it goes. The coefficient of determination tells us how much of the variation in y the regression model explains, which indicates how useful the model will be for making predictions.


Slope Formula for determining slope from a graph

Find two points on the regression line whose coordinates are whole numbers, ideally where the line passes through an intersection of grid lines.

Points on the graph are written as (x, y).

Equation (rise over run), where (x₁, y₁) and (x₂, y₂) are the two points:

slope = (y₂ − y₁) / (x₂ − x₁)

Example:

 


The slope is 1.5: every time x increases by 1, y increases by 1.5.
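The same rise-over-run calculation can be sketched in Python. The function name `slope` is hypothetical, and the points (2, 3) and (4, 6) are made up for illustration because they produce the 1.5 slope from the example; they are not read from the original graph.

```python
def slope(p1, p2):
    """Return the slope (rise over run) of the line through two (x, y) points."""
    (x1, y1), (x2, y2) = p1, p2
    if x2 == x1:
        # A vertical line has an undefined slope (the run would be zero).
        raise ValueError("the two points must have different x values")
    return (y2 - y1) / (x2 - x1)


# Hypothetical points on a line with slope 1.5
print(slope((2, 3), (4, 6)))  # 1.5
```

Swapping the order of the two points gives the same slope, because the rise and the run both change sign.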

Extrapolating

Extrapolating is making predictions outside the range of the data provided. For example, if the x values of a data set ranged from 100 to 1,000, predicting y when x = 25 or x = 1,200 would be extrapolating. Extrapolating can be risky because the relationship may not hold outside the observed range. For example, if data were gathered on babies' growth up to age one based on their age in months, we wouldn't want to use that regression equation to predict the size of a 12-year-old, because the growth rate slows as children age. To make predictions for a 12-year-old, we would also need to collect growth data for children from ages one to at least twelve.
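One simple guard against accidental extrapolation is to compare a new x value with the smallest and largest x in the data before making a prediction. This Python sketch uses a hypothetical helper, `is_extrapolation`, and the x values between 100 and 1,000 are invented for illustration.

```python
def is_extrapolation(x, xs):
    """Return True if x falls outside the range of the observed x values."""
    return x < min(xs) or x > max(xs)


# Hypothetical x values spanning 100 to 1,000, matching the range in the text
xs = [100, 250, 400, 700, 1000]
for x in (25, 500, 1200):
    print(x, is_extrapolation(x, xs))
# 25 True
# 500 False
# 1200 True
```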

Correlation Coefficient (r)

The correlation coefficient measures the strength and direction of the linear association between two variables. A positive r means a positive trend, and a negative r means a negative trend.

−1 ≤ r ≤ 1

Excel Equation: =CORREL(array1, array2)

array1: your x values

array2: your y values (swapping the two arrays gives the same result)
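For readers who want to see what CORREL calculates, here is a Python sketch of the Pearson correlation coefficient, the statistic CORREL returns. The function name `correl` and the five data points are hypothetical; Python 3.10 and later also provide `statistics.correlation`, which computes the same value.

```python
import math


def correl(xs, ys):
    """Pearson correlation coefficient r for paired x and y values."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of each point's deviations from the two means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each variable
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
print(round(correl(xs, ys), 4))  # 0.7746
```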

Strength interpretations:

Weak: 0 < |r| ≤ 0.40

Moderate: 0.40 < |r| ≤ 0.80

Strong: 0.80 < |r| ≤ 1.00
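The strength cutoffs listed above can be applied mechanically. In this Python sketch the function name `strength` is hypothetical, and returning "no linear association" for r = 0 is an assumption, since the cutoffs start just above zero.

```python
def strength(r):
    """Label a correlation coefficient using the 0.40 and 0.80 cutoffs."""
    if not -1 <= r <= 1:
        raise ValueError("r must be between -1 and 1")
    size = abs(r)
    if size == 0:
        return "no linear association"  # assumption: r = 0 is not in the list
    if size <= 0.40:
        label = "weak"
    elif size <= 0.80:
        label = "moderate"
    else:
        label = "strong"
    direction = "positive" if r > 0 else "negative"
    return f"{label}, {direction}"


print(strength(-0.62))  # moderate, negative
print(strength(0.85))   # strong, positive
```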

Example:

A correlation coefficient of r = −0.62 indicates a moderate, negative association. There does seem to be a linear relationship between the two variables, but it is not very strong, so the data points are likely more scattered around the regression line than they would be with a strong correlation. The negative sign means that as the x variable increases, the y variable tends to decrease.

Coefficient of Determination (r²)

The coefficient of determination shows how closely the regression line follows the pattern of the data. It measures the proportion of the variation in the response variable (y) that can be explained by the predictor variable (x).
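In symbols, one standard way to write this proportion compares how far the data points fall from the regression line with how far they fall from the mean of y. Here ŷᵢ is the value the regression line predicts for the i-th data point and ȳ is the mean of the y values; this notation is introduced for illustration and is not used elsewhere in this guide.

```latex
r^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2}
```

The fraction is the share of variation the line leaves unexplained, so subtracting it from 1 gives the share it explains. For a least-squares straight line with one predictor, this value equals the square of the correlation coefficient r.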

0 ≤ r² ≤ 1

There are no negative coefficients of determination. For a line with one predictor, r² is the square of r, so it can never be negative.

Values closer to 1 are better (1 represents 100%). A coefficient of 1 would mean that 100% of the variation in y can be explained by x. The higher r² is, the more useful the x variable is for predicting the y variable.

Excel Equation: =RSQ(known_ys, known_xs) (note that the y values come first)
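Here is a Python sketch of the value RSQ reports: fit a least-squares line of y on x, then compare the variation left around the line with the total variation around the mean of y. The function name `rsq` and the data points are hypothetical; only the argument order is taken from the Excel function above.

```python
def rsq(known_ys, known_xs):
    """Coefficient of determination for a least-squares line of y on x."""
    n = len(known_xs)
    mean_x = sum(known_xs) / n
    mean_y = sum(known_ys) / n
    sxx = sum((x - mean_x) ** 2 for x in known_xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(known_xs, known_ys))
    # Least-squares slope and intercept of the regression line
    b = sxy / sxx
    a = mean_y - b * mean_x
    # Unexplained variation: squared distances from each point to the line
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(known_xs, known_ys))
    # Total variation: squared distances from each point to the mean of y
    ss_tot = sum((y - mean_y) ** 2 for y in known_ys)
    return 1 - ss_res / ss_tot


# Hypothetical data (note: y values first, as in RSQ)
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
print(round(rsq(ys, xs), 4))  # 0.6
```

For a straight-line fit like this one, the result matches the square of the correlation coefficient, so =RSQ(known_ys, known_xs) and =CORREL(known_xs, known_ys)^2 give the same number.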

Example:

Data are collected on students' test scores and how many hours they studied. Hours studied is the predictor (x) variable, and test score is the response (y) variable. A strong, positive correlation is found between how long a student studies and their test score, and a regression line is fitted to a graph of the data. The coefficient of determination is calculated to be 0.89. This means that 89% of the variation in the students' test scores can be explained by how long each student studied. The remaining 11% of the variation would come from other factors, such as distraction during the test, test anxiety, or how much sleep the student got the night before. This makes hours studied a useful variable for predicting students' test scores.
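The percentages in this example come from simple arithmetic on r². This Python sketch uses a hypothetical helper, `explained_split`; its only input is the 0.89 from the example, and it also recovers r, assuming the positive sign stated in the example.

```python
import math


def explained_split(r_squared):
    """Return (percent of y's variation explained by x, percent left to other factors)."""
    if not 0 <= r_squared <= 1:
        raise ValueError("r squared must be between 0 and 1")
    explained = round(r_squared * 100, 2)
    return explained, round(100 - explained, 2)


r_squared = 0.89  # value from the test-score example
print(explained_split(r_squared))  # (89.0, 11.0)

# With one predictor, r is the square root of r squared, taking the sign of the
# trend (positive in this example), so r is about 0.943: a strong association.
print(round(math.sqrt(r_squared), 3))  # 0.943
```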

Next Steps:

Now you know how to calculate the slope of a line from a graph and how to find r and r². These skills will be useful as you analyze data that can be described by a linear relationship.

Related inferential methods include hypothesis testing and confidence intervals.

Need More Help?

Click here to schedule a 1:1 session with a tutor or coach, or to sign up for a workshop. *If this link does not take you directly to our platform, use the direct "Academic Support" link at the top of the navigation bar in any Brightspace course.
