Regression and correlation relationship

Correlation and Regression


Correlation and linear regression are the most commonly used techniques for investigating the relationship between two quantitative variables. Regression analysis involves identifying the relationship between a dependent (outcome) variable and one or more independent variables, which may include risk factors or confounding variables.


It doesn't matter which of the two variables you call "X" and which you call "Y". You'll get the same correlation coefficient if you swap the two. The decision of which variable you call "X" and which you call "Y" matters in regression, as you'll get a different best-fit line if you swap the two.
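The symmetry of correlation and the asymmetry of regression can be checked numerically. The sketch below is only an illustration: the data are made up, and the helper names `pearson_r` and `fit_line` are hypothetical, not taken from any particular library.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def fit_line(x, y):
    """Least-squares (slope, intercept) of the line predicting y from x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return slope, my - slope * mx

# Hypothetical data, invented purely for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

# Correlation: swapping the variables changes nothing.
r_xy = pearson_r(x, y)
r_yx = pearson_r(y, x)

# Regression: the line predicting Y from X differs from the line predicting X from Y.
slope_y_on_x, _ = fit_line(x, y)
slope_x_on_y, _ = fit_line(y, x)

# Express the X-from-Y line as a slope in the same (x, y) plane for comparison.
slope_x_on_y_in_xy_plane = 1.0 / slope_x_on_y

print("r(x, y) =", r_xy, " r(y, x) =", r_yx)
print("slope of Y-on-X line:", slope_y_on_x)
print("slope of X-on-Y line, drawn in the same plane:", slope_x_on_y_in_xy_plane)
```

The two slopes coincide only when the points fall exactly on a line. In general the product `slope_y_on_x * slope_x_on_y` equals r squared, which is why the two fits share the same R-squared even though the lines differ.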

  • Correlation and Regression
  • Introduction to Correlation and Regression Analysis
  • Difference Between Correlation and Regression

The line that best predicts Y from X is not the same as the line that best predicts X from Y; however, both lines have the same value for R-squared.

Assumptions

The correlation coefficient itself is simply a way to describe how two variables vary together, so it can be computed and interpreted for any two variables. Further inferences, however, require an additional assumption: that both X and Y are measured, and both are sampled from Gaussian distributions.


This is called a bivariate Gaussian distribution. If those assumptions are true, then you can interpret the confidence interval of r and the P value testing the null hypothesis that there really is no correlation between the two variables and any correlation you observed is a consequence of random sampling.
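One common way to obtain an approximate confidence interval for r under the bivariate Gaussian assumption is the Fisher z-transformation. The text above does not say which method it has in mind, so treat the following as a sketch of one standard approach; the function name `fisher_ci` and the example values of r and n are hypothetical.

```python
import math
from statistics import NormalDist

def fisher_ci(r, n, level=0.95):
    """Approximate confidence interval for a correlation coefficient.

    Uses the Fisher z-transformation, which relies on the assumption
    that X and Y are sampled from a bivariate Gaussian distribution.
    """
    if n <= 3:
        raise ValueError("need more than 3 observations")
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    z = math.atanh(r)                  # transform r to the z scale
    se = 1.0 / math.sqrt(n - 3)        # approximate standard error of z
    crit = NormalDist().inv_cdf(0.5 + level / 2.0)  # e.g. about 1.96 for 95%
    # Back-transform the interval endpoints to the r scale.
    return math.tanh(z - crit * se), math.tanh(z + crit * se)

# Hypothetical example: r = 0.5 observed in a sample of n = 30 pairs.
low, high = fisher_ci(r=0.5, n=30)
print("95% CI for r:", low, "to", high)
```

If the interval excludes zero, the data are inconsistent (at the chosen level) with the null hypothesis of no correlation.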

What is the difference between correlation and linear regression?

With linear regression, the X values can be measured or can be a variable controlled by the experimenter. The X values are not assumed to be sampled from a Gaussian distribution. The vertical distances of the points from the best-fit line (the residuals) are assumed to follow a Gaussian distribution, with the SD of the scatter not related to the X or Y values.

Regression analysis involves identifying the relationship between a dependent variable and one or more independent variables. A model of the relationship is hypothesized, and estimates of the parameter values are used to develop an estimated regression equation.


Various tests are then employed to determine if the model is satisfactory. If the model is deemed satisfactory, the estimated regression equation can be used to predict the value of the dependent variable given values for the independent variables.

If the error term were not present, the model would be deterministic; in that case, knowledge of the value of x would be sufficient to determine the value of y. Either a simple or multiple regression model is initially posed as a hypothesis concerning the relationship among the dependent and independent variables.
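For reference, the regression model with an error term is conventionally written as below. The symbols (beta coefficients, epsilon for the error term, p independent variables) are standard notation assumed here; they do not appear in the text above.

```latex
% Simple linear regression: one independent variable x
y = \beta_0 + \beta_1 x + \varepsilon

% Without the error term the model is deterministic:
y = \beta_0 + \beta_1 x

% Multiple regression: p independent variables
y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p + \varepsilon
```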

The least squares method is the most widely used procedure for developing estimates of the model parameters. The magnitude of the correlation coefficient indicates the strength of the association.
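For a single independent variable, the least squares estimates have a closed form, and the resulting estimated regression equation can be used for prediction. The sketch below assumes simple linear regression; the function names and the data are hypothetical.

```python
def least_squares(x, y):
    """Return (intercept, slope) minimizing the sum of squared residuals."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    intercept = my - slope * mx
    return intercept, slope

def predict(intercept, slope, x_new):
    """Evaluate the estimated regression equation at a new x value."""
    return intercept + slope * x_new

# Hypothetical data lying exactly on the line y = 1 + 2x.
x = [1.0, 2.0, 3.0, 4.0]
y = [3.0, 5.0, 7.0, 9.0]

b0, b1 = least_squares(x, y)
y_hat = predict(b0, b1, 5.0)
print("intercept:", b0, "slope:", b1, "prediction at x = 5:", y_hat)
```

With real data the points will not lie exactly on a line, and the quality of the fit should be assessed before the equation is used for prediction.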

A correlation close to zero suggests no linear association between two continuous variables. You say that the correlation coefficient is a measure of the "strength of association", but if you think about it, isn't the slope a better measure of association?

We use risk ratios and odds ratios to quantify the strength of association. The analogous quantity in correlation is the slope. And "r" (or perhaps better, R-squared) is a measure of how much of the variability in the dependent variable can be accounted for by differences in the independent variable.
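The reading of R-squared as "variability accounted for" can be made concrete: it is one minus the ratio of residual variation to total variation, and in simple linear regression it equals the square of r. The sketch below uses made-up data and hypothetical function names.

```python
def r_squared_from_fit(x, y):
    """R-squared computed as 1 - SSE/SST for the least-squares line of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    intercept = my - slope * mx
    # Variation left over after fitting the line (sum of squared residuals).
    sse = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    # Total variation of y around its mean.
    sst = sum((b - my) ** 2 for b in y)
    return 1.0 - sse / sst

def r_squared_from_correlation(x, y):
    """Square of the Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)

# Hypothetical data, invented purely for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

r2_fit = r_squared_from_fit(x, y)
r2_corr = r_squared_from_correlation(x, y)
print("1 - SSE/SST:", r2_fit, " r squared:", r2_corr)
```

Note that R-squared says nothing about how steep the line is: rescaling y changes the slope but leaves R-squared unchanged.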


The analogous measure for a dichotomous variable and a dichotomous outcome would be the attributable proportion. It is always important to evaluate the data carefully before computing a correlation coefficient.

Graphical displays are particularly useful to explore associations between variables. The figure below shows four hypothetical scenarios in which one continuous variable is plotted along the X-axis and the other along the Y-axis.


Scenario 3 might depict the lack of association (r approximately 0) between the extent of media exposure in adolescence and the age at which adolescents initiate sexual activity.

Example - Correlation of Gestational Age and Birth Weight

A small study is conducted involving 17 infants to investigate the association between gestational age at birth, measured in weeks, and birth weight, measured in grams.

We wish to estimate the association between gestational age and infant birth weight. In this example, birth weight is the dependent variable and gestational age is the independent variable.

The data are displayed in a scatter diagram in the figure below. Each point represents an (x, y) pair: in this case, the gestational age, measured in weeks, and the birth weight, measured in grams.