# Linear regression

• Most applications fall into one of the following two broad categories:

• If the goal is error reduction in prediction or forecasting, linear regression can be used to fit a predictive model to an observed data set of values of the response and explanatory variables.

• Fitting a linear model to a given data set usually requires estimating the regression coefficients such that the error term is minimized.

• For standard least squares estimation methods, the design matrix X must have full column rank p; otherwise perfect multicollinearity exists in the predictor variables, meaning
a linear relationship exists between two or more predictor variables.

• It is also possible in some cases to fix the problem by applying a transformation to the response variable (e.g., fitting the logarithm of the response variable using a linear
regression model, which implies that the response variable itself has a log-normal distribution rather than a normal distribution).
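As a minimal sketch of this transformation approach (synthetic data and coefficient values assumed for illustration), one can fit the logarithm of the response with ordinary least squares:

```python
import numpy as np

# Hypothetical data: a response that grows multiplicatively with x,
# so log(y) is approximately linear in x (log-normal errors).
rng = np.random.default_rng(0)
x = np.linspace(1, 10, 50)
y = np.exp(0.5 + 0.3 * x) * rng.lognormal(0.0, 0.1, size=x.size)

# Fit log(y) = b0 + b1*x by ordinary least squares.
X = np.column_stack([np.ones_like(x), x])
b, *_ = np.linalg.lstsq(X, np.log(y), rcond=None)

# Predictions on the original scale (ignoring retransformation bias).
y_hat = np.exp(X @ b)
print(b)  # roughly [0.5, 0.3]
```

Note that predictions obtained by exponentiating the fitted log-scale values estimate the median of the response, not its mean.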

• In the more general multivariate linear regression, there is one equation of the above form for each of m > 1 dependent variables that share the same set of explanatory variables and hence are estimated simultaneously with each other: yij = β0j + β1jxi1 + ⋯ + βpjxip + εij for all observations indexed as i = 1, …, n and for all dependent variables indexed as j = 1, …, m. Nearly all real-world regression models involve multiple predictors, and basic descriptions of linear regression are often phrased in terms of the multiple regression model.

• This model is non-linear in the time variable t, but it is linear in the parameters β1 and β2; if we take regressors xi = (xi1, xi2) = (ti, ti²), the model takes on the standard form yi = β1xi1 + β2xi2 + εi.

• Assumptions (see also: Ordinary least squares § Assumptions): Standard linear regression models with standard estimation techniques make a number of assumptions about the predictor variables, the response variables, and their relationship.
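This point can be made concrete with a short sketch (synthetic, noiseless data assumed for clarity): a quadratic trend is nonlinear in t but is fit by ordinary linear least squares once t and t² are treated as two regressors.

```python
import numpy as np

# A quadratic trend in time t is nonlinear in t but linear in the
# parameters (β1, β2), so it can be estimated by ordinary linear
# least squares using the regressors x1 = t and x2 = t².
t = np.arange(1.0, 21.0)
y = 2.0 * t + 0.5 * t**2   # true parameters: β1 = 2.0, β2 = 0.5

X = np.column_stack([t, t**2])        # design matrix (t, t²)
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta)  # → [2.0, 0.5]
```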

• Applications of the group effects include (1) estimation and inference for meaningful group effects on the response variable, (2) testing for “group significance” of the variables via testing the null hypothesis that a group effect is zero against the alternative that it is not, and (3) characterizing the region of the predictor variable space over which predictions by the least squares estimated model are accurate.

• A simple way to identify these meaningful group effects is to use an all positive correlations (APC) arrangement of the strongly correlated variables under which pairwise
correlations among these variables are all positive, and standardize all predictor variables in the model so that they all have mean zero and length one.
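The standardization step described above can be sketched as follows (random data assumed; note "length one" means unit Euclidean norm, not unit variance, and the sign flips needed for an APC arrangement are not shown):

```python
import numpy as np

# Standardize each predictor to mean zero and unit length:
# subtract the column mean, then divide by the Euclidean norm.
rng = np.random.default_rng(1)
X = rng.normal(size=(30, 3))

Xc = X - X.mean(axis=0)                  # mean zero in each column
Xs = Xc / np.linalg.norm(Xc, axis=0)     # length one in each column

print(np.round(Xs.mean(axis=0), 12))     # ~0 per column
print(np.linalg.norm(Xs, axis=0))        # 1.0 per column
```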

• For example, weighted least squares is a method for estimating linear regression models when the response variables may have different error variances, possibly with correlated
errors.
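A minimal weighted least squares sketch (synthetic heteroscedastic data; the inverse-variance weights are a standard choice, and the error variances are assumed known here for simplicity):

```python
import numpy as np

# WLS via the closed form (X'WX)^{-1} X'Wy, with each observation
# weighted by the inverse of its error variance.
rng = np.random.default_rng(2)
n = 200
x = rng.uniform(0, 10, n)
sigma = 0.5 + 0.3 * x                    # heteroscedastic error scale
y = 1.0 + 2.0 * x + rng.normal(0, sigma)

X = np.column_stack([np.ones(n), x])
W = np.diag(1.0 / sigma**2)
beta = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)
print(beta)  # close to the true values [1.0, 2.0]
```

In practice the variance structure is rarely known and must itself be estimated, e.g. by iterating between fitting the mean and modeling the residual variance.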

• when modeling positive quantities (e.g. prices or populations) that vary over a large scale—which are better described using a skewed distribution such as the log-normal distribution or Poisson distribution (although GLMs are not used for log-normal data; instead the response variable is simply transformed using the logarithm function);

• when modeling categorical data, such as the choice of a given candidate in an election (which is better described using a Bernoulli distribution/binomial distribution for binary choices, or a categorical distribution/multinomial distribution for multi-way choices), where there are a fixed number of choices that cannot be meaningfully ordered;

• when modeling ordinal data.

• Many statistical inference procedures for linear models require an intercept to be present, so it is often included even if theoretical considerations suggest that its value
should be zero.

• In general, for a group of strongly correlated predictor variables in an APC arrangement in the standardized model, group effects whose weight vectors are at or near the centre of the simplex are meaningful and can be accurately estimated by their minimum-variance unbiased linear estimators.

• If the goal is to explain variation in the response variable that can be attributed to variation in the explanatory variables, linear regression analysis can be applied to quantify the strength of the relationship between the response and the explanatory variables, and in particular to determine whether some explanatory variables may have no linear relationship with the response at all, or to identify which subsets of explanatory variables may contain redundant information about the response.

• Others: In Dempster–Shafer theory, or a linear belief function in particular, a linear regression model may be represented as a partially swept matrix, which can be combined with similar matrices representing observations and other assumed normal distributions and state equations.

• Group effects provide a means to study the collective impact of strongly correlated predictor variables in linear regression models.

• The decision as to which variable in a data set is modeled as the dependent variable and which are modeled as the independent variables may be based on a presumption that the value of one of the variables is caused by, or directly influenced by, the other variables.

• Multiple linear regression is a generalization of simple linear regression to the case of more than one independent variable, and a special case of general linear models,
restricted to one dependent variable.

• However, it has been argued that in many cases multiple regression analysis fails to clarify the relationships between the predictor variables and the response variable when
the predictors are correlated with each other and are not assigned following a study design.

• The link function is often related to the distribution of the response, and in particular it typically has the effect of transforming between the range of the linear predictor
and the range of the response variable.

• The model remains linear as long as it is linear in the parameter vector β.

• The values xij may be viewed as either observed values of random variables Xj or as fixed values chosen prior to observing the dependent variable.

• [2] In linear regression, the relationships are modeled using linear predictor functions whose unknown model parameters are estimated from the data.

• In statistics, linear regression is a linear approach for modelling the relationship between a scalar response and one or more explanatory variables (also known as dependent
and independent variables).

• Some methods such as generalized least squares are capable of handling correlated errors, although they typically require significantly more data unless some sort of regularization
is used to bias the model towards assuming uncorrelated errors.

• Like all forms of regression analysis, linear regression focuses on the conditional probability distribution of the response given the values of the predictors, rather than
on the joint probability distribution of all of these variables, which is the domain of multivariate analysis.

• Generalized linear models allow for an arbitrary link function, g, that relates the mean of the response variable(s) to the predictors: g(E(Y)) = Xβ.
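The role of the link function can be sketched numerically (coefficient values assumed for illustration): with a log link, the linear predictor Xβ ranges over all reals, while the inverse link maps it into the positive range of the response mean.

```python
import numpy as np

# With a log link, g(mu) = log(mu), so the mean is mu = exp(Xβ):
# the linear predictor is unbounded, but the mean stays positive.
beta = np.array([0.2, 0.7])                      # assumed coefficients
X = np.array([[1.0, -3.0], [1.0, 0.0], [1.0, 2.0]])

eta = X @ beta      # linear predictor (can be negative)
mu = np.exp(eta)    # inverse link: mean of the response (always > 0)
print(mu)
```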

• Effects with weight vectors far away from the centre are not meaningful as such weight vectors represent simultaneous changes of the variables that violate the strong positive
correlations of the standardized variables in an APC arrangement.

• Methods for fitting linear models with multicollinearity have been developed,[5][6][7][8] some of which require additional assumptions such as “effect sparsity”—that a large
fraction of the effects are exactly zero.

• It has an interpretation as the expected change in the response variable when xj increases by one unit with other predictor variables held constant.

• To check for violations of the assumptions of linearity, constant variance, and independence of errors within a linear regression model, the residuals are typically plotted
against the predicted values (or each of the individual predictors).
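The quantities involved in such a diagnostic can be computed as follows (synthetic data assumed; the plotting step itself is described in a comment rather than executed):

```python
import numpy as np

# Compute fitted values and residuals; in practice one would plot
# residuals against fitted values (or each predictor) and look for
# curvature (nonlinearity) or a funnel shape (non-constant variance).
rng = np.random.default_rng(3)
x = rng.uniform(0, 5, 100)
y = 1.0 + 2.0 * x + rng.normal(0, 1.0, 100)

X = np.column_stack([np.ones_like(x), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
fitted = X @ beta
residuals = y - fitted

# With an intercept in the model, least squares residuals sum to zero.
print(residuals.sum())
```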

• If the experimenter directly sets the values of the predictor variables according to a study design, the comparisons of interest may literally correspond to comparisons among
units whose predictor variables have been “held fixed” by the experimenter.

• The basic model for multiple linear regression is yi = β0 + β1xi1 + ⋯ + βpxip + εi for each observation i = 1, …, n. In the formula above we consider n observations of one dependent variable and p independent variables.
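In matrix form this is y = Xβ + ε, which can be estimated by ordinary least squares; a minimal sketch with synthetic data and assumed true coefficients:

```python
import numpy as np

# Simulate n observations from y = Xβ + ε and recover β by OLS.
rng = np.random.default_rng(4)
n, p = 100, 2
beta_true = np.array([1.0, 2.0, -0.5])   # intercept, then p slopes

X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
y = X @ beta_true + rng.normal(0, 0.1, n)

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta_hat)  # close to [1.0, 2.0, -0.5]
```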

• Single index models allow some degree of nonlinearity in the relationship between x and y, while preserving the central role of the linear predictor β′x as in the classical linear regression model.

• This can be caused by accidentally duplicating a variable in the data, using a linear transformation of a variable along with the original (e.g., the same temperature measurements
expressed in Fahrenheit and Celsius), or including a linear combination of multiple variables in the model, such as their mean.
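The temperature example can be checked directly: including the same measurements in both scales makes the design matrix rank-deficient, since Fahrenheit is a linear combination of Celsius and the intercept column.

```python
import numpy as np

# F = 1.8*C + 32, so the Fahrenheit column is a linear combination of
# the Celsius column and the intercept column: perfect multicollinearity.
celsius = np.array([0.0, 10.0, 20.0, 30.0, 40.0])
fahrenheit = 1.8 * celsius + 32.0

X = np.column_stack([np.ones(5), celsius, fahrenheit])
print(np.linalg.matrix_rank(X))  # → 2, not 3: X lacks full column rank
```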

• [3] Most commonly, the conditional mean of the response given the values of the explanatory variables (or predictors) is assumed to be an affine function of those values;
less commonly, the conditional median or some other quantile is used.

• For a group of predictor variables, say {x1, x2, …, xq}, a group effect is defined as a linear combination of their parameters, ξ(w) = w1β1 + w2β2 + ⋯ + wqβq, where w = (w1, w2, …, wq)′ is a weight vector satisfying Σj |wj| = 1.
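As a minimal numerical sketch (the coefficient and weight values are arbitrary, chosen only for illustration), a group effect is simply a weighted combination of the grouped parameters:

```python
import numpy as np

# Group effect ξ(w) = w·β for a group of three predictors, using
# equal weights that sum to one (the centre of the simplex).
beta = np.array([1.2, 0.9, 1.1])   # assumed parameters of the group
w = np.array([1/3, 1/3, 1/3])      # weight vector, Σ w_j = 1

xi = w @ beta                      # the group effect
print(xi)
```

With equal weights, ξ(w) is just the average of the grouped coefficients, which is the kind of "collective impact" quantity group effects are designed to capture.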

• [9] Group effects: In a multiple linear regression model, the parameter βj of predictor variable xj represents the individual effect of xj.

• In this case, we “hold a variable fixed” by restricting our attention to the subsets of the data that happen to have a common value for the given predictor variable.

• After developing such a model, if additional values of the explanatory variables are collected without an accompanying response value, the fitted model can be used to make
a prediction of the response.

• The case of one explanatory variable is called simple linear regression; for more than one, the process is called multiple linear regression.

• Linear regression models are often fitted using the least squares approach, but they may also be fitted in other ways, such as by minimizing the “lack of fit” in some other
norm (as with least absolute deviations regression), or by minimizing a penalized version of the least squares cost function as in ridge regression (L2-norm penalty) and lasso (L1-norm penalty).
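The ridge variant mentioned above has a convenient closed form, sketched here with synthetic data (the penalty value λ = 1.0 is an arbitrary choice for illustration):

```python
import numpy as np

# Ridge regression adds an L2 penalty λ‖β‖² to the least squares cost,
# giving the closed form β = (X'X + λI)^{-1} X'y.
rng = np.random.default_rng(5)
n, p = 50, 5
X = rng.normal(size=(n, p))
y = X @ np.array([1.0, 0.5, 0.0, 0.0, -1.0]) + rng.normal(0, 0.1, n)

lam = 1.0
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)

# The penalty shrinks the coefficient vector toward zero relative to OLS.
print(np.linalg.norm(beta_ridge), np.linalg.norm(beta_ols))
```

The lasso (L1 penalty) has no such closed form and is typically solved by coordinate descent or least-angle regression.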

• The following are the major assumptions made by standard linear regression models with standard estimation techniques (e.g. ordinary least squares):
• Beyond these assumptions, several other statistical properties of the data strongly influence the performance of different estimation methods:

• The statistical relationship between the error terms and the regressors plays an important role in determining whether an estimation procedure has desirable sampling properties such as being unbiased and consistent.

• (In fact, as this shows, in many cases—often the same cases where the assumption of normally distributed errors fails—the variance or standard deviation should be predicted
to be proportional to the mean, rather than constant.)

• This essentially means that the predictor variables x can be treated as fixed values, rather than random variables.

• Although polynomial regression fits a nonlinear model to the data, as a statistical estimation problem it is linear, in the sense that the regression function E(y | x) is
linear in the unknown parameters that are estimated from the data.

• The relationship between the error term and the regressors, for example their correlation, is a crucial consideration in formulating a linear regression model, as it will
determine the appropriate estimation method.

• Interpretation: The data sets in Anscombe’s quartet are designed to have approximately the same linear regression line (as well as nearly identical means, standard deviations, and correlations) but are graphically very different.

• Individual effects of such variables are not well-defined as their parameters do not have good interpretations.

• Thus, although the terms “least squares” and “linear model” are closely linked, they are not synonymous.

• Thus meaningful group effects of the original variables can be found through meaningful group effects of the standardized variables.

• A group effect of the original variables can be expressed as a constant times a group effect of the standardized variables.

• Bayesian linear regression techniques can also be used when the variance is assumed to be a function of the mean.

• The predictor variables themselves can be arbitrarily transformed, and in fact multiple copies of the same underlying predictor variable can be added, each one transformed
differently.

• Errors-in-variables: Errors-in-variables models (or “measurement error models”) extend the traditional linear regression model to allow the predictor variables X to be observed with error.

• Generally these extensions make the estimation procedure more complex and time-consuming, and may also require more data in order to produce an equally precise model.

• Bayesian linear regression can also be used, which by its nature is more or less immune to the problem of overfitting.

• Furthermore, when the sample size is not large, none of their parameters can be accurately estimated by the least squares regression due to the multicollinearity problem.

• Although this assumption is not realistic in many settings, dropping it leads to significantly more difficult errors-in-variables models.

• Under certain conditions, simply applying OLS to data from a single-index model will consistently estimate β up to a proportionality constant.

• [4] This is because models which depend linearly on their unknown parameters are easier to fit than models which are non-linearly related to their parameters and because the
statistical properties of the resulting estimators are easier to determine.

• Alternatively, there may be an operational reason to model one of the variables in terms of the others, in which case there need be no presumption of causality.

• Both interpretations may be appropriate in different cases, and they generally lead to the same estimation procedures; however, different approaches to asymptotic analysis are used in these two situations.

• With this much flexibility, models such as polynomial regression often have “too much power”, in that they tend to overfit the data.

• Simple and multiple linear regression: [Figure: example of simple linear regression, which has one independent variable.] The very simplest case of a single scalar predictor variable x and a single scalar response variable y is known as simple linear regression.

• A fitted linear regression model can be used to identify the relationship between a single predictor variable xj and the response variable y when all the other predictor variables
in the model are “held fixed”.

• When a predictor variable xj is strongly correlated with other predictor variables, it is improbable that xj can increase by one unit with the other variables held constant.

• The notion of a “unique effect” is appealing when studying a complex system where multiple interrelated components influence the response variable.

• The arrangement, or probability distribution, of the predictor variables x has a major influence on the precision of estimates of β.

• With strong positive correlations and in standardized units, variables in the group are approximately equal, so they are likely to increase at the same time and by similar amounts.

• The presence of heteroscedasticity will result in an overall “average” estimate of variance being used instead of one that takes into account the true variance structure.

Works Cited

1. David A. Freedman (2009). Statistical Models: Theory and Practice. Cambridge University Press. p. 26. "A simple regression equation has on the right hand side an intercept and an explanatory variable with a slope coefficient. A multiple regression equation has two or more explanatory variables on the right hand side, each with its own slope coefficient."
2. Rencher, Alvin C.; Christensen, William F. (2012), "Chapter 10, Multivariate regression – Section 10.1, Introduction", Methods of Multivariate Analysis, Wiley Series in Probability and Statistics, vol. 709 (3rd ed.), John Wiley & Sons, p. 19, ISBN 9781118391679.
3. Hilary L. Seal (1967). "The historical development of the Gauss linear model". Biometrika. 54 (1/2): 1–24. doi:10.1093/biomet/54.1-2.1. JSTOR 2333849.
4. Yan, Xin (2009), Linear Regression Analysis: Theory and Computing, World Scientific, pp. 1–2, ISBN 9789812834119. "Regression analysis … is probably one of the oldest topics in mathematical statistics dating back to about two hundred years ago. The earliest form of the linear regression was the least squares method, which was published by Legendre in 1805, and by Gauss in 1809 … Legendre and Gauss both applied the method to the problem of determining, from astronomical observations, the orbits of bodies"
5. Tibshirani, Robert (1996). "Regression Shrinkage and Selection via the Lasso". Journal of the Royal Statistical Society, Series B. 58 (1): 267–288. JSTOR 2346178.
6. Efron, Bradley; Hastie, Trevor; Johnstone, Iain; Tibshirani, Robert (2004). "Least Angle Regression". The Annals of Statistics. 32 (2): 407–451. arXiv:math/0406456. doi:10.1214/009053604000000067. JSTOR 3448465. S2CID 204004121.
7. Hawkins, Douglas M. (1973). "On the Investigation of Alternative Regressions by Principal Component Analysis". Journal of the Royal Statistical Society, Series C. 22 (3): 275–286. doi:10.2307/2346776. JSTOR 2346776.
8. Jolliffe, Ian T. (1982). "A Note on the Use of Principal Components in Regression". Journal of the Royal Statistical Society, Series C. 31 (3): 300–303. doi:10.2307/2348005. JSTOR 2348005.
9. Berk, Richard A. (2007). "Regression Analysis: A Constructive Critique". Criminal Justice Review. 32 (3): 301–302. doi:10.1177/0734016807304871. S2CID 145389362.
10. Tsao, Min (2022). "Group least squares regression for linear models with strongly correlated predictor variables". Annals of the Institute of Statistical Mathematics. arXiv:1804.02499. doi:10.1007/s10463-022-00841-7. S2CID 237396158.
11. Hidalgo, Bertha; Goodman, Melody (2012-11-15). "Multivariate or Multivariable Regression?". American Journal of Public Health. 103 (1): 39–40. doi:10.2105/AJPH.2012.300897. ISSN 0090-0036. PMC 3518362. PMID 23153131.
12. Brillinger, David R. (1977). "The Identification of a Particular Nonlinear Time Series System". Biometrika. 64 (3): 509–515. doi:10.1093/biomet/64.3.509. JSTOR 2345326.
13. Galton, Francis (1886). "Regression Towards Mediocrity in Hereditary Stature". The Journal of the Anthropological Institute of Great Britain and Ireland. 15: 246–263. doi:10.2307/2841583. ISSN 0959-5295. JSTOR 2841583.
14. Lange, Kenneth L.; Little, Roderick J. A.; Taylor, Jeremy M. G. (1989). "Robust Statistical Modeling Using the t Distribution" (PDF). Journal of the American Statistical Association. 84 (408): 881–896. doi:10.2307/2290063. JSTOR 2290063.
15. Swindel, Benee F. (1981). "Geometry of Ridge Regression Illustrated". The American Statistician. 35 (1): 12–15. doi:10.2307/2683577. JSTOR 2683577.
16. Draper, Norman R.; van Nostrand, R. Craig (1979). "Ridge Regression and James-Stein Estimation: Review and Comments". Technometrics. 21 (4): 451–466. doi:10.2307/1268284. JSTOR 1268284.
17. Hoerl, Arthur E.; Kennard, Robert W.; Hoerl, Roger W. (1985). "Practical Use of Ridge Regression: A Challenge Met". Journal of the Royal Statistical Society, Series C. 34 (2): 114–120. JSTOR 2347363.
18. Narula, Subhash C.; Wellington, John F. (1982). "The Minimum Sum of Absolute Errors Regression: A State of the Art Survey". International Statistical Review. 50 (3): 317–326. doi:10.2307/1402501. JSTOR 1402501.
19. Stone, C. J. (1975). "Adaptive maximum likelihood estimators of a location parameter". The Annals of Statistics. 3 (2): 267–284. doi:10.1214/aos/1176343056. JSTOR 2958945.
20. Goldstein, H. (1986). "Multilevel Mixed Linear Model Analysis Using Iterative Generalized Least Squares". Biometrika. 73 (1): 43–56. doi:10.1093/biomet/73.1.43. JSTOR 2336270.
21. Theil, H. (1950). "A rank-invariant method of linear and polynomial regression analysis. I, II, III". Nederl. Akad. Wetensch., Proc. 53: 386–392, 521–525, 1397–1412. MR 0036489; Sen, Pranab Kumar (1968). "Estimates of the regression coefficient based on Kendall's tau". Journal of the American Statistical Association. 63 (324): 1379–1389. doi:10.2307/2285891. JSTOR 2285891. MR 0258201.
22. Deaton, Angus (1992). Understanding Consumption. Oxford University Press. ISBN 978-0-19-828824-4.
23. Krugman, Paul R.; Obstfeld, M.; Melitz, Marc J. (2012). International Economics: Theory and Policy (9th global ed.). Harlow: Pearson. ISBN 9780273754091.
24. Laidler, David E. W. (1993). The Demand for Money: Theories, Evidence, and Problems (4th ed.). New York: Harper Collins. ISBN 978-0065010985.
25. Ehrenberg; Smith (2008). Modern Labor Economics (10th international ed.). London:
26. EEMP webpage Archived 2011-06-11 at the Wayback Machine
27. "Linear Regression (Machine Learning)" (PDF). University of Pittsburgh.
28. Stigler, Stephen M. (1986). The History of Statistics: The Measurement of Uncertainty before 1900. Cambridge: Harvard. ISBN 0-674-40340-1.
2. Cohen, J.; Cohen, P.; West, S.G.; Aiken, L.S. (2003). Applied Multiple Regression/Correlation Analysis for the Behavioral Sciences (2nd ed.). Hillsdale, NJ: Lawrence Erlbaum Associates.
3. Charles Darwin. The Variation of Animals and Plants under Domestication. (1868) (Chapter XIII describes what was known about reversion in Galton's time. Darwin uses the term "reversion".)
4. Draper, N.R.; Smith, H. (1998). Applied Regression Analysis (3rd ed.). John Wiley. ISBN 978-0-471-17082-2.
5. Francis Galton. "Regression Towards Mediocrity in Hereditary Stature," Journal of the Anthropological Institute, 15:246–263 (1886).
6. Robert S. Pindyck and Daniel L. Rubinfeld (1998, 4th ed.). Econometric Models and Economic Forecasts, ch. 1.