Note that although the $y_i$, as sample responses, are observable, the following statements and arguments (assumptions, proofs, and so on) are made conditional only on knowing $X_{ij}$, not $y_i$.
The Gauss–Markov assumptions concern the set of error random variables, $\varepsilon_i$:
• They have mean zero: $\operatorname{E}[\varepsilon_i] = 0$.
• They are homoscedastic, that is, all have the same finite variance: $\operatorname{Var}(\varepsilon_i) = \sigma^2 < \infty$ for all $i$.
• Distinct error terms are uncorrelated: $\operatorname{Cov}(\varepsilon_i, \varepsilon_j) = 0$ for all $i \neq j$.
A linear estimator of $\beta_j$ is a linear combination $\widehat{\beta}_j = c_{1j} y_1 + \cdots + c_{nj} y_n$ in which the coefficients $c_{ij}$ are not allowed to depend on the underlying coefficients $\beta_j$, since those are not observable, but are allowed to depend on the values $X_{ij}$, since these data are observable.
The main idea of the proof is that the least-squares estimator is uncorrelated with every linear unbiased estimator of zero, i.e., with every linear combination $a_1 y_1 + \cdots + a_n y_n$ whose coefficients do not depend upon the unobservable $\beta$ but whose expected value is always zero.
(Since we are considering the case in which all the parameter estimates are unbiased, this mean squared error is the same as the variance of the linear combination.)
The best linear unbiased estimator (BLUE) of the vector $\beta$ of parameters $\beta_j$ is one with the smallest mean squared error for every vector $\lambda$ of linear combination parameters.
In statistics, the Gauss–Markov theorem (or simply Gauss theorem for some authors) states that the ordinary least squares (OLS) estimator has the lowest sampling variance
within the class of linear unbiased estimators, if the errors in the linear regression model are uncorrelated, have equal variances and expectation value of zero.
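As a rough illustration of this statement, the following sketch simulates a simple regression with fixed regressors and compares the OLS slope with one other linear unbiased estimator, the slope through the two end points. The parameter values, the sample size, and the choice of uniform errors are assumptions made only for this example; the errors are deliberately non-normal but still have mean zero, equal variance, and no correlation.

```python
import numpy as np

rng = np.random.default_rng(0)

x = np.arange(1.0, 11.0)                    # fixed, observable regressor values 1..10
X = np.column_stack([np.ones_like(x), x])   # design matrix with an intercept column
beta = np.array([2.0, 0.5])                 # hypothetical true (intercept, slope)
n_rep = 20_000                              # number of simulated samples

# Uniform errors on [-sqrt(3), sqrt(3)]: mean 0, variance 1, not normal.
eps = rng.uniform(-np.sqrt(3.0), np.sqrt(3.0), size=(n_rep, x.size))
Y = X @ beta + eps                          # one simulated sample per row

# OLS slope for every sample: second row of (X'X)^{-1} X' applied to y.
ols_weights = np.linalg.solve(X.T @ X, X.T)[1]
ols_slope = Y @ ols_weights

# Competing linear unbiased estimator: slope through the two end points.
endpoint_slope = (Y[:, -1] - Y[:, 0]) / (x[-1] - x[0])

print("mean of OLS slope:          ", ols_slope.mean())
print("mean of end-point slope:    ", endpoint_slope.mean())
print("variance of OLS slope:      ", ols_slope.var())
print("variance of end-point slope:", endpoint_slope.var())
```

Under these assumptions both sample means should sit close to the true slope 0.5, while the sample variance of the OLS slope should be clearly smaller, in line with the theoretical values $\sigma^2 / \sum_i (x_i - \bar{x})^2 = 1/82.5$ for OLS and $2\sigma^2 / (x_n - x_1)^2 = 2/81$ for the end-point estimator.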
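For example, the Cobb–Douglas function, often used in economics, is nonlinear: $Y = A L^{\alpha} K^{1-\alpha} e^{\varepsilon}$. But it can be expressed in linear form by taking the natural logarithm of both sides: $\ln Y = \ln A + \alpha \ln L + (1 - \alpha) \ln K + \varepsilon = \beta_0 + \beta_1 \ln L + \beta_2 \ln K + \varepsilon$.
This assumption also covers specification issues: it assumes that the proper functional form has been selected and that there are no omitted variables.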
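The log transformation of the Cobb–Douglas function can be sketched numerically. The example below assumes the form $Y = A L^{\alpha} K^{1-\alpha} e^{\varepsilon}$ and generates synthetic data from hypothetical values of $A$ and $\alpha$; none of the numbers come from a real data set. After taking logarithms, the model is linear in its parameters and can be fitted by ordinary least squares.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical "true" values, used only to generate example data.
A, alpha = 2.0, 0.3
n = 200
L = rng.uniform(1.0, 10.0, size=n)       # labour input
K = rng.uniform(1.0, 10.0, size=n)       # capital input
eps = rng.normal(0.0, 0.05, size=n)      # disturbance on the log scale
Y = A * L**alpha * K**(1.0 - alpha) * np.exp(eps)

# ln Y = b0 + b1 ln L + b2 ln K + eps is linear in (b0, b1, b2).
X = np.column_stack([np.ones(n), np.log(L), np.log(K)])
b, *_ = np.linalg.lstsq(X, np.log(Y), rcond=None)

print("estimate of ln A:    ", b[0], " true:", np.log(A))
print("estimate of alpha:   ", b[1], " true:", alpha)
print("estimate of 1-alpha: ", b[2], " true:", 1.0 - alpha)
```

This unrestricted fit estimates the coefficients on $\ln L$ and $\ln K$ separately; it does not impose that they sum to one.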
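Remarks on the proof
As stated before, the condition that $\operatorname{Var}(\tilde{\beta}) - \operatorname{Var}(\widehat{\beta})$ is a positive semidefinite matrix is equivalent to the property that the best linear unbiased estimator of $\ell^{\mathsf T}\beta$ is $\ell^{\mathsf T}\widehat{\beta}$ (best in the sense that it has minimum variance).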
Proof
Let $\tilde{\beta} = Cy$ be another linear estimator of $\beta$ with $C = (X^{\mathsf T}X)^{-1}X^{\mathsf T} + D$, where $D$ is a $K \times n$ non-zero matrix.
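The construction used in the proof can be checked numerically. Since $\operatorname{E}[Cy] = CX\beta = \beta + DX\beta$, the competing estimator is unbiased only if $DX = 0$. The sketch below builds one such $D$ from a random matrix (the dimensions, seed, and error variance are arbitrary choices for illustration) and confirms that the variance difference equals $\sigma^2 DD^{\mathsf T}$, which is positive semidefinite.

```python
import numpy as np

rng = np.random.default_rng(2)
n, K = 12, 3
sigma2 = 1.5                                    # hypothetical error variance
X = np.column_stack([np.ones(n), rng.normal(size=(n, K - 1))])

XtX_inv = np.linalg.inv(X.T @ X)
A_ols = XtX_inv @ X.T                           # OLS: beta_hat = A_ols @ y
H = X @ A_ols                                   # projection ("hat") matrix

# Any D of the form M (I - H) satisfies D X = 0, so C y stays unbiased.
M = rng.normal(size=(K, n))
D = M @ (np.eye(n) - H)
C = A_ols + D

var_ols = sigma2 * XtX_inv                      # Var(beta_hat) = sigma^2 (X'X)^{-1}
var_alt = sigma2 * (C @ C.T)                    # Var(C y)      = sigma^2 C C'
diff = var_alt - var_ols
diff = (diff + diff.T) / 2.0                    # symmetrize against rounding error

print("C X is the identity:       ", np.allclose(C @ X, np.eye(K)))
print("diff equals sigma^2 D D':  ", np.allclose(diff, sigma2 * (D @ D.T)))
print("smallest eigenvalue of diff:", np.linalg.eigvalsh(diff).min())
```

The smallest eigenvalue should be non-negative up to floating-point rounding, which is the numerical counterpart of the positive semidefiniteness claimed in the theorem.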
The independent variables can take non-linear forms as long as the parameters are linear.
 The errors do not need to be normal, nor do they need to be independent and identically distributed (only uncorrelated with mean zero and homoscedastic with finite variance).
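Strict exogeneity
For all $n$ observations, the expectation of the error term, conditional on the regressors, is zero: $\operatorname{E}[\varepsilon_i \mid X] = \operatorname{E}[\varepsilon_i \mid x_1, \dots, x_n] = 0$, where $x_i$ is the data vector of regressors for the $i$th observation, and consequently $X = [x_1^{\mathsf T} \; \cdots \; x_n^{\mathsf T}]^{\mathsf T}$ is the data matrix or design matrix.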
This does not mean that there must be a linear relationship between the independent and dependent variables.
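The MSE function we want to minimize is $f(\beta_0, \beta_1, \dots, \beta_p) = \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_{i1} - \cdots - \beta_p x_{ip})^2$ for a multiple regression model with $p$ variables.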
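That OLS minimizes this sum of squares can be checked numerically. Written in matrix form, the objective is $\lVert y - X\beta \rVert^2$, with gradient $-2X^{\mathsf T}(y - X\beta)$ and Hessian $2X^{\mathsf T}X$. The sketch below uses made-up data (the sizes and seed are arbitrary) to confirm that the gradient vanishes at the OLS solution and that the Hessian has strictly positive eigenvalues when $X$ has full column rank.

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 30, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])  # intercept + p regressors
y = rng.normal(size=n)       # arbitrary responses; the Hessian does not depend on y


def sse(b):
    """Sum of squared residuals for coefficient vector b."""
    r = y - X @ b
    return r @ r


b_ols = np.linalg.solve(X.T @ X, X.T @ y)       # solves the normal equations
gradient = -2.0 * (X.T @ (y - X @ b_ols))       # gradient of sse at b_ols
hessian = 2.0 * (X.T @ X)                       # constant Hessian of sse
eigs = np.linalg.eigvalsh(hessian)

print("gradient at OLS solution:", gradient)
print("Hessian eigenvalues:     ", eigs)
print("SSE at OLS solution:     ", sse(b_ols))
print("SSE at a perturbed point:", sse(b_ols + 0.1))
```

Because the Hessian is constant and positive definite, the stationary point found from the normal equations is the unique global minimum, so any perturbation of `b_ols` should increase the sum of squares.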
An equation with a parameter dependent on an independent variable does not qualify as linear, for example $y = \beta_0 + \beta_1(x) \cdot x$, where $\beta_1(x)$ is a function of $x$.
One should be aware, however, that the parameters that minimize the residuals of the transformed equation do not necessarily minimize the residuals of the original equation.
The equation $y = \beta_0 + \beta_1 x^2$ qualifies as linear, while $y = \beta_0 + \beta_1^2 x$ can be transformed to be linear by replacing $\beta_1^2$ by another parameter, say $\gamma$.
Data transformations are often used to convert an equation into a linear form.
Then the mean squared error of the corresponding estimation is $\operatorname{E}\left[\left(\sum_{j=1}^{K} \lambda_j (\widehat{\beta}_j - \beta_j)\right)^2\right]$; in other words, it is the expectation of the square of the weighted sum (across parameters) of the differences between the estimators and the corresponding parameters to be estimated.
The random variables $\varepsilon_i$ are called the “disturbance”, “noise” or simply “error” (to be contrasted with “residual” later in the article; see errors and residuals in statistics).
This is equivalent to the condition that $\operatorname{Var}(\tilde{\beta}) - \operatorname{Var}(\widehat{\beta})$ is a positive semi-definite matrix for every other linear unbiased estimator $\tilde{\beta}$.
The term “spherical errors” will describe the multivariate normal distribution: if $\operatorname{Var}[\varepsilon \mid X] = \sigma^2 I$ in the multivariate normal density, then the equation $f(\varepsilon) = c$ is the formula for a ball centered at μ with radius σ in n-dimensional space.
One scenario in which perfect multicollinearity occurs is the “dummy variable trap”: a base dummy variable is not omitted, which results in perfect correlation between the dummy variables and the constant term.
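The dummy variable trap can be seen directly in the rank of the design matrix. The following sketch uses a made-up three-level categorical variable: including a constant together with all three indicator columns makes the columns linearly dependent, while dropping one base level restores full column rank.

```python
import numpy as np

group = np.array([0, 1, 2, 0, 1, 2, 0, 1])    # hypothetical 3-level category
dummies = np.eye(3)[group]                    # one indicator column per level
const = np.ones((group.size, 1))              # constant (intercept) column

X_trap = np.hstack([const, dummies])          # constant + all three dummies
X_ok = np.hstack([const, dummies[:, 1:]])     # base level 0 omitted

# The three dummies sum to the constant column, so X_trap is rank deficient.
print(np.linalg.matrix_rank(X_trap), "of", X_trap.shape[1], "columns")  # 3 of 4 columns
print(np.linalg.matrix_rank(X_ok), "of", X_ok.shape[1], "columns")      # 3 of 3 columns
```

With the rank-deficient matrix, $X^{\mathsf T}X$ is singular, so the OLS estimator $(X^{\mathsf T}X)^{-1}X^{\mathsf T}y$ is not uniquely defined.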
Remark
Proof that the OLS estimator indeed minimizes the sum of squares of residuals may proceed as follows, with a calculation of the Hessian matrix and showing that it is positive definite.