Properties of OLS estimators

Population regression line: E(y|x) = β₁ + β₂x.
Observation = systematic component + random error: yᵢ = β₁ + β₂xᵢ + uᵢ.
Sample regression line, estimated using the OLS estimators: ŷᵢ = b₁ + b₂xᵢ.
Observation = estimated relationship + residual: yᵢ = ŷᵢ + eᵢ ⇒ yᵢ = b₁ + b₂xᵢ + eᵢ.

Assumptions underlying the model:
1. Linear model: uᵢ = yᵢ − β₁ − β₂xᵢ.
2. Error terms have mean zero: E(uᵢ|x) = 0 ⇒ E(y|x) = β₁ + β₂xᵢ.
3. Error terms have constant variance, independent of x: Var(uᵢ|x) = σ² = Var(yᵢ|x) (homoscedastic errors).
4. Cov(uᵢ, uⱼ) = Cov(yᵢ, yⱼ) = 0 (no autocorrelation).
5. x is not a constant and is fixed in repeated samples.
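These assumptions describe a data-generating process that is easy to simulate. Below is a minimal Python/NumPy sketch; the parameter values (β₁ = 250, β₂ = 0.73, σ = 50) are hypothetical, chosen to echo the consumption example used later in these notes, and the normal errors anticipate the additional assumption stated next.

    import numpy as np

    rng = np.random.default_rng(seed=42)

    # Hypothetical population parameters (not estimates from any real dataset)
    beta1, beta2, sigma = 250.0, 0.73, 50.0
    N = 100

    x = rng.uniform(500.0, 3000.0, size=N)  # x varies (assumption 5); treated as fixed below
    u = rng.normal(0.0, sigma, size=N)      # errors: mean 0, constant variance, independent
    y = beta1 + beta2 * x + u               # observation = systematic component + random error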
Additional assumption:
6. uᵢ ~ N(0, σ²) ⇒ yᵢ ~ N(β₁ + β₂xᵢ, σ²).

β₁, β₂ are population parameters; b₁, b₂ are the OLS estimators of these population parameters. The actual estimates b₁, b₂ depend on the random sample of data and so vary between samples ⇒ b₁, b₂ are random variables: they follow a distribution, i.e. have a mean (expected value) and a variance. We find the expected values and variances of b₁, b₂ and the covariance between them, i.e. we find the sampling distribution, and ask how b₁, b₂ compare with other estimators of β₁, β₂.
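To see this variation concretely: holding x fixed and redrawing the errors gives a new sample, and hence new estimates, each time. A rough Monte Carlo sketch, continuing from the simulated setup above:

    def ols(x, y):
        """OLS estimates b1, b2 for the simple linear model."""
        xd = x - x.mean()
        b2 = (xd * (y - y.mean())).sum() / (xd ** 2).sum()
        b1 = y.mean() - b2 * x.mean()
        return b1, b2

    R = 5000
    b1s, b2s = np.empty(R), np.empty(R)
    for r in range(R):
        y_r = beta1 + beta2 * x + rng.normal(0.0, sigma, size=N)  # x fixed in repeated samples
        b1s[r], b2s[r] = ols(x, y_r)

    # Empirical sampling distribution: means land near beta1, beta2
    print(b1s.mean(), b1s.std())
    print(b2s.mean(), b2s.std())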
Sampling Distribution:

Variance of the estimator b₂: Var(b₂) = σ² / Σᵢ(xᵢ − x̄)²; standard error of b₂: se(b₂) = √Var(b₂). Var(b₂) varies positively with σ² and negatively with Σᵢ(xᵢ − x̄)², the variation in x.

Variance of the estimator b₁: Var(b₁) = σ² Σᵢxᵢ² / (N Σᵢ(xᵢ − x̄)²); standard error of b₁: se(b₁) = √Var(b₁). Var(b₁) increases in σ² and Σᵢxᵢ² and decreases in N and Σᵢ(xᵢ − x̄)², where σ² is the unknown population variance of uᵢ.

Covariance: Cov(b₁, b₂) = −x̄ σ² / Σᵢ(xᵢ − x̄)². The estimators b₁, b₂ are functions of the yᵢ (the sample data), so the estimators are correlated because both depend on the yᵢ, i.e. they are functions of the same sample. The covariance is negative if x̄ > 0, i.e. if the slope coefficient is underestimated then the intercept is overestimated.
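These formulas can be checked against the Monte Carlo draws above. A sketch, again with the hypothetical simulated data:

    Sxx = ((x - x.mean()) ** 2).sum()

    var_b2 = sigma ** 2 / Sxx                          # Var(b2) = sigma^2 / sum((x - xbar)^2)
    var_b1 = sigma ** 2 * (x ** 2).sum() / (N * Sxx)   # Var(b1) = sigma^2 sum(x^2) / (N Sxx)
    cov_b1b2 = -x.mean() * sigma ** 2 / Sxx            # negative here since xbar > 0

    # Theory vs empirical moments of the Monte Carlo draws
    print(var_b2, b2s.var())
    print(var_b1, b1s.var())
    print(cov_b1b2, np.cov(b1s, b2s)[0, 1])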
Probability Distribution of the estimators

The estimators b₁, b₂ are normally distributed because of assumption 6, i.e. uᵢ ~ N(0, σ²) ⇒ yᵢ ~ N(β₁ + β₂xᵢ, σ²), and b₁, b₂ are linear functions of the normally distributed variable yᵢ: b₂ = Σᵢwᵢyᵢ, where wᵢ = (xᵢ − x̄)/Σᵢ(xᵢ − x̄)², so b₂ is a linear function of the yᵢ. Even without assumption 6, the Central Limit Theorem (Lecture 2) implies that the distributions of b₁, b₂ will approach the normal as N (the sample size) gets larger, if the first five assumptions hold.

Probability distributions of the OLS estimators: b₁ ~ N(β₁, Var(b₁)), b₂ ~ N(β₂, Var(b₂)).

Statistical Properties of OLS estimators

1. Linear Estimator
b₂ = Σᵢwᵢyᵢ, where wᵢ = (xᵢ − x̄)/Σᵢ(xᵢ − x̄)².

2. Unbiasedness
The average or expected value of b₂ equals the true value β₂, i.e. E(b₂) = β₂: on average, OLS gets it right.
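The linearity property is easy to verify numerically: the weights wᵢ depend only on x, and applying them to y reproduces the slope estimate. A sketch with the simulated data:

    # b2 as a linear function of the y_i: b2 = sum(w_i * y_i)
    w = (x - x.mean()) / ((x - x.mean()) ** 2).sum()
    b2_linear = (w * y).sum()

    print(b2_linear, ols(x, y)[1])  # identical up to floating-point rounding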
Algebraic Proof
Assume the regression model is very simple: yᵢ = β₂xᵢ + uᵢ, so that b₂ = Σᵢxᵢyᵢ / Σᵢxᵢ². Substituting for yᵢ:
b₂ = Σᵢxᵢ(β₂xᵢ + uᵢ) / Σᵢxᵢ² = β₂ + Σᵢxᵢuᵢ / Σᵢxᵢ²
⇒ E(b₂) = β₂, because E(uᵢ) = E(xᵢuᵢ) = 0.

Implication: the sampling distribution of the estimator b₂ is centred around the population parameter β₂ ⇒ b₂ is an unbiased estimator of β₂. Note that E(xᵢuᵢ) = 0 is crucial: if this is not true then b₂ would be a biased estimator, i.e. E(b₂) = β₂ + an additional term. (In a plot of the densities, f(b₂) is centred at β₂ with E(b₂) = β₂, while for another estimator B₂ with E(B₂) ≠ β₂, B₂ is a biased estimator of β₂: f(B₂) is not centred at β₂.) b₁ is an unbiased estimator of β₁ also, i.e. E(b₁) = β₁. The unbiasedness property hinges on the model being correctly specified, i.e. E(xᵢuᵢ) = 0 and E(uᵢ) = 0.

Efficiency
An estimator is efficient if it is unbiased and no other unbiased estimator has a smaller variance, i.e. it has the minimum possible variance (see DG Sect 3.4 & Fig 3.8). The OLS estimators b₁, b₂ are the Best Linear Unbiased Estimators (BLUE) of β₁, β₂ when the first 5 assumptions of the linear model hold: b₁, b₂ are linear, unbiased and efficient (they have smaller variance than any other linear unbiased estimator). This result is known as the Gauss-Markov Theorem.

Gauss-Markov Theorem:
- The first 5 assumptions above must hold.
- The OLS estimators are the "best" among all linear and unbiased estimators because they are efficient, i.e. they have the smallest variance among all linear and unbiased estimators.
- Normality is unnecessary to assume: the G-M result does not depend on normality of the dependent variable.
- G-M refers to the estimators b₁, b₂, not to actual values of b₁, b₂ calculated from a particular sample.
- G-M applies only to linear and unbiased estimators; there are other types of estimators which we can use, and these may be better, in which case disregard G-M. E.g. a biased estimator may be more efficient than an unbiased one which fulfils G-M.

3. Consistency
The other properties hold for "small" samples; consistency is a large-sample, i.e. asymptotic, property. As N → ∞, the sampling distributions of the estimators b₁, b₂ collapse onto β₁, β₂. This holds if Var(b₁), Var(b₂) → 0, which is true for b₁, b₂ ⇒ b₁, b₂ are consistent estimators.

Estimator for σ²
σ² is an unknown population parameter: the variance of the unobservable error terms. Its estimator is
σ̂² = Σᵢeᵢ² / (N − 2),
where eᵢ is the residual from the sample regression function; σ̂² is an unbiased estimator of σ².

Hypothesis testing in Regression
Probability distributions of the OLS estimators: b₁ ~ N(β₁, Var(b₁)), b₂ ~ N(β₂, Var(b₂)), with variances estimated using σ̂², where eᵢ is the residual from the sample regression function. We test values of b₁, b₂ calculated from a particular sample against what we believe to be the true value. E.g. for the consumption example, ŷ = 250.18 + 0.73x, where y = monthly expenditure and x = monthly income, so MPC = 0.73. Test if MPC = 0.75: is the difference due to sampling error? (Review hypothesis testing from before: Lecture 3.)

Hypothesis testing Procedure (as before)
1. Formulate the null and alternative hypotheses.
2. Calculate the sample test statistic and specify its distribution.
3. Select the rejection region and compare to the critical value.
4. Accept or reject the null hypothesis.

For a two-sided test the rejection region lies in both tails of the density, beyond −CV and +CV. Calculate tₛ = (b₂ − β₂⁰)/se(b₂) and compare it to the critical value for a rejection region of size α: if |tₛ| exceeds the critical value, reject the null hypothesis (the hypothesised value is outside the (1−α)% CI); otherwise fail to reject the null hypothesis (the hypothesised value is within the (1−α)% CI). We need se(bᵢ) to construct the test statistic; the test statistic will have a t distribution.

Take b₂ ~ N(β₂, Var(b₂)).
1. Transform to a standard normal variable: if X ~ N(μ, σ²) then Z = (X − μ)/σ ~ N(0,1), so
Z = (b₂ − β₂)/√Var(b₂) ~ N(0,1),
where Var(b₂) = σ²/Σᵢ(xᵢ − x̄)² and σ² is unknown. Estimator: σ̂² = Σᵢeᵢ²/(N − 2). We know uᵢ ~ N(0, σ²) ⇒ (uᵢ − 0)/σ ~ N(0,1), standard normal. But the uᵢ and σ² are both unobservable, i.e. they come from the population regression function uᵢ = yᵢ − β₁ − β₂xᵢ.
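Before the derivation continues, a quick numeric sketch of σ̂² and the implied standard errors, continuing with the hypothetical simulated sample from earlier:

    b1_s, b2_s = ols(x, y)
    e = y - (b1_s + b2_s * x)               # residuals from the sample regression function
    sigma2_hat = (e ** 2).sum() / (N - 2)   # unbiased estimator of sigma^2 (N - 2 d.f.)
    se_b2 = np.sqrt(sigma2_hat / ((x - x.mean()) ** 2).sum())
    se_b1 = np.sqrt(sigma2_hat * (x ** 2).sum() / (N * ((x - x.mean()) ** 2).sum()))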
We estimate them from the sample regression function: eᵢ = yᵢ − b₁ − b₂xᵢ. Then Σᵢeᵢ²/σ² = (N − 2)σ̂²/σ² ~ χ²(N−2): it has only N − 2 independent terms because 2 degrees of freedom are "lost" in calculating b₁ and b₂. We then substitute:
1. Z = (b₂ − β₂)/√Var(b₂) ~ N(0,1).
2. Student-t distribution: if Z ~ N(0,1) and V ~ χ²(m), independent of Z, then t = Z/√(V/m) ~ t(m).
With Z as above, V = (N − 2)σ̂²/σ² and m = N − 2, substituting in for Var(b₂) = σ²/Σᵢ(xᵢ − x̄)²: the σ² cancel and the N − 2 cancel, leaving
t = (b₂ − β₂)/se(b₂), where se(b₂) = √(σ̂²/Σᵢ(xᵢ − x̄)²)
⇒ the test statistic for b₁, b₂ follows a t-distribution with N − 2 d.f.
⇒ We have a specified distribution for tests on the estimates b₁, b₂.

Select a 1- or 2-sided test. In probability terms:
Two-sided: P(t < −tα/2) = P(t > tα/2) = α/2 ⇒ P(−tα/2 < t < tα/2) = 1 − α.
One-sided: P(t < −t꜀) = P(t > t꜀) = α.
For many degrees of freedom and α = 5%, i.e. a 95% level of confidence, t꜀ = 1.96.

Confidence intervals:
P(−tα/2 < (b₂ − β₂)/se(b₂) < tα/2) = 1 − α
⇒ P(b₂ − tα/2·se(b₂) < β₂ < b₂ + tα/2·se(b₂)) = 1 − α.
Similarly for b₁: P(b₁ − tα/2·se(b₁) < β₁ < b₁ + tα/2·se(b₁)) = 1 − α.

Hypothesis Testing
1. Formulate the null and alternative hypotheses; the alternative depends on whether the test is 1- or 2-tailed, e.g. H₀: β₂ = 0, H₁: β₂ ≠ 0 (two-sided).
2. Specify the test statistic and appropriate distribution: t = (b₂ − β₂)/se(b₂) ~ t(N−2).
3. Choose the rejection region (significance level α).
4. Calculate the test statistic for the sample.
5. Reject / fail to reject the null hypothesis: if |t| > tα/2, reject the null hypothesis (two-sided); if |t| > tα, reject the null hypothesis (one-sided).
6. State the conclusion.
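Putting the pieces together, a sketch of this procedure for H₀: β₂ = 0.75 against H₁: β₂ ≠ 0.75, using the hypothetical simulated sample rather than the notes' actual consumption data (SciPy is assumed available for the critical value):

    from scipy import stats

    alpha = 0.05
    beta2_H0 = 0.75

    t_stat = (b2_s - beta2_H0) / se_b2                  # steps 2 and 4: sample test statistic
    t_crit = stats.t.ppf(1.0 - alpha / 2.0, df=N - 2)   # step 3: two-sided critical value

    reject = abs(t_stat) > t_crit                       # step 5: reject if |t| > t_{alpha/2}
    ci = (b2_s - t_crit * se_b2, b2_s + t_crit * se_b2) # 95% CI for beta2
    print(t_stat, t_crit, reject, ci)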
Prediction
For a given x = x₀, use the estimates b₁, b₂ to predict y₀: ŷ₀ = E(y|x₀) = b₁ + b₂x₀. The prediction differs from the actual outcome: b₁, b₂ are estimates and not always equal to β₁, β₂, and randomness occurs through the error term.

Measuring goodness of fit
How close is the sample regression to the population regression, i.e. how well does the estimated model "fit" the data? yᵢ = b₁ + b₂xᵢ + eᵢ: a systematic component that is estimated, plus a residual.
Explained variation: ŷᵢ = b₁ + b₂xᵢ.
Unexplained variation: eᵢ = yᵢ − ŷᵢ = yᵢ − b₁ − b₂xᵢ.
Since yᵢ = ŷᵢ + eᵢ, we have yᵢ − ȳ = (ŷᵢ − ȳ) + eᵢ, where yᵢ − ȳ is the total variation around the mean and ŷᵢ − ȳ is the variation of the fitted values around the mean. Squaring and summing:
Σᵢ(yᵢ − ȳ)² = Σᵢ(ŷᵢ − ȳ)² + Σᵢeᵢ²
Total sum of squares = Explained sum of squares + Residual sum of squares
TSS = ESS + RSS.
If the sample regression line fits perfectly, TSS = ESS, i.e. ESS/TSS = 1. If the sample regression line is very poor, ESS/TSS = 0. R² is used as a measure of goodness of fit:
R² = ESS/TSS = (TSS − RSS)/TSS = 1 − RSS/TSS = 1 − Σᵢeᵢ² / Σᵢ(yᵢ − ȳ)².
R² is known as the coefficient of determination: it is a descriptive statistic and should not be used as a measure of the quality of the model. It measures the proportion of variation explained by the linear model. In the simple linear model, √R² = r, i.e. the correlation coefficient between x and y: 0 ≤ R² ≤ 1 ⇒ −1 ≤ r ≤ 1.
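Finally, a sketch of the decomposition and R², continuing with the simulated sample; the last line checks that in the simple model R² equals the squared correlation between x and y:

    y_hat = b1_s + b2_s * x

    TSS = ((y - y.mean()) ** 2).sum()      # total variation around the mean
    ESS = ((y_hat - y.mean()) ** 2).sum()  # variation of fitted values around the mean
    RSS = (e ** 2).sum()                   # residual sum of squares

    R2 = 1.0 - RSS / TSS                   # equivalently ESS / TSS
    r_xy = np.corrcoef(x, y)[0, 1]
    print(R2, ESS / TSS, r_xy ** 2)        # R^2 = r_xy^2 in the simple linear model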