# Johns Hopkins University Econometrics Problem Set Using STATA Questions

Econometrics, Spring 2022
NAME:
FINAL Exam
MULTIPLE CHOICE QUESTIONS (1 point/question, 20 points total) - CIRCLE the right answer
1. Which of the following is true of heteroskedasticity?
a. Heteroskedasticity causes inconsistency in the Ordinary Least Squares estimators.
b. The population R² is affected by the presence of heteroskedasticity.
c. The Ordinary Least Squares estimators are not the best linear unbiased estimators
if heteroskedasticity is present.
d. It is not possible to obtain F statistics that are robust to heteroskedasticity
of an unknown form.
2. Which of the following is a difference between the White test and the Breusch-Pagan test?
a. The White test is used for detecting heteroskedasticity in a linear regression
model while the Breusch-Pagan test is used for detecting autocorrelation.
b. The White test is used for detecting autocorrelation in a linear regression model
while the Breusch-Pagan test is used for detecting heteroskedasticity.
c. The number of regressors used in the White test is larger than the number of
regressors used in the Breusch-Pagan test.
d. The number of regressors used in the Breusch-Pagan test is larger than the number
of regressors used in the White test.
3. Which of the following is true?
a. In ordinary least squares estimation, each observation is given a different
weight.
b. In weighted least squares estimation, each observation is given an identical
weight.
c. In weighted least squares estimation, less weight is given to observations with a
higher error variance.
d. In ordinary least squares estimation, less weight is given to observations with a
lower error variance.
4. A proxy variable _____.
a. increases the error variance of a regression model
b. cannot contain binary information
c. is used when data on a key independent variable is unavailable
d. is detected by running the Davidson-MacKinnon test
5. A measurement error occurs in a regression model when _____.
a. the observed value of a variable used in the model differs from its actual value
b. the dependent variable is binary
c. the partial effect of an independent variable depends on unobserved factors
d. the model includes more than two independent variables.
6. Which of the following types of sampling always causes bias or inconsistency in
the ordinary least squares estimators?
a. Random sampling
b. Exogenous sampling
c. Endogenous sampling
d. Stratified sampling
7. Refer to the following model.
$y_t = \alpha_0 + \beta_0 s_t + \beta_1 s_{t-1} + \beta_2 s_{t-2} + \beta_3 s_{t-3} + u_t$
This is an example of a(n):
a. infinite distributed lag model.
b. finite distributed lag model of order 1.
c. finite distributed lag model of order 2.
d. finite distributed lag model of order 3.
8. The model $y_t = e_t + \beta_1 e_{t-1} + \beta_2 e_{t-2}$, $t = 1, 2, \ldots$, where $e_t$ is an i.i.d.
sequence with zero mean and variance $\sigma_e^2$, represents a(n):
a. static model.
b. moving average process of order one.
c. moving average process of order two.
d. autoregressive process of order two.
9. Which of the following is a test for serial correlation in the error terms?
a. Johansen test
b. Dickey Fuller test
c. Durbin Watson test
d. White test
10. A regression model exhibits unobserved heterogeneity if _____.
a. there are unobserved factors affecting the dependent variable that change over
time
b. there are unobserved factors affecting the dependent variable that do not change
over time
c. there are unobserved factors which are correlated with the observed independent
variables
d. there are no unobserved factors affecting the dependent variable
11. A pooled OLS estimator that is based on the time-demeaned variables is called
the _____.
a. random effects estimator
b. fixed effects estimator
c. least absolute deviations estimator
d. instrumental variable estimator
12. What will you conclude about a regression model if the Breusch-Pagan test
results in a small p-value?
a. The model contains homoskedasticity.
b. The model contains heteroskedasticity.
c. The model contains dummy variables.
d. The model omits some important explanatory factors.
13. Consider the following regression model: $\log(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_1^2 + \beta_3 x_3 + u$. This
model will suffer from functional form misspecification if _____.
a. $\beta_0$ is omitted from the model
b. u is heteroskedastic
c. $x_1^2$ is omitted from the model
d. $x_3$ is a binary variable
14. Which of the following is true of Regression Specification Error Test (RESET)?
a. It tests if the functional form of a regression model is misspecified.
b. It detects the presence of dummy variables in a regression model.
c. It helps in the detection of heteroskedasticity when the functional form of the
model is correctly specified.
d. It helps in the detection of multicollinearity among the independent variables in
a regression model.
15. Which of the following is a difference between least absolute deviations (LAD)
and ordinary least squares (OLS) estimation?
a. OLS is more computationally intensive than LAD.
b. OLS is more sensitive to outlying observations than LAD.
c. OLS is justified for very large sample sizes while LAD is justified for smaller
sample sizes.
d. OLS is designed to estimate the conditional median of the dependent variable
while LAD is designed to estimate the conditional mean.
16. The model $Y_t = \beta_0 + \beta_1 c_t + u_t$, $t = 1, 2, \ldots, n$, is an example of a(n):
a. autoregressive conditional heteroskedasticity model.
b. static model.
c. finite distributed lag model.
d. infinite distributed lag model.
17. Refer to the following model.
$y_t = \alpha_0 + \beta_0 s_t + \beta_1 s_{t-1} + \beta_2 s_{t-2} + \beta_3 s_{t-3} + u_t$
$\beta_0 + \beta_1 + \beta_2 + \beta_3$ represents:
a. the short-run change in y given a temporary increase in s.
b. the short-run change in y given a permanent increase in s.
c. the long-run change in y given a permanent increase in s.
d. the long-run change in y given a temporary increase in s.
18. A seasonally adjusted series is one which:
a. has had seasonal factors added to it.
b. has seasonal factors removed from it.
c. has qualitative explanatory variables representing different seasons.
d. has qualitative dependent variables representing different seasons.
19. The model $y_t = y_{t-1} + e_t$, $t = 1, 2, \ldots$, represents a:
a. AR(2) process.
b. MA(1) process.
c. random walk process.
d. random walk with a drift process.
20. In the presence of serial correlation:
a. estimated standard errors remain valid.
b. estimated test statistics remain valid.
c. estimated OLS values are not BLUE.
d. estimated variance does not differ from the case of no serial correlation.
Answer all five questions. Each question is worth 10 points for a total of 50.
QUESTION 1: Time Series
Consider the following dataset and regression
Variable name   Label
lnpq            Log value of US exports to Mexico
lntrate         Log of Mexico's tariffs on US exports to Mexico
time            Month number
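. regress lnpq lntrate time

-------------------------------------------------------------------------------
        lnpq |      Coef.  Std. Err.        t    P>|t|     [95% Conf. Interval]
-------------+-----------------------------------------------------------------
     lntrate |  -1.502397   .3092717    -4.86    0.000    -2.118498    -.886296
        time |   .0043872   .0007396     5.93    0.000     .0029138    .0058606
       _cons |   9.329039   1.253627     7.44    0.000     6.831687    11.82639
-------------------------------------------------------------------------------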
a) Suppose you show this result to your professor and argue that you have found that
Mexico's tariffs had a significant negative effect on exports to Mexico. She says the
result could be spurious if both lnpq and lntrate have unit roots (i.e. are highly
persistent time series). Explain what she means by that.
b) You decide to test whether lnpq and lntrate have unit roots, as follows:
. corr lntrate L.lntrate

             |  lntrate L.lntrate
-------------+-------------------
     lntrate |
         --. |   1.0000
         L1. |   0.9757    1.0000
. corr lnpq L.lnpq

             |     lnpq    L.lnpq
-------------+-------------------
        lnpq |
         --. |   1.0000
         L1. |   0.6399    1.0000

Explain carefully what these results tell us.
c) Your professor is still not satisfied, and insists you need to run a Dickey Fuller test.
You do the following:
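. dfuller lntrate, trend

Dickey-Fuller test for unit root                   Number of obs   =        77

                               ---------- Interpolated Dickey-Fuller ---------
                  Test         1% Critical       5% Critical      10% Critical
               Statistic           Value             Value             Value
------------------------------------------------------------------------------
 Z(t)             -0.164            -4.091            -3.473            -3.164
------------------------------------------------------------------------------
MacKinnon approximate p-value for Z(t) = 0.9921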
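. dfuller lnpq, trend

Dickey-Fuller test for unit root                   Number of obs   =        77

                               ---------- Interpolated Dickey-Fuller ---------
                  Test         1% Critical       5% Critical      10% Critical
               Statistic           Value             Value             Value
------------------------------------------------------------------------------
 Z(t)             -4.364            -4.091            -3.473            -3.164
------------------------------------------------------------------------------
MacKinnon approximate p-value for Z(t) = 0.0025

Explain what these results tell us. Do they agree with your findings in b)?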
d) Given these findings, and given that lntrate only changes 8 times in 78 months, would
you recommend taking first differences for this regression? Explain.
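
For reference, a first-differenced version of the regression in this question could be run roughly as in the sketch below. It assumes the dataset is already in memory and that `time` is a gap-free monthly index; it uses only the variable names listed in the table above. Whether differencing is advisable here is exactly what part d) asks, and the sketch does not settle that.

```stata
* Declare the data as a time series indexed by month number
tsset time

* D. is Stata's first-difference operator: D.x is x in month t minus x in month t-1
* The constant in the differenced equation plays the role of the
* linear time trend in the levels equation
regress D.lnpq D.lntrate
```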
QUESTION 2: Panel Data
Consider the following population regression function: $Y_{it} = a + bX_{it} + a_i + e_{it}$ with
$E(e_{it} \mid X, a) = 0$. The $a_i$ terms are not observed but we have data on Y and X. Suppose
$i = 1, \ldots, N$ and $N = 300$, while $t = 1, \ldots, T$ and $T = 10$. And suppose we are told the following two facts:
Fact 1: $\mathrm{Corr}(X_{it}, a_i) = 0.7$
Fact 2: $\mathrm{Corr}(e_{it}, e_{i,t-1}) = 0.95$
a) Suppose we estimate this equation using OLS. Will the parameter estimates be consistent?
Will the standard errors be valid? Explain.
b) Describe carefully how we could estimate this equation in First Differences. What does
that mean and would it lead to consistent parameter estimates? Explain.
c) Would the standard errors in the equation you propose for part b) be valid? Explain.
d) Describe carefully how we could estimate this equation using Fixed Effects. What does
that mean and would it lead to consistent parameter estimates? Explain.
e) Which method (FD or FE) would be likely to be more efficient? Why?
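
As a point of reference for parts b) to e), the two estimators could be run in Stata along the lines of the sketch below. The variable names `id`, `t`, `y` and `x` are hypothetical stand-ins for the unit index, the period index, Y and X; they do not come from the exam. Clustering the standard errors by unit is one common choice, not the only one.

```stata
* Assumed (hypothetical) variable names: id = unit i, t = period, y, x
xtset id t

* First differences: both the intercept a and the unit effect a_i drop out
* of the differenced equation, hence the noconstant option.
* Clustering by id allows arbitrary serial correlation within a unit.
regress D.y D.x, noconstant vce(cluster id)

* Fixed effects (within estimator): OLS on the time-demeaned data
xtreg y x, fe vce(cluster id)
```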
QUESTION 3: Differences in Differences
Consider the following dataset of workers in South Africa, observed in repeated cross-sections (not a panel), before and after a minimum wage law was passed that affected only
domestic service workers:
Variable   Label
loghwage   Log hourly wage
post       =1 if After Minimum Wage Law takes effect
domestic   =1 if Domestic Service Worker
postXdom   Interaction: Post X Domestic
I run a differences-in-differences regression to see if the minimum wage did indeed raise
the wages of domestic workers (i.e. was the law enforced, which is not obvious!):
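-------------------------------------------------------------------------------
    loghwage |      Coef.  Std. Err.        t    P>|t|     [95% Conf. Interval]
-------------+-----------------------------------------------------------------
        post |   .1560798    .031943     4.89    0.000     .0934516     .218708
    domestic |  -.7097982   .0478374   -14.84    0.000    -.8035892   -.6160071
    postXdom |   .1339875   .0691446     1.94    0.053    -.0015788    .2695537
       _cons |   1.568741   .0225354    69.61    0.000     1.524558    1.612924
-------------------------------------------------------------------------------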
a) What does this regression tell us about the effect of the minimum wage? State the size
of the effect and evaluate its statistical significance at the 5% and 10% levels.
b) Explain the assumptions that are needed if the Differences in Differences estimator is
to provide an unbiased estimate of the effect of the minimum wage change.
c) Fill in the cells in the table below by providing the predicted log wage for each
category:

          Domestic Workers       Non-Domestic Workers
          (Min wage changed)     (Min wage did not change)
Before    __________             __________
After     __________             __________
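
One way to obtain the four cell values in Stata, rather than adding coefficients by hand, is sketched below. The exam shows only the regression output, so the `regress` command here is an assumption about how that output was produced; the variable names are those in the table above.

```stata
* Re-run the difference-in-differences regression (command assumed, not shown in the exam)
regress loghwage post domestic postXdom

* Each cell is a sum of coefficients; lincom reports the sum with a standard error
* Non-domestic workers, before
lincom _cons
* Non-domestic workers, after
lincom _cons + post
* Domestic workers, before
lincom _cons + domestic
* Domestic workers, after
lincom _cons + post + domestic + postXdom
```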
d) Now consider the following regression with additional control variables.
-------------------------------------------------------------------------------
    loghwage |      Coef.  Std. Err.        t    P>|t|     [95% Conf. Interval]
-------------+-----------------------------------------------------------------
        post |     .19254   .0286959     6.71    0.000     .1362782    .2488019
    domestic |  -.5576953   .0436256   -12.78    0.000    -.6432286    -.472162
    postXdom |   .0881411   .0620682     1.42    0.156    -.0335511    .2098334
         age |   .0660153   .0076873     8.59    0.000     .0509434    .0810872
       agesq |  -.0006613   .0000927    -7.14    0.000     -.000843   -.0004796
        educ |   .0657971   .0035346    18.62    0.000     .0588671     .072727
      tenure |   .0044043   .0003869    11.38    0.000     .0036457    .0051628
    tenuresq |  -6.29e-06   1.12e-06    -5.64    0.000    -8.47e-06   -4.10e-06
       _cons |  -.7256701   .1552253    -4.67    0.000    -1.030008    -.421332
-------------------------------------------------------------------------------

Explain what this tells us about the effect of the minimum wage.
Why is this different from what we got before?
Which result seems like the more reliable estimate? Explain.
QUESTION 4: Instrumental variables
Consider the simple linear regression Y = a + b*X + u
a) What is meant by the statement that X is endogenous and what are TWO possible causes
of this problem?
b) What are the consequences of endogeneity?
c) Explain what an instrumental variable is, and provide a clear explanation of what
conditions must be met for Z to be a valid IV for X.
d) Explain how we can or cannot tell if each of these conditions is met.
e) Provide the formula for the IV estimator (in a regression of Y on X using Z as an
instrument for X) and explain carefully how 2SLS implements the IV estimator.
f) Explain how IV can be used to deal with the problem of omitted variable bias. Where
might we look for appropriate instruments? Provide a concrete example.
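
For orientation, the mechanics of 2SLS in Stata might look like the sketch below. The variable names `y`, `x` and `z` are hypothetical placeholders for Y, X and Z in this question. The manual two-step version is included only to show what the two stages are; its second-stage standard errors should not be used.

```stata
* Assumed (hypothetical) variable names: y = outcome, x = endogenous regressor, z = instrument

* 2SLS in one step, with heteroskedasticity-robust standard errors
ivregress 2sls y (x = z), vce(robust)

* First-stage diagnostics (strength of the instrument)
estat firststage

* The same point estimate by hand, to make the two stages explicit
regress x z
predict xhat, xb
regress y xhat
* Standard errors from this manual second stage are not valid,
* because they ignore that xhat is itself estimated
```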
QUESTION 5: Heteroskedasticity
a) Explain and define heteroskedasticity carefully, using mathematical notation.
b) Give an example of how and why heteroskedasticity might arise in economic data.
c) Evaluate this claim: Heteroskedasticity leads to biased parameter estimates and
incorrect standard errors. Explain your response carefully with reference to the MLR
assumptions.
d) After running the regression from Question 1, I performed the following calculations.
Explain what test I am doing here, how that test works, and what the result is. Is there
evidence of heteroskedasticity?
. predict uhat
. gen uhatsq=uhat^2
. regress uhatsq age agesq educ female domestic femXdom

      Source |       SS           df       MS      Number of obs   =     1,000
-------------+----------------------------------   F(6, 993)       =   3547.74
       Model |  1607.96882         6  267.994803   Prob > F        =    0.0000
    Residual |  75.0107213       993  .075539498   R-squared       =    0.9554
-------------+----------------------------------   Adj R-squared   =    0.9552
       Total |  1682.97954       999  1.68466421   Root MSE        =    .27484

-------------------------------------------------------------------------------
      uhatsq |      Coef.  Std. Err.        t    P>|t|     [95% Conf. Interval]
-------------+-----------------------------------------------------------------
         age |   .2606388   .0048464    53.78    0.000     .2511283    .2701492
       agesq |   -.002223   .0000591   -37.65    0.000    -.0023389   -.0021071
        educ |   .2033015   .0025437    79.92    0.000     .1983098    .2082931
      female |   -.903941   .0217888   -41.49    0.000    -.9466984   -.8611837
    domestic |  -1.786472   .0438379   -40.75    0.000    -1.872498   -1.700447
     femXdom |   .8012996   .0513068    15.62    0.000     .7006174    .9019819
       _cons |  -5.027951   .0979188   -51.35    0.000    -5.220103     -4.8358
-------------------------------------------------------------------------------
e) Explain in detail how FGLS may be used to correct for heteroskedasticity [Note: this is
a question about FGLS in general, not about the particular approach used for the linear
probability model.]
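
As a rough illustration of the mechanics, one common textbook variant of FGLS models the error variance as an exponential function of the regressors and could be sketched in Stata as below. The variable names `y`, `x1`, `x2`, `ehat`, `ghat` and `hhat` are hypothetical, and the exponential variance function is an assumption of this sketch rather than something the exam specifies.

```stata
* Assumed (hypothetical) variable names: y = outcome, x1 and x2 = regressors

* Step 1: OLS on the original model, keep the residuals
regress y x1 x2
predict ehat, residuals

* Step 2: regress the log of the squared residuals on the regressors
* (this assumes Var(u|x) = sigma^2 * exp(linear index in x))
gen logehatsq = ln(ehat^2)
regress logehatsq x1 x2
predict ghat, xb

* Step 3: form the estimated variance function and run WLS with weights 1/hhat
gen hhat = exp(ghat)
regress y x1 x2 [aweight = 1/hhat]
```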