Problem Set 2, Econ 120C
Yixiao Sun
Question I Read the following Stata program
clear
cap postclose tempid
postfile tempid beta reject1 reject2 ///
    using mydata.dta, replace
forvalues i = 1(1)1000 {
    drop _all
    quietly set obs 1000
    gen x = rnormal()
    gen e = rnormal()
    gen u = x + e
    gen y = 3*x + u
    quietly reg y x, r
    scalar beta = _b[x]
    qui test x = 3
    sca reject1 = (r(p) < 0.10)
    qui test x = 4
    sca reject2 = (r(p) < 0.10)
    post tempid (beta) (reject1) (reject2)
}
postclose tempid
use mydata.dta, clear
sum
Part of the output is given in the table below:

Variable    Mean    Std. Dev.
beta        b       s
reject1     r1      s1
reject2     r2      s2

Answer the following questions with detailed arguments.
(a) What would you expect the value of b to be?
(b) What would you expect the value of r1 to be?
(c) What would you expect the value of r2 to be?
(d) What would you expect the value of s1 to be?
(e) What would you expect the value of s2 to be?
Question II Consider the linear regression model
$Y_i = \alpha + \beta X_i + \gamma W_i + u_i.$
Given the sample $(X_i, W_i, Y_i)_{i=1}^{n}$, we run the following regressions:
(i) Regress $Y$ on $X$ and a constant/intercept. Denote the OLS estimator of the slope on $X$ as $\hat\beta_{short}$.
(ii) Regress $W$ on $X$ and a constant/intercept. Denote the OLS estimator of the slope on $X$ as $\hat\delta$.
(iii) Regress $Y$ on $X$ and $W$ and a constant/intercept. Denote the OLS estimators of $\alpha$, $\beta$, $\gamma$ by $\hat\alpha_{long}$, $\hat\beta_{long}$, $\hat\gamma_{long}$, and define
$\hat u_i = Y_i - (\hat\alpha_{long} + X_i \hat\beta_{long} + W_i \hat\gamma_{long}).$
By construction, we know that $\widehat{E}(\hat u) = 0$, $\widehat{cov}(X, \hat u) = 0$, and $\widehat{cov}(W, \hat u) = 0$.
(a) Write down the formulae for $\hat\beta_{short}$ and $\hat\delta$.
(b) Show that
$\hat\beta_{short} = \hat\beta_{long} + \hat\delta\,\hat\gamma_{long}.$
Hint: Plug the definition $Y_i = \hat\alpha_{long} + X_i \hat\beta_{long} + W_i \hat\gamma_{long} + \hat u_i$ into the formula for $\hat\beta_{short}$ and then use the operating rules for sample covariances to simplify.
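A sketch of the algebra the hint points to, using the symbol names assumed above:
$\hat\beta_{short} = \frac{\widehat{cov}(X, Y)}{\widehat{var}(X)} = \frac{\widehat{cov}(X,\ \hat\alpha_{long} + X\hat\beta_{long} + W\hat\gamma_{long} + \hat u)}{\widehat{var}(X)} = \hat\beta_{long} + \hat\gamma_{long}\,\frac{\widehat{cov}(X, W)}{\widehat{var}(X)} = \hat\beta_{long} + \hat\delta\,\hat\gamma_{long},$
where the second equality uses the definition of $\hat u_i$, the third uses $\widehat{cov}(X, \hat u) = 0$, and the last uses $\hat\delta = \widehat{cov}(X, W)/\widehat{var}(X)$ from regression (ii).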
Question III In a 1994 research paper, two economists examined the impact of looks on earnings using interviewers' ratings of respondents' physical appearance. They used data in which respondents reported their wages, and the interviewers rated the respondents' appearance using five categories (1. homely, 2. quite plain, 3. average, 4. good looking, and 5. strikingly handsome or beautiful). Download the file beauty.xls from the course web page. The file contains the following variables: hourly earnings, looks, female (equal to 1 if female, 0 otherwise), and years of education. Assume that a population consists of all individuals in the data set.
(a) Estimate the following regression
$ernhr = \alpha + \beta \cdot looks + \gamma \cdot yrseduc + u$   (1)
using the population data (i.e., use all the data). What is the value of (the estimated) $\beta$? Denote this value as $\beta_0$.
(b) (i) Draw a simple random sample of size 100.
(ii) Estimate (1) using the sample you obtained.
(iii) Test the null hypothesis $H_0: \beta = \beta_0$ against $H_1: \beta \neq \beta_0$ using 10% as the size of the test. Let reject be a dummy variable indicating whether $H_0$ is rejected (reject = 1 if $H_0$ is rejected).
(iv) Save $\hat\beta_{OLS}$, its robust standard error, and the dummy variable reject to a Stata data set, say mydata.dta. The data set contains three variables, say beta_hat, se_hat, reject (you can use your own favorite names for the data set and the variables).
(c) Repeat (b) 1000 times.
(d) Now load the Stata data set mydata.dta into Stata and graph the histogram of beta_hat. Is it close to being normal?
(e) Summarize all variables. Is the mean of reject close to 10%? Is your answer expected? Can you explain why or why not?
(f) Is the standard deviation of beta_hat close to the mean of se_hat? Is your answer expected? Can you explain why or why not?
Note: A sample Stata program is posted on the course home page. You are encouraged to write your own program. The sample program focuses on a different parameter than $\beta$, so you have to modify it to accommodate this and other differences. One possible sketch is given below.
This question is designed to help you understand the sampling distribution of the OLS estimator. It is also very good practice for Stata programming.
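For concreteness, here is a minimal sketch of such a program, in the spirit of the Question I code. It assumes beauty.xls has been imported and saved as beauty.dta with lowercase variable names (ernhr, looks, yrseduc), and that the scalar b0 holds the $\beta_0$ you found in part (a); adjust file names, variable names, and values to your own setup.

clear
scalar b0 = 0   // placeholder: replace with your beta_0 from part (a)
cap postclose simid
postfile simid beta_hat se_hat reject using mysim.dta, replace
forvalues i = 1(1)1000 {
    use beauty.dta, clear               // assumed name of the converted data file
    sample 100, count                   // simple random sample of size 100
    quietly reg ernhr looks yrseduc, r  // robust standard errors
    sca beta_hat = _b[looks]
    sca se_hat = _se[looks]
    qui test looks = `=scalar(b0)'      // H0: beta = beta_0
    sca reject = (r(p) < 0.10)
    post simid (beta_hat) (se_hat) (reject)
}
postclose simid
use mysim.dta, clear
histogram beta_hat, normal              // part (d)
sum                                     // parts (e) and (f)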
ERNHR LOOKS FEMALE YRSEDUC
5.725481 4 1 14
4.284616 3 1 12
7.963461 4 1 12
11.57033 3 0 16
11.41827 3 0 16
3.90625 2 1 12
8.760684 3 0 16
7.692307 4 0 16
5 2 1 16
3.894231 2 1 12
3.445513 2 1 12
4.030769 2 0 16
5.137362 2 0 17
2.999231 2 0 16
7.988166 4 0 16
6.009615 4 0 16
5.164835 3 0 17
11.53846 4 0 17
10.43956 4 1 17
7.692307 3 1 16
7.692307 4 0 15
6.793334 4 0 14
6.868132 4 0 12
17.03297 2 0 17
10.05245 2 0 13
15.81197 2 0 13
14.83517 2 0 13
19.08007 4 0 17
8.345428 3 0 13
9.615385 3 0 12
5.961538 2 0 10
5.725481 2 0 10
6.730769 3 0 10
8.173077 4 0 13
12.39316 4 0 12
9.615385 3 0 13
4.825175 2 0 10
7.692307 3 0 9
12.5 4 0 11
2.821906 3 0 9
12.30769 4 0 17
13.22115 4 0 16
16.82692 4 0 16
11.53846 3 0 16
4.945055 3 1 9
7.211538 3 0 11
11.58371 3 0 12
15.38461 3 0 17
6.102071 2 0 11
7.932693 3 0 17
9.157509 4 1 17
6.912682 2 1 17
20.18084 3 0 17
9.925558 3 0 17
7.788462 4 0 13
5.128205 3 0 10
10.98901 4 0 16
6.555944 2 0 13
10.09615 3 0 17
10.73601 3 0 16
16.34615 3 0 17
4.615385 2 1 12
7.489879 3 0 17
19.23077 3 0 17
4.771234 2 1 17
7.142857 2 1 17
14.42308 4 0 17
9.278846 4 0 17
29.98297 4 0 17
32.79387 4 0 16
9.615385 3 0 16
8.361204 4 1 16
21.63461 4 0 17
15.90006 4 0 17
5.769231 2 0 10
20.98808 4 0 17
23.32009 4 0 17
4.965035 2 0 11
6.888634 2 0 12
5.725481 3 1 13
5.192307 3 0 13
7.79359 3 1 17
8.791209 3 1 16
9.317307 4 1 17
8.333333 3 1 17
10.12404 4 1 13
1.975481 2 1 8
8.241758 4 1 17
5.048077 2 1 14
6.552707 3 1 16
4.471154 3 0 10
7.963461 3 1 16
6.25 2 0 17
4.134615 3 0 16
6.730769 3 1 13
7.692307 3 1 17
4.349817 2 1 14
7.692307 3 0 14
4.807693 2 0 9
5.244755 3 0 9
2.142857 2 0 8
4.487179 3 0 10
3.605769 2 0 10
5.448718 2 0 12
5 4 0 13
6.730769 3 0 12
10.05128 3 0 17
6.538462 4 0 12
9.203671 4 1 17
4.615385 3 1 16
3.365385 4 1 13
5.53613 2 1 17
6.10718 3 1 12
4.990385 3 1 13
5.879121 3 1 17
5.659616 4 1 16
3.998974 3 0 16
6.868132 3 1 16
4.373024 2 1 17
5.769231 3 0 13
5.769231 4 0 14
12.30769 4 0 17
19.08007 2 0 17
7.554945 4 1 17
5.90035 3 1 17
6.370769 3 1 17
6.153846 4 0 17
1.923077 2 1 17
3.269231 3 0 17
2.179487 4 0 17
5.616606 3 0 17
9.294871 4 1 17
6.410256 3 1 17
7.45842 3 1 16
6.586538 2 1 17
6.730769 2 1 17
5.608974 3 1 16
3.653846 4 1 16
13.01282 4 0 17
6.968326 3 0 17
7.692307 3 1 17
6.730769 3 0 16
5.128205 3 1 17
2.136752 4 1 16
4.529914 2 1 14
9.101099 2 1 17
9.78022 3 1 17
3.634615 2 1 13
7.211538 4 1 17
10.61795 4 1 17
4.615385 2 1 17
6.538462 2 1 14
12.11539 4 1 17
7.211538 4 1 16
6.25 4 1 16
3.846154 2 1 17
3.030769 3 1 16
2.010489 2 1 16
3.895105 2 1 17
6.346154 3 1 17
2 2 1 14
7.692307 4 0 17
2.633974 2 1 12
5 3 1 17
6.543406 3 1 17
5.192307 3 1 16
9.745421 4 1 16
2.808989 2 1 13
4.615385 3 0 17
7.326007 3 0 17
7.692307 4 0 17
3.259615 3 0 17
5.769231 2 1 17
5.594406 3 0 10
5.128205 3 0 10
14.74359 4 0 17
3.671329 3 1 9
9.731935 4 0 19
3.461539 3 0 9
6.370769 4 0 12
6.581197 2 1 12
10.14957 4 0 16
9.230769 3 0 17
6.749359 3 0 11
10.21635 3 1 17
4.759615 3 0 9
1.755983 3 1 17
12.73389 4 0 17
5.85 4 0 10
3.173077 4 0 9
5.994231 4 0 11
4.615385 3 0 9
1.795892 4 0 9
5.952381 3 0 11
7.45633 3 0 14
8.699634 4 0 15
3.749038 2 0 9
7.692307 3 0 14
5.244755 3 0 11
10.09615 2 0 16
Causality and Causal Model: Bivariate Case
Causality and Causal Model: Definition
- There are many ways to define causality and causal models; I present one of many possibilities here.
- Let us first consider the simple case with only two scalar variables y and x. Both variables are deterministic.
- We want to define the notion "x causes y".
- Either there are no other variables, or other variables do not interfere with the relationship between x and y.
SET x, FORCE x, INTERVENE x
- Our definition involves the following conceptual steps:
1. We (as the experimenter) set x to each of its possible values (or force x to take each of its possible values, or intervene and let x take each of its possible values).
2. We let y respond freely without any further intervention, direct or indirect.
3. We observe that y takes a unique value for each setting of x (setting: a particular value that x takes).
4. We ask: does y take different values for different x? If yes, then x causes y. Otherwise, x does not cause y.
Causality and Causal Model: Definition
- Let c(x) be the unique value of y for each x. We call c(x) the response function.
- Mathematically, if c(x) is not a constant function, then we say that x causes y. In this case, we call c(x) the causal function.
- When x causes y, we write
    $y \leftarrow c(x).$
  Here we use an arrow $\leftarrow$ instead of the equal sign = to indicate the causality and causal direction; that is, x is set and y is free to respond (x is the cause, y is the effect).
No Intervention, No Causality
- We call a variable a settable variable if we can intervene at will to set it to any desired value. Some variables are not settable: race, for example.
- We call a variable a free variable if we do not intervene to set its value.
- Here x is a settable variable and y is a free variable.
- The importance of the formal role played here by intervention/setting cannot be over-emphasized. The notion of cause and effect we adopt here has meaning only in the context of intervention, whether actual or merely hypothetical.
Causality and Causal Model: Notation
- A main source of confusion in causal inference is bad notation. Instead of using $y \leftarrow c(x)$, we almost always use $y = c(x)$.
- The notation $y \leftarrow c(x)$ signifies that the lhs and rhs are not exchangeable: causes are different from effects.
- The math equality $y = c(x)$ encodes the exchangeability of the two sides: $y = c(x)$ iff $c(x) = y$.
- We use the new and better notation only in this and a few future lectures, as the conventional notation has been so ingrained in the literature.
Example 1
- y: earnings/wages; x: years of schooling. Assume that only years of schooling matter for earnings and nothing else matters, so that we have a two-variable system. This, of course, is an abstraction of reality.
- To see whether x causes y, we (the experimenter) force an individual to have 8, 9, 10 years of schooling and observe the corresponding earnings.
- We ask: do different years of schooling lead to different earnings?
- The answer is most likely yes. So x causes y.
Example 2
- y: 1 or 0, indicating whether it will rain or not; x: percentage of individuals carrying an umbrella.
- To see whether x causes y, we (the experimenter) set the percentage at different levels by forcing individuals to carry or not carry an umbrella. We examine whether the weather changes in response to the percentage of individuals carrying an umbrella.
- Key: individuals cannot make their own decisions on whether to carry an umbrella or not. We set their x's.
- In this example, clearly y will not change. So x does not cause y in this example.
Example 2 (continued)
- y: 1 or 0, indicating whether it will rain or not; x: percentage of individuals carrying an umbrella.
- If we do not set the value of their x and let individuals make their own decisions (based on whatever information they may have) to carry an umbrella or not (we become passive observers instead of active experimenters), then x can be useful for predicting y.
- Here x does not cause y, but in an observational study (instead of an experimental study) x can be very useful as a predictor for y.
Causality vs Prediction
- Causality and prediction are fundamentally different.
- Causality: how things actually work. We care about the physical, chemical, biological, and economic laws. We want to understand the causal structure. We often refer to a causal model as a structural model.
- Prediction: how things are related, associated, or move together. We do not care about the physical, chemical, biological, or economic laws. As long as two variables move together, we can use one variable to predict the other.
- The measure of association or co-movement is the correlation coefficient (at least in the linear case). The correlation coefficient is a purely statistical object. It does not have to contain any physical, chemical, biological, or economic meaning.
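For reference (not spelled out on the slide), the correlation coefficient mentioned above is the standard statistical object
$\rho_{XY} = \frac{cov(X, Y)}{\sqrt{var(X)\,var(Y)}},$
which lies between $-1$ and $1$ and is symmetric in $X$ and $Y$.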
Causality and Causal Model: Multivariate Case
Causality and Causal Model: A Definition
- Our definition of causality and causal model can be extended to a multivariate case.
- Consider the special case that $x = (x_f, x_o)$, where $x_f$ is a scalar variable and the focus of interest, and $x_o$ consists of all other variables.
- In our returns-to-education example, we can let $x_f$ be the years of schooling and $x_o$ be the innate ability.
- We want to assess whether x causes y.
SET x, FORCE x, INTERVENE x
- We repeat the same definition:
1. We (as the experimenter) set x to each of its possible values (or force x to take each of its possible values, or intervene and let x take each of its possible values).
2. We let y respond freely without any further intervention, direct or indirect.
3. We observe that for each setting of x, y takes a unique value $c(x) = c(x_f, x_o)$.
4. We ask: is c(x) a constant function? If yes, then x does not cause y. Otherwise, x causes y.
Question: Does $x_f$ Cause y?
- If there is a setting, say $x_o^*$, such that the function defined by
    $c_{x_o^*}(x_f) \equiv c(x_f, x_o^*)$
  is not a constant function, then we say that $x_f$ causes y under the setting $x_o = x_o^*$.
- In this case, we write $y \xleftarrow{\,x_o^*\,} x_f$.
- If the above function is a constant function for all values of $x_o$, then we say that $x_f$ does not cause y.
Example: Does $x_f$ Cause y?
- In our returns-to-education example, $x_f$ is the years of schooling and $x_o$ is the innate ability.
- We ask: do earnings change in response to years of schooling for some level of innate ability?
- If earnings do not change in response to years of schooling for any level of innate ability, then years of schooling does not cause earnings.
- If earnings DO change in response to years of schooling for some level of innate ability, then years of schooling causes earnings for this level of innate ability.
Question: Does $x_f$ Cause y?
- To address this question, we have to keep all other variables at some level, say $x_o^*$, and examine whether y changes in response to different settings of $x_f$.
- Sometimes we refer to $x_o$ as the background variable.
- We keep $x_o$ at a given level so that it does not confound the causal relationship between $x_f$ and y. However, the strength of the causal link between $x_f$ and y may depend on the value of $x_o$.
- For one setting of $x_o$, $x_f$ causes y; for another setting of $x_o$, $x_f$ does not cause y. Causality and its magnitude established for one sample may not be applicable to the population (the problem of external validity).
Ceteris Paribus Effect
- Given this definition, we can define a notion of ceteris paribus effect, that is, the effect of one variable holding all others equal. (Ceteris paribus is a Latin phrase meaning "other things equal.")
- When $x_f$ is a continuous variable and c is differentiable, this effect is defined to be
    $\beta(x_f, x_o) = \frac{\partial c(x_f, x_o)}{\partial x_f}.$
- If $x_f$ is discrete, we define
    $\beta(x_f, x_o) = c(x_f + \Delta, x_o) - c(x_f, x_o).$
Example
- Demand curve: $q \leftarrow c(p, o)$, where p is the price, o consists of all other factors, and q is the quantity demanded.
- The familiar demand curve describes an economic law. It traces the quantities demanded for all possible values of the price. Some of these prices are actually observed in the market; others may not be. It is a structural model that describes the behavior of consumers.
- If the price of beef increases, ceteris paribus, the quantity of beef demanded by buyers will decrease.
- What is in o?
  - prices of substitute goods (pork, lamb, ...)
  - consumers' preferences (societal shift toward vegetarianism)
Interpretation
- If the causal relationship is linear (a big assumption), then we have
    $y \leftarrow \alpha^* + x_f \beta + x_o \gamma.$
- The linear causal model says that if we change $x_f$ by 1 unit, then y will change by $\beta$ units, all else being equal.
- I cannot stress enough how important the "all else being equal" condition is. Under this condition, we can think of the change in $x_f$ as induced or set by us; that is, we intervene and force $x_f$ to change by 1 unit but keep all else constant.
Causality and Causal Model: Bringing Models to Data and the Linear Causal Model
Bring Causal Models to Data
- We need to connect the deterministic causal model with our data, which are often observational.
- Suppose that the values of $x = (x_f, x_o)$ we want to set are iid draws from a certain distribution. That is, the settings of x are $\{X_i = (X_{fi}, X_{oi})\}$.
- Either we (as the experimenter) pick these values from a distribution, or we draw these values from a population (sometimes Nature draws these values for us).
Bring Causal Models to Data
..
.. .
. .
.. . . .
.
.
.
.. .. .
. . .
. .
..
Sampling
X 1 ? ?X f1 , X o1 ?
X 2 ? ?X f2 , X o2 ?
...
X n ? ?X fn , X on ?
POPULATION
©Yixiao Sun
22
Bring Causal Models to Data
- Let $Y_i$ be the realized value of y in the absence of any intervention for y. We have $Y_i \leftarrow c(X_{fi}, X_{oi})$, or mathematically, $Y_i = c(X_{fi}, X_{oi})$:
    $X_1 = (X_{f1}, X_{o1}) \rightarrow Y_1$
    $X_2 = (X_{f2}, X_{o2}) \rightarrow Y_2$
    $\ldots$
    $X_n = (X_{fn}, X_{on}) \rightarrow Y_n$
- The causal relation is deterministic, but the settings of the causal factors are stochastic, and hence the outcome of interest is stochastic.
Linear Causal Model
- In the case that the causal function is linear, we have
    $Y_i \leftarrow \alpha^* + X_{fi}\beta + X_{oi}\gamma.$
- Now suppose $X_{oi}$ is not observed. We have
    $Y_i \leftarrow (\alpha^* + E(X_{oi})\gamma) + X_{fi}\beta + \underbrace{(X_{oi} - E(X_{oi}))\gamma}_{u_i} = \alpha + X_{fi}\beta + u_i,$
  where
    $\alpha = \alpha^* + E(X_{oi})\gamma$ and $u_i = (X_{oi} - E(X_{oi}))\gamma.$
- Here $u_i$ captures the effect of all centered and unobserved causal factors.
Linear Causal Model
    $Y_i \leftarrow (\alpha^* + E(X_{oi})\gamma) + X_{fi}\beta + (X_{oi} - E(X_{oi}))\gamma = \alpha + X_{fi}\beta + u_i$
- We have gone a long way in order to obtain a linear causal model, which may look familiar.
- We should not take the simple linear causal model for granted: there are important assumptions underlying the model, for example, the linear and separable form of the causal relationship.
  - Linearity: the effect $X_{fi}\beta$ is linear in $X_{fi}$.
  - Separability: $X_{fi}\beta$ and $u_i$ are additively separable.
Nonlinearity and Non-separability
- To understand linearity and separability, we consider two counterexamples.
- Example 1: a nonlinear but separable relationship:
    $Y_i = \alpha + X_{fi}\beta + X_{fi}^2\delta + u_i.$
- Example 2: a nonlinear and non-separable relationship:
    $Y_i = \alpha + X_{fi}\beta + X_{fi}^2 u_i \delta + u_i.$
- What is the ceteris paribus causal effect in each case?
- Example 1: the effect depends on $x_f$ but not on $x_o$ (i.e., u):
    $\beta(x_f, x_o) = \frac{\partial c(x_f, x_o)}{\partial x_f} = \frac{\partial(\alpha + x_f\beta + x_f^2\delta + u)}{\partial x_f} = \beta + 2x_f\delta.$
- Example 2: the effect depends on both $x_f$ and $x_o$ (i.e., u):
    $\beta(x_f, x_o) = \frac{\partial c(x_f, x_o)}{\partial x_f} = \frac{\partial(\alpha + x_f\beta + x_f^2 u\delta + u)}{\partial x_f} = \beta + 2x_f u\delta.$
Linear Causal Model
    $Y_i \leftarrow \alpha^* + X_{fi}\beta + X_{oi}\gamma$
- Here the ceteris paribus effect is
    $\beta(x_f, x_o) = \frac{\partial c(x_f, x_o)}{\partial x_f} = \frac{\partial(\alpha^* + x_f\beta + x_o\gamma)}{\partial x_f} = \beta.$
- Linearity and separability imply that the causal effect is a constant: it is the same for all individuals. This is an important restriction.
Prediction Analysis versus Causal/Structural Inference
Predictive Model: Review
- Given two random variables $(X, Y)$, suppose we want to predict Y based on X.
- The starting point of a linear predictive model is to define
    $\beta_1 = \frac{cov(X, Y)}{var(X)}$ and $\beta_0 = EY - (EX)\beta_1.$
- These are well defined as long as $var(X) < \infty$ and $var(Y) < \infty$.
- Note that these are purely statistical objects. They may not contain any physical, chemical, biological, or economic meaning.
Predictive Model: Review
- With these definitions, we define e to be the difference between Y and the linear function $\beta_0 + X\beta_1$:
    $e = Y - (\beta_0 + X\beta_1).$
- I want to emphasize that this is just a mathematical definition. The mathematical equation can be rewritten as
    $Y = (\beta_0 + X\beta_1) + e.$
- We add whatever is needed to bring $\beta_0 + X\beta_1$ up to Y. The added amount may not represent any real effect.
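Although not spelled out on the slide, the familiar property $cov(X, e) = 0$ follows directly from these definitions:
$cov(X, e) = cov(X, Y - \beta_0 - X\beta_1) = cov(X, Y) - \beta_1 var(X) = 0,$
by the definition $\beta_1 = cov(X, Y)/var(X)$.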
Passive Prediction
- Passive prediction is the use of predictive analytics to make predictions based on data in which no predictor is exogenously changed. That is, the data are passively observed.
- Example: predict whether a visitor to a website will make a purchase, based on the observed browsing behavior within that website.
- In making her prediction, the analyst assesses the relationship between a purchase and observed information only; she does not consider how purchasing behavior might be affected if the visitor's browsing experience were changed, e.g., by changing a banner ad to an automatic pop-up.
Passive Prediction
- The analyst may have data on visits by many individuals, and use those data to establish a predictive relationship between browsing behavior and purchasing (e.g., by estimating a regression model).
- Then, using the predictive relationship, she can make predictions for a given visitor based on his observed browsing behavior.
- She can also make predictions based on hypothetical browsing behavior (i.e., what if a visitor browses page 2 for 30 seconds, page 5 for 45 seconds, etc.?).
Passive Prediction vs Pattern Discovery
- When used to make predictions, pattern discovery (pattern recognition) is generally used for passive prediction.
- Conceptually, if we discover a distinctive pattern among a set of variables in a given dataset, we would expect this pattern to emerge again when the same variables are collected without interference.
Passive Prediction vs Pattern Discovery
- For example, we may discover the following pattern: whether a student graduates from high school is related to his mother's level of education.
- Then we can use the observed level of education of a student's mother to predict whether the student will graduate from high school.
- The above problem is fundamentally different from predicting the outcome when we exogenously change the level of education the mother attained.
Passive Prediction: Econ120C Performance
- I collected some data from my past Econ120C students:
  - $X_i$: Econ120B professor, Econ120B grade, hours of study per week, % of class attended live, time spent on the class webpage
  - $Y_i$: weighted score
- Passive prediction:
  - Draw a student randomly from the class.
  - Reveal only his/her X.
  - Predict his/her Y.
Causal Model
- Consider a linear causal model
    $Y \leftarrow \alpha + X\beta + u,$
  where u stands for other, and possibly unobserved, causal factors.
- Interpretation of $\beta$: if we intervene and set X to change by 1 unit while keeping all else constant, then Y will change by $\beta$ units.
- The difference between $\beta$ and $\beta_1$ lies in whether all else has been kept equal.
Causal Model: Active Prediction
- If we want to predict the consequence of some action on an outcome of interest, we need to establish a causal or structural relationship.
Causal Model: Active Prediction, Example 1
- You may want to predict the impact on your final grade in Econ120C if you increase your study time by one hour per week.
- Implicitly, you assume that all else has been kept equal.
- So you are making an active prediction.
Causal Model: Active Prediction, Example 2
- Suppose you care about your body weight two months from now (call this Y).
- Currently, you do not eat whole grains but are considering switching to a whole-grain-only diet (call this change in diet X).
- Then you may want to make an active prediction of Y based on X.
Causal Model: Active Prediction, Example 3
- Do you notice music playing in retail stores?
- Studies show that music can significantly influence sales.
- Actively vary the tempo of the music played in stores from very slow to quick and observe the effect on sales.
- Slower music causes shoppers to shop more slowly, resulting in higher sales.
A Comparison
Exercise: show that the two models coincide if $cov(X, u) = 0$ (see the sketch after the table).

                               Predictive Analysis                  Causal Inference
  Model                        $Y = \beta_0 + X\beta_1 + e$         $Y \leftarrow \alpha + X\beta + u$
  Correlation                  by construction, $cov(X, e) = 0$     $cov(X, u)$ may not be zero
  Interpretation of the slope  other variables run their own        all other variables kept constant
                               course; all else may not be equal
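A sketch of the requested argument, using the definitions above: plugging $Y = \alpha + X\beta + u$ into $\beta_1 = cov(X, Y)/var(X)$ gives
$\beta_1 = \frac{cov(X, \alpha + X\beta + u)}{var(X)} = \beta + \frac{cov(X, u)}{var(X)},$
so $\beta_1 = \beta$, and the two slopes coincide, exactly when $cov(X, u) = 0$.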
Example 1: No Causality Does Not Imply No Predictability
Suppose that we have the following simple causal relations:
    $y \leftarrow za, \qquad x \leftarrow zb,$
for $b \neq 0$. Graphically, z is a common cause of both:
    $x \leftarrow z \rightarrow y$
Let z be generated as a sequence of iid random variables $Z_i$, so that in the absence of intervention for x and y we observe
    $X_i = Z_i b, \qquad Y_i = Z_i a = \frac{a}{b} X_i.$
Example 1: No Causality Does Not Imply No Predictability
[Figure: the observed pairs $(X_i, Y_i)$ lie on the line $y = (a/b)x$.]
Example 1: No Causality Does Not Imply No Predictability
- For the purpose of this example, we assume that we do not observe the $Z_i$'s.
- Our observations consist of $(X_i, Y_i)$ lying on the line $y = (a/b)x$.
- Given any $X_i$, the best prediction of $Y_i$ is $m(X_i) = (a/b)X_i$.
- Thus $X_i$ is useful for predicting $Y_i$, even though there is no causal relation between $X_i$ and $Y_i$. Furthermore, the regression coefficient $a/b$ definitely does not measure the effect on y caused by a change in x.
- Intervening to change x (while keeping z constant) has no effect on y. Instead, the regression coefficient $a/b$ works together with $X_i$ to give an optimal prediction of $Y_i$.
Example 1: No Causality Does Not Imply No Predictability
- In any equation system, like the one above, if we intervene on a variable (say x), then the equation that determines this variable has to be crossed out: it no longer describes how x is determined.
- In this example, the system becomes
    $x \leftarrow x'$ (set), $\qquad y \leftarrow za.$
  Graphically, $x' \rightarrow x$ and $z \rightarrow y$, with no path between x and y.
- x and y are not connected in any way: the causal effect is zero. The causal effect (when x is set at two different values $x'$ and $x''$) is $y(x'') - y(x') = 0$.
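A quick way to see Example 1 in action is to simulate it. This is a minimal sketch, with a = 2 and b = 4 chosen arbitrarily:

clear
set obs 1000
gen z = rnormal()
gen x = 4*z        // x <- z b with b = 4
gen y = 2*z        // y <- z a with a = 2
reg y x            // estimated slope is a/b = 0.5, yet x does not cause y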
Example 2: Causality Does Not Imply Predictability
Consider the following causal system
    $y \leftarrow ax + u, \qquad x \leftarrow by + v,$
or graphically: $v \rightarrow x$, $u \rightarrow y$, and $x \rightleftarrows y$.
Example: x: crime rate; y: police spending. (The two different causal directions may not operate at exactly the same time, but if we observe the variables infrequently, then $y \leftarrow ax + u$ and $x \leftarrow by + v$ can be regarded as happening simultaneously over each observation interval.)
Example 2: Causality Does Not Imply Predictability
- Suppose that the values of $(u, v)$ are generated as an iid sequence of pairs $(U_i, V_i)$ such that
    $(U_i, V_i) \sim N\!\left(0, \begin{pmatrix} \sigma_{uu} & \sigma_{uv} \\ \sigma_{uv} & \sigma_{vv} \end{pmatrix}\right).$
- We do not observe $\{(U_i, V_i)\}$; we observe $\{(X_i, Y_i)\}$ only.
- The reduced form (the equilibrium solution in terms of $U_i$ and $V_i$) is given by
    $X_i = \frac{1}{1 - ab}(bU_i + V_i), \qquad Y_i = \frac{1}{1 - ab}(U_i + aV_i).$
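The reduced form follows by substituting one equation into the other: $X_i = b(aX_i + U_i) + V_i$ implies $(1 - ab)X_i = bU_i + V_i$, and symmetrically $(1 - ab)Y_i = U_i + aV_i$, assuming $ab \neq 1$.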
Example 2: Causality Does Not Imply Predictability
- It is now easy to show that
    $\beta_1 = \frac{cov(X_i, Y_i)}{var(X_i)} = \frac{b\sigma_{uu} + (1 + ab)\sigma_{uv} + a\sigma_{vv}}{b^2\sigma_{uu} + 2b\sigma_{uv} + \sigma_{vv}}.$
- Sufficient freedom exists to deliver a wide range of possible values for $\beta_1$.
- For example, when $\sigma_{uu} = 0$, we have
    $\beta_1 = \frac{a\sigma_{vv}}{\sigma_{vv}} = a,$
  whereas if $\sigma_{vv} = 0$, we have $\beta_1 = 1/b$.
- Picking
    $\sigma_{uv} = -\frac{b\sigma_{uu} + a\sigma_{vv}}{1 + ab}$
  gives $\beta_1 = 0$, so that $X_i$ is useless as a predictor of $Y_i$.
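The expression for $\beta_1$ comes from applying the covariance rules to the reduced form: $cov(X_i, Y_i) = (1 - ab)^{-2}\,cov(bU_i + V_i,\ U_i + aV_i) = (1 - ab)^{-2}(b\sigma_{uu} + (1 + ab)\sigma_{uv} + a\sigma_{vv})$, while $var(X_i) = (1 - ab)^{-2}(b^2\sigma_{uu} + 2b\sigma_{uv} + \sigma_{vv})$; the factor $(1 - ab)^{-2}$ cancels in the ratio.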
Remarks
The optimal linear prediction interpretation of $\beta_1$ holds regardless of whether we have each of the following:
(a) x is the cause of y ($\sigma_{uu} = 0$);
(b) y is the cause of x ($\sigma_{vv} = 0$);
(c) x and y are mutually non-causal, although both have a common cause ($y \leftarrow za$ and $x \leftarrow zb$);
(d) x and y mutually cause each other in the presence of additional causal variables ($\sigma_{uu} \neq 0$, $\sigma_{vv} \neq 0$).
In the first three cases, the predictions are in fact perfect, while in the last case we can have $X_i$ and $Y_i$ useless as predictors of one another despite their causal relationships.
Remarks
- While in case (a) ($x \rightarrow y$) the conditional mean coincides with the causal function, this is not true in any of the other cases.
- The conditional expectation cannot by itself tell us what we should expect to happen when we intervene to set $X_i$ to a particular value.
- Rather, it predicts: it tells us what we can expect $Y_i$ to be given $X_i$ when $Y_i$ and $X_i$ are generated by whatever process is operational for observation i.