
Name ________________________________

Kuehn

Economics 310

Fall 2021

Homework 1

DUE DATE: Monday, September 13

1) A common test for AIDS is called the ELISA (Enzyme-Linked Immunosorbent Assay)

test. Among 1,000,000 people who are given the ELISA test, we can expect results

similar to those given in the table.

                              A1: Test Positive   A2: Test Negative      Totals
B1: Carry AIDS Virus                      3,595                 205       3,800
B2: Do Not Carry AIDS Virus              65,241             930,959     996,200
Totals                                   68,836             931,164   1,000,000

If one of these 1,000,000 people is selected randomly, find the following

probabilities:

a) P(B1) (i.e., the probability that the person carries the AIDS virus)

3800/1000000 = 0.0038

b) P(A1) (i.e., the probability that the person tests positive)

68836/1000000 = 0.0688

c) P(A1|B2) (i.e., the probability that the person tests positive given that they do not carry the virus)

P(A1 and B2)/P(B2) = 65241/996200 = 0.0655

d) P(B1|A1) (i.e., the probability that the person carries the virus given that they test positive)

P(B1 and A1)/P(A1) = 3595/68836 = 0.0522
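Every probability in this question is a count from the table divided by the count of the conditioning group. Below is a minimal Python sketch of that bookkeeping. It uses only the standard library, and the dictionary keys and the `marginal` helper are illustrative names, not part of the assignment. It works with exact fractions and also checks P(B1|A1) against Bayes' rule:

```python
from fractions import Fraction

# Counts from the ELISA table: (true status, test result) -> number of people.
counts = {
    ("carrier", "positive"): 3_595,
    ("carrier", "negative"): 205,
    ("non_carrier", "positive"): 65_241,
    ("non_carrier", "negative"): 930_959,
}
total = sum(counts.values())  # 1,000,000 people


def marginal(status=None, result=None):
    """Number of people matching the given status and/or test result."""
    return sum(
        n for (s, r), n in counts.items()
        if (status is None or s == status) and (result is None or r == result)
    )


p_b1 = Fraction(marginal(status="carrier"), total)        # P(B1)
p_a1 = Fraction(marginal(result="positive"), total)       # P(A1)
p_a1_given_b2 = Fraction(counts[("non_carrier", "positive")],
                         marginal(status="non_carrier"))  # P(A1|B2)
p_b1_given_a1 = Fraction(counts[("carrier", "positive")],
                         marginal(result="positive"))     # P(B1|A1)

# Cross-check (d) with Bayes' rule: P(B1|A1) = P(A1|B1) P(B1) / P(A1).
p_a1_given_b1 = Fraction(counts[("carrier", "positive")], marginal(status="carrier"))
assert p_a1_given_b1 * p_b1 / p_a1 == p_b1_given_a1

for label, p in [("P(B1)", p_b1), ("P(A1)", p_a1),
                 ("P(A1|B2)", p_a1_given_b2), ("P(B1|A1)", p_b1_given_a1)]:
    print(f"{label} = {float(p):.4f}")
```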

2) Suppose you have an 8-sided die that lands on the numbers 1-8 with equal

probability. Let X be the random variable equal to the number rolled on the die.

a) Calculate E[X].

E[X] = (1+2+3+4+5+6+7+8)/8 = 4.5

b) What is the variance of X?

\(Var(X) = [(1-4.5)^2 + (2-4.5)^2 + (3-4.5)^2 + (4-4.5)^2 + (5-4.5)^2 + (6-4.5)^2 + (7-4.5)^2 + (8-4.5)^2]/8 = 42/8 = 5.25\)
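You can check the same expectation and variance by brute force, summing over the eight equally likely faces. This short Python sketch uses exact fractions. It also confirms the shortcut Var(X) = E[X^2] - (E[X])^2 and the standard closed form (n^2 - 1)/12 for a fair n-sided die:

```python
from fractions import Fraction

faces = range(1, 9)   # an 8-sided die: 1, 2, ..., 8
p = Fraction(1, 8)    # each face is equally likely

mean = sum(p * x for x in faces)                     # E[X]
variance = sum(p * (x - mean) ** 2 for x in faces)   # E[(X - E[X])^2]
shortcut = sum(p * x ** 2 for x in faces) - mean ** 2  # E[X^2] - (E[X])^2

assert variance == shortcut == Fraction(8 ** 2 - 1, 12)  # (n^2 - 1)/12 with n = 8
print(float(mean), float(variance))  # 4.5 5.25
```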

3) Compute the following probabilities:

a) If Y is distributed N(5,49), find Pr(Y ≤ 12).

Pr(Y ≤ 12) = Pr((Y-5)/7 ≤ (12-5)/7) = Pr(Z ≤ 1) = 0.84134

b) If Y is distributed N(6,16), find Pr(Y > 1).

Pr(Y > 1) = Pr((Y-6)/4 > (1-6)/4) = Pr(Z > -1.25) = 1 - Pr(Z ≤ -1.25) = 0.89435

c) If Y is distributed N(2,25), find Pr(0 ≤ Y ≤ 5).

Pr(0 ≤ Y ≤ 5) = Pr(Y ≤ 5) - Pr(Y ≤ 0) = Pr((Y-2)/5 ≤ (5-2)/5) - Pr((Y-2)/5 ≤ (0-2)/5)

= Pr(Z ≤ 0.6) - Pr(Z ≤ -0.4)

= 0.72575 - 0.34458 = 0.38117
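You can double-check these probabilities with `statistics.NormalDist` from the Python standard library (Python 3.8+). The assignment writes N(mean, variance), but `NormalDist` takes the standard deviation, so the sketch converts with a square root. The `normal` helper is a hypothetical convenience wrapper:

```python
from math import sqrt
from statistics import NormalDist


def normal(mean, variance):
    """N(mean, variance) in the textbook's notation; NormalDist wants the std. dev."""
    return NormalDist(mu=mean, sigma=sqrt(variance))


pa = normal(5, 49).cdf(12)          # a) Pr(Y <= 12), Y ~ N(5, 49)
pb = 1 - normal(6, 16).cdf(1)       # b) Pr(Y > 1) = 1 - Pr(Y <= 1), Y ~ N(6, 16)
y = normal(2, 25)
pc = y.cdf(5) - y.cdf(0)            # c) Pr(0 <= Y <= 5), Y ~ N(2, 25)

print(f"{pa:.5f} {pb:.5f} {pc:.5f}")  # 0.84134 0.89435 0.38117
```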

4) The following table lists the height in inches and weight in pounds of four college

students. Calculate the correlation coefficient. Detail your answer.

Height (inches)   Weight (pounds)
65                145
68                160
69                155
73                170

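Average Height = (65 + 68 + 69 + 73)/4 = 68.75

Average Weight = (145 + 160 + 155 + 170)/4 = 157.5

\(\sigma_{Height}^2 = [(65-68.75)^2 + (68-68.75)^2 + (69-68.75)^2 + (73-68.75)^2]/4 = 32.75/4 = 8.1875\), so \(\sigma_{Height} = \sqrt{8.1875} = 2.86138\)

\(\sigma_{Weight}^2 = [(145-157.5)^2 + (160-157.5)^2 + (155-157.5)^2 + (170-157.5)^2]/4 = 325/4 = 81.25\), so \(\sigma_{Weight} = \sqrt{81.25} = 9.01388\)

\(\sigma_{HeightWeight} = [(-3.75)(-12.5) + (-0.75)(2.5) + (0.25)(-2.5) + (4.25)(12.5)]/4 = 97.5/4 = 24.375\)

\(corr(H,W) = \sigma_{HeightWeight}/(\sigma_{Height}\sigma_{Weight}) = 24.375/(2.86138 \times 9.01388) = 0.9451\)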
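Here is a plain-Python sketch of the same calculation. Like the hand calculation, it divides by n. Dividing by n - 1 would change the variances and covariance but not the correlation, because the divisor cancels:

```python
from math import sqrt

heights = [65, 68, 69, 73]      # inches
weights = [145, 160, 155, 170]  # pounds
n = len(heights)

mean_h = sum(heights) / n
mean_w = sum(weights) / n

# Divide-by-n moments, matching the hand calculation above.
var_h = sum((h - mean_h) ** 2 for h in heights) / n
var_w = sum((w - mean_w) ** 2 for w in weights) / n
cov_hw = sum((h - mean_h) * (w - mean_w) for h, w in zip(heights, weights)) / n

corr = cov_hw / (sqrt(var_h) * sqrt(var_w))
print(mean_h, mean_w, var_h, var_w, cov_hw, round(corr, 4))
# 68.75 157.5 8.1875 81.25 24.375 0.9451
```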

5) DATA Question: The dataset alabama.xlsx has standardized test scores from 127

public school districts in the state of Alabama. You should download it to your

computer from the Blackboard course webpage. The file contains the following

variables:

score89: Average reading and math standardized test score for 8-9 grade

students (in standard deviation units)

pcy: Per capita income in the district

syscde: The code used to identify the district.

Use the dataset to answer the following questions:

a) How many observations are in the dataset?

127 observations

b) What are the mean, min, and max for per capita income?

Mean: 10,709.68

Min: 6,306

Max: 39,610

c) What are the mean, min, and max for test scores?

Mean: 0.083917

Min: -3.2238

Max: 4.7221

d) What is the median for per capita income and test scores?

Median PCI: 10,127

Median Test Score: 0.01617

e) What is the standard deviation for per capita income and test scores?

StDev PCI: 3,613.81

StDev Test Score: 1.2666

f) Create a histogram showing the distribution of test scores.

[Figure: histogram of score89, with Density on the vertical axis]

g) What is the average test score in districts with per capita income at or above

the median? What is the average test score in districts with per capita

income below the median?

Districts with Above Median Income: 0.6253451

Districts with Below Median Income: -0.4661048

h) What is the standard deviation of test scores in districts with per capita

income at or above the median? What is the standard deviation of test scores

in districts with per capita income below the median?

Districts with Above Median Income: 1.216266

Districts with Below Median Income: 1.071052

i) What is the correlation coefficient between per capita income and test

scores?

0.6480
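Any statistics package can answer parts (b) through (i). As a rough sketch, the median split and the correlation can be written with Python's `statistics` module. The helper names and the toy numbers below are hypothetical and are not the alabama.xlsx data. With the real file, you would pass the `pcy` and `score89` columns, for example after loading them with `pandas.read_excel`, which needs an Excel engine such as openpyxl. `statistics.stdev` uses the n - 1 divisor; whether the course software does the same is an assumption.

```python
from statistics import mean, median, stdev


def median_split_summary(income, scores):
    """Mean and sample SD of scores for districts at/above vs. below median income."""
    cutoff = median(income)
    above = [s for x, s in zip(income, scores) if x >= cutoff]
    below = [s for x, s in zip(income, scores) if x < cutoff]
    return {
        "at_or_above": (mean(above), stdev(above)),
        "below": (mean(below), stdev(below)),
    }


def pearson_corr(x, y):
    """Correlation coefficient; the n vs. n - 1 divisor cancels."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


# Toy numbers for illustration only (NOT the alabama.xlsx data).
toy_pcy = [6_500, 8_000, 9_500, 10_000, 12_000, 15_000]
toy_score = [-1.2, -0.4, 0.1, -0.2, 0.8, 1.5]
print(median_split_summary(toy_pcy, toy_score))
print(round(pearson_corr(toy_pcy, toy_score), 3))
```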

j) Create a scatterplot with per capita income on the x-axis and test scores on

the y-axis. Comment on the relationship you see between per capita income

and test scores.

[Figure: scatterplot of score89 (vertical axis) against pcy (horizontal axis)]

Name ________________________________

Kuehn

Economics 310

Fall 2021

Homework 2

DUE DATE: Wednesday, September 22

1) Below are the 3-point shooting performances for some of the top 3-point shooters in the

NBA during the 2020-2021 NBA season.

Player             Field Goals Attempted   Field Goals Made
Stephen Curry                        801                337
Buddy Hield                          721                282
Damian Lillard                       704                275
Duncan Robinson                      613                250
Terry Rozier                         571                222

For a given player, the outcome of a particular shot can be modeled as a discrete random

variable: if Yi is the outcome of shot i, then Yi = 1 if the shot is made and Yi = 0 if the shot is

missed. Let p denote the probability of making any particular 3-pt shot attempt. The

natural estimator of p is \(\hat{p}\) = FGM/FGA.

a) Estimate p for all 5 players. Which player has the highest estimate?

Player             Estimate
Stephen Curry         0.421
Buddy Hield           0.391
Damian Lillard        0.391
Duncan Robinson       0.408
Terry Rozier          0.389

Stephen Curry has the highest estimate.

b) What is the standard deviation of the estimator for Stephen Curry (i.e. \(sd(\hat{p}_{SC})\))?

(Hint: The variance of a random variable Y that only takes the values 0 and 1 and

has mean \(\mu_Y\) is given by \(var(Y) = \mu_Y(1-\mu_Y)\).)

\(sd(\hat{p}_{SC}) = \sqrt{0.421(1 - 0.421)/801} = 0.01744\)

c) Are we able to get a more accurate estimate of Stephen Curry's or Buddy Hield's

ability to shoot 3-pointers? Why is that?

Stephen Curry's, because he attempted more shots (801 vs. 721), so we have a larger sample size and a smaller standard error for his estimate.

d) Construct a 95% confidence interval for the true probability that Stephen Curry

makes a 3-point shot.

[0.421-1.96*0.01744, 0.421+1.96*0.01744] = [0.3868, 0.4552]

e) Using a 1% significance level, test whether Stephen Curry's true 3-point shooting

percentage is 45% against the alternative that it is less than 45% (i.e. H0: p = 0.45

against H1: p < 0.45).
t = (0.421-0.45)/0.01744 = -1.6628
Since -1.6628 > -2.33 (the one-sided 1% critical value), we fail to reject the null hypothesis that Stephen Curry's true 3-point shooting
percentage is 45% against the 1-sided alternative at the 1% level.
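Below is a Python sketch of parts (a), (b), (d), and (e) using the counts from the table. It works from the unrounded estimate 337/801 rather than 0.421, so its t-statistic is about -1.68 instead of -1.66. The conclusion at the 1% level is the same. The 1.96 and -2.33 critical values are the usual normal-approximation values, and `bernoulli_estimate` is an illustrative helper name:

```python
from math import sqrt

# 2020-21 three-point attempts (FGA) and makes (FGM) from the table above.
shooters = {
    "Stephen Curry": (801, 337),
    "Buddy Hield": (721, 282),
    "Damian Lillard": (704, 275),
    "Duncan Robinson": (613, 250),
    "Terry Rozier": (571, 222),
}


def bernoulli_estimate(attempts, makes):
    """p-hat = FGM/FGA and its standard error sqrt(p-hat (1 - p-hat) / n)."""
    p_hat = makes / attempts
    return p_hat, sqrt(p_hat * (1 - p_hat) / attempts)


for name, (fga, fgm) in shooters.items():
    p_hat, se = bernoulli_estimate(fga, fgm)
    print(f"{name:16s} p_hat={p_hat:.3f} se={se:.5f}")

# Curry: 95% CI and one-sided test of H0: p = 0.45 vs. H1: p < 0.45.
p_sc, se_sc = bernoulli_estimate(*shooters["Stephen Curry"])
ci = (p_sc - 1.96 * se_sc, p_sc + 1.96 * se_sc)
t_stat = (p_sc - 0.45) / se_sc
reject_at_1pct = t_stat < -2.33  # one-sided 1% critical value
print(ci, round(t_stat, 3), reject_at_1pct)
```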
2) Suppose that your high school friend at San Jose St. claims that the average Math SAT
score of CSU East Bay undergraduates is 540. To verify this claim, you collect a dataset of
the Math SAT scores of 50 randomly selected CSUEB undergraduates. You calculate a
sample mean of \(\bar{Y} = 552\) and a sample variance of \(s_Y^2 = 1500\).
a) Set up a hypothesis test to test your friend's claim by stating the null hypothesis and
a 2-sided alternative hypothesis.
\(H_0: \mu_Y = 540\), \(H_1: \mu_Y \neq 540\)
b) Suppose you choose a significance level of 5%. Compute the t-statistic for this test
and perform the test. What do you tell your friend?
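\(t = (\bar{Y} - 540)/\sqrt{s_Y^2/n} = (552 - 540)/\sqrt{1500/50} = 12/5.477 = 2.1909\)
Since 2.1909 > 1.96, reject the null at the 5% level (i.e., the data contradict your friend's claim).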

c) What would you tell your friend if you chose a significance level of 0.01 instead?

Since 2.1909 < 2.58, we can't reject the null at the 1% level (i.e. your friend might be right).
d) Using the Cumulative Standard Normal table in the Appendix of your textbook (or
any Cumulative Standard Normal table), compute the p-value associated with this
test.
p-value = 2[1 - Φ(2.19)] = 2(1 - 0.9857) = 0.0286, or 2.86%
e) Construct a 95% confidence interval for the average Math SAT scores for CSUEB
undergraduates.
[552 - 1.96 × 5.477, 552 + 1.96 × 5.477] = [541.265, 562.735]
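Here is a short Python sketch of the same test, using the large-sample normal approximation. It uses the unrounded t-statistic instead of the table value at 2.19, so its p-value (about 0.0285) differs slightly from the table-based 0.0286:

```python
from math import sqrt
from statistics import NormalDist

n, y_bar, s2 = 50, 552, 1500   # sample size, sample mean, sample variance
mu_0 = 540                     # friend's claimed mean (null hypothesis)

se = sqrt(s2 / n)              # SE(Y-bar) = s_Y / sqrt(n)
t_stat = (y_bar - mu_0) / se
p_value = 2 * (1 - NormalDist().cdf(abs(t_stat)))  # two-sided p-value
ci_95 = (y_bar - 1.96 * se, y_bar + 1.96 * se)

print(round(t_stat, 4), round(p_value, 4), [round(b, 3) for b in ci_95])
```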
3) You are interested in whether at 12 years old, there is a difference in heights between boys
and girls. You collect the following data:
                     Boys    Girls
Observations          255      287
Average (inches)     57.8     58.4
Standard Deviation    4.4      4.2
a) What is the estimate for the difference in height?
58.4 - 57.8 = 0.6
b) What is the standard deviation for the difference in height?
\(SE(\bar{Y}_{girls} - \bar{Y}_{boys}) = \sqrt{4.2^2/287 + 4.4^2/255} = 0.37065\)
c) Construct a 90% confidence interval for the difference in height.
[0.6-1.64*0.371, 0.6+1.64*0.371] = [-0.0084, 1.2084]
d) What is the t-statistic for the test with the null hypothesis that there is no difference
in the height of boys and girls at this age level against the 1-sided alternative that
girls are taller than boys?
t = 0.6/0.371 = 1.6173
e) What is the p-value for the above test? Would you reject the null hypothesis at a 5%
significance level? What about at a 10% significance level?
p-value = 1 - Φ(1.62) = 1 - 0.9474 = 0.0526, or 5.26%
Fail to reject at a 5% significance level (0.0526 > 0.05), but reject at a 10% significance level (0.0526 < 0.10).
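Below is a Python sketch of the two-sample calculation, computed directly from the summary table above. It uses 1.645 for the 90% critical value and the unrounded standard error, so the last digits may differ slightly from a hand calculation that rounds to 1.64 and 0.371:

```python
from math import sqrt
from statistics import NormalDist

# Summary statistics for 12-year-olds from the table above (heights in inches).
n_b, mean_b, sd_b = 255, 57.8, 4.4   # boys
n_g, mean_g, sd_g = 287, 58.4, 4.2   # girls

diff = mean_g - mean_b                         # girls minus boys
se = sqrt(sd_g ** 2 / n_g + sd_b ** 2 / n_b)   # SE of a difference in means
ci_90 = (diff - 1.645 * se, diff + 1.645 * se)
t_stat = diff / se                             # H0: no difference
p_one_sided = 1 - NormalDist().cdf(t_stat)     # H1: girls are taller

print(round(diff, 2), round(se, 5), [round(b, 4) for b in ci_90],
      round(t_stat, 3), round(p_one_sided, 4))
```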
4) DATA Question: Use the dataset wage.xlsx provided on the class website to answer the
following questions. We will be using the following variables from this dataset:
wage: average hourly earnings
educ: years of education
female: indicator equal to 1 if the individual is female
children: indicator equal to 1 if the individual has children
Use the dataset to answer the following questions:
a) You want to test whether females earn less than their male counterparts. Write out
the null and alternative hypothesis for this test.
\(H_0: \mu_F = \mu_M\)
\(H_1: \mu_F < \mu_M\)
(where \(\mu_F\) and \(\mu_M\) are the mean wages of females and males)
b) What is the t-statistic for this test? What is the p-value? What do you conclude from
these results?
t = -8.2787
p-value ≈ 0
Reject the null that wages are equal in favor of the alternative that females are paid a
lower wage than their male counterparts. In other words, there is strong statistical
evidence that, on average, females are paid lower wages than males.
c) You want to test whether workers with children earn the same amount as workers
without children. Write out the null and alternative hypothesis for this test.
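\(H_0: \mu_C = \mu_{NC}\)
\(H_1: \mu_C \neq \mu_{NC}\)
(where \(\mu_C\) and \(\mu_{NC}\) are the mean wages of workers with and without children)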
d) What is the t-statistic for this test? What is the p-value? What do you conclude from
these results?
t = -0.4521
p-value = 0.6512 or 65.12%
Fail to reject the null hypothesis that individuals with children earn the same
average wage as individuals without children.
e) Construct a 95% confidence interval for the average wage for someone with more
than 13 years of education.
[7.13473, 8.45926]
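With the raw data, parts (b) and (d) are difference-in-means t-tests with unequal variances. Below is a minimal Python sketch. The helper name and the toy wages are hypothetical and do not come from wage.xlsx. With the real file, `group_a` and `group_b` would be the `wage` values split by `female` or by `children`. The p-values use the normal approximation, which is reasonable for large samples but not for the five-observation toy groups shown here:

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def diff_in_means_test(group_a, group_b):
    """Unequal-variance t-statistic for mean(a) - mean(b), with normal p-values."""
    se = sqrt(variance(group_a) / len(group_a) + variance(group_b) / len(group_b))
    t = (mean(group_a) - mean(group_b)) / se
    p_less = NormalDist().cdf(t)                      # H1: mean(a) < mean(b)
    p_two_sided = 2 * (1 - NormalDist().cdf(abs(t)))  # H1: means differ
    return t, p_less, p_two_sided


# Toy hourly wages for illustration only (NOT the wage.xlsx data).
toy_female = [4.5, 5.0, 6.25, 3.8, 5.5]
toy_male = [6.0, 7.5, 5.75, 8.2, 6.9]
t, p_less, p_two = diff_in_means_test(toy_female, toy_male)
print(round(t, 3), round(p_less, 4), round(p_two, 4))
```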
Name ________________________________
Kuehn
Economics 310
Fall 2021
Homework 3
DUE DATE: Monday, October 18
1) Consider the following empirical model: \(Y_i = \beta_0 + \beta_1 X_i + u_i\)
Suppose you collect a random sample of 3 observations on \(Y_i\) and \(X_i\).
\(Y_i\)   \(X_i\)
51        3
60        2
45        4
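a) Using the formulas from class, compute the OLS estimates \(\hat{\beta}_0\) and \(\hat{\beta}_1\) by hand (i.e.
using a calculator, not Python).
With \(\bar{X} = 3\) and \(\bar{Y} = 52\): \(\hat{\beta}_1 = \sum(X_i-\bar{X})(Y_i-\bar{Y})/\sum(X_i-\bar{X})^2 = -15/2 = -7.5\)
\(\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1\bar{X} = 52 + 7.5 \times 3 = 74.5\)
b) Compute the predicted value \(\hat{Y}_i\) and the residual \(\hat{u}_i\) for each of the 3 observations.
\(\hat{Y}_1 = 52.0\), \(\hat{u}_1 = -1.0\)
\(\hat{Y}_2 = 59.5\), \(\hat{u}_2 = 0.5\)
\(\hat{Y}_3 = 44.5\), \(\hat{u}_3 = 0.5\)
c) What is the mean of the \(\hat{u}_i\)'s?
0
d) What is the mean of the \(\hat{Y}_i\)'s? How does it compare to the mean of the \(Y_i\)'s?
52. It is the same as the mean of the \(Y_i\)'s.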
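You can check the hand calculation with a few lines of Python using exact fractions. The sketch also confirms that the residuals sum to zero and that the fitted values average to the sample mean of Y:

```python
from fractions import Fraction

y = [51, 60, 45]
x = [3, 2, 4]
n = len(y)

x_bar = Fraction(sum(x), n)
y_bar = Fraction(sum(y), n)

# beta1_hat = sum((X - X_bar)(Y - Y_bar)) / sum((X - X_bar)^2)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
beta1 = s_xy / s_xx
beta0 = y_bar - beta1 * x_bar  # beta0_hat = Y_bar - beta1_hat * X_bar

fitted = [beta0 + beta1 * xi for xi in x]
resid = [yi - fi for yi, fi in zip(y, fitted)]

print(float(beta0), float(beta1))       # 74.5 -7.5
print([float(f) for f in fitted])       # [52.0, 59.5, 44.5]
print([float(u) for u in resid])        # [-1.0, 0.5, 0.5]
print(sum(resid) / n, sum(fitted) / n)  # 0 52
```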
2) Using observations for 500 families on annual income (inc) and consumption (cons) (both
measured in dollars), the following regression results are obtained:
\(\widehat{cons} = -98.81 + 0.749 \times inc\),
\(R^2 = 0.492\)
a) Interpret the intercept in this equation, and comment on its sign and magnitude.
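The intercept says that a family with zero income is predicted to have consumption of -$98.81. The negative sign doesn't make sense here, since consumption cannot be negative, but the magnitude is small (under $100), so it should not be read as a meaningful prediction for families with no income.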
b) What is the predicted consumption when family income is $35,000?
-98.81 + 0.749*35,000 = $26,116.19
c) If a family increases its income by $5,000, what is the regression model's prediction
for how this will affect the family's consumption spending?
0.749*5,000 = 3,745
They are predicted to increase their consumption spending by $3,745.
3) The following regression estimates the relationship between whether a county is urban or
rural (measured by urban which takes on the value of 1 for an urban county and 0 for a
rural county) and the unemployment rate (measured by urate):
\(\widehat{urate} = 0.0514 - 0.0102 \times urban\)
(0.0201) (0.0059)
The standard error is given in parenthesis below the coefficient estimate.
a) What is the mean unemployment rate for urban counties?
0.0514 - 0.0102 = 0.0412
b) What is the difference in the mean unemployment rate between rural and urban
counties?
Urban counties have a 0.0102 (or 1.02 percentage point) lower unemployment rate.
c) Does whether a county is urban or rural have a statistically significant effect on the
county's unemployment rate at a 5% significance level?
t = -0.0102/0.0059 = -1.73. Since |t| = 1.73 < 1.96, we fail to reject the null, so the difference is not statistically different from zero at the 5% level.
4) The following regression estimates the impact of crime on housing prices by regressing the
median price of a 1-bedroom apartment (price) on the number of reported crimes per
capita (crime):
\(\widehat{price} = 121831.53 - 1133.60 \times crime\), \(R^2 = 0.236\), \(SER = 15103.21\)
(25,054.59) (562.93)
The standard error associated with each estimated coefficient is given in parenthesis below
the estimate.
a) Interpret the coefficient on crime in your own words.
One additional reported crime per capita is predicted to lower the median price of a 1-bedroom
apartment by $1,133.60.
b) Write out the null and alternative hypothesis for the test of whether crime has any
effect on housing prices against the alternative that it has either a positive or
negative effect on housing prices.
\(H_0: \beta_1 = 0\)
\(H_1: \beta_1 \neq 0\)
c) What is the t-statistic associated with the test in (b)?
t = (-1,133.60 - 0)/562.93 = -2.014
d) What is the p-value associated with the test in (b)?
p = 2[1 - Φ(2.01)] = 2(1 - 0.9778) = 0.0444, or 4.44%
e) Can you reject the null hypothesis at the 5% significance level?
Yes, you can reject the null hypothesis at the 5% level (p = 0.0444 < 0.05).
f) Can you reject the null hypothesis at the 1% significance level?
No, you would fail to reject the null hypothesis at the 1% level (p = 0.0444 > 0.01).
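Below is a Python sketch of parts (c) through (f). The unrounded t-statistic gives a two-sided p-value of about 0.044, slightly below the table-based 4.44%, which rounds t to 2.01. Both lead to the same decisions:

```python
from statistics import NormalDist

beta1_hat, se_beta1 = -1133.60, 562.93  # coefficient on crime and its standard error

t_stat = (beta1_hat - 0) / se_beta1                # H0: beta1 = 0
p_value = 2 * (1 - NormalDist().cdf(abs(t_stat)))  # two-sided, normal approximation

for alpha in (0.05, 0.01):
    decision = "reject" if p_value < alpha else "fail to reject"
    print(f"alpha = {alpha}: t = {t_stat:.3f}, p = {p_value:.4f} -> {decision}")
```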
5) DATA Question: Use the dataset Earnings_and_Height.xlsx provided on the class website to
answer the following questions. We will use the following variables from this dataset:
height: height without shoes (in inches)
earnings: annual labor earnings (expressed in $2012)
sex: indicator equal to 1 for male and 0 for female
a) Run a regression of earnings on height of the form:
\(earnings_i = \beta_0 + \beta_1 height_i + u_i\)
What is the estimated slope of the regression? What is the estimated intercept?
What is the \(R^2\)?
\(\hat{\beta}_0 = -512.7336\)
\(\hat{\beta}_1 = 707.6716\)
\(R^2 = 0.0109\)
b) What is an economic interpretation for your estimate of \(\beta_1\)?
For a 1-inch increase in height, there is an expected increase in annual earnings of about $707.67.
c) Run the same regression as in (a) but only for female workers. What are the
estimated slope, the estimated intercept, and the \(R^2\) of the regression?
\(\hat{\beta}_0 = 12650\)
\(\hat{\beta}_1 = 511.222\)
\(R^2 = 0.0027\)
d) Run the same regression as in (a) and (c) but now only for male workers. What are
the estimated slope, the estimated intercept, and the \(R^2\) of the regression?
\(\hat{\beta}_0 = -43130\)
\(\hat{\beta}_1 = 1306.86\)
\(R^2 = 0.0209\)
Name ________________________________
Kuehn
Economics 310
Fall 2021
Homework 4
DUE DATE: Wednesday, October 27
1) Bob is interested in measuring the effect of increasing the average length of jail sentences
on the wages of individuals when they get out of jail. He believes that since prisons often
provide job training programs, more time spent in jail could increase wages upon leaving
jail. He collects data from former inmates on their current wage, \(wage_i\), and the amount of
time they have spent in prison, \(prisontime_i\). He obtains the following regression results:
\(\widehat{wage}_i = 45.67 - 1.23 \times prisontime_i\)
Think of a specific omitted variable that might be biasing the estimate of the slope
coefficient. Describe whether (and why) you think this omitted variable is biasing the
estimated slope coefficient upwards or downwards (i.e. positively or negatively).
Ex. Level of education: education positively affects wages and is negatively correlated with prison
time, so omitting it should bias the slope coefficient downward (negatively).
2) Cheryl thinks that regular doctor visits increase preventive care and may reduce the
number of sick days workers take. To try to measure this effect, she obtains data from a
number of individuals on the number of sick days they took in the last year, \(sickdays_i\), and
the number of doctor visits they made in the past year, \(doctorvisits_i\). She obtains the
following regression results:
\(\widehat{sickdays}_i = 4.67 - 0.22 \times doctorvisits_i\)
Think of a specific omitted variable that might be biasing the estimate of the slope
coefficient. Describe whether (and why) you think this omitted variable is biasing the
estimated slope coefficient upwards or downwards (i.e. positively or negatively).
Ex. Health status: better health means fewer doctor visits and fewer sick days, so health is negatively
correlated with doctor visits and negatively related to sick days. Omitting it should bias the slope
coefficient upward (positively).
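The omitted variable bias reasoning in both answers follows the usual result. If a relevant variable Z is left out, the slope estimate converges to \(\beta_1 + \beta_2 \, Cov(X, Z)/Var(X)\), so the bias has the sign of (effect of Z on Y) times (correlation of Z with X). The toy Python helper below (a hypothetical name) encodes only that sign rule:

```python
def ovb_direction(effect_of_omitted_on_y, corr_omitted_with_x):
    """Sign of omitted variable bias: sign(beta_omitted) * sign(corr(X, omitted))."""
    product = effect_of_omitted_on_y * corr_omitted_with_x
    if product > 0:
        return "upward (positive)"
    if product < 0:
        return "downward (negative)"
    return "no bias"


# Q1: education raises wages (+) and is negatively correlated with prison time (-).
print(ovb_direction(+1, -1))  # downward (negative)
# Q2: better health lowers sick days (-) and is negatively correlated with doctor visits (-).
print(ovb_direction(-1, -1))  # upward (positive)
```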
3) You are interested in the effect that the number of hours a high school student works at an
outside job has on their GPA. In your regression you want to control for high school class
standing, and so you run the following regression:
\(\widehat{GPA} = 3.4 - 0.03 \times HrsWrk - 0.7 \times Frosh - 0.3 \times Soph + 0.1 \times Junior\)
(1.1) (0.013) (0.23) (0.14) (0.08)
where HrsWrk is the number of hours the student works per week, and Frosh, Soph, and
Junior are dummy variables for the student's class standing.
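a) Why don't you also include a dummy variable for seniors? What is the name of the
problem caused by adding a dummy variable for seniors?
This would cause a perfect multicollinearity problem (the dummy variable trap). Every student is in exactly one class, so the Frosh, Soph, Junior, and Senior dummies would sum to 1 for every observation, exactly replicating the constant term.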
b) In your own words interpret the coefficient on Junior?
High school juniors are expected to have a GPA that is 0.1 points higher than seniors,
holding constant the number of hours that they work.
c) What is the expected GPA of a Sophomore who works 10 hours per week?
3.4 - 0.03*10 - 0.3 = 2.8
d) What is the expected GPA of a Senior who works 10 hours per week?
3.4 - 0.03*10 = 3.1
e) If Dom and Sarah work the same number of hours per week, but Dom is a Junior and
Sarah is a Freshman, what is the expected difference in their GPAs?
Dom is expected to have a GPA that is 0.8 points higher than Sarah's, since 0.1 - (-0.7) = 0.8.
f) Suppose you rewrite the regression as:
\(GPA = \beta_1 HrsWrk + \beta_2 Frosh + \beta_3 Soph + \beta_4 Junior + \beta_5 Senior\)
where the intercept is dropped and the dummy variable for Senior is added.
Given that you estimate this regression on the same sample as above, what is the
coefficient estimate you will get for \(\beta_5\), the coefficient on Senior?
\(\hat{\beta}_5 = 3.4\)
g) If you run the same regression as in (f), what is the coefficient estimate you will get for \(\beta_2\),
the coefficient on Frosh?
\(\hat{\beta}_2 = 3.4 - 0.7 = 2.7\)
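The Python sketch below plugs student characteristics into the estimated equation for parts (c) through (g). It also shows why dropping the intercept and adding a Senior dummy makes each class coefficient equal to 3.4 plus that class's shift from the original regression. The HrsWrk coefficient is unchanged:

```python
# Estimated coefficients from the regression above (Senior is the omitted group).
intercept, b_hrs = 3.4, -0.03
class_effects = {"Frosh": -0.7, "Soph": -0.3, "Junior": 0.1, "Senior": 0.0}


def predicted_gpa(hours, standing):
    """Predicted GPA for a student working `hours` per week in class `standing`."""
    return intercept + b_hrs * hours + class_effects[standing]


# With no intercept and all four class dummies, each dummy's coefficient is
# that class's own intercept: 3.4 plus its shift relative to seniors.
no_intercept_coefs = {k: intercept + v for k, v in class_effects.items()}

print(round(predicted_gpa(10, "Soph"), 2))    # 2.8
print(round(predicted_gpa(10, "Senior"), 2))  # 3.1
print(round(predicted_gpa(10, "Junior") - predicted_gpa(10, "Frosh"), 2))  # 0.8
print({k: round(v, 2) for k, v in no_intercept_coefs.items()})
```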
4) Tuition at your school has gone up, and administrators argue that the reason for the
increase is that the reputation of the school has increased. You decide to investigate this
hypothesis by collecting data randomly for 100 national universities and liberal arts
colleges from the 2017-2018 U.S. News and World Report annual rankings. You run the
following regression:
\(\widehat{Cost} = 7311.17 + 3985.20 \times Rep - 0.20 \times Size + 8406.79 \times Dpriv - 416.38 \times Dlibart - 2376.51 \times Drel\)
\(R^2 = 0.72\), SER = 3773.35
where Cost is the tuition in dollars, Rep is the index in the U.S. News and World Report that
ranges from 1 to 5 with 5 being the best reputation, Size is the number of undergraduate
students, Dpriv is a binary variable equal to 1 if the school is private, Dlibart is a binary
variable equal to 1 if the school is a liberal arts school, and Drel is a binary variable equal to
1 if the school has a religious affiliation.
a) What is the forecasted cost for a private liberal arts college, which has no religious
affiliation, a size of 1500 students, and a reputation level of 4.5?
7311.17 + 3985.20(4.5) - 0.20(1500) + 8406.79 - 416.38 = $32,934.98 (~$32,935)
b) Suppose you switch from a private university to a public university that has a 0.5 lower
reputation ranking and 10,000 more students. What is the effect on your cost?
-8406.79 + 3985.20(-0.5) - 0.20(10,000) = -$12,399.39 (~ -$12,400)
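Below is a Python sketch of parts (a) and (b) that stores the estimated coefficients in a dictionary. The `forecast_cost` helper is a hypothetical name:

```python
# Estimated coefficients from the tuition regression above.
coef = {
    "const": 7311.17, "Rep": 3985.20, "Size": -0.20,
    "Dpriv": 8406.79, "Dlibart": -416.38, "Drel": -2376.51,
}


def forecast_cost(rep, size, dpriv, dlibart, drel):
    """Predicted tuition (dollars) from the estimated regression."""
    return (coef["const"] + coef["Rep"] * rep + coef["Size"] * size
            + coef["Dpriv"] * dpriv + coef["Dlibart"] * dlibart + coef["Drel"] * drel)


# a) private, liberal arts, no religious affiliation, 1,500 students, Rep = 4.5
cost_a = forecast_cost(rep=4.5, size=1500, dpriv=1, dlibart=1, drel=0)
# b) change in cost: private -> public, Rep falls by 0.5, Size rises by 10,000
delta_b = coef["Dpriv"] * (0 - 1) + coef["Rep"] * (-0.5) + coef["Size"] * 10_000
print(round(cost_a, 2), round(delta_b, 2))  # 32934.98 -12399.39
```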
c) If you eliminate Size, and just regress Cost on Rep, Dpriv, Dlibart, and Drel, what do you
expect to happen to the coefficient on Dpriv? Would you expect the coefficient to go up
or down? In other words, would you expect the estimated effect of attending a private
institution on cost, to have increased or decreased? Why?
(Hint: Private schools tend to be smaller than public schools.)
It should go up. Size is negatively related to Cost (as seen in the regression above) and negatively
correlated with Dpriv, so once Size is omitted, the coefficient on Dpriv will be upwardly
biased.
5) Consider the following regression of housing prices on house characteristics:
\(\widehat{Price} = 119.2 + 0.485 \times BDR + 23.4 \times Bath + 0.156 \times Hsize + 0.090 \times Age - 48.8 \times Poor\)
(23.9) (2.61) (8.94) (0.011) (0.311) (10.5)
\(R^2 = 0.72\), SER = 41.5, n = 250
Price is the price of the house in thousands of dollars, BDR is the number of bedrooms, Bath
is the number of bathrooms, Hsize is the size of the house in square feet, Age is the age of
the house in years, and Poor is a variable equal to 1 if the house is in poor condition.
a) A homeowner purchases an adjacent lot and extends the size of the home (without
changing the number of rooms) by 150 square feet. Construct a 99% confidence
interval for the change in the value of her house.
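[0.156-2.58*0.011, 0.156+2.58*0.011]*150
[0.12762, 0.18438]*150
[19.143, 27.657], i.e., an increase of roughly $19,143 to $27,657, since Price is measured in thousands of dollars.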
b) The F-statistic for omitting Age and Poor from the regression is F=3.85. Are the
coefficients on Age and Poor jointly statistically different from zero at the 5% level?
What about at the 1% level?
Critical value for \(F_{2,\infty}\) at the 5% level is 3.00. Since 3.85 > 3.00, we can reject the null hypothesis
that the coefficients on Age and Poor are both equal to zero at the 5% level.
Critical value for \(F_{2,\infty}\) at the 1% level is 4.61. Since 3.85 < 4.61, we cannot reject the null
hypothesis that the coefficients on Age and Poor are both equal to zero at the 1% level.
c) Your friend tells you the only variable that matters is the size of the house. You test this
by running the following regression:
\(\widehat{Price} = 142.4 + 0.178 \times Hsize\)
(31.5) (0.013)
\(R^2 = 0.58\), SER = 56.4, n = 250
What is the value of the homoscedasticity-only F-statistic that tests the null hypothesis
\(H_0: \beta_1 = 0, \beta_2 = 0, \beta_4 = 0, \beta_5 = 0\) in the original regression?
F = [(0.72-0.58)/4]/[(1-0.72)/(250-5-1)] = 30.5
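Below is a Python sketch of part (a) and of the R²-based F-statistic in part (c). The `r2_f_stat` helper is a hypothetical name that implements the homoskedasticity-only formula used above:

```python
# a) 99% CI for the effect of adding 150 sq ft (Price is in $1,000s).
b_hsize, se_hsize, z_99 = 0.156, 0.011, 2.58
lo = (b_hsize - z_99 * se_hsize) * 150
hi = (b_hsize + z_99 * se_hsize) * 150


# c) Homoskedasticity-only F from the R^2 of the unrestricted and restricted models.
def r2_f_stat(r2_unrestricted, r2_restricted, q, n, k_unrestricted):
    """F = [(R2_u - R2_r) / q] / [(1 - R2_u) / (n - k_u - 1)]."""
    numerator = (r2_unrestricted - r2_restricted) / q
    denominator = (1 - r2_unrestricted) / (n - k_unrestricted - 1)
    return numerator / denominator


f_stat = r2_f_stat(0.72, 0.58, q=4, n=250, k_unrestricted=5)
print(round(lo, 3), round(hi, 3), round(f_stat, 1))  # 19.143 27.657 30.5
```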
6) DATA Question: Use the dataset fatality.xlsx provided on the class website to answer the
following questions. The dataset contains one observation for each of the lower 48 states
for each year between 1982 and 1988, for a total of 336 observations. We will use the
following variables from this dataset:
mrall: Rate of fatal motor vehicle accidents in a given year
dry: Percentage of the state's population living in dry counties (i.e. counties that
prohibit alcohol sales)
perinc: Measures per capita income
a) Regress the rate of fatal motor vehicle accidents on the percentage of dry counties in a
state. What are the results? (Remember to use robust standard errors.)
\(\widehat{mrall} = 0.2008 + 0.0076 \times dry\)
(0.003) (0.000)
Taken at face value, alcohol control laws increase motor vehicle fatalities: a 1 percentage point increase in the share
of the population living in dry counties is associated with a 0.0076 increase in the motor vehicle fatality rate.
b) Test the null hypothesis that the slope coefficient on dry is equal to zero at the 5%
significance level. What does that tell you about the effect of alcohol control laws on
motor vehicle fatalities?
t = 3.986
Since 3.986 > 1.96, reject the null at the 5% level.
These results imply that alcohol control laws have a positive and statistically
significant effect on motor vehicle fatality rates.
c) Run a second regression of the rate of fatal motor vehicle accidents on the percentage of
dry counties and per capita income (again using robust standard errors). What are the
results? What do these results tell you about the effect of alcohol control laws on motor
vehicle fatalities?
\(\widehat{mrall} = 0.3867 - 0.0003 \times dry - 0.000013 \times perinc\)
(0.17) (0.000) (0.000001)
Once per capita income is controlled for, the estimated effect of alcohol control laws on motor vehicle fatalities
becomes negative.
d) What is a good explanation for why the estimated coefficient on dry in part (c) has the
opposite sign of the estimated coefficient on dry in part (a)?
There is omitted variable bias in part (a): income could negatively affect fatality
rates and could also be negatively correlated with the percentage of the population in dry counties. The product of
these two negative relationships gives a positive bias in the coefficient on dry in the regression in part (a).
e) Test the null hypothesis that the coefficient on dry equals 0 at the 5% significance level.
Test the null hypothesis that the coefficient on perinc equals 0 at the 5% significance
level. Test the null hypothesis that both the coefficients on dry and perinc are equal to 0
at the 5% significance level.
t = -1.45: since |t| = 1.45 < 1.96, we can't reject the null that the coefficient on dry is equal to zero.
t = -11.78: since |t| = 11.78 > 1.96, we can reject the null that the coefficient on perinc is equal to zero.
F = 91.58: since 91.58 > 3.00 (the 5% critical value for \(F_{2,\infty}\)), we can reject the joint null hypothesis that alcohol control
laws and per capita income don't have any effect on traffic fatality rates.
Question 1
In an OLS regression, the slope estimator, \(\hat{\beta}_1\), has a larger standard error, other things equal, if
there is a small variance in the error term u
the intercept, \(\hat{\beta}_0\), is large
there is more variation in the explanatory variable X
the estimator converges
the sample size is smaller