Regression and Economics Questionnaire

Name ________________________________
Kuehn
Economics 310
Fall 2021
Homework 1
DUE DATE: Monday, September 13
1) A common test for AIDS is called the ELISA (Enzyme-Linked Immunosorbent Assay)
test. Among 1,000,000 people who are given the ELISA test, we can expect results
similar to those given in the table.
                                A1: Test Positive   A2: Test Negative      Totals
B1: Carry AIDS Virus                        3,595                 205       3,800
B2: Do Not Carry AIDS Virus                65,241             930,959     996,200
Totals                                     68,836             931,164   1,000,000
If one of these 1,000,000 people is selected randomly, find the following
probabilities:
a) P(B1) (i.e. the probability that the person carries the AIDS virus)
3800/1000000 = 0.0038
b) P(A1) (i.e. the probability that the person tests positive)
68836/1000000 = 0.0688
c) P(A1|B2) (i.e. the probability that the person tests positive given that they do not carry the virus)
65241/996200 = 0.0655
d) P(B1|A1) (i.e. the probability that the person carries the virus given that they test positive)
3595/68836 = 0.0522
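As a cross-check, the four probabilities can be computed directly from the table counts. This is a minimal sketch using only the Python standard library; the dictionary layout and the helper names `marginal`, `prob`, and `cond_prob` are illustrative choices, not part of the assignment.

```python
from fractions import Fraction

# Joint counts from the ELISA table: (true status, test result) -> number of people
counts = {
    ("B1", "A1"): 3_595,    # carries the virus, tests positive
    ("B1", "A2"): 205,      # carries the virus, tests negative
    ("B2", "A1"): 65_241,   # does not carry, tests positive
    ("B2", "A2"): 930_959,  # does not carry, tests negative
}
total = sum(counts.values())  # 1,000,000 people


def marginal(label: str) -> int:
    """Row or column total for an event label such as 'B1' or 'A1'."""
    return sum(n for key, n in counts.items() if label in key)


def prob(label: str) -> Fraction:
    """Unconditional probability, e.g. P(B1)."""
    return Fraction(marginal(label), total)


def cond_prob(event: str, given: str) -> Fraction:
    """P(event | given) = count(event and given) / count(given)."""
    joint = sum(n for key, n in counts.items() if event in key and given in key)
    return Fraction(joint, marginal(given))


print(round(float(prob("B1")), 4))             # 0.0038
print(round(float(prob("A1")), 4))             # 0.0688
print(round(float(cond_prob("A1", "B2")), 4))  # 0.0655
print(round(float(cond_prob("B1", "A1")), 4))  # 0.0522
```

Keeping the counts exact with `Fraction` until the final rounding avoids any floating-point drift in the intermediate division.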
2) Suppose you have an 8-sided die that lands on the numbers 1-8 with equal
probability. Let X be the random variable equal to the number rolled on the die.
a) Calculate E[X].
E[X] = (1+2+3+4+5+6+7+8)/8 = 4.5
b) What is the variance of X?
$\operatorname{Var}(X) = \frac{(1-4.5)^2 + (2-4.5)^2 + (3-4.5)^2 + (4-4.5)^2 + (5-4.5)^2 + (6-4.5)^2 + (7-4.5)^2 + (8-4.5)^2}{8} = \frac{42}{8} = 5.25$
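A short sketch that computes E[X] and Var(X) exactly with `fractions.Fraction`, assuming a fair die so each of the 8 faces has probability 1/8:

```python
from fractions import Fraction

faces = range(1, 9)   # a fair 8-sided die: 1, 2, ..., 8
p = Fraction(1, 8)    # each face is equally likely

mean = sum(p * x for x in faces)                    # E[X]
variance = sum(p * (x - mean) ** 2 for x in faces)  # E[(X - E[X])^2]

print(mean, variance)  # 9/2 21/4  (i.e. 4.5 and 5.25)
```

The result also matches the discrete-uniform shortcut (n^2 - 1)/12 = 63/12 = 5.25 for n = 8 faces.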
3) Compute the following probabilities:
a) If Y is distributed N(5,49), find Pr(Y ≤ 12).
Pr(Y ≤ 12) = Pr((Y-5)/7 ≤ (12-5)/7) = Pr(Z ≤ 1) = 0.84134
b) If Y is distributed N(6,16), find Pr(Y > 1).
Pr(Y > 1) = Pr((Y-6)/4 > (1-6)/4) = Pr(Z > -1.25) = 1 – Pr(Z ≤ -1.25) = 0.89435
c) If Y is distributed N(2,25), find Pr(0 ≤ Y ≤ 5).
Pr(0 ≤ Y ≤ 5) = Pr(Y ≤ 5) – Pr(Y ≤ 0) = Pr((Y-2)/5 ≤ (5-2)/5) – Pr((Y-2)/5 ≤ (0-2)/5)
= Pr(Z ≤ 0.6) – Pr(Z ≤ -0.4)
= 0.72575 – 0.34458 = 0.38117
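These probabilities can be checked with `statistics.NormalDist` from the Python standard library (Python 3.8+). Note that N(μ, σ²) in the problem states the variance, while `NormalDist` expects the standard deviation, so N(5,49) becomes `sigma=7`.

```python
from statistics import NormalDist

# NormalDist(mu, sigma) takes the standard deviation, not the variance.
a = NormalDist(mu=5, sigma=7).cdf(12)     # Pr(Y <= 12) for Y ~ N(5, 49)
b = 1 - NormalDist(mu=6, sigma=4).cdf(1)  # Pr(Y > 1)   for Y ~ N(6, 16)
y = NormalDist(mu=2, sigma=5)             # Y ~ N(2, 25)
c = y.cdf(5) - y.cdf(0)                   # Pr(0 <= Y <= 5)

print(round(a, 5), round(b, 5), round(c, 5))  # 0.84134 0.89435 0.38117
```

Because Y is continuous, Pr(Y ≤ y) and Pr(Y < y) are equal, so the same `cdf` call covers both.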
4) The following table lists the height in inches and weight in pounds of four college
students. Calculate the correlation coefficient. Detail your answer.
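Height (inches)   Weight (pounds)
             65               145
             68               160
             69               155
             73               170
Average Height = 68.75
Average Weight = 157.5
σ_Height = sqrt(8.1875) = 2.86138
σ_Weight = sqrt(81.25) = 9.01388
σ_HeightWeight (covariance) = 24.375
(These variances and the covariance divide by n = 4. Dividing by n - 1 instead scales the covariance and the product of the two standard deviations by the same factor, so the correlation is unchanged.)
corr(H,W) = 24.375/(2.86138 × 9.01388) = 0.9451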
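The same steps in plain Python, as a sketch that mirrors the hand calculation (means, divide-by-n moments, then the ratio):

```python
from math import sqrt

height = [65, 68, 69, 73]      # inches
weight = [145, 160, 155, 170]  # pounds
n = len(height)

mean_h = sum(height) / n
mean_w = sum(weight) / n

# Population (divide-by-n) moments, matching the hand calculation;
# the divisor cancels in the correlation, so n - 1 gives the same answer.
var_h = sum((h - mean_h) ** 2 for h in height) / n
var_w = sum((w - mean_w) ** 2 for w in weight) / n
cov_hw = sum((h - mean_h) * (w - mean_w) for h, w in zip(height, weight)) / n

corr = cov_hw / (sqrt(var_h) * sqrt(var_w))
print(mean_h, mean_w, var_h, var_w, cov_hw, round(corr, 4))
```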
5) DATA Question: The dataset alabama.xlsx has standardized test scores from 127
public school districts in the state of Alabama. You should download it to your
computer from the Blackboard course webpage. The file contains the following
variables:
• score89: Average reading and math standardized test score for 8-9 grade
students (in standard deviation units)
• pcy: Per capita income in the district
• syscde: The code used to identify the district.
Use the dataset to answer the following questions:
a) How many observations are in the dataset?
127 observations
b) What are the mean, min, and max for per capita income?
Mean: 10,709.68
Min: 6,306
Max: 39,610
c) What are the mean, min, and max for test scores?
Mean: 0.083917
Min: -3.2238
Max: 4.7221
d) What is the median for per capita income and test scores?
Median PCI: 10,127
Median Test Score: 0.01617
e) What is the standard deviation for per capita income and test scores?
StDev PCI: 3,613.81
StDev Test Score: 1.2666
f) Create a histogram showing the distribution of test scores.
[Figure: histogram of score89 (x-axis: score89, about -4 to 4; y-axis: Density, 0 to .4)]
g) What is the average test score in districts with per capita income at or above
the median? What is the average test score in districts with per capita
income below the median?
Districts with Income At or Above the Median: 0.6253451
Districts with Below Median Income: -0.4661048
h) What is the standard deviation of test scores in districts with per capita
income at or above the median? What is the standard deviation of test scores
in districts with per capita income below the median?
Districts with Income At or Above the Median: 1.216266
Districts with Below Median Income: 1.071052
i) What is the correlation coefficient between per capita income and test
scores?
0.6480
j) Create a scatterplot with per capita income on the x-axis and test scores on
the y-axis. Comment on the relationship you see between per capita income
and test scores.
[Figure: scatterplot of score89 (y-axis, about -4 to 4) against pcy (x-axis, 0 to 40,000)]
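The summary statistics in parts (a)-(e) and (g)-(i) could be produced with a helper along these lines. This is a sketch that assumes the `pcy` and `score89` columns have already been loaded as lists of numbers (for example with `pandas.read_excel("alabama.xlsx")`, which needs an Excel engine such as openpyxl). The function name `summarize` is hypothetical, and it uses sample (n - 1) standard deviations.

```python
import statistics as st
from typing import Sequence


def summarize(pcy: Sequence[float], score: Sequence[float]) -> dict:
    """Summary statistics for the district data (hypothetical helper).

    pcy and score are equal-length sequences with one entry per district.
    Standard deviations use the sample (n - 1) formula.
    """
    med = st.median(pcy)
    # Split districts at the median of per capita income.
    high = [s for p, s in zip(pcy, score) if p >= med]  # at or above the median
    low = [s for p, s in zip(pcy, score) if p < med]    # below the median

    # Pearson correlation written out, so it does not rely on
    # statistics.correlation (which needs Python 3.10+).
    mp, ms = st.mean(pcy), st.mean(score)
    sxy = sum((p - mp) * (s - ms) for p, s in zip(pcy, score))
    sxx = sum((p - mp) ** 2 for p in pcy)
    syy = sum((s - ms) ** 2 for s in score)

    return {
        "n": len(pcy),
        # (mean, min, max, median, sd)
        "pcy": (mp, min(pcy), max(pcy), med, st.stdev(pcy)),
        "score": (ms, min(score), max(score), st.median(score), st.stdev(score)),
        "mean_score_high": st.mean(high),
        "mean_score_low": st.mean(low),
        "sd_score_high": st.stdev(high),
        "sd_score_low": st.stdev(low),
        "corr": sxy / (sxx * syy) ** 0.5,
    }


# Example usage, assuming pandas and openpyxl are installed:
# import pandas as pd
# df = pd.read_excel("alabama.xlsx")
# print(summarize(df["pcy"].tolist(), df["score89"].tolist()))
```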
Homework 2
DUE DATE: Wednesday, September 22
1) Below are the 3-point shooting performances for some of the top 3-point shooters in the
NBA during the 2020-2021 NBA season.
Player               Field Goals Attempted   Field Goals Made
Stephen Curry                          801                337
Buddy Hield                            721                282
Damian Lillard                         704                275
Duncan Robinson                        613                250
Terry Rozier                           571                222
For a given player, the outcome of a particular shot can be modeled as a discrete random
variable: if Yi is the outcome of shot i, then Yi = 1 if the shot is made and Yi = 0 if the shot is
missed. Let p denote the probability of making any particular 3-pt shot attempt. The
natural estimator of p is Ȳ = FGM/FGA.
a) Estimate p for all 5 players. Which player has the highest estimate?
Player               Estimate
Stephen Curry           0.421
Buddy Hield             0.391
Damian Lillard          0.391
Duncan Robinson         0.408
Terry Rozier            0.389
Stephen Curry has the highest estimate.
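The estimates can be reproduced in a few lines of Python; the dictionary below simply re-enters the attempts and makes from the shooting table.

```python
# (3-pt attempts, 3-pt makes) for 2020-21, re-entered from the table
shots = {
    "Stephen Curry": (801, 337),
    "Buddy Hield": (721, 282),
    "Damian Lillard": (704, 275),
    "Duncan Robinson": (613, 250),
    "Terry Rozier": (571, 222),
}

# Sample proportion of made shots: Ybar = FGM / FGA
estimates = {name: made / att for name, (att, made) in shots.items()}
best = max(estimates, key=estimates.get)

for name, p_hat in estimates.items():
    print(f"{name:16s} {p_hat:.3f}")
print("Highest:", best)
```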
b) What is the standard deviation of the estimator for Stephen Curry (i.e. sd(Ȳ_SC))?
(Hint: The variance of a random variable Y that only takes the values 0 and 1 and
has mean μ_Y is given by var(Y) = μ_Y(1-μ_Y).)
$sd(\bar{Y}_{SC}) = \sqrt{0.421(1 - 0.421)/801} = 0.01744$
c) Are we able to get a more accurate estimate of Stephen Curry’s or Buddy Hield’s
ability to shoot 3-pointers? Why is that?
Stephen Curry's, because he attempted more shots (801 vs. 721), so his estimate is based on a larger sample size and has a smaller standard error.
d) Construct a 95% confidence interval for the true probability that Stephen Curry
makes a 3-point shot.
[0.421-1.96*0.01744, 0.421+1.96*0.01744] = [0.3868, 0.4552]
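Parts (b) and (d) follow from the Bernoulli variance formula in the hint. A small sketch, plugging in the rounded estimate 0.421 as the answers above do:

```python
from math import sqrt

p_hat, n = 0.421, 801  # Curry's rounded estimate from part (a) and his attempts

se = sqrt(p_hat * (1 - p_hat) / n)  # sd(Ybar) = sqrt(p(1 - p)/n), with p replaced by p_hat
z = 1.96                            # two-sided 95% normal critical value
ci = (p_hat - z * se, p_hat + z * se)

print(round(se, 5), tuple(round(v, 4) for v in ci))
```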
e) Using a 1% significance level, test whether Stephen Curry’s true 3-point shooting
percentage is 45% against the alternative that it is less than 45% (i.e. H0: p = 0.45
against H1: p < 0.45).
t = (0.421-0.45)/0.01744 = -1.6628
Since -1.6628 > -2.33 (the 1% one-sided critical value), we fail to reject the null hypothesis that Stephen Curry's true 3-point shooting percentage is 45% against the one-sided alternative at the 1% level.
2) Suppose that your high school friend at San Jose St. claims that the average Math SAT score of CSU East Bay undergraduates is 540. To verify this claim, you collect a dataset of the Math SAT scores of 50 randomly selected CSUEB undergraduates. You calculate a sample mean of Ȳ = 552 and a sample variance s_Y² = 1500.
a) Set up a hypothesis test to test your friend's claim by stating the null hypothesis and a 2-sided alternative hypothesis.
$H_0: \mu_Y = 540, \quad H_1: \mu_Y \neq 540$
b) Suppose you choose a significance level of 5%. Compute the t-statistic for this test and perform the test. What do you tell your friend?
t-stat = (552 - 540)/sqrt(1500/50) = 2.1909; 2.1909 > 1.96 ⇒ Reject the null (i.e. the evidence suggests your friend is wrong)
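Both test statistics above can be checked with the standard library. This sketch uses `NormalDist().inv_cdf` for the normal critical values instead of a printed table; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

z = NormalDist()  # standard normal

# Question 1(e): H0: p = 0.45 vs H1: p < 0.45, using the standard error from part (b)
t_curry = (0.421 - 0.45) / 0.01744
crit_1pct_lower = z.inv_cdf(0.01)         # about -2.33
reject_curry = t_curry < crit_1pct_lower  # False -> fail to reject

# Question 2(b): H0: mu = 540 vs H1: mu != 540, with n = 50, Ybar = 552, s^2 = 1500
se_sat = sqrt(1500 / 50)
t_sat = (552 - 540) / se_sat
reject_sat_5pct = abs(t_sat) > z.inv_cdf(0.975)  # critical value about 1.96

print(round(t_curry, 4), round(crit_1pct_lower, 2), reject_curry)
print(round(t_sat, 4), reject_sat_5pct)
```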
c) What would you tell your friend if you chose a significance level of 0.01 instead?
2.1909 < 2.58 ⇒ Can't reject the null at the 1% level (i.e. your friend might be right)
d) Using the Cumulative Standard Normal table in the Appendix of your textbook (or any Cumulative Standard Normal table), compute the p-value associated with this test.
p-value = 2 × Φ(-2.19) = 2 × 0.0143 = 0.0286 or 2.86%
e) Construct a 95% confidence interval for the average Math SAT score of CSUEB undergraduates.
[552 - 1.96 × 5.477, 552 + 1.96 × 5.477] = [541.265, 562.735]
3) You are interested in whether, at 12 years old, there is a difference in heights between boys and girls. You collect the following data:
                     Boys  Girls
Observations          255    287
Average (inches)     57.8   58.4
Standard Deviation    4.4    4.2
a) What is the estimate for the difference in height?
58.4 - 57.8 = 0.6
b) What is the standard deviation for the difference in height?
$SE(\bar{Y}_{girls} - \bar{Y}_{boys}) = \sqrt{\frac{4.2^2}{287} + \frac{4.4^2}{255}} = 0.37065$
c) Construct a 90% confidence interval for the difference in height.
[0.6 - 1.64 × 0.371, 0.6 + 1.64 × 0.371] = [-0.0084, 1.2084]
d) What is the t-statistic for the test with the null hypothesis that there is no difference in the height of boys and girls at this age level against the 1-sided alternative that girls are taller than boys?
t = 0.6/0.371 = 1.617
e) What is the p-value for the above test? Would you reject the null hypothesis at a 5% significance level? What about at a 10% significance level?
p-value = 1 - Φ(1.62) = 0.0526 or 5.26%
Fail to reject at a 5% significance level (0.0526 > 0.05), but reject at a 10% significance level (0.0526 < 0.10).
4) DATA Question: Use the dataset wage.xlsx provided on the class website to answer the following questions. We will be using the following variables from this dataset:
• wage: average hourly earnings
• educ: years of education
• female: indicator equal to 1 if the individual is female
• children: indicator equal to 1 if the individual has children
Use the dataset to answer the following questions:
a) You want to test whether females earn less than their male counterparts. Write out the null and alternative hypothesis for this test.
$H_0: \mu_F = \mu_M, \quad H_1: \mu_F < \mu_M$, where F and M denote female and male workers.
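The SAT calculations in Question 2(b), (d), and (e) can be reproduced with the exact normal CDF. This is a sketch; the printed-table answer in part (d) reads Φ(-2.19) from a table, so it can differ slightly in the fourth decimal.

```python
from math import sqrt
from statistics import NormalDist

Phi = NormalDist().cdf  # standard normal CDF

n, ybar, s2, mu0 = 50, 552, 1500, 540
se = sqrt(s2 / n)  # SE(Ybar) = s_Y / sqrt(n)
t = (ybar - mu0) / se

# Two-sided p-value and 95% confidence interval
p_value = 2 * Phi(-abs(t))
ci_95 = (ybar - 1.96 * se, ybar + 1.96 * se)

print(round(t, 4), round(p_value, 4), [round(v, 3) for v in ci_95])
```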
b) What is the t-statistic for this test? What is the p-value? What do you conclude from these results?
t = -8.2787, p-value ≈ 0
Reject the null that wages are equal in favor of the alternative that females are paid a lower wage than their male counterparts. In other words, there is strong statistical evidence that, on average, females are paid lower wages than males.
c) You want to test whether workers with children earn the same amount as workers without children. Write out the null and alternative hypothesis for this test.
$H_0: \mu_C = \mu_{NC}, \quad H_1: \mu_C \neq \mu_{NC}$, where C denotes workers with children and NC workers without children.
d) What is the t-statistic for this test? What is the p-value? What do you conclude from these results?
t = -0.4521, p-value = 0.6512 or 65.12%
Fail to reject the null hypothesis that individuals with children earn the same average wage as individuals without children.
e) Construct a 95% confidence interval for the average wage for someone with more than 13 years of education.
[7.13473, 8.45926]

Homework 3
DUE DATE: Monday, October 18
1) Consider the following empirical model:
$Y_i = \beta_0 + \beta_1 X_i + u_i$
Suppose you collect a random sample of 3 observations on Y_i and X_i.
Y_i   51   60   45
X_i    3    2    4
a) Using the formulas from class, compute the OLS estimates $\hat{\beta}_0$ and $\hat{\beta}_1$ by hand (i.e. using a calculator, not Python).
$\hat{\beta}_1 = -7.5, \quad \hat{\beta}_0 = 74.5$
b) Compute the predicted value $\hat{Y}_i$ and the residual $\hat{u}_i$ for each of the 3 observations.
$\hat{Y}_1 = 52.0,\ \hat{u}_1 = -1.0; \quad \hat{Y}_2 = 59.5,\ \hat{u}_2 = 0.5; \quad \hat{Y}_3 = 44.5,\ \hat{u}_3 = 0.5$
c) What is the mean of the $\hat{u}_i$'s?
0
d) What is the mean of the $\hat{Y}_i$'s? How does it compare to the mean of the $Y_i$'s?
52. It is the same as the mean of the $Y_i$'s.
2) Using observations for 500 families on annual income (inc) and consumption (cons) (both measured in dollars), the following regression results are obtained:
$\widehat{cons} = -98.81 + 0.749 \times inc, \quad R^2 = 0.492$
a) Interpret the intercept in this equation, and comment on its sign and magnitude.
The intercept means that a family with zero income is predicted to have consumption of -$98.81. A negative level of consumption does not make economic sense, but the magnitude is small (about $99), and zero income is likely well outside the range of incomes in the sample, so the intercept should not be read as a meaningful prediction.
b) What is the predicted consumption when family income is $35,000?
-98.81 + 0.749 × 35,000 = $26,116.19
c) If a family increases its income by $5,000, what is the regression model's prediction for how this will affect the family's consumption spending?
0.749 × 5,000 = 3,745, so the model predicts the family will increase its consumption spending by $3,745.
3) The following regression estimates the relationship between whether a county is urban or rural (measured by urban, which takes on the value of 1 for an urban county and 0 for a rural county) and the unemployment rate (measured by urate):
$\widehat{urate} = 0.0514 - 0.0102 \times urban$
(0.0201) (0.0059)
The standard error is given in parentheses below the coefficient estimate.
a) What is the mean unemployment rate for urban counties?
0.0514 – 0.0102 = 0.0412
b) What is the difference in the mean unemployment rate between rural and urban counties?
Urban counties have a 0.0102 (or 1.02 percentage point) lower mean unemployment rate than rural counties.
c) Does whether a county is urban or rural have a statistically significant effect on the county's unemployment rate at a 5% significance level?
t = -0.0102/0.0059 = -1.73; |t| = 1.73 < 1.96 ⇒ Fail to reject ⇒ Not statistically different from zero at the 5% level
4) The following regression estimates the impact of crime on housing prices by regressing the median price of a 1-bedroom apartment (price) on the number of reported crimes per capita (crime):
$\widehat{price} = 121{,}831.53 - 1{,}133.60 \times crime, \quad R^2 = 0.236, \quad SER = 15{,}103.21$
(25,054.59) (562.93)
The standard error associated with each estimated coefficient is given in parentheses below the estimate.
a) Interpret the coefficient on crime in your own words.
One additional reported crime per capita is expected to lower the median price of a 1-bedroom apartment by $1,133.60.
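For Homework 3, Question 1, the hand computation of the OLS estimates, fitted values, and residuals can be checked with a short sketch of the textbook formulas. The helper name `ols_simple` is illustrative.

```python
from typing import List, Sequence, Tuple


def ols_simple(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Return (b0, b1) for y = b0 + b1 * x using the textbook OLS formulas:
    b1 = sum((x - xbar) * (y - ybar)) / sum((x - xbar)**2),  b0 = ybar - b1 * xbar.
    """
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sxy / sxx
    return ybar - b1 * xbar, b1


x = [3, 2, 4]
y = [51, 60, 45]
b0, b1 = ols_simple(x, y)
fitted: List[float] = [b0 + b1 * xi for xi in x]             # predicted values Y-hat
resid: List[float] = [yi - fi for yi, fi in zip(y, fitted)]  # residuals u-hat

print(b0, b1)         # 74.5 -7.5
print(fitted, resid)  # [52.0, 59.5, 44.5] [-1.0, 0.5, 0.5]
```

The residuals sum to zero and the fitted values average to Ȳ = 52, which is what parts (c) and (d) ask about; both properties hold for any OLS regression that includes an intercept.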
b) Write out the null and alternative hypothesis for the test of whether crime has any effect on housing prices against the alternative that it has either a positive or negative effect on housing prices.
$H_0: \beta_1 = 0, \quad H_1: \beta_1 \neq 0$
c) What is the t-statistic associated with the test in (b)?
t = (-1,133.60 - 0)/562.93 = -2.014
d) What is the p-value associated with the test in (b)?
p = 2 × Φ(-2.01) = 4.44%
e) Can you reject the null hypothesis at the 5% significance level?
Yes, you can reject the null hypothesis at the 5% level.
f) Can you reject the null hypothesis at the 1% significance level?
No, you would fail to reject the null hypothesis at the 1% level.
5) DATA Question: Use the dataset Earnings_and_Height.xlsx provided on the class website to answer the following questions. We will use the following variables from this dataset:
• height: height without shoes (in inches)
• earnings: annual labor earnings (expressed in 2012 dollars)
• sex: indicator equal to 1 for male and 0 for female
a) Run a regression of earnings on height of the form:
$Earnings_i = \beta_0 + \beta_1 Height_i + u_i$
What is the estimated slope of the regression? What is the estimated intercept? What is the R²?
$\hat{\beta}_0 = -512.7336, \quad \hat{\beta}_1 = 707.6716, \quad R^2 = 0.0109$
b) What is an economic interpretation for your estimate of $\hat{\beta}_1$?
For a 1-inch increase in height, there is an expected increase in annual earnings of $707.67.
c) Run the same regression as in (a) but only for female workers. What are the estimated slope, the estimated intercept, and the R² of the regression?
$\hat{\beta}_0 = 12{,}650, \quad \hat{\beta}_1 = 511.222, \quad R^2 = 0.0027$
d) Run the same regression as in (a) and (c) but now only for male workers. What are the estimated slope, the estimated intercept, and the R² of the regression?
$\hat{\beta}_0 = -43{,}130, \quad \hat{\beta}_1 = 1{,}306.86, \quad R^2 = 0.0209$

Homework 4
DUE DATE: Wednesday, October 27
1) Bob is interested in measuring the effect of increasing the average length of jail sentences on the wages of individuals when they get out of jail. He believes that since prisons often provide job training programs, more time spent in jail could increase wages upon leaving jail. He collects data from former inmates on their current wage, $Wage_i$, and the amount of time they have spent in prison, $PrisonTime_i$. He obtains the following regression results:
$\widehat{Wage}_i = 45.67 - 1.23 \times PrisonTime_i$
Think of a specific "omitted variable" that might be biasing the estimate of the slope coefficient. Describe whether (and why) you think this "omitted variable" is biasing the estimated slope coefficient upwards or downwards (i.e. positively or negatively).
Ex. Level of education ⇒ Education positively affects wages and is negatively correlated with prison time, so I would expect the slope coefficient to be negatively (downward) biased.
2) Cheryl thinks that regular doctor visits increase preventive care and may reduce the number of sick days workers take. To try to measure this effect, she obtains data from a number of individuals on the number of sick days they took in the last year, $SickDays_i$, and the number of doctor visits they made in the past year, $DoctorVisits_i$. She obtains the following regression results:
$\widehat{SickDays}_i = 4.67 - 0.22 \times DoctorVisits_i$
Think of a specific "omitted variable" that might be biasing the estimate of the slope coefficient. Describe whether (and why) you think this "omitted variable" is biasing the estimated slope coefficient upwards or downwards (i.e. positively or negatively).
Ex. Health status ⇒ Better health means fewer doctor visits and fewer sick days, so I would expect the slope coefficient to be positively (upward) biased.
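The sign reasoning in these answers follows the omitted-variable-bias formula: the bias in the short-regression slope has the sign of (effect of the omitted variable on Y) × (correlation between the omitted variable and X). The simulation below is a hypothetical illustration of Bob's case; every number in it (the education distribution, the coefficients, and a true prison-time effect of zero) is invented for the example and is not an estimate from any data.

```python
import random


def ols_slope(x, y):
    """Slope from a simple OLS regression of y on x."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    sxx = sum((a - xbar) ** 2 for a in x)
    return sxy / sxx


random.seed(0)
n = 50_000
true_effect = 0.0  # hypothetical: prison time has no causal effect on wages

# Hypothetical data-generating process (all numbers invented for illustration)
educ = [random.gauss(12, 2) for _ in range(n)]              # omitted variable, Var = 4
prison = [10 - 0.5 * e + random.gauss(0, 1) for e in educ]  # more education -> less prison time
wage = [5 + true_effect * p + 1.5 * e + random.gauss(0, 3)  # more education -> higher wages
        for p, e in zip(prison, educ)]

short_slope = ols_slope(prison, wage)  # regression that leaves educ out

# Population bias = beta_educ * Cov(prison, educ) / Var(prison)
#                 = 1.5 * (-0.5 * 4) / (0.25 * 4 + 1) = -1.5
predicted_bias = 1.5 * (-0.5 * 2 ** 2) / (0.5 ** 2 * 2 ** 2 + 1)

print(round(short_slope, 2), predicted_bias)
```

Even though prison time has no effect by construction, the short regression produces a clearly negative slope, which is the downward bias described in the answer to Question 1.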
3) You are interested in the effect that the number of hours a high school student has to work in an outside job has on their GPA. In your regression you want to control for high school class standing, so you run the following regression:
$\widehat{GPA} = 3.4 - 0.03 \times HrsWrk - 0.7 \times Frosh - 0.3 \times Soph + 0.1 \times Junior$
(1.1) (0.013) (0.23) (0.14) (0.08)
where HrsWrk is the number of hours the student works per week, and Frosh, Soph, and Junior are dummy variables for the student's class standing.
a) Why don't you also include a dummy variable for seniors? What is the name of the problem caused by adding a dummy variable for seniors?
Because Frosh + Soph + Junior + Senior = 1 for every student, adding a Senior dummy alongside the intercept would cause a perfect multicollinearity problem (the dummy variable trap).
b) In your own words, interpret the coefficient on Junior.
High school juniors are expected to have a GPA that is 0.1 points higher than seniors, holding constant the number of hours that they work.
c) What is the expected GPA of a Sophomore who works 10 hours per week?
3.4 – 0.03 × 10 – 0.3 = 2.8
d) What is the expected GPA of a Senior who works 10 hours per week?
3.4 – 0.03 × 10 = 3.1
e) If Dom and Sarah work the same number of hours per week, but Dom is a Junior and Sarah is a Freshman, what is the expected difference in their GPAs?
Dom is expected to have a 0.8 higher GPA than Sarah (0.1 - (-0.7) = 0.8).
f) Suppose you rewrite the regression as:
$GPA = \beta_1 HrsWrk + \beta_2 Frosh + \beta_3 Soph + \beta_4 Junior + \beta_5 Senior$
where the intercept is dropped and the dummy variable for Senior is added. Given that you estimate this regression on the same sample as above, what is the coefficient estimate you will get for $\beta_5$, the coefficient on Senior?
$\hat{\beta}_5 = 3.4$
g) You run the same regression as in (f). What is the coefficient estimate you will get for $\beta_2$, the coefficient on Frosh?
$\hat{\beta}_2 = 3.4 - 0.7 = 2.7$
4) Tuition at your school has gone up, and administrators argue that the reason for the increase is that the reputation of the school has increased.
You decide to investigate this hypothesis by collecting data randomly for 100 national universities and liberal arts colleges from the 2017-2018 U.S. News and World Report annual rankings. You run the following regression:
$\widehat{Cost} = 7311.17 + 3985.20 \times Rep - 0.20 \times Size + 8406.79 \times Dpriv - 416.38 \times Dlibart - 2376.51 \times Drel$
R² = 0.72, SER = 3773.35
where Cost is the tuition in dollars, Rep is the index in the U.S. News and World Report that ranges from 1 to 5 with 5 being the best reputation, Size is the number of undergraduate students, Dpriv is a binary variable equal to 1 if the school is private, Dlibart is a binary variable equal to 1 if the school is a liberal arts school, and Drel is a binary variable equal to 1 if the school has a religious affiliation.
a) What is the forecasted cost for a private liberal arts college, which has no religious affiliation, a size of 1500 students, and a reputation level of 4.5?
7311.17 + 3985.20 × 4.5 - 0.20 × 1500 + 8406.79 - 416.38 = $32,934.98 (~$32,935)
b) Suppose you switch from a private university to a public university that has a 0.5 lower reputation ranking and 10,000 more students. What is the effect on your cost?
-8406.79 - 3985.20 × 0.5 - 0.20 × 10,000 = -$12,399.39 (~ -$12,400)
c) If you eliminate Size, and just regress Cost on Rep, Dpriv, Dlibart, and Drel, what do you expect to happen to the coefficient on Dpriv? Would you expect the coefficient to go up or down? In other words, would you expect the estimated effect of attending a private institution on cost to have increased or decreased? Why? (Hint: Private schools tend to be smaller than public schools.)
Go up, because Size then becomes an omitted variable that is negatively correlated with Dpriv and negatively related to Cost (as seen in the regression above), so the coefficient on Dpriv will be upwardly biased.
5) Consider the following regression of housing prices on house characteristics:
$\widehat{Price} = 119.2 + 0.485 \times BDR + 23.4 \times Bath + 0.156 \times Hsize + 0.090 \times Age - 48.8 \times Poor$
(23.9) (2.61) (8.94) (0.011) (0.311) (10.5)
R² = 0.72, SER = 41.5, n = 250
Price is the price of the house in thousands of dollars, BDR is the number of bedrooms, Bath is the number of bathrooms, Hsize is the size of the house in square feet, Age is the age of the house in years, and Poor is a binary variable equal to 1 if the house is in poor condition.
a) A homeowner purchases an adjacent lot and extends the size of the home (without changing the number of rooms) by 150 square feet. Construct a 99% confidence interval for the change in the value of her house.
[0.156 - 2.58 × 0.011, 0.156 + 2.58 × 0.011] × 150
= [0.12762, 0.18438] × 150
= [19.143, 27.657], i.e. an increase of roughly $19,143 to $27,657 (Price is in thousands of dollars)
b) The F-statistic for omitting Age and Poor from the regression is F = 3.85. Are the coefficients on Age and Poor jointly statistically different from zero at the 5% level? What about at the 1% level?
The critical value for F_{2,∞} at the 5% level is 3.00 ⇒ since 3.85 > 3.00, we can reject the null hypothesis that the coefficients on Age and Poor are both equal to zero at the 5% level.
The critical value for F_{2,∞} at the 1% level is 4.61 ⇒ since 3.85 < 4.61, we cannot reject that null hypothesis at the 1% level.
c) Your friend tells you the only variable that matters is the size of the house. You test this by running the following regression:
$\widehat{Price} = 142.4 + 0.178 \times Hsize$
(31.5) (0.013)
R² = 0.58, SER = 56.4, n = 250
What is the value of the homoscedasticity-only F-statistic that tests the null hypothesis H0: β1 = 0, β2 = 0, β4 = 0, and β5 = 0 in the original regression?
$F = \frac{(0.72 - 0.58)/4}{(1 - 0.72)/(250 - 5 - 1)} = 30.5$
6) DATA Question: Use the dataset fatality.xlsx provided on the class website to answer the following questions. The dataset contains one observation for each of the lower 48 states for each year between 1982 and 1988, for a total of 336 observations.
We will use the following variables from this dataset:
• mrall: Rate of fatal motor vehicle accidents in a given year
• dry: Percentage of the state's population living in "dry" counties (i.e. counties that prohibit alcohol sales)
• perinc: Per capita income
a) Regress the rate of fatal motor vehicle accidents on the percentage of dry counties in a state. What are the results? (Remember to use robust standard errors.)
$\widehat{mrall} = 0.2008 + 0.0076 \times dry$
(0.003) (0.000)
Alcohol control laws appear to increase motor vehicle fatalities: a 1 percentage point increase in dry counties is associated with a 0.0076 increase in the motor vehicle fatality rate.
b) Test the null hypothesis that the slope coefficient on dry is equal to zero at the 5% significance level. What does that tell you about the effect of alcohol control laws on motor vehicle fatalities?
t = 3.986 ⇒ Reject the null at the 5% level. These results imply that alcohol control laws have a positive and statistically significant effect on motor vehicle fatality rates.
c) Run a second regression of the rate of fatal motor vehicle accidents on the percentage of dry counties and per capita income (again using robust standard errors). What are the results? What do these results tell you about the effect of alcohol control laws on motor vehicle fatalities?
$\widehat{mrall} = 0.3867 - 0.0003 \times dry - 0.000013 \times perinc$
(0.17) (0.000) (0.000001)
Once per capita income is controlled for, the estimated effect of alcohol control laws on motor vehicle fatalities is negative.
d) What is a good explanation for why the estimated coefficient on dry in part (c) has the opposite sign of the estimated coefficient on dry in part (a)?
There is omitted variable bias in part (a): income could negatively affect fatality rates and could also be negatively correlated with the percentage of dry counties. This leads to a positive bias in the coefficient on dry in the regression in part (a).
e) Test the null hypothesis that the coefficient on dry equals 0 at the 5% significance level.
Test the null hypothesis that the coefficient on perinc equals 0 at the 5% significance level. Test the null hypothesis that both the coefficients on dry and perinc are equal to 0 at the 5% significance level.
t = -1.45 ⇒ Can't reject the null that the coefficient on dry is equal to zero.
t = -11.78 ⇒ We can reject the null that the coefficient on perinc is equal to zero.
F = 91.58 ⇒ Using the F-statistic, we can reject the joint null hypothesis that alcohol control laws and per capita income don't have any effect on traffic fatality rates.

Question 1 (10 points)
In an OLS regression, the slope estimator, β̂1, has a larger standard error, other things equal, if
• there is a small variance in the error term u
• the intercept, β̂0, is large
• there is more variation in the explanatory variable X
• the estimator converges
• the sample size is smaller
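The multiple-choice question above turns on what drives the standard error of the OLS slope. Under homoskedasticity, Var(β̂1 | X) = σ_u² / Σ(X_i - X̄)². The sketch below writes that sum as n × var_x (var_x computed with divisor n) and compares a few hypothetical scenarios; the parameter values are made up for illustration.

```python
from math import sqrt


def se_slope(sigma_u: float, n: int, var_x: float) -> float:
    """Homoskedasticity-only standard error of the OLS slope.

    Uses Var(b1 | X) = sigma_u^2 / sum((X_i - Xbar)^2), writing the sum as
    n * var_x, where var_x is the variance of X computed with divisor n.
    The intercept does not appear anywhere in the formula.
    """
    return sqrt(sigma_u ** 2 / (n * var_x))


base = se_slope(sigma_u=2.0, n=100, var_x=4.0)

print(se_slope(2.0, 25, 4.0) > base)    # smaller sample         -> True
print(se_slope(4.0, 100, 4.0) > base)   # larger error variance  -> True
print(se_slope(2.0, 100, 16.0) < base)  # more variation in X    -> True
```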