Manchester College Stata Data Analysis Questions

Applied Economic Methods
BUSM112: Instructions for Data Assignment
This data assignment, whether written individually or in a team of up to four people,
represents 40% of the total mark.
The assignment is due on 6 April 2022 at 15:00 and must be submitted in the dropbox that is
available on the module’s QMPLUS website.
For those wishing to write the assignment in teams (up to four people): You are
responsible for selecting your teammates. In the remote case that you and your teammates do
not get along (e.g. due to free-riding issues, clash with work agendas, etc.) then you MUST let
the module organiser know (r.gutierrez@qmul.ac.uk) by 1 April at the latest so that you are allowed
to submit the assignment individually, rather than in teams. In this case, you might be sent
new personalised files.
Each team needs to submit only ONE data assignment. The team leader is the
student responsible for submitting the assignment in QMPLUS by the stated deadline of 6 April
2022.
Submitted team assignments are understood to have been the product of the team members
and the assignment overall mark will be given equally to each teammate with no exception.
All data assignments, whether written individually or in a team of up to four people, must
meet the following requirements:
1. The first page of the assignment should be the completed cover sheet in Word format (available
on the second page of this document). This sheet needs to include the QMUL IDs of all the
students in the team.
2. The entire data assignment must be submitted in Word format only.
3. Below each question you must include any relevant tables (e.g. tests, regression tables).
These can be copied as a picture into Word to preserve the format.
4. The appendix of the assignment must include the syntax needed to replicate your results.
This syntax can be copied from your do files directly into the Word document. In
the do file you can include comments and notes that might make it easier for markers
to understand and mark your syntax. Failing to include your syntax will result in a heavy penalty
and you might be referred to the school’s Assessment Offence Officer for the assignment to be
investigated for plagiarism.
5. DO NOT include your log file. (If your do files are well documented anyone should be able
to replicate your results).
Failure to follow points 1-5 above will result in marks being deducted from the assignment.
Assignments handed in after the deadline will be penalised according to the students’ handbook,
unless you have explicit permission to submit late (granted before the deadline) due to
extenuating circumstances. Be aware that each team, and each student doing the assignment
individually, will receive personalised files to prevent collusion. All submitted assignments
will be screened using Turnitin, and those suspected of having committed an academic offence
(e.g. colluding with another team to produce the same assignment, plagiarism or ghost writing)
will be referred to the school’s Assessment Offence Officer, which might result in hefty
penalties.
Data Assessment Feedback Form
MODULE CODE: BUSM112
MODULE TITLE: Applied Economic Methods
Assignment Type: 40% Data assignment
Student 1 QMUL ID: Fill in your QMUL ID here
Student 2 QMUL ID: Fill in the QMUL ID of the second member if done in a group
Student 3 QMUL ID: Fill in the QMUL ID of the third member if done in a group
Student 4 QMUL ID: Fill in the QMUL ID of the fourth member if done in a group
Each team needs to submit only ONE data assignment per team.
Marker(s) Initials | Provisional Mark(s) | Late (no. of days) | Penalty (marks to be deducted) | Overall mark
Checklist:
1. The first page of the assignment should be the completed cover sheet in Word format.
2. The entire data assignment must be submitted in Word format only.
3. Below each question you must include any relevant tables (e.g. tests, regression tables). These can be
copied as a picture into Word to preserve the format.
4. The appendix of the assignment must include the syntax needed to replicate your results. This syntax
can be copied from your do files directly into the Word document. You may include any
comments and notes that might make it easier to read and mark.
5. Please do NOT include your log file. (If your do files are well documented anyone should be able to
replicate your results).
Note that submitted team assignments are understood to have been the product of the
members composing the team, and the overall assignment mark will be given equally to each
teammate with no exception.
COMMENTS
Applied Economic Methods
BUSM112
Data Assignment
The data assignment consists of three parts: A, B and C. The answers to these three parts must
be typed. The strict word limit is 1500 words excluding the word count of provided
instructions/questions, tables, and do files. Any extra text exceeding this word limit will not
be read and will not be marked.
Part A (20 marks)
File parta.dta contains information on a randomized intervention. In this randomized
intervention 1,000 children were treated with a dosage of fish oils on a daily basis for three
months.
The intervention then compared the test scores of the treated students with those of a group of
students that randomly received a placebo. None of the participants knew whether they were given
the real fish oils or the placebo.
1) Using t-tests, explain whether the treated and control groups have, on average, the same
characteristics.
[5 marks]
2) Estimate the impact of the intervention, by comparing the outcome (the student’s test
scores) after the intervention between the treatment and control groups. For this purpose, use
a t-test clearly explaining the impact of the intervention (if any) and whether this impact is
statistically significant.
[5 marks]
3) Using an OLS regression, estimate the impact of the intervention by comparing the test
scores between the treatment and control groups whilst also controlling in the same
regression for other covariates that might have affected the outcome. Explain whether your results
differ between sub-questions 2) and 3). If so, explain which results give a more reliable estimate
of the true impact.
[5 marks]
4) Test whether the OLS regression used in sub-question 3) violates any of the assumptions
required for OLS to be reliable and BLUE. If there are any violations, then try correcting for them,
clearly explaining your rationale for these corrections. Note: Some of you will receive files
where it will not be possible to correct some of these violations. In those cases, just explain
briefly how you tried to correct them. Advice: do not spend over an hour on this sub-question 4.
[5 marks]
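
Sub-questions 1) and 2) both rest on a two-sample t-test. As a rough illustration of what such a test computes, the sketch below works through the arithmetic in Python rather than Stata. The scores are made up, the names `welch_t`, `treated` and `control` are hypothetical, and the unequal-variance (Welch) form is an assumption, since the brief does not say which variant to use.

```python
import math
from statistics import mean, variance


def welch_t(treated, control):
    """Return (difference in means, t statistic, approximate degrees of freedom)."""
    n1, n2 = len(treated), len(control)
    v1, v2 = variance(treated), variance(control)  # sample variances (n - 1 divisor)
    diff = mean(treated) - mean(control)
    se = math.sqrt(v1 / n1 + v2 / n2)  # standard error of the difference
    # Welch-Satterthwaite approximation to the degrees of freedom
    df = (v1 / n1 + v2 / n2) ** 2 / (
        (v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1)
    )
    return diff, diff / se, df


# Made-up test scores, purely illustrative (not taken from parta.dta)
treated = [72.0, 75.0, 78.0, 74.0, 76.0]
control = [70.0, 71.0, 73.0, 69.0, 72.0]

diff, t_stat, df = welch_t(treated, control)
print(f"difference = {diff:.2f}, t = {t_stat:.2f}, df = {df:.1f}")
```

The same function applied to a baseline characteristic instead of the outcome gives the balance check asked for in sub-question 1): a small t statistic there suggests that the randomization produced comparable groups.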
Part B (30 marks)
File partb.dta contains information on a non-randomized intervention. The intervention
consisted of providing job training to people working in the fast food industry in New Jersey in
the USA. The training provided courses on IT, numeracy and customer service. The people used
as a “control group” were also working in the fast food industry, but in the state of Pennsylvania.
Independent researchers hope to investigate whether the intervention had any impact by
comparing the change in earnings (measured in natural logarithms, learnings) of participants in
the programme in New Jersey before and after the programme was implemented to that of
the control group in Pennsylvania.
1) Estimate and interpret the impact of the programme using the difference-in-difference
estimator using panel fixed effects.
[10 marks]
2) Estimate and interpret the impact of the programme using the difference-in-difference
estimator combined with kernel matching. To match people use the following variables: bk
kfc mc wendy.
[10 marks]
3) With the data provided, test whether the treatment and control groups are statistically
similar before the intervention took place and discuss whether this might affect the reliability
of the difference-in-difference estimators obtained above.
[10 marks]
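
The difference-in-difference idea used throughout Part B can be sketched with four group means. The example below is only an illustration in Python, not the Stata syntax the assignment requires: the records are invented, and the function name `did` and the tuple layout `(treated, post, outcome)` are hypothetical.

```python
from statistics import mean


def did(records):
    """2x2 difference-in-difference estimate.

    records: list of (treated, post, outcome) tuples, where treated and
    post are 0/1 indicators and outcome is, for example, log earnings.
    """
    def cell(t, p):
        # Mean outcome in one of the four treated-by-period cells
        return mean(y for tr, po, y in records if tr == t and po == p)

    change_treated = cell(1, 1) - cell(1, 0)
    change_control = cell(0, 1) - cell(0, 0)
    return change_treated - change_control


# Invented log earnings, purely illustrative (not taken from partb.dta)
records = [
    (1, 0, 2.0), (1, 0, 2.2), (1, 1, 2.5), (1, 1, 2.7),  # treated group
    (0, 0, 1.9), (0, 0, 2.1), (0, 1, 2.0), (0, 1, 2.2),  # control group
]

estimate = did(records)
print(f"DiD estimate = {estimate:.2f}")
```

In a balanced two-period panel, a fixed-effects regression with a post-period dummy and a treated-by-post interaction should reproduce this number as the interaction coefficient. Because the outcome is in logs, the estimate is read as an approximate proportional change in earnings.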
Part C (50 marks)
File partc.dta contains information from a real policy programme implemented in Colombia
in the 1990s that aimed at increasing education attainment among poor people. To this end,
the World Bank gave a secondary school voucher to poor children that wished to continue
with their education at secondary level.
These vouchers covered about half of students’ schooling expenses and were renewable
depending upon students’ performance.
Given that the programme did not have enough funds to give vouchers to all poor children,
these vouchers were randomized through a lottery among eligible households.
The variable won_lottry denotes whether the student won (=1) or lost (=0) the lottery.
The variable use_fin_aid denotes whether the student used the voucher or any other sort of
scholarship (=1) or not (=0).
To estimate the impact of this school voucher programme, all students were tested after the
intervention. The file provides information on the students’ test scores (lscores), including
both those who won the voucher and those who did not. Note that this test score variable is
already measured in natural logarithms.
Questions for part C:
1) Using a simple OLS regression estimate the following regression:
lscores = α + β1 won_lottry + β2 male + β3 base_age + error
Interpret the coefficient of having won the lottery (variable won_lottry). In your
interpretation be clear on whether this variable has a significant impact on the dependent
variable, the scores obtained (lscores), and the magnitude of this coefficient.
[10 marks]
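
To make the mechanics of this regression concrete, here is a minimal sketch in Python (the assignment itself expects Stata). It solves the OLS normal equations on a tiny synthetic dataset that is generated from known coefficients, so the result can be checked by eye. The variable names match the brief, but every number and the helper names `solve` and `ols` are assumptions made for illustration.

```python
import math


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def ols(y, columns):
    """Return [intercept, b1, b2, ...] from the normal equations X'X b = X'y."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


# Synthetic data, not taken from partc.dta
won_lottry = [1, 1, 1, 1, 0, 0, 0, 0]
male = [1, 0, 1, 0, 1, 0, 1, 0]
base_age = [12, 13, 14, 12, 13, 14, 12, 15]
# Outcome built without noise from assumed coefficients 3.0, 0.10, 0.05, -0.02
lscores = [3.0 + 0.10 * w + 0.05 * m - 0.02 * a
           for w, m, a in zip(won_lottry, male, base_age)]

alpha, b_won, b_male, b_age = ols(lscores, [won_lottry, male, base_age])
pct_effect = 100 * (math.exp(b_won) - 1)  # exact percentage change implied by a log outcome
print(f"b_won = {b_won:.3f}, implied change in scores = {pct_effect:.1f}%")
```

Because lscores is in logs, a coefficient of 0.10 on the lottery dummy corresponds to test scores roughly 10% higher for winners, holding male and base_age fixed. Significance would additionally require standard errors, which this sketch does not compute.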
The regression estimated in the question above is likely to be biased. As you can see in the
dataset, some students that won the lottery ended up not using the voucher. Also, some
students that did NOT win the school voucher still managed to go to secondary school as
they obtained other scholarships or funding (use_fin_aid). Thus, a simple comparison of test
scores between winners and losers of the lottery is likely to give a biased estimate of the
impact of the intervention.
For this reason, researchers from MIT and Stanford have suggested identifying the effect of this
intervention on test scores using instrumental variables.
These researchers suggest investigating the impact of use_fin_aid on test scores.
Since use_fin_aid is likely to be endogenous, the researchers suggest using the lottery
variable (won_lottry) as its instrument.
The researchers argue that having won the lottery (won_lottry) is a good instrument as it is
random, and very closely correlated with having obtained a school voucher.
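
The logic of using the lottery as an instrument can be sketched numerically. With a single binary instrument and no other covariates, the IV estimate reduces to the ratio of two differences in means: the effect of the lottery on scores (reduced form) divided by the effect of the lottery on aid use (first stage). The Python example below illustrates only that simplified case; it leaves out male and base_age, uses invented data, and the function name `wald_iv` is hypothetical.

```python
from statistics import mean


def wald_iv(y, x, z):
    """Return (first stage, reduced form, IV estimate) for a binary instrument z."""
    def gap(values):
        # Difference in means between instrument = 1 and instrument = 0
        on = mean(v for v, zi in zip(values, z) if zi == 1)
        off = mean(v for v, zi in zip(values, z) if zi == 0)
        return on - off

    first_stage = gap(x)   # effect of winning the lottery on using financial aid
    reduced_form = gap(y)  # effect of winning the lottery on log test scores
    return first_stage, reduced_form, reduced_form / first_stage


# Invented data, not taken from partc.dta
won_lottry = [1, 1, 1, 1, 0, 0, 0, 0]
use_fin_aid = [1, 1, 1, 0, 1, 0, 0, 0]  # some winners do not use aid, some losers do
lscores = [4.2, 4.3, 4.1, 4.0, 4.1, 3.9, 4.0, 3.8]

first_stage, reduced_form, iv_estimate = wald_iv(lscores, use_fin_aid, won_lottry)
print(f"first stage = {first_stage:.2f}, reduced form = {reduced_form:.2f}, "
      f"IV = {iv_estimate:.2f}")
```

A first stage well below one reflects the imperfect take-up described above, and dividing by it scales the winner-versus-loser comparison up to an effect of actually using financial aid. A first stage close to zero would signal a weak instrument and an unstable ratio.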
2) Your task for question 2: run an instrumental variable regression using lscores, the test
score variable, as the main dependent variable.
The main covariate of this regression is use_fin_aid. Since the use_fin_aid variable is likely to
be endogenous, use as its instrument whether or not the student won the lottery
(variable won_lottry). In your IV regression also control for male and base_age as additional
covariates.
Interpret the results (the regression coefficients) of both the first and second stage of the IV
regression. The results of both stages also need to be presented as tables.
[20 marks]
3) Explain what characteristics a good instrument should have to deal with endogeneity and
whether the instrument used in the question above satisfies these characteristics. Show all
the tests you used to formulate your answer.
[10 marks]
4) Using endogeneity tests, explain which results, those of OLS or those of IV, offer a more
reliable estimate of the impact of the intervention.
[10 marks]