6SSMN961: APPLIED ECONOMETRICS
MIDTERM COURSEWORK 2024/25
INSTRUCTIONS TO CANDIDATES:
1. The deadline for submission of the coursework is Thursday 7th November at 10:00 am GMT. Work should be submitted on KEATS.
2. The file that you upload on KEATS should contain two parts:
• Short written answers to the questions
• The STATA output in PDF format
You can merge two PDF files using Acrobat Professional or an online PDF merger.
3. The word limit is 1,000 words, excluding STATA output and the cover sheet.
4. Your submission should be in your own words and must not be generated by AI software.
5. You must complete the coursework coversheet. This is very important to ensure that your work can be identified. In addition, you should name the file with your candidate number as follows: Candidatenumber.pdf.
6. To avoid collusion, each student is given a unique version of the datasets. This means that you should answer the questions with the datasets that have been provided to you. If your answers or the STATA output file are based on the datasets given to another student, you will lose marks and face an allegation of collusion. Because the datasets are different, you should not expect to replicate the results in the paper exactly.
This question is based on a paper by Lee, Miguel and Wolfram (2020), which presents results from an experiment that randomised the expansion of the electricity grid in rural Kenya. In April 2014, the authors randomly divided communities into treatment and control groups. The treatment group consisted of 380 unconnected households in 25 communities, which received an opportunity to connect to the electricity grid with a 100% subsidy, meaning they could connect for free. The control group consisted of 1,150 unconnected households in 75 communities, which received no subsidy. Between May and August 2014, each treatment household received a letter describing a limited-time opportunity to connect to the electricity grid at a subsidised price.
The dataset electricity.dta contains information on household and community characteristics at baseline (between February and August 2014); a treatment indicator (variable treatment) equal to 1 if the household was randomly assigned to the treatment group and 0 if it was assigned to the control group; and a connection indicator (variable connected) equal to 100 if the household was connected to the grid after 2014 and 0 if not.
a) Complete the table below, which compares the means of a set of observable household characteristics for the control and treatment groups and reports the p-value of the t-test of the difference between the two group means. Do the two groups appear balanced? Explain why it is important to check for balance. (16 marks)
Differences between electricity grid control vs. treated households at baseline
| | Control | Treatment | p-value of difference |
|---|---|---|---|
| Number of members | | | |
| High-quality walls (%) | | | |
| Age (years) | | | |
| Attended secondary schooling (%) | | | |
| Senior citizen (%) | | | |
| Chickens | | | |
| Not a farmer (%) | | | |
| Has bank account (%) | | | |
| Employed (%) | | | |
b) Explain how you can use the information provided to test whether the subsidy increases the probability of connecting to the electricity grid. Write down the estimated equation and explain in detail. (12 marks)
c) Estimate the model in part b) with and without controlling for household and community characteristics. Cluster the standard errors by community (variable siteno). Interpret the results. Explain why you would want to use clustered standard errors. (14 marks)
d) Does the inclusion of controls in part c) have a substantial impact on the results? Explain in detail. (12 marks)
e) You would like to test whether the effect of the subsidy on the probability of being connected to the grid is larger when the household head is more educated. The hypothesis is that more educated individuals better understand the benefits of having electricity.
i. Explain how you can adapt the model in part b) to test this hypothesis. (16 marks)
ii. Estimate this modified model, including controls for household and community characteristics. Explain the results in detail. (12 marks)
You would like to test the effect of being connected to electricity on some energy outcomes and some non-economic outcomes. The dataset outcomes.dta contains data on these outcomes and the treatment indicator.
To test the effect on these outcomes, you use the model in part b), but instead of having the take-up indicator as the dependent variable, you look at energy outcomes and non-economic outcomes.
f) Estimate the model separately for five energy outcomes: number of appliance types owned, owns mobile phone, owns radio, owns television, and owns iron. Interpret the results. (11 marks)
g) Estimate the model separately for two non-economic outcomes: life satisfaction and the political and social awareness index (which captures whether the household head was able to correctly identify the presidents of Tanzania, Uganda, and the United States). Interpret the results. (7 marks)
References:
Lee, Kenneth, Edward Miguel and Catherine Wolfram (2020). “Experimental Evidence on the Economics of Rural Electrification”, Journal of Political Economy, Vol. 128, No. 4.