ECO205
1st SEMESTER 2024/25 Group Project
BSc Actuarial Science – Year 3
BSc Economics – Year 3
BSc Financial Mathematics – Year 3
BA English and Finance – Year 3
ECONOMETRICS I
Group Project
General guidelines
This group project is an integral component of ECO205 and contributes 35% of your module mark. Please choose a socioeconomic phenomenon or relationship (see the guidelines below on choosing topics) that involves two or more variables, and study it using real-world data and the statistical models you learn in ECO205. As a stand-alone empirical study, your report is expected to follow the structure of a typical academic research paper (see the recommended structure later). Your submission will be checked for similarity using Turnitin. Cases of academic dishonesty will be penalized according to university policy.
The topic may come from your own experience or knowledge (as an economist), textbook examples (with proper modification), or the academic literature. You are free to choose any topic, but please bear in mind that (1) it must make use of regression models, and (2) it must be properly motivated (i.e., why it is important or useful to investigate the specific problem). Please also note that even though the statistical methods and models presented in ECO205 are sufficient to produce many interesting results, you are free to use more advanced statistical methods if they provide additional information or fit your purpose.
Guidelines on choosing topics
If you don’t know where to start, the following sources may help you find a research topic.
1. The first source is your textbooks in other fields of study (micro/macroeconomics, labor economics, international economics, finance, etc.). These textbooks usually cover a wide range of economic or financial theories that you can test with real-world data. For example, you learned the concept of the production function in micro/macroeconomics, and you may want to estimate a parametric form of it using city-level data on capital stock, labor input, and output for a given year.
2. A second source of topics is the academic literature. Google Scholar is a good place to search it: type a keyword and it will return hundreds of articles. For example, you may read an article arguing that urban land use is determined by income, population, and urban transportation conditions. Following this article, you can collect data from the China City Statistical Yearbook 2018 on (1) urban population, (2) per capita income, (3) transport infrastructure, and (4) urban land use, and analyze how the first three factors affect urban land use.
3. A third source is econometrics textbooks. Most of them emphasize empirical examples or exercises, so they provide a large pool of potential topics. The easiest approach is to take one of the problems and apply the empirical model to your own data.
4. Of course, your topics are not restricted to the sources mentioned above. I also encourage you to find your own topics through deep thinking. Deep thinking produces interesting research questions. To give an example, you may model housing prices as jointly determined by demand and supply factors. However, there are many such factors. It is then your job to narrow them down to a few major ones and collect data accordingly. This cannot be done without deep thinking. Even if you adopt a research question raised by others, deep thinking will help you refine the question and generate new insights. For instance, in the model of Chinese housing prices, you may want to consider factors that others have overlooked but that may be important in the Chinese context, such as administrative hierarchy and geographical location. These factors may bring further insights into your results.
Below are a few exemplary topics:
• Estimate aggregate production function using regional (province- or city-level) data.
• Estimate determinants of pollutant emissions using regional data.
• Estimate determinants of housing prices using regional data.
• Estimate β-convergence using national data.
• Estimate the environmental Kuznets curve using national data.
Although there is no restriction on the scope of topics you may try, please adhere to the following principles to ensure that you obtain meaningful results from the analysis.
1. Please make sure you test an economic model, rather than an accounting identity. An economic model is a hypothetical functional form (according to some theory) that describes how one variable is determined by other variables. The exact form of this function is unknown and must be estimated using real-world data. For instance, economists often view the entire economy as a factory, where inputs (capital and labor) are converted into output (GDP) using a certain technology. A commonly adopted functional form is the Cobb-Douglas one, i.e., $Y = A K^{\alpha} L^{\beta}$, where Y stands for GDP, K for capital stock, L for labor input, and A for total factor productivity (TFP). In this formulation, the parameters α and β are unknown and can be estimated using real-world data. Within the regression model, we can test whether the technology exhibits constant returns to scale (α + β = 1), increasing returns to scale (α + β > 1), or decreasing returns to scale (α + β < 1).
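To make the Cobb-Douglas example concrete, taking logarithms turns it into a linear regression that OLS can estimate. The sketch below writes out that step and the t-statistic for constant returns to scale. The observation index i, the multiplicative error term, and the hat notation are conventions assumed here for illustration and may differ from those used in the lectures.

```latex
\begin{align*}
Y_i &= A K_i^{\alpha} L_i^{\beta} e^{u_i} \\
\ln Y_i &= \ln A + \alpha \ln K_i + \beta \ln L_i + u_i \\
H_0 &: \alpha + \beta = 1 \quad \text{against} \quad H_1: \alpha + \beta \neq 1 \\
t &= \frac{\hat{\alpha} + \hat{\beta} - 1}{\operatorname{se}(\hat{\alpha} + \hat{\beta})}, \qquad
\operatorname{se}(\hat{\alpha} + \hat{\beta}) =
\sqrt{\widehat{\operatorname{Var}}(\hat{\alpha}) + \widehat{\operatorname{Var}}(\hat{\beta})
      + 2\,\widehat{\operatorname{Cov}}(\hat{\alpha}, \hat{\beta})}
\end{align*}
```

The intercept of the log regression estimates ln A, and the two slope coefficients estimate α and β directly.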
Accounting identities, on the other hand, are known formulas that must be universally true. This statement has two implications. First, the parameters of the formula are all known, which means there is no need to estimate them. Second, the relationship must hold for any data set. To illustrate, let’s consider the well-known GDP decomposition by expenditure type: Y = C + I + G + NX, where Y stands for GDP, C for personal consumption expenditures, I for private investment, G for government spending, and NX for net exports. This is an accounting identity because every unit of output must be used in one of these four ways. Thus, their sum must be GDP. Here we have a linear function in C, I, G, and NX, but their coefficients are known to be unity. Hence, it is meaningless to estimate this equation.
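A small simulation may help show why estimating an identity is uninformative. The sketch below is in Python with NumPy and uses entirely made-up numbers. The function names, sample size, and value ranges are assumptions chosen for illustration, not real data. It builds Y as the exact sum of four simulated components and then regresses Y on them.

```python
import numpy as np


def ols(X, y):
    """Return OLS coefficients for y = X b + e (X already contains the constant)."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


def identity_regression(n=200, seed=0):
    """Regress Y on C, I, G, NX when Y is defined as their exact sum."""
    rng = np.random.default_rng(seed)
    # Simulated expenditure components in arbitrary units (not real data).
    C = rng.uniform(50, 100, n)
    I = rng.uniform(10, 40, n)
    G = rng.uniform(10, 30, n)
    NX = rng.uniform(-10, 10, n)
    Y = C + I + G + NX  # the accounting identity holds exactly, with no error term
    X = np.column_stack([np.ones(n), C, I, G, NX])
    coef = ols(X, Y)
    resid = Y - X @ coef
    return coef, resid


coef, resid = identity_regression()
print("coefficients (const, C, I, G, NX):", np.round(coef, 6))
print("largest absolute residual:", np.abs(resid).max())
```

Up to floating-point error, the intercept is zero, every slope is one, and the residuals vanish, whatever numbers are fed in. The regression only returns what the definition already states.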
2. Data must be available for all the variables in your model. You cannot perform econometric analysis without data. Data availability is usually a major challenge for empirical studies. Using the Cobb-Douglas production function as an example, data on GDP (or value added) and labor input (employment) are usually relatively easy to obtain, but data on capital stock are seldom provided by statistical bureaus. If data on capital stock are unavailable, in principle the estimation cannot be done. In this particular example there are ways to overcome the data problem, but I don’t plan to elaborate on them here.
As another example, you may conceptualize a relationship between IQ and students’ academic performance, controlling for effort. Although measures of effort are relatively easy to construct (attendance, hours of study, etc.), a reliable measure of IQ is usually difficult to obtain. You would presumably need to ask each subject to take an IQ test, which is very costly and difficult to implement.
If your study employs country-level, province-level, or city-level aggregate data, please keep in mind that government agencies and international organizations are your only data sources. Please check their websites or publications (statistical yearbooks) to verify that the data you need are available. If you plan to collect data through a survey, please think carefully about implementation issues.
If data availability is a problem, you have two options. First, you can change the proxy you are using for the variable of interest. For instance, if you need data on the number of permanent residents in cities but such information is not provided, you can use the number of registered residents instead. Second, you can modify your topic by using a different variable. As an example, you may want to study the production function for the economy as a whole, for which you need the productive capital stock of the entire economy. Suppose those data are unavailable but the statistical yearbooks do provide data on the capital stock of the secondary industry. You can then narrow your topic down to the production function of the secondary industry. If neither option is feasible, you had better choose a different topic for which data are available.
Guidelines on using data
A large sample is always recommended. Although it was mentioned in the lecture that the minimum sample size could be as small as 50, in empirical studies it is highly recommended that you have far more observations. A sample size of a few hundred or more is preferred.
Aggregate socioeconomic data at the city-, province-, or country-level can be downloaded from online sources. Below are some frequently used ones.
Statistical yearbooks offered by CNKI (access from XJTLU library link):
XJTLU library home->Databases-> China Statistical Yearbooks Database
Data offered by the National Statistics Bureau (register to download):
http://data.stats.gov.cn/index.htm
World Bank Open Data (all indicators):
https://data.worldbank.org/indicator?tab=all
IMF data:
https://www.imf.org/en/Data#global
Eurostat:
https://ec.europa.eu/eurostat/data/database
OECD.Stat:
https://stats.oecd.org/index.aspx?lang=en
The Penn World Table:
https://www.rug.nl/ggdc/productivity/pwt/?lang=en
A rich collection of online data sources (including U.S. labor survey data) compiled by the American Economic Association:
https://www.aeaweb.org/resources/data
Please note: Some data sources cannot be accessed from China. Please find technical solutions if you need them.
Guidelines on designing the analysis
This module covers quite a few important methods, including the OLS regression model, tests of a single parameter, tests of joint hypotheses, tests for heteroskedasticity and WLS, nonlinear models, instrumental variables, etc. You are expected to employ appropriate methods (potentially including statistical methods not covered by this module) in your empirical analysis. Although there is no fixed rule for good research design, high-quality studies share these common features:
1. The analytical framework is carefully chosen to answer the research question and to analyze the data.
2. Alternative model specifications or extensions of the model are explored to extract further information from the data, to address data problems, and to consolidate the main findings.
3. The results are interpreted and analyzed in detail.
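As one illustration of matching methods to a research question, the sketch below estimates a log-linear Cobb-Douglas production function by OLS, tests constant returns to scale with a t-statistic, and checks for heteroskedasticity with the n·R² form of the Breusch-Pagan test. It is written in Python with NumPy and runs on simulated data. The sample size, true parameter values, noise level, and function names are all assumptions made for illustration. The choice of the Breusch-Pagan test is also an assumption, since the module may use a different heteroskedasticity test.

```python
import numpy as np


def ols(X, y):
    """Return OLS coefficients, residuals, and the classical covariance matrix."""
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ X.T @ y
    e = y - X @ b
    sigma2 = e @ e / (n - k)  # unbiased error-variance estimate
    return b, e, sigma2 * XtX_inv


def simulate_cobb_douglas(n=500, alpha=0.3, beta=0.7, seed=42):
    """Simulate ln Y = 1 + alpha ln K + beta ln L + u (hypothetical data)."""
    rng = np.random.default_rng(seed)
    lnK = rng.normal(5.0, 1.0, n)
    lnL = rng.normal(4.0, 1.0, n)
    u = rng.normal(0.0, 0.1, n)  # homoskedastic error by construction
    lnY = 1.0 + alpha * lnK + beta * lnL + u
    return lnY, lnK, lnL


def crs_t_stat(b, V):
    """t-statistic for H0: alpha + beta = 1 (coefficients 1 and 2 of b)."""
    r = np.array([0.0, 1.0, 1.0])
    return (r @ b - 1.0) / np.sqrt(r @ V @ r)


def breusch_pagan_lm(X, e):
    """LM = n * R^2 from regressing squared residuals on the regressors."""
    u2 = e ** 2
    _, v, _ = ols(X, u2)
    r2 = 1.0 - (v @ v) / ((u2 - u2.mean()) @ (u2 - u2.mean()))
    return len(u2) * r2


lnY, lnK, lnL = simulate_cobb_douglas()
X = np.column_stack([np.ones(len(lnY)), lnK, lnL])
b, e, V = ols(X, lnY)
print("estimates (ln A, alpha, beta):", np.round(b, 3))
print("t-stat for alpha + beta = 1:", round(float(crs_t_stat(b, V)), 3))
# Under H0 of homoskedasticity LM is approximately chi-square with 2 degrees
# of freedom here; the 5% critical value is about 5.99.
print("Breusch-Pagan LM:", round(float(breusch_pagan_lm(X, e)), 3))
```

Each number printed here would still need interpretation in the report: what the estimated elasticities mean, whether constant returns to scale is rejected, and whether the standard errors can be trusted.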
Please avoid these common mistakes among past students:
1. Trying all the regression models or analytical methods learned in this module. Please bear in mind that your ultimate objective is to answer your research question. The coursework is not supposed to be an exercise in everything you have learned. Content that is unrelated to the research question damages the quality of your work.
2. Presenting the analytical results without much interpretation. It is the interpretation, not the numerical results generated by software, that answers the research question. Without proper interpretation, the results make little sense.
3. Copying the analytical framework of a past student’s work that earned a high mark. That analysis serves its own research question and data, which are different from yours. Blindly copying other students’ analytical frameworks often results in a poor report.
Guidelines on format
1. I recommend no more than 2,500 words. This is not mandatory: the mark is not explicitly linked to the word count.
2. I recommend the following structure for the final report:
a. Title;
b. Motivation and research question;
c. Description of data sources, variable measurement, and empirical model (why the regressors are important determinants of the dependent variable and what their expected signs are);
d. Presentation of analytical results, interpretations, and statistical inferences;
e. Discussion of results and conclusion;
f. References (if any);
g. Appendix (see below).
3. All Stata code and regression output must be reported in the appendix, placed at the end of the report. You should also include figures and tables in the main text, and tables should be formatted like those in the textbook (for example, Table 8.3, though you can skip the 95% confidence intervals). Please do not present tables or Stata code/output as screenshots.
4. Please use the accompanying MS Word template to prepare your final report. Please insert your digital signature as a picture on the cover page. Please do not alter the format (font, line spacing, page margins, etc.) of the first two pages of the document. Please submit your final report as an MS Word document. PDF files are not accepted.