BUSI4528-E1
A LEVEL 4 MODULE, AUTUMN SEMESTER 2020-2021
QUANTITATIVE RESEARCH METHODS FOR FINANCE AND INVESTMENT
1. a) Answer the following points regarding the Linear Probability Model:
(i) Using standard mathematical notation, outline the Linear Probability Model for binary dependent variables. [20 marks]
(ii) Explain how the LPM can be used to predict binary choices. [5 marks]
(iii) Explain how to interpret the regression coefficients in the LPM. [10 marks]
(iv) Explain the main shortcomings of the LPM. [15 marks]
b) Consider a model explaining the monthly sales of a popular brand of coffee as a function of its price and the average prices of two competitors. Also included is an indicator variable disp=1 if there is a store display but no newspaper ad during the month for the target brand, and 0 otherwise. The indicator variable dispad=1 if there are both a store display and a newspaper ad during the month for the target brand, and 0 otherwise. The estimated results were obtained using Stata:
Sales: logarithm of sales in thousands of boxes
Price: average price of the target brand in $ for a given month
Price1 and Price2: average prices of two competitors.
(i) Write down the regression model and interpret the meaning and significance of each coefficient, including the intercept. Are the signs and the relative magnitudes for the advertising variables consistent with economic logic? [25 marks]
(ii) Label the parameters in the equation β1, β2, …, β6, with β5 and β6 corresponding to the coefficients of disp and dispad, respectively. If the null hypothesis is H0: β6 ≤ β5, state the alternative hypothesis. Why is the test of this null hypothesis against the alternative hypothesis interesting? Carry out the test at the 1% significance level, given that the calculated t-value is 6.34. What do you conclude? [15 marks]
(iii) What is an indicator variable, and how can the indicator variable trap be avoided? In the above regression, assume there is another indicator variable: ads=1 if there is a newspaper ad for the target brand, and ads=0 otherwise. Explain how to obtain an interaction variable of the indicator variables disp and ads. [10 marks]
Total [100 marks]
2. a) Compare and contrast the Durbin-Watson (DW) d-statistic test with the Lagrange Multiplier (LM) approach to test for serial correlation in time series models. [50 marks]
b) We explore the relationship between the cost per student and related factors at four-year colleges in the U.S., covering the period 1987 to 2011. We run an OLS regression using Stata, and the results are as below:
where lntc is the logarithm of total cost per student, ftestu is the number of full-time equivalent students, ftgrad is the number of full-time graduate students, tt is the number of tenure-track faculty per 100 students, GA is the number of graduate assistants per 100 students, and CF is the number of contract faculty per 100 students, who are hired on a year-to-year basis.
(i) Write down the regression model. [10 marks]
(ii) If we consider university ‘identity’ as a factor that could affect average cost per student, how should we adjust the estimation? Write down the new model and explain the differences with the model in point (i). [25 marks]
(iii) What is the F-test used for? Comment on the F-test result for the above model. [15 marks]
Total [100 marks]
3. a) Outline the setup of the Logit model for binary dependent variables. [35 marks]
b) Explain why the conventional R-square index is not a valid measure to evaluate the goodness-of-fit of the Logit model. Discuss which other measures of model fit should be used instead. [15 marks]
c) We estimate a regression describing the relationship between the cost per student and related factors at four-year colleges in the U.S., covering the period 1987 to 2011, where lntc is the logarithm of total cost per student, ftestu is the number of full-time equivalent students, ftgrad is the number of full-time graduate students, tt is the number of tenure-track faculty per 100 students, GA is the number of graduate assistants per 100 students, and CF is the number of contract faculty per 100 students, who are hired on a year-to-year basis. One of the test results in our analysis is as follows:
(i) Explain what this test is used for, and explain what you can learn from the result of the test. [10 marks]
(ii) Explain the differences between the fixed effects model and the random effects model. [25 marks]
(iii) Describe what heteroscedasticity is, and discuss what the consequences of the presence of heteroscedasticity are for linear regression analysis. [15 marks]
Total [100 marks]
4. a) Explain how you could test the null hypothesis of no cointegration between two time series variables. [40 marks]
b) Explain what is meant by ‘spurious regression’. Why should empirical analysts be cautious of it? [10 marks]
c) Discuss how the difference-in-differences (DID) estimator might be used to test for a potential treatment effect of a policy reform, specifying the DID regression model. Outline the key assumptions for the DID estimation. Use graphs where necessary. [50 marks]
Total [100 marks]