Empirical Finance Spring II 2025
Assignment 2
Q1. (70pts) Cross-sections of returns
Download “q1data.xlsx”. You will find two log stock return series (labeled “A” and “B”) and four factors (labeled “MKT-RF”, “SMB”, “HML”, and “RF”) covering the period from January 2015 to December 2024. The data are monthly and expressed in percentage terms.
1. (20pts) Subtract RF from each of A and B and regress the resulting excess returns on MKT-RF. This is the empirical regression for the CAPM. Report the estimated β and adjusted $R^2$ from each regression. Note that you are estimating the CAPM equation for stocks A and B separately.
(Grading rule: Any deficiencies will result in deductions in increments of 10 points.)
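As an illustration of the regression described above, here is a minimal sketch using only NumPy. The data are synthetic placeholders, not the contents of “q1data.xlsx”, and the helper name `ols` and the simulated β of 1.2 are assumptions made for the example.

```python
import numpy as np

def ols(y, X):
    """OLS with an intercept; returns (coefficients, adjusted R^2)."""
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones(len(y)), np.asarray(X, dtype=float)])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    n, k = X.shape  # k counts the intercept as a parameter
    r2 = 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - k)
    return coef, adj_r2

# Synthetic stand-in for 120 monthly observations (in percent).
rng = np.random.default_rng(0)
mkt_rf = rng.normal(0.8, 4.0, 120)          # market excess return
rf = np.full(120, 0.1)                      # risk-free rate
stock = rf + 1.2 * mkt_rf + rng.normal(0.0, 2.0, 120)

# CAPM regression: excess stock return on the market excess return.
coef, adj_r2 = ols(stock - rf, mkt_rf)
alpha_hat, beta_hat = coef
print(f"beta = {beta_hat:.3f}, adjusted R^2 = {adj_r2:.3f}")
```

With the actual data, the same call would be made once for A and once for B, each time after subtracting RF.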
2. (20pts) Similarly, estimate the Fama-French 3-factor model for A and B stocks. Compare the estimates of β with those from the CAPM equation.
(Grading rule: Any deficiencies will result in deductions in increments of 10 points.)
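The comparison of β across the two models can be sketched as follows. The data are again synthetic, and the simulated loadings (1.0 on the market, 0.5 on SMB, -0.3 on HML) and the correlation between SMB and the market are assumptions chosen to show why the market β can shift once the extra factors are included.

```python
import numpy as np

def ols_coefs(y, X):
    """Return OLS coefficients [intercept, slopes...] for y on X."""
    X = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

# Synthetic factors: SMB is built to be correlated with the market.
rng = np.random.default_rng(1)
n = 120
mkt_rf = rng.normal(0.8, 4.0, n)
smb = 0.5 * mkt_rf + rng.normal(0.0, 2.0, n)
hml = rng.normal(0.0, 2.0, n)
excess = 1.0 * mkt_rf + 0.5 * smb - 0.3 * hml + rng.normal(0.0, 1.5, n)

# CAPM: market factor only.
capm_beta = ols_coefs(excess, mkt_rf)[1]

# Fama-French: market, SMB, and HML together.
ff3 = ols_coefs(excess, np.column_stack([mkt_rf, smb, hml]))
ff3_beta, smb_loading, hml_loading = ff3[1:]

print(f"CAPM beta = {capm_beta:.3f}, FF3 beta = {ff3_beta:.3f}")
print(f"SMB loading = {smb_loading:.3f}, HML loading = {hml_loading:.3f}")
```

In this simulated case the CAPM β absorbs part of the SMB exposure because the omitted factor is correlated with the market; whether that happens in the assignment data is an empirical question.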
3. (30pts) Based on the coefficient estimates for “SMB” and “HML” factors, discuss the characteristics of A and B stocks.
(Grading rule: Any deficiencies will result in deductions in increments of 10 points.)
Q2. (110pts) Event study
Open the file “q2data.xlsx”, which contains the stock returns for Companies A and B from Day 1 to Day 100. “Text 1” contains headlines related to the Banking (B) sector, while “Text 2” contains headlines for the Commodity (C) sector. There is also a dummy variable indicating an important change in Company A.
1. (20pts) Conduct sentiment analysis on “Text 1” and “Text 2”. For each, create a vector of length 100 containing the sentiment scores, with the score set to zero on days when no news is released. Call these vectors S1 and S2. Use the Python code that I uploaded to Canvas. Note that days without news are read in as NaN values, so you need to adjust the code to handle them. Report the sample averages of the two sentiment score vectors.
(Grading rule: There is no partial credit for this question.)
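One way to handle the NaN entries is to wrap the scorer so that anything that is not a non-empty string maps to zero. The sketch below is only illustrative: `toy_score` is a hypothetical word-count scorer standing in for the sentiment code on Canvas (which is not reproduced here), and the headlines are made up.

```python
# Hypothetical word lists for the stand-in scorer.
POSITIVE = {"gain", "surge", "strong"}
NEGATIVE = {"loss", "plunge", "weak"}

def toy_score(text):
    """Stand-in scorer: positive word count minus negative word count."""
    words = text.lower().split()
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

def safe_score(headline, scorer=toy_score):
    """Return 0.0 when there is no headline (NaN, None, or blank text)."""
    if not isinstance(headline, str) or not headline.strip():
        return 0.0
    return float(scorer(headline))

# Made-up column of headlines; missing cells appear as NaN or None.
headlines = [
    "Banks surge on strong earnings",
    float("nan"),
    "Lender reports loss",
    None,
]

scores = [safe_score(h) for h in headlines]
average = sum(scores) / len(scores)
print(scores, average)
```

Checking the type rather than calling a NaN test directly avoids errors when the column mixes strings with floating-point NaN values.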
2. (20pts) With the two sentiment score vectors in hand, our objective is to explain the returns for B. Regress B returns on each sentiment vector (S1, S2) and report the coefficient estimates. That is, run two regressions: (i) regress B returns on a constant and S1, and (ii) regress B returns on a constant and S2. Based on the regression results, infer whether B is affiliated with the B sector or the C sector.
(Grading rule: You are required to report the coefficient estimates on the sentiment vector and the $R^2$ value from each regression. Failure to do so will result in deductions in increments of 10 points.)
3. (20pts) So far, your analysis has assumed a linear relationship between sentiment and returns. However, stock returns may exhibit asymmetric reactions to sentiment. Construct two new variables
$S1_{p,t} = \max(S1_t, 0), \quad S1_{n,t} = \min(S1_t, 0),$
to decompose the sentiment score $S1_t$ into positive and negative components. Estimate the following regression
$r_{B,t} = \alpha + \beta\, S1_{p,t} + \gamma\, S1_{n,t} + \epsilon_t$
and report the estimated coefficients and adjusted $R^2$. Based on your results, does Company B react more strongly to positive or negative sentiment? Does this model deliver a higher $R^2$ than the one using undifferentiated sentiment?
(Grading rule: There is no partial credit for this question.)
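The decomposition and the asymmetric regression can be sketched as below. The sentiment scores and returns are synthetic, and the simulated coefficients (0.5 on the positive part, 1.5 on the negative part) are assumptions chosen only to make the asymmetry visible.

```python
import numpy as np

def fit(y, X):
    """OLS with an intercept; returns (coefficients, plain R^2)."""
    X = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    r2 = 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)
    return coef, r2

# Synthetic sentiment with zeros on "no news" days.
rng = np.random.default_rng(2)
n = 100
s1 = rng.normal(0.0, 1.0, n)
s1[rng.random(n) < 0.4] = 0.0

# Positive and negative components; they sum back to s1.
s1_pos = np.maximum(s1, 0.0)
s1_neg = np.minimum(s1, 0.0)

# Simulated returns that react more strongly to negative sentiment.
r_b = 0.2 + 0.5 * s1_pos + 1.5 * s1_neg + rng.normal(0.0, 0.5, n)

coef_lin, r2_lin = fit(r_b, s1)
coef_asym, r2_asym = fit(r_b, np.column_stack([s1_pos, s1_neg]))
alpha_hat, beta_hat, gamma_hat = coef_asym

print(f"beta = {beta_hat:.3f}, gamma = {gamma_hat:.3f}")
print(f"R^2 linear = {r2_lin:.3f}, R^2 asymmetric = {r2_asym:.3f}")
```

Because the linear model is the special case β = γ, the plain $R^2$ of the asymmetric model can never be lower; the adjusted $R^2$ can be, since it penalizes the extra parameter.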
4. (20pts) Begin by reporting the full-sample correlation between the returns of A and B. Next, report the sample correlation between the returns of A and B over periods when the dummy is zero. Finally, report the same correlation over periods when the dummy is one. What conclusions can be drawn regarding the characteristics (B versus C) of Company A once the dummy equals one?
(Grading rule: You should discuss any significant changes in correlation patterns. Any deficiencies will result in deductions in increments of 10 points.)
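The three correlations can be computed with boolean masks on the dummy, as in this sketch. Everything here is synthetic: the dummy is assumed to switch on at Day 51, and A is simulated to follow a commodity-like series before the switch and B's returns after it.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100
dummy = (np.arange(n) >= 50).astype(int)   # assumed break at Day 51

ret_b = rng.normal(0.0, 1.0, n)            # Company B returns
commodity = rng.normal(0.0, 1.0, n)        # unrelated commodity-like driver
noise = rng.normal(0.0, 0.3, n)

# A follows the commodity driver before the break and B after it.
ret_a = np.where(dummy == 1, 0.9 * ret_b, 0.9 * commodity) + noise

def corr(x, y):
    """Sample (Pearson) correlation between two series."""
    return float(np.corrcoef(x, y)[0, 1])

corr_full = corr(ret_a, ret_b)
corr_before = corr(ret_a[dummy == 0], ret_b[dummy == 0])
corr_after = corr(ret_a[dummy == 1], ret_b[dummy == 1])

print(f"full: {corr_full:.3f}, dummy=0: {corr_before:.3f}, dummy=1: {corr_after:.3f}")
```

A full-sample correlation averages over both regimes, which is why the subsample values can differ sharply from it.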
5. (30pts) From the previous question, we can infer that Company A went through a structural change after the dummy turned one. Let’s incorporate this feature into the regression model. Your goal is to reach the highest possible adjusted $R^2$ value.
(Grading rule: You will get the full 30 pts if you get an adjusted $R^2$ value higher than 0.90. You will receive 10 points for anything below 0.90.)
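One common way to let a regression accommodate a structural change is to interact the regressors with the dummy, so that intercept and slopes may differ across regimes. The sketch below assumes, purely for illustration, that the dependent variable is A's returns and the regressors are the two sentiment scores; the assignment does not fix the specification, and the data are synthetic.

```python
import numpy as np

def adjusted_r2(y, X):
    """Adjusted R^2 from an OLS regression of y on X plus an intercept."""
    X = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    n, k = X.shape  # k counts the intercept as a parameter
    r2 = 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k)

# Synthetic data: A loads on s2 before the break and on s1 after it.
rng = np.random.default_rng(4)
n = 100
d = (np.arange(n) >= 50).astype(float)     # assumed break at Day 51
s1 = rng.normal(0.0, 1.0, n)
s2 = rng.normal(0.0, 1.0, n)
ret_a = (1.0 - d) * s2 + d * s1 + rng.normal(0.0, 0.2, n)

# Pooled model: one set of slopes for the whole sample.
pooled = np.column_stack([s1, s2])

# Break model: dummy shifts the intercept and both slopes.
with_break = np.column_stack([s1, s2, d, d * s1, d * s2])

adj_pooled = adjusted_r2(ret_a, pooled)
adj_break = adjusted_r2(ret_a, with_break)
print(f"pooled: {adj_pooled:.3f}, with break: {adj_break:.3f}")
```

The interaction terms let each regime have its own coefficients within a single regression, which is equivalent in fit to estimating the two subsamples separately when every regressor is interacted.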
Q3. (20pts) Reading
Read “Expected Returns and Large Language Models” and summarize it in three paragraphs.
(Grading rule: You will get the full 20 pts.)