2025-2026, Term 1
PSY4101 Assignment 2 (individual work): Data analysis problem set
Submit ONE SOFTCOPY (AS A PDF FILE) ON TURNITIN
Instructions
- Write/Type your name and student ID clearly in your submission.
- Mark the problem number and subpart number clearly for each answer you provide.
- Your answers can be handwritten, word-processed (e.g., using Google Docs or MS Word), or in a mix of both. For handwritten answers, please make sure your handwriting is legible.
- For all questions related to null-hypothesis significance tests (NHST), set α = 0.05 unless otherwise specified.
- Please ensure that all numerical answers are rounded to three decimal places.
- For each analysis in jamovi, please provide the syntax generated by jamovi (see the jamovi user guide for how to enable syntax mode). This syntax allows us to trace your analysis so that, if you make an error, we can identify the type of error and give you partial credit where possible.
- Use of generative AI declaration: write a statement (see Section B of the notes on assessment items for details) below, regardless of whether you have used AI in any part of your work.
If you have used any generative artificial intelligence (AI) (e.g., software supported by large language models, or LLMs, such as ChatGPT) in any part of your work, please include a “use of generative AI statement” at the end of your work to declare when, where, how, and how much you have used it.
- Using generative AI per se does not result in any penalty, but failure to appropriately and adequately acknowledge its use could be regarded as plagiarism. If you use generative AI in your work, the more details you include in your statement, the less likely you are to face such a penalty.
Below are some example statements:
“I asked ChatGPT questions about the topic, and it gave me a list of readings to start with. Below are the questions I asked ChatGPT and the answers given by ChatGPT…”
“I used generative AI to improve my writing / check for grammatical errors / get suggestions of words/expressions only after I had written the first draft all by myself.”
“I did not use any generative AI in any part of this work.”
Please insert your use of generative AI declaration in the box below:
IMPORTANT: 50% of your final mark will be deducted if the statements are not included
Assignment Scoring Instructions:
- Please note that the total score for this assignment is 125%. However, we will cap the maximum score at 100%. This means that if your score exceeds 100%, it will be recorded as 100%. If your score is 100% or below, it will be recorded as is.
- This structure provides you with "extra room" to earn points, not "extra points." It is meant to encourage you to do your best with less pressure of losing points.
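The capping rule above amounts to taking the smaller of the raw total and 100. A minimal sketch of that arithmetic follows; the function name and the example totals are made up for illustration and are not part of the marking scheme.

```python
def recorded_score(raw_percent: float) -> float:
    """Return the score that is recorded once the 100% cap is applied."""
    return min(raw_percent, 100.0)


# Hypothetical raw totals out of the 125% that is available.
for raw in (125.0, 104.5, 100.0, 87.0):
    print(f"raw {raw}% -> recorded {recorded_score(raw)}%")
```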
Problem 1 (50%)
Pat is an analyst at a teaching-quality unit of a university and is interested in finding out the factors affecting students’ performance in exams. Specifically, Pat would like to study how personality (measured using the Big-Five scale), behavior before the exam (measured as the number of hours spent on revision before an exam), and emotion before the exam (measured as anxiety level) may influence a student’s exam performance (measured as exam scores). The dataset contains questionnaire data from a sample of 103 students.
Dataset for Problem 1: Problem 1_data.omv
A. (30%) One of Pat’s original hypotheses was that anxiety is a significant predictor of exam scores even after the effects of personality have been controlled for.
To test this hypothesis, first conduct a linear regression with exam scores being the outcome and all 5 personality scores as the predictors.
i. (6%) For each of the following assumptions of linear regression, state whether there is any evidence that the assumption was violated in this regression analysis. Report relevant statistics and/or figures to support your answer. If the assumption was violated, justify whether we should proceed with the analysis.
a. (3%) the linear-independence assumption
b. (3%) the normality assumption
ii. (4%) Write down the regression equation with the coefficient estimates.
iii. (5%) Which personality factor yielded the smallest p value as a predictor in this model? Test whether this factor significantly predicted exam performance and write a sentence to interpret the regression coefficient of this predictor.
iv. (5%) Did the model have significant explanatory power on exam scores? Write a sentence to report how much variability in exam scores was explained by personality and test whether it was significant.
Now, insert anxiety into the model with the 5 personality factors already included in the model.
v. (6%) Did anxiety provide any additional explanatory power, over and above personality, in predicting exam performance? Test this and report the results in no more than 2 sentences.
vi. (4%) Given the regression results above, what would you expect to see if you extract the part (aka semipartial) correlation between anxiety and exam scores with personality factors being partialed out? Briefly explain your expectation and verify it using jamovi.
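For reference, the comparison of two nested regression models (as in step v) is usually based on the change in R². The sketch below shows only the arithmetic of the F statistic for that change. The function name and the R² values are invented for illustration and are not taken from the dataset; jamovi's model-comparison output remains the source for your reported statistics and p value.

```python
def f_change(r2_reduced, r2_full, n, k_reduced, k_full):
    """F statistic for the R-squared change between two nested OLS models.

    r2_reduced, r2_full : R-squared of the smaller and the larger model
    n                   : sample size
    k_reduced, k_full   : number of predictors in each model (intercept excluded)
    """
    df1 = k_full - k_reduced          # number of predictors added
    df2 = n - k_full - 1              # residual df of the larger model
    delta_r2 = r2_full - r2_reduced   # additional variance explained
    f_value = (delta_r2 / df1) / ((1.0 - r2_full) / df2)
    return delta_r2, f_value, df1, df2


# Made-up R-squared values, for illustration only.
delta_r2, f_value, df1, df2 = f_change(0.20, 0.25, n=103, k_reduced=5, k_full=6)
print(f"Delta R2 = {delta_r2:.3f}, F({df1}, {df2}) = {f_value:.3f}")
```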
B. (20%) Before collecting the data, Pat had two other hypotheses, which are stated below. For each of the hypotheses, conduct an appropriate analysis and write a short paragraph in a “result-section” style (with relevant statistics and/or graph(s)) to support your answer.
i. (10%) Pat hypothesized that revision duration would moderate the effect of anxiety on exam performance by weakening the effect of anxiety on exam performance.
Include a simple-slope analysis to interpret how revision duration moderated the relationship between anxiety and exam performance.
ii. (10%) To explain how neuroticism influences revision duration, Pat hypothesizes that anxiety is a mediator. Report whether the mediation is complete or partial, and whether it is consistent or inconsistent.
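As a reminder of what the simple-slope analysis in B.i computes: in a moderated regression Y = b0 + b_x·X + b_w·W + b_xw·X·W, the slope of Y on X at a chosen moderator value w is b_x + b_xw·w. The sketch below evaluates that expression at the moderator's mean and at ±1 SD, assuming a mean-centered moderator. All coefficients and the SD are hypothetical placeholders, not results from the dataset.

```python
def simple_slope(b_x, b_xw, w):
    """Slope of Y on X when the moderator W equals w.

    Assumes the model Y = b0 + b_x*X + b_w*W + b_xw*X*W.
    """
    return b_x + b_xw * w


# Hypothetical coefficients and moderator SD (moderator assumed mean-centered).
b_x, b_xw = -0.50, 0.20
w_sd = 2.0

for label, w in (("-1 SD", -w_sd), ("mean", 0.0), ("+1 SD", w_sd)):
    print(f"moderator at {label}: slope = {simple_slope(b_x, b_xw, w):.3f}")
```

jamovi reports these slopes together with their standard errors and tests; the sketch covers only the point estimates.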
Problem 2 (40%)
Charlie is interested in the genetic and environmental effects on the body metrics of newborns and collected data from a sample of 42 newborns, including body metrics of each newborn such as body length, birth weight, and head circumference; possible sources of genetic influence such as the father’s and mother’s heights; and other environmental factors such as the father’s years of education. One key factor that Charlie wanted to focus on was the smoking habit of the parents, especially the mother’s. Charlie hypothesized that a mother’s smoking habit has a negative effect on her newborn’s body metrics.
Dataset for Problem 2: Problem 2_data.omv
A. (12%) Charlie wondered whether the newborn’s body metrics are influenced by a gene-and-environment interaction. Specifically, head circumference could be the result of an interaction between the daily number of cigarettes smoked by the mother and the mother’s height.
Conduct an appropriate analysis with the genetic variable as the original predictor in the model to test this hypothesis. Report your results in a short paragraph, including the interpretation of the coefficient of the interaction effect, supported by a simple-slope analysis.
B. (18%) Charlie would like to find out what factors affect the daily number of cigarettes smoked by the mother as it seems to be an important environmental factor. Charlie noticed that the numbers of cigarettes smoked by the mother and father seem to be correlated, and a potential mediator could be the father’s years of education.
i. (12%) Conduct an analysis to test this hypothesis, and write a short paragraph in a “result-section” style (with relevant statistics and/or graph(s)) to support your answer. Report whether the mediation is complete or partial, and whether it is consistent or inconsistent.
ii. (6%) How strongly does the result from this mediation analysis support a causal relationship among the predictor, mediator, and the outcome? Write a few sentences to justify your answer.
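For reference, a simple mediation model decomposes the total effect into a direct path (c') and an indirect path (a × b), where a is the predictor-to-mediator coefficient and b is the mediator-to-outcome coefficient with the predictor in the model. The sketch below computes the indirect effect, the total effect, and a first-order (Sobel-type) standard error. The function name and all numbers are hypothetical, and the method jamovi uses for its standard errors may differ depending on your settings.

```python
import math


def mediation_summary(a, se_a, b, se_b, c_prime):
    """Point estimates for a simple mediation model with OLS paths.

    a       : effect of the predictor on the mediator (with standard error se_a)
    b       : effect of the mediator on the outcome, controlling for the predictor
    c_prime : direct effect of the predictor on the outcome
    """
    indirect = a * b
    # First-order (Sobel) standard error of the product a*b.
    se_indirect = math.sqrt(a**2 * se_b**2 + b**2 * se_a**2)
    z = indirect / se_indirect
    total = c_prime + indirect
    return indirect, se_indirect, z, total


# Hypothetical path estimates, for illustration only.
indirect, se_indirect, z, total = mediation_summary(0.5, 0.1, 0.4, 0.1, c_prime=0.3)
print(f"indirect = {indirect:.3f}, SE = {se_indirect:.3f}, z = {z:.3f}, total = {total:.3f}")
```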
C. (10%) One of Charlie’s specific hypotheses was that the newborn’s birth weight would be lower for a smoking mother than for a non-smoking mother, after controlling for genetic factors.
i. (5%) Test the above hypothesis using an appropriate analysis with father’s and mother’s heights being the two genetic factors to be controlled for. Report your results in a short paragraph.
ii. (2%) Write a sentence to interpret the regression intercept.
iii. (3%) Write a sentence to interpret the regression coefficient of the binary variable of “smoker” (0=non-smoker, 1=smoker) in the full model.
Problem 3 (35%)
You are a research assistant in the Department of Psychology at a large university. The department is conducting a study to understand the factors that influence students' academic performance, specifically their GPA. The university administration is particularly interested in how students' study habits and socioeconomic backgrounds impact their academic success, beyond their prior academic achievement.
To carry out this study, you have access to a dataset from a recent survey of undergraduate students. The survey collected information on their GPA, HKDSE scores, English test scores (IELTS), study habits, and socioeconomic status.
The variables available in the dataset are:
● GPA: The students' current college GPA (on a 4.0 scale).
● HKDSE: Best 5 scores
● IELTS: IELTS score (maximum score = 9).
● StudyHours: Average number of study hours per week.
● StudyEnv: Quality of the study environment, rated on a scale from 1 (poor) to 10 (excellent).
● FamilyIncome: Family's annual income (in HKD1,000 units).
● ParentsEdu: Parents' highest education level, coded as follows:
● 1: Primary School or below
● 2: Secondary School
● 3: Higher diploma or Associate degree
● 4: Bachelor's degree
● 5: Master's degree or above
Dataset for Problem 3: Problem3_data.omv
Note: This dataset is not real and is used for assignment purposes only.
A. The department has tasked you with analyzing this data to identify the key predictors of GPA. Specifically, they want you to control for prior academic achievement (HKDSE) and language ability (IELTS scores) and then test the additional impact of study habits (quality of the study environment and study hours) and socioeconomic status (family income and parents’ education level).
i. (10%) Conduct a multiple regression analysis to determine whether GPA is predicted by previous test scores, i.e., HKDSE and IELTS scores.
a. (4%) How much of the variance in GPA can be explained by HKDSE and IELTS scores? Report the F-statistic and p-value for the overall model.
b. (6%) Does each of the predictors (HKDSE and IELTS scores) significantly predict GPA? Report the findings for each predictor and write a sentence to interpret the regression coefficients of the significant predictor(s).
ii. (20%) Extend your analysis by including additional predictors related to study habits and socioeconomic status into the existing regression model.
a. (4%) How much of the variance in GPA can be explained by all predictors? Report the F-statistic and p-value for the new model.
b. (4%) After accounting for HKDSE and IELTS scores, did study habits and socioeconomic status significantly improve the prediction of college GPA? Write a sentence to report how much additional variability in GPA was explained by study habits and socioeconomic status and test whether it was significant.
c. (12%) In the final model with all predictors included, does each of the study habit and socioeconomic status variables significantly predict GPA with the prior test scores (HKDSE and IELTS) already in the model? Report the findings for each of these predictors and write a sentence to interpret the regression coefficients of the significant predictor(s).
iii. (5%) Some argue that family income and parents' education may be highly correlated, potentially affecting the results due to one of the assumptions of multiple linear regression models. Which of the assumptions of multiple linear regression relates to this concern? Conduct appropriate test(s) to assess and address this concern.
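One common way to quantify how strongly a predictor overlaps with the other predictors is the variance inflation factor (VIF) and its reciprocal, tolerance. Both are derived from R²_j, the R² obtained when predictor j is regressed on all remaining predictors. The sketch below shows only this arithmetic. The function names and the R²_j value are made up for illustration; jamovi can report both statistics for your actual model.

```python
def tolerance(r2_j):
    """Tolerance of predictor j: share of its variance not explained by the other predictors."""
    if not 0.0 <= r2_j < 1.0:
        raise ValueError("r2_j must lie in [0, 1)")
    return 1.0 - r2_j


def vif(r2_j):
    """Variance inflation factor of predictor j (the reciprocal of its tolerance)."""
    return 1.0 / tolerance(r2_j)


# Hypothetical value: 75% of one predictor's variance is shared with the others.
r2_j = 0.75
print(f"tolerance = {tolerance(r2_j):.3f}, VIF = {vif(r2_j):.3f}")
```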