
PUBPOL 5310

Applied Multivariate Statistics

PROBLEM SET 2

Fall 2025

Due 11:55pm, Monday, September 22, via Canvas

•      Work in teams of 2-3 (or solo, if you strongly prefer). You can choose your own teams. Clearly indicate all the members of the team at the beginning of the problem set. Turn in one problem set per team (not one per person).

•      Keep answers as brief as possible, and include key Stata output (charts and descriptive statistics) with your answers. Be sure to label your charts and output clearly, and to indicate which question each chart is intended to answer.

•      Turn in the main problem set as one file only, not several documents. PDF preferred. Clearly label the problem set file (example: “PS 1 – PADM 5310 – Olivero Miller.pdf”).

•      NEW: Also turn in the related .log and .do files (if relevant) as separate uploads.

•      Include relevant Stata commands and output (such as tables or “summarize” output) in your answers, so that we know which commands led to which results.

•      When cutting and pasting Stata results into your Word document, use “Courier”, “Courier New”, or another fixed-width font that preserves the neat formatting in Stata.

1. Bivariate regression with cross-sectional data.

For this question, use the data on hourly wages and education of US residents in 2022 (for those individuals working “full time and full year”), given in the data set CPS-ASEC-2024_fall25.dta on the course website (in the “Data” folder).

a.       Look at the variables.  Use “summarize” for the variables wage, age, education, male, and female, and “tabulate” for education.  Do the summary statistics look consistent with your expectations?  Explain.

b.       Run a linear regression of wage on years of education using the “regress” command.

c.       Interpret the slope coefficient  – what does it tell us in words?  Is this reasonable?

d.       Interpret the intercept coefficient.  What does it tell us in words?  Is this reasonable?

e.       Based on the regression output, what is the predicted wage for someone with 12 years of education?  Show your work.
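
As a rough sketch only, the regression and the hand calculation in this question might look like the following in a do-file. It assumes the variables are named wage_per_hour and years_education (the names used in the commands later in this problem set) and that the data file sits in the current working directory; adjust both if your setup differs.

```stata
* Load the course data set (assumed to be in the working directory)
use "CPS-ASEC-2024_fall25.dta", clear

* Bivariate regression of hourly wage on years of education
regress wage_per_hour years_education

* Predicted wage at 12 years of education: intercept + slope * 12
* _b[_cons] and _b[years_education] hold the coefficients from the last regression
display _b[_cons] + _b[years_education] * 12
```

The “display” line only checks the arithmetic; you should still show the calculation from the printed coefficients in your answer.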

2. Predicting after a regression

a. Immediately following the regression in the previous problem, generate a new variable that is “predicted wages”.  You can do this in Stata with “predict wage_hat”.  (If you want to give your new variable a different name than wage_hat, that is fine too.)  Next, generate a new variable that is the residual from the regression.  You can do this in Stata with “predict wage_residual, resid”.  (If you want to give your new variable a different name than wage_residual, that is fine too.)


b. Find someone in the dataset with 12 years of schooling.  Confirm that their predicted wage is the same as your answer to question 1(e) above.

c. Confirm for a few sample cases that the residual is indeed equal to the actual value minus the predicted value.  (Hint:  try “list wage_per_hour wage_hat wage_residual in 1/10” to get a listing of these variables for the first 10 observations in the dataset.  You only need to show your calculation for one observation, but confirm for yourself that this is what is going on.)

d.   Graphically show what is going on with the predicted values.  You can do this with a command such as:

i.   graph twoway (line wage_hat years_education)

ii.   or alternatively:

graph twoway (line wage_hat years_education) (scatter wage_per_hour years_education)

e.   Do the predicted values make sense?  Do they look like a “best fit line”?  For individuals with low levels of education, the predictions look systematically strange.  Why do you think this is happening?
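
One possible way to carry out the checks in parts (a) through (c) is sketched below. It assumes the regression from question 1 was the last estimation command run, and it introduces a hypothetical helper variable, resid_check, that is not part of the data set.

```stata
* Fitted values and residuals from the most recent regression
predict wage_hat
predict wage_residual, residuals

* Recompute the residual by hand: actual minus predicted
* (resid_check is a hypothetical name used only for this comparison)
generate resid_check = wage_per_hour - wage_hat
list wage_per_hour wage_hat wage_residual resid_check in 1/10

* Everyone with 12 years of schooling gets the same predicted wage,
* so the minimum and maximum reported here should be identical
summarize wage_hat if years_education == 12
```

If wage_residual and resid_check agree in the listing, the residual is the actual wage minus the predicted wage.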

3. Creating conditional averages as a way to clarify data presentation. For this question, you will need the Stata data set CPS-ASEC-2024_fall25.dta, available for download in the “Data” folder on our course website.  Let’s start with a graphical representation of the relationship between years of schooling (“years_education”) and wage (“wage_per_hour”).

a.   Create a scatter plot with wage on the Y axis, and years of schooling on the x-axis.   This will look a little weird!  Why do you think the graph has these vertical lines?

b.   It’s even worse than it looks.  Most of the data are “smooshed together” down in the lower range of the wage y-axis.  You can get a better sense of this by making the “marker size” smaller.  Add an option to your command “ , msize(tiny)”, and show the resulting graph. This helps, but it’s still not clear.

c.    Before proceeding further, use Stata’s “help” command to look up the following commands:  preserve, restore, collapse.  We will use collapse to compute “conditional averages”.  But this will alter the data in Stata’s memory, and we will later want to return to the main data.  The commands “preserve” and “restore” will help us with that part.

d.   Use the “collapse” command to compute average wages for each value of years of schooling.  Hint:  this will require use of the “, by(years_education)” option.

e.   Using the “list” command, confirm that your new dataset now has only one observation per year of schooling.  Create a scatter plot on this transformed data.  Is the relationship in this graph more clear, or less clear, than in your answers to (a) and (b) above?

f.    Next, let’s return to our main data, and add a twist.  [You can use the “restore” command, if you previously preserved the data.]  Now (after “preserve”-ing again) collapse your data to “years of education by female” cells.  You should end up with a dataset with 28 observations: one for each year of schooling for men (female == 0) and for women (female == 1).  Create a scatter plot with two different colors, one for men and one for women.  You can do this with a command like:

graph twoway (scatter wage year if female == 1) (scatter wage year if female == 0)

What do we learn from this graph about how wages vary across education and gender?  How does the gender gap in wages change across different levels of education?

g.   What is the magnitude (in $/hour) of the gender wage gap for those with 12 years of schooling?  For those with 16 years of schooling?

h.   Now restore the main data set, and then repeat part (d), except instead of looking at “years of schooling” collapse to “age by gender” cells.  Plot out the life-cycle pattern of wages for men and women.  Is the gap consistent over all ages, or does it grow at certain parts of the life-cycle?
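
The preserve / collapse / restore pattern used throughout this question can be sketched roughly as follows. This is an illustration under assumptions, not a required solution: it assumes the variables are named wage_per_hour, years_education, and female, and it uses “///” line continuation, which works in a do-file but not when typing in the Command window.

```stata
* Conditional averages by years of schooling
preserve
collapse (mean) wage_per_hour, by(years_education)
list
twoway scatter wage_per_hour years_education
restore

* Conditional averages by years of schooling and gender
preserve
collapse (mean) wage_per_hour, by(years_education female)
twoway (scatter wage_per_hour years_education if female == 1) ///
       (scatter wage_per_hour years_education if female == 0), ///
       legend(order(1 "Women" 2 "Men"))
restore
```

After each “restore”, the full individual-level data are back in memory, so the same pattern can be reused for the “age by gender” cells in part (h) by swapping the variable in by().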


4.  IPUMS Account creation and exploration

The most natural place to go for data for your independent research project is the IPUMS website.  For data related to demographics and labor market outcomes in the US economy, the best data is the CPS.  This is the dataset that the government collects to calculate the monthly unemployment rate, and to measure trends in poverty, etc.  You can access the raw data at https://cps.ipums.org/cps/.  This problem has two parts.

NOTE: Please include an answer for (a) and (b) for each member of the team, not just one answer per team.

a – Each member of your team should request an account to use IPUMS-CPS.  For this part of the problem set, just verify that this has taken place.  (It’s okay if the account has not yet been approved by the submission deadline.)  Let us know in the problem set, with a screenshot or something similar, that this has taken place for each member of the team.

b – Each member of your team should browse the variable listings, and daydream/brainstorm what variables you would each like to explore.  In particular, I’d like you to think of 1-4 “outcome variables” that you would like to predict, and 2-6 “predictor” variables that you would like to use to predict them.  You can search for variables (and see for what years/months those variables are available) at https://cps.ipums.org/cps-action/variables/group .  For this part of the problem, for each member of the team clearly indicate the “outcome” and “predictor” variables you are interested in, and provide a very brief motivation for why you think it would be interesting to examine relationships between these variables.  Also, indicate what time periods and/or other sample restrictions that you would like to examine.


