代写STAT6128: Key Topics in Social Science: Measurement and Data Week 4 Supplementary worksheet代做迭代

STAT6128: Key Topics in Social Science: Measurement and Data

Week 4 Supplementary worksheet: data transformation in Stata; and using Statas documentation as aresource for learning

Learning how  to  transform/  manipulate  data  in  an  appropriate  way,  to  prepare  data  for subsequent analysis, is a key skill.  In the workshops thus far we have assumed that the data is an appropriate structure for you to use.  This is not always the case.  For example before proceeding with your analysis you may need first to (for example):

append one dataset to the end of another dataset

merge datasets- join corresponding observations in different datasets by matching on key

variable(s)

reshape the data from a long’ to a wide’ structure, or viceversa

In this workshop you will be introduced to a few important ways to transform / manipulate data in Stata.  Spend sometime working through the examples on the following pages.

Source of material in this workshop

Please note that all of these examples (the text and the data) are taken directly from Stata's documentation, which includes fully worked examples using downloadable datasets.

See https://www.stata.com/features/documentation/ .  In this workshop I have simply directly selected  documentation  relating  to  the append, merge and reshape from  the   data

management manual (https://www.stata.com/bookstore/data-management-reference-manual/). The original sources are:

append https://www.stata.com/manuals/dappend.pdf merge https://www.stata.com/manuals/dmerge.pdf reshape https://www.stata.com/manuals/dreshape.pdf

For more detail on these commands, go directly to these sources.  This will also help you gain  familiarity with Stata’s documentation, which is a great resource for your subsequent learning.

Append

Example 1

This command appends (joins) a dataset to the end of a dataset already in Stata’s memory. To illustrate we will append two example ‘toy’ datasets from the Stata website. The first dataset, called even.dta, contains the sixth through eighth positive even numbers. The second dataset, called odd.dta, contains the first five positive odd numbers. We’ll have a look at each dataset first.   Type:

use http://www.stata-press.com/data/r15/even list

Typing list provides a list of values of variables in the data.  We could also have looked at the data in the data window.  You should see the following in the Results window:

Let’s have a look at the other dataset:

use http://www.stata-press.com/data/r15/odd1

list

We will append the even data to the end of the odd data. Because the odd data are already in memory (we just used them above),we type

append using http://www.stata-press.com/data/r15/even

list

Because the number variable is in both datasets, the variable was extended with the new data from the file even.dta. Because there is no variable called odd in the new data, the additional observations on odd were forward-filled with missing (.). Because there is no variable called even in the original data, the first observations on even were back-filled with missing.

Example 2

Suppose that we have several datasets containing the populations of counties in various states. We can use append to combine these datasets all at once and use the generate() option to  create a variable identifying from which dataset each observation originally came.   See the  following example:

use http://www.stata-press.com/data/r15/capop, clear list

append using http://www.stata-press.com/data/r15/ilpop http://www.stata-press.com/data/r15/txpop,

generate(state)

label define statelab 0 "CA" 1 "IL" 2 "TX"

label values state statelab

list

Merge

merge is for adding new variables from a second dataset to existing observations (if you wish to add new observations to existing variables, then use append).  merge joins corresponding observations from the dataset currently in memory (called the ‘master ’ dataset) with those from filename.dta (called the ‘using ’ dataset), matching on one or more key variables.  merge can perform one-to-one, one-to-many, many-to-one, and many-to-many merging.   Here we will look at one-to-one merging.

Example

We have two datasets, one of which has information about the size of old automobiles, and the other of which has information about their expense:

use http://www.stata-press.com/data/r15/autosize

list

Use http://www.stata-press.com/data/r15/autoexpense list

We can see that these datasets contain different information about nearly the same cars—the autosize file has one more car. We would like to get all the information about all the cars into one dataset. Because we are adding new variables to old variables, this is a job for the merge command. We need only to decide what type  of match merge we need. Looking  at the datasets, we see that the make variable, which identifies the cars in each of the two datasets, also identifies individual observations within the datasets. What this means is that if you tell me the make of car, I can tell you the one observation that corresponds to that car. Because this is true for both datasets, we should use a 1:1 merge.

We will start with a clean slate to show the full process. We start with the ‘master’ data and  merge in the ‘using’ data.  In this example the variable make is the ‘key’/‘matching’ variable.

use http://www.stata-press.com/data/r15/autosize

merge      1:1      make      using      http://www.stata- press.com/data/r15/autoexpense

list

The merge is successful—all the data are present in the combined dataset, even that from the one car that has only size information. A new variable is created in the merged dataset, _merge, which reports the source of the observation.   If we wanted  only those makes  for which all information is present, we could drop the observations for which merge < 3.

Reshape

reshape converts data from ‘wide’ to ‘long’ form and vice versa.   Think  of the data as a collection of observations Xij, where i is the logical observation, or group identifier, and j is the  subobservation,  or  within-group  identifier.  Wide-form  data  are  organized  by  logical observation, storing all the data on a particular observation in one row. Long-form data are organized by subobservation, storing the data in multiple rows.

Example

For example, we might have data on a person’s ID, gender, and annual income over the years 1980–1982. We have two X ij variables with the data in ‘wide ’ form.

use http://www.stata-press.com/data/r15/reshape1

list

To convert these data to the ‘long ’ form, we type

reshape long inc ue, i(id) j(year)

where  i  is  the  logical  observation,  or  group  identifier  (in  this  case   id),  and   j is  the subobservation, or within-group identifier (in this case year).   Inc and   ue are the ‘stubs’ of the variable names to be reshaped.

There is no variable named year in our original, ‘wide-form ’ dataset. year will be a new variable in our ‘long ’ dataset. After this conversion, we have

list, sep(3)

We can return to our original, wide-form. dataset by using reshape wide. reshape wide inc ue, i(id) j(year)

list

Converting from wide to long creates the j (year) variable. Converting back from long to wide drops the j (year) variable.


热门主题

课程名

mktg2509 csci 2600 38170 lng302 csse3010 phas3226 77938 arch1162 engn4536/engn6536 acx5903 comp151101 phl245 cse12 comp9312 stat3016/6016 phas0038 comp2140 6qqmb312 xjco3011 rest0005 ematm0051 5qqmn219 lubs5062m eee8155 cege0100 eap033 artd1109 mat246 etc3430 ecmm462 mis102 inft6800 ddes9903 comp6521 comp9517 comp3331/9331 comp4337 comp6008 comp9414 bu.231.790.81 man00150m csb352h math1041 eengm4100 isys1002 08 6057cem mktg3504 mthm036 mtrx1701 mth3241 eeee3086 cmp-7038b cmp-7000a ints4010 econ2151 infs5710 fins5516 fin3309 fins5510 gsoe9340 math2007 math2036 soee5010 mark3088 infs3605 elec9714 comp2271 ma214 comp2211 infs3604 600426 sit254 acct3091 bbt405 msin0116 com107/com113 mark5826 sit120 comp9021 eco2101 eeen40700 cs253 ece3114 ecmm447 chns3000 math377 itd102 comp9444 comp(2041|9044) econ0060 econ7230 mgt001371 ecs-323 cs6250 mgdi60012 mdia2012 comm221001 comm5000 ma1008 engl642 econ241 com333 math367 mis201 nbs-7041x meek16104 econ2003 comm1190 mbas902 comp-1027 dpst1091 comp7315 eppd1033 m06 ee3025 msci231 bb113/bbs1063 fc709 comp3425 comp9417 econ42915 cb9101 math1102e chme0017 fc307 mkt60104 5522usst litr1-uc6201.200 ee1102 cosc2803 math39512 omp9727 int2067/int5051 bsb151 mgt253 fc021 babs2202 mis2002s phya21 18-213 cege0012 mdia1002 math38032 mech5125 07 cisc102 mgx3110 cs240 11175 fin3020s eco3420 ictten622 comp9727 cpt111 de114102d mgm320h5s bafi1019 math21112 efim20036 mn-3503 fins5568 110.807 bcpm000028 info6030 bma0092 bcpm0054 math20212 ce335 cs365 cenv6141 ftec5580 math2010 ec3450 comm1170 ecmt1010 csci-ua.0480-003 econ12-200 ib3960 ectb60h3f cs247—assignment tk3163 ics3u ib3j80 comp20008 comp9334 eppd1063 acct2343 cct109 isys1055/3412 math350-real math2014 eec180 stat141b econ2101 msinm014/msing014/msing014b fit2004 comp643 bu1002 cm2030
联系我们
EMail: 99515681@qq.com
QQ: 99515681
留学生作业帮-留学生的知心伴侣!
工作时间:08:00-21:00
python代写
微信客服:codinghelp
站长地图