Tutorial and Laboratories

11464 - 11524 AR/VR for Data Analysis and Communication

Data Structures and how to load data into R - Week 1

1. Introduction

In this tutorial you will continue practising basic operations in R. In particular, you will learn about vectors and data frames in R: how to create them, how to access their elements and modify them in your program, and the basics of data visualisation.

Skills Covered in this tutorial include:

•    Data Structures

•    Obtaining specific information from data frames

•    Reading files (txt and csv)

Note: Do not copy and paste the commands. As you type each line, you will make mistakes and correct them, which makes you think as you go along. Remember that the objective is to understand the commands and master the concepts, so that you can reproduce their principles on your own later.

2. Data Structures

R has five basic data structures: atomic vectors, matrices, arrays, lists, and data frames. Each structure has a specific number of dimensions, as listed below. Figure 1 presents a graphical representation of these data structures.

•    One dimension: Atomic vectors and lists

•    Two dimensions: Matrices and data frames

•    N dimensions: Arrays

Figure 1. Basic data structures in R. Different colours represent different data types (e.g., numeric, character, Boolean).

For today's lab, we will practise using vectors and data frames only.

2.1. Vectors

Vectors are the most basic data structure in R. A vector contains elements of the same type. The data type can be logical, integer, double, character or complex. A vector's type can be checked with the typeof() function. Another important property of a vector is its length. This is the number of elements in the vector and can be checked with the function length().

Exercise 1. Create atomic vectors with different data types and observe their type and length.

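int_var <- c(10L, 2L, 5L)       # integer vector (the L suffix marks integers)

num_var <- c(0.4, 3.7, 2)       # double (numeric) vector

typeof(int_var)                 # check the type of the vector

length(int_var)                 # check the number of elements

coe_var <- c(5L, 3.5, "A")      # mixed types are coerced to a single type

typeof(coe_var)                 # check which type the elements were coerced to

animals <- c("mouse", "rat", "dog", "bear")    # character vector

x <- seq(0, 10, 2)              # sequence from 0 to 10 in steps of 2

y <- 2:-2                       # integer sequence from 2 down to -2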
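The coercion seen with coe_var follows a fixed order of types. The short sketch below illustrates it; the variable names mixed_lgl, mixed_num and mixed_chr are only illustrative and are not part of the exercise.

```r
# When c() receives mixed types, every element is converted to the most
# flexible type present: logical < integer < double < character.
mixed_lgl <- c(TRUE, 2L)        # logical + integer  -> integer
mixed_num <- c(5L, 3.5)         # integer + double   -> double
mixed_chr <- c(5L, 3.5, "A")    # anything + character -> character

typeof(mixed_lgl)
typeof(mixed_num)
typeof(mixed_chr)

mixed_chr                       # the numbers are now stored as strings
```

Because a character vector cannot be used in arithmetic, it is worth checking typeof() whenever a vector is built from mixed input.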

Elements of a vector can be accessed using vector indexing. The vector used for indexing can be a logical, integer or character vector. Note: Vector indices in R start from 1, unlike many programming languages where indices start from 0.

x[3]           # access 3rd element

x[c(2, 4)]     # access 2nd and 4th element

x[-1]          # access all but 1st element

x[c(2, -4)]    # error: cannot mix positive and negative indices

x[c(2.4, 3.54)]    # real numbers are truncated to integers
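The commands above use integer indices only. Indexing with a logical vector, which the text also mentions, could look like the following minimal sketch; the name keep is just an illustrative choice.

```r
x <- seq(0, 10, 2)      # same vector as in Exercise 1: 0 2 4 6 8 10

keep <- x > 4           # logical vector with one TRUE/FALSE per element
x[keep]                 # only the elements where keep is TRUE

x[x > 2 & x < 10]       # conditions can be combined with & (and) or | (or)

x[c(TRUE, FALSE)]       # a shorter logical index is recycled: every other element
```

Logical indexing is the basis for filtering rows of a data frame later in this tutorial.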

A character vector can also be used as an index. This type of indexing is useful when dealing with named vectors, since we can give a name to each element of a vector.

x <- c("first"=3, "second"=0, "third"=9)    # create a named vector

names(x)                       # print the name of each element in the vector

x["second"]                    # access the value of the "second" element

x[c("first", "third")]         # access the 1st and 3rd elements by name
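Names can also be attached after a vector has been created. The sketch below uses an illustrative vector v to show this, along with what happens when a name does not exist.

```r
v <- c(3, 0, 9)
names(v) <- c("first", "second", "third")   # assign names to an existing vector

v["third"]       # single brackets keep the name
v[["third"]]     # double brackets return the bare value
v["fourth"]      # a name that does not exist gives NA rather than an error
```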

2.2. Data Frames

A data frame is a table or a two-dimensional array-like structure in which each column contains values of one variable and each row contains one set of values from each column. The characteristics of a data frame are as follows:

•    The column names should be non-empty.

•    The row names should be unique.

•    The data stored in a data frame can be of numeric, factor or character type.

•    Each column should contain the same number of data items.

We can create a data frame using the data.frame() function.

Exercise 2. Create the data frame df from a group of three vectors n, s, b.

n = c(2, 3, 5)

s = c("aa", "bb", "cc")

b = c(TRUE, FALSE, TRUE)

df = data.frame("var1"=n, "var2"=s, "var3"=b, stringsAsFactors=FALSE)       # df is a data frame.

str(df)                  # check structure of df

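Note: In versions of R before 4.0.0, columns that contain characters are converted into factors by default when a data frame is built. From R 4.0.0 onwards the default is stringsAsFactors=FALSE. If you want to be sure that those columns are kept as characters in any version, pass the argument stringsAsFactors=FALSE explicitly.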
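The effect of the stringsAsFactors argument can be checked directly. This is a minimal sketch; the names df_chr and df_fct are illustrative only.

```r
s <- c("aa", "bb", "cc")

# Same data, built once with characters kept and once converted to factors
df_chr <- data.frame(var2 = s, stringsAsFactors = FALSE)
df_fct <- data.frame(var2 = s, stringsAsFactors = TRUE)

class(df_chr$var2)     # character column
class(df_fct$var2)     # factor column
levels(df_fct$var2)    # a factor stores its distinct values as levels
```

Setting the argument explicitly makes a script behave the same way regardless of the R version it runs on.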

You can also inspect the structure of the data frame by clicking on its name in the Environment pane. The top line of the table, called the header, contains the column names (variable names). Each horizontal line afterwards denotes a data row, which begins with the name of the row, followed by the actual data. Each data member of a row is called a cell.

We can use the [, [[ or $ operators to access the columns of a data frame. Practise using these operators to access data from a data frame and observe the results.

df[[1]]                      # access data in column 1, returns a vector

df$var2                      # access column 2 by name, returns a vector

df[2]                        # access data in column 2, returns a data frame

sum(df$var1 == 5)            # returns the number of observations in var1 that are equal to 5

max(df$var1)                 # returns the maximum value in var1

which(df$var1==3)            # returns the position (not the value) of the elements in var1 equal to 3

df[c("var1", "var3")]        # access column 1 and column 3 simultaneously

df[2,1]                      # access cell (2,1): row 2, column 1

df$var4 <- c(1.5, 3.5, 5.5)  # add a variable named "var4"

df <- rbind(df,list(7, "dd",FALSE,7.7))    # add a row

df$var3 <- NULL              # delete variable 3

df <- df[-1,]                # delete the first row
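The difference between [, [[ and $ is easiest to see by checking the class of what each one returns. The sketch below uses a separate illustrative data frame called demo so that df from Exercise 2 is left untouched.

```r
demo <- data.frame(var1 = c(2, 3, 5),
                   var2 = c("aa", "bb", "cc"),
                   stringsAsFactors = FALSE)

class(demo[2])                      # single brackets return a data frame
class(demo[[2]])                    # double brackets return the column as a vector
identical(demo[[2]], demo$var2)     # $ and [[ return the same vector

demo[demo$var1 > 2, ]               # logical index on rows: keep rows where var1 > 2
```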

You can also use the subset() function to access data in a data frame. An advantage of subset() is that it drops the rows for which the condition evaluates to a missing value (NA), whereas indexing with [ keeps them as rows of NAs. Many summary functions, such as mean(), sum() and max(), can also ignore missing values when called with the argument na.rm = TRUE. We will practise using subset() in the next section.
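How subset() and na.rm treat missing values can be illustrated with a small made-up data frame. The name scores and its values are assumptions for this sketch only.

```r
# Toy data with one missing score (values are made up for illustration)
scores <- data.frame(id = 1:4, score = c(10, NA, 25, 7))

by_index  <- scores[scores$score > 8, ]    # bracket indexing keeps a row of NAs
by_subset <- subset(scores, score > 8)     # subset() drops the row with the NA

nrow(by_index)
nrow(by_subset)

mean(scores$score)                  # NA, because one value is missing
mean(scores$score, na.rm = TRUE)    # mean of the remaining values
```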

Exercise 3. Use built-in data frames in R. You can get the list of all the datasets by using the data() function. For this exercise, we will use a built-in data frame called mtcars (for a complete description of the dataset visit: https://rpubs.com/neros/61800). Have a look at the following instructions.

rm(list=ls())        # clear all objects from the workspace

data()               # get the list of built-in datasets

data("mtcars")       # load the mtcars data frame

head(mtcars)         # show only the first 6 rows

rownames(mtcars)     # returns the name of each row

Now, imagine that you are required to obtain some specific information from the mtcars data frame.

# How big is the data frame? Use the function dim() or the functions nrow() and ncol().

dim(mtcars)                     # dimension of data frame (rows, columns)

nrow(mtcars)                   #number of rows

ncol(mtcars)                     #number of columns

# What is the cell value from the first row, second column of mtcars?

mtcars[1,2]

#Could you get the same value by using row and column names instead? Which names?

mtcars["Mazda RX4", "cyl"]         # using row and column names instead

#Are there more automatic (0) or manual (1) transmission-type cars in the dataset? Hint: use the sum() function to count each type of transmission in the am (automatic/manual) variable.

sum(mtcars$am == 1)                   # get total number of manual transmission-type cars

sum(mtcars$am == 0)                   # get total number of automatic transmission-type cars
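As an alternative to calling sum() once per category, the base R function table() counts every distinct value in one step. This is offered as a sketch of another route to the same answer, not as the intended solution.

```r
data("mtcars")

counts <- table(mtcars$am)          # number of cars for each value of am
counts

names(counts)[which.max(counts)]    # the am value that occurs most often
```

The same pattern works for any categorical column, for example table(mtcars$cyl).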

3. Reading Data from Files

In this section you will learn how to read data from different sources. It is assumed that you are now familiar with different data types.

The data file is in the format called comma-separated values (CSV). In other words, each line contains a row of values which can be letters or numbers, and each value is separated by a comma. Generally, the very first row in the file contains the labels to refer to each column of values.

The data file that we need for this example is available in Canvas. Download the file and save it in your working directory. The labels of the three columns are: trial, mass, velocity. The values in each row come from an observation during one of two experimental conditions, labelled A and B.

For this tutorial we will use two commands to input the same data: read.table() and read.csv().

Exercise 4. Read data from structured files using the read.table() function.

read.table() reads a file containing structured (table-like) text into a data frame. The file can be delimited by commas, tabs, or any other delimiter specified with the sep argument. If header = TRUE, the first row of the file is used for the column names.

The sep argument specifies the field separator. Some of the most common values are: tab ("\t"), a single space (" "), backslash ("\\"), comma (","), and the empty string ("") (the default), which treats any run of white space (spaces, tabs or newlines) as a separator.
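The effect of sep and header can be tried without a file by passing lines of text to read.table() through its text argument. The semicolon-separated lines below are made up for illustration and are unrelated to the course data file.

```r
# Three lines of a toy semicolon-delimited table (values are made up)
raw_lines <- c("id;height", "1;170", "2;182")

with_header <- read.table(text = raw_lines, sep = ";", header = TRUE)
no_header   <- read.table(text = raw_lines, sep = ";")

names(with_header)    # column names taken from the first line
names(no_header)      # default names V1, V2 when no header is read
nrow(no_header)       # the header line is counted as a data row
```

Comparing the two results shows why it matters to state both the separator and whether the first line holds labels.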

Now, let's use read.table() to get the data from the csv file.

# read data

data_csv <- read.table("simple.csv",sep="\t")

# check if it is a data frame.

is.data.frame(data_csv)

How many columns and rows did you obtain?

What is the name of each column?

Did you get the right number of columns?

How can you read the names of the columns and rows correctly?

If R cannot find the file you are trying to read, then it might be looking in the wrong folder. You can change the working directory from the menu bar: click on “Session”, then “Set Working Directory” and “Choose Directory”. If you are not sure what files are in the current working directory, you can use the dir() command to list the files and the getwd() command to determine the current directory.

# list files in working directory

dir()

# obtain location of current working directory

getwd()
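A quick check before reading can save some confusion. The sketch below assumes the file is called simple.csv, as in Exercise 4, and uses the base R function file.exists().

```r
file_name <- "simple.csv"    # file used in Exercise 4

if (file.exists(file_name)) {
  message("Found ", file_name, " in ", getwd())
} else {
  # setwd("path/to/folder") changes the working directory from code
  message(file_name, " is not in ", getwd())
}
```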

Exercise 5. Read data from structured files using the read.csv() function.

We will now use another example, which is also a CSV file. In this case, we will create the file using Windows Notepad by copying and pasting the data below. Save the file as input_data.csv, using the Save As dialog with the All Files (*) option in Notepad. If you are using a Mac, follow these instructions:

https://help.sharpspring.com/hc/en-us/articles/115001068588-Saving-CSV-Files

id,name,salary,start_date,dept

1,Rick,623.3,2012-01-01,IT

2,Dan,515.2,2013-09-23,Operations

3,Michelle,611,2014-11-15,IT

4,Ryan,729,2014-05-11,HR

5,Gary,843.25,2015-03-27,Finance

6,Nina,578,2013-05-21,IT

7,Simon,632.8,2013-07-30,Operations

8,Richard,722.5,2014-06-17,Finance

Once the data has been saved, we can read it and store it in a data frame using the read.csv() function.

# read data

data <- read.csv(file="input_data.csv",header=TRUE,sep=",");

# print data

print(data)

Remember that the read.csv() and read.table() functions return their output as a data frame. Once we have loaded the data into a data frame, we can apply all the functions available for data frames, as explained in the previous section.
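As a self-contained illustration, the sketch below writes the first three rows of the data above to a temporary file from within R, reads them back, and converts the start_date column to a date. The names csv_lines, tmp and staff are illustrative, and creating the file this way is only an alternative to the Notepad step.

```r
csv_lines <- c(
  "id,name,salary,start_date,dept",
  "1,Rick,623.3,2012-01-01,IT",
  "2,Dan,515.2,2013-09-23,Operations",
  "3,Michelle,611,2014-11-15,IT"
)

tmp <- tempfile(fileext = ".csv")    # temporary file, removed at the end
writeLines(csv_lines, tmp)

staff <- read.csv(tmp, header = TRUE, stringsAsFactors = FALSE)
str(staff)                           # start_date is read as text

staff$start_date <- as.Date(staff$start_date)    # convert text to Date
staff[staff$start_date >= as.Date("2013-01-01"), "name"]    # who started in 2013 or later

unlink(tmp)
```

Once the column is a Date, comparisons and sorting by date work as expected.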

Now, obtain the following:

# get the max salary from the data frame.

sal <- max(data$salary)

print(sal)

# get the details of the person with the highest salary

details <- subset(data, salary == sal)

print(details)

# get all the staff members working in IT

peopleIT <- subset(data, dept == "IT")

print(peopleIT)

# get the staff members in IT whose salary is greater than 600

richIT <- subset(data, salary > 600 & dept == "IT")

print(richIT)

Who gets the lowest salary in the Operations department?

4. Take Home Exercises

4.1 Titanic dataset. This dataset contains the survival status of passengers on the Titanic. The dataset is a tab-separated file saved as a txt file. Information included in the dataset: names, passenger class, age, gender, and survival status. More information about this dataset can be obtained from:

http://www.statsci.org/data/general/titanic.html.

Inspect the data set and answer the following questions:

1.    How many passengers are in the dataset?

2.    Create two new data frames, one with male survivors and one with female survivors.

3.    Using the newly created data frames, who was the oldest surviving male? What was his age?

4.    In what passenger class was the youngest surviving female?

5.    How many female and male passengers survived?

6.    What is the average age of those who survived and those who did not?

7.    What is the name of the oldest survivor?

4.2 Rainfall dataset. This dataset contains 52 years (1968-2020) of daily rainfall amounts as measured in Canberra. Source: BOM (http://www.bom.gov.au/climate/data/).

Inspect the data set and answer the following questions:

1.    Calculate the mean and standard deviation of the rainfall variable.

2.    Which date (day,month,year) saw the highest rainfall? (use a loop)

3.    Obtain a subset of the rainfall data where rain is larger than 20mm.

4.    Find the mean rainfall for the days where the rainfall was at least 30mm.

5.    How many days (which dates) were recorded where the rainfall was exactly 40.4mm?

6.    Obtain the average rainfall for each year in the dataset. Which years had the highest and lowest rainfall in the dataset? (use a loop)

7.    Obtain the average rainfall for each month in the dataset. On average, which months are the driest and wettest in Canberra? (use a loop)
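Several of the questions above ask for a loop. The following is a minimal sketch of the general pattern for finding the position of the largest value, using a toy vector called rain; the values are made up and are not taken from the Canberra dataset.

```r
rain <- c(0, 12.4, 3.0, 40.4, 8.2)    # toy values, not the real data

max_value <- rain[1]    # best value seen so far
max_index <- 1          # position of the best value seen so far

for (i in seq_along(rain)) {
  if (rain[i] > max_value) {
    max_value <- rain[i]
    max_index <- i
  }
}

max_index
max_value
which.max(rain)         # built-in function, useful for checking the loop
```

Real data may contain missing values, and the comparison inside if() fails on NA, so those rows would need to be handled first.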

5. Summary of some functions useful for this tutorial.



