STATS 101/108 - Chapter 1 Datafication | Whakararaunga: Task
Introduction
In this investigation you will use data from Google Books, so watch the short video below to find out a little more about this digital service and how people use it!
Remember that you can also access thousands of books online through our university library!
In this assignment, we will explore the features of the covers, titles and descriptions of books available from Google Books.
Q1
For this question, you need to identify a visual feature of book covers.
· Head to Google Books, search for different topics and explore visual features of book covers, such as colours, typography, or certain images.
· Decide on ONE specific topic for your investigation and write it down in your answers. This should consist of one word.
· Take a screenshot (snip) of the first few books and paste it into your answers.
· Write one sentence that identifies how covers of books of this particular topic vary in terms of one visual feature. You can use this framework:
“Book covers about (your topic) either contain … or … ”
Construct your answer using the following structure. Copy and paste it into the template and complete your answers accordingly.
The topic:
The screenshot of the first few books from the search:
The sentence identifying how book covers vary in terms of one visual feature:
Q2
For this question, you need to use a visual feature of book covers to create ONE new categorical variable.
· Head to the GoogleBooks app. (Note: You need to click this link. This is different from the website above.) Use your topic from Q1 to generate a sample of books. You will get 20 books with their covers.
· Decide on a categorical variable to sort the book covers by. A good starting point is the visual feature you identified in Q1. Your variable needs an overall name, and you need to name each level. Write down the variable name and the two levels.
· Sort ALL of the book covers into one of the two levels. There should be at least five book covers in each group and no book covers left over.
· Take a screenshot of your sorted book covers that shows the variable name and levels. It is OK if your screenshot doesn’t contain all of the book covers, but you need to sort all of them.
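The sorting rules above (two levels, at least five covers in each, none left over) can be checked with a short script. This is only a sketch: the function name `sorting_problems`, the level names, and the example data are hypothetical and not part of the GoogleBooks app.

```python
from collections import Counter
from typing import Optional


def sorting_problems(levels: list[Optional[str]], min_per_level: int = 5) -> list[str]:
    """Return a list of problems with a sorting; an empty list means the rules are met.

    Each entry of `levels` is the level assigned to one book cover,
    or None if that cover has not been sorted yet.
    """
    problems = []

    # Rule: no book covers left over.
    unsorted = sum(1 for level in levels if level is None)
    if unsorted:
        problems.append(f"{unsorted} book cover(s) left over")

    # Rule: exactly two levels.
    counts = Counter(level for level in levels if level is not None)
    if len(counts) != 2:
        problems.append(f"expected 2 levels, found {len(counts)}")

    # Rule: at least five book covers in each level.
    for level, count in counts.items():
        if count < min_per_level:
            problems.append(f"level '{level}' has only {count} cover(s)")

    return problems


# Hypothetical sorting of 20 covers, for illustration only.
example = ["dark background"] * 12 + ["light background"] * 8
print(sorting_problems(example))  # prints []
```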
Construct your answer using the following structure. Copy and paste it into the template and complete your answers accordingly.
The categorical variable name:
The two levels:
The screenshot of the sorted book covers with name and levels of categorical variable:
Q3
For this question, you need to use text features of the book title only to create ONE new numeric variable.
The software guides for working with data in Google Sheets and exploring data in iNZight Lite have step-by-step demonstrations of related skills, and you will see examples of using these technologies in lectures. You will also see the skills required for these tasks demonstrated in Friday’s lectures.
· Create a new Google sheet and use this to create a rectangular data set based on the books you have explored. In order to do this, scroll to the bottom of the sorted book covers in the app, click the “Copy” button and paste everything into cell A1 in the new Google sheet.
· In addition to the categorical variable you created as part of Q2, you need to create one new variable using the title variable. This variable needs to be numeric. You need to use text features of the book titles you have explored to create the new variable. You can use functions within Google Sheets to create values for the new variable, or you can enter values manually.
· Write one sentence explaining what this new variable measures.
· Publish your Google sheet as a CSV and copy the link produced (make sure when the link is clicked that a CSV file is downloaded).
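To illustrate what "using text features of the book titles" can mean, here is a minimal Python sketch of two possible numeric variables: the number of words and the number of characters in a title. The titles and function names are made up for illustration, and these are only two of many possible choices; you would normally create your variable in Google Sheets or by hand, as described above.

```python
# Hypothetical titles for illustration only; your own titles come from the app.
titles = [
    "A Short Guide to Ocean Life",
    "Oceans",
    "Deep Water: Stories",
]


def word_count(title: str) -> int:
    """Number of whitespace-separated words in the title."""
    return len(title.split())


def character_count(title: str) -> int:
    """Number of characters in the title, including spaces and punctuation."""
    return len(title)


# Each title gets one numeric value per variable, giving one row per book.
for title in titles:
    print(title, word_count(title), character_count(title))
```

Whichever feature you choose, it should produce exactly one number for each title, so that the new column has a value in every row of your data set.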
Construct your answer using the following structure. Copy and paste it into the template and complete your answers accordingly.
The sentence explaining what the new numeric variable measures:
The link to the Google sheet published as a CSV:
Q4
For this question, you need to use one categorical variable to make a prediction about the book covers you have explored.
· Import your data into iNZight Lite using the CSV link from Q3 and create a visualisation (plot) using just the categorical variable you created in Q2.
· Use the Summary tab in iNZight Lite to find the level of this categorical variable that has the highest proportion and record this as a percentage rounded to one decimal place (if more than one level ties for the highest proportion, just pick one).
· Write a sentence that uses this level and proportion to make a prediction, e.g. “I predict that book covers about (your topic) will feature (something) on the cover, since most (x%) of the book covers in my data did.”
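The calculation behind this step can be sketched in Python: count how many book covers fall in each level, take the level with the largest count, and express its share as a percentage rounded to one decimal place. The function name, level names, and counts below are hypothetical; iNZight Lite's Summary tab reports the proportions for you.

```python
from collections import Counter


def top_level_percentage(levels: list[str]) -> tuple[str, float]:
    """Return the most common level and its percentage, rounded to one decimal place.

    If two levels tie, the one that appears first in `levels` is returned.
    """
    counts = Counter(levels)
    total = sum(counts.values())
    top_level, top_count = counts.most_common(1)[0]
    return top_level, round(100 * top_count / total, 1)


# Hypothetical data: 13 of 20 covers in one level, 7 in the other.
levels = ["has a photo"] * 13 + ["no photo"] * 7
level, percentage = top_level_percentage(levels)
print(f"{level}: {percentage}%")  # prints "has a photo: 65.0%"
```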
Construct your answer using the following structure. Copy and paste it into the template and complete your answers accordingly.
The screenshot of your plot from iNZight Lite:
The sentence that makes the prediction:
Q5
For this question, you need to consider the purpose of datafication in the context of book publishing.
· In this chapter you encountered three main purposes for generating and analysing data: description, prediction, and explanation. In two to three sentences, explain how the numeric variable you created in Q3 could be used for one of these purposes in the context of publishing books.
Construct your answer using the following structure. Copy and paste it into the template and complete your answers accordingly.
Your explanation of the purpose of datafication:
Q6
For this question, you need to reflect on the learning focus for this chapter (Datafication).
· Describe in your own words ONE important idea from this topic. Do not just copy one of the learning objectives or something from the notes or other learning resources. One sentence is enough, but you must write about your own personal reflection.
Construct your answer using the following structure. Copy and paste it into the template and complete your answers accordingly.