STATS 101/108 - Chapter 10 Distribution | Tuari: Task
Introduction
In the Chapter 9 task, you investigated differences in the age of New Zealand companies across industries. In this task we return to New Zealand businesses, but we are going to investigate a different aspect of them.
In the early days of the internet, businesses that wanted to attract New Zealand traffic to their website often chose the .co.nz domain extension. Search engines like Google tend to rank websites with local domain extensions higher when a search is done in a given country. Since 2013, a new domain extension, .nz (without the .co ), has been available, and it is increasingly used by newer businesses. More globally oriented Kiwi businesses, on the other hand, may opt for international domains, such as .com or .net . Watch this short clip about reasons for New Zealand businesses to choose a local domain extension.
In this task, we will explore the distribution of domain extensions for New Zealand business web addresses from different industries.
Q1
For this question, you need to explain why distributions may be different using contextual information.
Similar to Task 9, for this investigation you will collect data using an app that takes random samples of businesses from different industries. The data was collected from the NZBN website, a website that stores key information for New Zealand businesses, such as the address, website, industry and when the business was first registered.
Industries are defined as the following:
Accommodation and Food Services
Administrative and Support Services
Agriculture, Forestry and Fishing
Arts and Recreation Services
Construction
Education and Training
Electricity, Gas, Water and Waste Services
Financial and Insurance Services
Health Care and Social Assistance
Information Media and Telecommunications
Manufacturing
Mining
Other Services
Professional, Scientific and Technical Services
Public Administration and Safety
Rental, Hiring and Real Estate Services
Retail Trade
Transport, Postal and Warehousing
Wholesale Trade
In Q3 you will collect data using an app that takes random samples of businesses from three different industries. Note that the app will only sample registry entries that contain a web address. If a business has several websites in the register, we will consider only the first one.
You need to decide which three industries you will investigate.
Select three industries from the list above. You may use the same industries you investigated in Chapter 9 or choose other industries. Write down the three industries.
Identify one industry that you have a hunch may have a higher proportion of .co.nz websites than the other two industries. Write one sentence explaining why you think this industry has a higher proportion of .co.nz websites than others.
Go to the NZBN website (nzbn.govt.nz) and search for companies that have something related to the industry in the name. For example, searching “school” or “university” may allow you to find businesses in the “Education and Training” sector.
Click through to some of the businesses in your search results and check the domain extension of their Website(s), i.e., what their website type is. Is it .co.nz or something else, like just .nz , .com , .io ? Write one sentence about whether the information you found on the website supports your hunch.
The three industries you will investigate:
The industry you think will tend to have a higher proportion of .co.nz websites:
Your sentence explaining why you think businesses from your selected industry may have a higher proportion of .co.nz websites:
Your sentence explaining whether or not your hunch was supported:
Q2
For this question, you need to write suitable hypotheses for a hypothesis test.
The research question for this investigation is: Does website type change with industry?
One of the variables you will use for this investigation is called website_type and has two levels: uses .co.nz and doesn't use .co.nz .
The other variable you will use is called industry (based on the three industries you selected in Q1).
Write down a suitable null hypothesis and alternative hypothesis for your investigation in context.
The null hypothesis:
The alternative hypothesis:
Q3
For this question, you need to compare two sample proportions for website types between three different industries.
Use the three industries from Q1 in the Mind your business app to generate data. The app will give you the website types of a random sample of 80 businesses from each of your chosen industries from the NZBN website.
Copy the link to your data in iNZight Lite.
Use the link to import the data into iNZight Lite. Generate a plot displaying the distribution of website type. Copy a screenshot of this plot.
Generate a second plot that shows the distribution of website type by industry. Copy a screenshot of this plot.
Identify the two industries that have the biggest difference in proportions of .co.nz website types. Find the sample proportions from the Summary tab and write one sentence comparing the two sample proportions for these industries. This needs to include words like “higher” or “lower”.
The link to the data in iNZight Lite:
The screenshot of the plot for the website type:
The screenshot of the plot for the website type by industry:
The sentence comparing the two sample proportions:
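If you want to check how the sample proportions in the Summary tab are calculated, the following is a minimal Python sketch. It is not part of the task and does not use iNZight Lite. The counts and the industry names ("Industry A", "Industry B", "Industry C") are invented for illustration and are not real NZBN data; the function name cond_proportions is also hypothetical.

```python
# Hypothetical counts, invented for illustration only (not real NZBN data).
# Each industry has a sample of 80 businesses, matching the app's sample size.
counts = {
    "Industry A": {"uses .co.nz": 52, "doesn't use .co.nz": 28},
    "Industry B": {"uses .co.nz": 40, "doesn't use .co.nz": 40},
    "Industry C": {"uses .co.nz": 31, "doesn't use .co.nz": 49},
}


def cond_proportions(table, level="uses .co.nz"):
    """Return, for each industry, the proportion of its sample in `level`."""
    return {
        industry: row[level] / sum(row.values())
        for industry, row in table.items()
    }


props = cond_proportions(counts)

# The two industries with the biggest difference are the ones with the
# highest and the lowest sample proportion.
highest = max(props, key=props.get)
lowest = min(props, key=props.get)

for industry, p in props.items():
    print(f"{industry}: {p:.3f}")
print(f"Biggest difference: {highest} vs {lowest}")
print(f"Difference in sample proportions: {props[highest] - props[lowest]:.3f}")
```

Each proportion is calculated within an industry (count of .co.nz websites divided by that industry's sample size), which is why the proportions for different industries can be compared directly.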
Q4
For this question, you need to carry out and interpret an appropriate hypothesis test.
Use the Inference tab in iNZight Lite to carry out a Chi-square test. Take a screenshot of the output, ensuring the p-value is included. Copy and paste this image into your answers.
Summarise what you have learned by writing one sentence that answers the research question from Q2 by interpreting the p-value.
A screenshot of the inference output from iNZight Lite:
One sentence answering the research question:
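To see what a Chi-square test of independence calculates, here is a minimal Python sketch using a made-up two-way table of counts. It is an illustration under stated assumptions, not the method iNZight Lite uses internally, and its output may be laid out differently from the Inference tab. The counts are hypothetical, and the p-value shortcut used here is only valid for 2 degrees of freedom (three industries by two website types).

```python
import math

# Hypothetical two-way table of counts (invented for illustration).
# Rows: three industries. Columns: uses .co.nz, doesn't use .co.nz.
observed = [
    [52, 28],
    [40, 40],
    [31, 49],
]


def chi_square_statistic(table):
    """Sum of (observed - expected)^2 / expected over all cells."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            # Expected count if website type did not depend on industry.
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (obs - expected) ** 2 / expected
    return stat


stat = chi_square_statistic(observed)
df = (len(observed) - 1) * (len(observed[0]) - 1)

# With exactly 2 degrees of freedom, the upper-tail probability of the
# Chi-square distribution is exp(-x / 2). This shortcut does not hold
# for other degrees of freedom.
if df != 2:
    raise ValueError("This sketch only handles df = 2")
p_value = math.exp(-stat / 2)

print(f"Chi-square statistic: {stat:.3f}, df: {df}, p-value: {p_value:.4f}")
```

The expected counts are what you would see, on average, if the proportion of .co.nz websites were the same in every industry. A large test statistic (and small p-value) means the observed counts are far from that pattern.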
Q5
For this question, you need to consider how data could be used to compare risks, and explain a limitation of this approach.
The NZBN register contains much more data, such as whether a business is currently in receivership because it failed to pay its debts. Consider how this data could be used to compare the risk of receivership for businesses in the three industries you investigated. Explain a limitation of using a risk estimate that is based on sample data. Write two or three sentences.
Construct your answer using the following structure. Copy and paste it into the template and complete your answers accordingly.
Your answer:
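As a reminder of how a risk and a relative risk are calculated from counts, here is a minimal Python sketch. All numbers are invented for illustration: the receivership counts, the industry labels, and the assumed "true" risk of 0.05 are hypothetical and do not come from the NZBN register. The second part draws several random samples of 80 from the same assumed population so you can see how a risk estimate behaves from sample to sample.

```python
import random


def risk(events, total):
    """Risk = number of units with the outcome / number of units."""
    return events / total


# Hypothetical counts of businesses in receivership in samples of 80.
sample_size = 80
in_receivership = {"Industry A": 4, "Industry B": 2}

risk_a = risk(in_receivership["Industry A"], sample_size)
risk_b = risk(in_receivership["Industry B"], sample_size)

# Relative risk compares the two risks, with Industry B as the baseline.
relative_risk = risk_a / risk_b

print(f"Risk for Industry A: {risk_a:.3f}")
print(f"Risk for Industry B: {risk_b:.3f}")
print(f"Relative risk (A compared to B): {relative_risk:.1f}")

# Draw five random samples of 80 from a population with an assumed
# true risk of 0.05 and estimate the risk from each sample.
rng = random.Random(1)  # fixed seed so the run is repeatable
true_risk = 0.05        # assumption for illustration only
estimates = [
    sum(rng.random() < true_risk for _ in range(sample_size)) / sample_size
    for _ in range(5)
]
print("Risk estimates from five samples:", estimates)
```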
Q6
For this question, you need to reflect on the learning focus for this chapter (Distribution).
Describe in your own words ONE important idea from this topic. Do not just copy one of the learning objectives or something from the notes or other learning resources. One sentence is enough, but you must write about your own personal reflection.