FE6127/6308
Assignment 3 – Statistical Methods & Data Visualisation
For this assignment upload:
· a Word file containing the tables or graphs/charts you created that are relevant to your answers, together with any write-up of the results and interpretations required for each question;
· an Excel file containing your workings (each question on one or more separate worksheets or workbooks).
There are 5 questions.
1. A company manufactures a fresh chorizo (a Spanish-style preserved meat) and currently packages it in conventional plastic pouches. Because there is some consumer demand for a more environmentally sustainable form of packaging than plastic, the company has developed a paper package. However, they are concerned about whether this new paper package will affect the shelf-life of their product. The file “A3 - Packaging” contains a dataset of shelf-life measurements (in days and part-days, to two decimal places) for the product using the different forms of packaging: plastic and paper.
Analyse the question of whether there is a difference in shelf-life, as follows:
(a) Generate Descriptive Statistics and a Histogram for each package type (measure), using the Analysis ToolPak or Insert > Chart;
(b) State your Null Hypothesis;
(c) State which statistical test (from the ones we covered) is best to use and why;
(d) Carry out your chosen test in Excel;
(e) Write a statement of the results in the form given in the lecture slides (i.e., p-value and effect size).
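For part (e), the effect size usually has to be calculated separately from the p-value, using the means and standard deviations in your Descriptive Statistics. The sketch below shows one common measure, Cohen's d with a pooled standard deviation. It assumes the two packaging groups are independent samples and that Cohen's d is the effect size intended; check the lecture slides for the measure actually required. The function name and the numbers are made up for illustration and are not taken from “A3 - Packaging”.

```python
import math
import statistics


def cohens_d(sample_a, sample_b):
    """Cohen's d for two independent samples, using the pooled standard deviation."""
    n_a, n_b = len(sample_a), len(sample_b)
    var_a = statistics.variance(sample_a)  # sample variance (n - 1 denominator)
    var_b = statistics.variance(sample_b)
    pooled_sd = math.sqrt(
        ((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)
    )
    return (statistics.mean(sample_a) - statistics.mean(sample_b)) / pooled_sd


# Made-up shelf-life values in days (illustration only, NOT the assignment data)
plastic = [10.0, 12.0, 14.0]
paper = [8.0, 10.0, 12.0]

d = cohens_d(plastic, paper)
print(f"Cohen's d = {d:.2f}")
```

The same arithmetic can be reproduced in Excel with AVERAGE, VAR.S and SQRT, which keeps the working inside your workbook.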
2. The same company later decided to experiment with a second form of packaging using “Modified Atmosphere Packaging” (MAP) technology. Taking the same approach as in Q1, analyse the question of whether there is a difference in shelf-life between the original plastic packaging and the MAP one. The data for this is also in “A3 - Packaging”.
3. A researcher is interested in how returns to investment in innovation may differ between small and large firms. She has collected data on 60 firms, 30 classified as “Small” and 30 as “Large”. This data is in “A3 – Innovation and RoI”. The data is coded “S” or “L” under the measure “Firm Size” for small and large firms respectively. Two (scale) measures are recorded: a level of innovation (Innovation Index, which can take values from 0 to 6), and the firm’s return on investment (RoI), as a percentage, which may be positive or negative (i.e., a loss).
Using simple linear regression (including scatter plot, regression line, equation, coefficients, R²) analyse:
(a) the relationship between innovation (independent) and RoI (dependent) for each of the two groups of firms (i.e., do two separate regression analyses, one for Small and one for Large);
(b) using your models (i.e., regression lines) from part (a), create a table (in Excel or Word) with the predicted RoI values for small and large firms at Innovation Index values of 0, 3, and 6 (i.e., minimum or no innovation, mid-rank, and maximum);
(c) based on your two regression models (only), say what you think this analysis may suggest about the researcher’s topic.
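As a check on the logic of parts (a) and (b), the sketch below fits a least-squares line, reports R², and then uses the line to predict values at Innovation Index 0, 3 and 6. It is only an illustration of the calculation: the function name is hypothetical and the data points are made up, not taken from “A3 – Innovation and RoI”. In Excel the equivalent quantities come from the chart trendline or from SLOPE, INTERCEPT and RSQ.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x, plus R-squared."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    r_squared = 1 - ss_res / ss_tot
    return slope, intercept, r_squared


# Made-up (Innovation Index, RoI %) pairs for one group (illustration only)
innovation = [0, 2, 4, 6]
roi = [1.0, 6.0, 8.0, 13.0]

slope, intercept, r_squared = fit_line(innovation, roi)

# Predicted RoI at the three Innovation Index values asked for in part (b)
predictions = {x: intercept + slope * x for x in (0, 3, 6)}
for x, predicted in predictions.items():
    print(f"Innovation Index {x}: predicted RoI = {predicted:.2f}%")
print(f"R-squared = {r_squared:.3f}")
```

Running the same fit once for the Small firms and once for the Large firms gives the two columns of the table in part (b).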
4. The data in “A3 – MicroBreweries” is based on that in an economic study of the micro-brewing industry in Ireland in 2016. It has counts of microbreweries per county, with the counties then classified into 4 economic regions (a simplified classification based on the NUTS system). For each of the following graphs/charts choose a suitable colour scheme, Title, Axes, Legend, and any other appropriate graphical aesthetics.
(a) Create a geographic map (a “choropleth” map, to be precise) in Excel, showing the number of microbreweries per county.
(b) Create a Tree Map, showing the subdivision into Regions and Counties. Include data labels showing the actual counts of microbreweries.
(c) Create a Clustered Column Chart, again showing counts by Region, and by County within Region. Sort the clusters and columns in the chart so that the largest of each come first (i.e., leftmost) in the chart.
(d) For each of the above charts (a) to (c), state one question and answer for which that chart would be a good choice of evidence to support that answer.
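The two-level sort asked for in part (c) can be read as: order the Regions by their total count, largest first, then order the Counties within each Region by their own count, largest first. The sketch below illustrates that ordering, assuming this reading of the question. The function name, region and county names, and counts are all placeholders, not values from “A3 – MicroBreweries”. In Excel the same result can be reached with a helper column of Region totals and a two-level Custom Sort.

```python
def sort_for_chart(rows):
    """Order (region, county, count) rows: largest region total first,
    then largest county count first within each region."""
    totals = {}
    for region, _county, count in rows:
        totals[region] = totals.get(region, 0) + count
    # Negative values give descending order; names break any ties.
    return sorted(rows, key=lambda r: (-totals[r[0]], r[0], -r[2], r[1]))


# Placeholder data (illustration only)
rows = [
    ("Region A", "County 1", 2),
    ("Region B", "County 3", 5),
    ("Region A", "County 2", 4),
    ("Region B", "County 4", 3),
]

for region, county, count in sort_for_chart(rows):
    print(region, county, count)
```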
5. Using the dataset “A3 – EU Organic Acreage” create appropriate data visualisations showing:
(a) the geographic variation in organic farming across the EU/EEA countries in both 2012 and 2020;
(b) the variation over the whole time period 2012-2020, and across the countries;
(c) suggest a form of visualisation that could combine the information in parts (a) and (b) above into one visualisation, even if that can’t be made in Excel.
Take care to include suitable Guides (e.g. axis titles and labels, chart title, legend, data-source annotation, etc.) as appropriate.
HINTS:
Example Charts for Q4
As we’ve seen, there are many choices with charts and there’s no one correct or even “best” answer. These are suggestions that may help you understand what’s being asked for, but they aren’t perfect answers!
Figure 1 Q4a
Figure 2 Q4b
Figure 3 Q4c
Example Charts for Q5a and Q5b
NOTE:
To have your map “zoom in” on just Europe, the trick is to select a data point (e.g. Sweden), then “Format Data Series” and “Only Regions with Data”.