AD688 Web Analytics for Business
Assignment 4: Competitive Intelligence through Web Scraping
This assignment focuses on gaining competitive intelligence through techniques such as web scraping and text analysis, with the following learning objectives:
Use programming languages such as Python
Understand the logic behind scraping websites
Analyze unstructured data to derive insights
The main purpose of this tutorial is to explain the code provided to scrape the website and to describe the steps needed to complete the assignment. Use this tutorial alongside the Python notebook “AD688 Car Depreciation – Multiple Pages.ipynb”.
Step 1: Review the website.
For this assignment, we’ll be using the website https://www.cars.com/. On the homepage of the website, select “Used” and then enter the make, model and zip code of your choice. Copy the link of the results page.
Depending on the car make and model you chose, there may be other options to choose from, such as trim. For this example, I’ve selected the Nissan Rogue with SV trims, as can be seen in the image below.
Step 2: Inspect the website.
You should now have a view of the ‘Cars for Sale’ listing page. Focus on the first car listed, right-click on the name of the car and select ‘Inspect’.
After we inspect the element, we can see the HTML code that stores the name of the car, the price, the mileage and so on.
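To make the idea concrete, here is a minimal sketch of pulling the name, mileage and price out of a piece of HTML. The markup and class names (`title`, `mileage`, `primary-price`) are hypothetical stand-ins: the real cars.com page may use different names, and the notebook may rely on a different parsing library. This sketch uses only the Python standard library so that it runs without installing anything.

```python
from html.parser import HTMLParser

# Hypothetical markup: the class names on the live site may differ and can change.
SAMPLE_HTML = """
<div class="vehicle-card">
  <h2 class="title">2020 Nissan Rogue SV</h2>
  <div class="mileage">35,120 mi.</div>
  <span class="primary-price">$21,995</span>
</div>
"""


class ListingParser(HTMLParser):
    """Collect the text of elements whose class attribute is one we want."""

    WANTED = {"title", "mileage", "primary-price"}

    def __init__(self):
        super().__init__()
        self.current = None  # wanted class of the element being read, if any
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        for name in classes:
            if name in self.WANTED:
                self.current = name

    def handle_data(self, data):
        # Store the first non-empty text that follows a wanted start tag.
        if self.current and data.strip():
            self.fields[self.current] = data.strip()
            self.current = None


parser = ListingParser()
parser.feed(SAMPLE_HTML)
listing = parser.fields

# Convert the display strings into numbers for later analysis.
price = int(listing["primary-price"].replace("$", "").replace(",", ""))
mileage = int(listing["mileage"].split()[0].replace(",", ""))

print(listing["title"], price, mileage)
```

The class names you see in the Inspect panel are the ones to look for in your own code; the cleaning step at the end matters because prices and mileages arrive as text, not numbers.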
Step 3: Open and run the Python file, AD688 Car Depreciation – Multiple Pages.ipynb
The first cell lists all the libraries needed for this assignment.
We then create empty lists to store the data we are about to collect.
As we are trying to scrape multiple pages, we can use a for loop to iterate through these pages. We then create a variable to store the URL of the website. If you take a closer look at the URL, you’ll notice a page parameter and number that let you know which page you are currently on, as can be seen in the image below:
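The pattern of empty lists filled by a loop over pages can be sketched as follows. The function `fetch_listings` is a hypothetical stand-in for the request-and-parse step in the notebook; it returns made-up listings so that the loop can run offline.

```python
# Lists that accumulate one entry per listing across all pages.
names, prices, mileages = [], [], []


def fetch_listings(page):
    """Hypothetical stand-in for downloading and parsing one results page.

    Returns two made-up listings per page so the example runs offline.
    """
    return [
        {
            "name": f"Car {page}-{i}",
            "price": 20000 + 100 * i,
            "mileage": 30000 + 500 * page,
        }
        for i in range(1, 3)
    ]


# range(1, 4) visits pages 1, 2 and 3; the end of a range is exclusive.
for page in range(1, 4):
    for item in fetch_listings(page):
        names.append(item["name"])
        prices.append(item["price"])
        mileages.append(item["mileage"])

print(len(names), names[0], names[-1])
```

Because every listing appends to all three lists in the same iteration, the lists stay the same length and line up row by row, which makes it straightforward to combine them into a table afterwards.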
Action Item: Go to the code and paste the URL you’ve copied as the value of the variable website_url, like below:
Scroll all the way to the right to locate the page number. Once we’ve located the page number, split the URL string around it so that the loop can insert each page number in its place. The split URL looks like the following:
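As an illustration of the split, the sketch below uses a hypothetical results URL; the query parameters in the URL you copied will differ, and the notebook may split the string in a slightly different way.

```python
# Hypothetical results URL: yours will have different query parameters.
website_url = (
    "https://www.cars.com/shopping/results/"
    "?makes[]=nissan&models[]=nissan-rogue&page=1&page_size=20&zip=02215"
)

# Split once around the page parameter so its value can be swapped out.
# Including the surrounding "&" avoids matching similar names like page_size.
before, after = website_url.split("&page=1&", 1)

# Rebuild one URL per page by placing the page number between the two halves.
page_urls = [f"{before}&page={page}&{after}" for page in range(1, 4)]

for url in page_urls:
    print(url)
```

Rebuilding page 1 reproduces the original URL exactly, which is a quick way to check that the split was made in the right place.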
Follow along and run the rest of the code, paying particular attention to the comments.
Step 4: Complete tasks 4-2 to 4-4 of the assignment.
The data that you’ve just collected presents information on a car of your choice for one city. You need to repeat the steps for the same car but in a different location. You need to then determine the following:
Task 4-2 What is the average price point for the car you’ve chosen in each of the locations? You might want to consider including additional visualizations that help explain the relationship between price and other variables such as ratings and mileage.
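One way to compute the average price per location is sketched below. The location names and prices are made up for illustration; in practice the records would come from the data you scraped for each zip code.

```python
from statistics import mean

# Made-up (location, price) pairs for illustration only.
records = [
    ("Boston", 21995),
    ("Boston", 23500),
    ("Boston", 19800),
    ("Dallas", 20500),
    ("Dallas", 22100),
]

# Group the prices by location.
prices_by_location = {}
for location, price in records:
    prices_by_location.setdefault(location, []).append(price)

# Average the prices within each group.
average_price = {
    location: mean(values) for location, values in prices_by_location.items()
}

for location, avg in average_price.items():
    print(f"{location}: ${avg:,.0f}")
```

The mean is sensitive to a few very expensive or very cheap listings, so it may be worth reporting the median alongside it.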
Task 4-3 Calculate the depreciation of the car across the two locations. You should conduct research on how to calculate the depreciation of a car. A possible method is to conduct a regression analysis.
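As a rough sketch of the regression approach, the code below fits a straight line of price against age using ordinary least squares. The ages and prices are made up, and two assumptions are built in: age is taken as the current year minus the model year, and depreciation is treated as linear. Real depreciation is often steeper in the first years, so other models may fit your data better.

```python
# Made-up data for one location: age in years and asking price in dollars.
ages = [1, 2, 3, 4, 5, 6]
prices = [27500, 25200, 23100, 21400, 19000, 17300]


def fit_line(xs, ys):
    """Ordinary least squares fit of ys = intercept + slope * xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(ages, prices)


def predict_price(age):
    """Price predicted by the fitted line for a car of the given age."""
    return intercept + slope * age


# The slope is the estimated change in price per additional year of age.
print(f"Estimated depreciation per year: ${-slope:,.0f}")
print(f"Predicted price at 3 years: ${predict_price(3):,.0f}")
```

Running the same fit separately on the data from each location gives two slopes to compare, and the fitted line can also inform the price of a car of a given age.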
Task 4-4 Write a competitive intelligence report that summarizes your findings for the above tasks and provides your recommended price for a 3-year-old car in each of the locations you’ve selected.