Data Visualization: COVID-19 Cases
Data Wrangling with Python & Excel
This project analyzes the COVID-19 case numbers recorded in Victoria on 24 August 2021 and reports a summary of the totals per suburb. The data was collected from the open data portal www.data.melbourne.vic.gov.au.
The project involves importing and merging two datasets: the first contains the number of cases reported in Victoria, and the second is a location (map) dataset for Victoria. Each dataset is cleaned, evaluated, and summarized.
The variables in the first dataset are:
- Postcode: Postcode of the location
- Population: Population of the area
- Active: Number of active cases recorded on the day
- Cases: Total number of cases recorded on the day
- Rate: Percentage of the cases on the day
- New: Number of new cases recorded on the day
- Band: Risk of the location on the day
- Data_date: Date of the case recorded
- File_processed_date: File reported date
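A minimal sketch of loading this dataset with pandas, assuming the CSV uses lowercase column names matching the list above and day-first dates (both assumptions; check them against the downloaded file). The two rows in `SAMPLE_CSV` are hypothetical placeholders, not real case data. Reading `postcode` as a string keeps it compatible with the map dataset used later.

```python
import io

import pandas as pd

# Hypothetical sample in the layout described above; replace io.StringIO(...)
# with the path of the real CSV. Column names are assumed to be lowercase.
SAMPLE_CSV = """postcode,population,active,cases,rate,new,band,data_date,file_processed_date
3000,40000,25,310,0.8,3,3,24/08/2021,25/08/2021
3015,18000,12,95,0.5,1,2,24/08/2021,25/08/2021
"""


def load_cases(source) -> pd.DataFrame:
    """Read the case CSV, keeping postcodes as text and parsing both date columns."""
    df = pd.read_csv(source, dtype={"postcode": str})
    for col in ("data_date", "file_processed_date"):
        # Day/month/year is an assumption about how the dates are written.
        df[col] = pd.to_datetime(df[col], format="%d/%m/%Y")
    return df


cases = load_cases(io.StringIO(SAMPLE_CSV))
print(cases.dtypes)
```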
Data Cleaning: Covid Cases Dataset
The table shows that the row with a missing (NaN) population value has a seemingly random postcode, yet many COVID cases are reported against it. We need to investigate why, starting by contacting the data administrator to ask for an explanation of this postcode. It may have been entered by the administrator without being part of the on-site data collection, and simply hold the total number of cases for that day. If so, we can drop the row so that it does not interfere with later analysis.
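The check described above can be sketched in pandas as follows. The rows are hypothetical, and postcode `9999` is an illustrative stand-in for the unexplained postcode; the drop step should only run once the administrator confirms the row is not a real location.

```python
import numpy as np
import pandas as pd

# Hypothetical rows; the last one mimics the problem row: no population,
# an unexplained postcode, and an unusually large case count.
cases = pd.DataFrame({
    "postcode": ["3000", "3015", "9999"],
    "population": [40000, 18000, np.nan],
    "new": [3, 1, 70],
})

# Step 1: inspect the rows with a missing population.
missing = cases[cases["population"].isna()]
print(missing)

# Step 2 (after confirmation): drop those rows and renumber the index.
cleaned = cases.dropna(subset=["population"]).reset_index(drop=True)
print(cleaned.shape)
```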
Next, we create a daily COVID case report. The cleaned COVID dataset consists of 701 observations and 9 variables.
Let’s create a few scenarios for our data analysis:
1. How many new cases are there in total on the day?
2. How many new cases were reported per postcode and suburb?
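The first two scenarios can be answered from the case data alone, at postcode level, with a sum and a group-by. This is a minimal sketch on hypothetical numbers, assuming a `new` column as described earlier; suburb names are only added after the merge with the map dataset.

```python
import pandas as pd

# Hypothetical new-case counts per postcode.
cases = pd.DataFrame({
    "postcode": ["3000", "3015", "3020", "3029"],
    "new": [3, 1, 0, 4],
})

# Scenario 1: total new cases recorded on the day.
total_new = int(cases["new"].sum())

# Scenario 2: new cases per postcode (grouping in case a postcode appears
# more than once), keeping only postcodes with new cases, largest first.
per_postcode = cases.groupby("postcode", as_index=False)["new"].sum()
new_by_postcode = (
    per_postcode[per_postcode["new"] > 0]
    .sort_values("new", ascending=False)
    .reset_index(drop=True)
)

print(total_new)
print(new_by_postcode)
```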
Importing Second Dataset: Victoria Map Dataset
We can extend the analysis by combining the dataset above with a Victoria map dataset, which gives the suburb name for each postcode instead of the postcode alone. The steps to reach the final result are shown below. The Victoria map dataset can be downloaded from the Australia GeoNames website.
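A sketch of loading the map dataset, assuming the GeoNames postal-code dump for Australia (a tab-separated text file with no header row). The column names below are our own labels for the layout described in the GeoNames readme and should be verified against the downloaded file; `SAMPLE_TXT` is a hypothetical excerpt with approximate coordinates.

```python
import io

import pandas as pd

# Our labels for the 12 tab-separated GeoNames postal-code columns
# (assumed layout; check the readme that ships with the download).
GEONAMES_COLUMNS = [
    "country_code", "postcode", "suburb", "state_name", "state_code",
    "admin_name2", "admin_code2", "admin_name3", "admin_code3",
    "latitude", "longitude", "accuracy",
]

# Hypothetical excerpt standing in for the downloaded file.
SAMPLE_TXT = (
    "AU\t3015\tNewport\tVictoria\tVIC\t\t\t\t\t-37.84\t144.88\t4\n"
    "AU\t3015\tSpotswood\tVictoria\tVIC\t\t\t\t\t-37.83\t144.88\t4\n"
    "AU\t2000\tSydney\tNew South Wales\tNSW\t\t\t\t\t-33.87\t151.21\t4\n"
)


def load_victoria_map(source) -> pd.DataFrame:
    """Read the dump and keep only Victorian postcodes and suburb names."""
    geo = pd.read_csv(source, sep="\t", header=None, names=GEONAMES_COLUMNS,
                      dtype={"postcode": str})
    vic = geo[geo["state_name"] == "Victoria"]
    return vic[["postcode", "suburb", "latitude", "longitude"]].reset_index(drop=True)


vic_map = load_victoria_map(io.StringIO(SAMPLE_TXT))
print(vic_map)
```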
The next step is to merge the two datasets, prioritising the COVID case dataset so that all of its rows are kept.
After merging, we end up with more rows than the cleaned COVID case dataset, which is unexpected, so further data cleaning is needed.
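The row growth can be reproduced with a small left join in pandas, which is one way to read "prioritising the COVID case dataset". The postcode and suburb pairs come from this write-up; the case counts are hypothetical. The `validate="many_to_one"` option of `DataFrame.merge` is shown as a way to catch duplicate right-hand keys at merge time.

```python
import pandas as pd

# Hypothetical case counts; postcode/suburb pairs are taken from the text.
cases = pd.DataFrame({"postcode": ["3015", "3029"], "new": [1, 4]})
vic_map = pd.DataFrame({
    "postcode": ["3015", "3015", "3015", "3029", "3029"],
    "suburb": ["Newport", "South Kingsville", "Spotswood",
               "Tarneit", "Hoppers Crossing"],
})

# Left join: every case row is kept, and each one is repeated once per
# matching suburb in the map table.
merged = cases.merge(vic_map, on="postcode", how="left")
print(len(cases), "->", len(merged))

# Postcodes listed more than once in the map table cause the growth.
dup_postcodes = vic_map.loc[vic_map["postcode"].duplicated(), "postcode"].unique()
print(dup_postcodes)

# validate="many_to_one" requires unique right-hand keys and raises
# MergeError when they are not.
try:
    cases.merge(vic_map, on="postcode", how="left", validate="many_to_one")
    right_keys_unique = True
except pd.errors.MergeError:
    right_keys_unique = False
print(right_keys_unique)
```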
Data Cleaning: Victoria Map Dataset
As the table shows, the combined dataset contains many duplicate values, which most likely come from the Victoria map dataset. For example, postcode 3015 has three suburbs listed: Newport, South Kingsville, and Spotswood. The other duplicated postcodes follow the same pattern.
After some research, I confirmed that these postcodes are indeed shared by multiple suburbs. To keep the data accurate, we merge the suburbs that share a postcode into one row, which gives the following suburbs per postcode:
- 3015: Newport, South Kingsville, Spotswood
- 3020: Sunshine West, Sunshine North, Sunshine, Albion
- 3029: Tarneit, Hoppers Crossing
- 3030: Point Cook, Werribee South
- 3037: Sydenham, Hillside, Taylors Hill
- 3064: Craigieburn, Roxburgh Park
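Merging suburbs that share a postcode into one row can be sketched with a group-by that joins the names, after which the left join keeps exactly one row per case record. The suburb names come from the list above; the case counts are hypothetical.

```python
import pandas as pd

# Hypothetical case counts; suburb names are taken from the list above.
cases = pd.DataFrame({"postcode": ["3015", "3029"], "new": [1, 4]})
vic_map = pd.DataFrame({
    "postcode": ["3015", "3015", "3015", "3029", "3029"],
    "suburb": ["Newport", "South Kingsville", "Spotswood",
               "Tarneit", "Hoppers Crossing"],
})

# One row per postcode: drop exact repeats, then join the suburb names
# (in their original order) into a single comma-separated string.
suburbs_per_postcode = (
    vic_map.drop_duplicates(subset=["postcode", "suburb"])
    .groupby("postcode")["suburb"]
    .agg(", ".join)
    .reset_index()
)

# The right-hand keys are now unique, so the validated merge succeeds and
# the row count matches the case dataset.
merged = cases.merge(suburbs_per_postcode, on="postcode", how="left",
                     validate="many_to_one")
print(merged)
```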
The final result of our cleaned dataset will look like the table below:
We can visualize the above data in Python, and the result is as follows:
For a clearer visualization, we can use other software such as Microsoft Excel; the result is shown below:
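One way to produce such a chart in Python is a horizontal bar chart with matplotlib, sketched here on hypothetical numbers. The function name `plot_new_cases` and the output file name are illustrative; the non-interactive `Agg` backend lets the script save a PNG without a display.

```python
import os
import tempfile

import matplotlib

matplotlib.use("Agg")  # render off-screen and save to a file
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical per-suburb totals produced by the earlier merge.
report = pd.DataFrame({
    "suburb": ["Tarneit, Hoppers Crossing", "Newport, South Kingsville, Spotswood"],
    "new": [4, 1],
})


def plot_new_cases(report: pd.DataFrame, path: str) -> None:
    """Save a horizontal bar chart of new cases per suburb to `path`."""
    ordered = report.sort_values("new")  # largest bar ends up at the top
    fig, ax = plt.subplots(figsize=(8, 4))
    ax.barh(ordered["suburb"].tolist(), ordered["new"].tolist())
    ax.set_xlabel("New cases")
    ax.set_title("New COVID-19 cases by suburb, 24 August 2021")
    fig.tight_layout()
    fig.savefig(path)
    plt.close(fig)


out_path = os.path.join(tempfile.gettempdir(), "new_cases_by_suburb.png")
plot_new_cases(report, out_path)
```

A horizontal layout is used because the merged suburb names can be long and would overlap as vertical bar labels.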
3. Summary report of COVID cases on the day.
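A daily summary can be assembled from the merged table in a few lines. This is a sketch under assumptions: the table has `postcode`, `suburb`, `active`, and `new` columns, the helper name `daily_summary` is hypothetical, and the counts below are placeholders rather than the project's figures.

```python
import pandas as pd

# Hypothetical merged table; suburb names are taken from the text.
merged = pd.DataFrame({
    "postcode": ["3015", "3029", "3030"],
    "suburb": ["Newport, South Kingsville, Spotswood",
               "Tarneit, Hoppers Crossing",
               "Point Cook, Werribee South"],
    "active": [12, 30, 8],
    "new": [1, 4, 0],
})


def daily_summary(df: pd.DataFrame) -> dict:
    """Collect headline figures for one day's report."""
    top = df.loc[df["new"].idxmax()]  # row with the most new cases
    return {
        "total_new_cases": int(df["new"].sum()),
        "total_active_cases": int(df["active"].sum()),
        "postcodes_with_new_cases": int((df["new"] > 0).sum()),
        "highest_new_cases_suburb": top["suburb"],
    }


summary = daily_summary(merged)
print(summary)
```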
Conclusion
The analysis above helps explain the number of cases reported in Victoria and answers the business questions as follows:
1. How many new cases are there in total on the day?
2. How many new cases were reported per postcode and suburb?
3. Summary report of COVID cases on the day.