Melbourne City Analysis with Python
This project applies data science methods to Victorian geospatial data to identify Melbourne's busiest areas and the popular businesses in each neighborhood. The results of this study can be used to determine which businesses have the potential to develop in the area.
The analysis was carried out in JupyterLab using Python.
Dataset Description
The data was collected from the Australia GeoNames website, which is an open data source. It contains 200 rows and required correction and cleansing. The following image is a screenshot of the raw data from the website:
1. Gathering Data
The data was retrieved by web scraping and converted into a dataframe using the Pandas library.
As can be seen from the table, the “Place” attribute holds inconsistent values from row to row. The first row of each pair is the city name and the second row is a combined latitude and longitude value. These need to be separated into their own columns: Place, Latitude, and Longitude.
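The gathering step can be sketched roughly as follows. This is only an illustration using the standard-library HTML parser: the `TableParser` class, the `SAMPLE_HTML` markup, and the coordinate format are hypothetical stand-ins, since the actual page structure and scraping code are not shown here.

```python
from html.parser import HTMLParser

import pandas as pd


class TableParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, one list per <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._cell = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that sits inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


# Hypothetical stand-in for the downloaded page; the real markup may differ.
SAMPLE_HTML = """
<table>
  <tr><th>Place</th><th>Country</th></tr>
  <tr><td>Carlton</td><td>Australia</td></tr>
  <tr><td>-37.800,144.967</td><td></td></tr>
  <tr><td>Docklands</td><td>Australia</td></tr>
  <tr><td>-37.815,144.946</td><td></td></tr>
</table>
"""

parser = TableParser()
parser.feed(SAMPLE_HTML)

# First parsed row is the header; the rest become dataframe rows.
header, *body = parser.rows
raw_df = pd.DataFrame(body, columns=header)
print(raw_df)
```

When an HTML parsing backend such as lxml is installed, `pandas.read_html` can replace the hand-written parser with a single call.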
2. Data Preparation
The first step is to split the rows into odd and even row numbers (counting from zero, as Pandas does): the odd rows hold the latitude/longitude values and the even rows hold the city names. The unnecessary columns, such as Unnamed:0, Country, and Admin 1 to 3, are dropped, and the cleansed data is shown below:
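A minimal sketch of this split, assuming the coordinate rows are stored as a comma-separated "latitude,longitude" string (the actual format in the raw data may differ). The sample rows and the names `raw_df` and `clean_df` are made up for illustration.

```python
import pandas as pd

# Made-up rows mimicking the scraped layout: name row, then coordinate row.
raw_df = pd.DataFrame({
    "Unnamed: 0": [1, None, 2, None],
    "Place": ["Carlton", "-37.800,144.967", "Docklands", "-37.815,144.946"],
    "Country": ["Australia", None, "Australia", None],
})

# Even positions (0, 2, ...) hold names; odd positions (1, 3, ...) hold coordinates.
names = raw_df["Place"].iloc[0::2].reset_index(drop=True)
coords = raw_df["Place"].iloc[1::2].reset_index(drop=True)

# Split "lat,lon" into two numeric columns (assumed separator: a comma).
lat_lon = coords.str.split(",", expand=True).astype(float)

# Building a new frame from these three series leaves the unused columns behind.
clean_df = pd.DataFrame({
    "Place": names,
    "Latitude": lat_lon[0],
    "Longitude": lat_lon[1],
})
print(clean_df)
```

Resetting the index on both halves matters here: without it, the names (index 0, 2, ...) and coordinates (index 1, 3, ...) would not line up when combined.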
The next step is to use the Geopy library to look up the latitude and longitude of the area, then plot the suburbs on a map as below:
This report focuses on activity in the main suburbs of Melbourne's CBD, so suburbs that fall outside that boundary are excluded.
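One way such a boundary filter could be expressed is a distance cut-off around the city centre. The sketch below is an assumption rather than the method used in this project: the centre point is an approximate coordinate for Melbourne's CBD, `RADIUS_KM` is a hypothetical threshold, and the sample suburbs use approximate coordinates.

```python
import math

import pandas as pd

# Approximate centre of Melbourne's CBD (assumed reference point).
CBD_LAT, CBD_LON = -37.8136, 144.9631
RADIUS_KM = 5.0  # hypothetical cut-off, not a value from this project


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points."""
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))


# Illustrative suburbs with approximate coordinates.
suburbs = pd.DataFrame({
    "Place": ["Carlton", "Docklands", "Frankston"],
    "Latitude": [-37.800, -37.815, -38.141],
    "Longitude": [144.967, 144.946, 145.123],
})

# Distance of every suburb from the assumed CBD centre.
suburbs["DistanceKm"] = suburbs.apply(
    lambda row: haversine_km(CBD_LAT, CBD_LON, row["Latitude"], row["Longitude"]),
    axis=1,
)

# Keep only the suburbs inside the cut-off radius.
cbd_suburbs = suburbs[suburbs["DistanceKm"] <= RADIUS_KM].reset_index(drop=True)
print(cbd_suburbs)
```

A fixed radius is the simplest choice; a polygon of the actual suburb boundaries would be more precise if that data is available.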
3. Data Exploration
To explore which activities take place in the CBD area, the Foursquare Places API is used to obtain the recommended places at the time of execution.
The Foursquare application offers real-time access to its global database of rich venue data and user content.
After combining the datasets, the top 5 nearby venues that match the latitude and longitude of the Foursquare places are shown below:
The top 10 most common venues in Melbourne's hotspot neighborhoods are shown as follows:
The most common venue in Melbourne is the café, followed by various restaurants.
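A ranking of this kind can be derived by counting venue categories per suburb. The sketch below is a hedged illustration: the `venues` records are made up to stand in for the Foursquare results, and `top_categories` is a hypothetical helper, not code from this project.

```python
import pandas as pd

# Made-up venue records standing in for the Foursquare results.
venues = pd.DataFrame({
    "Suburb": ["Carlton", "Carlton", "Carlton",
               "Docklands", "Docklands", "Docklands"],
    "Category": ["Café", "Café", "Italian Restaurant",
                 "Café", "Café", "Bar"],
})

# One row per suburb, one column per category, values = share of that suburb's venues.
freq = pd.crosstab(venues["Suburb"], venues["Category"], normalize="index")


def top_categories(row, n=2):
    """Return the n most frequent categories for one suburb."""
    return row.sort_values(ascending=False).head(n).index.tolist()


# Map each suburb to its most common categories (n=2 here; 10 in the report).
top_n = {suburb: top_categories(row) for suburb, row in freq.iterrows()}
print(top_n)
```

The same frequency table (`freq`) is also a natural input for clustering, since each suburb becomes a numeric vector of category shares.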
Using the Foursquare Places API, we can learn more about any specific venue, store, or shop, such as its full address, its opening hours, and its menu if it has one. We can also explore a given location by finding which popular spots exist in its vicinity.
4. Clustering Suburbs
In the clustering section, a machine learning technique is used to group the data points.
The K-Means algorithm was used to group suburbs according to their venue categories, assigning each suburb to the nearest cluster centre. The best number of clusters for the dataset is 3.
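To make the idea concrete, here is a small NumPy sketch of the standard K-Means procedure (Lloyd's algorithm). It is not the implementation used in this project, which is not shown; the `kmeans` function, the seed, and the frequency matrix `X` are made up for illustration.

```python
import numpy as np


def kmeans(points, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm: returns (labels, centroids)."""
    rng = np.random.default_rng(seed)
    # Start from k distinct rows chosen at random.
    centroids = points[rng.choice(len(points), size=k, replace=False)].copy()
    for _ in range(n_iter):
        # Assign every row to its nearest centroid (Euclidean distance).
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of its members; keep it if the cluster is empty.
        new_centroids = centroids.copy()
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:
                new_centroids[j] = members.mean(axis=0)
        # Stop once the centroids no longer move.
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids


# Made-up venue-category frequencies: rows = suburbs, columns = categories.
X = np.array([
    [0.9, 0.1, 0.0],
    [0.8, 0.2, 0.0],
    [0.1, 0.9, 0.0],
    [0.0, 0.8, 0.2],
    [0.0, 0.1, 0.9],
    [0.1, 0.0, 0.9],
])
labels, centroids = kmeans(X, k=3)
print(labels)
```

Because the starting centroids are random, different seeds can give different groupings; library implementations usually run several initialisations and keep the best one.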
After examining each cluster and determining its venues, the results for the different clusters are as follows:
Cluster 1 turned out to have 0 results.
Cluster 2 contains 238 observations.
Cluster 3:
In summary, cafés are the most common business in the city.
However, depending on the location and culture of the community, restaurants of various cuisines can also be considered as potential new businesses, based on which services are in demand at the time of analysis.
The selected venues can also be shown as clusters on a map as follows:
Conclusion
Combining geospatial data with community survey data gave insight into what was trending in the area at the time of analysis. This result is helpful for decision making, especially for someone who intends to open a business in the area without having to visit the location physically.
Investors can start considering which locations have better prospects of success and can help the community grow.
Despite the information reported here, other aspects of the location must also be considered, such as the safety of the area, population growth, future government plans, community gathering activities, and so on.
Each city has its own population, which varies over time.
Keeping the city alive is important for the country’s economy. The more people commute, the more liveable the city will become. The same clustering method can also be applied to different areas with different purposes of analysis.