Bluetooth Low Energy (BLE) RSSI iBeacon
This project investigates how accurately indoor localization and navigation can be performed using BLE RSSI readings from iBeacon devices installed on the first floor of Waldo Library, Western Michigan University.
The dataset can be downloaded from the UCI Machine Learning Repository. It consists of 1420 instances with readings from 13 iBeacon devices.
The data is examined in terms of the performance of each iBeacon installed in a real-world environment.
The RSSI measurements are negative values. Larger (less negative) RSSI values indicate closer proximity to a given iBeacon; for example, an RSSI of -65 represents a shorter distance to the iBeacon than an RSSI of -85. For an out-of-range iBeacon, the RSSI is recorded as -200. The location for each set of RSSI readings is stored in a single column consisting of a letter for the column and a number for the row of the position.
To give a better understanding of the position of each iBeacon inside the library, the map of Waldo Library is shown below:
As can be seen from the map, the location attached to each set of RSSI readings is a symbolic value made up of a letter for the column and a number for the row of the position on the map. These are the locations at which RSSI values were received from the iBeacons (b3001 to b3013).
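As a small illustration of this encoding, the sketch below splits a location label into its column letter and row number. The sample labels and the helper name `split_location` are hypothetical, and the sketch assumes every label is a single letter followed by digits.

```python
import re


def split_location(label):
    """Split a label such as 'O02' into its column letter and row number."""
    match = re.fullmatch(r"([A-Za-z])(\d+)", label.strip())
    if match is None:
        raise ValueError(f"unexpected location label: {label!r}")
    return match.group(1).upper(), int(match.group(2))


# Hypothetical labels, made up for illustration only.
for label in ["O02", "P01", "K04"]:
    print(label, "->", split_location(label))
```

Keeping the letter and the number as separate values makes it easier to compare positions along one axis of the map.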
Data Exploration
Data cleaning is performed to check for missing values or irrelevant data and to adjust the values so that they are acceptable for data modelling. We also need to understand the shape of the data, which helps in deciding which features to use in the next step. In this step, we examine the distribution of every variable, check the data type of each attribute, and look for missing values and correlations.
In this report, we are interested in the relationship between the location and the iBeacon readings.
The features and label are identified as follows:
1. Location as the label
2. b3001 to b3013 as the features
Before training a model on the dataset, we need to handle the missing values, which may reduce the quality of the machine learning model. The missing values, indicated by -200, are first replaced with NaN; we then calculate the mean of each feature and replace the NaN values with that feature's mean.
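The two steps described above can be sketched with pandas as follows. The readings are made up for illustration and only three of the thirteen beacon columns are shown; this is a minimal sketch, not the exact code used for the report.

```python
import numpy as np
import pandas as pd

# Made-up readings for three beacons; -200 marks an out-of-range beacon.
df = pd.DataFrame({
    "b3001": [-200, -78, -200, -80],
    "b3002": [-70, -200, -74, -200],
    "b3003": [-200, -200, -81, -79],
})

# Step 1: mark the -200 placeholder as missing.
df = df.replace(-200, np.nan)

# Step 2: fill each missing reading with the mean of its own column.
filled = df.fillna(df.mean())
print(filled)
```

Replacing -200 before taking the mean matters: otherwise the placeholder would drag every column mean far below any real reading.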
Descriptive Statistics
The descriptive statistics show that feature b3002 has a minimum value of -198 and b3012 has a minimum value of -199. These are likely errors and can be replaced with NaN (empty values). Below is the result after all missing values have been removed:
The descriptive statistics above show the mean, standard deviation, minimum and maximum values, and quantiles of each feature. This table will be useful in the Data Modelling step, where the algorithms are fitted to find the best model.
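The check on the minimum values can be sketched as below, using made-up readings. Treating every value at or below -198 as missing is an assumption chosen here to cover the -198, -199 and -200 cases; the original analysis may have handled them differently.

```python
import pandas as pd

# Made-up readings that include the suspicious -198 and -199 values.
df = pd.DataFrame({
    "b3002": [-198, -72, -200, -75],
    "b3012": [-199, -200, -83, -81],
})

# Minimum per beacon, as reported by the descriptive statistics.
print(df.describe().loc["min"])

# Assumed rule: anything at or below -198 is not a real reading.
cleaned = df.mask(df <= -198)  # masked cells become NaN

# describe() ignores NaN, so the statistics now reflect real readings only.
print(cleaned.describe().loc[["min", "max"]])
```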
To explore each feature (b3001 to b3013), histograms are used to show how each iBeacon device performs in its area compared with the other nearby devices.
A scatter matrix is also used to observe the similarities and differences between features visually.
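The numbers behind these two views can be sketched without a plotting library: bin counts correspond to a histogram, and pairwise correlations summarise what a scatter matrix shows. The readings below are synthetic, and the bin edges are an assumption made for this example.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Synthetic readings for two beacons; NaN stands for an out-of-range beacon.
df = pd.DataFrame({
    "b3001": rng.integers(-90, -60, size=50).astype(float),
    "b3002": rng.integers(-85, -55, size=50).astype(float),
})
df.iloc[::5, 0] = np.nan  # every fifth b3001 reading is missing

bins = np.arange(-100, -40, 10)  # assumed bins, 10 RSSI units wide
for name in df.columns:
    counts, edges = np.histogram(df[name].dropna(), bins=bins)
    # Map the left edge of each bin to the number of readings in it.
    print(name, dict(zip(edges[:-1].tolist(), counts.tolist())))

# Pairwise correlation is the numeric counterpart of a scatter matrix.
print(df.corr())
```

A beacon with few readings in any bin is one that is rarely in range, which is the kind of difference the histograms are meant to expose.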
To compare locations and iBeacons, scatter plots are used, with the iBeacons grouped by their nearest location according to the map of Waldo Library.
Scatter plots of the pairwise relationship between location and iBeacon.
Plausible Hypothesis
We are investigating how accurately the iBeacons detect location, and comparing the most and least visited locations. In the graphs shown above, b3002 to b3004 seem to be the busiest spots, followed by b3005 to b3007 as the second busiest, while the remaining groups are the least visited.
A machine learning technique will learn from the dataset: the models are trained and evaluated on the data in order to find the algorithm that best predicts unseen future data.
Data Modelling
The BLE RSSI dataset is treated as non-continuous, categorical data, so the best method to apply is a classification approach, with K-Nearest Neighbors and Decision Tree as the classifiers.
KNN-Imputer
During data exploration, a large number of -200 values were found, which indicate missing values. Since this report investigates how accurately the iBeacons detect location, each missing value is imputed using the mean value from the nearest neighbors found in the training set, with the number of neighbors set to 5.
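A minimal sketch of this imputation with scikit-learn's `KNNImputer` is shown below. The RSSI matrix is synthetic, the share of missing cells is made up, and the simple row split into training and test rows is an assumption for illustration.

```python
import numpy as np
from sklearn.impute import KNNImputer

rng = np.random.default_rng(0)

# Synthetic RSSI matrix: 40 readings from 4 beacons.
X = rng.integers(-95, -60, size=(40, 4)).astype(float)
# Mark roughly a third of the cells as out of range (-200).
X[rng.random(X.shape) < 0.3] = -200

# -200 means "no signal", so treat it as missing before imputing.
X[X == -200] = np.nan

# Assumed split: the first 32 rows act as the training set.
X_train, X_test = X[:32], X[32:]

imputer = KNNImputer(n_neighbors=5)  # 5 neighbors, as stated in the text
X_train_filled = imputer.fit_transform(X_train)
# Test rows are filled using neighbors taken from the training rows.
X_test_filled = imputer.transform(X_test)

print(X_train_filled[:3])
```

Fitting the imputer on the training rows only keeps information from the test rows out of the imputed training values.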
Train Test Split
The data is split into 80% training data and 20% test data to find the best accuracy when comparing the two models. Both models use the same train_test_split values.
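The 80/20 split can be sketched with scikit-learn's `train_test_split`. The data and location labels below are synthetic, and `random_state=42` is an assumed seed added so that both models would see the same split; the report does not state which seed, if any, was used.

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in: 100 readings from 13 beacons.
X = rng.integers(-95, -60, size=(100, 13)).astype(float)
# Hypothetical location labels.
y = rng.choice(["O02", "P01", "K04"], size=100)

X_train, X_test, y_train, y_test = train_test_split(
    X, y,
    test_size=0.2,    # 20% of the rows are held out for testing
    random_state=42,  # assumed seed, so the split is reproducible
)
print(X_train.shape, X_test.shape)
```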
K-Nearest Neighbors (KNN) Classifier
The KNN method gives flexibility in tuning parameters such as n_neighbors, weights, and p before fitting the data. This report uses three different parameter set-ups:
1. n_neighbors = 5, weights = ‘uniform’, and p = 2
2. n_neighbors = 3, weights = ‘distance’, and p = 1
3. n_neighbors = 1, weights = ‘distance’, and p = 1
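The three set-ups listed above can be compared as in the sketch below, where p = 2 selects Euclidean distance and p = 1 selects Manhattan distance. The data is synthetic with randomly assigned hypothetical labels, so the printed accuracies carry no meaning; only the structure of the comparison is illustrated.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)

# Synthetic stand-in for the beacon readings and location labels.
X = rng.integers(-95, -60, size=(200, 13)).astype(float)
y = rng.choice(["O02", "P01", "K04"], size=200)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# The three parameter set-ups from the list above.
configs = [
    {"n_neighbors": 5, "weights": "uniform", "p": 2},
    {"n_neighbors": 3, "weights": "distance", "p": 1},
    {"n_neighbors": 1, "weights": "distance", "p": 1},
]

scores = []
for params in configs:
    model = KNeighborsClassifier(**params)
    model.fit(X_train, y_train)
    accuracy = model.score(X_test, y_test)  # fraction of correct test predictions
    scores.append(accuracy)
    print(params, round(accuracy, 3))
```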
Decision Tree Classifier
The Decision Tree offers two criteria to choose from before fitting the data: Gini and Entropy. Other parameters can also be tuned; however, this report only compares the two criteria and leaves the other parameters at their defaults.
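A sketch of this comparison with scikit-learn's `DecisionTreeClassifier` follows. The data and labels are synthetic, and `random_state=0` is an assumption added only to make the sketch repeatable; every other parameter is left at its default, as described above.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

# Synthetic stand-in for the beacon readings and location labels.
X = rng.integers(-95, -60, size=(200, 13)).astype(float)
y = rng.choice(["O02", "P01", "K04"], size=200)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

results = {}
for criterion in ("gini", "entropy"):
    # Only the split criterion changes between the two trees.
    tree = DecisionTreeClassifier(criterion=criterion, random_state=0)
    tree.fit(X_train, y_train)
    results[criterion] = tree.score(X_test, y_test)

print(results)
```

Because the labels here are random, the two accuracies only show how the results would be collected side by side.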
Results
The results of both methods in Data Modelling are combined into one table as below:
Both results show a classification rate of 18% to 30%, which is considered low accuracy.
This could be due to the large number of missing values, which do not stem from a fault in the iBeacons themselves: as the detected location changes from one position to another, the signal is picked up by other iBeacons.
Conclusion
Overall, the results of the two classifiers do not differ much. Both reach only the 18% to 30% classification rate reported above, so neither can be said to perform well on this dataset.
iBeacon devices are helpful for navigating people, detecting motion, tracking the most visited sites, and more.
For this dataset, the missing values are not taken into account, because the iBeacons are used to detect the signal strength of mobile devices such as the iPhone 6S or any similar smartphone that interacts with them.
Most people have a smartphone, and machine learning will keep improving the efficiency of our daily lives.