Question

I am trying to implement an indoor location tracking system using Bluetooth dongles. The idea is to walk around with an Android device and calculate your location in a room based on the signal strengths of Bluetooth dongles placed around the room. To do this I have decided to use machine learning to convert the RSSI, as closely as possible, into a distance (in meters, for example). A lecturer in my college told me that LibSVM is what I'm looking for, so I've been doing some reading. I had a look at this tutorial and can't seem to get my head around the data that's needed to train the system. The data that I will have is:

  • the location of each dongle (x and y coordinates), saved in a database along with its MAC address
  • the Received Signal Strength Indicator (RSSI) of the dongles nearest to my Android device
  • the MAC addresses, which will be used to query the database for specific dongles

I understand the data has to be in SVM format, but I'm unsure what the input data and output data should be. The example below, taken from the tutorial I've mentioned, shows that a man is a class and a woman is a class. So in my case would I have just one class, "dongle"? And should all the values for a dongle reflect the values I have stored in my database?

man voice:low figure:big income:good

woman voice:high figure:slim income:fair

  1. Convert the feature values to their numeric representation. Let's say that the best salary would be 5 and the worst salary 1 (or no salary = 0), and the same with the other enumerated variables.
  2. We have 2 classes, man and woman. Convert the classes to numeric values: man = 1, woman = -1
  3. Save it in libsvm data format:

[class/target] 1:[firstFeatureValue] 2:[secondFeatureValue] etc.
ex.: a woman with great salary, low voice and small figure would be encoded like: -1 1:5 2:1.5 3:1.8

In general the input file format of SVM is

[label] [index1]:[value1] [index2]:[value2] ...
[label] [index1]:[value1] [index2]:[value2] ...

Could someone give me an example of what I should be aiming for?

This is all brand new to me so any helpful hints or tips to get me going would be great. Thanks in advance


Solution

I've implemented WiFi fingerprinting for indoor localization, so I'm aware of some of the issues here.

First, to determine your location, are you performing fingerprinting or signal-strength trilateration (which people mistakenly call triangulation)? Trilateration is the process of intersecting multiple spheres to find a location in space. Fingerprinting, on the other hand, is a classification problem that resolves signals to a location with no actual distances calculated.

Trilateration is extremely difficult indoors due to wireless problems like multi-path fading. These effects cause your signal to attenuate, which in turn causes your distance estimates to be off.

Fingerprinting is simply a classification problem. Like trilateration, it assumes that the locations of the dongles do not change. However, unlike trilateration, it does not use distances at all.

Trilateration has the advantage that, assuming that the distance estimates are correct (which in reality is difficult to attain), you will be able to resolve your location over a continuous (non-discrete) range. Since fingerprinting is a classification problem, it must classify to one of a fixed set of discrete locations; for example, if your Bluetooth radios are arranged along the perimeter of a room, you may end up discretizing the interior of the room into one of 3x3 possible locations.
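To make the trilateration side of that comparison concrete, here is a minimal sketch in Python for the 2D case with exactly three dongles. The function name `trilaterate` and the sample coordinates are made up for illustration, and the sketch assumes you already have distance estimates, which is the hard part indoors. It subtracts the first circle equation from the other two, which leaves a 2x2 linear system in x and y.

```python
import math


def trilaterate(anchors, distances):
    """Estimate (x, y) from three known dongle positions and three distances.

    anchors:   three (x, y) tuples, the known dongle positions
    distances: three estimated distances to those dongles, same order
    """
    (x1, y1), (x2, y2), (x3, y3) = anchors
    d1, d2, d3 = distances

    # Subtracting circle 1 from circles 2 and 3 removes the x^2 and y^2 terms.
    a11, a12 = 2 * (x2 - x1), 2 * (y2 - y1)
    a21, a22 = 2 * (x3 - x1), 2 * (y3 - y1)
    b1 = x2**2 - x1**2 + y2**2 - y1**2 - d2**2 + d1**2
    b2 = x3**2 - x1**2 + y3**2 - y1**2 - d3**2 + d1**2

    det = a11 * a22 - a12 * a21
    if abs(det) < 1e-12:
        raise ValueError("dongles are collinear; position is ambiguous")

    # Cramer's rule for the 2x2 system.
    x = (b1 * a22 - a12 * b2) / det
    y = (a11 * b2 - a21 * b1) / det
    return x, y


if __name__ == "__main__":
    dongles = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
    true_position = (3.0, 4.0)
    dists = [math.dist(true_position, d) for d in dongles]
    print(trilaterate(dongles, dists))
```

With noisy distances the three circles will not meet in a single point, so the result is only as good as the distance estimates fed into it.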

If you are going with fingerprinting, then you will need to collect training data with feature vectors that look like:

MAC_1:-87, MAC_2:-40, MAC_3:-91, class=location_A
MAC_1:-31, MAC_2:-90, MAC_3:-79, class=location_B

Here, for each location in the room, you read the RSSI from all the Bluetooth radios you can sense. You should take at least 10 readings for each location. For WiFi, the RSSI values are integers in dBm, typically in the range of -100 to -1 (where, for example, -20 dBm means you are really close to the radio).

Now, when you are trying to perform the classification, you will take a reading like:

MAC_1:-89, MAC_2:-71, MAC_3:-22, class=?

The problem is to classify those RSSI readings to one of the locations.

In my previous work, I used a Naive Bayes classifier rather than SVM because Naive Bayes accommodates missing features easily (by allowing you to give a small probability mass to the missing feature). Also, in Naive Bayes, I used a Gaussian PDF to calculate the likelihood P(MAC_i = RSSI_i | location), since all the RSSI values are numbers.
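A rough sketch of that idea in plain Python follows. It is not the code from my project: the names `train` and `classify` and the constants `MIN_STD` and `MISSING_LOG_PROB` are illustrative assumptions, and the values chosen for them are arbitrary. Each location gets a per-dongle Gaussian fitted to its training readings, and a dongle never seen at a location contributes a small fixed probability instead of a Gaussian term.

```python
import math
from collections import defaultdict

MIN_STD = 2.0                        # assumed floor so a tiny sample cannot give zero variance
MISSING_LOG_PROB = math.log(1e-6)    # assumed small probability mass for an unseen dongle


def train(samples):
    """samples: list of ({mac: rssi}, location). Returns {location: (prior, {mac: (mean, std)})}."""
    by_location = defaultdict(list)
    for reading, location in samples:
        by_location[location].append(reading)

    model = {}
    for location, readings in by_location.items():
        per_mac = defaultdict(list)
        for reading in readings:
            for mac, rssi in reading.items():
                per_mac[mac].append(rssi)
        stats = {}
        for mac, values in per_mac.items():
            mean = sum(values) / len(values)
            variance = sum((v - mean) ** 2 for v in values) / len(values)
            stats[mac] = (mean, max(math.sqrt(variance), MIN_STD))
        model[location] = (len(readings) / len(samples), stats)
    return model


def log_gaussian(x, mean, std):
    """Log of the Gaussian PDF, used as the per-dongle log-likelihood."""
    return -0.5 * math.log(2 * math.pi * std**2) - (x - mean) ** 2 / (2 * std**2)


def classify(model, reading):
    """Return the location with the highest log-posterior for one {mac: rssi} reading."""
    best_location, best_score = None, -math.inf
    for location, (prior, stats) in model.items():
        score = math.log(prior)
        for mac, rssi in reading.items():
            if mac in stats:
                score += log_gaussian(rssi, *stats[mac])
            else:
                score += MISSING_LOG_PROB
        if score > best_score:
            best_location, best_score = location, score
    return best_location


if __name__ == "__main__":
    training = [
        ({"MAC_1": -87, "MAC_2": -40, "MAC_3": -91}, "location_A"),
        ({"MAC_1": -85, "MAC_2": -42, "MAC_3": -90}, "location_A"),
        ({"MAC_1": -31, "MAC_2": -90, "MAC_3": -79}, "location_B"),
        ({"MAC_1": -33, "MAC_2": -88, "MAC_3": -80}, "location_B"),
    ]
    print(classify(train(training), {"MAC_1": -86, "MAC_2": -41, "MAC_3": -92}))
```

Note that this sketch only penalizes dongles present in the query but unseen at a location; a dongle that was seen in training but is absent from the query is simply skipped.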

OTHER TIPS

Since your output is a real number (distance), we are talking about a regression problem, not a classification problem. It is not clear to me whether the value you are looking for is the distance to the closest dongle or whether your output would be a set of distances to all dongles. That's something you need to clarify first.

There are several algorithms capable of doing this, but since you are asking about SVM I will limit this answer to that. I am assuming that your output is a single value representing a distance. If you expect a multidimensional output, then since SVR (support vector regression) provides only a one-dimensional output, you would need to train one instance per dimension.

One of the parameters of libsvm is svm_type. Since the problem is a regression problem, you should use option 3 (epsilon-SVR).

For kernel_type I'd suggest considering RBF (option 2, radial basis function).

As for your data this is a possible arrangement:

| dongle 1           | dongle 2           | dongle 3           | desired output
| x    | y    | RSSI | x    | y    | RSSI | x    | y    | RSSI |   
---------------------------------------------------------------------------------
| 10.0 | 11.1 | 2.3  | 0.0  | 1.1  | 0.3  | 17.0 | 19.1 | 0.3  |     10.3
| 30.0 | 17.1 | 0.3  | 10.0 | 1.1  | 0.9  | 11.0 | 9.1  | 0.2  |     18.7

So that would translate to (brackets are just for clarity):

[10.3] [1]:[10.0] [2]:[11.1] [3]:[2.3] [4]:[0.0] [5]:[1.1] [6]:[0.3] [7]:[17.0] [8]:[19.1] [9]:[0.3]
[18.7] [1]:[30.0] [2]:[17.1] [3]:[0.3] [4]:[10.0] [5]:[1.1] [6]:[0.9] [7]:[11.0] [8]:[9.1] [9]:[0.2]
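As a small illustration, the rows of the table above could be written out in that format with a helper like the one below. The function name `to_libsvm_line` is made up for this example; only the `target index:value ...` layout comes from the libsvm format itself.

```python
def to_libsvm_line(target, features):
    """Format one training row as 'target 1:v1 2:v2 ...' (indices start at 1)."""
    parts = [f"{target:g}"]
    for index, value in enumerate(features, start=1):
        parts.append(f"{index}:{value:g}")
    return " ".join(parts)


if __name__ == "__main__":
    # (desired output, [x, y, RSSI] for dongle 1, 2 and 3), taken from the table above
    rows = [
        (10.3, [10.0, 11.1, 2.3, 0.0, 1.1, 0.3, 17.0, 19.1, 0.3]),
        (18.7, [30.0, 17.1, 0.3, 10.0, 1.1, 0.9, 11.0, 9.1, 0.2]),
    ]
    for target, features in rows:
        print(to_libsvm_line(target, features))
```

A file made of such lines is what `svm-train -s 3 -t 2` expects as its training input.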

It's always advisable to scale each feature to the range [-1, 1] or [0, 1]. Additionally, you can find some example data here: http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/regression.html
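A minimal sketch of that scaling step, assuming simple per-column min-max scaling (the helper names `fit_min_max` and `scale_row` are invented for this example). The ranges are computed from the training rows only and then reused for any new reading, so training and prediction data are scaled the same way. LIBSVM also ships an `svm-scale` tool for this purpose.

```python
def fit_min_max(rows):
    """Return a (min, max) pair for each feature column of the training rows."""
    return [(min(column), max(column)) for column in zip(*rows)]


def scale_row(row, ranges, lower=-1.0, upper=1.0):
    """Linearly map each value from its training range to [lower, upper]."""
    scaled = []
    for value, (col_min, col_max) in zip(row, ranges):
        if col_max == col_min:
            # Constant column: no spread to scale, so use the midpoint.
            scaled.append((lower + upper) / 2)
        else:
            scaled.append(lower + (upper - lower) * (value - col_min) / (col_max - col_min))
    return scaled


if __name__ == "__main__":
    training_rows = [
        [10.0, 11.1, 2.3],
        [30.0, 17.1, 0.3],
    ]
    ranges = fit_min_max(training_rows)
    for row in training_rows:
        print(scale_row(row, ranges))
```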

Hope this helps

I don't think you can use SVMs to do what you are describing (calculating your location in a room). SVMs are a supervised, binary classification algorithm. That is, if you give one some data with positive and negative class labels, it will learn a classifier that can tell you whether new, unobserved data points are positive or negative. Hence, you may be able to train an SVM to tell you whether a person is on one side of the room versus the other (south side/north side), but not their actual location.

It seems that what you want to do doesn't require machine learning at all. See the following posts:

EDIT: Given your clarification, I would recommend using k-nearest neighbors regression. SVM is definitely not appropriate for what you want to do; even when SVM is used for regression, it only produces a one-dimensional output.

What you want to do is take as much data as possible (data = RSSI, label = distances) and embed it in a metric space whose dimension is probably the number of dongles you have. Then, given some new data (RSSI signal strengths), find the nearest neighbors in that space and compute some sort of mean over their distances.
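Here is a minimal sketch of that procedure in Python, under a few assumptions of mine: every reading has one RSSI value per dongle in a fixed order, the label is a single distance, the metric is plain Euclidean distance, and the "mean" is an unweighted average. The name `knn_predict` and the sample numbers are invented for illustration.

```python
import math


def knn_predict(training, query, k=3):
    """Average the labels of the k training readings closest to the query.

    training: list of (rssi_vector, distance_label) pairs
    query:    rssi_vector with the same dongle order as the training vectors
    """
    if not training:
        raise ValueError("training set is empty")
    # Sort by Euclidean distance in RSSI space and keep the k nearest.
    neighbours = sorted(training, key=lambda item: math.dist(item[0], query))[:k]
    return sum(label for _, label in neighbours) / len(neighbours)


if __name__ == "__main__":
    # Made-up readings: (RSSI from dongle 1, RSSI from dongle 2) -> distance in meters
    data = [
        ((-40, -80), 1.0),
        ((-42, -78), 1.2),
        ((-80, -40), 6.0),
        ((-82, -41), 6.4),
    ]
    print(knn_predict(data, (-41, -79), k=2))
```

If you need a distance per dongle rather than a single value, the same neighbor search can be reused and each component of the label averaged separately.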

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow