How to use LibSVM in Java?

Question 1

I've implemented a WiFi fingerprinting for indoor localization, so I'm aware of some of the issues here.

First, to determine your location, are you performing fingerprinting or signal-strength trilateration (which people mistakenly call triangulation)? Trilateration is the process of intersecting multiple spheres to find a location in space. Fingerprinting, on the other hand, is a classification problem that resolves signals to a location without calculating any actual distances.

Trilateration is extremely difficult indoors because of wireless propagation problems such as multipath fading. These effects attenuate your signal unpredictably, which in turn throws your distance estimates off.
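
To get a feel for how sensitive the distance estimates are, here is a minimal sketch that converts RSSI to distance with the log-distance path-loss model. The model choice, the calibration value `rssiAtOneMeter`, and the path-loss exponent are my assumptions for illustration, not something from your setup.

```java
class PathLossSketch {
    // Log-distance path-loss model: rssi = rssiAtOneMeter - 10 * n * log10(d).
    // Solving for d gives the estimate below. Both parameters are assumed values
    // that would have to be calibrated for a real environment.
    static double estimateDistance(double rssi, double rssiAtOneMeter, double pathLossExponent) {
        return Math.pow(10.0, (rssiAtOneMeter - rssi) / (10.0 * pathLossExponent));
    }

    public static void main(String[] args) {
        double rssiAtOneMeter = -40.0; // hypothetical calibration value
        double n = 2.0;                // free-space exponent; indoor values differ

        double clean = estimateDistance(-60.0, rssiAtOneMeter, n); // undisturbed reading
        double faded = estimateDistance(-66.0, rssiAtOneMeter, n); // same spot, 6 dB of extra fading

        System.out.println("clean estimate (m): " + clean);
        System.out.println("faded estimate (m): " + faded);
    }
}
```

Under these assumed parameters, 6 dB of extra attenuation roughly doubles the estimated distance, which is why trilateration degrades so quickly indoors.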

Fingerprinting is simply a classification problem. Like trilateration, it assumes that the locations of the dongles do not change. Unlike trilateration, however, it does not use distances at all.

Trilateration has the advantage that, assuming the distance estimates are correct (which is difficult to achieve in practice), you can resolve your location over a continuous (non-discrete) range. Since fingerprinting is a classification problem, it must classify to one of a fixed set of discrete locations; for example, if your Bluetooth radios are arranged along the perimeter of a room, you may end up discretizing the interior of the room into a 3x3 grid of possible locations.

If you are going with fingerprinting, then you will need to collect training data with feature vectors that look like:

MAC_1:-87, MAC_2:-40, MAC_3:-91, class=location_A
MAC_1:-31, MAC_2:-90, MAC_3:-79, class=location_B

Here, for each location in the room, you read the RSSI from every Bluetooth radio you can sense. You should take at least 10 readings for each location. For WiFi, the RSSI values are integers in units of dBm in the range of -100 to -1 (where, for example, -20 dBm means you are really close to the radio).

Now, when you are trying to perform the classification, you will take a reading like:

MAC_1:-89, MAC_2:-71, MAC_3:-22, class=?

The problem is to classify those RSSI readings to one of the locations.

In my previous work, I used a Naive Bayes classifier rather than SVM because Naive Bayes accommodates missing features easily (by allowing you to give a small probability mass to the missing feature). Also, in Naive Bayes, I used a Gaussian PDF to calculate the likelihood P(MAC_i = RSSI_i | location), since all the RSSI values are numbers.
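
A minimal sketch of that Gaussian Naive Bayes idea follows. This is not my original code: the class name, the small probability mass for unseen radios, the variance floor, and the extra training readings are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class GaussianNaiveBayesSketch {
    static final double MISSING_LOG_PROB = Math.log(1e-6); // assumed small mass for a radio never seen at a location
    static final double MIN_VARIANCE = 1.0;                 // assumed floor so the variance is never zero

    // location -> (MAC -> {mean, variance}); location -> number of training readings
    private final Map<String, Map<String, double[]>> stats = new HashMap<>();
    private final Map<String, Integer> counts = new HashMap<>();
    private int total = 0;

    // Call once per location with all readings taken there.
    void train(String location, List<Map<String, Integer>> readings) {
        Map<String, double[]> sums = new HashMap<>(); // MAC -> {sum, sum of squares, count}
        for (Map<String, Integer> reading : readings) {
            for (Map.Entry<String, Integer> e : reading.entrySet()) {
                double[] s = sums.computeIfAbsent(e.getKey(), k -> new double[3]);
                s[0] += e.getValue();
                s[1] += (double) e.getValue() * e.getValue();
                s[2] += 1;
            }
        }
        Map<String, double[]> perMac = new HashMap<>();
        for (Map.Entry<String, double[]> e : sums.entrySet()) {
            double[] s = e.getValue();
            double mean = s[0] / s[2];
            double variance = Math.max(MIN_VARIANCE, s[1] / s[2] - mean * mean);
            perMac.put(e.getKey(), new double[] {mean, variance});
        }
        stats.put(location, perMac);
        counts.put(location, readings.size());
        total += readings.size();
    }

    // Returns the location with the highest log posterior, or null if nothing was trained.
    String classify(Map<String, Integer> reading) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (String location : stats.keySet()) {
            double score = Math.log(counts.get(location) / (double) total); // log prior
            for (Map.Entry<String, Integer> e : reading.entrySet()) {
                double[] mv = stats.get(location).get(e.getKey());
                score += (mv == null)
                        ? MISSING_LOG_PROB
                        : logGaussian(e.getValue(), mv[0], mv[1]);
            }
            if (score > bestScore) {
                bestScore = score;
                best = location;
            }
        }
        return best;
    }

    static double logGaussian(double x, double mean, double variance) {
        double diff = x - mean;
        return -0.5 * Math.log(2 * Math.PI * variance) - diff * diff / (2 * variance);
    }

    public static void main(String[] args) {
        GaussianNaiveBayesSketch nb = new GaussianNaiveBayesSketch();
        // First reading per location is from the example above; the second is made up.
        nb.train("location_A", List.of(
                Map.of("MAC_1", -87, "MAC_2", -40, "MAC_3", -91),
                Map.of("MAC_1", -85, "MAC_2", -42, "MAC_3", -89)));
        nb.train("location_B", List.of(
                Map.of("MAC_1", -31, "MAC_2", -90, "MAC_3", -79),
                Map.of("MAC_1", -33, "MAC_2", -88, "MAC_3", -81)));
        // MAC_3 is missing from this reading; it is simply skipped.
        System.out.println(nb.classify(Map.of("MAC_1", -86, "MAC_2", -43)));
    }
}
```

A radio that is absent from the query is skipped, and a radio that was never seen at a candidate location contributes the fixed small log-probability instead of a Gaussian term. In practice you would want many more than two readings per location.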

Question 2

Since your output is a real number (a distance), we are talking about a regression problem, not a classification problem. It is not clear to me whether the value you are looking for is the distance to the closest dongle or a set of distances to all dongles. That's something you need to clarify first.

There are several algorithms capable of doing this, but since you are asking about SVM, I will limit this answer to that. I am assuming that your output is a single value representing a distance. If you expect a multidimensional output, then, given that SVR (support vector regression) provides only a one-dimensional output, you would need to train one instance per dimension.

One of the parameters of libsvm is svm_type; since this is a regression problem, you should use option 3 (epsilon-SVR).

For kernel_type, I'd suggest RBF (option 2, radial basis function).

As for your data this is a possible arrangement:

| dongle 1           | dongle 2           | dongle 3           | desired output
| x    | y    | RSSI | x    | y    | RSSI | x    | y    | RSSI |   
---------------------------------------------------------------------------------
| 10.0 | 11.1 | 2.3  | 0.0  | 1.1  | 0.3  | 17.0 | 19.1 | 0.3  |     10.3
| 30.0 | 17.1 | 0.3  | 10.0 | 1.1  | 0.9  | 11.0 | 9.1  | 0.2  |     18.7

So that would translate to (brackets are just for clarity):

[10.3] [1]:[10.0] [2]:[11.1] [3]:[2.3] [4]:[0.0] [5]:[1.1] [6]:[0.3] [7]:[17.0] [8]:[19.1] [9]:[0.3]
[18.7] [1]:[30.0] [2]:[17.1] [3]:[0.3] [4]:[10.0] [5]:[1.1] [6]:[0.9] [7]:[11.0] [8]:[9.1] [9]:[0.2]
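
If you generate the training file from Java, something like the following sketch produces those lines (without the brackets). The class and method names are hypothetical; the only thing taken from libsvm is its text format, `label index:value ...` with indices starting at 1.

```java
class LibsvmFormatSketch {
    // Formats one training example as "<label> 1:<v1> 2:<v2> ...".
    // Feature indices in the libsvm text format start at 1 and must be ascending.
    static String toLibsvmLine(double label, double[] features) {
        StringBuilder sb = new StringBuilder();
        sb.append(label);
        for (int i = 0; i < features.length; i++) {
            sb.append(' ').append(i + 1).append(':').append(features[i]);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // First row of the table above: x, y, RSSI for each of the three dongles.
        double[] row = {10.0, 11.1, 2.3, 0.0, 1.1, 0.3, 17.0, 19.1, 0.3};
        // prints "10.3 1:10.0 2:11.1 3:2.3 4:0.0 5:1.1 6:0.3 7:17.0 8:19.1 9:0.3"
        System.out.println(toLibsvmLine(10.3, row));
    }
}
```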

It's always advisable to scale each feature to the range [-1, 1] or [0, 1]. Additionally, you can find some example data here: http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/regression.html
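
As a rough illustration of the scaling step, here is a minimal min-max scaler sketch. The class name and the handling of constant columns are my own assumptions; the important point is that the minimum and maximum come from the training data and the same values are reused for any data you later predict on.

```java
class MinMaxScalerSketch {
    private final double[] min;
    private final double[] max;
    private final double lower;
    private final double upper;

    // Learns the per-column minimum and maximum from the training rows only.
    MinMaxScalerSketch(double[][] trainingRows, double lower, double upper) {
        this.lower = lower;
        this.upper = upper;
        this.min = trainingRows[0].clone();
        this.max = trainingRows[0].clone();
        for (double[] row : trainingRows) {
            for (int j = 0; j < row.length; j++) {
                min[j] = Math.min(min[j], row[j]);
                max[j] = Math.max(max[j], row[j]);
            }
        }
    }

    // Maps each feature linearly so that the training min -> lower and the training max -> upper.
    double[] scale(double[] row) {
        double[] out = new double[row.length];
        for (int j = 0; j < row.length; j++) {
            if (max[j] == min[j]) {
                out[j] = lower; // constant column carries no information (assumed handling)
            } else {
                out[j] = lower + (upper - lower) * (row[j] - min[j]) / (max[j] - min[j]);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Features of dongle 1 (x, y, RSSI) from the two rows of the table above.
        double[][] train = {{10.0, 11.1, 2.3}, {30.0, 17.1, 0.3}};
        MinMaxScalerSketch scaler = new MinMaxScalerSketch(train, -1.0, 1.0);
        System.out.println(java.util.Arrays.toString(scaler.scale(train[0])));
        System.out.println(java.util.Arrays.toString(scaler.scale(train[1])));
    }
}
```

Values outside the training range will land outside [-1, 1], which is expected with this kind of scaling.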

Hope this helps

Question 3

I don't think you can use SVMs to do what you are describing (calculating your location in a room). SVMs are a supervised, binary classification algorithm. That is, if you give it some data and some positive/negative classes, it will learn a classifier that can tell you whether new, unobserved data points are positive or negative. Hence, you may be able to train an SVM to tell you whether a person is on one side of the room versus the other (south side/north side), but not their actual location.

It seems that what you want to do doesn't require machine learning at all. See the following posts:

EDIT: Given your clarification, I would recommend k-nearest neighbors regression. SVM is definitely not appropriate for what you want to do; even when used for regression, it only produces a one-dimensional output.

What you want to do is take as much data as possible (data = RSSI, label = distances) and embed it in a metric space whose dimension is probably the number of dongles you have. Then, given some new data (RSSI signal strengths), find the nearest neighbors in that space and compute some sort of mean over their distances.
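
A minimal sketch of that k-nearest neighbors regression, assuming Euclidean distance in RSSI space and a plain (unweighted) mean of a single distance label. The class name and all training values are made up for illustration.

```java
import java.util.Arrays;
import java.util.Comparator;

class KnnRegressionSketch {
    // Predicts a label for `query` as the mean label of its k nearest training points.
    static double predict(double[][] trainRssi, double[] trainDistance, double[] query, int k) {
        if (trainRssi.length == 0 || k < 1) {
            throw new IllegalArgumentException("need at least one training point and k >= 1");
        }
        Integer[] order = new Integer[trainRssi.length];
        for (int i = 0; i < order.length; i++) {
            order[i] = i;
        }
        // Sort training indices by squared Euclidean distance to the query.
        Arrays.sort(order, Comparator.comparingDouble(
                (Integer i) -> squaredDistance(trainRssi[i], query)));
        int neighbors = Math.min(k, order.length);
        double sum = 0.0;
        for (int i = 0; i < neighbors; i++) {
            sum += trainDistance[order[i]];
        }
        return sum / neighbors;
    }

    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int j = 0; j < a.length; j++) {
            double diff = a[j] - b[j];
            sum += diff * diff;
        }
        return sum;
    }

    public static void main(String[] args) {
        // Hypothetical data: one RSSI value per dongle, labelled with a distance in meters.
        double[][] rssi = {{-40, -70, -80}, {-42, -68, -81}, {-80, -45, -60}};
        double[] distance = {1.0, 2.0, 9.0};
        double[] query = {-41, -69, -80};
        System.out.println(predict(rssi, distance, query, 2)); // mean of the two closest labels
    }
}
```

If you need several outputs (for example, a distance to every dongle), the same neighbor search can be reused and the mean taken per output. Weighting neighbors by inverse distance is a common variation.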