Using SIFT for Augmented Reality

https://stackoverflow.com/questions/1289010

18-09-2019
|

Question

I've come across MANY AR libraries/SDKs/APIs, all of them are marker-based, until I found this video, from the description and the comments, it looks like he's using SIFT to detect the object and follow it around.

I need to do that for Android, so I'm gonna need a full implementation of SIFT in pure Java.

I'm willing to do that but I need to know how SIFT is used for augmented reality first.

I could make use of any information you give.

Solution

In my opinion, trying to implement SIFT for a portable device is madness. SIFT is an image feature extraction algorithm, which includes complex math and certainly requires a lot of computing power. SIFT is also patented.

Still, if you indeed want to go forth with this task, you should do quite some research at first. You need to check things like:

Any variants of SIFT that enhance performance, including different algorithms all around
I would recommend looking into SURF which is very robust and much more faster (but still one of those scary algorithms)
Android NDK (I'll explain later)
Lots and lots of publications

Why Android NDK? Because you'll probably have a much more significant performance gain by implementing the algorithm in a C library that's being used by your Java application.

Before starting anything, make sure you do that research cause it would be a pity to realize halfway that the image feature extraction algorithms are just too much for an Android phone. It's a serious endeavor in itself implementing such an algorithm that provides good results and runs in an acceptable amount of time, let alone using it to create an AR application.

As in how you would use that for AR, I guess that the descriptor you get from running the algorithm on an image would have to be matched against with data saved in a central database. Then the results can be displayed to the user. The features of an image gathered from SURF are supposed to describe it such as that it can be then identified using those. I'm not really experienced on doing that but there's always resources on the web. You'd probably wanna start with generic stuff such as Object Recognition.

Best of luck :)

OTHER TIPS

If I where you, I'd look into how (and why) the SIFT feature works (as was said, its wikipedia-page offers a good cochise explanation, and for more details check the science paper (which is linked to at wikipedia)), and then build your own variant that suits your taste; i.e. has the optimal balance between performance and cpu-load, needed for your application.

For instance, I think Gaussian smoothing might be replaced by some faster way of smoothing.

Also, when you build your own variant, you don't have anything to do with patents (there already are lots of variants, like GLOH).

I have tried SURF for 330Mhz Symbian mobile and it was still too slow even with all optimizations and lookup tables. And SIFT should be even more slow. Everyone using FAST for mobiles now. Anyway feature extraction is not a biggest problem. Correspondence and clearing false positive in it is more difficult. FAST link http://svr-www.eng.cam.ac.uk/~er258/work/fast.html

I would recommend you to start by looking at the features already implemented in the OpenCV library, which include SURF, MSER and others:

http://opencv.willowgarage.com/documentation/cpp/feature_detection.html

This might be enough for your application and are faster than SIFT. And as mentioned above, SIFT is patented.

Also, start by making performance tests in your mobile platform, just by extracting the features at every frame, this way you'll have an idea which ones can run real-time or not.

Have you tried OpenCV's FAST implementation in the Android port? I've tested it out and it runs blazingly fast.

You can also compute reduced histogram descriptors around the detected FAST keypoints. I've heard of 3x3 rather than standard 4x4 of SIFT. That has a decent chance of working in real time if you optimize it heavily with NEON instructions. Otherwise, I'd recommend something fast and simple like sum of squared or absolute differences for a patch around the keypoints which are very fast.

SIFT is not a panacea. For real time video applications, it's usually overkill.

As always, Wikipedia is a good place to start from : http://en.wikipedia.org/wiki/Scale-invariant_feature_transform, but note that SIFT is patented.

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow