Question

This question is an enhancement to the previous SO question.

Distance Calculation for massive number of devices/nodes

I have N mobile devices/nodes (say 100K) and I periodically obtain their location ( latitude , longtitude ) values.

Some of the devices are "logically connected" to roughly M other devices (say 10 in average). My program periodically compares the distance between the each device and its logically connected devices and determines if the distance is within a threshold (say 100 meters).

Furthermore number of logical connections "K" can also be more then one and (say 5 in average) Example is A can be connected to B,C for i.e. "parents" logic. A can also be connected to C,D,E,F for "work" logic

I need a robust algorithm to calculate these distances to the logically connected devices.

The complexity order of brute force approach would be NMK or (Θ3 in terms of order)

The program does this every 3 seconds (all devices are mobile), thus for instance 100K*10*5 = 5M calculations every 3 seconds is not good.

Any good/classical algorithms for this operation ?

Was it helpful?

Solution

I decided to rewrite my answer after a bit more thought.

The complexity of your problem is not O(N^3) in the worst case, it is actually only O(N^2) in the worst case. It's also not O(N*M*K) but rather O(N*(M+K)), where O(M+K) is O(N). However, the real complexity of your problem is O(E) where E is the total number of logical connections (number of work connections + number of parent connections). Unless you want to approximate, your solution cannot be better than O(E). Your averages suggest that you likely have on the order of 5 million connections, which is on the order of O(N log N).

You example uses two sets of logical connections. So you would simply cycle through each set and check if distance between the devices of the logical connection is within the threshold.

That being said, the example you gave and your assumed time complexity suggests you are interested in more than just if the individual connections are within threshold, but rather if sets of connections are within threshold. Specifically, in your example it would return True if parents logic: (A,B), (A,C) and Work logic (A,C),(A,D),(A,E),(A,F) are all True. In which case your best data structure would be a dictionary of dictionaries that looks like the following in Python (includes the optimization below): "parentsLogic[A][B] = (last position A, last position B, was within threshold)".

If it's common that the positions don't change much, you may obtain some run-time improvement by storing the previous positions and if they were within the threshold or not (Boolean). The benefit is that you can simply return the previous result if the two positions haven't changed and updating them if they have changed.

OTHER TIPS

You can use a brute force algorithm and sort the result then use the top best groups.

One thing you can do in addition to what was suggested in the answers to the previous question is to store a list of the nearby connected devices for every device and update it only for those devices that have moved by a significant distance since last update (and for the devices connected to those that have moved).

For example, if the threshold is 100 m, store a list of the connected devices within 200 m of every device, and update it for every device that has moved more than by 50 m since last update.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top