How can I architect P2P discovery?

https://softwareengineering.stackexchange.com/questions/298879

10-10-2020
|

Question

I've never really done network programming so this is all new to me. What I'm attempting to make is a Peer to Peer "chat" client. The idea is that anyone can connect to someone whose IP they know. All chat messages are to be distributed to everyone.

What I'm getting stuck on is the peer discovery. What I've done so far is:

App has an IP of a known person (user provided)
Connects to known IP
Known IP provides it's list of known Peers; we store it in peer list
Discovery loop:
1. Make a local copy of peer list
2. For each Peer in local peer list
  1. Request Peer list (i.e. step 2 + 3)
    - If error, remove this peer from original peer list
    - Otherwise, add new peers to original peer list
3. Queue another Discovery Loop in 1min

The listening for connections in step 3 and the discovery loop are asynchronous.

The reason a copy is made in the first place is due to some other design decisions for down the road.

The problem is that if I introduce a client with a bad peer on it's peer list, i.e. you can't connect to it, what happens is that this Peer is passed via step 3 when serving the known peer list and then other clients have it in theirs. They propagate this bad peer and later on, during the Discover Loop, remove that peer. But then, they will do another Discover Loop and will receive that Peer from someone else again.

The end result is that bad peer never leaves the network.

One option I can think of is a vague idea of having Peer "health" that each client keeps track of separately so that they can ignore peers provided on someone elses list. However, this would, I think, need to be somewhat "timed" so that when someone disconnects they aren't shut out of the network forever.

The other idea I had was for a broadcast message to all known peers when a bad peer is encountered. However, I didn't like this idea because it leaves it open for someone to make a different client that's job is to poison the network by insisting peers are bad. This would then require me to have peers check that a remove request is genuine by also attempting to connect to his peer immediately.

I'm looking for opinions on the two options above.

Solution

I'd be worried about the overhead in transmitting the whole list of peers each time. If you have N users, and during each 'discovery loop' each peer connects to the rest of the N peers and downloads a list of N peers, then the bandwidth dedicated to synchronizing the peers list grows like O(N^3)! By using some tricks to only transmit the peers that differ between the two you can get it down to O(N^2 log N) or O(N^2), but that's still pretty steep.

I heard some time ago of a synchronization model designed to avoid this problem. I don't remember the name so I'll call it "rumor mill synchronization."

Your client maintains three lists of peers:

Known good peers.
Recently changed peers. Each peer in this list is associated with a boolean 'added/removed' flag and a (local) time.
Possibly changed peers. Each peer is this list also has the 'added/removed' flag, and also has an associated counter.

You also choose a length of time for changes to be considered 'new,' and a required number of 'votes' before you commit a change (K). I would suggest that the former is fixed (based on your testing) and the latter is user-configurable (so they can set how trusting they are of the network). Here is the procedure:

You start off with the addresses of a few peers that you think are good; these might be your servers' addresses preloaded into the app, or a set of peers that the user chooses. Query a number of them, asking to download their entire list of known good peers. Check them against each other so that an attacker will have a more difficult time adding malicious addresses. (This takes O(KN) bandwidth.)

Assuming that you trust most of the peers you just talked to, you can add peers that are in a majority of their lists to your list of known good peers. Now we start the discovery loop.
Each cycle, you connect to a new peer from your known good list, and ask for their recently changed list. (If the network is relatively static, this list should be less than O(N).) If the network is large you can choose a random peer for this step, otherwise you should loop through the (randomly ordered) list. The goal is to talk to as many different peers as possible without biasing towards a certain subset.
- If they don't respond, remove them from your known good list and add them to your recently changed list as 'removed' (overwriting an 'added' entry for that peer if it exists; don't forget to update the time too).
- Otherwise, go through the received peers one by one.
  - If they're in your known good list, there's nothing to do.
  - If they're in your possible changes list, increment their counter; otherwise, add them to the list with their counter at one.
Check the list of possible changes to see if any have surpassed K votes; if so, add them to your list of recently changed, and add or remove them from your known good list as appropriate.
Check the list of recent changes to see if any changes on this list are 'stale,' i.e. older than your time threshold. If so, remove them.
Wait for a little, then go back to step 2. You should also repeat step 1 occasionally with random peers from your 'known good' list; treat this like step 2 but download their list of known good peers, not recently changed peers.
If you receive a connection from an unknown peer, treat this as in step 3, adding them to your recently changed and known good peers.

Note that this approach only works if the network is relatively static, otherwise the list of recent changes will encompass almost all the clients.

The implementation details might be a bit off (it was a long time ago that I read about this) but the idea is there; basically you ring up as many peers as you can and ask, 'hear about any new people?'

OTHER TIPS

Your solution only has a problem if there is a period when an invalid up can enter an ip list. If a bad ip can't enter a list it will die out quickly.

So the only change you need to make is to separate out the concepts of the verified list and the pending list.

If a client requests your list you only ever provide the verified list (this may contain bad ips but new bad ips can't enter it).
On each refresh cycle you ask each of your existing ips for their lists, if you can't contact them they are removed from your list. If they provide new ips they enter the pending list, missing ips are ignored.
Once the list of known ips has been polled you poll the pending list, if and only if they reply do they enter the verified list

I personally like the broadcast idea. It may be possible to implement security protocols to minimize the risk of an arp poison attack.

Licensed under: CC-BY-SA with attribution

Not affiliated with softwareengineering.stackexchange