Autodiscovery in P2P Applications

Question 1

Grothoff and GauthierDickey of the GNUnet project (an anonymous, censorship-resistant file-sharing network) researched the question of bootstrapping a P2P network without any central host list.

They found that for the Gnutella (LimeWire) network, a random IP search needed on average 2500 connection attempts to find a peer.

In the paper they proposed a method that reduced the required connection attempts to 817 for Gnutella and 51 for the eD2k (eDonkey) network.

This was achieved by building a statistical profile of P2P users for every DNS organization. The resulting discovery database is small (around 100 KB), but it has to be created in advance and shipped with the P2P client.
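To make the idea concrete, here is a minimal sketch of a profile-biased scan. It is not the paper's actual algorithm or data format: the PROFILE table, its density numbers, and the candidate_addresses helper are all made up for illustration, and the address blocks are reserved documentation ranges.

```python
import ipaddress
import random

# Hypothetical profile: (CIDR block, estimated share of hosts running the client).
# Both the blocks and the numbers are invented for this example.
PROFILE = [
    ("192.0.2.0/24", 0.020),
    ("198.51.100.0/24", 0.005),
    ("203.0.113.0/24", 0.001),
]

def candidate_addresses(profile, count, rng=random):
    """Yield `count` random addresses, favouring blocks with a higher peer density."""
    networks = [ipaddress.ip_network(cidr) for cidr, _ in profile]
    weights = [density for _, density in profile]
    for _ in range(count):
        # Pick a block with probability proportional to its density ...
        net = rng.choices(networks, weights=weights, k=1)[0]
        # ... then pick a uniformly random address inside that block.
        yield net[rng.randrange(net.num_addresses)]

if __name__ == "__main__":
    for addr in candidate_addresses(PROFILE, 5):
        print("would try", addr)
```

A real client would attempt a connection to each candidate instead of printing it, and would stop as soon as one peer answers.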

Question 2

This is the holy grail of P2P. There isn't really a magic solution: a node cannot discover other nodes without a well-known point to act as a reference (you can do so on a LAN by broadcasting, but not on the Internet). P2P file sharing tends to work by having known websites distribute 'start points' for discovery; further discovery (I would expect) can then come from asking nodes what other nodes they know about.
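The "ask nodes what other nodes they know about" step can be sketched as a breadth-first crawl from the start points. This is only an illustration: ask_for_peers is a hypothetical callable standing in for a real network request, and FAKE_NETWORK simulates the nodes so the example runs without sockets.

```python
from collections import deque

def discover(seeds, ask_for_peers, limit=100):
    """Breadth-first peer discovery starting from known seed addresses.

    `ask_for_peers(addr)` is a hypothetical callable that returns the
    addresses a node knows about, or raises OSError if it is unreachable.
    """
    known = set(seeds)
    queue = deque(seeds)
    reachable = []
    while queue and len(reachable) < limit:
        addr = queue.popleft()
        try:
            neighbours = ask_for_peers(addr)
        except OSError:
            continue  # unreachable node: skip it
        reachable.append(addr)
        for peer in neighbours:
            if peer not in known:
                known.add(peer)
                queue.append(peer)
    return reachable

# Simulated network standing in for real connections.
FAKE_NETWORK = {
    "seed": ["a", "b"],
    "a": ["b", "c"],
    "b": ["ghost"],  # "ghost" is advertised but does not exist
    "c": ["seed"],
}

def fake_ask(addr):
    if addr not in FAKE_NETWORK:
        raise OSError("unreachable")
    return FAKE_NETWORK[addr]

if __name__ == "__main__":
    print(discover(["seed"], fake_ask))
```

Note that the crawl still needs at least one working seed, which is exactly the bootstrapping problem described above.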

A good place to start on research would be Distributed Hash Tables.
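As a taste of what you will find there, one well-known DHT design (Kademlia) decides which nodes are responsible for a key by the XOR of their IDs. The sketch below shows only that distance metric, not a full DHT; node_id and closest_nodes are names chosen for this example.

```python
import hashlib

def node_id(name):
    """Derive a 160-bit integer ID from a name by hashing it with SHA-1."""
    return int.from_bytes(hashlib.sha1(name.encode()).digest(), "big")

def closest_nodes(key, nodes, k=3):
    """Return the k node IDs with the smallest XOR distance to `key`."""
    return sorted(nodes, key=lambda n: n ^ key)[:k]

if __name__ == "__main__":
    ids = [node_id(f"node-{i}") for i in range(10)]
    target = node_id("some-file.iso")
    for n in closest_nodes(target, ids):
        print(f"{n:040x}")
```

A lookup then repeatedly asks the closest nodes it knows for even closer ones, which is what lets discovery continue once a single entry point has been found.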

As for security, that topic will be covered in the literature somewhere, I should think; again, I would recommend Wikipedia. Non-existent nodes are trivially dealt with: if you can't contact an IP/port, don't keep it on your list, and if a node regularly provides non-existent pointers, consider de-prioritising it or removing it from your list entirely.
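A minimal sketch of that bookkeeping might look like the following. The PeerList class, its method names, and the strike threshold are assumptions for illustration, not part of any particular P2P library.

```python
class PeerList:
    """Tracks known peers and how often each one referred us to a dead address."""

    def __init__(self, max_strikes=3):
        self.max_strikes = max_strikes
        self.strikes = {}  # address -> number of dead addresses it referred us to

    def add(self, addr):
        self.strikes.setdefault(addr, 0)

    def record_contact(self, addr, ok, referred_by=None):
        if ok:
            self.add(addr)
            return
        self.strikes.pop(addr, None)  # unreachable: do not keep it on the list
        if referred_by in self.strikes:
            self.strikes[referred_by] += 1  # the referrer gave us a bad pointer
            if self.strikes[referred_by] >= self.max_strikes:
                del self.strikes[referred_by]  # too many bad pointers: forget it

    def by_priority(self):
        """Peers with the fewest bad referrals first."""
        return sorted(self.strikes, key=self.strikes.get)
```

For example, after add("a"), add("b") and one failed contact with referred_by="a", by_priority() puts "b" ahead of "a".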

For evil nodes, it depends on your use case, but let's say you are doing file sharing. If you request a section of a file, check with several nodes what the file section's hash should be, and then request by hash. If the evil node gives you a chunk that has a different hash, then you can again de-prioritise or forget that node.
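Roughly, the hash check could be sketched like this. The choice of SHA-256, the quorum parameter, and the function names are assumptions made for the example; a real protocol defines its own hash and chunk format.

```python
import hashlib
from collections import Counter

def agreed_hash(reported_hashes, quorum):
    """Return the hash reported by at least `quorum` nodes, or None.

    `reported_hashes` maps a node name to the chunk hash that node claims.
    """
    if not reported_hashes:
        return None
    digest, votes = Counter(reported_hashes.values()).most_common(1)[0]
    return digest if votes >= quorum else None

def verify_chunk(data, expected_hash):
    """Check downloaded bytes against the hash the nodes agreed on."""
    return hashlib.sha256(data).hexdigest() == expected_hash

if __name__ == "__main__":
    good = hashlib.sha256(b"chunk bytes").hexdigest()
    reports = {"n1": good, "n2": good, "n3": "not-the-real-hash"}
    expected = agreed_hash(reports, quorum=2)
    print(verify_chunk(b"chunk bytes", expected))   # genuine chunk
    print(verify_chunk(b"tampered data", expected)) # chunk from an evil node
```

A node whose chunk fails verify_chunk is the one to de-prioritise or forget.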

Distributed processing systems work a little differently: they tend to ask several unrelated nodes to perform the same work, and then they use a voting system (probably using hashing again) to determine whether evilness is at hand. If a node provides consistently bad results, the administrator is contacted or the IP is removed from the known nodes list.
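A simple version of such a vote, assuming results can be compared directly (or hashed first if they are large), might be sketched as follows. The vote function and its strict-majority rule are illustrative choices, not how any specific system does it.

```python
from collections import Counter

def vote(results):
    """Majority vote over results returned by several nodes.

    `results` maps a node name to its (hashable) result. Returns a tuple
    (winning result or None, sorted list of nodes that disagreed with it).
    """
    if not results:
        return None, []
    winner, votes = Counter(results.values()).most_common(1)[0]
    if votes * 2 <= len(results):
        return None, []  # no strict majority, so nothing can be trusted yet
    dissenters = sorted(node for node, r in results.items() if r != winner)
    return winner, dissenters

if __name__ == "__main__":
    print(vote({"n1": 42, "n2": 42, "n3": 7}))
```

Counting how often a node shows up among the dissenters gives the "consistently bad results" signal mentioned above.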

Question 3

OK, for two peers to find each other, they both have to know a common mediator, let's say, through which to exchange IPs once. You can use anything for this first handshake, as long as both peers are able to WRITE to and READ from that "channel", e.g. DNS (your well-known domains), email, IRC, Twitter, Facebook, Dropbox, etc.
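Here is a minimal sketch of that handshake. The Mediator class is an in-memory stand-in for whatever shared channel you pick; the names, the rendezvous key, and the addresses (documentation ranges) are all made up for the example.

```python
class Mediator:
    """In-memory stand-in for any channel both peers can write to and read from."""

    def __init__(self):
        self._posts = {}  # rendezvous key -> list of (peer name, address)

    def write(self, key, peer, address):
        self._posts.setdefault(key, []).append((peer, address))

    def read(self, key):
        return list(self._posts.get(key, []))

def rendezvous(mediator, key, me, my_address):
    """Publish our address, then return the addresses others posted under the key."""
    mediator.write(key, me, my_address)
    return [addr for peer, addr in mediator.read(key) if peer != me]

if __name__ == "__main__":
    channel = Mediator()
    print(rendezvous(channel, "room-1", "alice", "192.0.2.10:4000"))   # nobody there yet
    print(rendezvous(channel, "room-1", "bob", "198.51.100.7:4000"))   # finds alice
```

Whoever arrives first sees an empty list and has to read the channel again later; after that one exchange the peers can talk to each other directly.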