Question

What would I use instead of SQL joins when I have a large complex relational-database that just got too large to fit on a single machine? I've begun sharding the database across many machines, but as a result, I can no longer do joins efficiently.

Any tips?

Solution

There are many approaches to make this work; the general idea is to shard your data in such a way that related data is grouped together.

As a simple (trivial) example, if you have a Game database, you can shard Player and PlayerGame data by the same key (playerId). If other tables are related, you can add them too; think of it as a "shard tree" of related tables. All the data for a given Player is then guaranteed to be in the same shard, so you can perform joins within a shard, but you cannot do inner joins across shards.
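A minimal sketch of the routing side of this, with hypothetical names (the original post doesn't show code): every table in the "shard tree" is routed by the same key, so a Player row and its PlayerGame rows always resolve to the same shard.

```python
# Hypothetical shard routing: all tables in the Player shard tree
# (Player, PlayerGame, ...) use playerId as the shard key, so related
# rows are guaranteed to land on the same shard.
NUM_SHARDS = 4

def shard_for(player_id: int) -> int:
    """Map a playerId to a shard index. Every table keyed by playerId
    must use this same function for writes and reads."""
    return player_id % NUM_SHARDS
```

Modulo hashing is the simplest choice; a real system would more likely use consistent hashing or a lookup directory so that shards can be added without rebalancing everything.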

The other common technique is to replicate Global tables to all shards. These are typically tables that are not updated often but are used in many joins.
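A sketch of that replication idea, simulated with in-memory dicts (the table name `GameType` is illustrative, not from the post): a global write fans out to every shard, which is acceptable because these tables change rarely.

```python
# Each shard holds its own copy of the rarely-changing "global" table,
# so any shard can join against it locally without a network hop.
NUM_SHARDS = 4
shards = [{"GameType": {}} for _ in range(NUM_SHARDS)]

def update_global(table: str, key, row) -> None:
    # A write to a global table is broadcast to all shards.
    for shard in shards:
        shard[table][key] = row

update_global("GameType", 1, {"name": "chess"})
```

The trade-off is write amplification on global tables in exchange for cheap local joins everywhere, which is why the technique only suits slowly changing reference data.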

With these two approaches you can:

  • Join within the Shard Tree (but not a cross-shard inner join, e.g., between two players)
  • Join from a sharded table to a Global table at any time
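To make the first capability concrete, here is a sketch using SQLite as a stand-in for one shard (table and column names follow the Player/PlayerGame example above; the data is invented): because both tables were sharded by playerId, the join runs entirely inside the shard that owns player 42.

```python
import sqlite3

# One shard, modeled as its own database. Player and PlayerGame for
# playerId 42 are co-located here, so this join is purely local.
shard = sqlite3.connect(":memory:")
shard.executescript("""
    CREATE TABLE Player (playerId INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE PlayerGame (playerId INTEGER, gameInstanceId INTEGER);
    INSERT INTO Player VALUES (42, 'alice');
    INSERT INTO PlayerGame VALUES (42, 7), (42, 8);
""")
rows = shard.execute("""
    SELECT p.name, g.gameInstanceId
    FROM Player p
    JOIN PlayerGame g ON p.playerId = g.playerId
    ORDER BY g.gameInstanceId
""").fetchall()
# rows == [('alice', 7), ('alice', 8)]
```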

Then the other trick is distributed queries, where you may need to roll up results from many shards (e.g., a count of all Players).
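A distributed count is essentially a scatter-gather: each shard computes its partial result locally, and a coordinator combines them. A sketch with SQLite databases standing in for shards (row counts are invented):

```python
import sqlite3

# Three shards, each owning a disjoint slice of the Player table.
shards = [sqlite3.connect(":memory:") for _ in range(3)]
for i, db in enumerate(shards):
    db.execute("CREATE TABLE Player (playerId INTEGER PRIMARY KEY)")
    # Give shard i a different number of players: 10, 11, 12 rows.
    db.executemany("INSERT INTO Player VALUES (?)",
                   [(i * 100 + n,) for n in range(10 + i)])

# Scatter: run COUNT(*) on every shard. Gather: sum the partials.
total = sum(db.execute("SELECT COUNT(*) FROM Player").fetchone()[0]
            for db in shards)
# total == 10 + 11 + 12 == 33
```

Counts and sums roll up trivially; operations like global ORDER BY ... LIMIT or cross-shard joins need more machinery, which is why the answer treats distributed operations as something to minimize.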

Here is a white paper that describes a lot of this in more detail:

http://dbshards.com/dbshards/database-sharding-white-paper/

The key to this type of approach is to understand how you want to query the data. The answer below is also useful: de-normalize some data when you have to query it from a different perspective. In that case you write the data in two (or more) formats and partition your shards according to each structure. Using the simple example again, say you need to query all the Players for a single GameInstance. You could create a separate "shard tree" with GameInstance as the parent and PlayerGame as the child, sharded by GameInstanceId. That query then becomes efficient too.
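The two shard trees can be sketched as a dual write, with hypothetical helper names: the same PlayerGame fact is stored once in a tree sharded by playerId and once in a tree sharded by gameInstanceId, so both access paths stay single-shard.

```python
# Two shard trees over the same PlayerGame data, each partitioned for a
# different query pattern. Structures and names are illustrative.
NUM_SHARDS = 4

player_tree = [[] for _ in range(NUM_SHARDS)]  # sharded by playerId
game_tree = [[] for _ in range(NUM_SHARDS)]    # sharded by gameInstanceId

def record_player_game(player_id: int, game_instance_id: int) -> None:
    """Write the fact into both trees (in a real system, atomically)."""
    row = {"playerId": player_id, "gameInstanceId": game_instance_id}
    player_tree[player_id % NUM_SHARDS].append(row)
    game_tree[game_instance_id % NUM_SHARDS].append(row)

record_player_game(42, 7)
# "All games for player 42" reads only player_tree[42 % 4];
# "all players in game 7" reads only game_tree[7 % 4].
```

The cost is that every write touches two shards and the copies must be kept consistent, which is the "writes get trickier" trade-off mentioned in the other answer.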

The goal is to keep as many operations single-shard as you can, as distributed operations, oddly enough, are generally the "evil" of a distributed database cluster.

Other tips

Depending on the data you are using, you could potentially denormalize it and spread it across different DB nodes. That would make your writes a bit more tricky, but would improve read performance.

Licensed under: CC-BY-SA with attribution