DISTRIBUTED BY notices in Greenplum

Question 1

DISTRIBUTED BY is how Greenplum determines which segment will store each row. Because Greenplum is an MPP database, most production clusters have multiple segment servers. You usually want the distribution column to be the column you join on, so that matching rows end up on the same segment.
temp_table is a table that will be created for you on the Greenplum cluster. If you haven't set search_path to something else, it will be in the public schema.
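
To make the point about joining on the distribution column concrete, here is a minimal sketch. The table and column names (customers, orders, customer_id) are hypothetical and only serve as an illustration. Spelling out the clause also means Greenplum does not have to pick a distribution key for you, which is what the notice is about.

```sql
-- Hypothetical tables: names and columns are illustrative only.
CREATE TABLE customers (
    customer_id integer,
    name        text
) DISTRIBUTED BY (customer_id);

CREATE TABLE orders (
    order_id    integer,
    customer_id integer,
    amount      numeric
) DISTRIBUTED BY (customer_id);

-- Both tables are hashed on customer_id, so rows with the same
-- customer_id are stored on the same segment and the join can
-- normally be done locally on each segment.
SELECT c.name, sum(o.amount) AS total
FROM customers c
JOIN orders o ON o.customer_id = c.customer_id
GROUP BY c.name;

-- gp_segment_id is a Greenplum system column; this shows how many
-- rows each segment holds, which helps spot a skewed distribution key.
SELECT gp_segment_id, count(*)
FROM orders
GROUP BY gp_segment_id
ORDER BY gp_segment_id;
```

If the row counts per segment are very uneven, the chosen column is probably a poor distribution key even if it is the join column.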

Question 2

For your first question, the DISTRIBUTED BY clause tells the database server how to store the table's rows, that is, which segment each row is written to. (Create Table Documentation)

I did see one thing right away that could be wrong with the syntax of your JOIN clause: you wrote on a.x = s.x, but no table is referenced as s. Maybe your problem is as simple as changing this to on a.x = b.x?
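
A sketch of what the corrected join might look like. Only the aliases a and b and the column x come from the question; table_a and table_b are placeholder names, since the real table names are not shown here.

```sql
-- table_a and table_b are placeholders for the real table names.
SELECT a.x
FROM table_a a
JOIN table_b b
  ON a.x = b.x;  -- b is declared in the JOIN; an undeclared alias such as s causes an error
```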

As for where the temp table is stored, I believe it is generally stored on the database server. This would be a question for your DBA, as it is a setup item when installing the database. You can always dump your data to a file on your computer and reload it later if you want to save your results (without printing).
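
One way to do the dump and reload, assuming you connect with psql: the \copy meta-command reads and writes files on your own machine rather than on the server. The file name results.csv is a hypothetical example, and temp_table is the table name used in the question.

```sql
-- Run inside psql. \copy is a psql meta-command, not plain SQL,
-- and must be written on a single line.

-- Write the table to a CSV file on the client machine.
\copy temp_table TO 'results.csv' CSV HEADER

-- Later, load the file back into a table with the same columns.
\copy temp_table FROM 'results.csv' CSV HEADER
```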

Question 3

As far as I know, a temp table is stored in memory. It is faster when there is little data, and in that case a temp table is recommended. On the other hand, because a temp table is stored in memory, it will consume a very large amount of memory if there is too much data. In that case it is recommended to use a regular table with a DISTRIBUTED BY clause, as it will be distributed across your cluster.

In addition, a temp table is stored in a special schema, so you don't need to specify a schema name when creating it. It only exists in the current connection; after you close the connection, PostgreSQL will drop the table automatically.
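
A small sketch of the schema and lifetime behaviour described above. The table name staging_tmp is hypothetical; the catalog tables pg_class and pg_namespace are standard PostgreSQL catalogs.

```sql
-- staging_tmp is a hypothetical name. No schema is given:
-- temp tables go into a session-specific temporary schema.
CREATE TEMPORARY TABLE staging_tmp (
    x integer,
    y text
);

-- Look up the schema that holds the table. The name should start
-- with pg_temp_ followed by a session-specific number.
SELECT n.nspname
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE c.relname = 'staging_tmp';

-- After disconnecting and reconnecting, the table is gone, so
-- SELECT * FROM staging_tmp; would fail in the new session.
```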