Question

I have a query as below:

SELECT 
ducc.*, dl.LOCATIONID, dl.LOCATIONNAME
FROM [table1] ducc
LEFT OUTER JOIN EACH [table2] dl
ON ducc.LOCATIONID = dl.LOCATIONID
WHERE ABS(ducc.LOCATIONID % 30) = 0

It's giving me "Shuffle failed with error: Cannot shuffle more than 3.00T in a single shuffle. One of the shuffle partitions in this query exceeded 7.68G. Strategies for working around this error are available on the go/dremelfaq."

I would assume it's not able to sort and shuffle the data properly because I'm pulling two columns from [table2], so the complexity of the permutation is high.

Any workaround for this?


Solution 2

Thanks, Jordan, for the insight.

I guess case 2 was the cause of the problem:

"The distribution of join keys is highly unbalanced. That is, if one LOCATIONID accounts for a large proportion of the rows in table1. Sometimes this might be expected. Sometimes, it is because of default value. For example, if table1 has a lot of rows where the LOCATIONID is not known, so by convention, 0 is used, this would mean that a lot of data gets hashed to the same location."

Most of the values in table1.LOCATIONID were NULL. So even though all the table2.LOCATIONID values were unique, it was failing.

As soon as I joined on a column that has 99% distinct values in both table1 and table2, it worked like a charm.
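
For reference, a minimal legacy-SQL sketch of the kind of skew check that exposes this (assuming [table1] stands in for the real table name): count the rows per join key and see whether NULL or a default value dominates.

    -- Rows per join key in table1; a NULL or default key at the top of this list means skew.
    SELECT LOCATIONID, COUNT(*) AS cnt
    FROM [table1]
    GROUP EACH BY LOCATIONID
    ORDER BY cnt DESC
    LIMIT 10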

OTHER TIPS

There are a couple of possibilities:

  1. Join explosion. Are the location IDs in table2 unique? If not, you could be creating an NxM expansion where all N matching rows in table1 match with all M matching rows in table2, producing more rows in the output than in the input (a quick uniqueness check is sketched after this list).
  2. The distribution of join keys is highly unbalanced; that is, one LOCATIONID accounts for a large proportion of the rows in table1. Sometimes this might be expected. Sometimes it is because of a default value. For example, if table1 has a lot of rows where the LOCATIONID is not known and, by convention, 0 is used, a lot of data gets hashed to the same location.
  3. It is also possible that this is something that should just work. If you provide a job id of a failed job, one of the BigQuery engineers can look up the issue and see what went wrong.
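
If you suspect case 1, a minimal legacy-SQL sketch for checking whether table2 really has one row per LOCATIONID (again, [table2] stands in for the real table name):

    -- Any key returned here appears more than once in table2 and will fan out the join.
    SELECT LOCATIONID, COUNT(*) AS dup_count
    FROM [table2]
    GROUP EACH BY LOCATIONID
    HAVING dup_count > 1
    ORDER BY dup_count DESC
    LIMIT 10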

Note that for these issues, the partitioning you're doing (ABS(ducc.LOCATIONID % 30) = 0) won't necessarily help, since the values that satisfy it will all get hashed to the same location.

You have a couple of things you can try:

  1. If you've got a join explosion, you could do a GROUP EACH BY in a subselect on the right side of the join so you only get distinct values. For example:

    SELECT 
    ducc.*, dl.LOCATIONID, dl.LOCATIONNAME
    FROM [table1] ducc
    LEFT OUTER JOIN EACH 
        (SELECT LOCATIONID, MIN(LOCATIONNAME) as LOCATIONNAME 
         FROM [table2] 
         GROUP EACH BY LOCATIONID)  dl
    ON ducc.LOCATIONID = dl.LOCATIONID
    WHERE ABS(ducc.LOCATIONID % 30) = 0
    
  2. Remove the EACH qualifier. This means you won't have to do a shuffle, but it only works if table2 is small enough. You can apply the filtering to that table instead to shrink it, which may help, as in:

    SELECT 
    ducc.*, dl.LOCATIONID, dl.LOCATIONNAME
    FROM [table1] ducc
    LEFT OUTER JOIN 
        (SELECT LOCATIONID, LOCATIONNAME
         FROM [table2] 
         WHERE ABS(LOCATIONID % 30) = 0)  dl
    ON ducc.LOCATIONID = dl.LOCATIONID
    
  3. If the problem is that one of your hash buckets is too large because of a dummy value that gets matched, you can, of course, try filtering it out. If it is a legitimate value that accounts for a large proportion of the matches, you can break the query into pieces: do the part that has 'too many matches' first as a JOIN without the EACH, and the other parts as a separate query with JOIN EACH. You can concatenate the results together by specifying that you want to append the results of the first query to the second (a sketch follows this list).
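
A minimal legacy-SQL sketch of that split, assuming the overloaded key is the default value 0 and that both queries write to the same destination table (the second run with the append disposition, i.e. "Append to table" / writeDisposition=WRITE_APPEND):

    -- Query 1: only the overloaded key (hypothetically 0), joined without EACH so the
    -- big bucket is never shuffled; write the result to a destination table.
    SELECT ducc.*, dl.LOCATIONID, dl.LOCATIONNAME
    FROM [table1] ducc
    LEFT OUTER JOIN 
        (SELECT LOCATIONID, LOCATIONNAME FROM [table2] WHERE LOCATIONID = 0) dl
    ON ducc.LOCATIONID = dl.LOCATIONID
    WHERE ducc.LOCATIONID = 0

    -- Query 2: all remaining keys with JOIN EACH; run it against the same destination
    -- table with the append disposition so the two result sets end up together.
    SELECT ducc.*, dl.LOCATIONID, dl.LOCATIONNAME
    FROM [table1] ducc
    LEFT OUTER JOIN EACH [table2] dl
    ON ducc.LOCATIONID = dl.LOCATIONID
    WHERE ducc.LOCATIONID != 0 OR ducc.LOCATIONID IS NULL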

Licensed under: CC-BY-SA with attribution