Just to close the loop on this question: after investigation, it turned out to be an error in the data that was masked by the automatic whitespace deletion performed in the browser. See https://code.google.com/p/google-bigquery/issues/detail?id=89&q=join%20each for more information.
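As a rough illustration of how whitespace in the data can silently break an equality join, here is a minimal sketch. It makes several assumptions that are not in the original report: SQLite (via Python's sqlite3 module) stands in for BigQuery, the two tiny tables s and a are hypothetical, and the stray whitespace is assumed to be trailing spaces (the issue does not say exactly what form the data error took). Displayed in a browser, both street values would look identical.

```python
import sqlite3

# Hypothetical miniature tables; SQLite stands in for BigQuery here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE s (str_no INTEGER, street TEXT);
    CREATE TABLE a (str_no INTEGER, street TEXT);
    INSERT INTO s VALUES (17, 'EAST 79 STREET');
    INSERT INTO a VALUES (17, 'EAST 79 STREET  ');  -- assumed trailing spaces
""")


def count_matches(join_condition):
    """Return how many rows the join produces under the given ON clause."""
    sql = "SELECT COUNT(*) FROM s JOIN a ON " + join_condition
    return conn.execute(sql).fetchone()[0]


# Exact string comparison: the padded value does not equal the clean one.
exact = count_matches("s.str_no = a.str_no AND s.street = a.street")

# Trimming both sides before comparing restores the match.
trimmed = count_matches(
    "s.str_no = a.str_no AND TRIM(s.street) = TRIM(a.street)"
)

# Rows whose stored length differs from the trimmed length carry
# whitespace that a browser rendering may not show.
padded = conn.execute(
    "SELECT street, LENGTH(street) FROM a "
    "WHERE LENGTH(street) <> LENGTH(TRIM(street))"
).fetchall()

print("exact join rows:", exact)
print("trimmed join rows:", trimmed)
print("padded values:", padded)
```

Comparing the stored length with the trimmed length is one way to surface such values without relying on how the results are displayed; whether the same check would have caught the error in the original tables is not something the issue confirms.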
Join issue in BigQuery with two million row tables
08-07-2023
Question
I posted this in the BigQuery issue tracker (please star the issue if it affects you): https://code.google.com/p/google-bigquery/issues/detail?id=89&q=join%20each
What steps will reproduce the problem?
- See job personal-real-estate:job_up2I9A31Bo8NSvwD0XTWG2tBoVA
- I run
    SELECT *
    FROM
      (SELECT *,
              integer(AD_STREET_NO_PROP) AS str_no_prop,
              integer(CD_ADDR_ZIP_PROP) AS CD_ADDR_ZIP_PROP1
       FROM [acris_nyc.nyc_dof_SOA]
       WHERE NM_RECIPIENT_1 LIKE '%THE MICHAEL R. BLOOMBERG REVOCABLE%') AS s
    JOIN EACH
      (SELECT *,
              integer(hnum_lo) AS str_num,
              integer(zip) AS zip1
       FROM [acris_nyc.nyc_dof_tc_Tentative_Assessment_Roll]
       WHERE owner LIKE '%BLOOM%' AND txcl = '1') AS a
    ON s.str_no_prop = a.str_num AND s.ad_street_1_prop = a.str_name
    ORDER BY NEW_FV_T DESC
    LIMIT 100
What is the expected output? What do you see instead?
I expect one record to be returned, containing 17 as the str_num and "EAST 79 STREET" as the str_name.
What version of the product are you using? On what operating system?
BigQuery on 4/22/2014, from the Chrome browser.
Please provide any additional information below.
I tried a very similar query on a much smaller set of tables, and it works as expected:

    SELECT *
    FROM
      (SELECT *, integer(number) AS inumber
       FROM [test_1.table1]
       WHERE owner LIKE '%BLOOM%') AS a
    JOIN EACH
      (SELECT *, integer(number) AS inumber
       FROM [test_1.table2]
       WHERE owner LIKE '%BLOOM%') AS b
    ON a.inumber = b.inumber AND a.street = b.street

returns

Row | a_number | a_street       | a_owner              | a_inumber | b_number | b_street       | b_owner                            | b_inumber
1   | 00000017 | EAST 79 STREET | BLOOMBERG, MICHAEL R | 17        | 17       | EAST 79 STREET | THE MICHAEL R. BLOOMBERG REVOCABLE | 17
If I query the individual tables in the 1 million row case, they contain the data that should match when the join completes.
Is there any way to debug the actual join operation?
Thanks.
Solution
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow