Question

What is the best pattern for modeling in BQ when you have high concurrency and large row sizes?

Since the row limit is 2MB (JSON), I suppose you must split an entity across many tables and collect all of its attributes using one identifier.

Something like this:

id, attribute0_1, attribute1_1, ..., attributen_1 --> EntityTable1 (each row limit=2MB)
id, attribute0_2, attribute1_2, ..., attributen_2 --> EntityTable2 (each row limit=2MB)

Then you need to join by id. But you cannot join two big tables; one of them must be smaller than 8 MB.
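
For illustration, the query I have in mind looks roughly like this (dataset and attribute names are placeholders):

    -- Reassemble the entity by joining the split tables on id.
    -- With a plain JOIN, one side of the join must stay small,
    -- which is exactly the limit I am worried about.
    SELECT
      t1.id,
      t1.attribute0_1,
      t2.attribute0_2
    FROM [mydataset.EntityTable1] AS t1
    JOIN [mydataset.EntityTable2] AS t2
      ON t1.id = t2.id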

You also have a limit of 20 (+2) concurrent queries.

If I want to get all of an entity's data with only one query, what is the best modeling approach?
If I am forced to use many queries, how do I work around the concurrency limit?

Thank you!


Solution

Where do you see the requirement that you cannot join two big tables? If you use the 'EACH' keyword, you can join two tables of virtually any size; this is covered in the BigQuery query reference documentation.

Also, BigQuery recently added support for multi-way joins, so the syntax for joining multiple large tables together is somewhat saner.
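
Continuing the example from the question, a multi-way JOIN EACH in legacy BigQuery SQL might look like this (dataset and attribute names are placeholders, and EntityTable3 is a hypothetical third split table):

    -- Each JOIN EACH is executed as a shuffle-based join, so
    -- neither side needs to fit within the small-table limit.
    SELECT
      t1.id,
      t1.attribute0_1,
      t2.attribute0_2,
      t3.attribute0_3
    FROM [mydataset.EntityTable1] AS t1
    JOIN EACH [mydataset.EntityTable2] AS t2 ON t1.id = t2.id
    JOIN EACH [mydataset.EntityTable3] AS t3 ON t1.id = t3.id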

OTHER TIPS

I am not sure the 2MB JSON limit is accurate. There is a 2MB limit on the value size, but the value size tends to be much smaller than the corresponding JSON size, which can easily be 5 times larger. The value size is computed from the size of the fields in the record:

  • Int = 8 bytes
  • Bool = 1 byte
  • Float = 8 bytes
  • String = UTF8 length + 2 bytes
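
For example, a record with one integer, one float, and a 1,000-character ASCII string counts as 8 + 8 + (1,000 + 2) = 1,018 bytes against the limit.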

Do the values in your record exceed 2MB? While JOIN EACH will work, you will get much better performance if you can pack your data into a single table.
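
As a rough sketch of the single-table alternative (the table and column names are hypothetical), all attributes live in one wide row, so no join is needed:

    -- One wide table holds the whole entity, so a single scan
    -- returns all attributes without any join.
    SELECT
      id,
      attribute0,
      attribute1,
      attributen
    FROM [mydataset.EntityWide]
    WHERE id = 12345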
