문제

A couple of questions about the TABLE_QUERY function:

  • The examples show using table_id in the query string, are there other fields available?
  • It seems difficult to debug. I'm getting "error evaluating subsidiary query" when I try to use it.
  • How does TABLE_QUERY() work?
도움이 되었습니까?

해결책

The TABLE_QUERY() function allows you to write a SQL WHERE clause that is evaluated to find which tables to run the query over. For instance, you can run the following query to count the rows in all tables in the publicdata:samples dataset that are older than 7 days:

SELECT count(*)
FROM TABLE_QUERY(publicdata:samples,
    "MSEC_TO_TIMESTAMP(creation_time) < "
    + "DATE_ADD(CURRENT_TIMESTAMP(), -7, 'DAY')")

Or you can run this to query over all tables that have ‘git’ in the name (which are the github_timeline and the github_nested sample tables) and find the most common urls:

SELECT url, COUNT(*)
FROM TABLE_QUERY(publicdata:samples, "table_id CONTAINS 'git'")
GROUP EACH BY url
ORDER BY url DESC
LIMIT 100

Despite being very powerful, TABLE_QUERY() can be difficult to use. The WHERE clause must be specified as a string, which can be a little bit awkward. Moreover, it can be difficult to debug, since when there is a problem, you only get the error “Error evaluating subsidiary query”, which isn’t always helpful.

How it works:

TABLE_QUERY() essentially executes two queries. When you run TABLE_QUERY(<dataset>, <table_query>), BigQuery executes SELECT table_id FROM <dataset>.__TABLES_SUMMARY__ WHERE <table_query> to get the list of table IDs to run the query on, then it executes your actual query over those tables.

The __TABLES__ portion of that query may look unfamiliar. __TABLES_SUMMARY__ is a meta-table containing information about tables in a dataset. You can use this meta-table yourself. For example, the query SELECT * FROM publicdata:samples.__TABLES_SUMMARY__ will return metadata about the tables in the publicdata:samples dataset.

Available Fields:

The fields of the __TABLES_SUMMARY__ meta-table (that are all available in the TABLE_QUERY query) include:

  • table_id: name of the table.
  • creation_time: time, in milliseconds since 1/1/1970 UTC, that the table was created. This is the same as the creation_time field on the table.
  • type: whether it is a view (2) or regular table (1).

The following fields are not available in TABLE_QUERY() since they are members of __TABLES__ but not __TABLES_SUMMARY__. They're kept here for historical interest and to partially document the __TABLES__ metatable:

  • last_modified_time: time, in milliseconds since 1/1/1970 UTC, that the table was updated (either metadata or table contents). Note that if you use the tabledata.insertAll() to stream records to your table, this might be a few minutes out of date.
  • row_count: number of rows in the table.
  • size_bytes: total size in bytes of the table.

How to debug

In order to debug your TABLE_QUERY() queries, you can do the same thing that BigQuery does; that is, you can run the the metatable query yourself. For example:

SELECT * FROM publicdata:samples.__TABLES_SUMMARY__ 
WHERE MSEC_TO_TIMESTAMP(creation_time)  < 
   DATE_ADD(CURRENT_TIMESTAMP(), -7, 'DAY')

lets you not only debug your query but also see what tables would be returned when you run the TABLE_QUERY function. Once you have debugged the inner query, you can put it together with your full query over those tables.

다른 팁

Alternative answer, for those moving forward to Standard SQL:

  • BigQuery Standard SQL doesn't support TABLE_QUERY, but it supports * expansion for table names.
  • When expanding table names *, you can use the meta-column _TABLE_SUFFIX to narrow the selection.
  • Table expansion with * only works when all tables have compatible schemas.

For example, to get the average worldwide NOAA GSOD temperature between 2010 and 2014:

#standardSQL
SELECT AVG(temp) avg_temp, _TABLE_SUFFIX y
FROM `bigquery-public-data.noaa.gsod_20*` #every year that starts with "20"
WHERE _TABLE_SUFFIX BETWEEN "10" AND "14" #only years between 2010 and 2014
GROUP BY y
ORDER BY y
라이센스 : CC-BY-SA ~와 함께 속성
제휴하지 않습니다 StackOverflow
scroll top