BigQuery の INFORMATION_SCHEMA でコストが高いクエリを特定する

gcp

INFORMATION_SCHEMA.JOBS で実行されたクエリや total_bytes_ billed が取得できる。重複しないよう statement_type = “SCRIPT” を除く。

SELECT
  CASE
    WHEN STARTS_WITH(job_id, 'materialized_view_refresh_') then 'materialized_view_refresh'
    WHEN STARTS_WITH(job_id, 'scheduled_query_') then 'scheduled_query'
    WHEN STARTS_WITH(job_id, 'bquxjob_') then 'bq user interface'
    ELSE 'other' 
  END AS job_type,
  user_email,
  query,
  state,
  total_bytes_billed / 1024 / 1024 / 1024 / 1024 * 6.25 /* USD / TB */ as cost
FROM
  `region-us.INFORMATION_SCHEMA.JOBS`
WHERE DATE(creation_time) = DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY) AND statement_type != 'SCRIPT'
ORDER BY total_bytes_billed DESC

マテリアライズドビューの場合、クエリの値が CALL BQ.REFRESH_MATERIALIZED_VIEW(view_name) のような形式になるため、INFORMATION_SCHEMA.TABLES も見る。

SELECT table_name, ddl
FROM (account).(dataset).INFORMATION_SCHEMA.TABLES
WHERE table_type = 'MATERIALIZED VIEW'

参考

Categorize bigquery queries based on job_id coming from information_schema.jobs to look at costs