BigQuery の INFORMATION_SCHEMA でコストが高いクエリを特定する
gcpINFORMATION_SCHEMA.JOBS で実行されたクエリや total_bytes_ billed が取得できる。重複しないよう statement_type = “SCRIPT” を除く。
SELECT
CASE
WHEN STARTS_WITH(job_id, 'materialized_view_refresh_') then 'materialized_view_refresh'
WHEN STARTS_WITH(job_id, 'scheduled_query_') then 'scheduled_query'
WHEN STARTS_WITH(job_id, 'bquxjob_') then 'bq user interface'
ELSE 'other'
END AS job_type,
user_email,
query,
state,
total_bytes_billed / 1024 / 1024 / 1024 / 1024 * 6.25 /* USD / TB */ as cost
FROM
`region-us.INFORMATION_SCHEMA.JOBS`
WHERE DATE(creation_time) = DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY) AND statement_type != 'SCRIPT'
ORDER BY total_bytes_billed DESC
マテリアライズドビューの場合、クエリの値が CALL BQ.REFRESH_MATERIALIZED_VIEW(view_name) のような形式になるため、INFORMATION_SCHEMA.TABLES も見る。
SELECT table_name, ddl
FROM (account).(dataset).INFORMATION_SCHEMA.TABLES
WHERE table_type = 'MATERIALIZED VIEW'
参考
Categorize bigquery queries based on job_id coming from information_schema.jobs to look at costs