Compare the performance between Redshift Serverless (Preview) and Athena by queries of TPC-DS, which is a database benchmark.
(PS: 2022-07-13) Following values were calculated at the rate at the time of preview. When it became GA, the rate dropped by about 30%.
Generate data with TPC-DS Connector for Glue - sambaiz-net
First, executed the following query to the json and parquet data.
select /* TPC-DS query96.tpl 0.1 */ count(*)
from store_sales
,household_demographics
,time_dim, store
where ss_sold_time_sk = time_dim.t_time_sk
and ss_hdemo_sk = household_demographics.hd_demo_sk
and ss_store_sk = s_store_sk
and time_dim.t_hour = 8
and time_dim.t_minute >= 30
and household_demographics.hd_dep_count = 5
and store.s_store_name = 'ese'
order by count(*)
limit 100;
The result was as follows.
service | format | run time (sec) | cost (ap-northeast-1) |
---|---|---|---|
Athena | json | 738.205 | $0.94 (188.29 GB) |
Redshift Serverless (Base RPU=128) | json | 1051.927 | $26.18 |
Redshift Serverless (Base RPU=256) | json | 1052.061 | $52.37 |
Athena | parquet | 9.71 | $0.027 (5.49 GB) |
Redshift Serverless (Base RPU=32) | parquet | 10.745 | $0.07 |
Redshift Serverless (Base RPU=128) | parquet | 5.831 | $0.15 |
Redshift Serverless (Base RPU=256) | parquet | 4.684 | $0.23 |
Costs were calculated with the rate of ap-northeast-1.
- Redshift Serverless: $0.70/RPU hour
- Athena: $5/TB
However, the computing usage for each query of Redshift Serverless cannot be obtained, so these costs are the approximate value of “Base RPU * run time”.
In json, Athena was faster, and Redshift Serverless doesn’t get faster even if Base RPU was increased. Most of the time may be spent on the loading part, which does not depend on computing resources. Since Athena is charged by the amount of load, the cost is increased not so much even if it takes a long time, but Redshift Serverless is not.
In parquet, the cost of Athena could be reduced by the amount of load reduction. Furthermore, the execution time was also significantly shortened, so the benefit of the cost reduction of Redshift Serverless was great. It was even shorter when the Base RPU was increased, probably because the ratio of loading part was smaller than json.
Columnar format Parquet structure and read optimization - sambaiz-net
Executed other queries with parquet.
query | service | run time (sec) | cost (ap-northeast-1) |
---|---|---|---|
query7.tpl 0.2 | Athena | 15.747 | $0.20 (40.84 GB) |
query7.tpl 0.2 | Redshift Serverless (Base RPU=32) | 31.432 | $0.20 |
query75.tpl 0.3 | Athena | 34.847 | $0.45 (90.73 GB) |
query75.tpl 0.3 | Redshift Serverless (Base RPU=32) | 204.145 | $1.27 |
query39.tpl 0.5 | Athena | 9.394 | $0.0001 (261.39 MB) |
query39.tpl 0.5 | Redshift Serverless (Base RPU=32) | 23.335 | $0.15 |
query80a.tpl 0.6 | Athena | 52.105 | $0.99 (198.92 GB) |
query80a.tpl 0.6 | Redshift Serverless (Base RPU=32) | 114.645 | $0.71 |
query80a.tpl 0.6 | Redshift Serverless (Base RPU=64) | 39.244 | $0.49 |
query64.tpl 0.20 | Athena | 70.997 | $0.66 (131.89 GB) |
query64.tpl 0.20 | Redshift Serverless (Base RPU=64) | 126.752 | $1.58 |
At least at this scale, Athena seems to outperform overall in terms of cost, but in some cases, Redshift Serverless is faster and cheaper. I think it is good to try it when the amount of loading is not a few, and the query is complicated and seems to use computing resources well.