Compare Redshift Serverless and Athena performances by TPC-DS queries

databaseawsprestoetl

Redshift Serverless and other serverless aggregation services, run query with Glue Data Catalog - sambaiz-net

Compare the performance between Redshift Serverless (Preview) and Athena by queries of TPC-DS, which is a database benchmark.

(PS: 2022-07-13) Following values were calculated at the rate at the time of preview. When it became GA, the rate dropped by about 30%.

Generate data with TPC-DS Connector for Glue - sambaiz-net

First, executed the following query to the json and parquet data.

select /* TPC-DS query96.tpl 0.1 */  count(*) 
from store_sales
    ,household_demographics 
    ,time_dim, store
where ss_sold_time_sk = time_dim.t_time_sk   
    and ss_hdemo_sk = household_demographics.hd_demo_sk 
    and ss_store_sk = s_store_sk
    and time_dim.t_hour = 8
    and time_dim.t_minute >= 30
    and household_demographics.hd_dep_count = 5
    and store.s_store_name = 'ese'
order by count(*)
limit 100;

The result was as follows.

service format run time (sec) cost (ap-northeast-1)
Athena json 738.205 $0.94 (188.29 GB)
Redshift Serverless (Base RPU=128) json 1051.927 $26.18
Redshift Serverless (Base RPU=256) json 1052.061 $52.37
Athena parquet 9.71 $0.027 (5.49 GB)
Redshift Serverless (Base RPU=32) parquet 10.745 $0.07
Redshift Serverless (Base RPU=128) parquet 5.831 $0.15
Redshift Serverless (Base RPU=256) parquet 4.684 $0.23

Costs were calculated with the rate of ap-northeast-1.

  • Redshift Serverless: $0.70/RPU hour
  • Athena: $5/TB

However, the computing usage for each query of Redshift Serverless cannot be obtained, so these costs are the approximate value of “Base RPU * run time”.

In json, Athena was faster, and Redshift Serverless doesn’t get faster even if Base RPU was increased. Most of the time may be spent on the loading part, which does not depend on computing resources. Since Athena is charged by the amount of load, the cost is increased not so much even if it takes a long time, but Redshift Serverless is not.

In parquet, the cost of Athena could be reduced by the amount of load reduction. Furthermore, the execution time was also significantly shortened, so the benefit of the cost reduction of Redshift Serverless was great. It was even shorter when the Base RPU was increased, probably because the ratio of loading part was smaller than json.

Columnar format Parquet structure and read optimization - sambaiz-net

Executed other queries with parquet.

query service run time (sec) cost (ap-northeast-1)
query7.tpl 0.2 Athena 15.747 $0.20 (40.84 GB)
query7.tpl 0.2 Redshift Serverless (Base RPU=32) 31.432 $0.20
query75.tpl 0.3 Athena 34.847 $0.45 (90.73 GB)
query75.tpl 0.3 Redshift Serverless (Base RPU=32) 204.145 $1.27
query39.tpl 0.5 Athena 9.394 $0.0001 (261.39 MB)
query39.tpl 0.5 Redshift Serverless (Base RPU=32) 23.335 $0.15
query80a.tpl 0.6 Athena 52.105 $0.99 (198.92 GB)
query80a.tpl 0.6 Redshift Serverless (Base RPU=32) 114.645 $0.71
query80a.tpl 0.6 Redshift Serverless (Base RPU=64) 39.244 $0.49
query64.tpl 0.20 Athena 70.997 $0.66 (131.89 GB)
query64.tpl 0.20 Redshift Serverless (Base RPU=64) 126.752 $1.58

At least at this scale, Athena seems to outperform overall in terms of cost, but in some cases, Redshift Serverless is faster and cheaper. I think it is good to try it when the amount of loading is not a few, and the query is complicated and seems to use computing resources well.