Aggregating logs from Spark running on an EMR cluster into New Relic with Fluent Bit

2022-09-04 aws spark monitoring fluentd newrelic

When a Spark job runs in cluster mode, its logs are not written under step/, which makes them hard to check even from the console. This post aggregates them into New Relic instead.

Launching an EMR cluster with the AWS CLI and running a Spark application - sambaiz-net

Monitoring infrastructure and applications with New Relic - sambaiz-net

Sending logs with a self-managed Fluent Bit

Install Fluent Bit, a lightweight, low-memory counterpart of Fluentd, and use it to send the logs.

Installing and configuring Fluent Bit

A bootstrap action installs Fluent Bit and puts the New Relic output plugin in place.

Since Fluent Bit 1.9 the configuration file can also be written in YAML, but that format is currently marked "not recommended for production", so the classic format is used here.

In cluster mode, anything written to stdout/stderr ends up on the core node where the driver is running, under /mnt/var/log/hadoop-yarn/containers/application_id/container_id/(stdout|stderr).
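As a small illustration of that directory layout, the sketch below splits a container log path into its application ID, container ID, and stream name. The function name `parse_container_log_path` and the sample IDs are made up for this example; only the directory layout comes from the text above.

```shell
#!/bin/bash
# Hypothetical helper (not part of EMR or Fluent Bit): split a YARN container
# log path into "application_id container_id stream".
parse_container_log_path() {
  local prefix="/mnt/var/log/hadoop-yarn/containers/"
  local path="$1"
  local rest="${path#"$prefix"}"
  # If nothing was stripped, the path is outside the container log directory.
  [ "$rest" != "$path" ] || return 1
  local app_id="${rest%%/*}"
  rest="${rest#*/}"
  local container_id="${rest%%/*}"
  local stream="${rest#*/}"
  printf '%s %s %s\n' "$app_id" "$container_id" "$stream"
}

# Example with made-up IDs:
parse_container_log_path \
  /mnt/var/log/hadoop-yarn/containers/application_1662249600000_0001/container_1662249600000_0001_01_000001/stdout
```

The bootstrap script itself follows.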

#!/bin/bash -e

sudo tee /etc/yum.repos.d/fluent-bit.repo << EOF > /dev/null
[fluent-bit]
name = Fluent Bit
baseurl = https://packages.fluentbit.io/amazonlinux/2/\$basearch/
gpgcheck=1
gpgkey=https://packages.fluentbit.io/fluentbit.key
enabled=1
EOF

sudo yum install -y fluent-bit-1.9.7-1

cd /etc/fluent-bit/

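# Download the New Relic output plugin. This is the arm64 build, so the script assumes arm64 instances.
sudo wget https://github.com/newrelic/newrelic-fluent-bit-output/releases/download/v1.14.0/out_newrelic-linux-arm64-1.14.0.so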
sudo tee plugins.conf << EOF > /dev/null
[PLUGINS]
    Path /etc/fluent-bit/out_newrelic-linux-arm64-1.14.0.so
EOF

sudo tee fluent-bit.conf << EOF > /dev/null
[SERVICE]
    plugins_file plugins.conf
    http_server  On
    http_listen  0.0.0.0
    http_port    2020

[INPUT]
    name tail
    path /mnt/var/log/hadoop-yarn/containers/*/*/stdout
    
[FILTER]
    name modify
    match *
    add emr_cluster_name $(aws emr list-clusters --query "Clusters[?Id=='$(sudo cat /mnt/var/lib/info/job-flow.json | jq -r ".jobFlowId")']" --region <region> | jq -r ".[0].Name")

[OUTPUT]
    name newrelic
    match *
    licenseKey \${NEW_RELIC_LICENSE_KEY}
EOF

sudo mkdir -p /lib/systemd/system/fluent-bit.service.d
sudo tee /lib/systemd/system/fluent-bit.service.d/newrelicenv.conf << EOF > /dev/null
[Service]
    Environment="NEW_RELIC_LICENSE_KEY=<license_key>"
EOF

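# Reload unit files so the drop-in above is picked up, then start the service.
sudo systemctl daemon-reload
sudo systemctl start fluent-bit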
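The configuration above tails only stdout, while the container directory also holds stderr. A minimal sketch of tailing both follows, assuming one tail INPUT per stream. `write_tail_inputs` is a hypothetical helper, and the tags `spark.stdout` and `spark.stderr` are arbitrary names chosen for this example. They still match the `match *` rules in the FILTER and OUTPUT sections.

```shell
#!/bin/bash
# Sketch: write tail INPUT sections for both stdout and stderr.
# write_tail_inputs is a hypothetical helper; $1 is the file to write.
write_tail_inputs() {
  local out_file="$1"
  # Quoted 'EOF' so nothing inside the heredoc is expanded by the shell.
  cat > "$out_file" << 'EOF'
[INPUT]
    name tail
    tag  spark.stdout
    path /mnt/var/log/hadoop-yarn/containers/*/*/stdout

[INPUT]
    name tail
    tag  spark.stderr
    path /mnt/var/log/hadoop-yarn/containers/*/*/stderr
EOF
}

# Write to a scratch file first so the result can be inspected before it is
# merged into /etc/fluent-bit/fluent-bit.conf.
write_tail_inputs "${TMPDIR:-/tmp}/tail-inputs.conf"
cat "${TMPDIR:-/tmp}/tail-inputs.conf"
```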

Sending Fluent Bit metrics

Starting Fluent Bit with http_server On exposes metrics like the following.

$ curl localhost:2020/api/v1/metrics | jq
{
  "input": {
    "tail.0": {
      "records": 0,
      "bytes": 0,
      "files_opened": 0,
      "files_closed": 0,
      "files_rotated": 0
    }
  },
  "filter": {},
  "output": {
    "newrelic.0": {
      "proc_records": 0,
      "proc_bytes": 0,
      "errors": 0,
      "retries": 0,
      "retries_failed": 0,
      "dropped_records": 0,
      "retried_records": 0
    }
  }
}
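For a quick check from the shell, the same HTTP server should also serve these counters as Prometheus text at /api/v1/metrics/prometheus, which is easier to process line by line. The sketch below sums the error counters of all outputs. It assumes metric lines shaped like `fluentbit_output_errors_total{name="newrelic.0"} 0 <timestamp>`, and `sum_output_errors` is a hypothetical helper name.

```shell
#!/bin/bash
# Sketch: sum the error counters of all outputs from Prometheus-format text
# read on stdin. sum_output_errors is a hypothetical helper name.
sum_output_errors() {
  awk '
    # Skip "# HELP" / "# TYPE" comment lines.
    /^#/ { next }
    # Assumed line shape: fluentbit_output_errors_total{name="..."} VALUE [TIMESTAMP]
    /^fluentbit_output_errors_total/ { total += $2 }
    # "+ 0" forces a numeric result even when no line matched.
    END { print total + 0 }
  '
}

# On a node running Fluent Bit with http_server On:
#   curl -s localhost:2020/api/v1/metrics/prometheus | sum_output_errors
```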

Install the New Relic Infrastructure agent and send these metrics with New Relic Flex.

#!/bin/bash -e

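# Register the New Relic Infrastructure agent yum repository (aarch64, matching the arm64 plugin build used for Fluent Bit).
sudo curl -o /etc/yum.repos.d/newrelic-infra.repo https://download.newrelic.com/infrastructure_agent/linux/yum/amazonlinux/2/aarch64/newrelic-infra.repo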
sudo yum -q makecache -y --disablerepo='*' --enablerepo='newrelic-infra'
sudo yum install newrelic-infra -y

NEW_RELIC_LICENSE_KEY=<license_key>
echo "license_key: ${NEW_RELIC_LICENSE_KEY}" | sudo tee -a /etc/newrelic-infra.yml
    
sudo tee /etc/newrelic-infra/integrations.d/fluentbit.yml << EOF > /dev/null
integrations:
- name: nri-flex
  config:
    name: fluentbit
    apis:
    - event_type: fluentbit
      url: http://localhost:2020/api/v1/metrics
EOF
    
sudo systemctl start newrelic-infra
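Because `tee -a` appends, running the script a second time against the same file would add another `license_key` line. A rough sketch of an idempotent alternative follows. `set_license_key` is a hypothetical helper, and it is exercised here against a scratch file, since writing /etc/newrelic-infra.yml itself requires root.

```shell
#!/bin/bash
# Sketch: set license_key in a YAML file without duplicating the entry.
# set_license_key is a hypothetical helper; $1 is the file, $2 the key.
set_license_key() {
  local file="$1" key="$2" tmp
  tmp="$(mktemp)"
  # Keep every line except an existing license_key entry.
  # grep exits non-zero when it prints nothing, hence "|| true".
  if [ -f "$file" ]; then
    grep -v '^license_key:' "$file" > "$tmp" || true
  fi
  echo "license_key: ${key}" >> "$tmp"
  # Overwrite the contents in place so the target keeps its permissions.
  cat "$tmp" > "$file"
  rm -f "$tmp"
}

# Calling it twice still leaves a single entry.
scratch="$(mktemp)"
set_license_key "$scratch" "<license_key>"
set_license_key "$scratch" "<license_key>"
cat "$scratch"
rm -f "$scratch"
```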

The values can then be queried with NRQL.

FROM fluentbit SELECT max(output.newrelic.0.errors) FACET tags.Name TIMESERIES

Sending logs with the Infrastructure agent's Fluent Bit

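The Infrastructure agent also has a log forwarding feature built on Fluent Bit: placing a configuration file at /etc/newrelic-infra/logging.d/logging.yml is enough to have logs sent. It can also import a regular Fluent Bit configuration file, in which case the OUTPUT to New Relic is added dynamically.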

sudo tee /etc/newrelic-infra/logging.d/fluentbit.conf << EOF > /dev/null
[SERVICE]
    plugins_file plugins.conf
    http_server  On
    http_listen  0.0.0.0
    http_port    2020

[INPUT]
    name tail
    path /mnt/var/log/hadoop-yarn/containers/*/*/stdout

[FILTER]
    name modify
    match *
    add emr_cluster_name $(aws emr list-clusters --query "Clusters[?Id=='$(sudo cat /mnt/var/lib/info/job-flow.json | jq -r ".jobFlowId")']" --region ap-northeast-1 | jq -r ".[0].Name")
EOF

sudo tee /etc/newrelic-infra/logging.d/logging.yml << EOF > /dev/null
logs:
  - name: fluentbit-import
    fluentbit:
      config_file: /etc/newrelic-infra/logging.d/fluentbit.conf
EOF
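For a plain tail like this one, the agent's own logging.yml format may be enough without importing a Fluent Bit file. The sketch below assumes the `file` and `attributes` keys of the Infrastructure agent's log forwarding configuration. The entry name `spark-stdout`, the helper `write_logging_yml`, and the cluster name `my-cluster` are placeholders for illustration.

```shell
#!/bin/bash
# Sketch: generate a logging.yml that tails the container stdout files directly.
# write_logging_yml is a hypothetical helper; $1 is the output file and $2 the
# cluster name to attach to every log record as an attribute.
write_logging_yml() {
  local out_file="$1" cluster_name="$2"
  cat > "$out_file" << EOF
logs:
  - name: spark-stdout
    file: /mnt/var/log/hadoop-yarn/containers/*/*/stdout
    attributes:
      emr_cluster_name: "${cluster_name}"
EOF
}

# Write to a scratch file first; the real target would be
# /etc/newrelic-infra/logging.d/logging.yml (requires root).
write_logging_yml "${TMPDIR:-/tmp}/logging.yml" "my-cluster"
cat "${TMPDIR:-/tmp}/logging.yml"
```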

References

hadoop - Where does YARN application logs get stored in EMR before sending to S3 - Stack Overflow