How Hadoop YARN allocates resources to applications and how to check how much is allocated
YARN is a module that manages and schedules the resources of a Hadoop cluster.
How Hadoop YARN allocates resources to applications
Once the ResourceManager (RM) receives an application from a client, it launches an ApplicationMaster (AM) and passes it the information needed to execute the application. The ApplicationMaster asks the ResourceManager to allocate resources. Once they are allocated, it communicates with the NodeManagers (NMs) running on each node, which start containers and run the application.
The Scheduler which allocates resources in Hadoop YARN, and Dominant Resource Fairness (DRF) - sambaiz-net
Events such as resource allocation are delivered to the ApplicationMaster through AMRMClientAsync.CallbackHandler, which is called by the ResourceManager. Similarly, events such as a container starting or stopping are delivered through NMClientAsync.CallbackHandler, which is called by the NodeManagers.
Check how much resources are allocated
The yarn command is available for this.
# aws emr add-steps --cluster-id $CLUSTER_ID --steps Type=Spark,Name="Spark Program",ActionOnFailure=CONTINUE,Args=[--class,org.apache.spark.examples.SparkPi,/usr/lib/spark/examples/jars/spark-examples.jar,10]

$ yarn application -list
...
Total number of applications (application-types: , states: [SUBMITTED, ACCEPTED, RUNNING] and tags: ):1
                Application-Id      Application-Name      Application-Type      User     Queue       State     Final-State   Progress   Tracking-URL
application_1669193308183_0002              Spark Pi                 SPARK    hadoop   default    ACCEPTED       UNDEFINED         0%            N/A

$ yarn application -status application_1669193308183_0002
...
Application Report :
	Application-Id : application_1669193308183_0002
	Application-Name : Spark Pi
	Application-Type : SPARK
	User : hadoop
	Queue : default
	Application Priority : 0
	Start-Time : 1669193960802
	Finish-Time : 1669193970732
	Progress : 100%
	State : FINISHED
	Final-State : SUCCEEDED
	Tracking-URL : ip-172-31-33-99.ap-northeast-1.compute.internal:18080/history/application_1669193308183_0002/1
	RPC Port : -1
	AM Host : 172.31.39.48
	Aggregate Resource Allocation : 61411 MB-seconds, 14 vcore-seconds
	Aggregate Resource Preempted : 0 MB-seconds, 0 vcore-seconds
	Log Aggregation Status : SUCCEEDED
	Diagnostics :
	Unmanaged Application : false
	Application Node Label Expression : <Not set>
	AM container Node Label Expression : <DEFAULT_PARTITION>
	TimeoutType : LIFETIME
	ExpiryTime : UNLIMITED
	RemainingTime : -1seconds

$ yarn top
YARN top - 09:07:05, up 0d, 0:18, 0 active users, queue(s): root
NodeManager(s): 1 total, 1 active, 0 unhealthy, 1 decommissioned, 0 lost, 0 rebooted
Queue(s) Applications: 1 running, 3 submitted, 0 pending, 2 completed, 0 killed, 0 failed
Queue(s) Mem(GB): 10 available, 0 allocated, 0 pending, 0 reserved
Queue(s) VCores: 3 available, 1 allocated, 0 pending, 0 reserved
Queue(s) Containers: 1 allocated, 0 pending, 0 reserved

                  APPLICATIONID USER     TYPE    QUEUE PRIOR  #CONT  #RCONT  VCORES RVCORES   MEM  RMEM  VCORESECS  MEMSECS  %PROGR      TIME NAME
 application_1669193308183_0003 hadoop  spark  default     0      1       0       1       0    0G    0G          0        0    0.00  00:00:00 Spark Pi
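Aggregate Resource Allocation is the integral of allocated resources over the application's run time, so dividing it by the run time gives the average allocation. As a rough sketch using the figures from the report above (integer shell arithmetic truncates the ~9.93 s run time to 9 s):

```shell
# Values copied from the yarn application -status report above
START_MS=1669193960802   # Start-Time
FINISH_MS=1669193970732  # Finish-Time
MB_SECONDS=61411         # Aggregate Resource Allocation (memory part)

# Run time in whole seconds, then mean allocated memory over that time
RUNTIME_S=$(( (FINISH_MS - START_MS) / 1000 ))
AVG_MB=$(( MB_SECONDS / RUNTIME_S ))
echo "ran for ${RUNTIME_S}s, averaging ${AVG_MB} MB of allocated memory"
```

The vcore-seconds figure can be averaged the same way to estimate the mean number of vcores held.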
Allocated resources can also be checked per node.
$ yarn node -list -showDetails
...
Total Nodes:1
                                             Node-Id      Node-State                                    Node-Http-Address    Number-of-Running-Containers
ip-172-31-39-48.ap-northeast-1.compute.internal:8041         RUNNING  ip-172-31-39-48.ap-northeast-1.compute.internal:8042                              2
Detailed Node Information :
	Configured Resources : <memory:11712, vCores:4>
	Allocated Resources : <memory:11712, vCores:2>
	Resource Utilization by Node : PMem:2672 MB, VMem:2672 MB, VCores:1.7560812
	Resource Utilization by Containers : PMem:275 MB, VMem:2054 MB, VCores:1.74
	Node-Labels :
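The same per-node figures are also exposed by the ResourceManager REST API (`/ws/v1/cluster/nodes`, port 8088 by default), which is convenient for scripting. A sketch: against a live cluster you would fetch the JSON with curl, but here a trimmed sample response (values taken from the yarn node output above, field names following that API) is parsed instead for illustration.

```shell
# Against a live cluster:
#   curl -s "http://<rm-host>:8088/ws/v1/cluster/nodes"
# Sample response standing in for the curl output above:
cat > /tmp/nodes.json <<'EOF'
{"nodes": {"node": [{
  "id": "ip-172-31-39-48.ap-northeast-1.compute.internal:8041",
  "state": "RUNNING",
  "numContainers": 2,
  "usedMemoryMB": 11712,
  "usedVirtualCores": 2
}]}}
EOF

# Summarize allocation per node
OUT="$(python3 - <<'EOF'
import json
with open("/tmp/nodes.json") as f:
    nodes = json.load(f)["nodes"]["node"]
for n in nodes:
    print(f'{n["id"]} ({n["state"]}): {n["numContainers"]} containers, '
          f'{n["usedMemoryMB"]} MB / {n["usedVirtualCores"]} vCores allocated')
EOF
)"
echo "$OUT"
```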