The Scheduler that allocates resources in Hadoop YARN, and Dominant Resource Fairness (DRF)

hadoop read_paper algorithm

YARN’s ResourceManager consists of the ApplicationsManager, which accepts applications from clients and launches the ApplicationMaster, and the Scheduler, which receives requests from the ApplicationMaster and allocates resources.

How Hadoop YARN allocates resources to applications and check how much resources are allocated - sambaiz-net

Scheduler

The Scheduler has several implementations, such as CapacityScheduler, which aims to maximize throughput in multi-tenant clusters, and FairScheduler, which shares resources fairly among all applications. You select one with yarn.resourcemanager.scheduler.class; EMR uses CapacityScheduler by default.

ResourceCalculator

CapacityScheduler has a ResourceCalculator setting (yarn.scheduler.capacity.resource-calculator), which calculates the amount of allocated resources and appears to determine how resources are allocated. DefaultResourceCalculator, which is used by default, looks only at memory, whereas DominantResourceCalculator looks at each application's dominant resource, that is, the resource in which it holds the largest share of the cluster. It therefore attempts to equalize CPU shares for CPU-heavy tasks and memory shares for memory-heavy tasks. This resource allocation method is called Dominant Resource Fairness (DRF).
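To make "the resource that has a larger share" concrete, here is a minimal Python sketch. It is not YARN's DominantResourceCalculator; the function name dominant_share and the cluster and container sizes are made up for illustration.

```python
from fractions import Fraction

def dominant_share(usage, capacity):
    """Return (resource_name, share) for the resource with the largest share."""
    shares = {name: Fraction(usage[name], capacity[name]) for name in capacity}
    name = max(shares, key=shares.get)
    return name, shares[name]

# Hypothetical cluster: 100 vcores and 400 GB of memory.
capacity = {"vcores": 100, "memory_gb": 400}

cpu_heavy = {"vcores": 20, "memory_gb": 40}     # 20% of CPU, 10% of memory
memory_heavy = {"vcores": 5, "memory_gb": 100}  # 5% of CPU, 25% of memory

print(dominant_share(cpu_heavy, capacity))     # dominant resource: vcores
print(dominant_share(memory_heavy, capacity))  # dominant resource: memory_gb
```

DRF compares users by this single number, so a CPU-heavy user and a memory-heavy user are considered equally served when their dominant shares are equal.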

Dominant Resource Fairness: Fair Allocation of Multiple Resource Types, Ghodsi et al. (2011)

The paper introduces Asset Fairness and Competitive Equilibrium from Equal Incomes (CEEI) as alternative methods and compares them against the following properties.

  • Sharing incentive: Is every user at least as well off sharing the cluster as exclusively owning an equal partition of it?
  • Strategy-proofness: Is it impossible for users to benefit by lying about their needs?
  • Envy-freeness: Does no user prefer another user's allocation to their own?
  • Pareto efficiency: Is it impossible to increase one user's allocation without reducing another's?

Asset Fairness treats different resources such as CPU, memory, and bandwidth as interchangeable, and equalizes the sum of each user's shares. In the following case, both users get the same aggregate share (24/60), but User2 receives less than half of either resource and would be better off exclusively owning half of the cluster, so this method violates the sharing incentive.

  • The amount of each resource is (30, 30) and there are 2 users
  • User1’s task needs (1, 3) resources -> 6 tasks (6, 18) are allocated
  • User2’s task needs (1, 1) resources -> 12 tasks (12, 12) are allocated
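The numbers above can be reproduced with a few lines of Python. This is only a sketch of the arithmetic for this two-user case, not a general Asset Fairness implementation; the helper names are mine.

```python
from fractions import Fraction

capacity = (30, 30)
demand1 = (1, 3)   # User1's per-task demand
demand2 = (1, 1)   # User2's per-task demand

def asset_per_task(demand):
    # Asset Fairness treats all resource shares as interchangeable, so add them up.
    return sum(Fraction(d, c) for d, c in zip(demand, capacity))

# Equal aggregate shares mean x * a1 == y * a2, so y = ratio * x.
ratio = asset_per_task(demand1) / asset_per_task(demand2)

# Largest x such that x tasks of User1 plus ratio * x tasks of User2 fit in every resource.
x = min(Fraction(c) / (d1 + ratio * d2)
        for c, d1, d2 in zip(capacity, demand1, demand2))
y = ratio * x

def tasks_with_half(demand):
    # Tasks a user could run alone on a static half of the cluster.
    return min(Fraction(c, 2) / d for c, d in zip(capacity, demand))

print(x, y)                      # tasks under Asset Fairness: 6 12
print(tasks_with_half(demand2))  # User2 alone on half the cluster: 15
```

User2 runs 12 tasks when sharing but could run 15 on a dedicated half, which is exactly the sharing-incentive violation.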

In CEEI, each user initially receives an equal share of every resource and then trades with other users; the resulting allocation corresponds to the Nash bargaining solution. This method violates strategy-proofness in the following case.

  • Total amount of each resource is (100, 100) and there are 2 users
  • User1’s task needs (16, 1) resources -> 100/31 ≈ 3.2 tasks are allocated
  • User2’s task needs (1, 2) resources -> 1500/31 ≈ 48.4 tasks are allocated
  • If User1 lies and says the task needs (16, 8) resources, 25/6 ≈ 4.2 tasks are allocated, which is more than when telling the truth
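The fractions in this example can be checked with the sketch below. It assumes that both resources are fully used at the Nash bargaining solution, which holds for these two instances but not in general, so the task counts are simply the solution of a 2x2 linear system. The function name saturating_allocation is hypothetical.

```python
from fractions import Fraction

def saturating_allocation(demand1, demand2, capacity):
    """Task counts (x, y) that use up both resources exactly (Cramer's rule)."""
    (a1, a2), (b1, b2), (c1, c2) = demand1, demand2, capacity
    det = a1 * b2 - b1 * a2
    x = Fraction(c1 * b2 - b1 * c2, det)
    y = Fraction(a1 * c2 - c1 * a2, det)
    return x, y

capacity = (100, 100)
true_demand = (16, 1)
claimed_demand = (16, 8)   # User1 overstates the second resource
user2_demand = (1, 2)

honest_x, honest_y = saturating_allocation(true_demand, user2_demand, capacity)
lying_x, lying_y = saturating_allocation(claimed_demand, user2_demand, capacity)

# Resources User1 actually receives when lying, and the real tasks they support.
received = [lying_x * d for d in claimed_demand]
real_tasks = min(r / d for r, d in zip(received, true_demand))

print(honest_x, honest_y)   # 100/31 1500/31
print(lying_x, lying_y)     # 25/6 100/3
print(real_tasks > honest_x)
```

By lying, User1 ends up able to run 25/6 real tasks instead of 100/31, while User2 drops from 1500/31 to 100/3 tasks.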

DRF, on the other hand, satisfies all of these properties, and the paper shows that it uses resources more efficiently than allocation based on fixed-size slots.
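As a rough illustration, the following Python sketch applies DRF to the (30, 30) example used for Asset Fairness above. It is a simplified greedy loop in the spirit of the paper's algorithm (give the next task to the user with the lowest dominant share, stop when that task no longer fits), not the code YARN runs; drf_allocate is a name I made up.

```python
from fractions import Fraction

def drf_allocate(capacity, demands):
    """Greedy DRF: repeatedly give one task to the user with the lowest dominant share."""
    used = [0] * len(capacity)
    tasks = [0] * len(demands)
    dominant = [Fraction(0)] * len(demands)
    while True:
        # Ties are broken in favor of the user with the lowest index.
        user = min(range(len(demands)), key=lambda i: dominant[i])
        demand = demands[user]
        if any(u + d > c for u, d, c in zip(used, demand, capacity)):
            return tasks  # the next task does not fit, so stop
        used = [u + d for u, d in zip(used, demand)]
        tasks[user] += 1
        dominant[user] = max(Fraction(tasks[user] * d, c)
                             for d, c in zip(demand, capacity))

print(drf_allocate((30, 30), [(1, 3), (1, 1)]))  # [5, 15]
```

User1 gets 5 tasks (5, 15) and User2 gets 15 tasks (15, 15), so both end with a dominant share of 1/2 and neither would do better with a dedicated half of the cluster.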

References

Configure Hadoop YARN CapacityScheduler on Amazon EMR on Amazon EC2 for multi-tenant heterogeneous workloads | AWS Big Data Blog

Dominant Resource Fairness on YARN - Qiita