Vinod Kumar Vavilapalli, Arun C Murthy, Chris Douglas, Sharad Agarwal, Mahadev Konar, Robert Evans, Thomas Graves, Jason Lowe, Hitesh Shah, Siddharth Seth, Bikas Saha, Carlo Curino, Owen O’Malley, Sanjay Radia, Benjamin Reed, and Eric Baldeschwieler
According to the paper, Apache Hadoop YARN (Yet Another Resource Negotiator) is an open-source software framework developed as part of the Hadoop project. It was designed to overcome the limitations of the original Hadoop MapReduce framework, which tied cluster resource management tightly to a single programming model, by providing a more flexible and scalable approach to resource management in large clusters.
The YARN framework consists of two key components: a central Resource Manager (RM) and multiple Node Managers (NMs) distributed across the cluster. The Resource Manager is responsible for managing and allocating resources across the whole cluster. Each Node Manager manages the resources of its own node: it launches and monitors containers there and reports the node's status and available resources back to the Resource Manager through periodic heartbeats.
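The split between the central Resource Manager and the per-node Node Managers can be sketched as a toy model. This is a minimal illustration, not YARN's actual API: the class names, the `heartbeat` and `on_heartbeat` methods, and the report fields are all hypothetical, and it assumes each node simply reports its free memory.

```python
from dataclasses import dataclass, field


@dataclass
class NodeManager:
    """Toy per-node agent (hypothetical, not the real YARN class)."""
    node_id: str
    total_memory_mb: int
    used_memory_mb: int = 0

    def heartbeat(self) -> dict:
        # Report this node's identity and its currently free memory.
        return {
            "node_id": self.node_id,
            "free_memory_mb": self.total_memory_mb - self.used_memory_mb,
        }


@dataclass
class ResourceManager:
    """Toy central authority holding a cluster-wide view of free capacity."""
    free_by_node: dict = field(default_factory=dict)

    def on_heartbeat(self, report: dict) -> None:
        # The latest report from a node replaces its previous one.
        self.free_by_node[report["node_id"]] = report["free_memory_mb"]

    def cluster_free_memory_mb(self) -> int:
        return sum(self.free_by_node.values())


rm = ResourceManager()
nodes = [
    NodeManager("node-1", total_memory_mb=8192),
    NodeManager("node-2", total_memory_mb=4096, used_memory_mb=1024),
]
for nm in nodes:
    rm.on_heartbeat(nm.heartbeat())

print(rm.cluster_free_memory_mb())  # 8192 + 3072 = 11264
```

The point of the sketch is the direction of information flow: nodes push their state upward, and only the central component holds the global picture used for allocation decisions.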
The design of the YARN framework is based on the concept of containers, which are self-contained units of computing resources that can be allocated to individual applications. Each container represents a fixed amount of CPU, memory, and other resources on a specific node, and is isolated from other containers running on the same node. This design allows for more efficient use of resources and gives finer-grained control over resource allocation.
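To make the container idea concrete, the following toy model treats a container as a fixed bundle of memory and virtual cores carved out of one node's capacity. The `Node` and `Container` classes and the `try_allocate` method are hypothetical names chosen for illustration; real YARN tracks more resource types and enforces isolation at the operating-system level, which this sketch does not attempt.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Container:
    """A fixed bundle of resources granted on one node (toy model)."""
    container_id: int
    memory_mb: int
    vcores: int


@dataclass
class Node:
    memory_mb: int
    vcores: int
    containers: list = field(default_factory=list)

    def free(self) -> tuple:
        # Free capacity is total capacity minus what containers already hold.
        used_mem = sum(c.memory_mb for c in self.containers)
        used_cores = sum(c.vcores for c in self.containers)
        return self.memory_mb - used_mem, self.vcores - used_cores

    def try_allocate(self, memory_mb: int, vcores: int) -> Optional[Container]:
        # Grant a container only if the whole request fits on this node.
        free_mem, free_cores = self.free()
        if memory_mb > free_mem or vcores > free_cores:
            return None
        container = Container(len(self.containers) + 1, memory_mb, vcores)
        self.containers.append(container)
        return container


node = Node(memory_mb=8192, vcores=4)
first = node.try_allocate(4096, 2)   # fits
second = node.try_allocate(2048, 1)  # fits
third = node.try_allocate(4096, 1)   # rejected: only 2048 MB remain

print(third)        # None
print(node.free())  # (2048, 1)
```

Because every request states its size up front, the node can pack several differently sized containers side by side instead of dividing itself into a fixed number of identical slots.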
YARN supports a variety of application frameworks, including MapReduce, Apache Spark, Apache Tez, and Apache Flink. Each framework supplies its own Application Master (AM), and every running application gets its own AM instance, which is responsible for coordinating the execution of that application and managing its resource requirements. The Application Master itself runs in a container within the YARN framework and communicates with the Resource Manager to request and release resources.
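The negotiation between an Application Master and the Resource Manager can be sketched as a simple request loop. Everything here is an assumption made for illustration: `ToyResourceManager`, `ToyApplicationMaster`, and their methods are hypothetical, containers are treated as identical, and each granted container is assumed to run exactly one task to completion before being handed back.

```python
class ToyResourceManager:
    """Hands out identical containers from a fixed pool (toy model)."""

    def __init__(self, free_containers: int):
        self.free_containers = free_containers

    def allocate(self, requested: int) -> int:
        # Grant as many containers as are free, up to the requested number.
        granted = min(requested, self.free_containers)
        self.free_containers -= granted
        return granted

    def release(self, count: int) -> None:
        self.free_containers += count


class ToyApplicationMaster:
    """Drives one application by repeatedly asking the RM for containers."""

    def __init__(self, rm: ToyResourceManager, total_tasks: int):
        self.rm = rm
        self.pending = total_tasks
        self.completed = 0

    def run(self) -> int:
        rounds = 0
        while self.pending > 0:
            granted = self.rm.allocate(self.pending)
            if granted == 0:
                raise RuntimeError("no capacity available")
            # Assumption: each granted container runs one task to completion.
            self.pending -= granted
            self.completed += granted
            self.rm.release(granted)
            rounds += 1
        return rounds


rm = ToyResourceManager(free_containers=3)
am = ToyApplicationMaster(rm, total_tasks=7)
rounds = am.run()

print(rounds, am.completed)  # 3 7  (rounds of 3, 3, and 1 tasks)
```

The Resource Manager in this sketch knows nothing about tasks; it only counts containers. All application-specific logic, such as how many tasks remain, lives in the Application Master.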
The YARN framework also includes a pluggable scheduler, which allows a cluster operator to choose the scheduling policy that best fits the cluster's workload. The default scheduler is the Capacity Scheduler, which is designed to support multiple tenants by dividing the cluster into queues and enforcing a resource allocation quota for each one.
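What "pluggable" means here can be illustrated with a small interface and two interchangeable policies. This is a sketch under stated assumptions, not the algorithm of YARN's Capacity Scheduler: the `Scheduler` interface, the `pick` method, the flat queue shares, and the queue names are hypothetical, and the capacity-style policy simply serves the queue that is furthest below its configured share.

```python
from abc import ABC, abstractmethod


class Scheduler(ABC):
    """Policy interface: choose which pending request to serve next."""

    @abstractmethod
    def pick(self, pending: list, usage: dict) -> tuple:
        ...


class FifoScheduler(Scheduler):
    def pick(self, pending, usage):
        # Serve requests strictly in arrival order.
        return pending[0]


class CapacityLikeScheduler(Scheduler):
    """Simplified capacity-style policy with one flat level of queues."""

    def __init__(self, shares: dict):
        self.shares = shares  # queue name -> guaranteed fraction of cluster

    def pick(self, pending, usage):
        total = sum(usage.values()) or 1

        def over_share(request):
            queue, _app = request
            # Negative values mean the queue is below its guaranteed share.
            return usage.get(queue, 0) / total - self.shares[queue]

        return min(pending, key=over_share)


# Requests are (queue, application) pairs in arrival order.
pending = [("analytics", "app-1"), ("etl", "app-2")]
usage = {"analytics": 8, "etl": 2}  # containers currently held per queue

fifo_choice = FifoScheduler().pick(pending, usage)
capacity_choice = CapacityLikeScheduler({"analytics": 0.5, "etl": 0.5}).pick(pending, usage)

print(fifo_choice)      # ('analytics', 'app-1')
print(capacity_choice)  # ('etl', 'app-2')
```

Both policies see the same pending requests and the same usage figures, yet they choose differently. Swapping one class for the other changes cluster behavior without touching the code that calls `pick`.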
In conclusion, the main benefit of the YARN framework is its ability to support a wide range of applications with varying resource requirements. This flexibility allows organizations to use Hadoop for a wider range of use cases, from batch processing to real-time data processing and machine learning.