About Alauda Build of Kueue
TOC
1. Introduction to Alauda Build of Kueue
Alauda Build of Kueue is a Kubernetes-native system that manages quotas and how jobs consume them. Alauda Build of Kueue decides when a job should wait, when a job should be admitted to start (that is, its pods can be created), and when a job should be preempted (that is, its active pods should be deleted).
Alauda Build of Kueue does not replace any existing components in a Kubernetes cluster, but instead integrates with the existing Kubernetes API server, scheduler, and cluster autoscaler components.
Alauda Build of Kueue supports all-or-nothing semantics. This means that either an entire job with all of its components is admitted to the cluster, or the entire job is rejected if it does not fit on the cluster.
2. Installing Alauda Build of Kueue
2.1. Downloading Cluster plugin
The Alauda Build of Kueue cluster plugin can be retrieved from the Customer Portal.
Please contact Customer Support for more information.
2.2. Uploading the Cluster plugin
For more information on uploading the cluster plugin, please refer to
2.3. Installing Alauda Build of Kueue
- Go to the Administrator -> Marketplace -> Cluster Plugin page, switch to the target cluster, and then deploy the Alauda Build of Kueue cluster plugin.
Note: You can keep the deploy form parameters at their defaults, or modify them once you understand how they are used.
- Verify the result. You can see the "Installed" status in the UI, or you can check the pod status:
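A quick way to check the pods, assuming the plugin components run in a `kueue-system` namespace (the namespace name is an assumption):

```shell
# List the Kueue pods; all of them should report a Running status
kubectl get pods -n kueue-system
```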
2.4. Upgrading Alauda Build of Kueue
- Upload the new version of the Alauda Build of Kueue plugin package to ACP.
- Go to the Administrator -> Clusters -> Target Cluster -> Functional Components page, then click the Upgrade button. You will see that Alauda Build of Kueue can be upgraded.
3. Setup RBAC
When you install Alauda Build of Kueue, the following ClusterRoles are created for the two main personas that are expected to interact with Kueue:
- kueue-batch-admin-role includes the permissions to manage ClusterQueues, Queues, Workloads, and ResourceFlavors.
- kueue-batch-user-role includes the permissions to manage Jobs and to view Queues and Workloads.
3.1. Giving permissions to a batch administrator
A batch administrator typically requires the kueue-batch-admin-role ClusterRole across all namespaces.
To bind the kueue-batch-admin-role role to a batch administrator, represented by the user admin@cpaas.com, create a ClusterRoleBinding with a manifest similar to the following:
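A manifest along these lines should work (the binding name is illustrative):

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: kueue-batch-admin-role-binding  # illustrative name
subjects:
- kind: User
  apiGroup: rbac.authorization.k8s.io
  name: admin@cpaas.com
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: kueue-batch-admin-role
```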
To create the ClusterRoleBinding, save the preceding manifest and run the following command:
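Assuming the manifest was saved as admin-role-binding.yaml (the file name is an assumption):

```shell
kubectl apply -f admin-role-binding.yaml
```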
3.2. Giving permissions to a batch user
A batch user typically requires permissions to:
- Create and view Jobs in their namespace.
- View the queues available in their namespace.
- View the status of their Workloads in their namespace.
To give these permissions to a user team-a-owner@cpaas.com for the namespace team-a, create a RoleBinding with a manifest similar to the following:
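A manifest along these lines should work (the binding name is illustrative):

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: kueue-batch-user-role-binding  # illustrative name
  namespace: team-a
subjects:
- kind: User
  apiGroup: rbac.authorization.k8s.io
  name: team-a-owner@cpaas.com
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: kueue-batch-user-role
```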
To create the RoleBinding, save the preceding manifest and run the following command:
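Assuming the manifest was saved as team-a-owner-role-binding.yaml (the file name is an assumption):

```shell
kubectl apply -f team-a-owner-role-binding.yaml
```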
4. Configuring quotas
As an administrator, you can use Alauda Build of Kueue to configure quotas to optimize resource allocation and system throughput for user workloads. You can configure quotas for compute resources such as CPU, memory, pods, and GPU.
You can configure quotas in Alauda Build of Kueue by completing the following steps:
- Configure a cluster queue.
- Configure a resource flavor.
- Configure a local queue.
- Users can then submit their workloads to the local queue.
4.1. Configuring a cluster queue
A cluster queue is a cluster-scoped resource, represented by a ClusterQueue object, that governs a pool of resources such as GPU, CPU, memory, and pods. Cluster queues can be used to define usage limits, quotas for resource flavors, order of consumption, and fair sharing rules.
Note: The cluster queue is not ready for use until a ResourceFlavor object has also been configured.
Prerequisites
- You have cluster administrator permissions or the kueue-batch-admin-role role.
Procedure
- Create a ClusterQueue object as a YAML file:
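A sketch of such a manifest, with numbered comments matching the callouts below. The queue name, the HAMi resource names (nvidia.com/gpucores, nvidia.com/gpumem), and the exact resource grouping are assumptions:

```yaml
apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: cluster-queue          # name is an assumption
spec:
  namespaceSelector: {}        # callout 1: empty selector admits all namespaces
  resourceGroups:
  - coveredResources: ["cpu", "memory", "pods"]   # callout 2
    flavors:
    - name: default-flavor     # callout 3
      resources:               # callout 4
      - name: cpu
        nominalQuota: 9
      - name: memory
        nominalQuota: 36Gi
      - name: pods
        nominalQuota: 5
  # callout 5: Alauda Build of Hami resources; delete this group if you do not use Hami
  - coveredResources: ["nvidia.com/gpu", "nvidia.com/gpucores", "nvidia.com/gpumem"]
    flavors:
    - name: t4-flavor          # callout 6
      resources:
      - name: nvidia.com/gpu
        nominalQuota: 20
      - name: nvidia.com/gpucores
        nominalQuota: 300
      - name: nvidia.com/gpumem
        nominalQuota: 20480
  # callout 7: NVIDIA GPU Device Plugin resources; delete this group if you do not use it
  - coveredResources: ["nvidia.com/gpu"]
    flavors:
    - name: a30-flavor         # callout 8
      resources:
      - name: nvidia.com/gpu
        nominalQuota: 100
```

Note that Kueue requires each resource name to appear in at most one resource group, so keep only the GPU group that matches your GPU stack.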
1. Defines which namespaces can use the resources governed by this cluster queue. An empty namespaceSelector, as shown in the example, means that all namespaces can use these resources.
2. Defines the resource types governed by the cluster queue. This example ClusterQueue object governs CPU, memory, pod, and GPU resources.
3. Defines the resource flavor that is applied to the resource types listed. In this example, the default-flavor resource flavor is applied to CPU, memory, pod, and GPU resources.
4. Defines the resource requirements for admitting jobs. This example cluster queue only admits jobs if the following conditions are met:
   - The sum of the CPU requests is less than or equal to 9.
   - The sum of the memory requests is less than or equal to 36Gi.
   - The total number of pods is less than or equal to 5.
   - The sum of the GPU tasks is less than or equal to 20, if you use Alauda Build of Hami (refer to callout 5).
   - The sum of the total GPU cores requests is less than or equal to 300, if you use Alauda Build of Hami.
   - The sum of the total GPU memory requests is less than or equal to 20480.
   - The sum of the GPU requests is less than or equal to 100, if you use Alauda Build of NVIDIA GPU Device Plugin (refer to callout 7).
5. Defines the resource requirements for Alauda Build of Hami. If you do not use Alauda Build of Hami, delete this section.
6. Defines the resource flavor that is applied to the resource types listed. In this example, the t4-flavor resource flavor is applied to NVIDIA T4 GPU cards. If you do not want to configure quotas for specific card types, you can fill in default-flavor.
7. Defines the resource requirements for Alauda Build of NVIDIA GPU Device Plugin. If you do not use Alauda Build of NVIDIA GPU Device Plugin, delete this section.
8. Defines the resource flavor that is applied to the resource types listed. In this example, the a30-flavor resource flavor is applied to NVIDIA A30 GPU cards. If you do not want to configure quotas for specific card types, you can fill in default-flavor.
- Apply the ClusterQueue object by running the following command:
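Assuming the manifest was saved as cluster-queue.yaml (the file name is an assumption):

```shell
kubectl apply -f cluster-queue.yaml
```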
4.2. Configuring a resource flavor
After you have configured a ClusterQueue object, you can configure a ResourceFlavor object.
Resources in a cluster are typically not homogeneous. If the resources in your cluster are homogeneous, you can use an empty ResourceFlavor instead of adding labels to custom resource flavors.
You can use a custom ResourceFlavor object to represent different resource variations that are associated with cluster nodes through labels, taints, and tolerations. You can then associate workloads with specific node types to enable fine-grained resource management.
Prerequisites
- You have cluster administrator permissions or the kueue-batch-admin-role role.
Procedure
- Create a ResourceFlavor object as a YAML file:
Example of an empty ResourceFlavor object
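A minimal sketch of such a manifest (the metadata name is an assumption):

```yaml
apiVersion: kueue.x-k8s.io/v1beta1
kind: ResourceFlavor
metadata:
  name: default-flavor   # an empty flavor: no node labels, taints, or tolerations
```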
Example of a custom ResourceFlavor object for Nvidia Tesla T4 GPU
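A sketch of such a manifest; the node label key and value follow the convention used by NVIDIA GPU Feature Discovery and are assumptions that must match the labels on your nodes:

```yaml
apiVersion: kueue.x-k8s.io/v1beta1
kind: ResourceFlavor
metadata:
  name: t4-flavor
spec:
  nodeLabels:
    nvidia.com/gpu.product: Tesla-T4   # assumed label; verify against your nodes
```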
Example of a custom ResourceFlavor object for Nvidia A30 GPU
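A sketch of such a manifest, under the same labeling assumption as the T4 example:

```yaml
apiVersion: kueue.x-k8s.io/v1beta1
kind: ResourceFlavor
metadata:
  name: a30-flavor
spec:
  nodeLabels:
    nvidia.com/gpu.product: A30   # assumed label; verify against your nodes
```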
- Apply the ResourceFlavor object by running the following command:
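Assuming the manifest was saved as resource-flavor.yaml (the file name is an assumption):

```shell
kubectl apply -f resource-flavor.yaml
```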
4.3. Configuring a local queue
A local queue is a namespaced object, represented by a LocalQueue object, that groups closely related workloads that belong to a single namespace.
As an administrator, you can configure a LocalQueue object to point to a cluster queue. This allocates resources from the cluster queue to workloads in the namespace specified in the LocalQueue object.
Prerequisites
- You have cluster administrator permissions or the kueue-batch-admin-role role.
- You have created a ClusterQueue object.
Procedure
- Create a LocalQueue object as a YAML file:
Example of a basic LocalQueue object
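A sketch of such a manifest; the namespace, queue name, and cluster queue name are assumptions:

```yaml
apiVersion: kueue.x-k8s.io/v1beta1
kind: LocalQueue
metadata:
  namespace: team-a        # namespace whose workloads use this queue
  name: team-a-queue
spec:
  clusterQueue: cluster-queue   # must match an existing ClusterQueue
```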
- Apply the LocalQueue object by running the following command:
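Assuming the manifest was saved as local-queue.yaml (the file name is an assumption):

```shell
kubectl apply -f local-queue.yaml
```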
4.4. Configuring a default local queue
As a cluster administrator, you can improve quota enforcement in your cluster by managing all jobs in selected namespaces without needing to explicitly label each job. You can do this by creating a default local queue.
A default local queue serves as the local queue for newly created jobs that do not have the kueue.x-k8s.io/queue-name label. After you create a default local queue, any new jobs created in the namespace without a kueue.x-k8s.io/queue-name label automatically update to have the kueue.x-k8s.io/queue-name: default label.
Prerequisites
- You have cluster administrator permissions or the kueue-batch-admin-role role.
- You have created a ClusterQueue object.
Procedure
- Create a LocalQueue object named default as a YAML file:
Example of a default LocalQueue object
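A sketch of such a manifest; the namespace and cluster queue name are assumptions, but the queue name must be default:

```yaml
apiVersion: kueue.x-k8s.io/v1beta1
kind: LocalQueue
metadata:
  namespace: team-a
  name: default            # the name "default" makes this the default local queue
spec:
  clusterQueue: cluster-queue
```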
- Apply the LocalQueue object by running the following command:
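Assuming the manifest was saved as default-local-queue.yaml (the file name is an assumption):

```shell
kubectl apply -f default-local-queue.yaml
```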
Verification
- Create a job in the same namespace as the default local queue.
- Observe that the job updates with the kueue.x-k8s.io/queue-name: default label.
5. Monitoring pending workloads
Alauda Build of Kueue provides the VisibilityOnDemand feature to monitor pending workloads. A workload is an application that runs to completion. It can be composed of one or more pods that, loosely or tightly coupled, complete a task as a whole. A workload is the unit of admission in Alauda Build of Kueue.
The VisibilityOnDemand feature lets batch administrators monitor the pipeline of pending jobs in both the cluster queue and the local queue, and lets batch users monitor it in the local queue only. It helps users estimate when their jobs will start.
You can regulate inbound requests and high request volumes, and provide user permissions for viewing the pending workloads.
5.1. API Priority and Fairness
Alauda Build of Kueue uses Kubernetes API Priority and Fairness (APF) to help manage pending workloads. APF is a flow control mechanism that allows you to define API-level policies to regulate inbound requests to the API server. It protects the API server from being overwhelmed by unexpectedly high request volumes, while shielding critical traffic from being throttled by best-effort workloads.
5.2. Providing user permissions
You can configure role-based access control (RBAC) objects for the users of your Alauda Build of Kueue deployment. These objects determine which types of users can create which types of Alauda Build of Kueue objects.
You need to provide permissions to the users that require access to the specific APIs.
- If the user needs access to the pending workloads from the ClusterQueue resource, create a ClusterRoleBinding referencing the ClusterRole kueue-batch-admin-role.
- If the user needs access to the pending workloads from the LocalQueue resource, create a RoleBinding referencing the ClusterRole kueue-batch-user-role.
5.3. Monitoring pending workloads on demand
To test the monitoring of pending workloads, you must correctly configure both the ClusterQueue and the LocalQueue resources. After that, you can create jobs on that LocalQueue. Kueue manages the workload object created from each job, so when a job is submitted and saturates the ClusterQueue, its corresponding workloads can be seen in the list of pending workloads.
Prerequisites
- You have cluster administrator permissions.
The following procedure tells you how to install and test workload monitoring.
Procedure
- Create the assets by running the following command:
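A command along these lines applies the quota objects configured earlier; the file names are assumptions:

```shell
kubectl apply -f cluster-queue.yaml -f resource-flavor.yaml -f local-queue.yaml
```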
- Create the following file with the job manifest:
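A sketch of a suitable job; the namespace, queue name, and image are assumptions. suspend: true hands admission control to Kueue, and the per-pod CPU request means that six such jobs exceed a CPU quota of 9 and queue up:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  generateName: sample-job-   # generateName lets each created job get a unique name
  namespace: team-a
  labels:
    kueue.x-k8s.io/queue-name: team-a-queue   # targets the local queue
spec:
  parallelism: 3
  completions: 3
  suspend: true               # Kueue unsuspends the job when quota is available
  template:
    spec:
      containers:
      - name: sleep
        image: busybox:1.36
        command: ["sleep", "30"]
        resources:
          requests:
            cpu: "1"
            memory: 200Mi
      restartPolicy: Never
```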
- Create the six jobs by running the following command:
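Because the manifest uses generateName, the same file can be submitted repeatedly; the file name is an assumption:

```shell
# Create six copies of the job; each gets a unique generated name
for i in 1 2 3 4 5 6; do
  kubectl create -f sample-job.yaml
done
```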
5.3.1. Viewing pending workloads in ClusterQueue
To view all pending workloads at the cluster level, administrators can use the ClusterQueue visibility endpoint of the Alauda Build of Kueue visibility API. This endpoint returns a list of all workloads currently waiting for admission by that ClusterQueue resource.
Procedure
- To view pending workloads in ClusterQueue, run the following command:
You should get results similar to:
You can pass the following optional query parameters:
- limit <integer>: The default is 1000. Specifies the maximum number of pending workloads to fetch.
- offset <integer>: The default is 0. Specifies the position of the first pending workload to fetch, starting from 0.
- To view only one pending workload, starting from position 1, in ClusterQueue, run:
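Both requests can be sketched with kubectl get --raw against the visibility API; the cluster queue name cluster-queue is an assumption:

```shell
# List all pending workloads in the cluster queue
kubectl get --raw "/apis/visibility.kueue.x-k8s.io/v1beta1/clusterqueues/cluster-queue/pendingworkloads"

# Fetch a single pending workload, starting from position 1
kubectl get --raw "/apis/visibility.kueue.x-k8s.io/v1beta1/clusterqueues/cluster-queue/pendingworkloads?limit=1&offset=1"
```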
5.3.2. Viewing pending workloads in LocalQueue
To view the pending workloads submitted by a specific tenant within their namespace, users can query the LocalQueue visibility endpoint of the Alauda Build of Kueue visibility API. This provides an ordered list of their jobs waiting in that queue.
Procedure
- To view pending workloads in LocalQueue, run the following command:
You should get results similar to:
You can pass the following optional query parameters:
- limit <integer>: The default is 1000. Specifies the maximum number of pending workloads to fetch.
- offset <integer>: The default is 0. Specifies the position of the first pending workload to fetch, starting from 0.
- To view only one pending workload, starting from position 0, in LocalQueue, run the following command:
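Both requests can be sketched as follows; the namespace team-a and queue name team-a-queue are assumptions:

```shell
# List all pending workloads in the local queue
kubectl get --raw "/apis/visibility.kueue.x-k8s.io/v1beta1/namespaces/team-a/localqueues/team-a-queue/pendingworkloads"

# Fetch a single pending workload, starting from position 0
kubectl get --raw "/apis/visibility.kueue.x-k8s.io/v1beta1/namespaces/team-a/localqueues/team-a-queue/pendingworkloads?limit=1&offset=0"
```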
6. Using cohorts
You can use cohorts to group cluster queues and determine which cluster queues are able to share borrowable resources with each other. Borrowable resources are defined as the unused nominal quota of all the cluster queues in a cohort.
Using cohorts can help to optimize resource utilization by preventing under-utilization and enabling fair sharing configurations. Cohorts can also help to simplify resource management and allocation between teams, because you can group cluster queues for related workloads or for each team. You can also use cohorts to set resource quotas at a group level to define the limits for resources that a group of cluster queues can consume.
6.1. Configuring cohorts within a cluster queue spec
You can add a cluster queue to a cohort by specifying the name of the cohort in the .spec.cohort field of the ClusterQueue object, as shown in the following example:
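A sketch of such a cluster queue; the queue name, cohort name, and quota values are assumptions:

```yaml
apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: team-a-cq
spec:
  cohort: example-cohort     # all cluster queues with this cohort share borrowable resources
  namespaceSelector: {}
  resourceGroups:
  - coveredResources: ["cpu", "memory"]
    flavors:
    - name: default-flavor
      resources:
      - name: cpu
        nominalQuota: 9
      - name: memory
        nominalQuota: 36Gi
```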
All cluster queues that have a matching spec.cohort are part of the same cohort.
If the spec.cohort field is omitted, the cluster queue does not belong to any cohort and cannot access borrowable resources.
7. Configuring fair sharing
Fair sharing is a preemption strategy that is used to achieve an equal or weighted share of borrowable resources between the tenants of a cohort. Borrowable resources are the unused nominal quota of all the cluster queues in a cohort.
7.1. Cluster queue weights
Share values are represented as the weight value in a ClusterQueue object. Share values are important because they allow administrators to prioritize specific job types or teams. Critical applications or high-priority teams can be configured with a weighted value so that they receive a proportionally larger share of the available resources. Configuring weights ensures that unused resources are distributed according to defined organizational or project priorities rather than on a first-come, first-served basis.
The weight value, or share value, defines a comparative advantage for the cluster queue when competing for borrowable resources. Generally, Alauda Build of Kueue admits jobs with a lower share value first. Jobs with a higher share value are more likely to be preempted before those with lower share values.
Example cluster queue with a fair sharing weight configured
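A sketch of a cluster queue with a fair sharing weight; the queue name, cohort name, and quota values are assumptions:

```yaml
apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: team-a-cq
spec:
  cohort: example-cohort
  fairSharing:
    weight: 2                # doubles this queue's share relative to a weight-1 queue
  namespaceSelector: {}
  resourceGroups:
  - coveredResources: ["cpu", "memory"]
    flavors:
    - name: default-flavor
      resources:
      - name: cpu
        nominalQuota: 9
      - name: memory
        nominalQuota: 36Gi
```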
If you don't set the weight, the default value is 1. A weight value of 0 represents an infinite share value. This means that the cluster queue is always at a disadvantage compared to others, and its workloads are always the first to be preempted when fair sharing is enabled.
8. Gang scheduling
Gang scheduling is a timeout-based implementation of all-or-nothing scheduling in Alauda Build of Kueue.
Gang scheduling ensures that a group or gang of related jobs only start when all required resources are available. Alauda Build of Kueue enables gang scheduling by suspending jobs until the Alauda Container Platform cluster can guarantee the capacity to start and execute all of the related jobs in the gang together.
Gang scheduling is important if you are working with expensive, limited resources, such as GPUs. Gang scheduling can prevent jobs from claiming but not using GPUs, which can improve GPU utilization and can reduce running costs. Gang scheduling can also help to prevent issues like resource segmentation and deadlocking.
8.1. Configuring gang scheduling
Gang scheduling is enabled by default. As a cluster administrator, you can update the timeout or disable gang scheduling by modifying the deployment form parameters of the Alauda Build of Kueue cluster plugin.