Optimize Kubernetes Costs with Workload Recommendations

Updated 1 week ago by Archana Singh

One of the most impactful ways to reduce spend on Kubernetes infrastructure is to make sure your clusters are optimally sized for the workloads and node pools they run. Harness Cloud Cost Management (CCM) provides recommendations for your Kubernetes clusters. These recommendations show you resource optimization opportunities to potentially reduce your monthly spend.

The recommendations are computed by analyzing the past utilization of CPU and memory of your workload. The implementation uses a histogram method to compute the recommendations.

You can also tune recommendations by changing the Quality of Service (QoS) and the percentage of recommended CPU and memory requests/limits buffer.

This topic describes how CCM computes workload recommendations and how you can use them to potentially reduce monthly costs.

Before using recommendations in your cluster environment, ensure that you evaluate their impact thoroughly. The person reviewing the recommendations should be able to understand the impacts identified in the recommendations, as well as the impact on the infrastructure and business.

Using recommendations without proper assessment could result in unexpected changes, such as issues with system performance or poor reliability.

Before You Begin

Review: How are Workload Recommendations Computed?

In Harness CCM, the workload recommendations are computed by analyzing the past utilization of CPU and memory of your workload. The implementation uses a histogram method to compute the recommendations.

The computation adds a 15% buffer to the recommended resources by default. CCM also allows you to add any additional buffer using the Tune recommendations option.

When you enable Cost Visibility for your Kubernetes cluster, the Delegate associated with your Connector starts collecting CPU and memory utilization metrics for every node and pod (including individual containers) in the cluster. CCM relies on the Metrics Server for this data: the controller queries the Metrics Server every minute, and CCM initializes recommendations after an initial data collection period of 24 hours.

The per-minute utilization data is then aggregated in the Delegate over a 20-minute window. The aggregated data for each window is sent to Harness:

  • CPU: For CPU values, a partial histogram for the last 20 minutes is sent. 
  • Memory: The maximum value of the memory is sent as a single data point.
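
As a rough illustration of the 20-minute aggregation described above, the following Python sketch summarizes one window of per-minute samples. The function name, the 50-millicore bucket width, and the sample values are assumptions made for this example; the bucketing that the Delegate actually uses is not described in this topic.

```python
from collections import Counter

# Assumed bucket width in millicores (hypothetical; not the Delegate's real value).
BUCKET_WIDTH_MILLICORES = 50


def aggregate_window(cpu_millicores, memory_mib):
    """Summarize one window of per-minute samples.

    Returns a partial CPU histogram (bucket index -> number of samples)
    and the maximum memory value seen in the window.
    """
    cpu_histogram = Counter(value // BUCKET_WIDTH_MILLICORES for value in cpu_millicores)
    return dict(cpu_histogram), max(memory_mib)


# Twenty one-minute samples: mostly steady CPU with two short spikes.
cpu_samples = [120, 130, 180, 410] + [125] * 16
memory_samples = [900 + minute for minute in range(20)]

partial_histogram, memory_peak = aggregate_window(cpu_samples, memory_samples)
print(partial_histogram)  # {2: 18, 3: 1, 8: 1}
print(memory_peak)        # 919
```

Sending a histogram for CPU keeps the shape of the usage distribution, while the single memory value keeps only the peak for the window.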

We use this data for further processing and to compute the complete histograms displayed in the UI. The partial CPU histograms are merged into a single CPU histogram, and the memory histogram is computed from the memory data points.

Each daily histogram carries equal weight for a given workload. For example, if you select the last 30 days of data to aggregate, each of the 30 days is weighted equally.
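
The merging and equal weighting described above can be sketched as follows. This is a minimal illustration, not the CCM implementation: the function names and the bucket-index representation (bucket index mapped to sample count) are assumptions made for this example.

```python
def merge_histograms(partials):
    """Add up bucket counts from several partial histograms."""
    merged = {}
    for partial in partials:
        for bucket, count in partial.items():
            merged[bucket] = merged.get(bucket, 0) + count
    return merged


def combine_days_equally(daily_histograms):
    """Normalize each day's histogram, then average them so every day counts the same."""
    combined = {}
    for day in daily_histograms:
        total = sum(day.values())
        for bucket, count in day.items():
            share = count / total / len(daily_histograms)
            combined[bucket] = combined.get(bucket, 0.0) + share
    return combined


def percentile_bucket(histogram, percentile):
    """Return the smallest bucket whose cumulative weight reaches the percentile."""
    threshold = sum(histogram.values()) * percentile / 100
    running = 0.0
    for bucket in sorted(histogram):
        running += histogram[bucket]
        if running >= threshold:
            return bucket
    return max(histogram)


# Day 1 is built from two partial histograms; day 2 has fewer samples but a spike.
day_1 = merge_histograms([{2: 15, 3: 5}, {2: 20}])
day_2 = {2: 10, 8: 10}

combined = combine_days_equally([day_1, day_2])
print(percentile_bucket(combined, 50))  # 2
print(percentile_bucket(combined, 95))  # 8
```

Because each day is normalized before averaging, day 2 influences the result as much as day 1 even though it contributes fewer samples.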

You can choose to tune the recommendations by changing the Quality of Service (QoS) and the percentage of recommended CPU and memory requests/limits buffer. See Tune Recommendations.

You can also customize your recommendations by increasing or decreasing the request and limit coverage of the CPU and memory samples. Simply drag the slider of the number of samples to adjust the percentile. The slider indicates the percentile of all the CPU and memory samples that are covered to compute the resource recommendations. Based on your selection, the recommendations for your workload are made.

Why Histogram?

An equally weighted histogram is used to account for the seasonality of high resource utilization on certain days of the week. Assume your application receives a lot of traffic (and thus a lot of resource utilization) on weekends, and suppose we used a decaying histogram, which gives recent days more weight than older days. In that case:

  • If you view the workload recommendation on Friday with the last seven days of utilization data selected, Saturday is given the least weight, followed by Sunday, so the recommended resources will be low.
  • If you view the workload recommendation on Monday, Sunday is given the most weight, so the recommended resources may be high.

To avoid this, we use the histogram method and give equal weight to all previous days.
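
The effect can be illustrated with a small Python sketch. It is a simplification that compares weighted averages of one usage value per day rather than full histograms, and the one-day half-life and the usage numbers are assumptions chosen only to make the contrast visible.

```python
def decaying_average(daily_usage, half_life_days=1):
    """Weighted mean where a day's weight halves for every half_life_days of age.

    daily_usage is ordered from oldest to newest.
    """
    newest = len(daily_usage) - 1
    weights = [0.5 ** ((newest - i) / half_life_days) for i in range(len(daily_usage))]
    return sum(w * u for w, u in zip(weights, daily_usage)) / sum(weights)


def equal_average(daily_usage):
    """Plain mean: every day in the window has the same weight."""
    return sum(daily_usage) / len(daily_usage)


# CPU cores used per day, oldest to newest; weekends are busy (4.0), weekdays quiet (1.0).
viewed_on_friday = [4.0, 4.0, 1.0, 1.0, 1.0, 1.0, 1.0]  # weekend is the oldest data
viewed_on_monday = [1.0, 1.0, 1.0, 1.0, 1.0, 4.0, 4.0]  # weekend is the newest data

print(round(decaying_average(viewed_on_friday), 2))  # 1.07
print(round(decaying_average(viewed_on_monday), 2))  # 3.27
print(round(equal_average(viewed_on_friday), 2))     # 1.86
print(round(equal_average(viewed_on_monday), 2))     # 1.86
```

With decaying weights the result swings depending on the day you look at it; with equal weights it is the same for both windows.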

Review: Types of Workload Recommendations

The recommendations fall into the following categories:

  • Cost Optimized
  • Performance Optimized

Cost Optimized

The cost-optimized recommendations are computed as the following:

  • The lower bound is based on the 50th percentile of CPU samples and memory peaks.
  • No upper bound is set for CPU samples. The upper bound for memory peaks is based on the 95th percentile. Setting an upper bound for CPU is not recommended because CPU is considered a compressible resource: if your application starts hitting its CPU limit, Kubernetes starts throttling the pods.
  • There is no limit set for the CPU. This avoids throttling the workloads when there is a spike in CPU usage. CPU usage may go up in certain conditions, for example during pod start. In such a scenario, the CPU can burst and get the required additional resources.
  • The potential monthly spend and savings are calculated based on the 90th percentile of CPU samples and memory peaks.

Since the recommendations are computed using the 50th percentile of the CPU samples and memory peaks, this may potentially lead to system performance issues. Before using cost-optimized recommendations, ensure that you evaluate the recommendation's impact thoroughly.

Performance Optimized

The performance-optimized recommendations are computed using the 95th percentile of CPU samples and memory peaks. Because of this, the probability of any effect on performance is minimal. However, the cost of resources optimized using this method may be higher.

The potential monthly spend and savings are calculated based on the 90th percentiles of CPU samples and memory peaks.
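
To make the percentile choices above concrete, here is a minimal Python sketch. It works on raw sample lists with a nearest-rank percentile instead of the histograms that CCM uses, and the function names, dictionary layout, and sample values are assumptions for illustration only.

```python
def percentile(samples, pct):
    """Nearest-rank percentile; pct is an integer from 1 to 100."""
    ordered = sorted(samples)
    rank = max(1, -(-pct * len(ordered) // 100))  # integer ceiling of pct% of n
    return ordered[rank - 1]


def cost_optimized(cpu_samples, memory_peaks):
    """Lower bound at the 50th percentile; memory upper bound at the 95th; no CPU upper bound."""
    return {
        "lower": {"cpu": percentile(cpu_samples, 50), "memory": percentile(memory_peaks, 50)},
        "upper": {"cpu": None, "memory": percentile(memory_peaks, 95)},
    }


def performance_optimized(cpu_samples, memory_peaks):
    """Both CPU and memory at the 95th percentile."""
    return {"cpu": percentile(cpu_samples, 95), "memory": percentile(memory_peaks, 95)}


# Hypothetical samples: CPU in millicores, memory peaks in MiB.
cpu_samples = list(range(100, 300, 10))                # 100, 110, ..., 290
memory_peaks = [1000 + 50 * i for i in range(20)]      # 1000, 1050, ..., 1950

print(cost_optimized(cpu_samples, memory_peaks))
# {'lower': {'cpu': 190, 'memory': 1450}, 'upper': {'cpu': None, 'memory': 1900}}
print(performance_optimized(cpu_samples, memory_peaks))
# {'cpu': 280, 'memory': 1900}
```

The gap between the 50th and 95th percentile values shows why cost-optimized settings save more but leave less headroom.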

Example

The following example illustrates how resources can be optimized using the cost-optimized recommendations:

Current Resources:

limits:
  memory: 8Gi
  cpu: 1
requests:
  memory: 8Gi
  cpu: 1

Recommended Resources based on utilization data for the last 7 days (QoS: Guaranteed, % of buffer: 0):

limits:
  memory: 3.5Gi
  cpu: 1.1
requests:
  memory: 3.5Gi
  cpu: 1.1

The current resources are provisioned with 8Gi memory and 1 CPU, whereas the recommended resources require only 3.5Gi memory, with 1.1 CPU, for both limits and requests.
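
The size of the change in this example can be worked out with a few lines of Python. The sketch reads the cpu values in the snippets above as whole CPU units (1 and 1.1), which is an assumption, and the helper name is hypothetical.

```python
def percent_change(before, after):
    """Relative change from before to after, in percent (negative means a reduction)."""
    return (after - before) / before * 100


current = {"memory_gib": 8.0, "cpu": 1.0}
recommended = {"memory_gib": 3.5, "cpu": 1.1}

for resource in current:
    change = percent_change(current[resource], recommended[resource])
    print(f"{resource}: {change:+.2f}%")
# memory_gib: -56.25%
# cpu: +10.00%
```

In this example the memory allocation drops by more than half, while the CPU allocation rises slightly.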

Step: View Recommendations

Once you enable CCM, it may take up to 48 hours for the recommendations to appear in Cloud Costs. It depends on the time at which CCM receives the utilization data for the workload.
  1. In Cloud Costs, click Recommendations.

    The recommendations page displays the following information:
    • A breakdown of all the available recommendations.
    • Potential Monthly Savings across your Kubernetes clusters if you apply the recommendations.
    • Forecasted Monthly Spend across your Kubernetes clusters if you do not apply the recommendations.
  2. The Recommendation Breakdown displays the following information:

    Monthly Savings

    Potential monthly savings for your resource, if you apply the recommendations.

    Resource Name

    Name of the resource for which CCM displays the recommendation.

    Resource Type

    Type of the resource for which CCM provides the recommendation. For example, node pool or workload.

    Monthly Cost

    Forecasted monthly spend for the resource, if you do not apply the recommendations.

    Recommendation Type

    Type of the recommendation for your resource. For example, rightsizing or resizing. Based on your resource type, CCM recommends rightsizing or resizing your CPU, memory, or node counts.

  3. Click the recommendation for which you want to view the details. You can use a filter to select the resource or recommendation for which you want to view the details.
    You can filter by:
    • Name: The name of the resource for which the recommendation is displayed.
    • Resource Type: The type of resources for which the recommendation is displayed. Currently, CCM supports node pool and workload.
    • Namespace: Each Kubernetes namespace in the cluster.
    • Cluster Name: Each Kubernetes cluster in your infrastructure.
    • Monthly savings greater than: Filter by potential monthly savings greater than the specified amount. For example, all the recommendations with potential monthly savings of more than $1000.
    • Monthly cost greater than: Filter by forecasted monthly spend greater than the specified amount. For example, all the recommendations with forecasted monthly spend of more than $1000.
  4. The recommendation for the selected resource is displayed.
  5. Select the number of days to compute recommendations based on the utilization data. You can select the last 7 or 30 days.

You can use this information to optimize your resources to potentially reduce your monthly cloud costs.

Step: Tune Recommendations

Tune the recommendations by changing the Quality of Service (QoS) and the percentage of recommended CPU and memory requests/limits buffer.

QoS

You can set burstable or guaranteed QoS for recommended requests/limits. For more information on QoS, see Configure Quality of Service for Pods.

Burstable

A pod is assigned to a QoS class of burstable if:

  • The pod does not meet the criteria for QoS class Guaranteed.
  • At least one container in the pod has a memory or CPU request.

In this QoS class, resources can go beyond requests up to the limits. This recommended action does not affect other workloads and stays within bounds of usage. The scheduler uses the request to place the pod on a node, but the pod can use more resources up to the limit before it’s killed or throttled. For more information, see the QoS class of Burstable.

We do not recommend CPU limits in the case of burstable QoS.

Guaranteed

In this Quality of Service (QoS) class, resource requests and limits are set to exactly the same values, which guarantees that the resources requested by the container are available to it when it is scheduled. This is considered the ideal QoS class for the most stable Kubernetes clusters.

For a pod to be given a QoS class of guaranteed:

  • Every container in the pod must have a memory limit and a memory request, and they must be the same.
  • Every container in the pod must have a CPU limit and a CPU request, and they must be the same.

For more information, see the QoS class of Guaranteed.
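
The criteria for both classes can be summarized in a short Python sketch. It is a simplified illustration rather than the Kubernetes implementation: the function name and the dictionary layout are assumptions, quantities are compared as plain strings (so "500m" and "0.5" would not be treated as equal), and BestEffort, the class Kubernetes assigns when no container sets any request or limit, is included for completeness.

```python
def qos_class(containers):
    """Classify a pod from its containers' resource settings.

    Each container is a dict with optional 'requests' and 'limits' dicts
    keyed by 'cpu' and 'memory'.
    """
    anything_set = False
    guaranteed = True
    for container in containers:
        requests = container.get("requests", {})
        limits = container.get("limits", {})
        if requests or limits:
            anything_set = True
        for resource in ("cpu", "memory"):
            limit = limits.get(resource)
            # Kubernetes uses the limit as the request when only the limit is given.
            request = requests.get(resource, limit)
            if limit is None or request != limit:
                guaranteed = False
    if not anything_set:
        return "BestEffort"
    return "Guaranteed" if guaranteed else "Burstable"


same = {"cpu": "1", "memory": "4Gi"}
print(qos_class([{"requests": same, "limits": same}]))                         # Guaranteed
print(qos_class([{"requests": {"cpu": "1"}, "limits": {"memory": "4Gi"}}]))    # Burstable
print(qos_class([{}]))                                                         # BestEffort
```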

To change the QoS, do the following:

  1. In the Recommendations details page, go to the Tune Recommendations section.
  2. In Quality Of Service(QoS), select Burstable or Guaranteed.
    Based on your selection, the recommendations for your resources are made.

Buffer

You can set the percentage of buffer for recommended requests/limits.

By default, the recommendation computation adds a 15% buffer to the recommended resources. This option allows you to add a further buffer on top of that default.

  1. In the Recommendations details page, go to the Tune Recommendations section.
  2. In % Buffer to be considered for CPU/Memory values, drag the slider to increase or decrease the percentage of the additional buffer. By default, the additional buffer is set to 0.
    Depending on the percentage of the buffer that you set, the recommendations for your resources are made.
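
As a rough sketch of what a percentage buffer does to a value, consider the following Python example. This topic does not state how CCM combines the default 15% buffer with the additional buffer, so applying one after the other is purely an assumption here, as are the function name and the sample value.

```python
def apply_buffer(value, buffer_percent):
    """Increase an integer value by buffer_percent, rounding up to a whole unit."""
    return -(-value * (100 + buffer_percent) // 100)  # integer ceiling division


base_cpu_millicores = 200  # hypothetical percentile value taken from the histogram

with_default = apply_buffer(base_cpu_millicores, 15)  # default 15% buffer
with_extra = apply_buffer(with_default, 10)           # assumed: extra 10% applied on top

print(with_default)  # 230
print(with_extra)    # 253
```

With the additional buffer left at 0, the value after the default buffer is unchanged.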

Step: View Workload Details

You can also view the following information:

  • Details of the pods associated with the workload
  • CPU and memory utilization of the workload

In the Recommendations details page, in Workload Details, click View more details.

The following details are displayed:

  1. Workload details: The following information is displayed:
    • Workload
    • Namespace
    • Workload Type
    • Cluster
  2. Cost details: The following cost details are displayed:
    • Total Cost: For Kubernetes clusters, the total cost is the sum of all the node costs. For ECS clusters, it is the sum of all the container instance costs.
    • Idle Cost: Idle cost is the cost of idle resources (CPU and memory) that are allocated to a Kubernetes pod or an Amazon ECS task but are not utilized. For more information, see Idle Cost.
    • Utilized Cost: Utilized cost is the cost estimate for the utilized node or pod resources.
  3. Aggregation: The aggregated CPU and memory utilization value of your workload. The values are displayed as the following:
    • Time-weighted: This considers the active duration of pods when calculating the CPU and memory utilization.
    • Absolute: This simply aggregates the CPU and memory utilization values.

Let's try to understand how time-weighted and absolute values are calculated using the following examples:

Example 1

Let's assume you want to check the CPU requests of your workload between 3 a.m. and 4 a.m. Imagine there were two pods during that duration:

  1. Each pod requested 0.4 CPU.
  2. The first pod was deleted at 3:53 a.m., so it was active for 53 minutes in that duration.
  3. The second pod was created at 3:53 a.m., so it was active for 7 minutes in that duration.

In the case of time-weighted, the utilization value is calculated as the following:

[(cpu request of pod 1) * (active time) + (cpu request of pod 2) * (active time)]/ total duration

which is equal to [(0.4*53) + (0.4*7)]/60 = 0.4

In the case of absolute, the utilization value is calculated as the following:

(cpu request of pod 1) + (cpu request of pod 2)

which is equal to 0.4 + 0.4 = 0.8

Example 2

Let's assume you want to check the CPU requests of three workloads in your cluster:

  1. Each workload requesting 0.4 CPU
  2. Workload 1 runs from 0-25 mins into the hour
  3. Workload 2 runs from 15-40 mins into the hour
  4. Workload 3 runs from 35-60 mins into the hour

In the case of time-weighted, the utilization value is calculated as the following:

[(cpu request of workload 1) * (active time) + (cpu request of workload 2) * (active time) + (cpu request of workload 3) * (active time)]/ total duration

which is equal to ((0.4*25) + (0.4*25) + (0.4*25))/60 = 0.5

In the case of absolute, the utilization value is calculated as the following:

(cpu request of workload 1) + (cpu request of workload 2) + (cpu request of workload 3)

which is equal to 0.4 + 0.4 + 0.4 = 1.2
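
Both examples above can be reproduced with a short Python sketch. The function names and the list-of-pairs input format are assumptions made for illustration; each pair holds a CPU request and the number of minutes it was active within the hour.

```python
import math


def time_weighted(requests_and_minutes, total_minutes=60):
    """Weight each CPU request by the minutes it was active, then divide by the window length."""
    return sum(request * minutes for request, minutes in requests_and_minutes) / total_minutes


def absolute(requests_and_minutes):
    """Add up the CPU requests regardless of how long each one was active."""
    return sum(request for request, _ in requests_and_minutes)


example_1 = [(0.4, 53), (0.4, 7)]             # two pods, one replacing the other
example_2 = [(0.4, 25), (0.4, 25), (0.4, 25)]  # three workloads, 25 minutes each

print(round(time_weighted(example_1), 2), round(absolute(example_1), 2))  # 0.4 0.8
print(round(time_weighted(example_2), 2), round(absolute(example_2), 2))  # 0.5 1.2

# Floating-point results are compared with a tolerance rather than exact equality.
assert math.isclose(time_weighted(example_1), 0.4)
assert math.isclose(absolute(example_2), 1.2)
```

The time-weighted value reflects what was requested on average over the hour, while the absolute value counts every request in full even if it was active only briefly.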

Next Steps

See Also

The following blog post walks you through workload recommendations:

Cloud Cost Workload Recommendations