Harness Chaos Engineering Basics (Public Preview)

Updated 1 month ago by Michael Cretzman

This topic covers the basic concepts of the Harness Chaos Engineering Public Preview.

ChaosCenter

The ChaosCenter is the graphical interface for interacting with Harness Chaos Engineering.

The ChaosCenter is how you create Chaos Workflows, monitor experiments, and so on.

Dashboard

The dashboard displays the Workflows, Agents, and Projects for your account.

Projects

Projects are how Harness Chaos Engineering enables teams to collaborate. A Project could represent a business unit, division, or simply a development project for an app.

By default, each Harness Chaos Engineering User gets their own Project, but they can also belong to Projects owned by other users or teams.

Agents

Agents connect the Harness Chaos Engineering control plane with the Kubernetes clusters you want to target.

There are two types of agents, described below.

Self Agents

A Self Agent is installed in the same Kubernetes cluster that is hosting the Harness Chaos Engineering control plane.

When you install Harness Chaos Engineering into a Kubernetes cluster, a Self Agent is installed automatically.

External Agents

External Agents are installed in a Kubernetes cluster that is external to the cluster hosting the Harness Chaos Engineering installation you are viewing in your ChaosCenter.

External Agents let you run experiments (inject chaos) into external clusters without having to install Harness Chaos Engineering in each cluster. They let you manage all your experiments from one ChaosCenter.

External Agents are installed in clusters using the chaosctl CLI tool. The binary and instructions are located in the Agents > Connect the Agent page in ChaosCenter. The chaosctl tool is used to register the External Agent with your ChaosCenter.

Workflows

A chaos Workflow is a collection of one or more chaos experiments ordered to achieve a desired chaos impact on the resources in a Kubernetes Cluster.

You can add as many experiments as needed and reorder their sequence. Workflows can run experiments in serial or parallel to achieve a desired chaos injection scenario.

Workflows use Agents to run an experiment in a target cluster.

Technically speaking, a Workflow is a GitOps Argo manifest containing the steps for the experiment.
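As a rough sketch of the serial and parallel ordering described above: Argo Workflows express ordering in a nested steps list, where the outer list runs in sequence and the entries of each inner list run in parallel. The Python structure below mirrors that shape. The step names are hypothetical and are not taken from a real Workflow template.

```python
# Outer list: stages run one after another (serial).
# Inner list: experiments within a stage run at the same time (parallel).
# All step names are hypothetical placeholders.
workflow_steps = [
    ["install-experiments"],
    ["pod-delete", "pod-cpu-hog"],
    ["cleanup"],
]

def describe(steps):
    """Return one human-readable line per stage of the workflow."""
    lines = []
    for number, stage in enumerate(steps, start=1):
        mode = "parallel" if len(stage) > 1 else "serial"
        lines.append(f"stage {number} ({mode}): {', '.join(stage)}")
    return lines

for line in describe(workflow_steps):
    print(line)
```

Reordering experiments in the Workflow corresponds to moving names between or within the inner lists.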

Workflow Types

You can create Workflows in different ways:

  • Predefined Workflow templates: Harness Chaos Engineering includes predefined Workflow templates to get you started.
  • Create a new Workflow by cloning an existing Workflow: Workflows can be templated and then reused with this option.
  • Create a new Workflow using experiments from ChaosHubs.
  • Import a Workflow using YAML: you can upload a workflow manifest for your Workflow.

Tuning and Resiliency Score

You can tune Workflows by changing the sequence of their steps and adding Probes to test hypotheses. You can make these changes visually or by editing the YAML in ChaosCenter.

Tuning

You can tune each experiment in a Workflow using environment variables.

For example, you can specify the chaos duration and interval. You can also specify what percentage of pods are used.
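As an illustration, the tunables for one experiment can be thought of as a list of name/value pairs. The sketch below assumes Litmus-style variable names (TOTAL_CHAOS_DURATION, CHAOS_INTERVAL, PODS_AFFECTED_PERC). These names and the helper function are assumptions for illustration, so check the experiment's chart for the variables it actually accepts.

```python
# Hypothetical helper: build the environment entries for one experiment.
# The variable names are assumed from Litmus-style experiments and are
# not confirmed by this topic.

def build_experiment_env(duration_s, interval_s, pods_affected_pct):
    """Return Kubernetes-style env entries (name/value pairs, values as strings)."""
    if not 0 <= pods_affected_pct <= 100:
        raise ValueError("pods_affected_pct must be between 0 and 100")
    if interval_s > duration_s:
        raise ValueError("interval_s should not exceed duration_s")
    return [
        {"name": "TOTAL_CHAOS_DURATION", "value": str(duration_s)},
        {"name": "CHAOS_INTERVAL", "value": str(interval_s)},
        {"name": "PODS_AFFECTED_PERC", "value": str(pods_affected_pct)},
    ]

# Example: 30 seconds of chaos, injected every 10 seconds, against half the pods.
env = build_experiment_env(duration_s=30, interval_s=10, pods_affected_pct=50)
```

Values are kept as strings because Kubernetes environment variables are always strings.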

Resiliency Score

The successful outcome of each test in a Workflow carries a certain weight. Weights are used to calculate the Resiliency Score at the end of the test. You can adjust the weights of the experiments in the Workflow.

A Resiliency Score is the measure of how resilient your Workflow is considering all the chaos experiments and their individual result points.

For details on the concept of the Resiliency Score, see How the Resilience Score Algorithm works in Litmus.
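To make the role of weights concrete, here is a minimal sketch that assumes the score is a weighted average of per-experiment results. The function name, the 0 to 100 scale, and the input shape are illustrative assumptions, not the product's actual implementation.

```python
def resiliency_score(results):
    """Compute a weighted-average score.

    results: list of (weight, success_pct) tuples, with success_pct in 0..100.
    Returns the weighted average as a percentage.
    """
    total_weight = sum(weight for weight, _ in results)
    if total_weight == 0:
        raise ValueError("at least one experiment must have a non-zero weight")
    weighted = sum(weight * success_pct for weight, success_pct in results)
    return weighted / total_weight

# Three experiments: the heavily weighted one passed, one light one failed.
score = resiliency_score([(10, 100), (5, 0), (5, 100)])
print(score)
```

Under this assumption, raising the weight of an experiment makes its outcome count for more in the final score.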

Editing Workflows

Experiments can be changed when they are added to a Workflow.

For example, when an experiment is used in Workflow A it might target a different namespace, Kubernetes Kind, or label than when it is used in Workflow B.

Probes

Probes are editable checks you can define for any chaos experiment.

For example, let's say you have an experiment that deletes pods in a cluster that hosts a Web interface microservice. You can set an HTTP probe to check that users can still access the Web interface during the chaos experiment. The probe could perform GET requests on the Web interface URL continuously and verify that they return HTTP 200 status codes.

A Probe's outcome affects the verdict of the experiment: a failed Probe counts against the experiment's success.
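The HTTP probe example can be sketched as a polling loop. This is a simplified stand-in, not how Harness Chaos Engineering implements probes: the function name and parameters are hypothetical, and the caller supplies get_status, any callable that takes a URL and returns an integer status code.

```python
import time

def run_http_probe(get_status, url, attempts, interval_s=1.0,
                   expected=200, sleep=time.sleep):
    """Poll url and pass only if every response has the expected status.

    get_status: caller-supplied callable, URL -> int status code.
    sleep: injectable so the loop can be exercised without real delays.
    """
    for attempt in range(attempts):
        if get_status(url) != expected:
            return False  # a single bad response fails the probe
        if attempt < attempts - 1:
            sleep(interval_s)
    return True

# Usage with a stubbed status source that degrades on the third request.
statuses = iter([200, 200, 503])
passed = run_http_probe(lambda _url: next(statuses),
                        "http://web.example.invalid/", attempts=3,
                        sleep=lambda _seconds: None)
print(passed)
```

In a real check, get_status would issue a GET request against the Web interface URL and return the response code.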

Types of Probes

There are several types of probes:

  • HTTP: Queries URIs to check their health or downstream availability.
  • CMD: Executes any health check function implemented as a shell command.
  • Kubernetes: Performs CRUD operations against native and custom Kubernetes resources.
  • Prom: Executes a Prometheus query (PromQL) and matches Prometheus metrics against specific criteria.

These probes can be used in isolation or in combination to achieve the desired checks.

Probe Modes

Probes can be executed in the following modes:

  • SOT: executed at the start of a test as a pre-chaos check.
  • EOT: executed at the end of a test as a post-chaos check.
  • Edge: executed before and after the chaos.
  • Continuous: executed continuously, with a specified polling interval during the chaos injection.
  • OnChaos: executed continuously, with a specified polling interval, strictly for the duration of the chaos.

Scheduling

You can set Workflows to run on a schedule (hourly, daily, monthly, and so on).

ChaosHubs

ChaosHubs link to online marketplaces for different chaos experiments. The experiments can be open source or closed source. They can be added to ChaosCenter from remote Git repos and remote ChaosHubs.

ChaosHubs contain charts that define different experiments. Each chart has manifests that contain the information for the experiment.

Litmus ChaosHub

As you can see when you click ChaosHubs, the default hub links to the repo https://github.com/litmuschaos/chaos-charts.

This repo is used for the Litmus ChaosHub website hub.litmuschaos.io. Litmus ChaosHub is an open source marketplace hosting all the different chaos experiments offered by Litmus.

From ChaosCenter, you can use all of the experiments in Litmus ChaosHub.

Additional ChaosHubs

You can connect additional hubs to ChaosCenter ChaosHubs by clicking Connect New Hub.

Typically, you fork the Litmus ChaosHub, add your own experiment charts, and then add the new hub in your ChaosCenter ChaosHubs. Also, you can make your forked repo private before adding it as a ChaosHub.

Analytics

Analytics tells you how well your chaos engineering Workflows are working: which ones have succeeded, which have failed, and which are still running. It displays execution statistics over time and in total.

You can drill down and see the statistics for each Workflow.

Workflow Comparison

Analytics also lets you compare the Resiliency Scores of multiple Workflows.

Next Steps

Give Harness Chaos Engineering a try using the Harness Chaos Engineering Quickstart (Public Preview).

