Harness Chaos Engineering Quickstart (Public Preview)

This quickstart shows you how to perform a chaos experiment on an application in a Kubernetes cluster. We use a predefined Workflow template that is hosted on the open source Litmus ChaosHub.

For the public preview, you have two options for using Harness Chaos Engineering:

  • On-Prem: Harness Chaos Engineering is installed on-premises with a license from Harness. You can request a license from the Harness Chaos Engineering sign-up page.
  • SaaS: You can use the free SaaS version of Harness Chaos Engineering at https://cloud.chaosnative.com/signin. No license is required, but you need to create an access key, download it, and use it to connect your Kubernetes cluster or namespace to Harness Chaos Engineering Cloud.

Once you have Harness Chaos Engineering running in your environment, this quickstart will take 5 minutes to complete.

Objectives

You'll learn how to:

  1. Install Harness Chaos Engineering in a Kubernetes cluster.
  2. Create a Harness Chaos Engineering Workflow to run a real-world chaos experiment using the predefined podtato-head chaos Workflow.
    Podtato-head is a prototypical cloud-native application built to colorfully demonstrate delivery scenarios using many different tools and services.
  3. Run a pod-delete fault experiment as part of the Workflow.
  4. Review how Probes can be added to experiments to test hypotheses.
  5. Analyze the chaos experiment results.

Before You Begin

Prerequisites

  • Kubernetes cluster:
    • A general-purpose machine type such as GCP e2-standard-2 is sufficient.
    • Kubernetes >= 1.17.
  • Helm 3 installed in the cluster. See Installing Helm in the Helm documentation.
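
One quick way to confirm the prerequisites is to check the client and cluster versions (the exact output depends on your environment):

kubectl version
helm version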

Step 1: Install Harness Chaos Engineering

Harness Chaos Engineering Free Version: The following steps walk you through setting up Harness Chaos Engineering in your own environment (on-premise). You can also use the free SaaS version of Harness Chaos Engineering at https://cloud.chaosnative.com/signin. The free version is limited to 2 Agents and can run up to a maximum of 60 Workflows per month. If you use the free SaaS version, simply jump to Step 2.

You can install Harness Chaos Engineering using Helm or Kubernetes. Both methods are described below.

The following steps are covered in the public Harness Chaos Engineering GitHub repo: https://github.com/chaosnative/hce-charts.

Install using Helm

Add the Harness Helm repository:

helm repo add harness https://hce.chaosnative.com

View the repo:

helm repo list

Output:

NAME        URL
harness     https://hce.chaosnative.com

Update the Harness Helm repo:

helm repo update

Output:

Hang tight while we grab the latest from your chart repositories...
...Successfully got an update from the "harness" chart repository
Update Complete. ⎈Happy Helming!⎈

Install Harness Chaos Engineering ChaosCenter:

helm install -n litmus hce harness/hce --create-namespace

Output:

NAME: hce
LAST DEPLOYED: Fri Mar 25 01:43:14 2022
NAMESPACE: litmus
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
Thank you for installing hce 😀

Your release is named hce and it's installed to namespace: litmus.

Visit https://harness.io to find more info.

Install using kubectl

To install Harness Chaos Engineering with cluster-admin permissions in the litmus namespace (the default), apply the cluster-scope manifest:

kubectl apply -f https://hce.chaosnative.com/manifests/2.8.0/hce-cluster-scope.yaml

Or, to scope the installation to a single namespace (when you do not have access to the entire cluster), replace litmus with the desired namespace in the following commands:

kubectl create ns litmus
kubectl apply -f https://hce.chaosnative.com/manifests/2.8.0/hce-crds.yaml
kubectl apply -f https://hce.chaosnative.com/manifests/2.8.0/hce-namespace.yaml -n litmus

Output (cluster-scope installation):

clusterrole.rbac.authorization.k8s.io/argo-cr-for-litmusportal-server created
clusterrolebinding.rbac.authorization.k8s.io/argo-crb-for-litmusportal-server created
clusterrole.rbac.authorization.k8s.io/litmus-cluster-scope-for-litmusportal-server created
clusterrolebinding.rbac.authorization.k8s.io/litmus-cluster-scope-crb-for-litmusportal-server created
clusterrole.rbac.authorization.k8s.io/litmus-admin-cr-for-litmusportal-server created
clusterrolebinding.rbac.authorization.k8s.io/litmus-admin-crb-for-litmusportal-server created
clusterrole.rbac.authorization.k8s.io/chaos-cr-for-litmusportal-server created
clusterrolebinding.rbac.authorization.k8s.io/chaos-crb-for-litmusportal-server created
clusterrole.rbac.authorization.k8s.io/subscriber-cr-for-litmusportal-server created
clusterrolebinding.rbac.authorization.k8s.io/subscriber-crb-for-litmusportal-server created
clusterrole.rbac.authorization.k8s.io/event-tracker-cr-for-litmusportal-server created
clusterrolebinding.rbac.authorization.k8s.io/event-tracker-crb-for-litmusportal-server created
clusterrole.rbac.authorization.k8s.io/litmus-server-cr created
clusterrolebinding.rbac.authorization.k8s.io/litmus-server-crb created
namespace/litmus created
serviceaccount/litmus-server-account created
secret/litmus-portal-admin-secret created
secret/regcred created
configmap/litmus-portal-admin-config created
configmap/litmusportal-frontend-nginx-configuration created
deployment.apps/license-server created
service/license-service created
deployment.apps/litmusportal-frontend created
service/litmusportal-frontend-service created
deployment.apps/litmusportal-server created
service/litmusportal-server-service created
deployment.apps/litmusportal-auth-server created
service/litmusportal-auth-server-service created
statefulset.apps/mongo created
service/mongo-service created
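
Before moving on, you can verify that the installation is healthy by confirming that all pods in the litmus namespace reach the Running state (pod names may vary between the Helm and kubectl installation methods):

kubectl get pods -n litmus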

Accessing ChaosCenter

Using a NodePort

To set up and log in to ChaosCenter, list the new services and copy the PORT of the litmusportal-frontend-service service:

kubectl get svc -n litmus

Output example:

NAME                            TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)                         AGE
chaos-litmus-portal-mongo       ClusterIP   10.104.107.117   <none>        27017/TCP                       2m
litmusportal-frontend-service   NodePort    10.101.81.70     <none>        9091:30385/TCP                  2m
litmusportal-server-service     NodePort    10.108.151.79    <none>        9002:32456/TCP,9003:31160/TCP   2m

In this case, the PORT for litmusportal-frontend-service is 30385. Yours will be different.

Once you have the PORT, use a node IP and the PORT in the form <NODEIP>:<PORT> to access ChaosCenter.

For example:

http://172.17.0.3:30385/

Where 172.17.0.3 is the node IP and 30385 is the NodePort of the frontend service.
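
If you don't know a node IP for your cluster, one way to find one is to list the nodes with wide output and use the EXTERNAL-IP (or the INTERNAL-IP, if your machine is on the cluster network):

kubectl get nodes -o wide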

Using a LoadBalancer

To set up and log in to ChaosCenter with a LoadBalancer, patch the frontend service to expose an external IP. Depending on your installation method, the frontend service is named litmusportal-frontend-service or hce-frontend-service; patch whichever is present in your cluster.

kubectl patch svc litmusportal-frontend-service -p '{"spec": {"type": "LoadBalancer"}}' -n litmus

View the LoadBalancer IP (this can take a few minutes):

kubectl get svc -n litmus

Output:

NAME                   TYPE           CLUSTER-IP     EXTERNAL-IP    PORT(S)                         AGE
hce-frontend-service   LoadBalancer   10.96.3.43     34.71.48.119   9091:32576/TCP                  175m
hce-headless-mongo     ClusterIP      None           <none>         27017/TCP                       175m
hce-license-service    ClusterIP      10.96.3.83     <none>         80/TCP                          175m
hce-mongo              ClusterIP      10.96.10.223   <none>         27017/TCP                       175m
hce-server-service     NodePort       10.96.1.137    <none>         9002:30066/TCP,9003:32468/TCP   175m

The EXTERNAL-IP might say pending for a few minutes while the load balancer is set up.
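
If you want to wait for the address to be assigned, you can watch the services and press Ctrl+C once EXTERNAL-IP is populated:

kubectl get svc -n litmus -w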

Use your EXTERNAL-IP and PORT in the form <EXTERNAL-IP>:<PORT> to access ChaosCenter.

http://34.71.48.119:9091/

This is an example IP and port. Yours will be different.

ChaosCenter is displayed in your browser.

Log in with the default credentials:

  • Username: admin
  • Password: litmus

Once you log in, you'll be asked to set a new password.

Next, activate your license. If you don't already have one, you can request a license from the Harness Chaos Engineering sign-up page.

Drag and drop your license file into Upload License File and click Activate License.

You're ready to go.

Step 2: Schedule a New Workflow

In Harness Chaos Engineering ChaosCenter, click Workflows, and then click Schedule a Workflow.

The Workflow wizard appears.

Step 3: Choose an Agent

Agents connect the Harness Chaos Engineering control plane to the Kubernetes clusters you want to target for chaos injection.

A Self-Agent is an Agent running on the same cluster as Harness Chaos Engineering and is installed with the Harness Chaos Engineering control plane.

Select Self-Agent and click Next.

If the Self-Agent does not appear, the related Kubernetes workloads might not be active yet. Wait a few minutes for them to reach a steady state. If the Agent is still inactive after that, ensure that ingress to the litmusportal-server-service NodePort is not restricted and that a firewall exception exists for that NodePort, if applicable.
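
One way to check the workload state (assuming the default litmus namespace) is to confirm that the deployments are available:

kubectl get deployments -n litmus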

Step 4: Choose a Workflow

A chaos Workflow is a collection of one or more chaos experiments ordered to achieve a desired chaos impact on the resources in a Kubernetes Cluster.

There are many different ways to create and execute a Workflow.

For this quickstart, we'll use the predefined Workflow template named podtato-head, hosted on the open source Litmus ChaosHub.

The podtato-head Workflow is intended to illustrate the capabilities of Harness Chaos Engineering; it is not meant to target real-life applications.

Click Create a new Workflow from one of the pre-defined chaos Workflow templates. The ChaosHubs appear.

Select Litmus ChaosHub. The podtato-head template appears.

Select podtato-head and then click Next.

In Workflow Settings, you can enter the Workflow name and the target namespace in the cluster. This predefined Workflow template runs in the litmus namespace.

The Description setting is populated with the default description of the Workflow, but you can edit the description. For example, if you will be editing the sequence of the Workflow, you might want to add that information to the description.

Click Next.

Step 5: Tune the Workflow

You can see the steps of the Workflow are already defined. Double-click on the diagram to zoom in.

Here are the steps:

  1. install-application: installs the podtato-head application (it is removed later by delete-application).
  2. install-chaos-experiments: installs the chaos experiments used in the Workflow; in this case, pod-delete.
  3. pod-delete: deletes the target application's pods at defined intervals for the duration of the chaos. Kubernetes handles the recovery of the deleted pods. (A minimal fault definition is sketched after this list.)
  4. revert-chaos: deletes the resources created during Workflow execution, such as Workflow pods, experiment pods, and the chaos runner pod.
  5. delete-application: deletes the podtato-head application.
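
For reference, pod-delete is a standard Litmus fault defined by a ChaosEngine resource. The following is a minimal, illustrative sketch; the names, labels, and values are examples, not the exact ones embedded in the podtato-head template:

apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: podtato-pod-delete-chaos        # example name
  namespace: litmus                     # namespace where the chaos resources run
spec:
  engineState: "active"
  chaosServiceAccount: litmus-admin     # service account with permissions to run the fault
  appinfo:
    appns: "litmus"                     # namespace of the target application
    applabel: "name=podtato-main"       # label selector for the target pods (illustrative)
    appkind: "deployment"
  experiments:
    - name: pod-delete
      spec:
        components:
          env:
            - name: TOTAL_CHAOS_DURATION   # how long chaos runs, in seconds
              value: "30"
            - name: CHAOS_INTERVAL         # delay between successive pod deletions, in seconds
              value: "10"
            - name: FORCE                  # graceful (false) vs. forced (true) deletion
              value: "false"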

You can edit the sequence of an experiment by clicking the Edit Sequence button, but we will not be editing the sequence in this quickstart.

You can tune each experiment in a Workflow. This quickstart will not tune the experiment, but feel free to explore the tuning settings by clicking the edit button.

You can then walk through the different options.

In Target Application, in appns, you can see the target namespace for the experiment. In this Workflow, the target namespace is {{workflow.parameters.adminModeNamespace}}. The application isn't installed yet, so this runtime variable will resolve to the actual namespace where the application is installed.

In Define the steady state for this application, you can see the Probe that is defined for this experiment. This Probe tests a steady-state hypothesis to see whether the podtato-head website is available during the execution of the experiment.

Click Show Details for a quick look at the Probe.

In the details, you can see valuable information:

  • url: the application URL, which you can use to view the application while the experiment is running.
  • method: the criteria that Harness Chaos Engineering evaluates continuously during the experiment (see Continuous under Mode). In this case, HTTP GET requests are made to the URL, and Harness Chaos Engineering verifies that each request receives an HTTP 200 response.
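
Under the hood, this Probe is a Litmus httpProbe defined in the experiment spec. A minimal sketch, with an illustrative URL placeholder and timing values (not the exact values used by the podtato-head template), looks like this:

probe:
  - name: "check-podtato-main-access-url"
    type: "httpProbe"
    httpProbe/inputs:
      url: "http://<podtato-head-service>:<port>"   # replace with the application URL shown in the Probe details
      insecureSkipVerify: false
      method:
        get:
          criteria: "=="           # compare the actual response code to the expected one
          responseCode: "200"      # expect HTTP 200 while chaos is running
    mode: "Continuous"             # evaluated repeatedly for the duration of the experiment
    runProperties:                 # illustrative timing values
      probeTimeout: 5
      interval: 2
      retry: 1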

Close the experiment tuning settings, and then click Next.

Step 6: Adjust the Resiliency Score

In Resiliency Score, you can see the weights for the experiments.

A Resiliency Score is a measure of how resilient your Workflow is, considering all of its chaos experiments and their individual results.

The successful outcome of each experiment in a Workflow carries a certain weight, and these weights are used to calculate the Resiliency Score at the end of the run. You can adjust the weight of each experiment in the Workflow by dragging its slider.
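
For illustration, assuming the score is a weighted average of each experiment's probe success percentage: a Workflow with two experiments weighted 10 and 5, whose probes succeed 100% and 80% of the time respectively, would score (10 × 100 + 5 × 80) / (10 + 5) ≈ 93%.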

For this quickstart, just keep the default Resiliency Score.

Click Next.

Step 7: Choose a Schedule

You can schedule the Workflow to run manually or on a recurring schedule.

Click Schedule now, and click Next.

Step 8: Run the Workflow and View Results

ChaosCenter shows a summary of the Workflow.

You can edit any of the settings. For this quickstart, we'll leave the defaults.

Click Finish.

Congratulations! You've successfully scheduled your first Workflow with Harness Chaos Engineering. Now let's take a look at the running Workflow and results of the experiment.

Click the Go to Workflow link to see the Workflow in action, or click Workflows in the navigation.

You'll see the Workflow running.

Click the name of the Workflow to see its details.

Click any step in the Workflow to see its details and logs.

In the Graph View or Table View, click View Logs & Results for the pod-delete step.

For pod-delete, click Chaos Results.

You can see the success of the chaos experiment.

Chaos Experiment Result:

  Experiment Verdict: Pass
  Phase: Completed
  Fail Step: N/A

Chaos Engine Result:

  Passed Runs: 1
  Failed Runs: 0
  Stopped Runs: 0

Click Probe Result to view the success or failure of the steady-state hypothesis constraint (podtato-head website availability throughout the pod-deletion period) and the experiment verdict:

Probe Success Percentage: 100%

Probe Status:

  Name: check-podtato-main-access-url
  Type: httpProbe
  Continuous: "Passed 👍"
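
You can also inspect the same verdict from inside the cluster. Litmus records it in a ChaosResult custom resource (the exact resource name is generated from the engine and experiment names, so list the resources first):

kubectl get chaosresults -n litmus
kubectl describe chaosresult <chaosresult-name> -n litmus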

Step 9: View the Analytics

Click Analytics.

Analytics tells you how well your chaos engineering Workflows are performing: which have succeeded, which have failed, and which are running. It also displays run statistics over time, as well as totals.

In Workflow Comparison, you can compare the results of multiple Workflows.

If this is the first time a Workflow has been run on this installation of Harness Chaos Engineering, you will only see one entry.

For the podtato-head-xx Workflow, click Statistics. You can see all of the stats for the Workflow.

You can also jump to the statistics by selecting Show the statistics from the Workflows page.

Clean Up

Kubernetes

Uninstall Harness Chaos Engineering from the cluster:

kubectl delete namespaces litmus

Output:

namespace "litmus" deleted

Helm

Uninstall Harness Chaos Engineering:

helm uninstall hce --namespace=litmus

Output:

release "hce" uninstalled

Summary

In this quickstart, you learned how to:

  1. Create a Harness Chaos Engineering Workflow to run a real-world chaos experiment.
  2. Run a pod-delete fault experiment as part of the Workflow.
  3. Analyze the chaos experiment results.

Next Steps

Explore more public experiment workflows from Litmus ChaosHub.

Learn more about Litmus experiments.

