Azure Chaos Studio is a managed chaos engineering service that allows you to introduce controlled faults into your Azure environment to validate resilience and robustness. In this post, you’ll learn how to use Chaos Studio to run experiments on Azure Kubernetes Service (AKS) clusters.


Prerequisites

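  • An existing AKS cluster in a region that Chaos Studio supports
  • Chaos Mesh installed on the cluster (Chaos Studio runs its AKS faults through Chaos Mesh)
  • Azure CLI installed and logged in (az login)
  • kubectl configured for the cluster
  • Permission to register resource providers in your subscription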
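
Chaos Studio delivers its AKS faults through Chaos Mesh, so the cluster needs Chaos Mesh running before an experiment can act on it. The function below is a sketch of a Helm-based install. It assumes Helm 3 is available, that the node pools use containerd, and that a namespace called chaos-testing is acceptable (the function name and the namespace are arbitrary choices, not something Chaos Studio requires). Check the Chaos Mesh chart documentation for the values that fit your cluster.

```shell
# Sketch: install Chaos Mesh with Helm.
# Assumptions: Helm 3 and kubectl are on PATH and point at the AKS cluster;
# the nodes use containerd; the namespace name is an arbitrary choice.
install_chaos_mesh() {
  namespace="${1:-chaos-testing}"
  helm repo add chaos-mesh https://charts.chaos-mesh.org &&
  helm repo update &&
  kubectl create namespace "$namespace" &&
  helm install chaos-mesh chaos-mesh/chaos-mesh \
    --namespace "$namespace" \
    --set chaosDaemon.runtime=containerd \
    --set chaosDaemon.socketPath=/run/containerd/containerd.sock
}
```

Call install_chaos_mesh, then confirm the pods are running with kubectl get pods --namespace chaos-testing.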

Register Required Resource Providers

az provider register --namespace Microsoft.Chaos
az provider register --namespace Microsoft.ContainerService
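
Provider registration is asynchronous and can take a few minutes, so it may help to wait until both providers report Registered before moving on. A minimal polling sketch follows; the function name, the 10-second interval, and the 30-attempt limit are arbitrary choices.

```shell
# Sketch: poll until a resource provider reports "Registered".
# The interval and attempt limit are arbitrary; adjust as needed.
wait_for_provider() {
  namespace="$1"
  attempts=0
  state=""
  while [ "$attempts" -lt 30 ]; do
    state=$(az provider show --namespace "$namespace" \
      --query registrationState -o tsv)
    if [ "$state" = "Registered" ]; then
      echo "$namespace: Registered"
      return 0
    fi
    attempts=$((attempts + 1))
    sleep 10
  done
  echo "$namespace: still '$state' after $attempts checks" >&2
  return 1
}
```

For example: wait_for_provider Microsoft.Chaos && wait_for_provider Microsoft.ContainerService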

Enable Chaos Studio on Your AKS Cluster

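Enabling Chaos Studio on a cluster takes two steps: create a Chaos Studio target on the AKS resource, then enable each fault capability you plan to use (here, pod chaos). Both are ARM resources, so you can create them with az rest:

AKS_ID=$(az aks show -g <resource-group> -n <aks-name> --query id -o tsv)

az rest --method put \
  --url "https://management.azure.com${AKS_ID}/providers/Microsoft.Chaos/targets/Microsoft-AzureKubernetesServiceCluster?api-version=2024-01-01" \
  --body '{"properties":{}}'

az rest --method put \
  --url "https://management.azure.com${AKS_ID}/providers/Microsoft.Chaos/targets/Microsoft-AzureKubernetesServiceCluster/capabilities/PodChaos-2.2?api-version=2024-01-01" \
  --body '{"properties":{}}'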

You can also enable Chaos Studio from the Azure Portal under your AKS resource > Chaos Studio > Enable.


Create a Chaos Experiment

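A chaos experiment is an Azure resource, so you can deploy it with an ARM template or create it from the Azure CLI. Chaos Studio's AKS faults are Chaos Mesh faults, so disrupting pods is expressed as a pod chaos action rather than a VM-style reboot.

Example experiment (pod failure on AKS):

Describe the experiment in a JSON file named aks-chaos-experiment.json. It needs the region, a system-assigned identity, a selector that lists the cluster's Chaos Studio target, and a step whose action runs the pod chaos fault for five minutes (PT5M). Then create the experiment with a PUT request:

az rest --method put \
  --url "https://management.azure.com/subscriptions/<subscription-id>/resourceGroups/<resource-group>/providers/Microsoft.Chaos/experiments/aks-restart-test?api-version=2024-01-01" \
  --body @aks-chaos-experiment.json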
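
The contents of aks-chaos-experiment.json are not shown above, so here is a sketch of what the file could look like, written with a shell heredoc. It follows the Chaos Studio experiment schema (selectors, then steps, branches, and actions). The selector ID, the step and branch names, the pod-failure action, and the default namespace are illustrative choices, and the angle-bracket placeholders must be replaced with your own values before the file is used.

```shell
# Sketch: write an experiment definition that applies a Chaos Mesh
# pod-failure fault to every pod in the "default" namespace for 5 minutes.
# Placeholders in <angle brackets> and all names are illustrative.
cat > aks-chaos-experiment.json <<'EOF'
{
  "location": "<region>",
  "identity": { "type": "SystemAssigned" },
  "properties": {
    "selectors": [
      {
        "type": "List",
        "id": "Selector1",
        "targets": [
          {
            "type": "ChaosTarget",
            "id": "/subscriptions/<subscription-id>/resourceGroups/<resource-group>/providers/Microsoft.ContainerService/managedClusters/<aks-name>/providers/Microsoft.Chaos/targets/Microsoft-AzureKubernetesServiceCluster"
          }
        ]
      }
    ],
    "steps": [
      {
        "name": "restart-pods",
        "branches": [
          {
            "name": "branch1",
            "actions": [
              {
                "type": "continuous",
                "name": "urn:csi:microsoft:azureKubernetesServiceChaosMesh:podChaos/2.2",
                "duration": "PT5M",
                "selectorId": "Selector1",
                "parameters": [
                  {
                    "key": "jsonSpec",
                    "value": "{\"action\":\"pod-failure\",\"mode\":\"all\",\"selector\":{\"namespaces\":[\"default\"]}}"
                  }
                ]
              }
            ]
          }
        ]
      }
    ]
  }
}
EOF
```

The jsonSpec value is the spec section of a Chaos Mesh PodChaos resource, serialized as an escaped JSON string. Narrowing the namespace or adding label selectors there is the simplest way to keep a first experiment small.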

Run the Experiment

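az rest --method post \
  --url "https://management.azure.com/subscriptions/<subscription-id>/resourceGroups/<resource-group>/providers/Microsoft.Chaos/experiments/aks-restart-test/start?api-version=2024-01-01"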
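
The start request is carried out by the experiment's system-assigned identity, so that identity needs a role on the cluster before the first run; without it, the experiment fails when it tries to inject the fault. The sketch below looks up the identity and assigns the Azure Kubernetes Service Cluster Admin Role. The function name is hypothetical, and clusters that use Microsoft Entra authentication may need different roles, so check the Chaos Studio role-assignment documentation for your setup.

```shell
# Sketch: give the experiment's managed identity access to the cluster.
#   $1 = experiment resource ID
#   $2 = AKS cluster resource ID
# The role name is an assumption that fits clusters with local accounts.
grant_experiment_access() {
  principal_id=$(az resource show --ids "$1" \
    --query identity.principalId -o tsv)
  if [ -z "$principal_id" ]; then
    echo "no system-assigned identity found on $1" >&2
    return 1
  fi
  az role assignment create \
    --role "Azure Kubernetes Service Cluster Admin Role" \
    --assignee-object-id "$principal_id" \
    --assignee-principal-type ServicePrincipal \
    --scope "$2"
}
```

For example, pass the experiment ID (/subscriptions/<subscription-id>/resourceGroups/<resource-group>/providers/Microsoft.Chaos/experiments/aks-restart-test) as the first argument and the output of az aks show --query id -o tsv as the second. Role assignments can take a minute or two to propagate.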

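You can monitor the experiment's progress in the Azure portal, or list its executions and their status from the CLI:

az rest --method get \
  --url "https://management.azure.com/subscriptions/<subscription-id>/resourceGroups/<resource-group>/providers/Microsoft.Chaos/experiments/aks-restart-test/executions?api-version=2024-01-01"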

Chaos engineering is not about breaking things; it’s about learning how your system breaks and building confidence in its ability to recover. Start small, iterate, and build trust in your architecture.