Many pod scenarios now support the exclude_label parameter to protect critical pods while testing others. See individual scenario pages (Pod Failures, Pod Network Chaos) for details.

Supported Chaos Scenarios

Pod & Container Disruptions

Scenario	Plugin Name	Description	Supported Clouds
Pod Failures	pod_disruption_scenarios	Injects pod failures to test application resilience and recovery mechanisms	Cloud Agnostic
Container Failures	container_scenarios	Injects container failures based on the provided kill signal	Cloud Agnostic
KubeVirt VM Outage	kubevirt_vm_outage	Simulates VM-level disruptions by deleting Virtual Machine Instances to test resilience and recovery	Cloud Agnostic

Node & Cluster Failures

Scenario	Plugin Name	Description	Supported Clouds
Node Failures	node_scenarios	Injects node failures through OpenShift/Kubernetes and cloud APIs	Alibaba, AWS, Azure, BareMetal, Docker/Podman (kind), IBM Cloud, IBM Power, GCP, OpenStack, VMware
Power Outages	cluster_shut_down_scenarios	Shuts down the cluster for a specified duration and verifies cluster health upon restart	Alibaba, AWS, Azure, BareMetal, Docker/Podman (kind), IBM Cloud, IBM Power, GCP, OpenStack, VMware
Zone Outages	zone_outages_scenarios	Creates zone outages to observe impact on cluster availability and application resilience	AWS, GCP
Node CPU Hog	hog_scenarios	Hogs CPU resources on targeted nodes to test resource contention	Cloud Agnostic
Node Memory Hog	hog_scenarios	Hogs memory resources on targeted nodes to test memory pressure handling	Cloud Agnostic
Node IO Hog	hog_scenarios	Hogs IO resources on targeted nodes to test disk performance degradation	Cloud Agnostic

Network Disruptions

Scenario	Plugin Name	Description	Supported Clouds
Network Chaos	network_chaos_scenarios	Introduces network latency, packet loss, and bandwidth restriction using tc and Netem	Cloud Agnostic
Pod Network Chaos	pod_network_scenarios	Introduces network chaos at pod level including latency, packet loss, and bandwidth restriction	Cloud Agnostic
Network Chaos NG	network_chaos_ng_scenarios	Next-generation network filtering scenarios with improved infrastructure	Cloud Agnostic
DNS Outages	network_chaos_ng_scenarios	Blocks all outgoing DNS traffic from pods, preventing hostname resolution	Cloud Agnostic
ETCD Split Brain	network_chaos_ng_scenarios	Isolates etcd nodes to force leader re-election and test cluster resilience	Cloud Agnostic
Aurora Disruption	network_chaos_ng_scenarios	Blocks MySQL and PostgreSQL traffic to AWS Aurora database engines	AWS
EFS Disruption	network_chaos_ng_scenarios	Blocks connections to AWS EFS, causing temporary failure of mounted volumes	AWS

Application & Service Disruptions

Scenario	Plugin Name	Description	Supported Clouds
Application Outages	application_outages_scenarios	Isolates application Ingress/Egress traffic to test dependency handling and recovery timing	Cloud Agnostic
Service Disruption	service_disruption_scenarios	Deletes all objects within a namespace to test service recovery and data resilience	Cloud Agnostic
Service Hijacking	service_hijacking_scenarios	Hijacks service HTTP traffic to simulate custom responses and test client error handling	Cloud Agnostic
Syn Flood	syn_flood_scenarios	Generates substantial TCP traffic directed at Kubernetes services to test DDoS resilience	Cloud Agnostic
HTTP Load	http_load_scenarios	Generates distributed HTTP load against target endpoints using Vegeta load testing pods deployed inside the cluster	Cloud Agnostic

Storage & Data Disruptions

Scenario	Plugin Name	Description	Supported Clouds
PVC Disk Fill	pvc_scenarios	Fills up PersistentVolumeClaims to test disk space exhaustion handling	Cloud Agnostic

System & Time Disruptions

Scenario	Plugin Name	Description	Supported Clouds
Time Skew	time_scenarios	Skews system time and date to test time-sensitive applications and certificate handling	Cloud Agnostic

1 - Krkn-Hub All Scenarios Variables

These variables set the top-level configuration template that is shared by all scenarios in Krkn-hub.

Each section below corresponds to a section in the Krkn config reference. Set variables on the host running the container:

export <parameter_name>=<value>

Kraken

Signal and status publishing settings. See Kraken config for full details.

Parameter	Description	Default
`PUBLISH_KRAKEN_STATUS`	Publish kraken status to the signal address	True
`SIGNAL_ADDRESS`	Address to publish kraken status to	0.0.0.0
`PORT`	Port to publish kraken status to	8081
`SIGNAL_STATE`	When set to PAUSE, waits for the RUN signal before running the scenarios; refer to the docs for more details	RUN

Cerberus

Cluster health monitoring integration. See Cerberus config for full details.

Parameter	Description	Default
`CERBERUS_ENABLED`	Set this to true if cerberus is running and monitoring the cluster	False
`CERBERUS_URL`	URL to poll for the go/no-go signal	http://0.0.0.0:8080

Performance Monitoring

Prometheus metrics collection and alert evaluation. See Performance Monitoring config for full details.

Parameter	Description	Default
`DEPLOY_DASHBOARDS`	Deploys mutable grafana loaded with dashboards visualizing performance metrics pulled from in-cluster prometheus. The dashboard will be exposed as a route.	False
`CAPTURE_METRICS`	Captures metrics as specified in the profile from in-cluster prometheus. Default metrics captures are listed here	False
`ENABLE_ALERTS`	Evaluates expressions from in-cluster prometheus and exits 0 or 1 based on the severity set. Default profile.	False
`ALERTS_PATH`	Path to the alerts file to use when ENABLE_ALERTS is set	config/alerts
`CHECK_CRITICAL_ALERTS`	When enabled will check prometheus for critical alerts firing post chaos	False

Resiliency Score

Resiliency scoring configuration. See Resiliency Score config for full details.

Parameter	Description	Default
`RESILIENCY_RUN_MODE`	Resiliency scoring mode: `standalone` embeds score in telemetry, `detailed` prints JSON report to stdout, `disabled` turns off scoring	standalone
`RESILIENCY_FILE`	Path to a YAML file containing SLO definitions; defaults to the alerts profile or `config/alerts.yaml`	config/alerts.yaml

Elastic

Elasticsearch storage for telemetry and metrics. See Elastic config for full details.

Parameter	Description	Default
`ELASTIC_SERVER`	URL of the Elasticsearch instance to store telemetry data	blank
`ELASTIC_INDEX`	Elasticsearch index pattern to post results to	blank

Tunings

Execution timing and iteration controls. See Tunings config for full details.

Parameter	Description	Default
`WAIT_DURATION`	Duration in seconds to wait between each chaos scenario	60
`ITERATIONS`	Number of times to execute the scenarios	1
`DAEMON_MODE`	Iterations are set to infinity which means that the kraken will cause chaos forever	False

Telemetry

Run data collection and upload settings. See Telemetry config for full details.

Parameter	Description	Default
`TELEMETRY_ENABLED`	Enables/disables the telemetry collection feature	False
`TELEMETRY_API_URL`	Telemetry service endpoint	https://ulnmf9xv7j.execute-api.us-west-2.amazonaws.com/production
`TELEMETRY_USERNAME`	Telemetry service username	redhat-chaos
`TELEMETRY_PASSWORD`	Telemetry service password	No default
`TELEMETRY_PROMETHEUS_BACKUP`	Enables/disables prometheus data collection	True
`TELEMETRY_FULL_PROMETHEUS_BACKUP`	If set to False only the /prometheus/wal folder will be downloaded	False
`TELEMETRY_BACKUP_THREADS`	Number of telemetry download/upload threads	5
`TELEMETRY_ARCHIVE_PATH`	Local path where the archive files will be temporarily stored	/tmp
`TELEMETRY_MAX_RETRIES`	Maximum number of upload retries (if 0 will retry forever)	0
`TELEMETRY_RUN_TAG`	If set, this will be appended to the run folder in the bucket (useful to group the runs)	chaos
`TELEMETRY_GROUP`	If set will archive the telemetry in the S3 bucket on a folder named after the value	default
`TELEMETRY_ARCHIVE_SIZE`	The size of the prometheus data archive in KB	1000
`TELEMETRY_LOGS_BACKUP`	Logs backup to S3	False
`TELEMETRY_FILTER_PATTERN`	Filter logs based on certain timestamp patterns	`["(\\w{3}\\s\\d{1,2}\\s\\d{2}:\\d{2}:\\d{2}\\.\\d+).+", ...]`
`TELEMETRY_CLI_PATH`	OC CLI path, if not specified will be searched in $PATH	blank
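
To illustrate what the first default filter pattern in the table above matches, here is a small Python sketch. It assumes the doubled backslashes in the table are string escaping, so the pattern is written as an ordinary raw string. The function name and the sample log lines are made up for illustration and are not part of Krkn.

```python
import re

# First default filter pattern from the Telemetry table, written as a raw string
# (assumption: the doubled backslashes in the table are only string escaping).
TIMESTAMP_PATTERN = re.compile(r"(\w{3}\s\d{1,2}\s\d{2}:\d{2}:\d{2}\.\d+).+")


def extract_timestamp(line):
    """Return the first 'Mon DD HH:MM:SS.fff' style timestamp in the line, or None."""
    match = TIMESTAMP_PATTERN.search(line)
    return match.group(1) if match else None


if __name__ == "__main__":
    # Hypothetical log lines, invented for this example.
    print(extract_timestamp("Oct 12 10:15:32.123456 node-1 kubelet: started"))
    print(extract_timestamp("no timestamp in this line"))
```

A line only matches if the timestamp carries fractional seconds and is followed by at least one more character, which is what the trailing `\.\d+` and `.+` in the pattern require.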

Note

For setting the TELEMETRY_ARCHIVE_SIZE, the lower the value the higher the number of archive files produced and uploaded (processed by TELEMETRY_BACKUP_THREADS simultaneously). For unstable or slow connections, keep this value low and increase TELEMETRY_BACKUP_THREADS so that on upload failure only the failed chunk is retried.
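
The trade-off in this note can be put into rough numbers. The sketch below is only an estimate under stated assumptions: it assumes the data is split into archives of exactly TELEMETRY_ARCHIVE_SIZE KB and that uploads proceed in rounds of TELEMETRY_BACKUP_THREADS files. The function name and the total size used in the example are hypothetical.

```python
import math


def archive_plan(total_kb, archive_size_kb=1000, threads=5):
    """Estimate archive count and upload rounds for a given data size in KB."""
    if total_kb < 0 or archive_size_kb <= 0 or threads <= 0:
        raise ValueError("sizes and thread count must be positive")
    archives = math.ceil(total_kb / archive_size_kb)
    return {
        "archives": archives,
        # Rough number of upload rounds if `threads` files are sent at once.
        "upload_rounds": math.ceil(archives / threads),
        # Upper bound on the data re-sent when a single archive upload fails.
        "retry_cost_kb": min(archive_size_kb, total_kb),
    }


if __name__ == "__main__":
    # Hypothetical 10,000 KB of data with the defaults, then with smaller chunks.
    print(archive_plan(10_000))
    print(archive_plan(10_000, archive_size_kb=500, threads=10))
```

Halving the archive size doubles the number of files but also halves the amount of data that has to be re-sent after one failed upload, which is why the note pairs a low archive size with more threads.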

Health Checks

Application endpoint monitoring during chaos. See Health Checks config for full details.

Parameter	Description	Default
`HEALTH_CHECK_URL`	URL to continually check and detect downtimes	blank
`HEALTH_CHECK_INTERVAL`	Interval in seconds at which to run health checks	2
`HEALTH_CHECK_BEARER_TOKEN`	Bearer token used for authenticating into health check URL	blank
`HEALTH_CHECK_AUTH`	Tuple of (username, password) used for authenticating into health check URL	blank
`HEALTH_CHECK_EXIT_ON_FAILURE`	If True, exits when health check fails for application	blank
`HEALTH_CHECK_VERIFY`	Health check URL SSL validation	False

Virt Checks

KubeVirt VMI SSH connection monitoring during chaos. See Virt Checks config for full details.

Parameter	Description	Default
`KUBE_VIRT_CHECK_INTERVAL`	Interval in seconds at which to test kubevirt connections	2
`KUBE_VIRT_NAMESPACE`	Namespace to find VMIs in and watch	blank
`KUBE_VIRT_NAME`	Regex style name to match VMIs to watch	blank
`KUBE_VIRT_FAILURES`	If True, will only report when ssh connections fail to VMI	blank
`KUBE_VIRT_DISCONNECTED`	Use disconnected check, bypassing the cluster API	False
`KUBE_VIRT_NODE_NAME`	If set, will filter VMs to only track ones running on the specified node	blank
`KUBE_VIRT_EXIT_ON_FAIL`	Fails run if VMs still have false status at end of run	False
`KUBE_VIRT_SSH_NODE`	If set, will be a backup way to SSH to a node. Should be a node not targeted in chaos	blank

2 - Krknctl All Scenarios Variables

These variables set the top-level configuration template that is shared by all scenarios in Krknctl.

Each section below corresponds to a section in the Krkn config reference. Pass flags when running a scenario:

krknctl run <scenario> --<parameter> <value>

Kraken

General run settings. See Kraken config for full details.

Parameter	Description	Type	Possible Values	Default
`--krkn-kubeconfig`	Sets the path where krkn will search for kubeconfig in container	string	-	/home/krkn/.kube/config
`--uuid`	Sets krkn run uuid instead of generating it	string	-	-
`--krkn-debug`	Enables debug mode for Krkn	enum	True/False	False

Cerberus

Cluster health monitoring integration. See Cerberus config for full details.

Parameter	Description	Type	Possible Values	Default
`--cerberus-enabled`	Enables Cerberus Support	enum	True/False	False
`--cerberus-url`	Cerberus http url	string	-	http://0.0.0.0:8080

Performance Monitoring

Prometheus metrics collection and alert evaluation. See Performance Monitoring config for full details.

Parameter	Description	Type	Possible Values	Default
`--capture-metrics`	Enables metrics capture	enum	True/False	False
`--enable-alerts`	Enables cluster alerts check	enum	True/False	False
`--alerts-path`	Allows to specify a different alert file path	string	-	config/alerts.yaml
`--metrics-path`	Allows to specify a different metrics file path	string	-	config/metrics-aggregated.yaml
`--check-critical-alerts`	Enables checking for critical alerts	enum	True/False	False

Resiliency Score

Resiliency scoring configuration. See Resiliency Score config for full details.

Parameter	Description	Type	Possible Values	Default
`--resiliency-score`	Enables resiliency scoring in detailed mode, outputting a full JSON resiliency report to stdout after each scenario	enum	True/False	False
`--disable-resiliency-score`	Disables resiliency score calculation entirely	enum	True/False	False
`--resiliency-file`	Path to a YAML file containing SLO definitions for resiliency scoring; defaults to the alerts profile or `config/alerts.yaml`	string	-	config/alerts.yaml

Elastic

Elasticsearch storage for telemetry and metrics. See Elastic config for full details.

Parameter	Description	Type	Possible Values	Default
`--enable-es`	Enables elastic search data collection	enum	True/False	False
`--es-server`	Elasticsearch instance URL	string	-	http://0.0.0.0
`--es-port`	Elasticsearch instance port	number	-	443
`--es-username`	Elasticsearch instance username	string	-	elastic
`--es-password`	Elasticsearch instance password	string	-	-
`--es-verify-certs`	Enables elasticsearch TLS certificate verification	enum	True/False	False
`--es-metrics-index`	Index name for metrics in Elasticsearch	string	-	krkn-metrics
`--es-alerts-index`	Index name for alerts in Elasticsearch	string	-	krkn-alerts
`--es-telemetry-index`	Index name for telemetry in Elasticsearch	string	-	krkn-telemetry

Tunings

Execution timing and iteration controls. See Tunings config for full details.

Parameter	Description	Type	Possible Values	Default
`--wait-duration`	Waits for a certain amount of time after the scenario	number	-	1
`--iterations`	Number of times the same chaos scenario will be executed	number	-	1
`--daemon-mode`	If set the scenario will execute forever	enum	True/False	False

Telemetry

Run data collection and upload settings. See Telemetry config for full details.

Parameter	Description	Type	Possible Values	Default
`--telemetry-enabled`	Enables telemetry support	enum	True/False	False
`--telemetry-api-url`	API endpoint for telemetry data	string	-	https://ulnmf9xv7j.execute-api.us-west-2.amazonaws.com/production
`--telemetry-username`	Username for telemetry authentication	string	-	redhat-chaos
`--telemetry-password`	Password for telemetry authentication	string	-	-
`--telemetry-prometheus-backup`	Enables Prometheus backup for telemetry	enum	True/False	True
`--telemetry-full-prometheus-backup`	Enables full Prometheus backup for telemetry	enum	True/False	False
`--telemetry-backup-threads`	Number of threads for telemetry backup	number	-	5
`--telemetry-archive-path`	Path to save telemetry archive	string	-	/tmp
`--telemetry-max-retries`	Maximum retries for telemetry operations	number	-	0
`--telemetry-run-tag`	Tag for telemetry run	string	-	chaos
`--telemetry-group`	Group name for telemetry data	string	-	default
`--telemetry-archive-size`	Maximum size for telemetry archives in KB	number	-	1000
`--telemetry-logs-backup`	Enables logs backup for telemetry	enum	True/False	False
`--telemetry-filter-pattern`	Filter pattern for telemetry logs	string	-	`["\\w{3}\\s\\d{1,2}\\s\\d{2}:\\d{2}:\\d{2}\\.\\d+", ...]`
`--telemetry-cli-path`	Path to telemetry CLI tool (oc)	string	-	-
`--telemetry-events-backup`	Enables events backup for telemetry	enum	True/False	True

Note

For --telemetry-archive-size, the lower the value the higher the number of archive files produced and uploaded (processed by --telemetry-backup-threads simultaneously). For unstable or slow connections, keep this value low and increase --telemetry-backup-threads so that on upload failure only the failed chunk is retried.

Health Checks

Application endpoint monitoring during chaos. See Health Checks config for full details.

Parameter	Description	Type	Possible Values	Default
`--health-check-url`	URL to check the health of	string	-	-
`--health-check-interval`	How often to check the health check urls (seconds)	number	-	2
`--health-check-auth`	Authentication tuple to authenticate into health check URL	string	-	-
`--health-check-bearer-token`	Bearer token to authenticate into health check URL	string	-	-
`--health-check-exit`	Exit on failure when health check URL is not able to connect	string	-	-
`--health-check-verify`	SSL verification for health check URL	string	-	false

Virt Checks

KubeVirt VMI SSH connection monitoring during chaos. See Virt Checks config for full details.

Parameter	Description	Type	Possible Values	Default
`--kubevirt-check-interval`	How often to check the KubeVirt VMs SSH status (seconds)	number	-	2
`--kubevirt-namespace`	KubeVirt namespace to check the health of	string	-	-
`--kubevirt-name`	KubeVirt regex names to watch	string	-	-
`--kubevirt-only-failures`	KubeVirt checks only report if failure occurs	enum	True/False	false
`--kubevirt-disconnected`	KubeVirt checks in disconnected mode, bypassing the cluster’s API	enum	True/False	false
`--kubevirt-ssh-node`	KubeVirt backup node to SSH into when checking VMI IP address status	string	-	false
`--kubevirt-exit-on-failure`	KubeVirt fails run if VMs still have false status	enum	True/False	false
`--kubevirt-node-node`	Only track VMs in KubeVirt on given node name	string	-	false

3 - Supported Cloud Providers

AWS

NOTE: For clusters on AWS, make sure the AWS CLI is installed and properly configured with an AWS account. This should create a configuration file at $HOME/.aws/config for your AWS account. If you have multiple profiles configured on AWS, you can change the profile by setting export AWS_DEFAULT_PROFILE=<profile-name>

export AWS_DEFAULT_REGION=<aws-region>

This configuration works for self-managed AWS, ROSA, and ROSA-HCP.

GCP

NOTE: For clusters with GCP make sure GCP CLI is installed.

A google service account is required to give proper authentication to GCP for node actions. See here for how to create a service account.

NOTE: A user with ‘resourcemanager.projects.setIamPolicy’ permission is required to grant project-level permissions to the service account.

After creating the service account, enable it by exporting the credentials path or running gcloud init:

export GOOGLE_APPLICATION_CREDENTIALS="<serviceaccount.json>"

In krkn-hub, you’ll need to both set the environment variable and also copy the file to the local container:

-e GOOGLE_APPLICATION_CREDENTIALS=<container_creds_file>

The container path needs to match the path mounted via the -v flag below:

-v <local_gcp_creds_file>:<container_creds_file>:Z

Example:

podman run -e GOOGLE_APPLICATION_CREDENTIALS=/home/krkn/GCP_app.json -e DURATION=10 --net=host  -v <kubeconfig>:/home/krkn/.kube/config:Z -v <local_gcp_creds_file>:/home/krkn/GCP_app.json:Z -d containers.krkn-chaos.dev/krkn-chaos/krkn-hub:...

OpenStack

NOTE: For clusters on OpenStack, create and source the OpenStack RC file to set the OpenStack environment variables on the server where Kraken runs.

Azure

NOTE: You will need to create a service principal and give it the correct access; see here for creating the service principal and setting the proper permissions.

To run properly, the service principal requires the “Azure Active Directory Graph/Application.ReadWrite.OwnedBy” API permission and the “User Access Administrator” role.

Before running you will need to set the following:

export AZURE_SUBSCRIPTION_ID=<subscription_id>
export AZURE_TENANT_ID=<tenant_id>
export AZURE_CLIENT_SECRET=<client secret>
export AZURE_CLIENT_ID=<client id>

Note

This configuration works only for self-managed Azure, not ARO. The ARO service places a deny assignment over cluster-managed resources that allows only the ARO service itself to modify the VM resources. This capability is unique to Azure and the structure of the service, and it exists to prevent customers from hurting themselves. Refer to the links below for more documentation around this.

Alibaba

See the Installation guide to install alicloud cli.

export ALIBABA_ID=<access_key_id>
export ALIBABA_SECRET=<access key secret>
export ALIBABA_REGION_ID=<region id>

Refer to region and zone page to get the region id for the region you are running on.

Set cloud_type to either alibaba or alicloud in your node scenario yaml file.

VMware

Set the following environment variables:

export VSPHERE_IP=<vSphere_client_IP_address>
export VSPHERE_USERNAME=<vSphere_client_username>
export VSPHERE_PASSWORD=<vSphere_client_password>

These are the credentials that you would normally use to access the vSphere client.

IBMCloud

If no API key is set up with the proper VPC resource permissions, create the following:

Access group
Service ID with the following access:
- With policy VPC Infrastructure Services
- Resources = All
- Roles:
  - Editor
  - Administrator
  - Operator
  - Viewer
API Key

Set the following environment variables:

export IBMC_URL=https://<region>.iaas.cloud.ibm.com/v1
export IBMC_APIKEY=<ibmcloud_api_key>

IBMCloud Power

If no API key is set up with the proper VPC resource permissions, create the following:

Access group
Service ID with the following access:
- With policy Power Virtual Server Workspace
- Resources = All
- Roles:
  - Editor
  - Administrator
  - Operator
  - Viewer
  - Manager
  - Service Configuration Reader
  - Key Manager
API Key

Set the following environment variables:

export IBMC_POWER_URL="https://<region>.power-iaas.cloud.ibm.com"
export IBMC_APIKEY=<ibmcloud_api_key>
export IBMC_POWER_CRN=<workspace_crn>
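
As a convenience, the environment variables listed in this section can be checked before a run. The following Python sketch collects them in one table and reports which ones are missing. The provider keys and the function name are hypothetical labels chosen for this example (they are not Krkn's cloud_type values), and the check is not part of Krkn itself.

```python
import os

# Variables taken from the provider sections above. The dictionary keys are
# illustrative labels only, not values that Krkn itself recognizes.
REQUIRED_ENV = {
    "aws": ["AWS_DEFAULT_REGION"],
    # GCP can alternatively be set up with `gcloud init` instead of this variable.
    "gcp": ["GOOGLE_APPLICATION_CREDENTIALS"],
    "azure": [
        "AZURE_SUBSCRIPTION_ID",
        "AZURE_TENANT_ID",
        "AZURE_CLIENT_SECRET",
        "AZURE_CLIENT_ID",
    ],
    "alibaba": ["ALIBABA_ID", "ALIBABA_SECRET", "ALIBABA_REGION_ID"],
    "vmware": ["VSPHERE_IP", "VSPHERE_USERNAME", "VSPHERE_PASSWORD"],
    "ibmcloud": ["IBMC_URL", "IBMC_APIKEY"],
    "ibmcloud_power": ["IBMC_POWER_URL", "IBMC_APIKEY", "IBMC_POWER_CRN"],
}


def missing_env(provider, env=None):
    """Return the variables for `provider` that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_ENV[provider] if not env.get(name)]


if __name__ == "__main__":
    for provider in REQUIRED_ENV:
        missing = missing_env(provider)
        print(provider, "ok" if not missing else "missing: " + ", ".join(missing))
```

Passing an explicit `env` dictionary instead of reading `os.environ` makes it easy to validate the variables you plan to hand to a container with `-e`.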

4 - Application Outage Scenarios

Application outages

This scenario blocks the Ingress/Egress traffic of an application matching the given labels for a specified duration, to show how the service and the other services that depend on it behave during downtime. This helps with planning accordingly, for example by improving timeouts or tweaking alerts.

You can add your application's URL to the health checks section of the config to track the downtime of your application during this scenario.

Rollback Scenario Support

Krkn supports rollback for Application outages. For more details, please refer to the Rollback Scenarios documentation.

Debugging steps in case of failures

Kraken creates a network policy that blocks ingress/egress traffic to create the outage. If a failure occurs before the network policy is reverted, you can delete it manually with the following command to stop the outage:

$ oc delete networkpolicy/kraken-deny -n <targeted-namespace>
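
To make it clearer what is being deleted here, the sketch below builds a deny-style Kubernetes NetworkPolicy as a Python dictionary. It relies on standard NetworkPolicy semantics: a policy that lists a policy type but defines no rules for it blocks that traffic for the selected pods. The name kraken-deny comes from the command above; the function name and the exact field layout are assumptions, and the manifest Krkn actually generates may differ.

```python
import json

ALLOWED_TYPES = ("Ingress", "Egress")


def deny_policy(namespace, pod_selector, block, name="kraken-deny"):
    """Build a NetworkPolicy dict that denies the listed traffic types."""
    if not block or any(t not in ALLOWED_TYPES for t in block):
        raise ValueError("block must be a non-empty list of Ingress and/or Egress")
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # Empty matchLabels selects every pod in the namespace.
            "podSelector": {"matchLabels": dict(pod_selector)},
            # Listing a type without any allow rules denies that traffic.
            "policyTypes": list(block),
        },
    }


if __name__ == "__main__":
    # Hypothetical namespace and labels, for illustration only.
    policy = deny_policy("demo", {"app": "foo"}, ["Ingress", "Egress"])
    print(json.dumps(policy, indent=2))
```

Because an empty selector matches all pods, leaving the pod selector empty turns the outage into a namespace-wide one.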

How to Run Application Outage Scenarios

Choose your preferred method to run application outage scenarios:

Sample scenario config

Example scenario file: app_outage.yaml

application_outage:                                  # Scenario to create an outage of an application by blocking traffic
  duration: 600                                      # Duration in seconds after which the routes will be accessible. Default if omitted: 60
  namespace: <namespace-with-application>            # Namespace to target - all application routes will go inaccessible if pod selector is empty
  pod_selector: {app: foo}                           # Pods to target
  exclude_label: ""                                  # Optional label selector to exclude pods. Supports dict, string, or list format
  block: [Ingress, Egress]                           # One of [Ingress], [Egress], or [Ingress, Egress]

How to Use Plugin Name

Add the plugin name to the chaos_scenarios list in the config/config.yaml file:

kraken:
    kubeconfig_path: ~/.kube/config                     # Path to kubeconfig
    ..
    chaos_scenarios:
        - application_outages_scenarios:
            - scenarios/<scenario_name>.yaml

Note

You can specify multiple scenario files of the same type by adding additional paths to the list:

kraken:
    chaos_scenarios:
        - application_outages_scenarios:
            - scenarios/app-outage-1.yaml
            - scenarios/app-outage-2.yaml
            - scenarios/app-outage-3.yaml

You can also combine multiple different scenario types in the same config.yaml file. Scenario types can be specified in any order, and you can include the same scenario type multiple times:

kraken:
    chaos_scenarios:
        - application_outages_scenarios:
            - scenarios/app-outage.yaml
        - pod_disruption_scenarios:
            - scenarios/pod-kill.yaml
        - container_scenarios:
            - scenarios/container-kill.yaml
        - application_outages_scenarios:  # Same type can appear multiple times
            - scenarios/app-outage-2.yaml
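
The reason a scenario type can be repeated is that chaos_scenarios is a list of single-key mappings rather than one mapping, and a list can hold the same key in several entries. The Python sketch below mirrors the structure of the example above as plain data and flattens it into (type, file) pairs. The function name is hypothetical, and the sketch only illustrates the data shape; it does not claim to reproduce how Krkn schedules the scenarios.

```python
# Same structure as the YAML example above, written as Python data.
chaos_scenarios = [
    {"application_outages_scenarios": ["scenarios/app-outage.yaml"]},
    {"pod_disruption_scenarios": ["scenarios/pod-kill.yaml"]},
    {"container_scenarios": ["scenarios/container-kill.yaml"]},
    {"application_outages_scenarios": ["scenarios/app-outage-2.yaml"]},
]


def flatten(entries):
    """Return (scenario_type, scenario_file) pairs in the order they are listed."""
    pairs = []
    for entry in entries:
        for scenario_type, files in entry.items():
            for path in files:
                pairs.append((scenario_type, path))
    return pairs


if __name__ == "__main__":
    for scenario_type, path in flatten(chaos_scenarios):
        print(f"{scenario_type}: {path}")
```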

Run

python run_kraken.py --config config/config.yaml

This scenario disrupts traffic to the specified application to help you understand the impact of the outage on dependent services and the user experience. Refer to the docs for more details.

Run

If you enable Cerberus to monitor the cluster and pass/fail the scenario post chaos, refer to the docs. Make sure to start it before injecting the chaos, and set the CERBERUS_ENABLED environment variable so that the chaos injection container connects to it automatically.

$ podman run --name=<container_name> \
  --net=host \
  --env-host=true \
  --pull=always \
  -v <path-to-kube-config>:/home/krkn/.kube/config:Z \
  -d containers.krkn-chaos.dev/krkn-chaos/krkn-hub:application-outages

$ podman logs -f <container_name or container_id>

$ podman inspect <container-name or container-id> \
  --format "{{.State.ExitCode}}"

Note

--env-host: This option is not available with the remote Podman client, including Mac and Windows (excluding WSL2) machines. Without the --env-host option, you have to set each environment variable on the podman command line, like -e <VARIABLE>=<value>

$ docker run $(./get_docker_params.sh) \
  --name=<container_name> \
  --net=host \
  --pull=always \
  -v <path-to-kube-config>:/home/krkn/.kube/config:Z \
  -d containers.krkn-chaos.dev/krkn-chaos/krkn-hub:application-outages

# OR

$ docker run -e <VARIABLE>=<value> \
  --net=host \
  --pull=always \
  -v <path-to-kube-config>:/home/krkn/.kube/config:Z \
  -d containers.krkn-chaos.dev/krkn-chaos/krkn-hub:application-outages

$ docker logs -f <container_name or container_id>

$ docker inspect <container-name or container-id> \
  --format "{{.State.ExitCode}}"

Tip

Because the container runs with a non-root user, ensure the kube config is globally readable before mounting it in the container. You can achieve this with the following commands:

kubectl config view --flatten > ~/kubeconfig && \
chmod 444 ~/kubeconfig && \
docker run $(./get_docker_params.sh) \
  --name=<container_name> \
  --net=host \
  --pull=always \
  -v ~/kubeconfig:/home/krkn/.kube/config:Z \
  -d containers.krkn-chaos.dev/krkn-chaos/krkn-hub:<scenario>

Supported parameters

The following environment variables can be set on the host running the container to tweak the scenario/faults being injected:

Example if --env-host is used:

export <parameter_name>=<value>

Or on the command line, for example:

-e <VARIABLE>=<value>
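
When --env-host is not available, each variable has to become its own -e flag. The Python sketch below shows one way to generate those flags from a dictionary and assemble a podman command similar to the one in the Run section. The helper names and the sample values are hypothetical; only the flags, mount path, and image name already shown on this page are used.

```python
import shlex


def env_flags(variables):
    """Turn {"NAME": value} into ["-e", "NAME=value", ...]."""
    flags = []
    for name, value in variables.items():
        flags += ["-e", f"{name}={value}"]
    return flags


def podman_command(variables, kubeconfig, tag="application-outages"):
    """Assemble a podman run command as an argument list."""
    return [
        "podman", "run", "--net=host", "--pull=always",
        *env_flags(variables),
        "-v", f"{kubeconfig}:/home/krkn/.kube/config:Z",
        "-d", f"containers.krkn-chaos.dev/krkn-chaos/krkn-hub:{tag}",
    ]


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    cmd = podman_command(
        {"NAMESPACE": "demo", "POD_SELECTOR": "{app: foo}", "DURATION": 120},
        "/tmp/kubeconfig",
    )
    # shlex.join quotes values containing spaces or braces for safe pasting.
    print(shlex.join(cmd))
```

Building the command as a list and quoting it only for display avoids shell-quoting mistakes with values such as "{app: foo}" that contain spaces.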

See the list of variables that apply to all scenarios here; they can be set in addition to these scenario-specific variables.

Parameter	Description	Type	Default
DURATION	Duration in seconds after which the routes will be accessible	number	600
NAMESPACE	Namespace to target - all application routes will go inaccessible if pod selector is empty ( Required )	string	No default
POD_SELECTOR	Pods to target. For example “{app: foo}”	string	No default
EXCLUDE_LABEL	Pods to exclude after getting list of pods from POD_SELECTOR to target. For example “{app: foo}”	string	No default
BLOCK_TRAFFIC_TYPE	It can be Ingress or Egress or Ingress, Egress ( needs to be a list )	string	[Ingress]

Note

The NAMESPACE parameter is required to run this scenario, while the pod selector is optional. When using the pod selector to target a particular application, define it in the following format, with a space between key and value: “{key: value}”.

Note

When using a custom metrics profile or alerts profile with CAPTURE_METRICS or ENABLE_ALERTS enabled, mount the profiles from the host on which the container runs (using podman/docker) at /home/krkn/kraken/config/metrics-aggregated.yaml and /home/krkn/kraken/config/alerts.

For example:

$ podman run --name=<container_name> \
  --net=host \
  --env-host=true \
  --pull=always \
  -v <path-to-custom-metrics-profile>:/home/krkn/kraken/config/metrics-aggregated.yaml \
  -v <path-to-custom-alerts-profile>:/home/krkn/kraken/config/alerts \
  -v <path-to-kube-config>:/home/krkn/.kube/config:Z \
  -d containers.krkn-chaos.dev/krkn-chaos/krkn-hub:application-outages

krknctl run application-outages [--<parameter> <value>]

You can also set any global variable listed here.

Scenario specific parameters:

Parameter	Description	Type	Required	Default
`--namespace`	Namespace to target - all application routes will go inaccessible if pod selector is empty	string	True
`--chaos-duration`	Set chaos duration (in sec) as desired	number	False	600
`--pod-selector`	Pods to target. For example “{app: foo}”	string	True
`--exclude-selector`	Pods to exclude after using pod-selector to target. For example “{app: foo}”	string	False
`--block-traffic-type`	It can be [Ingress] or [Egress] or [Ingress, Egress]	string	False	“[Ingress, Egress]”

Behavior Notes

Empty --pod-selector: When left empty, krkn creates a NetworkPolicy that targets all pods in the namespace, causing a namespace-wide outage.
Automatic cleanup: After --chaos-duration expires, krkn automatically deletes the NetworkPolicy it created and traffic resumes. A rollback handler is also registered to ensure cleanup if the scenario fails unexpectedly.

To see all available scenario options

krknctl run application-outages --help

5 - Aurora Disruption Scenario

This scenario blocks a pod’s outgoing MySQL and PostgreSQL traffic, effectively preventing it from connecting to any AWS Aurora SQL engine. It works just as well for standard MySQL and PostgreSQL connections too.

This uses the pod network filter scenario, preset with specific parameters to disrupt Aurora.

How to Run Aurora Disruption Scenarios

Choose your preferred method to run aurora disruption scenarios:

Example scenario file: aurora_disruption.yml

Scenario config

- id: pod_network_filter
  wait_duration: 0
  test_duration: 60
  label_selector: ''
  service_account: ''
  namespace: 'default'
  instance_count: 1
  execution: parallel
  ingress: false
  egress: true
  target: node
  interfaces: []
  ports: [3306,5432]
  taints: []
  protocols:
    - tcp
  image: quay.io/krkn-chaos/krkn-network-chaos:latest

How to Use Plugin Name

Add the plugin name to the chaos_scenarios list in the config/config.yaml file:

kraken:
    kubeconfig_path: ~/.kube/config                     # Path to kubeconfig
    ..
    chaos_scenarios:
        - network_chaos_ng_scenarios:
            - scenarios/<scenario_name>.yaml

Note

You can specify multiple scenario files of the same type by adding additional paths to the list:

kraken:
    chaos_scenarios:
        - network_chaos_ng_scenarios:
            - scenarios/aurora-disruption-1.yaml
            - scenarios/aurora-disruption-2.yaml
            - scenarios/aurora-disruption-3.yaml

You can also combine multiple different scenario types in the same config.yaml file. Scenario types can be specified in any order, and you can include the same scenario type multiple times:

kraken:
    chaos_scenarios:
        - network_chaos_ng_scenarios:
            - scenarios/aurora-disruption.yaml
        - pod_disruption_scenarios:
            - scenarios/pod-kill.yaml
        - container_scenarios:
            - scenarios/container-kill.yaml
        - network_chaos_ng_scenarios:  # Same type can appear multiple times
            - scenarios/aurora-disruption-2.yaml

Run

python run_kraken.py --config config/config.yaml

This scenario blocks the target pod's egress traffic on the MySQL (3306) and PostgreSQL (5432) ports to show how the application behaves when it cannot reach its database. More information is documented here

Run

podman run -v ~/.kube/config:/home/krkn/.kube/config:z -e TEST_DURATION="60" \
    -e INGRESS="false" -e EGRESS="true" -e PROTOCOLS="tcp" -e PORTS="3306,5432" \
    -e POD_NAME="target-pod" quay.io/krkn-chaos/krkn-hub:pod-network-filter

To run aurora-disruption using krknctl, adjust --pod-name to match the name of the pod on your cluster:

krknctl run pod-network-filter \
 --chaos-duration 60 \
 --pod-name target-pod \
 --ingress false \
 --egress true \
 --protocols tcp \
 --ports 3306,5432
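
If you launch krknctl from a script, the same invocation can be generated from a dictionary of parameters. The Python sketch below is a minimal example of that. The helper name is hypothetical, and the flags are only those shown in the command above; check krknctl run pod-network-filter --help for the authoritative list.

```python
import shlex


def krknctl_args(scenario, params):
    """Build the krknctl argument list for a scenario and its flags."""
    args = ["krknctl", "run", scenario]
    for flag, value in params.items():
        if isinstance(value, bool):
            value = str(value).lower()  # True -> "true"
        elif isinstance(value, (list, tuple)):
            value = ",".join(str(v) for v in value)  # [3306, 5432] -> "3306,5432"
        args += [f"--{flag}", str(value)]
    return args


if __name__ == "__main__":
    # Same values as the command above; "target-pod" is a placeholder pod name.
    aurora = {
        "chaos-duration": 60,
        "pod-name": "target-pod",
        "ingress": False,
        "egress": True,
        "protocols": "tcp",
        "ports": [3306, 5432],
    }
    print(shlex.join(krknctl_args("pod-network-filter", aurora)))
```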

6 - Container Scenarios

Kraken uses the `oc exec` command to `kill` specific containers in a pod. Targets can be selected by the pod's namespace or labels. If you know the exact object you want to kill, you can also specify the container name or pod name in the scenario YAML file. These scenarios use a simple YAML format that you can adapt to run your own tests, or you can use the pre-existing scenarios to see how they work.

Recovery Time Metrics in Krkn Telemetry

Krkn tracks three key recovery time metrics for each affected container:

pod_rescheduling_time - The time (in seconds) that the Kubernetes cluster took to reschedule the pod after it was killed. This measures the cluster’s scheduling efficiency and includes the time from pod deletion until the replacement pod is scheduled on a node. When only a container is killed, the pod may not be rescheduled at all, in which case this value can be 0.0 seconds.
pod_readiness_time - The time (in seconds) the pod took to become ready after being scheduled. This measures application startup time, including container image pulls, initialization, and readiness probe success.
total_recovery_time - The total amount of time (in seconds) from pod deletion until the replacement pod became fully ready and available to serve traffic. This is the sum of rescheduling time and readiness time.

These metrics appear in the telemetry output under PodsStatus.recovered for successfully recovered pods. Pods that fail to recover within the timeout period appear under PodsStatus.unrecovered without timing data.

Example telemetry output:

{
  "recovered": [
    {
      "pod_name": "backend-7d8f9c-xyz",
      "namespace": "production",
      "pod_rescheduling_time": 43.62235879898071,
      "pod_readiness_time": 0.0,
      "total_recovery_time": 43.62235879898071
    }
  ],
  "unrecovered": []
}
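The relationship between the three metrics can be checked mechanically. The following is a minimal sketch, assuming the telemetry has the shape shown in the example above; the `summarize` helper and its tolerance are illustrative and not part of krkn.

```python
import json

# Sample copied from the telemetry example above.
TELEMETRY = """
{
  "recovered": [
    {
      "pod_name": "backend-7d8f9c-xyz",
      "namespace": "production",
      "pod_rescheduling_time": 43.62235879898071,
      "pod_readiness_time": 0.0,
      "total_recovery_time": 43.62235879898071
    }
  ],
  "unrecovered": []
}
"""


def summarize(pods_status, tolerance=1e-6):
    """Return (slowest total recovery, pods whose times do not add up, unrecovered count)."""
    recovered = pods_status.get("recovered", [])
    inconsistent = [
        pod["pod_name"]
        for pod in recovered
        # total_recovery_time should equal rescheduling time + readiness time.
        if abs(
            pod["pod_rescheduling_time"]
            + pod["pod_readiness_time"]
            - pod["total_recovery_time"]
        ) > tolerance
    ]
    slowest = max((pod["total_recovery_time"] for pod in recovered), default=0.0)
    return slowest, inconsistent, len(pods_status.get("unrecovered", []))


slowest, inconsistent, unrecovered = summarize(json.loads(TELEMETRY))
print(f"slowest recovery: {slowest:.2f}s, inconsistent: {inconsistent}, unrecovered: {unrecovered}")
```

A check like this could gate a CI job on the slowest recovery time or on any pod listed under `unrecovered`.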

How to Run Container Scenarios

Choose your preferred method to run container scenarios:

Example scenario files from scenarios-hub:

Example Config

The following are the components of Kubernetes for which a basic chaos scenario config exists today.

scenarios:
- name: "<name of scenario>"
  namespace: "<specific namespace>" # can specify "*" if you want to find in all namespaces
  label_selector: "<label of pod(s)>"
  container_name: "<specific container name>"  # This is optional, can take out and will kill all containers in all pods found under namespace and label
  pod_names:  # This is optional, can take out and will select all pods with given namespace and label
  - <pod_name>
  exclude_label: "<label to exclude pods from chaos>" # Optional: pods matching this label will be excluded from disruption
  count: <number of containers to disrupt, default=1>
  action: <kill signal to run. For example 1 ( hang up ) or 9. Default is set to 1>
  expected_recovery_time: <number of seconds to wait for container to be running again> # Defaults to 60 seconds
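How these fields narrow down the set of targets can be sketched as follows. This is an illustration of the selection semantics described in the comments above, not krkn's actual implementation: the pod dictionaries, the `select_containers` helper, and the restriction to a single `key=value` selector are all assumptions.

```python
import random

# Hypothetical in-memory view of the pods in one namespace.
SAMPLE_PODS = [
    {"name": "etcd-0", "labels": {"k8s-app": "etcd"}, "containers": ["etcd", "etcdctl"]},
    {"name": "etcd-1", "labels": {"k8s-app": "etcd", "protected": "true"}, "containers": ["etcd"]},
    {"name": "api-0", "labels": {"k8s-app": "api"}, "containers": ["server"]},
]


def has_label(pod, selector):
    """True if the pod carries the single key=value label in `selector`."""
    key, _, value = selector.partition("=")
    return pod["labels"].get(key.strip()) == value.strip()


def select_containers(pods, label_selector, exclude_label=None,
                      pod_names=None, container_name=None, count=1, rng=random):
    """Return up to `count` (pod, container) pairs to disrupt."""
    candidates = []
    for pod in pods:
        if not has_label(pod, label_selector):
            continue
        if exclude_label and has_label(pod, exclude_label):
            continue  # pods matching exclude_label are protected
        if pod_names and pod["name"] not in pod_names:
            continue
        for container in pod["containers"]:
            # Without container_name, every container in the pod is a candidate.
            if container_name is None or container == container_name:
                candidates.append((pod["name"], container))
    return rng.sample(candidates, min(count, len(candidates)))


print(select_containers(SAMPLE_PODS, "k8s-app=etcd",
                        exclude_label="protected=true", container_name="etcd"))
```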

How to Use Plugin Name

Add the plugin name to the list of chaos_scenarios section in the config/config.yaml file

kraken:
    kubeconfig_path: ~/.kube/config                     # Path to kubeconfig
    ..
    chaos_scenarios:
        - container_scenarios:
            - scenarios/<scenario_name>.yaml

Note

You can specify multiple scenario files of the same type by adding additional paths to the list:

kraken:
    chaos_scenarios:
        - container_scenarios:
            - scenarios/container-kill-1.yaml
            - scenarios/container-kill-2.yaml
            - scenarios/container-kill-3.yaml

You can also combine multiple different scenario types in the same config.yaml file. Scenario types can be specified in any order, and you can include the same scenario type multiple times:

kraken:
    chaos_scenarios:
        - container_scenarios:
            - scenarios/container-kill.yaml
        - pod_disruption_scenarios:
            - scenarios/pod-kill.yaml
        - node_scenarios:
            - scenarios/node-reboot.yaml
        - container_scenarios:  # Same type can appear multiple times
            - scenarios/container-kill-2.yaml

Run

python run_kraken.py --config config/config.yaml

This scenario disrupts the containers matching the label in the specified namespace on a Kubernetes/OpenShift cluster.

Run

$ podman run \
  --name=<container_name> \
  --net=host \
  --pull=always \
  --env-host=true \
  -v <path-to-kube-config>:/home/krkn/.kube/config:Z \
  -d containers.krkn-chaos.dev/krkn-chaos/krkn-hub:container-scenarios
$ podman logs -f <container_name or container_id> # Streams Kraken logs
$ podman inspect <container-name or container-id> \
  --format "{{.State.ExitCode}}" # Outputs the exit code, which can be treated as pass/fail for the scenario
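To use the exit code in automation, the inspect command above can be wrapped in a small helper. This is a sketch under two assumptions: that an exit code of 0 means the scenario passed (the usual convention), and that the `scenario_passed` name and its injectable `run` parameter are illustrative only.

```python
import subprocess


def scenario_passed(container, runtime="podman", run=subprocess.run):
    """Return True if the finished scenario container exited with code 0.

    `runtime` may be "podman" or "docker"; both accept the same inspect syntax.
    `run` is injectable so the helper can be exercised without a container runtime.
    """
    result = run(
        [runtime, "inspect", container, "--format", "{{.State.ExitCode}}"],
        capture_output=True,
        text=True,
        check=True,
    )
    # Assumption: 0 means pass, anything else means fail.
    return int(result.stdout.strip()) == 0
```

Calling `scenario_passed("<container_name>")` after the container has stopped returns a boolean that a pipeline step can act on.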

Note

$ docker run $(./get_docker_params.sh) \
  --name=<container_name> \
  --net=host \
  --pull=always \
  -v <path-to-kube-config>:/home/krkn/.kube/config:Z \
  -d containers.krkn-chaos.dev/krkn-chaos/krkn-hub:container-scenarios
$ docker run \
  -e <VARIABLE>=<value> \
  --net=host \
  --pull=always \
  -v <path-to-kube-config>:/home/krkn/.kube/config:Z \
  -d containers.krkn-chaos.dev/krkn-chaos/krkn-hub:container-scenarios

$ docker logs -f <container_name or container_id> # Streams Kraken logs
$ docker inspect <container-name or container-id> \
  --format "{{.State.ExitCode}}" # Outputs the exit code, which can be treated as pass/fail for the scenario

Tip

Because the container runs with a non-root user, ensure the kube config is globally readable before mounting it in the container. You can achieve this with the following commands:

kubectl config view --flatten > ~/kubeconfig && chmod 444 ~/kubeconfig && docker run $(./get_docker_params.sh) --name=<container_name> --net=host --pull=always -v ~/kubeconfig:/home/krkn/.kube/config:Z -d containers.krkn-chaos.dev/krkn-chaos/krkn-hub:<scenario>

Supported parameters

The following environment variables can be set on the host running the container to tweak the scenario/faults being injected:

Example if --env-host is used:

export <parameter_name>=<value>

Or pass the variable on the command line, for example:

-e <VARIABLE>=<value>

In addition to these scenario-specific variables, you can set any of the variables that apply to all scenarios.

Parameter	Description	Default
NAMESPACE	Targeted namespace in the cluster	openshift-etcd
LABEL_SELECTOR	Label of the container(s) to target	k8s-app=etcd
EXCLUDE_LABEL	Pods to exclude after getting list of pods from LABEL_SELECTOR to target. For example “app=foo”	No default
DISRUPTION_COUNT	Number of containers to disrupt	1
CONTAINER_NAME	Name of the container to disrupt	etcd
ACTION	kill signal to run. For example 1 ( hang up ) or 9	1
EXPECTED_RECOVERY_TIME	Time to wait before checking if all containers that were affected recover properly	60
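The variables in this table can be passed with `-e` flags instead of `--env-host`. The following sketch assembles such a command from a dictionary; the `podman_command` helper, the default container name, and the sample values are illustrative assumptions, while the flags and image are taken from the commands above.

```python
import shlex

IMAGE = "containers.krkn-chaos.dev/krkn-chaos/krkn-hub:container-scenarios"


def podman_command(kubeconfig, params, name="krkn-container-scenario"):
    """Build the podman argument list, passing each parameter as -e KEY=VALUE."""
    cmd = ["podman", "run", f"--name={name}", "--net=host", "--pull=always"]
    for key, value in params.items():
        cmd += ["-e", f"{key}={value}"]
    cmd += ["-v", f"{kubeconfig}:/home/krkn/.kube/config:Z", "-d", IMAGE]
    return cmd


cmd = podman_command(
    "/tmp/kubeconfig",  # hypothetical path to a readable kubeconfig
    {
        "NAMESPACE": "production",
        "LABEL_SELECTOR": "app=backend",
        "CONTAINER_NAME": "backend",
        "ACTION": 9,
        "EXPECTED_RECOVERY_TIME": 120,
    },
)
print(shlex.join(cmd))  # printable, copy-pasteable form of the command
```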

Note

For example:

$ podman run \
  --name=<container_name> \
  --net=host \
  --pull=always \
  --env-host=true \
  -v <path-to-custom-metrics-profile>:/home/krkn/kraken/config/metrics-aggregated.yaml \
  -v <path-to-custom-alerts-profile>:/home/krkn/kraken/config/alerts \
  -v <path-to-kube-config>:/home/krkn/.kube/config:Z \
  -d containers.krkn-chaos.dev/krkn-chaos/krkn-hub:container-scenarios

krknctl run container-scenarios [--<parameter> <value>]

Can also set any global variable listed here

Scenario specific parameters:

Parameter	Description	Type	Required	Default
`--namespace`	Targeted namespace in the cluster	string	No	openshift-etcd
`--label-selector`	Label of the container(s) to target	string	No	k8s-app=etcd
`--exclude-selector`	Pods to exclude from targeting. For example “{app: foo}”	string	No	""
`--disruption-count`	Number of containers to disrupt	number	No	1
`--container-name`	Name of the container to disrupt	string	No	etcd
`--action`	kill signal to run. For example 1 ( hang up ) or 9	string	No	1
`--expected-recovery-time`	Time to wait before checking if all containers that were affected recover properly	number	No	60

Behavior Notes

Recovery monitoring: After disrupting containers, krkn monitors for recovery up to --expected-recovery-time seconds. If any containers remain unrecovered after the timeout, the scenario reports failure.
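The monitoring behavior described above amounts to polling with a deadline. This is a minimal sketch of that pattern, not krkn's code: `wait_for_recovery`, the `is_recovered` callback, and the injectable clock are assumptions made so the logic can be shown and tested in isolation.

```python
import time


def wait_for_recovery(is_recovered, containers, timeout=60, interval=1.0,
                      clock=time.monotonic, sleep=time.sleep):
    """Poll until every container has recovered or `timeout` seconds elapse.

    Returns the sorted list of containers still unrecovered; an empty list
    means the scenario would be reported as successful.
    """
    deadline = clock() + timeout
    pending = set(containers)
    while pending:
        pending = {c for c in pending if not is_recovered(c)}
        if not pending or clock() >= deadline:
            break
        sleep(interval)
    return sorted(pending)


# Simulated run: "a" recovers after 3s, "b" never recovers within the timeout.
_now = [0.0]
_ready_at = {"a": 3, "b": 100}
unrecovered = wait_for_recovery(
    lambda c: _now[0] >= _ready_at[c],
    ["a", "b"],
    timeout=10,
    interval=1,
    clock=lambda: _now[0],
    sleep=lambda s: _now.__setitem__(0, _now[0] + s),
)
print("unrecovered:", unrecovered)
```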

To see all available scenario options

krknctl run container-scenarios --help


7 - DNS Outage Scenarios

This scenario blocks all outgoing DNS traffic from a specific pod, effectively preventing it from resolving any hostnames or service names.

How to Run DNS Outage Scenarios

Choose your preferred method to run DNS outage scenarios:

Example scenario file: dns_outage.yml

Sample scenario config

- id: pod_network_filter
  wait_duration: 0
  test_duration: 60
  label_selector: ''
  service_account: ''
  namespace: 'default'
  instance_count: 1
  execution: parallel
  ingress: false
  egress: true
  target: <pod_name>
  interfaces: []
  ports: [53]
  taints: []
  protocols:
    - tcp
    - udp
  image: quay.io/krkn-chaos/krkn-network-chaos:latest
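To make the effect of this config concrete, the sketch below models which traffic the rule would block. It is a simplified interpretation, assuming the port list is matched against the destination port and that a packet is blocked only when direction, protocol, and port all match; the `FilterRule` class is hypothetical and the real filter may behave differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FilterRule:
    ingress: bool
    egress: bool
    ports: tuple
    protocols: tuple

    def blocks(self, direction, protocol, port):
        """True if traffic in `direction` ("ingress" or "egress") would be dropped."""
        enabled = self.egress if direction == "egress" else self.ingress
        return enabled and protocol in self.protocols and port in self.ports


# Values taken from the sample scenario config above.
DNS_OUTAGE = FilterRule(ingress=False, egress=True, ports=(53,), protocols=("tcp", "udp"))

print(DNS_OUTAGE.blocks("egress", "udp", 53))   # outgoing DNS query
print(DNS_OUTAGE.blocks("egress", "tcp", 443))  # outgoing HTTPS
```

Under this model, only outgoing DNS lookups fail, while connections made to already known IP addresses are unaffected.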

How to Use Plugin Name

Add the plugin name to the list of chaos_scenarios section in the config/config.yaml file

kraken:
    kubeconfig_path: ~/.kube/config                     # Path to kubeconfig
    ..
    chaos_scenarios:
        - network_chaos_ng_scenarios:
            - scenarios/<scenario_name>.yaml

Note

You can specify multiple scenario files of the same type by adding additional paths to the list:

kraken:
    chaos_scenarios:
        - network_chaos_ng_scenarios:
            - scenarios/dns-outage-1.yaml
            - scenarios/dns-outage-2.yaml
            - scenarios/dns-outage-3.yaml

You can also combine multiple different scenario types in the same config.yaml file. Scenario types can be specified in any order, and you can include the same scenario type multiple times:

kraken:
    chaos_scenarios:
        - network_chaos_ng_scenarios:
            - scenarios/dns-outage.yaml
        - pod_disruption_scenarios:
            - scenarios/pod-kill.yaml
        - container_scenarios:
            - scenarios/container-kill.yaml
        - network_chaos_ng_scenarios:  # Same type can appear multiple times
            - scenarios/dns-outage-2.yaml

Run

python run_kraken.py --config config/config.yaml

podman run -v ~/.kube/config:/home/krkn/.kube/config:z -e TEST_DURATION="60" \
    -e INGRESS="false" -e EGRESS="true" -e PROTOCOLS="tcp,udp" -e PORTS="53" \
    -e POD_NAME="target-pod" quay.io/krkn-chaos/krkn-hub:pod-network-filter

krknctl run pod-network-filter \
 --chaos-duration 60 \
 --pod-name target-pod \
 --ingress false \
 --egress true \
 --protocols tcp,udp \
 --ports 53

8 - EFS Disruption Scenarios

This scenario creates an outgoing firewall rule on specific nodes in your cluster, chosen by node name or a selector. This rule blocks connections to AWS EFS, leading to a temporary failure of any EFS volumes mounted on those affected nodes.

How to Run EFS Disruption Scenarios

Choose your preferred method to run EFS disruption scenarios:

Example scenario file: efs_disruption.yml

Sample scenario config

- id: node_network_filter
  wait_duration: 0
  test_duration: 60
  label_selector: ''
  service_account: ''
  namespace: 'default'
  instance_count: 1
  execution: parallel
  ingress: false
  egress: true
  target: '<NODE_NAME>'
  interfaces: []
  ports: [2049]
  taints: []
  protocols:
    - tcp
    - udp
  image: quay.io/krkn-chaos/krkn-network-chaos:latest

How to Use Plugin Name

Add the plugin name to the list of chaos_scenarios section in the config/config.yaml file

kraken:
    kubeconfig_path: ~/.kube/config                     # Path to kubeconfig
    ..
    chaos_scenarios:
        - network_chaos_ng_scenarios:
            - scenarios/<scenario_name>.yaml

Note

You can specify multiple scenario files of the same type by adding additional paths to the list:

kraken:
    chaos_scenarios:
        - network_chaos_ng_scenarios:
            - scenarios/efs-disruption-1.yaml
            - scenarios/efs-disruption-2.yaml
            - scenarios/efs-disruption-3.yaml

You can also combine multiple different scenario types in the same config.yaml file. Scenario types can be specified in any order, and you can include the same scenario type multiple times:

kraken:
    chaos_scenarios:
        - network_chaos_ng_scenarios:
            - scenarios/efs-disruption.yaml
        - pod_disruption_scenarios:
            - scenarios/pod-kill.yaml
        - node_scenarios:
            - scenarios/node-reboot.yaml
        - network_chaos_ng_scenarios:  # Same type can appear multiple times
            - scenarios/efs-disruption-2.yaml

Run

python run_kraken.py --config config/config.yaml

Run

podman run -v ~/.kube/config:/home/krkn/.kube/config:z -e TEST_DURATION="60" \
    -e INGRESS="false" -e EGRESS="true" -e PROTOCOLS="tcp,udp" -e PORTS="2049" \
    -e NODE_NAME="<node_name>" quay.io/krkn-chaos/krkn-hub:node-network-filter

krknctl run node-network-filter \
 --chaos-duration 60 \
 --node-name kind-control-plane \
 --ingress false \
 --egress true \
 --protocols tcp,udp \
 --ports 2049

9 - ETCD Split Brain Scenarios

This scenario isolates an etcd node by blocking its network traffic. This action forces an etcd leader re-election. Once the scenario concludes, the cluster should temporarily exhibit a split-brain condition, with two etcd leaders active simultaneously. This is particularly useful for testing the etcd cluster’s resilience under such a challenging state.

DANGER

This scenario carries a significant risk: it might break the cluster API, making it impossible to automatically revert the applied network rules. The iptables rules will be printed to the console, allowing for manual reversal via a shell on the affected node. This scenario is best suited for disposable clusters and should be used at your own risk.

How to Run ETCD Split Brain Scenarios

Choose your preferred method to run ETCD split brain scenarios:

Example scenario file: etcd_split_brain.yml


Sample scenario config

- id: node_network_filter
  wait_duration: 0
  test_duration: 60
  label_selector: ''
  service_account: ''
  namespace: 'default'
  instance_count: 1
  execution: parallel
  ingress: false
  egress: true
  target: '<NODE_NAME>'
  interfaces: []
  ports: [2379, 2380]
  taints: []
  protocols:
    - tcp
  image: quay.io/krkn-chaos/krkn-network-chaos:latest

How to Use Plugin Name

Add the plugin name to the list of chaos_scenarios section in the config/config.yaml file

kraken:
    kubeconfig_path: ~/.kube/config                     # Path to kubeconfig
    ..
    chaos_scenarios:
        - network_chaos_ng_scenarios:
            - scenarios/<scenario_name>.yaml

Note

You can specify multiple scenario files of the same type by adding additional paths to the list:

kraken:
    chaos_scenarios:
        - network_chaos_ng_scenarios:
            - scenarios/etcd-split-brain-1.yaml
            - scenarios/etcd-split-brain-2.yaml
            - scenarios/etcd-split-brain-3.yaml

You can also combine multiple different scenario types in the same config.yaml file. Scenario types can be specified in any order, and you can include the same scenario type multiple times:

kraken:
    chaos_scenarios:
        - network_chaos_ng_scenarios:
            - scenarios/etcd-split-brain.yaml
        - pod_disruption_scenarios:
            - scenarios/pod-kill.yaml
        - node_scenarios:
            - scenarios/node-reboot.yaml
        - network_chaos_ng_scenarios:  # Same type can appear multiple times
            - scenarios/etcd-split-brain-2.yaml

Run

python run_kraken.py --config config/config.yaml

DANGER

podman run -v ~/.kube/config:/home/krkn/.kube/config:z -e TEST_DURATION="60" -e INGRESS="false" -e EGRESS="true" -e PROTOCOLS="tcp" -e PORTS="2379,2380" -e NODE_NAME="kind-control-plane" quay.io/krkn-chaos/krkn-hub:node-network-filter

krknctl run node-network-filter \
 --chaos-duration 60 \
 --node-name <node_name> \
 --ingress false \
 --egress true \
 --protocols tcp \
 --ports 2379,2380

10 - Hog Scenarios

Hog Scenarios background

Hog Scenarios are designed to push the limits of memory, CPU, or I/O on one or more nodes in your cluster. They also serve to evaluate whether your cluster can withstand rogue pods that excessively consume resources without any limits.

These scenarios involve deploying one or more workloads in the cluster. Based on the specific configuration, these workloads will use a predetermined amount of resources for a specified duration.

Config Options

Common options

Option	Type	Description
`duration`	number	the duration of the stress test in seconds
`workers`	number (Optional)	the number of threads instantiated by stress-ng. If left empty, the number of workers matches the number of available cores on the node.
`hog-type`	string (Enum)	can be cpu, memory or io.
`image`	string	the container image of the stress workload (quay.io/krkn-chaos/krkn-hog)
`namespace`	string	the namespace where the stress workload will be deployed
`node-selector`	string (Optional)	defines the node selector for choosing target nodes. If not specified, one schedulable node in the cluster will be chosen at random. If multiple nodes match the selector, all of them will be subjected to stress. If number-of-nodes is specified, that many nodes will be randomly selected from those identified by the selector.
`taints`	list (Optional) default []	list of taints for which tolerations need to be created. Example: ["node-role.kubernetes.io/master:NoSchedule"]
`number-of-nodes`	number (Optional)	restricts the number of selected nodes by the selector
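The interaction between `node-selector` and `number-of-nodes` described in the table can be sketched as follows. This models only the documented selection rules; the node dictionaries, the `pick_target_nodes` helper, and the single `key=value` selector format are assumptions for illustration.

```python
import random

# Hypothetical cluster inventory.
NODES = [
    {"name": "master-0", "labels": {"role": "master"}, "schedulable": False},
    {"name": "worker-0", "labels": {"role": "worker"}, "schedulable": True},
    {"name": "worker-1", "labels": {"role": "worker"}, "schedulable": True},
    {"name": "worker-2", "labels": {"role": "worker"}, "schedulable": True},
]


def pick_target_nodes(nodes, node_selector=None, number_of_nodes=None, rng=random):
    """Choose the nodes that will receive a hog workload."""
    if not node_selector:
        # No selector: one schedulable node, chosen at random.
        schedulable = [n for n in nodes if n["schedulable"]]
        return [rng.choice(schedulable)] if schedulable else []
    key, _, value = node_selector.partition("=")
    matching = [n for n in nodes if n["labels"].get(key) == value]
    if number_of_nodes and number_of_nodes < len(matching):
        # number-of-nodes: a random subset of the matching nodes.
        return rng.sample(matching, number_of_nodes)
    return matching  # otherwise every matching node is stressed


print([n["name"] for n in pick_target_nodes(NODES, "role=worker", number_of_nodes=2)])
```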

Available Scenarios

Hog scenarios:

Rollback Scenario Support

Krkn supports rollback for all available Hog scenarios. For more details, please refer to the Rollback Scenarios documentation.

10.1 - CPU Hog Scenario

Overview

The CPU Hog scenario is designed to create CPU pressure on one or more nodes in your Kubernetes/OpenShift cluster for a specified duration. This scenario helps you test how your cluster and applications respond to high CPU utilization.

How It Works

The scenario deploys a stress workload pod on targeted nodes. These pods use stress-ng to consume CPU resources according to your configuration. The workload runs for a specified duration and then terminates, allowing you to observe your cluster’s behavior under CPU stress.

When to Use

Use the CPU Hog scenario to:

Test your cluster’s ability to handle CPU resource contention
Validate that CPU resource limits and quotas are properly configured
Evaluate the impact of CPU pressure on application performance
Test whether your monitoring and alerting systems properly detect CPU saturation
Verify that the Kubernetes scheduler correctly handles CPU-constrained nodes
Simulate scenarios where rogue pods consume excessive CPU without limits

Key Configuration Options

In addition to the common hog scenario options, CPU Hog scenarios support:

Option	Type	Description
`cpu-load-percentage`	number	The percentage of CPU that will be consumed by the hog
`cpu-method`	string	The CPU load strategy adopted by stress-ng (see stress-ng documentation for available options)
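As a rough guide to what these options mean for stress-ng, the sketch below maps them onto stress-ng's own `--cpu`, `--cpu-load`, `--cpu-method`, and `--timeout` flags. The exact command line that the hog image runs is an assumption here, as is the `stress_ng_cpu_args` helper; passing `--cpu 0` relies on stress-ng's convention that 0 means one worker per online CPU.

```python
def stress_ng_cpu_args(duration, cpu_load_percentage, cpu_method="all", workers=None):
    """Translate CPU hog options into an illustrative stress-ng argument list."""
    if not 0 <= cpu_load_percentage <= 100:
        raise ValueError("cpu-load-percentage must be between 0 and 100")
    return [
        "stress-ng",
        "--cpu", str(workers or 0),            # 0: one worker per online CPU
        "--cpu-load", str(cpu_load_percentage),
        "--cpu-method", cpu_method,
        "--timeout", f"{duration}s",
    ]


print(" ".join(stress_ng_cpu_args(duration=60, cpu_load_percentage=90)))
```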

How to Run CPU Hog Scenarios

Choose your preferred method to run CPU hog scenarios:

To enable this plugin add the pointer to the scenario input file scenarios/kube/cpu-hog.yml as described in the Usage section.

Example scenario file: cpu-hog.yml

`cpu-hog` options

In addition to the common hog scenario options, you can specify the options below in your scenario configuration to set the amount of CPU to hog on a worker node.

Option	Type	Description
`cpu-load-percentage`	number	the amount of cpu that will be consumed by the hog
`cpu-method`	string	reflects the cpu load strategy adopted by stress-ng, please refer to the stress-ng documentation for all the available options

Usage

To enable hog scenarios, edit the kraken config file: under kraken -> chaos_scenarios in the YAML structure, add a new list element named hog_scenarios, then add the desired scenario pointing to the hog.yaml file.

kraken:
    ...
    chaos_scenarios:
        - hog_scenarios:
            - scenarios/kube/cpu-hog.yml

Note

You can specify multiple scenario files of the same type by adding additional paths to the list:

kraken:
    chaos_scenarios:
        - hog_scenarios:
            - scenarios/kube/cpu-hog-1.yml
            - scenarios/kube/cpu-hog-2.yml
            - scenarios/kube/cpu-hog-3.yml

You can also combine multiple different scenario types in the same config.yaml file. Scenario types can be specified in any order, and you can include the same scenario type multiple times:

kraken:
    chaos_scenarios:
        - hog_scenarios:
            - scenarios/kube/cpu-hog.yml
        - pod_disruption_scenarios:
            - scenarios/pod-kill.yaml
        - node_scenarios:
            - scenarios/node-reboot.yaml
        - hog_scenarios:  # Same type can appear multiple times
            - scenarios/kube/cpu-hog-2.yml

Run

python run_kraken.py --config config/config.yaml

This scenario hogs the CPU on the specified node on a Kubernetes/OpenShift cluster for a specified duration. For more information, refer to the following documentation.

Run

$ podman run \
  --name=<container_name> \
  --net=host \
  --pull=always \
  --env-host=true \
  -v <path-to-kube-config>:/home/krkn/.kube/config:Z \
  -d containers.krkn-chaos.dev/krkn-chaos/krkn-hub:node-cpu-hog
$ podman logs -f <container_name or container_id> # Streams Kraken logs
$ podman inspect <container-name or container-id> \
  --format "{{.State.ExitCode}}" # Outputs the exit code, which can be treated as pass/fail for the scenario

Note

$ docker run $(./get_docker_params.sh) \
  --name=<container_name> \
  --net=host \
  --pull=always \
  -v <path-to-kube-config>:/home/krkn/.kube/config:Z \
  -d containers.krkn-chaos.dev/krkn-chaos/krkn-hub:node-cpu-hog
$ docker run \
  -e <VARIABLE>=<value> \
  --net=host \
  --pull=always \
  -v <path-to-kube-config>:/home/krkn/.kube/config:Z \
  -d containers.krkn-chaos.dev/krkn-chaos/krkn-hub:node-cpu-hog

$ docker logs -f <container_name or container_id> # Streams Kraken logs
$ docker inspect <container-name or container-id> \
  --format "{{.State.ExitCode}}" # Outputs the exit code, which can be treated as pass/fail for the scenario

Tip

Because the container runs with a non-root user, ensure the kube config is globally readable before mounting it in the container. You can achieve this with the following commands:

kubectl config view --flatten > ~/kubeconfig && chmod 444 ~/kubeconfig && docker run $(./get_docker_params.sh) --name=<container_name> --net=host --pull=always -v ~/kubeconfig:/home/krkn/.kube/config:Z -d containers.krkn-chaos.dev/krkn-chaos/krkn-hub:<scenario>

Supported parameters

The following environment variables can be set on the host running the container to tweak the scenario/faults being injected:

Example if --env-host is used:

export <parameter_name>=<value>

Or pass the variable on the command line, for example:

-e <VARIABLE>=<value>

In addition to these scenario-specific variables, you can set any of the variables that apply to all scenarios.

Parameter	Description	Default
TOTAL_CHAOS_DURATION	Set chaos duration (in sec) as desired	60
NODE_CPU_CORE	Number of cores (workers) of node CPU to be consumed	2
NODE_CPU_PERCENTAGE	Percentage of total cpu to be consumed	50
NAMESPACE	Namespace where the scenario container will be deployed	default
NODE_SELECTOR	Defines the node selector for choosing target nodes. If not specified, one schedulable node in the cluster will be chosen at random. If multiple nodes match the selector, all of them will be subjected to stress. If number-of-nodes is specified, that many nodes will be randomly selected from those identified by the selector.	""
TAINTS	List of taints for which tolerations need to be created. Example: ["node-role.kubernetes.io/master:NoSchedule"]	[]
NUMBER_OF_NODES	Restricts the number of selected nodes by the selector	""
IMAGE	The container image of the stress workload	quay.io/krkn-chaos/krkn-hog

Note

For example:

$ podman run \
  --name=<container_name> \
  --net=host \
  --pull=always \
  --env-host=true \
  -v <path-to-custom-metrics-profile>:/home/krkn/kraken/config/metrics-aggregated.yaml \
  -v <path-to-custom-alerts-profile>:/home/krkn/kraken/config/alerts \
  -v <path-to-kube-config>:/home/krkn/.kube/config:Z \
  -d containers.krkn-chaos.dev/krkn-chaos/krkn-hub:node-cpu-hog

krknctl run node-cpu-hog [--<parameter> <value>]

Can also set any global variable listed here

Parameter	Description	Type	Required	Default
`--chaos-duration`	Set chaos duration (in secs) as desired	number	No	60
`--cores`	Number of cores (workers) of node CPU to be consumed	number	No
`--cpu-percentage`	Percentage of total cpu to be consumed	number	No	50
`--namespace`	Namespace where the scenario container will be deployed	string	No	default
`--node-selector`	Node selector where the scenario containers will be scheduled, in the format “<selector>=<value>”. NOTE: one container is instantiated on each selected node, with the same scenario options. If left empty, a random node will be selected	string	No
`--taints`	List of taints for which tolerations need to be created. For example ["node-role.kubernetes.io/master:NoSchedule"]	string	No	[]
`--number-of-nodes`	restricts the number of selected nodes by the selector	number	No
`--image`	The hog container image. Can be changed if the hog image is mirrored on a private repository	string	No	quay.io/krkn-chaos/krkn-hog
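When the scenario is driven from a script, the flags in this table can be assembled programmatically. The sketch below is illustrative: the `krknctl_args` helper, the underscore-to-dash key convention, and the sample selector value are assumptions, while the flag names come from the table above.

```python
import shlex


def krknctl_args(scenario, options):
    """Build a krknctl argument list, skipping options that are unset."""
    args = ["krknctl", "run", scenario]
    for name, value in options.items():
        if value is None or value == "":
            continue  # let krknctl fall back to its default
        args += ["--" + name.replace("_", "-"), str(value)]
    return args


cmd = krknctl_args(
    "node-cpu-hog",
    {
        "chaos_duration": 120,
        "cpu_percentage": 80,
        "cores": None,  # unset: not passed on the command line
        "node_selector": "kubernetes.io/hostname=worker-0",  # hypothetical node
    },
)
print(shlex.join(cmd))
```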

To see all available scenario options

krknctl run node-cpu-hog --help


10.2 - IO Hog Scenario

Overview

The IO Hog scenario is designed to create disk I/O pressure on one or more nodes in your Kubernetes/OpenShift cluster for a specified duration. This scenario helps you test how your cluster and applications respond to high disk I/O utilization and storage-related bottlenecks.

How It Works

The scenario deploys a stress workload pod on targeted nodes. These pods use stress-ng to perform intensive write operations to disk, consuming I/O resources according to your configuration. The scenario supports attaching node paths to the pod as a hostPath volume or using custom pod volume definitions, allowing you to test I/O pressure on specific storage targets.

When to Use

Use the IO Hog scenario to:

Test your cluster’s behavior under disk I/O pressure
Validate that I/O resource limits are properly configured
Evaluate the impact of disk I/O contention on application performance
Test whether your monitoring systems properly detect disk saturation
Verify that storage performance meets requirements under stress
Simulate scenarios where pods perform excessive disk writes
Test the resilience of persistent volume configurations
Validate disk I/O quotas and rate limiting

Key Configuration Options

In addition to the common hog scenario options, IO Hog scenarios support:

Option	Type	Description
`io-block-size`	string	The size of each individual write operation performed by the stressor
`io-write-bytes`	string	The total amount of data that will be written by the stressor. Can be specified as a percentage (%) of free space on the filesystem or in absolute units (b, k, m, g for Bytes, KBytes, MBytes, GBytes)
`io-target-pod-folder`	string	The path within the pod where the volume will be mounted
`io-target-pod-volume`	dictionary	The pod volume definition that will be stressed by the scenario (typically a `hostPath` volume)

WARNING

Modifying the structure of io-target-pod-volume might alter how the hog operates, potentially rendering it ineffective.

Example Values

io-block-size: "1m" - Write in 1 megabyte blocks
io-block-size: "4k" - Write in 4 kilobyte blocks
io-write-bytes: "50%" - Write data equal to 50% of available free space
io-write-bytes: "10g" - Write 10 gigabytes of data
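The size notation in these examples can be converted to byte counts as shown below. This is a sketch, assuming binary (1024-based) multipliers for the k, m, and g suffixes; the `parse_size` and `writes_needed` helpers are illustrative and not part of krkn or stress-ng.

```python
UNITS = {"b": 1, "k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def parse_size(value, free_bytes=None):
    """Convert a size such as "4k", "10g", or "50%" into a number of bytes.

    Percentages are relative to `free_bytes`, the free space on the target
    filesystem, which the caller must supply.
    """
    text = value.strip().lower()
    if text.endswith("%"):
        if free_bytes is None:
            raise ValueError("free_bytes is required for percentage sizes")
        return int(free_bytes * float(text[:-1]) / 100)
    if text[-1] in UNITS:
        return int(text[:-1]) * UNITS[text[-1]]
    return int(text)  # bare number: plain bytes


def writes_needed(write_bytes, block_size):
    """Number of block-sized writes required to reach the total (rounded up)."""
    return -(-parse_size(write_bytes) // parse_size(block_size))


print(parse_size("1m"), parse_size("4k"), writes_needed("10g", "1m"))
```

For instance, writing "10g" in "1m" blocks corresponds to 10240 individual writes under these assumptions.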

How to Run IO Hog Scenarios

Choose your preferred method to run IO hog scenarios:

To enable this plugin add the pointer to the scenario input file scenarios/kube/io-hog.yml as described in the Usage section.

Example scenario file: io-hog.yml

`io-hog` options

In addition to the common hog scenario options, you can specify the options below in your scenario configuration to control the I/O load and the volume it targets.

Option	Type	Description
`io-block-size`	string	the block size written by the stressor
`io-write-bytes`	string	the total amount of data that will be written by the stressor. The size can be specified as % of free space on the file system or in units of Bytes, KBytes, MBytes and GBytes using the suffix b, k, m or g
`io-target-pod-folder`	string	the folder where the volume will be mounted in the pod
`io-target-pod-volume`	dictionary	the pod volume definition that will be stressed by the scenario.

WARNING

Modifying the structure of io-target-pod-volume might alter how the hog operates, potentially rendering it ineffective.

Usage

kraken:
    ...
    chaos_scenarios:
        - hog_scenarios:
            - scenarios/kube/io-hog.yml

Note

You can specify multiple scenario files of the same type by adding additional paths to the list:

kraken:
    chaos_scenarios:
        - hog_scenarios:
            - scenarios/kube/io-hog-1.yml
            - scenarios/kube/io-hog-2.yml
            - scenarios/kube/io-hog-3.yml

You can also combine multiple different scenario types in the same config.yaml file. Scenario types can be specified in any order, and you can include the same scenario type multiple times:

kraken:
    chaos_scenarios:
        - hog_scenarios:
            - scenarios/kube/io-hog.yml
        - pod_disruption_scenarios:
            - scenarios/pod-kill.yaml
        - node_scenarios:
            - scenarios/node-reboot.yaml
        - hog_scenarios:  # Same type can appear multiple times
            - scenarios/kube/io-hog-2.yml

Run

python run_kraken.py --config config/config.yaml

This scenario hogs the IO on the specified node on a Kubernetes/OpenShift cluster for a specified duration. For more information, refer to the following documentation.

Run

$ podman run \
  --name=<container_name> \
  --net=host \
  --pull=always \
  --env-host=true \
  -v <path-to-kube-config>:/root/.kube/config:Z \
  -d containers.krkn-chaos.dev/krkn-chaos/krkn-hub:node-io-hog
$ podman logs -f <container_name or container_id> # Streams Kraken logs
$ podman inspect <container-name or container-id> \
  --format "{{.State.ExitCode}}" # Outputs the exit code, which can be treated as pass/fail for the scenario

Note

$ docker run $(./get_docker_params.sh) \
  --name=<container_name> \
  --net=host \
  --pull=always \
  -v <path-to-kube-config>:/root/.kube/config:Z \
  -d containers.krkn-chaos.dev/krkn-chaos/krkn-hub:node-io-hog
$ docker run \
  -e <VARIABLE>=<value> \
  --net=host \
  --pull=always \
  -v <path-to-kube-config>:/root/.kube/config:Z \
  -d containers.krkn-chaos.dev/krkn-chaos/krkn-hub:node-io-hog

$ docker logs -f <container_name or container_id> # Streams Kraken logs
$ docker inspect <container-name or container-id> \
  --format "{{.State.ExitCode}}" # Outputs the exit code, which can be treated as pass/fail for the scenario

Tip

Because the container runs with a non-root user, ensure the kube config is globally readable before mounting it in the container. You can achieve this with the following commands:

kubectl config view --flatten > ~/kubeconfig && chmod 444 ~/kubeconfig && docker run $(./get_docker_params.sh) --name=<container_name> --net=host --pull=always -v ~/kubeconfig:/home/krkn/.kube/config:Z -d containers.krkn-chaos.dev/krkn-chaos/krkn-hub:<scenario>

Supported parameters

The following environment variables can be set on the host running the container to tweak the scenario/faults being injected:

Example if --env-host is used:

export <parameter_name>=<value>

Or pass the variable on the command line, for example:

-e <VARIABLE>=<value>

In addition to these scenario-specific variables, you can set any of the variables that apply to all scenarios.

Parameter	Description	Type	Default
TOTAL_CHAOS_DURATION	Set chaos duration (in sec) as desired	number	180
IO_BLOCK_SIZE	Size of each write in bytes. Size can be from 1 byte to 4m	string	1m
IO_WORKERS	Number of stressors	number	5
IO_WRITE_BYTES	Writes N bytes for each hdd process. The size can be expressed as % of free space on the file system or in units of Bytes, KBytes, MBytes and GBytes using the suffix b, k, m or g	string	10m
NAMESPACE	Namespace where the scenario container will be deployed	string	default
NODE_SELECTOR	defines the node selector for choosing target nodes. If not specified, one schedulable node in the cluster will be chosen at random. If multiple nodes match the selector, all of them will be subjected to stress. If number-of-nodes is specified, that many nodes will be randomly selected from those identified by the selector.	string	""
TAINTS	List of taints for which tolerations need to be created. Example: ["node-role.kubernetes.io/master:NoSchedule"]	string	[]
NODE_MOUNT_PATH	the local path in the node that will be mounted in the pod and that will be filled by the scenario	string	/root
NUMBER_OF_NODES	restricts the number of selected nodes by the selector	number	""
IMAGE	the container image of the stress workload	string	quay.io/krkn-chaos/krkn-hog
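
The size strings accepted by IO_BLOCK_SIZE and IO_WRITE_BYTES can be checked before launching the container. The following is a minimal sketch, not part of Krkn: the helper names are hypothetical, and it assumes that the suffixes are powers of 1024, as stress-ng treats them. It only covers the absolute forms (b, k, m, g), not the percentage form of IO_WRITE_BYTES.

```python
# Hypothetical helpers, not part of Krkn.
# Assumption: suffixes are powers of 1024 (stress-ng convention).
UNITS = {"b": 1, "k": 1024, "m": 1024**2, "g": 1024**3}


def parse_size(value: str) -> int:
    """Convert strings like '512', '4k', '1m' or '2g' into a byte count."""
    text = value.strip().lower()
    if not text:
        raise ValueError("empty size")
    if text[-1] in UNITS:
        number, factor = text[:-1], UNITS[text[-1]]
    else:
        number, factor = text, 1  # a bare number is treated as bytes
    if not number.isdigit():
        raise ValueError(f"invalid size: {value!r}")
    return int(number) * factor


def validate_io_block_size(value: str) -> int:
    """Return the block size in bytes, or raise if outside 1 byte to 4m."""
    size = parse_size(value)
    if not 1 <= size <= 4 * UNITS["m"]:
        raise ValueError(f"IO_BLOCK_SIZE out of range: {value!r}")
    return size


if __name__ == "__main__":
    print(validate_io_block_size("1m"))  # default block size, in bytes
```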

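Note

If you use a custom metrics profile or alerts profile with CAPTURE_METRICS or ENABLE_ALERTS enabled, mount it from the host on which the container is run using podman/docker under /root/kraken/config/metrics-aggregated.yaml and /root/kraken/config/alerts. For example: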

$ podman run \
  --name=<container_name> \
  --net=host \
  --pull=always \
  --env-host=true \
  -v <path-to-custom-metrics-profile>:/root/kraken/config/metrics-aggregated.yaml \
  -v <path-to-custom-alerts-profile>:/root/kraken/config/alerts \
  -v <path-to-kube-config>:/root/.kube/config:Z \
  -d containers.krkn-chaos.dev/krkn-chaos/krkn-hub:node-io-hog

krknctl run node-io-hog [--<parameter> <value>]

You can also set any global variable listed here.

Parameter	Description	Type	Required	Default
`--chaos-duration`	Set chaos duration (in sec) as desired	number	No	60
`--io-block-size`	Size of each write in bytes. Size can be from 1 byte to 4 megabytes (allowed suffixes are b, k, m)	string	No	1m
`--io-workers`	Number of stressor instances	number	No	5
`--io-write-bytes`	Writes N bytes for each hdd process. The size can be expressed as % of free space on the file system or in units of Bytes, KBytes, MBytes and GBytes using the suffix b, k, m or g	string	No	10m
`--node-mount-path`	The path on the node that will be mounted in the pod and where the IO hog will be executed. NOTE: make sure the kubelet has permission to write to that path	string	No	/root
`--namespace`	Namespace where the scenario container will be deployed	string	No	default
`--node-selector`	Node selector where the scenario containers will be scheduled, in the format "<key>=<value>". NOTE: one container is instantiated per selected node, each with the same scenario options. If left empty, a random node is selected	string	No
`--taints`	List of taints for which tolerations need to be created. For example ["node-role.kubernetes.io/master:NoSchedule"]	string	No	[]
`--number-of-nodes`	restricts the number of selected nodes by the selector	number	No
`--image`	The hog container image. Can be changed if the hog image is mirrored on a private repository	string	No	quay.io/krkn-chaos/krkn-hog

To see all available scenario options

krknctl run node-io-hog --help

10.3 - Memory Hog Scenario

Overview

The Memory Hog scenario is designed to create virtual memory pressure on one or more nodes in your Kubernetes/OpenShift cluster for a specified duration. This scenario helps you test how your cluster and applications respond to memory exhaustion and pressure conditions.

How It Works

The scenario deploys a stress workload pod on targeted nodes. These pods use stress-ng to allocate and consume memory resources according to your configuration. The workload runs for a specified duration, allowing you to observe how your cluster handles memory pressure, OOM (Out of Memory) conditions, and eviction scenarios.

When to Use

Use the Memory Hog scenario to:

Test your cluster’s behavior under memory pressure
Validate that memory resource limits and quotas are properly configured
Test pod eviction policies when nodes run out of memory
Verify that the kubelet correctly evicts pods based on memory pressure
Evaluate the impact of memory contention on application performance
Test whether your monitoring systems properly detect memory saturation
Simulate scenarios where rogue pods consume excessive memory without limits
Validate that memory-based horizontal pod autoscaling works correctly

Key Configuration Options

In addition to the common hog scenario options, Memory Hog scenarios support:

Option	Type	Description
`memory-vm-bytes`	string	The amount of memory that the scenario will attempt to allocate and consume. Can be specified as a percentage (%) of available memory or in absolute units (b, k, m, g for Bytes, KBytes, MBytes, GBytes)

Example Values

memory-vm-bytes: "80%" - Consume 80% of available memory
memory-vm-bytes: "2g" - Consume 2 gigabytes of memory
memory-vm-bytes: "512m" - Consume 512 megabytes of memory
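
To make the two forms of memory-vm-bytes concrete, the sketch below resolves a value to a byte count. It is an illustration only, not Krkn or stress-ng code: the function name is hypothetical, the amount of available memory is passed in by the caller, and the suffixes are assumed to be powers of 1024.

```python
# Hypothetical helper, not part of Krkn.
# Assumption: suffixes are powers of 1024 and percentages apply to the
# available memory figure supplied by the caller.
UNITS = {"b": 1, "k": 1024, "m": 1024**2, "g": 1024**3}


def resolve_memory_target(spec: str, available_bytes: int) -> int:
    """Translate values such as '80%', '2g' or '512m' into bytes."""
    text = spec.strip().lower()
    if text.endswith("%"):
        percent = float(text[:-1])
        if not 0 < percent <= 100:
            raise ValueError(f"percentage out of range: {spec!r}")
        return int(available_bytes * percent / 100)
    if text and text[-1] in UNITS and text[:-1].isdigit():
        return int(text[:-1]) * UNITS[text[-1]]
    raise ValueError(f"unsupported memory-vm-bytes value: {spec!r}")


if __name__ == "__main__":
    node_available = 16 * 1024**3  # example node with 16 GiB available
    for value in ("80%", "2g", "512m"):
        print(value, "->", resolve_memory_target(value, node_available), "bytes")
```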

How to Run Memory Hog Scenarios

Choose your preferred method to run memory hog scenarios:

To enable this plugin add the pointer to the scenario input file scenarios/kube/memory-hog.yml as described in the Usage section.

Example scenario file: memory-hog.yml

`memory-hog` options

In addition to the common hog scenario options, you can specify the option below in your scenario configuration to set the amount of memory to hog on a given worker node:

Option	Type	Description
`memory-vm-bytes`	string	The amount of memory that the scenario will try to hog. The size can be specified as a percentage (%) of available memory or in units of Bytes, KBytes, MBytes and GBytes using the suffix b, k, m or g

Usage

kraken:
    ...
    chaos_scenarios:
        - hog_scenarios:
            - scenarios/kube/memory-hog.yml

Note

You can specify multiple scenario files of the same type by adding additional paths to the list:

kraken:
    chaos_scenarios:
        - hog_scenarios:
            - scenarios/kube/memory-hog-1.yml
            - scenarios/kube/memory-hog-2.yml
            - scenarios/kube/memory-hog-3.yml

You can also combine multiple different scenario types in the same config.yaml file. Scenario types can be specified in any order, and you can include the same scenario type multiple times:

kraken:
    chaos_scenarios:
        - hog_scenarios:
            - scenarios/kube/memory-hog.yml
        - pod_disruption_scenarios:
            - scenarios/pod-kill.yaml
        - node_scenarios:
            - scenarios/node-reboot.yaml
        - hog_scenarios:  # Same type can appear multiple times
            - scenarios/kube/memory-hog-2.yml
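
The same type can appear more than once because chaos_scenarios is a list of single-key mappings, not one mapping keyed by type. The sketch below shows the structure a YAML loader would return for the example above and flattens it into (type, file) pairs in listed order. The function name is hypothetical and this is not how Krkn itself loads the file.

```python
# Parsed form of the chaos_scenarios list shown above
# (what a YAML loader would return for it).
chaos_scenarios = [
    {"hog_scenarios": ["scenarios/kube/memory-hog.yml"]},
    {"pod_disruption_scenarios": ["scenarios/pod-kill.yaml"]},
    {"node_scenarios": ["scenarios/node-reboot.yaml"]},
    {"hog_scenarios": ["scenarios/kube/memory-hog-2.yml"]},
]


def list_scenario_files(entries):
    """Flatten the list into (scenario_type, file) pairs, keeping listed order."""
    pairs = []
    for entry in entries:
        for scenario_type, files in entry.items():
            for path in files:
                pairs.append((scenario_type, path))
    return pairs


if __name__ == "__main__":
    for scenario_type, path in list_scenario_files(chaos_scenarios):
        print(f"{scenario_type}: {path}")
```

Had chaos_scenarios been a single mapping, the second hog_scenarios key would have collided with the first.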

Run

python run_kraken.py --config config/config.yaml

This scenario hogs the memory on the specified node on a Kubernetes/OpenShift cluster for a specified duration. For more information, refer to the following documentation.

Run

$ podman run \
  --name=<container_name> \
  --net=host \
  --pull=always \
  --env-host=true \
  -v <path-to-kube-config>:/home/krkn/.kube/config:Z \
  -d containers.krkn-chaos.dev/krkn-chaos/krkn-hub:node-memory-hog
$ podman logs -f <container_name or container_id> # Streams Kraken logs
$ podman inspect <container-name or container-id> \
  --format "{{.State.ExitCode}}" # Outputs the exit code, which can be considered as pass/fail for the scenario
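
In a pipeline, the exit code check above can be scripted. The following is a minimal sketch rather than an official helper: the function names are hypothetical, and it assumes that an exit code of 0 means the scenario passed and that the container has already finished.

```python
import subprocess


def parse_exit_code(inspect_output: str) -> int:
    """Parse the text printed by `inspect --format "{{.State.ExitCode}}"`."""
    return int(inspect_output.strip())


def scenario_passed(container: str, engine: str = "podman") -> bool:
    """Return True if the finished container exited with code 0.

    Assumption: exit code 0 means pass, any other value means fail.
    `engine` can be "podman" or "docker".
    """
    result = subprocess.run(
        [engine, "inspect", container, "--format", "{{.State.ExitCode}}"],
        capture_output=True,
        text=True,
        check=True,  # raise if the inspect command itself fails
    )
    return parse_exit_code(result.stdout) == 0


if __name__ == "__main__":
    import sys

    if len(sys.argv) == 2:
        print("pass" if scenario_passed(sys.argv[1]) else "fail")
```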

Note

$ docker run $(./get_docker_params.sh) \
  --name=<container_name> \
  --net=host \
  --pull=always \
  -v <path-to-kube-config>:/home/krkn/.kube/config:Z \
  -d containers.krkn-chaos.dev/krkn-chaos/krkn-hub:node-memory-hog
$ docker run \
  -e <VARIABLE>=<value> \
  --net=host \
  --pull=always \
  -v <path-to-kube-config>:/home/krkn/.kube/config:Z \
  -d containers.krkn-chaos.dev/krkn-chaos/krkn-hub:node-memory-hog

$ docker logs -f <container_name or container_id> # Streams Kraken logs
$ docker inspect <container-name or container-id> \
  --format "{{.State.ExitCode}}" # Outputs the exit code, which can be considered as pass/fail for the scenario

Tip

Because the container runs with a non-root user, ensure the kube config is globally readable before mounting it in the container. You can achieve this with the following commands:

kubectl config view --flatten > ~/kubeconfig && chmod 444 ~/kubeconfig && docker run $(./get_docker_params.sh) --name=<container_name> --net=host --pull=always -v ~/kubeconfig:/home/krkn/.kube/config:Z -d containers.krkn-chaos.dev/krkn-chaos/krkn-hub:<scenario>

Supported parameters

The following environment variables can be set on the host running the container to tweak the scenario/faults being injected:

If --env-host is used, export the variable on the host:

export <parameter_name>=<value>

Or pass it on the command line:

-e <VARIABLE>=<value>

See the list of variables that apply to all scenarios here; they can be set in addition to these scenario-specific variables.

Parameter	Description	Default
TOTAL_CHAOS_DURATION	Set chaos duration (in sec) as desired	60
MEMORY_CONSUMPTION_PERCENTAGE	percentage (expressed with the suffix %) or amount (expressed with the suffix b, k, m or g) of memory to be consumed by the scenario	90%
NUMBER_OF_WORKERS	Total number of workers (stress-ng threads)	1
NAMESPACE	Namespace where the scenario container will be deployed	default
NODE_SELECTOR	defines the node selector for choosing target nodes. If not specified, one schedulable node in the cluster will be chosen at random. If multiple nodes match the selector, all of them will be subjected to stress. If number-of-nodes is specified, that many nodes will be randomly selected from those identified by the selector.	""
TAINTS	List of taints for which tolerations need to be created. Example: ["node-role.kubernetes.io/master:NoSchedule"]	[]
NUMBER_OF_NODES	restricts the number of selected nodes by the selector	""
IMAGE	the container image of the stress workload	quay.io/krkn-chaos/krkn-hog

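Note

If you use a custom metrics profile or alerts profile with CAPTURE_METRICS or ENABLE_ALERTS enabled, mount it from the host on which the container is run using podman/docker under /home/krkn/kraken/config/metrics-aggregated.yaml and /home/krkn/kraken/config/alerts. For example: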

$ podman run \
  --name=<container_name> \
  --net=host \
  --pull=always \
  --env-host=true \
  -v <path-to-custom-metrics-profile>:/home/krkn/kraken/config/metrics-aggregated.yaml \
  -v <path-to-custom-alerts-profile>:/home/krkn/kraken/config/alerts \
  -v <path-to-kube-config>:/home/krkn/.kube/config:Z \
  -d containers.krkn-chaos.dev/krkn-chaos/krkn-hub:node-memory-hog

krknctl run node-memory-hog [--<parameter> <value>]

You can also set any global variable listed here.

Parameter	Description	Type	Required	Default
`--chaos-duration`	Set chaos duration (in sec) as desired	number	No	60
`--memory-workers`	Total number of workers (stress-ng threads)	number	No	1
`--memory-consumption`	percentage (expressed with the suffix %) or amount (expressed with the suffix b, k, m or g) of memory to be consumed by the scenario	string	No	90%
`--namespace`	Namespace where the scenario container will be deployed	string	No	default
`--node-selector`	Node selector where the scenario containers will be scheduled, in the format "<key>=<value>". NOTE: one container is instantiated per selected node, each with the same scenario options. If left empty, a random node is selected	string	No
`--taints`	List of taints for which tolerations need to be created. For example ["node-role.kubernetes.io/master:NoSchedule"]	string	No	[]
`--number-of-nodes`	restricts the number of selected nodes by the selector	number	No
`--image`	The hog container image. Can be changed if the hog image is mirrored on a private repository	string	No	quay.io/krkn-chaos/krkn-hog

To see all available scenario options

krknctl run node-memory-hog --help

11 - HTTP Load Scenarios

HTTP Load Scenarios

This scenario generates distributed HTTP load against one or more target endpoints using Vegeta load testing pods deployed inside the Kubernetes cluster. It leverages the distributed nature of Kubernetes clusters to instantiate multiple load generator pods, significantly increasing the effectiveness of the load test.

The scenario supports multiple concurrent pods, configurable request rates, multiple HTTP methods (GET, POST, PUT, DELETE, PATCH, HEAD), custom headers, request bodies, and comprehensive results collection with aggregated metrics across all pods.

The configuration allows for the specification of multiple node selectors, enabling Kubernetes to schedule the attacker pods on a user-defined subset of nodes to make the test more realistic.

The attacker container source code is available here.

How to Run HTTP Load Scenarios

Choose your preferred method to run HTTP load scenarios:

Example scenario file: http_load_scenario.yml

Sample scenario config

- http_load_scenario:
    runs: 1                                            # number of times to execute the scenario
    number-of-pods: 2                                  # number of attacker pods instantiated
    namespace: default                                 # namespace to deploy load testing pods
    image: quay.io/krkn-chaos/krkn-http-load:latest    # http load attacker container image
    attacker-nodes:                                    # node affinity to schedule the attacker pods
      node-role.kubernetes.io/worker:                  # per each node label selector can be specified
        - ""                                           # multiple values so the kube scheduler will schedule
                                                       # the attacker pods in the best way possible
                                                       # set empty value `attacker-nodes: {}` to let kubernetes schedule the pods
    targets:                                           # Vegeta round-robins across all endpoints
      endpoints:                                       # supported methods: GET, POST, PUT, DELETE, PATCH, HEAD
        - url: "https://your-service.example.com/health"
          method: "GET"
        - url: "https://your-service.example.com/api/data"
          method: "POST"
          headers:
            Content-Type: "application/json"
            Authorization: "Bearer your-token"
          body: '{"key":"value"}'

    rate: "50/1s"                                      # request rate per pod: "50/1s", "1000/1m", "0" for max throughput
    duration: "30s"                                    # attack duration: "30s", "5m", "1h"
    workers: 10                                        # initial concurrent workers per pod
    max_workers: 100                                   # maximum workers per pod (auto-scales)
    connections: 100                                   # max idle connections per host
    timeout: "10s"                                     # per-request timeout
    keepalive: true                                    # use persistent HTTP connections
    http2: true                                        # enable HTTP/2
    insecure: false                                    # skip TLS verification (for self-signed certs)
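
Because rate applies per pod, the total load grows with number-of-pods. The sketch below estimates the number of requests a configuration would attempt. It is a rough illustration under stated assumptions: the function names are hypothetical, every pod is assumed to sustain its configured rate for the full duration, and only the s, m and h duration suffixes shown above are handled.

```python
import re

_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600}


def parse_duration(text: str) -> int:
    """Convert '30s', '5m' or '1h' into seconds."""
    match = re.fullmatch(r"(\d+)([smh])", text.strip())
    if not match:
        raise ValueError(f"unsupported duration: {text!r}")
    return int(match.group(1)) * _UNIT_SECONDS[match.group(2)]


def parse_rate(text: str):
    """Convert '50/1s' into requests per second; '0' (max throughput) gives None."""
    text = text.strip()
    if text == "0":
        return None
    requests, _, period = text.partition("/")
    return int(requests) / parse_duration(period)


def expected_requests(rate: str, duration: str, pods: int):
    """Estimate total requests across all pods, or None for max throughput."""
    per_second = parse_rate(rate)
    if per_second is None:
        return None
    return round(per_second * parse_duration(duration) * pods)


if __name__ == "__main__":
    # Values from the sample scenario config above.
    print(expected_requests("50/1s", "30s", 2))
```

For the sample config this gives 50 requests per second for 30 seconds on each of 2 pods, 3000 requests in total. Actual counts depend on the target, timeouts and worker limits.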

How to Use the Plugin

Add the plugin name to the chaos_scenarios list in the config/config.yaml file:

kraken:
    kubeconfig_path: ~/.kube/config                     # Path to kubeconfig
    ..
    chaos_scenarios:
        - http_load_scenarios:
            - scenarios/<scenario_name>.yaml

Note

You can specify multiple scenario files of the same type by adding additional paths to the list:

kraken:
    chaos_scenarios:
        - http_load_scenarios:
            - scenarios/http-load-1.yaml
            - scenarios/http-load-2.yaml
            - scenarios/http-load-3.yaml

You can also combine multiple different scenario types in the same config.yaml file. Scenario types can be specified in any order, and you can include the same scenario type multiple times:

kraken:
    chaos_scenarios:
        - http_load_scenarios:
            - scenarios/http-load.yaml
        - pod_disruption_scenarios:
            - scenarios/pod-kill.yaml
        - syn_flood_scenarios:
            - scenarios/syn-flood.yaml
        - http_load_scenarios:  # Same type can appear multiple times
            - scenarios/http-load-2.yaml

Run

python run_kraken.py --config config/config.yaml

HTTP Load scenario

This scenario generates distributed HTTP load against one or more target endpoints using Vegeta load testing pods deployed inside the cluster.

Run

$ podman run --name=<container_name> \
  --net=host \
  --pull=always \
  --env-host=true \
  -v <path-to-kube-config>:/home/krkn/.kube/config:Z \
  -e TARGET_ENDPOINTS="GET https://myapp.example.com/health" \
  -e NAMESPACE=<target_namespace> \
  -e TOTAL_CHAOS_DURATION=30s \
  -e NUMBER_OF_PODS=2 \
  -e NODE_SELECTORS="<key>=<value>;<key>=<othervalue>" \
  -d containers.krkn-chaos.dev/krkn-chaos/krkn-hub:http-load

$ podman logs -f <container_name or container_id>

$ podman inspect <container-name or container-id> \
  --format "{{.State.ExitCode}}"

Note

$ docker run $(./get_docker_params.sh) \
  --name=<container_name> \
  --net=host \
  --pull=always \
  -v <path-to-kube-config>:/home/krkn/.kube/config:Z \
  -e TARGET_ENDPOINTS="GET https://myapp.example.com/health" \
  -e NAMESPACE=<target_namespace> \
  -e TOTAL_CHAOS_DURATION=30s \
  -e NUMBER_OF_PODS=2 \
  -e NODE_SELECTORS="<key>=<value>;<key>=<othervalue>" \
  -d containers.krkn-chaos.dev/krkn-chaos/krkn-hub:http-load

$ docker logs -f <container_name or container_id>

$ docker inspect <container-name or container-id> \
  --format "{{.State.ExitCode}}"

TIP: Because the container runs with a non-root user, ensure the kube config is globally readable before mounting it in the container. You can achieve this with the following commands:

kubectl config view --flatten > ~/kubeconfig && \
chmod 444 ~/kubeconfig && \
docker run $(./get_docker_params.sh) \
  --name=<container_name> \
  --net=host \
  --pull=always \
  -v ~/kubeconfig:/home/krkn/.kube/config:Z \
  -d containers.krkn-chaos.dev/krkn-chaos/krkn-hub:http-load

Supported parameters

The following environment variables can be set on the host running the container to tweak the scenario/faults being injected:

If --env-host is used, export the variable on the host:

export <parameter_name>=<value>

Or pass it on the command line:

-e <VARIABLE>=<value>

See the list of variables that apply to all scenarios here; they can be set in addition to these scenario-specific variables.

Parameter	Description	Default
TARGET_ENDPOINTS	Semicolon-separated list of target endpoints. Format: METHOD URL;METHOD URL HEADER1:VAL1,HEADER2:VAL2 BODY. Example: GET https://myapp.example.com/health;POST https://myapp.example.com/api Content-Type:application/json {"key":"value"}	Required
RATE	Request rate per pod (e.g. 50/1s, 1000/1m, 0 for max throughput)	50/1s
TOTAL_CHAOS_DURATION	Duration of the load test (e.g. 30s, 5m, 1h)	30s
NAMESPACE	The namespace where the attacker pods will be deployed	default
NUMBER_OF_PODS	The number of attacker pods that will be deployed	2
WORKERS	Initial number of concurrent workers per pod	10
MAX_WORKERS	Maximum number of concurrent workers per pod (auto-scales)	100
CONNECTIONS	Maximum number of idle open connections per host	100
TIMEOUT	Per-request timeout (e.g. 10s, 30s)	10s
IMAGE	The container image that will be used to perform the scenario	quay.io/krkn-chaos/krkn-http-load:latest
INSECURE	Skip TLS certificate verification (for self-signed certs)	false
NODE_SELECTORS	The node selectors are used to guide the cluster on where to deploy attacker pods. You can specify one or more labels in the format key=value;key=value2 (even using the same key) to choose one or more node categories. If left empty, the pods will be scheduled on any available node, depending on the cluster’s capacity.
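
The TARGET_ENDPOINTS and NODE_SELECTORS strings can be assembled programmatically from structured data. The sketch below follows the formats given in the table; the helper names are hypothetical, and how the scenario parses values that contain spaces or semicolons (for example a header value with a space) is not covered here and should be checked against the scenario itself.

```python
def format_endpoint(method, url, headers=None, body=None):
    """Render one endpoint as 'METHOD URL [HEADER1:VAL1,HEADER2:VAL2] [BODY]'."""
    parts = [method.upper(), url]
    if headers:
        parts.append(",".join(f"{name}:{value}" for name, value in headers.items()))
    if body:
        parts.append(body)
    return " ".join(parts)


def build_target_endpoints(endpoints):
    """Join endpoint dicts into the semicolon-separated TARGET_ENDPOINTS value."""
    return ";".join(format_endpoint(**endpoint) for endpoint in endpoints)


def build_node_selectors(pairs):
    """Join (key, value) pairs as 'key=value;key=value2'; keys may repeat."""
    return ";".join(f"{key}={value}" for key, value in pairs)


if __name__ == "__main__":
    endpoints = [
        {"method": "GET", "url": "https://myapp.example.com/health"},
        {
            "method": "POST",
            "url": "https://myapp.example.com/api",
            "headers": {"Content-Type": "application/json"},
            "body": '{"key":"value"}',
        },
    ]
    print(build_target_endpoints(endpoints))
    print(build_node_selectors([("zone", "a"), ("zone", "b")]))
```

When passing the resulting values with -e on a shell command line, quote them so that the semicolons are not interpreted by the shell.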

NOTE: If you use a custom metrics profile or alerts profile with CAPTURE_METRICS or ENABLE_ALERTS enabled, mount it from the host on which the container is run using podman/docker under /home/krkn/kraken/config/metrics-aggregated.yaml and /home/krkn/kraken/config/alerts. For example:

$ podman run \
  --name=<container_name> \
  --net=host \
  --pull=always \
  --env-host=true \
  -v <path-to-custom-metrics-profile>:/home/krkn/kraken/config/metrics-aggregated.yaml \
  -v <path-to-custom-alerts-profile>:/home/krkn/kraken/config/alerts \
  -v <path-to-kube-config>:/home/krkn/.kube/config:Z \
  -d containers.krkn-chaos.dev/krkn-chaos/krkn-hub:http-load

krknctl run http-load [--<parameter> <value>]

You can also set any global variable listed here.

Scenario specific parameters:

Parameter	Description	Type	Default
`--target-endpoints`	Semicolon-separated list of target endpoints. Format: METHOD URL;METHOD URL HEADER1:VAL1,HEADER2:VAL2 BODY. Example: GET https://myapp.example.com/health;POST https://myapp.example.com/api Content-Type:application/json {"key":"value"}	string	Required
`--rate`	Request rate per pod (e.g. 50/1s, 1000/1m, 0 for max throughput)	string	50/1s
`--chaos-duration`	Duration of the load test (e.g. 30s, 5m, 1h)	string	30s
`--namespace`	The namespace where the attacker pods will be deployed	string	default
`--number-of-pods`	The number of attacker pods that will be deployed	number	2
`--workers`	Initial number of concurrent workers per pod	number	10
`--max-workers`	Maximum number of concurrent workers per pod (auto-scales)	number	100
`--connections`	Maximum number of idle open connections per host	number	100
`--timeout`	Per-request timeout (e.g. 10s, 30s)	string	10s
`--image`	The container image that will be used to perform the scenario	string	quay.io/krkn-chaos/krkn-http-load:latest
`--insecure`	Skip TLS certificate verification (for self-signed certs)	string	false
`--node-selectors`	The node selectors are used to guide the cluster on where to deploy attacker pods. You can specify one or more labels in the format key=value;key=value2 (even using the same key) to choose one or more node categories. If left empty, the pods will be scheduled on any available node, depending on the cluster's capacity.	string

To see all available scenario options

krknctl run http-load --help

12 - KubeVirt VM Outage Scenario

Simulating VM-level disruptions in KubeVirt/OpenShift CNV environments

This scenario simulates VM-level disruptions in clusters where KubeVirt or OpenShift Container-native Virtualization (CNV) is installed. It allows users to delete a Virtual Machine Instance (VMI) to simulate a VM crash and test recovery capabilities.

Purpose

The kubevirt_vm_outage scenario deletes a specific KubeVirt Virtual Machine Instance (VMI) to simulate a VM crash or outage. This helps users:

Test the resilience of applications running inside VMs
Verify that VM monitoring and recovery mechanisms work as expected
Validate high availability configurations for VM workloads
Understand the impact of sudden VM failures on workloads and the overall system

Prerequisites

Before using this scenario, ensure the following:

KubeVirt or OpenShift CNV is installed in your cluster
The target VMI exists and is running in the specified namespace
Your cluster credentials have sufficient permissions to delete and create VMIs

Parameters

The scenario supports the following parameters:

Parameter	Description	Required	Default
vm_name	The name of the VMI to delete	Yes	N/A
namespace	The namespace where the VMI is located	No	“default”
timeout	How long to wait (in seconds) for the VMI to start running again before attempting recovery	No	60
kill_count	How many VMIs to kill serially	No	1

Expected Behavior

When executed, the scenario will:

Validate that KubeVirt is installed and the target VMI exists
Save the initial state of the VMI
Delete the VMI
Wait for the VMI to become running or hit the timeout
Attempt to recover the VMI:
- If the VMI is managed by a VirtualMachine resource with runStrategy: Always, it will automatically recover
- If automatic recovery doesn’t occur, the plugin will manually recreate the VMI using the saved state
Validate that the VMI is running again

Note

If the VM is managed by a VirtualMachine resource with runStrategy: Always, KubeVirt will automatically try to recreate the VMI after deletion. In this case, the scenario will wait for this automatic recovery to complete.

Validating VMI SSH Connection

While the KubeVirt outage is running, you can enable KubeVirt checks to test the SSH connection to a list of VMIs and see whether the outage of one VMI causes any others to become unready or unreachable. See kubevirt checks for details on how to enable them.

Advanced Use Cases

Testing High Availability VM Configurations

This scenario is particularly useful for testing high availability configurations, such as:

Clustered applications running across multiple VMs
VMs with automatic restart policies
Applications with cross-VM resilience mechanisms

Recovery Strategies

The plugin implements two recovery strategies:

Automated Recovery: If the VM is managed by a VirtualMachine resource with runStrategy: Always, the plugin will wait for KubeVirt’s controller to automatically recreate the VMI.
Manual Recovery: If automatic recovery doesn’t occur within the timeout period, the plugin will attempt to manually recreate the VMI using the saved state from before the deletion.

Recovery Time Metrics in Krkn Telemetry

Krkn tracks three key recovery time metrics for each affected VMI:

pod_rescheduling_time - The time (in seconds) that the Kubernetes cluster took to reschedule the VMI after it was deleted. This measures the cluster’s scheduling efficiency and includes the time from VMI deletion until the replacement VMI is scheduled on a node.
pod_readiness_time - The time (in seconds) the VMI took to become ready after being scheduled. This measures VMI startup time, including container image pulls, VM boot process, and readiness probe success.
total_recovery_time - The total amount of time (in seconds) from VMI deletion until the replacement VMI became fully ready and available. This is the sum of rescheduling time and readiness time.

These metrics appear in the telemetry output under PodsStatus.recovered for successfully recovered VMIs. VMIs that fail to recover within the timeout period appear under PodsStatus.unrecovered without timing data.

Example telemetry output:

{
  "recovered": [
    {
      "pod_name": "virt-launcher-fedora-vm-xyz",
      "namespace": "default",
      "pod_rescheduling_time": 3.2,
      "pod_readiness_time": 12.5,
      "total_recovery_time": 15.7
    }
  ],
  "unrecovered": []
}
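
Telemetry in this shape is straightforward to post-process. The sketch below loads the example output and checks that total_recovery_time matches the sum of the other two metrics within a small tolerance. The function name and the tolerance value are assumptions made for illustration, not part of Krkn.

```python
import json
import math

# The example telemetry output shown above.
TELEMETRY = json.loads("""
{
  "recovered": [
    {
      "pod_name": "virt-launcher-fedora-vm-xyz",
      "namespace": "default",
      "pod_rescheduling_time": 3.2,
      "pod_readiness_time": 12.5,
      "total_recovery_time": 15.7
    }
  ],
  "unrecovered": []
}
""")


def inconsistent_entries(pods_status, tolerance=0.05):
    """Return names of recovered entries whose total differs from the sum of its parts."""
    names = []
    for entry in pods_status.get("recovered", []):
        expected = entry["pod_rescheduling_time"] + entry["pod_readiness_time"]
        if not math.isclose(entry["total_recovery_time"], expected, abs_tol=tolerance):
            names.append(entry["pod_name"])
    return names


if __name__ == "__main__":
    print("inconsistent:", inconsistent_entries(TELEMETRY))
    print("unrecovered:", len(TELEMETRY["unrecovered"]))
```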

Rollback Scenario Support

Krkn supports rollback for KubeVirt VM Outage Scenario. For more details, please refer to the Rollback Scenarios documentation.

Limitations

The scenario currently supports deleting a single VMI at a time
If VM spec changes during the outage window, the manual recovery may not reflect those changes
The scenario doesn’t simulate partial VM failures (e.g., VM freezing) - only complete VM outage

Troubleshooting

If the scenario fails, check the following:

Ensure KubeVirt/CNV is properly installed in your cluster
Verify that the target VMI exists and is running
Check that your credentials have sufficient permissions to delete and create VMIs
Examine the logs for specific error messages

How to Run KubeVirt VM Outage Scenarios

Choose your preferred method to run KubeVirt VM outage scenarios:

KubeVirt VM Outage Scenario in Kraken

The kubevirt_vm_outage scenario in Kraken enables users to simulate VM-level disruptions by deleting a Virtual Machine Instance (VMI) to test resilience and recovery capabilities.

Example scenario file: kubevirt-vm-outage.yaml

Implementation

This scenario is implemented in Kraken’s core repository, with the following key functionality:

Finding and validating the target VMI
Deleting the VMI using the KubeVirt API
Monitoring the recovery process
Implementing fallback recovery if needed

Usage

You can use this scenario in your Kraken configuration file as follows:

scenarios:
  - name: "kubevirt vm outage"
    scenario: kubevirt_vm_outage
    parameters:
      vm_name: <my-application-vm>
      namespace: <vm-workloads>
      timeout: 60
      kill_count: 3

Detailed Parameters

Parameter	Description	Required	Default	Example Values
vm_name	The name of the VMI to delete	Yes	N/A	“database-vm”, “web-server-vm”
namespace	The namespace where the VMI is located	No	“default”	“openshift-cnv”, “vm-workloads”
timeout	How long to wait (in seconds) for VMI to become running before attempting recovery	No	60	30, 120, 300
kill_count	How many VMIs to kill serially	No	1	3

Execution Flow

When executed, the scenario follows this process:

Initialization: Validates KubeVirt is installed and configures the KubeVirt client
VMI Validation: Checks if the target VMI exists and is in Running state
State Preservation: Saves the initial state of the VMI
Chaos Injection: Deletes the VMI using the KubeVirt API
Wait for Running: Waits for VMI to become running again, up to the timeout specified
Recovery Monitoring: Checks if the VMI is automatically restored
Manual Recovery: If automatic recovery doesn’t occur, manually recreates the VMI
Validation: Confirms the VMI is running correctly
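
The execution flow above can be sketched as a small simulation. This is not Krkn's implementation: FakeCluster is a hypothetical in-memory stand-in for the KubeVirt API, and the phase field and polling interval are assumptions chosen to keep the example runnable without a cluster.

```python
import copy
import time


class FakeCluster:
    """In-memory stand-in for the KubeVirt API (hypothetical, for illustration)."""

    def __init__(self, vmis, auto_recreate):
        self.vmis = vmis                    # {(namespace, name): spec dict}
        self.auto_recreate = auto_recreate  # emulates runStrategy: Always
        self._pending = {}                  # specs the controller will restore

    def get(self, namespace, name):
        key = (namespace, name)
        if key in self._pending:            # controller recreates the VMI
            self.vmis[key] = self._pending.pop(key)
        return self.vmis.get(key)

    def delete(self, namespace, name):
        spec = self.vmis.pop((namespace, name))
        if self.auto_recreate:
            self._pending[(namespace, name)] = spec

    def create(self, namespace, name, spec):
        self.vmis[(namespace, name)] = spec


def run_vm_outage(cluster, namespace, name, timeout, poll_interval=0.01):
    """Delete a VMI, wait for it to return, and recreate it if it does not."""
    vmi = cluster.get(namespace, name)
    if vmi is None or vmi.get("phase") != "Running":
        raise RuntimeError("target VMI must exist and be Running")
    saved = copy.deepcopy(vmi)              # state preservation
    cluster.delete(namespace, name)         # chaos injection

    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:      # wait for automatic recovery
        current = cluster.get(namespace, name)
        if current and current.get("phase") == "Running":
            return "automatic"
        time.sleep(poll_interval)

    cluster.create(namespace, name, saved)  # manual recovery from saved state
    restored = cluster.get(namespace, name)
    if not restored or restored.get("phase") != "Running":
        raise RuntimeError("VMI did not recover")
    return "manual"


if __name__ == "__main__":
    for auto in (True, False):
        cluster = FakeCluster({("default", "my-vm"): {"phase": "Running"}}, auto)
        print(auto, run_vm_outage(cluster, "default", "my-vm", timeout=0.05))
```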

Sample Configuration

Here’s an example configuration for using the kubevirt_vm_outage scenario:

scenarios:
  - name: "kubevirt outage test"
    scenario: kubevirt_vm_outage
    parameters:
      vm_name: my-vm
      namespace: kubevirt
      timeout: 60
      kill_count: 3

For multiple VMs in different namespaces:

scenarios:
  - name: "kubevirt outage test - app VM"
    scenario: kubevirt_vm_outage
    parameters:
      vm_name: app-vm
      namespace: application
      timeout: 120
      kill_count: 1
  
  - name: "kubevirt outage test - database VM"
    scenario: kubevirt_vm_outage
    parameters:
      vm_name: db-vm
      namespace: database
      timeout: 180
      kill_count: 2

Combining with Other Scenarios

For more comprehensive testing, you can combine this scenario with other Kraken scenarios in the list of chaos_scenarios in the config file:

kraken:
    kubeconfig_path: ~/.kube/config                     # Path to kubeconfig
    ...
    chaos_scenarios:
        - hog_scenarios:
            - scenarios/kube/cpu-hog.yml
        - kubevirt_vm_outage:
            - scenarios/kubevirt/kubevirt-vm-outage.yaml

Note

You can specify multiple scenario files of the same type by adding additional paths to the list:

kraken:
    chaos_scenarios:
        - kubevirt_vm_outage:
            - scenarios/kubevirt/kubevirt-vm-outage-1.yaml
            - scenarios/kubevirt/kubevirt-vm-outage-2.yaml
            - scenarios/kubevirt/kubevirt-vm-outage-3.yaml

You can also combine multiple different scenario types in the same config.yaml file. Scenario types can be specified in any order, and you can include the same scenario type multiple times:

kraken:
    chaos_scenarios:
        - kubevirt_vm_outage:
            - scenarios/kubevirt/kubevirt-vm-outage.yaml
        - pod_disruption_scenarios:
            - scenarios/pod-kill.yaml
        - node_scenarios:
            - scenarios/node-reboot.yaml
        - kubevirt_vm_outage:  # Same type can appear multiple times
            - scenarios/kubevirt/kubevirt-vm-outage-2.yaml

Run

python run_kraken.py --config config/config.yaml

This scenario deletes a VMI matching the namespace and name on a Kubernetes/OpenShift cluster.

Run

$ podman run \
  --name=<container_name> \
  --net=host \
  --pull=always \
  --env-host=true \
  -v <path-to-kube-config>:/home/krkn/.kube/config:Z \
  -d containers.krkn-chaos.dev/krkn-chaos/krkn-hub:kubevirt-outage
$ podman logs -f <container_name or container_id> # Streams Kraken logs
$ podman inspect <container-name or container-id> \
  --format "{{.State.ExitCode}}" # Outputs the exit code, which can be considered as pass/fail for the scenario

Note

$ docker run $(./get_docker_params.sh) \
  --name=<container_name> \
  --net=host \
  --pull=always \
  -v <path-to-kube-config>:/home/krkn/.kube/config:Z \
  -d containers.krkn-chaos.dev/krkn-chaos/krkn-hub:kubevirt-outage
$ docker run \
  -e <VARIABLE>=<value> \
  --net=host \
  --pull=always \
  -v <path-to-kube-config>:/home/krkn/.kube/config:Z \
  -d containers.krkn-chaos.dev/krkn-chaos/krkn-hub:kubevirt-outage

$ docker logs -f <container_name or container_id> # Streams Kraken logs
$ docker inspect <container-name or container-id> \
  --format "{{.State.ExitCode}}" # Outputs the exit code, which can be considered as pass/fail for the scenario

Tip

Because the container runs with a non-root user, ensure the kube config is globally readable before mounting it in the container. You can achieve this with the following commands:

kubectl config view --flatten > ~/kubeconfig && chmod 444 ~/kubeconfig && docker run $(./get_docker_params.sh) --name=<container_name> --net=host --pull=always -v ~/kubeconfig:/home/krkn/.kube/config:Z -d containers.krkn-chaos.dev/krkn-chaos/krkn-hub:<scenario>
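
As a quick way to confirm the permission requirement before mounting the file, the sketch below checks whether the "others" read bit is set, which is what `chmod 444` guarantees. `is_world_readable` is a hypothetical helper for illustration; the demo uses a throwaway temporary file in place of ~/kubeconfig.

```python
import os
import stat
import tempfile


def is_world_readable(path):
    """Return True if the 'others' read bit is set on the file (e.g. mode 444)."""
    mode = os.stat(path).st_mode
    return bool(mode & stat.S_IROTH)


# Demo on a temporary file standing in for ~/kubeconfig.
with tempfile.NamedTemporaryFile(delete=False) as handle:
    demo_path = handle.name

os.chmod(demo_path, 0o444)  # same mode as `chmod 444 ~/kubeconfig`
print(is_world_readable(demo_path))

os.chmod(demo_path, 0o600)  # owner-only: other users may be unable to read it
print(is_world_readable(demo_path))

os.remove(demo_path)
```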

Supported parameters

The following environment variables can be set on the host running the container to tweak the scenario/faults being injected:

Example if --env-host is used:

export <parameter_name>=<value>

Or pass each variable on the command line, for example:

-e <VARIABLE>=<value>

See the list of variables that apply to all scenarios; they can be set in addition to the scenario-specific variables below.

Parameter	Description	Type	Default
NAMESPACE	VMI Namespace to target	string	""
VM_NAME	VMI name to delete, supports regex	string	""
TIMEOUT	Timeout to wait for VMI to start running again, will fail if timeout is hit	number	60
KILL_COUNT	Number of VMIs to kill (performed serially)	number	1

Note: If you use a custom metrics profile or alerts profile when `CAPTURE_METRICS` or `ENABLE_ALERTS` is enabled, mount the profile from the host on which the container is run (using podman/docker) at `/home/krkn/kraken/config/metrics-aggregated.yaml` and `/home/krkn/kraken/config/alerts` respectively.
For example:

$ podman run \
  --name=<container_name> \
  --net=host \
  --pull=always \
  --env-host=true \
  -v <path-to-custom-metrics-profile>:/home/krkn/kraken/config/metrics-aggregated.yaml \
  -v <path-to-custom-alerts-profile>:/home/krkn/kraken/config/alerts \
  -v <path-to-kube-config>:/home/krkn/.kube/config:Z \
  -d containers.krkn-chaos.dev/krkn-chaos/krkn-hub:kubevirt-outage
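
When --env-host is not used, each parameter from the table above has to be passed with -e. The Python sketch below assembles such a command line from a dictionary; `podman_run_args`, the container name, the kubeconfig path, and the VM name are hypothetical values chosen for illustration, while the flags and image are the ones shown in the commands above.

```python
def podman_run_args(name, kubeconfig, image, env=None):
    """Build the argv for a detached `podman run`, passing parameters with -e."""
    args = ["podman", "run", f"--name={name}", "--net=host", "--pull=always"]
    for key, value in (env or {}).items():
        args += ["-e", f"{key}={value}"]
    args += ["-v", f"{kubeconfig}:/home/krkn/.kube/config:Z", "-d", image]
    return args


cmd = podman_run_args(
    name="kubevirt-outage-test",   # hypothetical container name
    kubeconfig="/tmp/kubeconfig",  # hypothetical kubeconfig path
    image="containers.krkn-chaos.dev/krkn-chaos/krkn-hub:kubevirt-outage",
    env={"NAMESPACE": "default", "VM_NAME": "my-vm", "TIMEOUT": 120, "KILL_COUNT": 1},
)
print(" ".join(cmd))
```

The resulting list can be handed to `subprocess.run(cmd)` on a host where podman is installed.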

krknctl run kubevirt-outage [--<parameter> <value>]

Can also set any global variable listed here

Scenario specific parameters: (be sure to scroll to right)

Parameter	Description	Type	Required	Default
`--namespace`	VMI Namespace to target	string	Yes	default
`--vm-name`	Name of the VM to delete	string	Yes
`--timeout`	Time that scenario will wait for VM to come back	number	No	60
`--kill-count`	Number of VMIs to kill (performed serially)	number	No	1

Behavior Notes

VM recovery: After krkn deletes the VM, the KubeVirt controller automatically recreates the VMI unless runStrategy is set to Manual. The --timeout parameter controls how long krkn waits for the VM to come back before reporting failure.
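
The timeout behavior described above amounts to polling until the VM is running again or the deadline passes. The following Python sketch shows that pattern in isolation; it is not krkn's implementation, and `wait_until_running` and the fake check function are hypothetical names used only for this example.

```python
import time


def wait_until_running(is_running, timeout, interval=1.0,
                       sleep=time.sleep, clock=time.monotonic):
    """Poll is_running() until it returns True or `timeout` seconds elapse."""
    deadline = clock() + timeout
    while True:
        if is_running():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


# Demo with a fake check that reports "running" on the third poll.
polls = {"count": 0}


def fake_vmi_is_running():
    polls["count"] += 1
    return polls["count"] >= 3


# sleep is replaced with a no-op so the demo finishes immediately.
recovered = wait_until_running(fake_vmi_is_running, timeout=60, sleep=lambda _: None)
print("recovered" if recovered else "timed out")
```

A run that returns False corresponds to the scenario reporting failure because the VM did not come back within --timeout.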

To see all available scenario options

krknctl run kubevirt-outage --help

13 - ManagedCluster Scenarios

ManagedCluster scenarios provide a way to integrate kraken with Open Cluster Management (OCM) and Red Hat Advanced Cluster Management for Kubernetes (ACM).

ManagedCluster scenarios leverage ManifestWorks to inject faults into the ManagedClusters.

The following ManagedCluster chaos scenarios are supported:

managedcluster_start_scenario: Scenario to start the ManagedCluster instance.
managedcluster_stop_scenario: Scenario to stop the ManagedCluster instance.
managedcluster_stop_start_scenario: Scenario to stop and then start the ManagedCluster instance.
start_klusterlet_scenario: Scenario to start the klusterlet of the ManagedCluster instance.
stop_klusterlet_scenario: Scenario to stop the klusterlet of the ManagedCluster instance.
stop_start_klusterlet_scenario: Scenario to stop and start the klusterlet of the ManagedCluster instance.

ManagedCluster scenarios can be injected by placing the ManagedCluster scenario config files under the managedcluster_scenarios option in the Kraken config. Refer to the managedcluster_scenarios_example config file.

managedcluster_scenarios:
  - actions:                                                        # ManagedCluster chaos scenarios to be injected
    - managedcluster_stop_start_scenario
    managedcluster_name: cluster1                                   # ManagedCluster on which scenario has to be injected; can set multiple names separated by comma
    # label_selector:                                               # When managedcluster_name is not specified, a ManagedCluster with matching label_selector is selected for ManagedCluster chaos scenario injection
    instance_count: 1                                               # Number of managedcluster to perform action/select that match the label selector
    runs: 1                                                         # Number of times to inject each scenario under actions (will perform on same ManagedCluster each time)
    timeout: 420                                                    # Duration to wait for completion of ManagedCluster scenario injection
                                                                    # For OCM to detect a ManagedCluster as unavailable, have to wait 5*leaseDurationSeconds
                                                                    # (default leaseDurationSeconds = 60 sec)
  - actions:
    - stop_start_klusterlet_scenario
    managedcluster_name: cluster1
    # label_selector:
    instance_count: 1
    runs: 1
    timeout: 60
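
The comment on the first timeout value can be checked with a little arithmetic: with the default leaseDurationSeconds of 60, OCM needs 5 * 60 = 300 seconds to mark a ManagedCluster as unavailable, so a timeout of 420 leaves some headroom. The sketch below spells this out; the function name is a hypothetical one chosen for illustration.

```python
def unavailable_detection_seconds(lease_duration_seconds=60, multiplier=5):
    """Seconds OCM waits before treating a ManagedCluster as unavailable."""
    return multiplier * lease_duration_seconds


detection = unavailable_detection_seconds()  # 5 * 60 = 300
timeout = 420                                # value from the first scenario above
print(f"detection window: {detection}s, headroom: {timeout - detection}s")
```

If leaseDurationSeconds is changed on the cluster, the timeout should be adjusted so it stays above the detection window.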

14 - Network Chaos NG Scenarios

This scenario introduces a new infrastructure to refactor and port the current implementation of the network chaos plugins.

Available Scenarios

Network Chaos NG scenarios:

14.1 - Network Chaos API

`AbstractNetworkChaosModule` abstract module class

All plugins must implement the AbstractNetworkChaosModule abstract class in order to be instantiated and run by the Network Chaos NG plugin. This abstract class declares two main abstract methods:

run(self, target: str, kubecli: KrknTelemetryOpenshift, error_queue: queue.Queue = None) is the entrypoint for each Network Chaos module. If the module is configured to run in parallel, error_queue must not be None.
- target: the name of the resource (Pod, Node, etc.) that will be targeted by the scenario
- kubecli: the KrknTelemetryOpenshift instance the scenario needs to access the krkn-lib methods
- error_queue: a queue the plugin uses to push the errors raised during the execution of parallel modules

get_config(self) -> (NetworkChaosScenarioType, BaseNetworkChaosConfig) returns the common subset of settings shared by all the scenarios (BaseNetworkChaosConfig) and the type of Network Chaos scenario that is running (Pod Scenario or Node Scenario).
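
The shape of such a module can be sketched as follows. This is a self-contained illustration, not the krkn source: `NetworkChaosScenarioType`, `BaseNetworkChaosConfig`, and `KrknTelemetryOpenshift` are redefined here as minimal stand-ins (including the enum member names) so the example runs on its own, and `ExampleModule` is hypothetical.

```python
import abc
import queue
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Tuple


# Stand-ins for the real krkn / krkn-lib types (assumptions for this sketch).
class NetworkChaosScenarioType(Enum):
    Node = 1
    Pod = 2


@dataclass
class BaseNetworkChaosConfig:
    id: str
    test_duration: int


class KrknTelemetryOpenshift:  # placeholder for the krkn-lib client
    pass


class AbstractNetworkChaosModule(abc.ABC):
    @abc.abstractmethod
    def run(self, target: str, kubecli: KrknTelemetryOpenshift,
            error_queue: Optional[queue.Queue] = None):
        """Entrypoint: inject the fault into `target`."""

    @abc.abstractmethod
    def get_config(self) -> Tuple[NetworkChaosScenarioType, BaseNetworkChaosConfig]:
        """Return the scenario type and the shared base configuration."""


class ExampleModule(AbstractNetworkChaosModule):  # hypothetical module
    def __init__(self, config: BaseNetworkChaosConfig):
        self.config = config

    def run(self, target, kubecli, error_queue=None):
        try:
            print(f"[{self.config.id}] injecting fault into {target}")
        except Exception as exc:
            # In parallel mode, errors are reported through the queue.
            if error_queue is None:
                raise
            error_queue.put(str(exc))

    def get_config(self):
        return NetworkChaosScenarioType.Node, self.config


module = ExampleModule(BaseNetworkChaosConfig(id="example", test_duration=60))
module.run("worker-0", KrknTelemetryOpenshift(), error_queue=queue.Queue())
```

Because both methods are abstract, Python refuses to instantiate a subclass that omits either of them.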

`BaseNetworkChaosConfig` base module configuration

This is the base class that contains the common parameters shared by all the Network Chaos NG modules.

id: the string name of the Network Chaos NG module
wait_duration: if there is more than one network module config in the same config file, the plugin waits wait_duration seconds before running the next one
test_duration: the duration of the scenario in seconds
label_selector: the selector used to target the resource
instance_count: if greater than 0, randomly picks instance_count elements from the targets selected by the filters
execution: if the selector matches more than one target, the scenario can target the resources either in serial or in parallel
namespace: the namespace where the scenario workloads will be deployed
service_account: optional service account for the scenario workload (empty string uses the cluster default)
taints: list of taints for which tolerations need to be created. Example: ["node-role.kubernetes.io/master:NoSchedule"]
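
The parameters above can be pictured as a small data class, together with the random selection that instance_count describes. The sketch below is an illustration only: `BaseConfigSketch` and `pick_targets` are hypothetical names, the default values are assumptions, and treating instance_count 0 as "keep every matching target" is an interpretation of the description rather than confirmed behavior.

```python
import random
from dataclasses import dataclass, field
from typing import List


@dataclass
class BaseConfigSketch:  # hypothetical mirror of the documented fields
    id: str
    wait_duration: int = 0
    test_duration: int = 60
    label_selector: str = ""
    instance_count: int = 0
    execution: str = "parallel"
    namespace: str = "default"
    service_account: str = ""
    taints: List[str] = field(default_factory=list)


def pick_targets(candidates, instance_count, rng=random):
    """Pick instance_count random targets; 0 or less keeps every candidate."""
    if instance_count <= 0 or instance_count >= len(candidates):
        return list(candidates)
    return rng.sample(list(candidates), instance_count)


config = BaseConfigSketch(id="node_interface_down", instance_count=1)
nodes = ["worker-0", "worker-1", "worker-2"]  # hypothetical node names
print(pick_targets(nodes, config.instance_count))
```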

14.2 - Node Interface Down

Brings one or more network interfaces down on a target node for a configurable duration, then restores them. Can be used to simulate network partitions, NIC failures, or loss of connectivity at the node level.

How to Run Node Interface Down Scenarios

Choose your preferred method to run node interface down scenarios:

Example scenario file: node_interface_down.yaml

Configuration

- id: node_interface_down
  image: quay.io/krkn-chaos/krkn-network-chaos:latest
  wait_duration: 0
  test_duration: 60
  label_selector: "node-role.kubernetes.io/worker="
  instance_count: 1
  execution: parallel
  namespace: default
  # scenario specific settings
  target: ""
  interfaces: []
  recovery_time: 30
  taints: []

For the common module settings please refer to the documentation.

target: the node name to target (used when label_selector is not set)
interfaces: a list of network interface names to bring down (e.g. ["eth0", "bond0"]). Leave empty to auto-detect the node’s default interface
recovery_time: seconds to wait after bringing the interface(s) back up before continuing. Set to 0 to skip the recovery wait

Usage

To enable node interface down scenarios, edit the kraken config file: go to the kraken -> chaos_scenarios section of the YAML structure, add a new element named network_chaos_ng_scenarios to the list, then add the path to the scenario YAML file under it.

kraken:
    ...
    chaos_scenarios:
        - network_chaos_ng_scenarios:
            - scenarios/openshift/node_interface_down.yaml

Note

You can specify multiple scenario files of the same type by adding additional paths to the list:

kraken:
    chaos_scenarios:
        - network_chaos_ng_scenarios:
            - scenarios/openshift/node_interface_down-1.yaml
            - scenarios/openshift/node_interface_down-2.yaml

You can also combine multiple different scenario types in the same config.yaml file. Scenario types can be specified in any order, and you can include the same scenario type multiple times:

kraken:
    chaos_scenarios:
        - network_chaos_ng_scenarios:
            - scenarios/openshift/node_interface_down.yaml
        - pod_disruption_scenarios:
            - scenarios/pod-kill.yaml
        - node_scenarios:
            - scenarios/node-reboot.yaml

Run

python run_kraken.py --config config/config.yaml

Run

$ podman run --name=<container_name> --net=host --pull=always --env-host=true -v <path-to-kube-config>:/home/krkn/.kube/config:Z -d quay.io/krkn-chaos/krkn-hub:node-interface-down
$ podman logs -f <container_name or container_id> # Streams Kraken logs
$ podman inspect <container-name or container-id> --format "{{.State.ExitCode}}" # Outputs the exit code, which can be considered the pass/fail result for the scenario

$ docker run $(./get_docker_params.sh) --name=<container_name> --net=host --pull=always -v <path-to-kube-config>:/home/krkn/.kube/config:Z -d quay.io/krkn-chaos/krkn-hub:node-interface-down
OR
$ docker run -e <VARIABLE>=<value> --net=host --pull=always -v <path-to-kube-config>:/home/krkn/.kube/config:Z -d quay.io/krkn-chaos/krkn-hub:node-interface-down
$ docker logs -f <container_name or container_id> # Streams Kraken logs
$ docker inspect <container-name or container-id> --format "{{.State.ExitCode}}" # Outputs the exit code, which can be considered the pass/fail result for the scenario

TIP: Because the container runs with a non-root user, ensure the kube config is globally readable before mounting it in the container. You can achieve this with the following commands:

kubectl config view --flatten > ~/kubeconfig && chmod 444 ~/kubeconfig && docker run $(./get_docker_params.sh) --name=<container_name> --net=host --pull=always -v ~/kubeconfig:/home/krkn/.kube/config:Z -d quay.io/krkn-chaos/krkn-hub:<scenario>

Supported parameters

The following environment variables can be set on the host running the container to tweak the scenario/faults being injected:

ex.) export <parameter_name>=<value>

See the list of variables that apply to all scenarios; they can be set in addition to the scenario-specific variables below.

Parameter	Description	Default
TOTAL_CHAOS_DURATION	Duration in seconds to keep the interface(s) down	60
RECOVERY_TIME	Seconds to wait after bringing the interface(s) back up	0
NODE_SELECTOR	Label selector to choose target nodes. If not specified, a schedulable node will be chosen at random	""
NODE_NAME	The node name to target (used when label selector is not set)
INSTANCE_COUNT	Restricts the number of nodes selected by the label selector	1
EXECUTION	Execution mode for multiple nodes: `serial` or `parallel`	parallel
INTERFACES	Comma-separated list of interface names to bring down (e.g. `eth0` or `eth0,bond0`). Leave empty to auto-detect the default interface	""
NAMESPACE	Namespace where the chaos workload pod will be deployed	default
TAINTS	List of taints for which tolerations need to be created. Example: `["node-role.kubernetes.io/master:NoSchedule"]`	""
SERVICE_ACCOUNT	Optional service account for the chaos workload pod	""

$ podman run --name=<container_name> --net=host --pull=always --env-host=true -v <path-to-custom-metrics-profile>:/home/krkn/kraken/config/metrics-aggregated.yaml -v <path-to-custom-alerts-profile>:/home/krkn/kraken/config/alerts -v <path-to-kube-config>:/home/krkn/.kube/config:Z -d quay.io/krkn-chaos/krkn-hub:node-interface-down

krknctl run node-interface-down [--<parameter> <value>]

Can also set any global variable listed here

Node Interface Down Parameters

Argument	Type	Description	Required	Default Value
`--chaos-duration`	number	Duration in seconds to keep the interface(s) down	false	60
`--recovery-time`	number	Seconds to wait after bringing the interface(s) back up before continuing	false	0
`--node-selector`	string	Label selector to choose target nodes	false	node-role.kubernetes.io/worker=
`--node-name`	string	Node name to target (used when node-selector is not set)	false
`--namespace`	string	Namespace where the chaos workload pod will be deployed	false	default
`--instance-count`	number	Number of nodes to target from those matching the selector	false	1
`--execution`	enum	Execution mode when targeting multiple nodes: `serial` or `parallel`	false	parallel
`--interfaces`	string	Comma-separated list of interface names to bring down. Leave empty to auto-detect the default interface	false
`--image`	string	The chaos workload container image	false	quay.io/redhat-chaos/krkn-ng-tools:latest
`--taints`	string	List of taints for which tolerations need to be created	false

14.3 - Node Network Chaos

Injects network degradation (latency, packet loss, bandwidth) into a target node’s network interfaces using Linux tc rules.

Injects network degradation (latency, packet loss, bandwidth restriction) into a target node’s network interfaces using Linux tc (traffic control) rules. Unlike node-network-filter which blocks specific ports via iptables, this module shapes traffic at the interface level. Includes safety checks for existing tc rules on the node.

How to Run Node Network Chaos Scenarios

Choose your preferred method to run node network chaos scenarios:

Configuration

- id: node_network_chaos
  image: "quay.io/krkn-chaos/krkn-network-chaos:latest"
  wait_duration: 1
  test_duration: 60
  label_selector: ""
  service_account: ""
  instance_count: 1
  execution: parallel
  namespace: default
  # scenario specific settings
  target: "<node_name>"
  interfaces: []
  ingress: true
  egress: true
  latency: ""         # empty string to skip; or e.g. 100ms (units: us, ms, s)
  loss: 10           # percentage (no % symbol)
  bandwidth: 1gbit   # supported units: bit, kbit, mbit, gbit, tbit
  force: false
  taints: []

For the common module settings please refer to the documentation.

latency: network latency to inject. Format: integer followed by us (microseconds), ms (milliseconds), or s (seconds). Example: 100ms. Set to empty string to skip.
loss: packet loss percentage as a plain integer (no % symbol). Example: 10 means 10% packet loss. Set to empty string to skip.
bandwidth: bandwidth limit. Format: integer followed by bit, kbit, mbit, gbit, or tbit. Example: 100mbit. Set to empty string to skip.
interfaces: list of network interface names to target. Leave empty to auto-detect the node’s default interface.
ingress: apply rules to incoming traffic (default: true)
egress: apply rules to outgoing traffic (default: true)
target: the node name to target (used when label_selector is not set)
force: by default (false), if the target node already has tc rules configured, the scenario aborts with a warning to avoid damaging cluster networking. Set to true to override existing rules. A 10-second warning delay is inserted before proceeding. Use with caution.
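
The value formats described above can be checked before a scenario file is used. The following Python sketch validates latency, loss, and bandwidth against those formats; `validate_degradation` is a hypothetical helper, and the 0 to 100 range for loss is an assumption based on it being a percentage.

```python
import re

LATENCY_RE = re.compile(r"^\d+(us|ms|s)$")
BANDWIDTH_RE = re.compile(r"^\d+(bit|kbit|mbit|gbit|tbit)$")


def validate_degradation(latency="", loss="", bandwidth=""):
    """Return a list of format problems; an empty value means 'skip'."""
    problems = []
    if latency != "" and not LATENCY_RE.match(str(latency)):
        problems.append(f"latency {latency!r}: expected e.g. 100ms")
    if loss != "":
        # Assumption: a percentage must be a whole number between 0 and 100.
        if not str(loss).isdigit() or not 0 <= int(loss) <= 100:
            problems.append(f"loss {loss!r}: expected an integer percentage")
    if bandwidth != "" and not BANDWIDTH_RE.match(str(bandwidth)):
        problems.append(f"bandwidth {bandwidth!r}: expected e.g. 100mbit")
    return problems


# Values from the example configuration above.
print(validate_degradation(latency="", loss=10, bandwidth="1gbit"))
# Values that do not follow the documented formats.
print(validate_degradation(latency="100", loss="10%", bandwidth="1GB"))
```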

Usage

To enable node network chaos scenarios, edit the kraken config file: go to the kraken -> chaos_scenarios section of the YAML structure, add a new element named network_chaos_ng_scenarios to the list, then add the path to the scenario YAML file under it.

kraken:
    ...
    chaos_scenarios:
        - network_chaos_ng_scenarios:
            - scenarios/kube/node-network-chaos.yml

Note

You can specify multiple scenario files of the same type by adding additional paths to the list:

kraken:
    chaos_scenarios:
        - network_chaos_ng_scenarios:
            - scenarios/kube/node-network-chaos-1.yml
            - scenarios/kube/node-network-chaos-2.yml

You can also combine multiple different scenario types in the same config.yaml file. Scenario types can be specified in any order, and you can include the same scenario type multiple times:

kraken:
    chaos_scenarios:
        - network_chaos_ng_scenarios:
            - scenarios/kube/node-network-chaos.yml
        - pod_disruption_scenarios:
            - scenarios/pod-kill.yaml
        - node_scenarios:
            - scenarios/node-reboot.yaml

Warning

When force is set to false (default), the scenario will check if the target node already has complex tc queueing disciplines configured. If existing rules are detected, the scenario aborts to prevent damaging cluster networking. Only set force: true if you understand the implications of overriding existing traffic control rules.
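
The decision described in this warning can be summarized in a few lines. The sketch below is a simplified model of that logic, not the module's code: `should_proceed` is a hypothetical function, and how existing tc rules are detected is left outside the example.

```python
import time


def should_proceed(has_existing_tc_rules, force=False, warn_delay=10, sleep=time.sleep):
    """Decide whether new tc rules may be applied to a node."""
    if not has_existing_tc_rules:
        return True
    if not force:
        print("existing tc rules found, aborting (set force: true to override)")
        return False
    print(f"force is set: overriding existing tc rules in {warn_delay}s")
    sleep(warn_delay)  # warning delay before proceeding
    return True


# sleep is replaced with a no-op so the demo does not actually wait.
print(should_proceed(has_existing_tc_rules=True, force=False))
print(should_proceed(has_existing_tc_rules=True, force=True, sleep=lambda _: None))
```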

Run

python run_kraken.py --config config/config.yaml

Not yet supported

node_network_chaos is not currently available as a krkn-hub container image. Use the Krkn tab to run this scenario directly.

Not yet supported

node_network_chaos is not currently available via krknctl. Use the Krkn tab to run this scenario directly.

Example scenario file: node-network-chaos.yml

14.4 - Node Network Filter

Creates iptables rules on one or more nodes to block incoming and outgoing traffic on a port in the node network interface. Can be used to block network based services connected to the node or to block inter-node communication.

How to Run Node Network Filter Scenarios

Choose your preferred method to run node network filter scenarios:

Example scenario file: node-network-filter.yml

Configuration

- id: node_network_filter
  wait_duration: 300
  test_duration: 100
  label_selector: "kubernetes.io/hostname=ip-10-0-39-182.us-east-2.compute.internal"
  instance_count: 1
  execution: parallel
  namespace: 'default'
  # scenario specific settings
  ingress: false
  egress: true
  target: node-name
  interfaces: []
  protocols:
    - tcp
  ports:
    - 2049
  taints: []
  service_account: ""

For the common module settings please refer to the documentation.

ingress: filters incoming traffic on one or more ports
egress: filters outgoing traffic on one or more ports
target: the node name (if label_selector is not set)
interfaces: network interfaces used for outgoing traffic when egress is enabled (same semantics as krknctl and krkn-hub)
ports: ports that incoming and/or outgoing filtering applies to (depending on ingress / egress)
protocols: the IP protocols to filter (tcp and udp)
taints: list of taints for which tolerations are created. Example: ["node-role.kubernetes.io/master:NoSchedule"]
service_account: optional service account for the scenario workload (empty string uses the default)

Usage

To enable node network filter scenarios, edit the kraken config file: go to the kraken -> chaos_scenarios section of the YAML structure, add a new element named network_chaos_ng_scenarios to the list, then add the path to the scenario YAML file under it.

kraken:
    ...
    chaos_scenarios:
        - network_chaos_ng_scenarios:
            - scenarios/kube/node-network-filter.yml

Note

You can specify multiple scenario files of the same type by adding additional paths to the list:

kraken:
    chaos_scenarios:
        - network_chaos_ng_scenarios:
            - scenarios/kube/node-network-filter-1.yml
            - scenarios/kube/node-network-filter-2.yml
            - scenarios/kube/node-network-filter-3.yml

You can also combine multiple different scenario types in the same config.yaml file. Scenario types can be specified in any order, and you can include the same scenario type multiple times:

kraken:
    chaos_scenarios:
        - network_chaos_ng_scenarios:
            - scenarios/kube/node-network-filter.yml
        - pod_disruption_scenarios:
            - scenarios/pod-kill.yaml
        - node_scenarios:
            - scenarios/node-reboot.yaml
        - network_chaos_ng_scenarios:  # Same type can appear multiple times
            - scenarios/kube/node-network-filter-2.yml

Examples

Please refer to the use cases section for some real usage scenarios.

Run

python run_kraken.py --config config/config.yaml

Run

$ podman run --name=<container_name> --net=host --pull=always --env-host=true -v <path-to-kube-config>:/home/krkn/.kube/config:Z -d quay.io/krkn-chaos/krkn-hub:node-network-filter
$ podman logs -f <container_name or container_id> # Streams Kraken logs
$ podman inspect <container-name or container-id> --format "{{.State.ExitCode}}" # Outputs the exit code, which can be considered the pass/fail result for the scenario

$ docker run $(./get_docker_params.sh) --name=<container_name> --net=host --pull=always -v <path-to-kube-config>:/home/krkn/.kube/config:Z -d quay.io/krkn-chaos/krkn-hub:node-network-filter
OR 
$ docker run -e <VARIABLE>=<value> --net=host --pull=always -v <path-to-kube-config>:/home/krkn/.kube/config:Z -d quay.io/krkn-chaos/krkn-hub:node-network-filter
$ docker logs -f <container_name or container_id> # Streams Kraken logs
$ docker inspect <container-name or container-id> --format "{{.State.ExitCode}}" # Outputs the exit code, which can be considered the pass/fail result for the scenario

TIP: Because the container runs with a non-root user, ensure the kube config is globally readable before mounting it in the container. You can achieve this with the following commands:

kubectl config view --flatten > ~/kubeconfig && chmod 444 ~/kubeconfig && docker run $(./get_docker_params.sh) --name=<container_name> --net=host --pull=always -v ~/kubeconfig:/home/krkn/.kube/config:Z -d quay.io/krkn-chaos/krkn-hub:<scenario>

Supported parameters

The following environment variables can be set on the host running the container to tweak the scenario/faults being injected:

ex.) export <parameter_name>=<value>

See the list of variables that apply to all scenarios; they can be set in addition to the scenario-specific variables below.

Parameter	Description	Type	Default
TOTAL_CHAOS_DURATION	set chaos duration (in sec) as desired	number	60
NODE_SELECTOR	defines the node selector for choosing target nodes. If not specified, one schedulable node in the cluster will be chosen at random. If multiple nodes match the selector, all of them will be subjected to stress.	string	""
NODE_NAME	the node name to target (if label selector not selected)	string
INSTANCE_COUNT	restricts the number of nodes selected by the selector	number	1
EXECUTION	sets the execution mode of the scenario on multiple nodes, can be parallel or serial	enum	parallel
INGRESS	sets the network filter on incoming traffic, can be true or false	boolean	false
EGRESS	sets the network filter on outgoing traffic, can be true or false	boolean	false
INTERFACES	a list of comma separated names of network interfaces (eg. eth0 or eth0,eth1,eth2) to filter for outgoing traffic	string	""
PORTS	a list of comma separated port numbers (eg 8080 or 8080,8081,8082) to filter for both outgoing and incoming traffic	string	""
PROTOCOLS	a list of comma separated protocols to filter (tcp, udp or both)	string
TAINTS	List of taints for which tolerations need to be created. Example: ["node-role.kubernetes.io/master:NoSchedule"]	string	[]
SERVICE_ACCOUNT	optional service account for the Node Network Filter workload	string	""

$ podman run --name=<container_name> --net=host --pull=always --env-host=true -v <path-to-custom-metrics-profile>:/home/krkn/kraken/config/metrics-aggregated.yaml -v <path-to-custom-alerts-profile>:/home/krkn/kraken/config/alerts -v <path-to-kube-config>:/home/krkn/.kube/config:Z -d quay.io/krkn-chaos/krkn-hub:node-network-filter

krknctl run node-network-filter [--<parameter> <value>]

Can also set any global variable listed here

Node Network Filter Parameters

krknctl marks --ingress and --egress as required flags (you should pass both). Values: at least one of --ingress or --egress must be true; both may be true to filter incoming and outgoing traffic.

Argument	Type	Description	Required	Default Value
`--chaos-duration`	number	Chaos duration in seconds	false	60
`--node-selector`	string	Node label selector (format: `key=value`)	false
`--node-name`	string	Specific node name to target (alternative to node-selector)	false
`--namespace`	string	Namespace where the scenario container is deployed	false	default
`--instance-count`	number	Number of nodes to target when using node-selector	false	1
`--execution`	enum	Execution mode: `parallel` or `serial`	false	parallel
`--ingress`	boolean	Filter incoming traffic (`true` / `false`)	true
`--egress`	boolean	Filter outgoing traffic (`true` / `false`)	true
`--interfaces`	string	Network interfaces for outgoing traffic (comma-separated, e.g. `eth0,eth1`). Optional; empty uses workload defaults	false
`--ports`	string	Network ports to filter traffic (comma-separated, e.g., `8080,8081,8082`)	true
`--image`	string	The network chaos injection workload container image	false	quay.io/krkn-chaos/krkn-network-chaos:latest
`--protocols`	string	Network protocols to filter: `tcp`, `udp`, or `tcp,udp`	false	tcp
`--taints`	string	Comma-separated taints (tolerations are derived for the workload). Same notation as elsewhere in Network Chaos NG docs, e.g. `node-role.kubernetes.io/master:NoSchedule`	false
`--service-account`	string	Service account for the workload (optional)	false

Parameter Format Details

Node Selection:

--node-selector: Label selector in format key=value (e.g., node-role.kubernetes.io/worker=)
--node-name: Specific node name (e.g., ip-10-0-1-100.ec2.internal)
Specify either --node-selector OR --node-name, not both
When using --node-selector, use --instance-count to limit the number of selected nodes

Port Format:

Single port: 8080
Multiple ports: 8080,8081,8082 (comma-separated, no spaces)

Protocol Format:

Valid values: tcp, udp, tcp,udp, or udp,tcp
Default: tcp

Interface Format:

Applies to egress (outgoing) filtering, matching the scenario image metadata
Single interface: eth0
Multiple interfaces: eth0,eth1,eth2 (comma-separated, no spaces)
May be left empty when not needed for your egress rules

Taints Format:

Comma-separated Kubernetes taints; the workload gets matching tolerations
Examples: node-role.kubernetes.io/master:NoSchedule or key=value:NoSchedule when the taint includes a value
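
The format rules in this section lend themselves to a quick pre-flight check. The Python sketch below applies them to a set of arguments before they are handed to krknctl; `split_csv` and `check_filter_args` are hypothetical helpers and not part of krknctl, and the 1 to 65535 port range is an assumption.

```python
def split_csv(value):
    """Split a comma-separated flag value, ignoring empty items."""
    return [item for item in value.split(",") if item]


def check_filter_args(node_selector="", node_name="", ingress=False, egress=False,
                      ports="", protocols="tcp"):
    """Return a list of problems found in node-network-filter style arguments."""
    problems = []
    if node_selector and node_name:
        problems.append("specify either --node-selector or --node-name, not both")
    if not (ingress or egress):
        problems.append("at least one of --ingress or --egress must be true")
    port_items = split_csv(ports)
    # Items containing spaces fail isdigit(), matching the "no spaces" rule.
    if not port_items or not all(p.isdigit() and 1 <= int(p) <= 65535 for p in port_items):
        problems.append("--ports must be a comma-separated list of port numbers")
    protocol_items = split_csv(protocols)
    if not protocol_items or not set(protocol_items) <= {"tcp", "udp"}:
        problems.append("--protocols must be tcp, udp, or tcp,udp")
    return problems


print(check_filter_args(node_selector="node-role.kubernetes.io/worker=",
                        egress=True, ports="8080"))
print(check_filter_args(node_selector="a=b", node_name="n1",
                        ports="80, 81", protocols="icmp"))
```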

Usage Notes

Node targeting: This scenario targets nodes (not pods) and creates iptables rules on the target node(s) to filter network traffic
Ingress/Egress: Pass both flags; at least one must be true. Both may be true to filter incoming and outgoing traffic
Execution modes:
- parallel: Applies network filtering to all selected nodes simultaneously
- serial: Applies network filtering to nodes one at a time
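
The difference between the two execution modes can be sketched with threads and a shared error queue, similar in spirit to the error_queue argument described in the Network Chaos API section. This is an illustrative model only; `run_on_targets` and the failing demo action are hypothetical.

```python
import queue
import threading


def run_on_targets(targets, action, execution="parallel"):
    """Run action(target) on every target; return the collected error messages."""
    errors = queue.Queue()

    def guarded(target):
        try:
            action(target)
        except Exception as exc:
            errors.put(f"{target}: {exc}")

    if execution == "parallel":
        # All targets are handled at the same time, one thread each.
        threads = [threading.Thread(target=guarded, args=(t,)) for t in targets]
        for thread in threads:
            thread.start()
        for thread in threads:
            thread.join()
    elif execution == "serial":
        # Targets are handled one after another.
        for target in targets:
            guarded(target)
    else:
        raise ValueError(f"unknown execution mode: {execution}")

    collected = []
    while not errors.empty():
        collected.append(errors.get())
    return collected


def demo_action(node):
    if node == "worker-1":  # hypothetical failure on one node
        raise RuntimeError("boom")


print(run_on_targets(["worker-0", "worker-1"], demo_action, execution="serial"))
print(run_on_targets(["worker-0", "worker-1"], demo_action, execution="parallel"))
```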

Example Commands

Basic egress filtering (block outgoing traffic):

krknctl run node-network-filter \
  --node-selector node-role.kubernetes.io/worker= \
  --instance-count 1 \
  --ingress false \
  --egress true \
  --ports 8080 \
  --protocols tcp \
  --chaos-duration 120

Ingress + egress filtering (block both incoming and outgoing):

krknctl run node-network-filter \
  --node-name ip-10-0-1-100.ec2.internal \
  --ingress true \
  --egress true \
  --ports 9090,9091 \
  --protocols tcp,udp \
  --interfaces eth0 \
  --chaos-duration 300

Multi-port filtering with parallel execution:

krknctl run node-network-filter \
  --node-selector kubernetes.io/os=linux \
  --instance-count 3 \
  --execution parallel \
  --ingress false \
  --egress true \
  --ports 6379,6380,6381 \
  --protocols tcp \
  --chaos-duration 180

14.5 - Pod Network Chaos

Injects network degradation (latency, packet loss, bandwidth) into a target pod’s network interfaces using Linux tc rules.

Injects network degradation (latency, packet loss, bandwidth restriction) into a target pod’s network interfaces using Linux tc (traffic control) rules. Unlike pod-network-filter which blocks specific ports via iptables, this module shapes traffic at the interface level.

How to Run Pod Network Chaos Scenarios

Choose your preferred method to run pod network chaos scenarios:

Configuration

- id: pod_network_chaos
  image: "quay.io/krkn-chaos/krkn-network-chaos:latest"
  wait_duration: 1
  test_duration: 60
  label_selector: ""
  service_account: ""
  instance_count: 1
  execution: parallel
  namespace: default
  # scenario specific settings
  target: "<pod_name>"
  interfaces: []
  ingress: true
  egress: true
  latency: ""         # empty string to skip; or e.g. 100ms (units: us, ms, s)
  loss: 10           # percentage (no % symbol)
  bandwidth: 1gbit   # supported units: bit, kbit, mbit, gbit, tbit
  taints: []

For the common module settings please refer to the documentation.

latency: network latency to inject. Format: integer followed by us (microseconds), ms (milliseconds), or s (seconds). Example: 100ms. Set to empty string to skip.
loss: packet loss percentage as a plain integer (no % symbol). Example: 10 means 10% packet loss. Set to empty string to skip.
bandwidth: bandwidth limit. Format: integer followed by bit, kbit, mbit, gbit, or tbit. Example: 100mbit. Set to empty string to skip.
interfaces: list of network interface names to target. Leave empty to auto-detect the pod’s default interface.
ingress: apply rules to incoming traffic (default: true)
egress: apply rules to outgoing traffic (default: true)
target: the pod name to target (used when label_selector is not set)
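
To give a feel for what these settings mean at the tc level, the sketch below turns them into a `tc ... netem` command line for outgoing traffic on one interface. This is an assumption-laden illustration: the exact commands the module runs are not documented here, `netem_command` is a hypothetical helper, and only the egress direction is covered.

```python
def netem_command(interface, latency="", loss="", bandwidth=""):
    """Build an illustrative `tc ... netem` command for outgoing traffic."""
    options = []
    if latency != "":
        options += ["delay", str(latency)]
    if loss != "":
        options += ["loss", f"{loss}%"]  # tc expects the percent sign
    if bandwidth != "":
        options += ["rate", str(bandwidth)]
    if not options:
        return None  # every setting was skipped, nothing to inject
    return ["tc", "qdisc", "add", "dev", interface, "root", "netem"] + options


# Values from the example configuration above; eth0 is a hypothetical interface.
command = netem_command("eth0", latency="", loss=10, bandwidth="1gbit")
print(" ".join(command))
```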

Usage

To enable pod network chaos scenarios, edit the kraken config file: go to the kraken -> chaos_scenarios section of the YAML structure, add a new element named network_chaos_ng_scenarios to the list, then add the path to the scenario YAML file under it.

kraken:
    ...
    chaos_scenarios:
        - network_chaos_ng_scenarios:
            - scenarios/kube/pod-network-chaos.yml

Note

You can specify multiple scenario files of the same type by adding additional paths to the list:

kraken:
    chaos_scenarios:
        - network_chaos_ng_scenarios:
            - scenarios/kube/pod-network-chaos-1.yml
            - scenarios/kube/pod-network-chaos-2.yml

You can also combine multiple different scenario types in the same config.yaml file. Scenario types can be specified in any order, and you can include the same scenario type multiple times:

kraken:
    chaos_scenarios:
        - network_chaos_ng_scenarios:
            - scenarios/kube/pod-network-chaos.yml
        - pod_disruption_scenarios:
            - scenarios/pod-kill.yaml
        - node_scenarios:
            - scenarios/node-reboot.yaml

Run

python run_kraken.py --config config/config.yaml

Not yet supported

pod_network_chaos is not currently available as a krkn-hub container image. Use the Krkn tab to run this scenario directly.

Not yet supported

pod_network_chaos is not currently available via krknctl. Use the Krkn tab to run this scenario directly.

Example scenario file: pod-network-chaos.yml

14.6 - Pod Network Filter

Creates iptables rules on one or more pods to block incoming and outgoing traffic on a port in the pod network interface. Can be used to block network based services connected to the pod or to block inter-pod communication.

How to Run Pod Network Filter Scenarios

Choose your preferred method to run pod network filter scenarios:

Example scenario file: pod-network-filter.yml

Configuration

- id: pod_network_filter
  wait_duration: 300
  test_duration: 100
  label_selector: "app=label"
  instance_count: 1
  execution: parallel
  namespace: 'default'
  # scenario specific settings
  ingress: false
  egress: true
  target: 'pod-name'
  interfaces: []
  protocols:
    - tcp
  ports:
    - 80
  taints: []

For the common module settings, please refer to the documentation.

ingress: filters the incoming traffic on one or more ports. If set, one or more network interfaces must be specified
egress: filters the outgoing traffic on one or more ports
target: the pod name to target (used when label_selector is not set)
interfaces: a list of network interfaces where the incoming traffic will be filtered
ports: the list of ports that will be filtered
protocols: the IP protocols to filter (tcp, udp, or both)
taints: list of taints for which tolerations need to be created. Example: ["node-role.kubernetes.io/master:NoSchedule"]
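
To make these settings more concrete, the following Python sketch prints the kind of iptables rules such a configuration could translate to. It is only an illustration under stated assumptions: the helper name sketch_rules, the use of the INPUT and OUTPUT chains, and matching on --dport are all hypothetical and are not taken from Krkn's implementation.

```python
from typing import List


def sketch_rules(ingress: bool, egress: bool, interfaces: List[str],
                 protocols: List[str], ports: List[int]) -> List[str]:
    """Return illustrative iptables commands for a pod network filter config.

    Hypothetical helper: the chains and match options are assumptions.
    """
    if ingress and not interfaces:
        # The parameter list states that ingress needs at least one interface.
        raise ValueError("ingress filtering requires at least one interface")
    rules = []
    for proto in protocols:
        if proto not in ("tcp", "udp"):
            raise ValueError(f"unsupported protocol: {proto}")
        for port in ports:
            if ingress:
                for iface in interfaces:
                    rules.append(
                        f"iptables -A INPUT -i {iface} -p {proto} --dport {port} -j DROP")
            if egress:
                rules.append(f"iptables -A OUTPUT -p {proto} --dport {port} -j DROP")
    return rules


# Mirrors the sample configuration: egress only, tcp, port 80.
for rule in sketch_rules(False, True, [], ["tcp"], [80]):
    print(rule)
```

Generating the rules as plain strings keeps the example safe to run anywhere, since nothing is applied to the host.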

Usage

kraken:
    ...
    chaos_scenarios:
        - network_chaos_ng_scenarios:
            - scenarios/kube/pod-network-filter.yml

Note

You can specify multiple scenario files of the same type by adding additional paths to the list:

kraken:
    chaos_scenarios:
        - network_chaos_ng_scenarios:
            - scenarios/kube/pod-network-filter-1.yml
            - scenarios/kube/pod-network-filter-2.yml
            - scenarios/kube/pod-network-filter-3.yml

You can also combine multiple different scenario types in the same config.yaml file. Scenario types can be specified in any order, and you can include the same scenario type multiple times:

kraken:
    chaos_scenarios:
        - network_chaos_ng_scenarios:
            - scenarios/kube/pod-network-filter.yml
        - pod_disruption_scenarios:
            - scenarios/pod-kill.yaml
        - node_scenarios:
            - scenarios/node-reboot.yaml
        - network_chaos_ng_scenarios:  # Same type can appear multiple times
            - scenarios/kube/pod-network-filter-2.yml
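
As a rough illustration of the data shape, the sketch below flattens the chaos_scenarios list from the last example into (scenario type, file) pairs while keeping the order and the repeated type. The config is written as a Python literal to avoid a YAML dependency, and list_scenarios is a hypothetical helper, not part of Krkn.

```python
from typing import Dict, List, Tuple

# Same structure as the YAML above, expressed as a Python literal.
config = {
    "kraken": {
        "chaos_scenarios": [
            {"network_chaos_ng_scenarios": ["scenarios/kube/pod-network-filter.yml"]},
            {"pod_disruption_scenarios": ["scenarios/pod-kill.yaml"]},
            {"node_scenarios": ["scenarios/node-reboot.yaml"]},
            {"network_chaos_ng_scenarios": ["scenarios/kube/pod-network-filter-2.yml"]},
        ]
    }
}


def list_scenarios(cfg: Dict) -> List[Tuple[str, str]]:
    """Flatten chaos_scenarios into (scenario_type, file) pairs, in order."""
    pairs = []
    for entry in cfg["kraken"]["chaos_scenarios"]:
        for scenario_type, files in entry.items():
            for path in files:
                pairs.append((scenario_type, path))
    return pairs


for scenario_type, path in list_scenarios(config):
    print(f"{scenario_type}: {path}")
```

Because chaos_scenarios is a list of single-key mappings rather than one mapping, the same scenario type can appear more than once without one entry overwriting another.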

Examples

Please refer to the use cases section for some real usage scenarios.

Run

python run_kraken.py --config config/config.yaml

Run

$ podman run --name=<container_name> --net=host --pull=always --env-host=true -v <path-to-kube-config>:/home/krkn/.kube/config:z -d quay.io/krkn-chaos/krkn-hub:pod-network-filter
$ podman logs -f <container_name or container_id> # Streams Kraken logs
$ podman inspect <container-name or container-id> --format "{{.State.ExitCode}}" # Outputs exit code which can be considered as pass/fail for the scenario

$ docker run $(./get_docker_params.sh) --name=<container_name> --net=host --pull=always -v <path-to-kube-config>:/home/krkn/.kube/config:z -d quay.io/krkn-chaos/krkn-hub:pod-network-filter
OR 
$ docker run -e <VARIABLE>=<value> --net=host --pull=always -v <path-to-kube-config>:/home/krkn/.kube/config:z -d quay.io/krkn-chaos/krkn-hub:pod-network-filter
$ docker logs -f <container_name or container_id> # Streams Kraken logs
$ docker inspect <container-name or container-id> --format "{{.State.ExitCode}}" # Outputs exit code which can be considered as pass/fail for the scenario

TIP: Because the container runs with a non-root user, ensure the kube config is globally readable before mounting it in the container. You can achieve this with the following commands:

kubectl config view --flatten > ~/kubeconfig && chmod 444 ~/kubeconfig && docker run $(./get_docker_params.sh) --name=<container_name> --net=host --pull=always -v ~/kubeconfig:/home/krkn/.kube/config:Z -d quay.io/krkn-chaos/krkn-hub:<scenario>

Supported parameters

The following environment variables can be set on the host running the container to tweak the scenario/faults being injected:

ex.) export <parameter_name>=<value>

See the list of variables that apply to all scenarios here; they can be set in addition to these scenario-specific variables.

Parameter	Description	Type	Default
TOTAL_CHAOS_DURATION	set chaos duration (in sec) as desired	number	60
POD_SELECTOR	defines the pod selector for choosing target pods. If multiple pods match the selector, all of them are eligible targets (limited by INSTANCE_COUNT).	string	""
POD_NAME	the pod name to target (if POD_SELECTOR not specified)	string
INSTANCE_COUNT	restricts the number of selected pods by the selector	number	1
EXECUTION	sets the execution mode of the scenario on multiple pods, can be parallel or serial	enum	parallel
INGRESS	sets the network filter on incoming traffic, can be true or false	boolean	false
EGRESS	sets the network filter on outgoing traffic, can be true or false	boolean	true
INTERFACES	a list of comma separated names of network interfaces (eg. eth0 or eth0,eth1,eth2) to filter for outgoing traffic	string	""
PORTS	a list of comma separated port numbers (eg 8080 or 8080,8081,8082) to filter for both outgoing and incoming traffic	string	""
PROTOCOLS	a list of comma separated network protocols (tcp, udp or both of them e.g. tcp,udp)	string	tcp
NAMESPACE	namespace where the scenario container will be deployed	string	default
IMAGE	the network chaos injection workload container image	string	quay.io/krkn-chaos/krkn-network-chaos:latest
TAINTS	List of taints for which tolerations need to be created. Example: ["node-role.kubernetes.io/master:NoSchedule"]	string	[]
SERVICE_ACCOUNT	optional service account for the Pod Network Filter workload	string	""

$ podman run --name=<container_name> --net=host --pull=always --env-host=true -v <path-to-custom-metrics-profile>:/home/krkn/kraken/config/metrics-aggregated.yaml -v <path-to-custom-alerts-profile>:/home/krkn/kraken/config/alerts -v <path-to-kube-config>:/home/krkn/.kube/config:Z -d quay.io/krkn-chaos/krkn-hub:pod-network-filter

krknctl run pod-network-filter [--<parameter> <value>]

You can also set any global variable listed here.

Argument	Type	Description	Required	Default Value
`--chaos-duration`	number	Chaos Duration	false	60
`--pod-selector`	string	Pod Selector	false
`--pod-name`	string	Pod Name	false
`--namespace`	string	Namespace	false	default
`--instance-count`	number	Number of instances to target	false	1
`--execution`	enum	Execution mode	false
`--ingress`	boolean	Filter incoming traffic	true
`--egress`	boolean	Filter outgoing traffic	true
`--interfaces`	string	Network interfaces to filter outgoing traffic (if more than one separated by comma)	false
`--ports`	string	Network ports to filter traffic (if more than one separated by comma)	true
`--image`	string	The network chaos injection workload container image	false	quay.io/krkn-chaos/krkn-network-chaos:latest
`--protocols`	string	The network protocols that will be filtered	false	tcp
`--taints`	string	Comma-separated taints (tolerations are derived for the workload), e.g. `node-role.kubernetes.io/master:NoSchedule`	false
`--service-account`	string	Service account for the Pod Network Filter workload (optional)	false

Parameter Format Details

Pod Selection:

--pod-selector: Label selector in format key=value (e.g., app=myapp)
--pod-name: Specific pod name (alternative to pod-selector)
Specify either --pod-selector OR --pod-name, not both
When using --pod-selector, use --instance-count to limit the number of selected pods

Port Format:

Single port: 8080
Multiple ports: 8080,8081,8082 (comma-separated, no spaces)

Protocol Format:

Valid values: tcp, udp, tcp,udp, or udp,tcp
Default: tcp

Interface Format:

Single interface: eth0
Multiple interfaces: eth0,eth1,eth2 (comma-separated, no spaces)
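
The format rules above can be checked before invoking krknctl. The following Python sketch is a hypothetical pre-flight validator, not something krknctl ships; the port range check (1 to 65535) and the choice to treat "neither selector nor name" as an error are assumptions added for the example.

```python
from typing import List, Optional

VALID_PROTOCOLS = {"tcp", "udp"}


def parse_csv(value: str) -> List[str]:
    """Split a comma-separated value, rejecting spaces and empty items."""
    if " " in value:
        raise ValueError(f"spaces are not allowed: {value!r}")
    items = value.split(",")
    if any(item == "" for item in items):
        raise ValueError(f"empty item in list: {value!r}")
    return items


def parse_ports(value: str) -> List[int]:
    """Parse '8080' or '8080,8081,8082' into a list of port numbers."""
    ports = [int(item) for item in parse_csv(value)]
    for port in ports:
        if not 1 <= port <= 65535:  # assumption: standard TCP/UDP port range
            raise ValueError(f"port out of range: {port}")
    return ports


def parse_protocols(value: str = "tcp") -> List[str]:
    """Parse 'tcp', 'udp', 'tcp,udp' or 'udp,tcp'; the default is tcp."""
    protocols = parse_csv(value)
    unknown = set(protocols) - VALID_PROTOCOLS
    if unknown:
        raise ValueError(f"unsupported protocols: {sorted(unknown)}")
    return protocols


def check_pod_selection(pod_selector: Optional[str], pod_name: Optional[str]) -> None:
    """Require exactly one of the selector or the pod name."""
    if bool(pod_selector) == bool(pod_name):
        raise ValueError("specify exactly one of --pod-selector or --pod-name")


print(parse_ports("8080,8081,8082"))
print(parse_protocols("udp,tcp"))
check_pod_selection("app=myapp", None)
```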

Example Commands

Basic egress filtering (block outgoing traffic on a port):

krknctl run pod-network-filter \
  --pod-selector app=myapp \
  --namespace default \
  --ingress false \
  --egress true \
  --ports 8080 \
  --protocols tcp \
  --chaos-duration 120

Ingress + egress filtering (block both directions):

krknctl run pod-network-filter \
  --pod-name my-pod-abc123 \
  --namespace my-namespace \
  --ingress true \
  --egress true \
  --ports 9090,9091 \
  --protocols tcp,udp \
  --chaos-duration 300

Multi-pod filtering with parallel execution:

krknctl run pod-network-filter \
  --pod-selector app=redis \
  --namespace redis-cluster \
  --instance-count 3 \
  --execution parallel \
  --ingress false \
  --egress true \
  --ports 6379,6380 \
  --protocols tcp \
  --chaos-duration 180

14.7 - VMI Network Chaos

Injects network degradation into a KubeVirt Virtual Machine Instance (VMI) by shaping traffic on the VM's tap interface inside the virt-launcher network namespace. Supports configurable bandwidth limiting, latency injection, and packet loss. Unlike node or pod network chaos, this scenario targets the tap device that connects QEMU to the bridge, so only the specific VMI is affected without disrupting OVN's BFD heartbeats or other workloads on the same node.

How to Run VMI Network Chaos Scenarios

Choose your preferred method to run VMI network chaos scenarios:

Example scenario file: virt_network_chaos.yaml

Configuration

- id: vmi_network_chaos
  image: "quay.io/krkn-chaos/krkn-network-chaos:latest"
  wait_duration: 300
  test_duration: 120
  label_selector: ""
  service_account: ""
  taints: []
  namespace: "my-namespace"
  instance_count: 1
  execution: serial
  target: ".*"
  interfaces: []
  ingress: true
  egress: true
  latency: "100ms"
  loss: "10"
  bandwidth: "100mbit"

For the common module settings please refer to the documentation.

target: regex to match VMI names within the namespace (e.g. "<vmi-name-prefix>-.*" or ".*" for all)
namespace: namespace containing the target VMIs (required; also supports regex to match multiple namespaces)
interfaces: list of tap interface names to target. Leave empty to auto-detect the tap device in the virt-launcher network namespace
ingress: shape incoming traffic to the VM
egress: shape outgoing traffic from the VM
latency: artificial network latency added to packets (e.g. "100ms", "500ms")
loss: percentage of packets to drop (e.g. "10" for 10%, "50" for 50%)
bandwidth: maximum throughput cap (e.g. "100mbit", "1gbit", "500kbit")

Note

At least one of latency, loss, or bandwidth should be set. Setting all three simultaneously compounds the degradation.
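
A minimal Python sketch of this rule, assuming a scenario is available as a dictionary with the keys shown in the configuration above. The helper names are hypothetical, and treating a loss of "0" as "not set" is an assumption made for the example.

```python
from typing import Dict, List


def active_impairments(scenario: Dict[str, str]) -> List[str]:
    """Return the shaping settings that would actually degrade traffic."""
    active = []
    for key in ("latency", "loss", "bandwidth"):
        value = str(scenario.get(key) or "").strip()
        if value and value != "0":  # assumption: "0" means no impairment
            active.append(key)
    return active


def check_scenario(scenario: Dict[str, str]) -> List[str]:
    """Raise if none of latency, loss or bandwidth is set."""
    active = active_impairments(scenario)
    if not active:
        raise ValueError("set at least one of latency, loss, or bandwidth")
    return active


print(check_scenario({"latency": "100ms", "loss": "10", "bandwidth": "100mbit"}))
print(check_scenario({"latency": "5000ms", "loss": "0", "bandwidth": ""}))
```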

Catastrophic Configurations

The following combinations produce the most impactful chaos:

Complete network degradation (maximum chaos):

  latency: "2000ms"
  loss: "50"
  bandwidth: "1mbit"

Combines severe latency with heavy packet loss and near-complete bandwidth exhaustion.

DNS blackout via latency (cascading failures):

  latency: "5000ms"
  loss: "0"
  bandwidth: ""

5-second latency causes DNS timeouts across every service in the VM, producing cascading failures without a hard cut.

Bandwidth starvation:

  latency: ""
  loss: "0"
  bandwidth: "100kbit"

Throttles the VMI to 100 kbit/s — enough to keep connections alive but too slow for most application traffic.
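
To get a feel for what such a cap means, the short Python sketch below computes the best-case transfer time for a payload at a given rate. It assumes the decimal units used by tc (1 kbit = 1000 bits per second) and ignores protocol overhead; the function names are hypothetical.

```python
# Decimal rate units as used by tc (1 kbit = 1000 bits per second).
UNITS = {"kbit": 1_000, "mbit": 1_000_000, "gbit": 1_000_000_000}


def rate_to_bits_per_second(rate: str) -> int:
    """Convert a value such as '100kbit' or '1gbit' to bits per second."""
    for suffix, factor in UNITS.items():
        if rate.endswith(suffix):
            return int(rate[: -len(suffix)]) * factor
    raise ValueError(f"unsupported rate: {rate!r}")


def transfer_seconds(payload_bytes: int, rate: str) -> float:
    """Best-case time to move payload_bytes at the given rate."""
    return payload_bytes * 8 / rate_to_bits_per_second(rate)


print(transfer_seconds(1_000_000, "100kbit"))  # 80.0
print(transfer_seconds(1_000_000, "100mbit"))
```

At 100 kbit/s a 1 MB response needs at least 80 seconds, which is longer than many client timeouts, while the same payload at 100 mbit/s takes a fraction of a second.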

Usage

To enable VMI network chaos scenarios, edit the kraken config file: in the kraken -> chaos_scenarios section of the YAML structure, add a new list element named network_chaos_ng_scenarios, then add the path to the scenario YAML file under it.

kraken:
    ...
    chaos_scenarios:
        - network_chaos_ng_scenarios:
            - scenarios/openshift/virt_network_chaos.yaml

Note

You can specify multiple scenario files of the same type by adding additional paths to the list:

kraken:
    chaos_scenarios:
        - network_chaos_ng_scenarios:
            - scenarios/openshift/virt_network_chaos.yaml
            - scenarios/openshift/virt_network_chaos_2.yaml

You can also combine multiple different scenario types in the same config.yaml file:

kraken:
    chaos_scenarios:
        - network_chaos_ng_scenarios:
            - scenarios/openshift/virt_network_chaos.yaml
        - pod_disruption_scenarios:
            - scenarios/pod-kill.yaml

Run

python run_kraken.py --config config/config.yaml

Run

$ podman run --name=<container_name> --net=host --pull=always --env-host=true -v <path-to-kube-config>:/home/krkn/.kube/config:Z -d quay.io/krkn-chaos/krkn-hub:vmi-network-chaos
$ podman logs -f <container_name or container_id> # Streams Kraken logs
$ podman inspect <container-name or container-id> --format "{{.State.ExitCode}}" # Outputs exit code which can be considered as pass/fail for the scenario

$ docker run $(./get_docker_params.sh) --name=<container_name> --net=host --pull=always -v <path-to-kube-config>:/home/krkn/.kube/config:Z -d quay.io/krkn-chaos/krkn-hub:vmi-network-chaos
OR
$ docker run -e <VARIABLE>=<value> --net=host --pull=always -v <path-to-kube-config>:/home/krkn/.kube/config:Z -d quay.io/krkn-chaos/krkn-hub:vmi-network-chaos
$ docker logs -f <container_name or container_id> # Streams Kraken logs
$ docker inspect <container-name or container-id> --format "{{.State.ExitCode}}" # Outputs exit code which can be considered as pass/fail for the scenario

TIP: Because the container runs with a non-root user, ensure the kube config is globally readable before mounting it in the container. You can achieve this with the following commands:

kubectl config view --flatten > ~/kubeconfig && chmod 444 ~/kubeconfig && docker run $(./get_docker_params.sh) --name=<container_name> --net=host --pull=always -v ~/kubeconfig:/home/krkn/.kube/config:Z -d quay.io/krkn-chaos/krkn-hub:vmi-network-chaos

Supported parameters

The following environment variables can be set on the host running the container to tweak the scenario/faults being injected:

ex.) export <parameter_name>=<value>

See the list of variables that apply to all scenarios here; they can be set in addition to these scenario-specific variables.

Parameter	Description	Type	Default
TOTAL_CHAOS_DURATION	Chaos duration in seconds	number	120
NAMESPACE	Namespace containing the target VMIs (required)	string
VMI_NAME	Regex to match VMI names (e.g. `virt-server-.*` or `.*` for all)	string	`.*`
LABEL_SELECTOR	Label selector to filter VMIs (e.g. `app=myapp`)	string	`""`
INSTANCE_COUNT	Maximum number of VMIs to target	number	1
EXECUTION	Execution mode: `serial` or `parallel`	enum	`serial`
INGRESS	Shape incoming traffic to the VM	boolean	true
EGRESS	Shape outgoing traffic from the VM	boolean	true
INTERFACES	Comma-separated tap interface names (empty to auto-detect)	string	`""`
LATENCY	Artificial latency added to packets (e.g. `100ms`, `500ms`)	string	`""`
LOSS	Packet loss percentage (e.g. `10` for 10%)	string	`""`
BANDWIDTH	Maximum throughput cap (e.g. `100mbit`, `1gbit`)	string	`""`
WAIT_DURATION	Seconds to wait before running the next scenario in the same file	number	300
IMAGE	Network chaos injection workload image	string	`quay.io/krkn-chaos/krkn-network-chaos:latest`
TAINTS	List of taints for which tolerations are created (e.g. `["node-role.kubernetes.io/master:NoSchedule"]`)	string	`[]`
SERVICE_ACCOUNT	Optional service account for the scenario workload	string	`""`

$ podman run --name=<container_name> --net=host --pull=always --env-host=true -v <path-to-custom-metrics-profile>:/home/krkn/kraken/config/metrics-aggregated.yaml -v <path-to-custom-alerts-profile>:/home/krkn/kraken/config/alerts -v <path-to-kube-config>:/home/krkn/.kube/config:Z -d quay.io/krkn-chaos/krkn-hub:vmi-network-chaos

krknctl run vmi-network-chaos [--<parameter> <value>]

You can also set any global variable listed here.

VMI Network Chaos Parameters

Argument	Type	Description	Required	Default Value
`--chaos-duration`	number	Chaos duration in seconds	false	120
`--namespace`	string	Namespace containing the target VMIs	true
`--target`	string	Regex to match VMI names (e.g. `<vmi-name-prefix>-.*` or `.*` for all)	false	`.*`
`--label-selector`	string	Label selector to filter VMIs (e.g. `app=myapp`)	false
`--instance-count`	number	Maximum number of VMIs to target	false	1
`--execution`	enum	Execution mode: `parallel` or `serial`	false	serial
`--ingress`	boolean	Shape incoming traffic to the VM	false	true
`--egress`	boolean	Shape outgoing traffic from the VM	false	true
`--interfaces`	string	Comma-separated tap interface names (empty to auto-detect)	false
`--latency`	string	Artificial latency added to packets (e.g. `100ms`, `500ms`)	false
`--loss`	string	Packet loss percentage (e.g. `10` for 10%)	false
`--bandwidth`	string	Maximum throughput cap (e.g. `100mbit`, `1gbit`, `500kbit`)	false
`--image`	string	Network chaos injection workload image	false	quay.io/krkn-chaos/krkn-network-chaos:latest
`--taints`	string	Comma-separated taints for which tolerations are created (e.g. `node-role.kubernetes.io/master:NoSchedule`)	false
`--service-account`	string	Optional service account for the scenario workload	false
`--wait-duration`	number	Seconds to wait before running the next scenario in the same file	false	300

Parameter Format Details

VMI Selection:

--namespace: required; supports regex to match multiple namespaces (e.g. virt-density-.*)
--target: regex matched against VMI names (e.g. <vmi-name-prefix>-.* targets all VMIs whose name starts with that prefix)
--label-selector: Kubernetes label selector in key=value format
Use --instance-count to limit how many matching VMIs are targeted
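
The interplay between --target and --instance-count can be sketched in Python as follows. This is an illustration only: full-match semantics and picking the first matches in list order are assumptions, and the real scenario may match or choose candidates differently (for example at random). The VMI names are made up.

```python
import re
from typing import List


def select_vmis(names: List[str], target: str, instance_count: int) -> List[str]:
    """Pick at most instance_count names that fully match the target regex.

    Assumption: full-match semantics and first-come selection.
    """
    pattern = re.compile(target)
    matching = [name for name in names if pattern.fullmatch(name)]
    return matching[:instance_count]


vmis = ["virt-server-0", "virt-server-1", "virt-client-0"]  # hypothetical names
print(select_vmis(vmis, "virt-server-.*", 1))  # ['virt-server-0']
print(select_vmis(vmis, ".*", 3))
```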

Traffic Shaping Values:

--latency: any value accepted by Linux tc netem delay (e.g. 100ms, 1s, 500ms)
--loss: integer percentage without the % symbol (e.g. 10 = 10%)
--bandwidth: any value accepted by Linux tc HTB rate (e.g. 100mbit, 1gbit, 500kbit)
At least one of --latency, --loss, or --bandwidth should be set

Interface Detection:

Leave --interfaces empty to let the scenario auto-detect the tap device inside the virt-launcher network namespace
Specify explicitly (e.g. tap0) only if auto-detection fails or you want to target a specific interface

Example Commands

Add latency and packet loss to all VMIs in a namespace:

krknctl run vmi-network-chaos \
  --namespace <namespace> \
  --target ".*" \
  --latency 100ms \
  --loss 10 \
  --chaos-duration 120

Bandwidth cap on a specific VMI:

krknctl run vmi-network-chaos \
  --namespace <namespace> \
  --target "<vmi-name>" \
  --bandwidth 1mbit \
  --ingress true \
  --egress true \
  --chaos-duration 300

Catastrophic combined degradation:

krknctl run vmi-network-chaos \
  --namespace <namespace> \
  --target "<vmi-name-prefix>-.*" \
  --instance-count 3 \
  --execution parallel \
  --latency 2000ms \
  --loss 50 \
  --bandwidth 1mbit \
  --chaos-duration 180

DNS blackout simulation (high latency, no packet drop):

krknctl run vmi-network-chaos \
  --namespace <namespace> \
  --target ".*" \
  --latency 5000ms \
  --chaos-duration 60

14.8 - VMI Network Filter

Injects iptables-based network filtering into a KubeVirt Virtual Machine Instance (VMI) by applying INPUT and OUTPUT rules inside the virt-launcher network namespace via nsenter. Supports port and protocol-specific filtering so you can selectively block DNS, SSH, HTTP, or any other traffic without cutting all connectivity. The tap interface (tap0) is targeted directly so only the specific VMI is isolated, leaving OVN's BFD heartbeats and other node workloads unaffected.

How to Run VMI Network Filter Scenarios

Choose your preferred method to run VMI network filter scenarios:

Example scenario file: virt_network.yaml

Configuration

- id: vmi_network_filter
  image: "quay.io/krkn-chaos/krkn-network-chaos:latest"
  wait_duration: 300
  test_duration: 120
  label_selector: ""
  service_account: ""
  taints: []
  namespace: "my-namespace"
  instance_count: 1
  execution: serial
  target: ".*"
  interfaces: []
  ingress: true
  egress: true

For the common module settings please refer to the documentation.

target: regex to match VMI names within the namespace (e.g. "<vmi-name-prefix>-.*" or ".*" for all)
namespace: namespace containing the target VMIs (required; also supports regex to match multiple namespaces)
interfaces: list of tap interface names to target. Leave empty to auto-detect the tap device in the virt-launcher network namespace
ingress: apply iptables DROP rules to incoming traffic
egress: apply iptables DROP rules to outgoing traffic
ports: list of ports to block (omit or leave empty to block all ports)
protocols: list of IP protocols to filter — tcp, udp, or both (defaults to ["tcp", "udp"])

Note

ports and protocols are optional. When ports is omitted or empty, all traffic on the specified protocols is blocked — equivalent to full network isolation.

Catastrophic Configurations

Full network isolation (most catastrophic):

  ingress: true
  egress: true
  # no ports or protocols — blocks all TCP and UDP

Complete network cut to the VMI.

DNS blackout (cascading failures):

  ingress: true
  egress: true
  protocols:
    - tcp
    - udp
  ports:
    - 53

Blocking DNS (port 53) causes every service inside the VM that resolves hostnames to fail with timeouts. This produces cascading failures across the application stack without a hard cut, which often makes it the most realistic chaos scenario.

Management plane loss:

  ingress: true
  egress: true
  protocols:
    - tcp
  ports:
    - 22
    - 443
    - 6443

Blocks SSH, HTTPS, and the Kubernetes API server. The VM stays running but is unreachable for management and API calls.

Application layer only:

  ingress: true
  egress: true
  protocols:
    - tcp
  ports:
    - 80
    - 443
    - 8080
    - 8443

Kills HTTP/HTTPS traffic only — tests application resilience without taking the entire VM offline.

Usage

To enable VMI network filter scenarios, edit the kraken config file: in the kraken -> chaos_scenarios section of the YAML structure, add a new list element named network_chaos_ng_scenarios, then add the path to the scenario YAML file under it.

kraken:
    ...
    chaos_scenarios:
        - network_chaos_ng_scenarios:
            - scenarios/openshift/virt_network.yaml

Note

You can specify multiple scenario files of the same type by adding additional paths to the list:

kraken:
    chaos_scenarios:
        - network_chaos_ng_scenarios:
            - scenarios/openshift/virt_network.yaml
            - scenarios/openshift/virt_network_2.yaml

You can also combine multiple different scenario types in the same config.yaml file:

kraken:
    chaos_scenarios:
        - network_chaos_ng_scenarios:
            - scenarios/openshift/virt_network.yaml
        - pod_disruption_scenarios:
            - scenarios/pod-kill.yaml

Run

python run_kraken.py --config config/config.yaml

Run

$ podman run --name=<container_name> --net=host --pull=always --env-host=true -v <path-to-kube-config>:/home/krkn/.kube/config:Z -d quay.io/krkn-chaos/krkn-hub:vmi-network-filter
$ podman logs -f <container_name or container_id> # Streams Kraken logs
$ podman inspect <container-name or container-id> --format "{{.State.ExitCode}}" # Outputs exit code which can be considered as pass/fail for the scenario

$ docker run $(./get_docker_params.sh) --name=<container_name> --net=host --pull=always -v <path-to-kube-config>:/home/krkn/.kube/config:Z -d quay.io/krkn-chaos/krkn-hub:vmi-network-filter
OR
$ docker run -e <VARIABLE>=<value> --net=host --pull=always -v <path-to-kube-config>:/home/krkn/.kube/config:Z -d quay.io/krkn-chaos/krkn-hub:vmi-network-filter
$ docker logs -f <container_name or container_id> # Streams Kraken logs
$ docker inspect <container-name or container-id> --format "{{.State.ExitCode}}" # Outputs exit code which can be considered as pass/fail for the scenario

TIP: Because the container runs with a non-root user, ensure the kube config is globally readable before mounting it in the container. You can achieve this with the following commands:

kubectl config view --flatten > ~/kubeconfig && chmod 444 ~/kubeconfig && docker run $(./get_docker_params.sh) --name=<container_name> --net=host --pull=always -v ~/kubeconfig:/home/krkn/.kube/config:Z -d quay.io/krkn-chaos/krkn-hub:vmi-network-filter

Supported parameters

The following environment variables can be set on the host running the container to tweak the scenario/faults being injected:

ex.) export <parameter_name>=<value>

See the list of variables that apply to all scenarios here; they can be set in addition to these scenario-specific variables.

Parameter	Description	Type	Default
TOTAL_CHAOS_DURATION	Chaos duration in seconds	number	120
NAMESPACE	Namespace containing the target VMIs (required)	string
VMI_NAME	Regex to match VMI names (e.g. `virt-server-.*` or `.*` for all)	string	`.*`
LABEL_SELECTOR	Label selector to filter VMIs (e.g. `app=myapp`)	string	`""`
INSTANCE_COUNT	Maximum number of VMIs to target	number	1
EXECUTION	Execution mode: `serial` or `parallel`	enum	`serial`
INGRESS	Apply DROP rules to incoming traffic	boolean	true
EGRESS	Apply DROP rules to outgoing traffic	boolean	true
INTERFACES	Comma-separated tap interface names (empty to auto-detect)	string	`""`
PORTS	Comma-separated port numbers to block (empty = all ports)	string	`""`
PROTOCOLS	Comma-separated protocols to filter: `tcp`, `udp`, or both	string	`tcp,udp`
WAIT_DURATION	Seconds to wait before running the next scenario in the same file	number	300
IMAGE	Network chaos injection workload image	string	`quay.io/krkn-chaos/krkn-network-chaos:latest`
TAINTS	List of taints for which tolerations are created (e.g. `["node-role.kubernetes.io/master:NoSchedule"]`)	string	`[]`
SERVICE_ACCOUNT	Optional service account for the scenario workload	string	`""`

$ podman run --name=<container_name> --net=host --pull=always --env-host=true -v <path-to-custom-metrics-profile>:/home/krkn/kraken/config/metrics-aggregated.yaml -v <path-to-custom-alerts-profile>:/home/krkn/kraken/config/alerts -v <path-to-kube-config>:/home/krkn/.kube/config:Z -d quay.io/krkn-chaos/krkn-hub:vmi-network-filter

krknctl run vmi-network-filter [--<parameter> <value>]

You can also set any global variable listed here.

VMI Network Filter Parameters

Argument	Type	Description	Required	Default Value
`--chaos-duration`	number	Chaos duration in seconds	false	120
`--namespace`	string	Namespace containing the target VMIs	true
`--target`	string	Regex to match VMI names (e.g. `<vmi-name-prefix>-.*` or `.*` for all)	false	`.*`
`--label-selector`	string	Label selector to filter VMIs (e.g. `app=myapp`)	false
`--instance-count`	number	Maximum number of VMIs to target	false	1
`--execution`	enum	Execution mode: `parallel` or `serial`	false	serial
`--ingress`	boolean	Apply DROP rules to incoming traffic	false	true
`--egress`	boolean	Apply DROP rules to outgoing traffic	false	true
`--interfaces`	string	Comma-separated tap interface names (empty to auto-detect)	false
`--ports`	string	Comma-separated port numbers to block (e.g. `53`, `22,443,6443`). Empty = all ports	false
`--protocols`	string	Protocols to filter: `tcp`, `udp`, or `tcp,udp`	false	`tcp,udp`
`--image`	string	Network chaos injection workload image	false	quay.io/krkn-chaos/krkn-network-chaos:latest
`--taints`	string	Comma-separated taints for which tolerations are created (e.g. `node-role.kubernetes.io/master:NoSchedule`)	false
`--service-account`	string	Optional service account for the scenario workload	false
`--wait-duration`	number	Seconds to wait before running the next scenario in the same file	false	300

Parameter Format Details

VMI Selection:

--namespace: required; supports regex to match multiple namespaces (e.g. virt-density-.*)
--target: regex matched against VMI names (e.g. <vmi-name-prefix>-.* targets all VMIs whose name starts with that prefix)
Use --instance-count to limit how many matching VMIs are targeted

Port and Protocol Format:

--ports: comma-separated integers, no spaces (e.g. 53 or 22,443,6443). Omit to block all ports
--protocols: tcp, udp, or tcp,udp. Defaults to both

Interface Detection:

Leave --interfaces empty to let the scenario auto-detect the tap device inside the virt-launcher network namespace
Specify explicitly (e.g. tap0) only if auto-detection fails
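
When scenarios are generated from scripts, the formats above can be produced programmatically. The Python sketch below builds a krknctl argument list from a dictionary, using only the flags listed in the table above; build_command is a hypothetical helper and the namespace value is a placeholder.

```python
from typing import Dict, List, Union

Value = Union[str, int, bool, List[Union[str, int]]]


def build_command(scenario: str, options: Dict[str, Value]) -> List[str]:
    """Turn an options dict into a krknctl argument list.

    Keys are flag names without the leading dashes, e.g. "chaos-duration".
    Lists are joined with commas (no spaces); booleans become true/false.
    """
    argv = ["krknctl", "run", scenario]
    for flag, value in options.items():
        if isinstance(value, bool):
            text = "true" if value else "false"
        elif isinstance(value, list):
            if not value:
                continue  # omit empty lists, e.g. no --ports means all ports
            text = ",".join(str(item) for item in value)
        else:
            text = str(value)
        argv += [f"--{flag}", text]
    return argv


# DNS blackout, expressed as data instead of a hand-written command line.
command = build_command("vmi-network-filter", {
    "namespace": "my-namespace",  # placeholder namespace
    "target": ".*",
    "ports": [53],
    "protocols": ["tcp", "udp"],
    "ingress": True,
    "egress": True,
    "chaos-duration": 120,
})
print(" ".join(command))
```

Returning a list rather than a single string lets the result be passed to subprocess.run without shell quoting concerns, for instance around the ".*" regex.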

Example Commands

DNS blackout (most impactful cascading failure):

krknctl run vmi-network-filter \
  --namespace <namespace> \
  --target ".*" \
  --ports 53 \
  --protocols tcp,udp \
  --ingress true \
  --egress true \
  --chaos-duration 120

Full network isolation:

krknctl run vmi-network-filter \
  --namespace <namespace> \
  --target "<vmi-name>" \
  --ingress true \
  --egress true \
  --chaos-duration 60

Management plane loss (SSH + API):

krknctl run vmi-network-filter \
  --namespace <namespace> \
  --target "<vmi-name-prefix>-.*" \
  --instance-count 2 \
  --ports 22,443,6443 \
  --protocols tcp \
  --ingress true \
  --egress true \
  --chaos-duration 300

Application layer only (HTTP/HTTPS):

krknctl run vmi-network-filter \
  --namespace <namespace> \
  --target ".*" \
  --execution parallel \
  --ports 80,443,8080,8443 \
  --protocols tcp \
  --ingress true \
  --egress true \
  --chaos-duration 180

15 - Network Chaos Scenario

This scenario introduces network latency, packet loss, and bandwidth restriction on a node's host network interface. Its purpose is to observe faults caused by random variations in the network.

How to Run Network Chaos Scenarios

Choose your preferred method to run network chaos scenarios:

Example scenario files from scenarios-hub:

Sample scenario config for egress traffic shaping

network_chaos:                                    # Scenario to create an outage by simulating random variations in the network.
  duration: 300                                   # In seconds - duration network chaos will be applied.
  node_name:                                      # Comma separated node names on which scenario has to be injected.
  label_selector: node-role.kubernetes.io/master  # When node_name is not specified, a node with matching label_selector is selected for running the scenario.
  instance_count: 1                               # Number of nodes in which to execute network chaos.
  interfaces:                                     # List of interfaces on which to apply the network restriction.
  - "ens5"                                        # Interface name would be the Kernel host network interface name.
  execution: serial                               # Default: serial. Options: serial, parallel. Execute all of the egress options as a single scenario (parallel) or each as a separate scenario (serial).
  egress:
    latency: 500ms
    loss: 2                                      # 2% packet loss (value is a percentage, e.g. 50 = 50%)
    bandwidth: 10mbit
  image: quay.io/krkn-chaos/krkn:tools
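
The difference between the two execution modes described in the comment above can be sketched in Python. This reflects only the wording of that comment; plan_runs is a hypothetical helper, not Krkn's scheduler.

```python
from typing import Dict, List


def plan_runs(egress: Dict[str, str], execution: str = "serial") -> List[Dict[str, str]]:
    """Group egress options into runs according to the execution mode."""
    if execution == "parallel":
        # One run with every option applied together.
        return [dict(egress)]
    if execution == "serial":
        # One run per option, applied one after another.
        return [{key: value} for key, value in egress.items()]
    raise ValueError(f"unknown execution mode: {execution}")


egress = {"latency": "500ms", "loss": "2", "bandwidth": "10mbit"}
print(plan_runs(egress, "serial"))
print(plan_runs(egress, "parallel"))
```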

Sample scenario config for ingress traffic shaping (using a plugin)

- id: network_chaos
  config:
    node_interface_name:                            # Dictionary with key as node name(s) and value as a list of its interfaces to test
      ip-10-0-128-153.us-west-2.compute.internal:
        - ens5
        - genev_sys_6081
    label_selector: node-role.kubernetes.io/master  # When node_interface_name is not specified, nodes with a matching label_selector are selected for node chaos scenario injection
    instance_count: 1                               # Number of nodes to perform action/select that match the label selector
    kubeconfig_path: ~/.kube/config                 # Path to kubernetes config file. If not specified, it defaults to ~/.kube/config
    execution_type: parallel                        # Execute all of the ingress options as a single scenario (parallel) or each as a separate scenario (serial).
    network_params:
        latency: 500ms
        loss: '2'                               # 2% packet loss (value is a percentage, must be quoted)
        bandwidth: 10mbit
    wait_duration: 120
    test_duration: 60
    image: quay.io/krkn-chaos/krkn:tools

Note: For ingress traffic shaping, ensure that your node doesn’t have any IFB interfaces already present. The scenario relies on creating IFBs to do the shaping, and they are deleted at the end of the scenario.
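
For readers unfamiliar with IFB (Intermediate Functional Block) devices, the Python sketch below lists the commands of the usual Linux recipe for ingress shaping: incoming traffic is redirected to an IFB device and shaped there as if it were egress. This is the generic technique, not necessarily the exact sequence Krkn runs; the device name ifb0 and the helper name are assumptions, and the filter shown only redirects IPv4 traffic.

```python
from typing import List


def ifb_ingress_commands(interface: str, netem_options: str, ifb: str = "ifb0") -> List[str]:
    """Return the commands of a generic IFB-based ingress shaping setup.

    The commands are returned as strings and are not executed.
    """
    return [
        # Create the IFB device and bring it up.
        f"ip link add {ifb} type ifb",
        f"ip link set dev {ifb} up",
        # Attach an ingress qdisc and redirect incoming packets to the IFB.
        f"tc qdisc add dev {interface} handle ffff: ingress",
        f"tc filter add dev {interface} parent ffff: protocol ip u32 match u32 0 0 "
        f"action mirred egress redirect dev {ifb}",
        # Shape the redirected traffic on the IFB device.
        f"tc qdisc add dev {ifb} root netem {netem_options}",
        # Cleanup once the test duration has elapsed.
        f"tc qdisc del dev {interface} ingress",
        f"ip link delete {ifb}",
    ]


for command in ifb_ingress_commands("ens5", "delay 500ms loss 2% rate 10mbit"):
    print(command)
```

A pre-existing IFB device with the same name would make the first command fail, which is consistent with the note's requirement that no IFB interfaces be present on the node.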

Steps

Pick the nodes to introduce the network anomaly either from node_name or label_selector.
Verify the interface list on one of the nodes; if the user specified no interface, use the interface with the default route as the test interface.
Set traffic shaping config on node’s interface using tc and netem.
Wait for the duration time.
Remove traffic shaping config on node’s interface.
Remove the job that spawned the pod.
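
The apply, wait and remove steps above can be sketched in Python as follows. This is a dry-run illustration under assumptions: it covers only the traffic shaping steps (not node selection or job handling), uses a single netem qdisc with its delay, loss and rate options, and the function names are hypothetical rather than taken from Krkn.

```python
import time
from typing import Callable, Dict, List


def netem_options(params: Dict[str, object]) -> str:
    """Translate latency/loss/bandwidth settings into netem options."""
    parts: List[str] = []
    if params.get("latency"):
        parts += ["delay", str(params["latency"])]
    if params.get("loss"):
        parts += ["loss", f"{params['loss']}%"]
    if params.get("bandwidth"):
        parts += ["rate", str(params["bandwidth"])]
    return " ".join(parts)


def run_egress_chaos(interface: str, params: Dict[str, object], duration: int,
                     run: Callable[[str], None] = print,
                     sleep: Callable[[float], None] = time.sleep) -> None:
    """Apply shaping, wait for the duration, then always remove it."""
    run(f"tc qdisc add dev {interface} root netem {netem_options(params)}")
    try:
        sleep(duration)
    finally:
        # Remove the shaping config even if the wait is interrupted.
        run(f"tc qdisc del dev {interface} root")


# Dry run: print the commands instead of executing them, and skip the wait.
run_egress_chaos("ens5", {"latency": "500ms", "loss": 2, "bandwidth": "10mbit"},
                 duration=300, sleep=lambda seconds: None)
```

Wrapping the wait in try/finally mirrors the intent of the removal step: the interface should not be left degraded if the run is aborted.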

How to Use Plugin Name

Add the plugin name to the chaos_scenarios list in the config/config.yaml file:

kraken:
    kubeconfig_path: ~/.kube/config                     # Path to kubeconfig
    ..
    chaos_scenarios:
        - network_chaos_scenarios:
            - scenarios/<scenario_name>.yaml

Note

You can specify multiple scenario files of the same type by adding additional paths to the list:

kraken:
    chaos_scenarios:
        - network_chaos_scenarios:
            - scenarios/network-chaos-1.yaml
            - scenarios/network-chaos-2.yaml
            - scenarios/network-chaos-3.yaml

You can also combine multiple different scenario types in the same config.yaml file. Scenario types can be specified in any order, and you can include the same scenario type multiple times:

kraken:
    chaos_scenarios:
        - network_chaos_scenarios:
            - scenarios/network-chaos.yaml
        - pod_disruption_scenarios:
            - scenarios/pod-kill.yaml
        - container_scenarios:
            - scenarios/container-kill.yaml
        - network_chaos_scenarios:  # Same type can appear multiple times
            - scenarios/network-chaos-2.yaml
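
To make the structure of chaos_scenarios concrete, here is a small sketch that walks the same layout as a Python dictionary and lists each (type, file) pair in the order written. The function name flatten_chaos_scenarios is hypothetical, and the dictionary stands in for the parsed config.yaml; how Krkn itself schedules the entries is not shown here.

```python
def flatten_chaos_scenarios(config):
    """Return (scenario_type, scenario_file) pairs in the order they are listed."""
    pairs = []
    for entry in config["kraken"]["chaos_scenarios"]:
        # Each list item is a one-key mapping: scenario type -> list of files.
        for scenario_type, files in entry.items():
            for path in files:
                pairs.append((scenario_type, path))
    return pairs


# Stand-in for the parsed config.yaml shown above.
config = {
    "kraken": {
        "chaos_scenarios": [
            {"network_chaos_scenarios": ["scenarios/network-chaos.yaml"]},
            {"pod_disruption_scenarios": ["scenarios/pod-kill.yaml"]},
            {"network_chaos_scenarios": ["scenarios/network-chaos-2.yaml"]},
        ]
    }
}

for scenario_type, path in flatten_chaos_scenarios(config):
    print(scenario_type, path)
```

Because chaos_scenarios is a list of single-key mappings rather than one mapping, the same scenario type can appear more than once without keys colliding.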

Run

python run_kraken.py --config config/config.yaml

This scenario introduces network latency, packet loss, and bandwidth restriction in the egress traffic of a node’s interface using tc and Netem. For more information, refer to the following documentation.

Run

$ podman run \
  --name=<container_name> \
  --net=host \
  --pull=always \
  --env-host=true \
  -v <path-to-kube-config>:/home/krkn/.kube/config:Z \
  -d containers.krkn-chaos.dev/krkn-chaos/krkn-hub:network-chaos
$ podman logs -f <container_name or container_id> # Streams Kraken logs
$ podman inspect <container-name or container-id> \
  --format "{{.State.ExitCode}}" # Outputs the exit code, which can be considered as pass/fail for the scenario

Note

$ docker run \
  -e <VARIABLE>=<value> \
  --net=host \
  --pull=always \
  -v <path-to-kube-config>:/home/krkn/.kube/config:Z \
  -d containers.krkn-chaos.dev/krkn-chaos/krkn-hub:network-chaos

$ docker logs -f <container_name or container_id> # Streams Kraken logs
$ docker inspect <container-name or container-id> \
  --format "{{.State.ExitCode}}" # Outputs the exit code, which can be considered as pass/fail for the scenario

Tip

Because the container runs with a non-root user, ensure the kube config is globally readable before mounting it in the container. You can achieve this with the following commands:

kubectl config view --flatten > ~/kubeconfig && chmod 444 ~/kubeconfig && docker run $(./get_docker_params.sh) --name=<container_name> --net=host --pull=always -v ~/kubeconfig:/home/krkn/.kube/config:Z -d containers.krkn-chaos.dev/krkn-chaos/krkn-hub:<scenario>

Supported parameters

The following environment variables can be set on the host running the container to tweak the scenario/faults being injected:

Example if --env-host is used:

export <parameter_name>=<value>

OR on the command line, for example:

-e <VARIABLE>=<value>

Note

export TRAFFIC_TYPE=egress for Egress scenarios and export TRAFFIC_TYPE=ingress for Ingress scenarios

See list of variables that apply to all scenarios here that can be used/set in addition to these scenario specific variables

Egress Scenarios

Parameter	Description	Default
DURATION	Duration in seconds during which network chaos will be applied.	300
IMAGE	Image used to disrupt network on a pod	quay.io/krkn-chaos/krkn:tools
NODE_NAME	Node name to inject faults in case of targeting a specific node; Can set multiple node names separated by a comma	""
LABEL_SELECTOR	When NODE_NAME is not specified, a node with matching label_selector is selected for running.	node-role.kubernetes.io/master
INSTANCE_COUNT	Targeted instance count matching the label selector	1
INTERFACES	List of interfaces on which to apply the network restriction.	[]
EXECUTION	Execute each of the egress options as a single scenario (parallel) or as separate scenarios (serial).	parallel
EGRESS	Dictionary of values to set network latency (latency: 50ms), packet loss (loss: 0.02), and bandwidth restriction (bandwidth: 100mbit)	{bandwidth: 100mbit}

Ingress Scenarios

Parameter	Description	Default
DURATION	Duration in seconds during which network chaos will be applied.	300
IMAGE	Image used to disrupt network on a pod	quay.io/krkn-chaos/krkn:tools
TARGET_NODE_AND_INTERFACE	Dictionary with key as node name(s) and value as a list of its interfaces to test. For example: {ip-10-0-216-2.us-west-2.compute.internal: [ens5]}	""
LABEL_SELECTOR	When NODE_NAME is not specified, a node with matching label_selector is selected for running.	node-role.kubernetes.io/master
INSTANCE_COUNT	Targeted instance count matching the label selector	1
EXECUTION	Used to specify whether you want to apply filters on interfaces one at a time or all at once.	parallel
NETWORK_PARAMS	latency, loss and bandwidth are the three supported network parameters to alter for the chaos test. For example: {latency: 50ms, loss: '0.02'}	""
WAIT_DURATION	Wait duration; ensure that it is at least about twice test_duration	300

Note

For disconnected clusters, be sure to also mirror the helper image of quay.io/krkn-chaos/krkn:tools and set the mirrored image path properly

Note

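When using a custom metrics profile or alerts profile with CAPTURE_METRICS or ENABLE_ALERTS enabled, mount the metrics/alerts files from the host under /home/krkn/kraken/config/metrics-aggregated.yaml and /home/krkn/kraken/config/alerts. For example: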

$ podman run \
  --name=<container_name> \
  --net=host \
  --pull=always \
  --env-host=true \
  -v <path-to-custom-metrics-profile>:/home/krkn/kraken/config/metrics-aggregated.yaml \
  -v <path-to-custom-alerts-profile>:/home/krkn/kraken/config/alerts \
  -v <path-to-kube-config>:/home/krkn/.kube/config:Z \
  -d containers.krkn-chaos.dev/krkn-chaos/krkn-hub:network-chaos

krknctl run network-chaos [--<parameter> <value>]

Can also set any global variable listed here

Scenario specific parameters:

Parameter	Description	Type	Required	Default
`--traffic-type`	Selects the network chaos scenario type: ingress or egress	enum	Yes	ingress | egress
`--image`	Image used to disrupt network on a pod	string	No	quay.io/krkn-chaos/krkn:tools
`--duration`	Duration in seconds during which network chaos will be applied.	number	No	300
`--label-selector`	When NODE_NAME is not specified, a node with matching label_selector is selected for running.	string	No	node-role.kubernetes.io/master
`--execution`	Execute each of the egress options as a single scenario (parallel) or as separate scenarios (serial).	enum	No	parallel
`--instance-count`	Targeted instance count matching the label selector.	number	No	1
`--node-name`	Node name to inject faults in case of targeting a specific node; Can set multiple node names separated by a comma	string	No
`--interfaces`	List of interfaces on which to apply the network restriction, e.g. [eth0,eth1,eth2]	string	No	[]
`--egress`	Dictionary of values to set network latency (latency: 50ms), packet loss (loss: 0.02), and bandwidth restriction (bandwidth: 100mbit), e.g. {bandwidth: 100mbit}	string	No	"{bandwidth: 100mbit}"
`--target-node-interface`	Dictionary with key as node name(s) and value as a list of its interfaces to test. For example: {ip-10-0-216-2.us-west-2.compute.internal: [ens5]}	string	No
`--network-params`	latency, loss and bandwidth are the three supported network parameters to alter for the chaos test. For example: {latency: 50ms, loss: 0.02}	string	No
`--wait-duration`	Wait duration; ensure that it is at least about twice test_duration	number	No	300

Parameter Dependencies

--node-name: Egress only. Ignored when --traffic-type is ingress.
--network-params and --target-node-interface: Ingress only. Ignored when --traffic-type is egress.
--wait-duration: Must be at least 2× --duration to allow the network to stabilize before verification.
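
The dependencies above can be checked before launching a run. The sketch below is a hypothetical pre-flight helper (check_network_chaos_params is not part of krknctl); it only encodes the three rules listed and returns messages instead of raising, so a caller can decide whether to treat them as warnings or errors.

```python
def check_network_chaos_params(traffic_type, duration, wait_duration,
                               node_name="", network_params="",
                               target_node_interface=""):
    """Return a list of messages describing parameter problems (empty if none)."""
    problems = []
    if traffic_type not in ("ingress", "egress"):
        problems.append("--traffic-type must be ingress or egress")
    # --node-name only applies to egress runs.
    if traffic_type == "ingress" and node_name:
        problems.append("--node-name is ignored for ingress")
    # --network-params and --target-node-interface only apply to ingress runs.
    if traffic_type == "egress" and network_params:
        problems.append("--network-params is ignored for egress")
    if traffic_type == "egress" and target_node_interface:
        problems.append("--target-node-interface is ignored for egress")
    if wait_duration < 2 * duration:
        problems.append("--wait-duration must be at least 2x --duration")
    return problems


for message in check_network_chaos_params("ingress", 60, 100, node_name="node-a"):
    print(message)
```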

Behavior Notes

Empty --interfaces: When left empty [], krkn auto-detects the primary network interface on the target node using the default route. If specified, each interface is validated against the node’s actual interfaces before applying chaos.
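
As an illustration of the auto-detection described above, the following sketch picks the interface from the text output of `ip route`, where the default route line names the device after the `dev` keyword. The function names are hypothetical and the parsing is an assumption about one reasonable approach, not Krkn's own code.

```python
def default_route_interface(ip_route_output):
    """Return the device named on the first 'default' route line, or None."""
    for line in ip_route_output.splitlines():
        fields = line.split()
        if fields and fields[0] == "default" and "dev" in fields:
            index = fields.index("dev")
            if index + 1 < len(fields):
                return fields[index + 1]
    return None


def pick_interfaces(requested, available, ip_route_output):
    """Use the requested interfaces if given (after validation), else auto-detect."""
    if not requested:
        detected = default_route_interface(ip_route_output)
        return [detected] if detected else []
    unknown = [name for name in requested if name not in available]
    if unknown:
        raise ValueError(f"interfaces not found on node: {unknown}")
    return list(requested)


sample_routes = (
    "default via 10.0.128.1 dev ens5 proto dhcp metric 100\n"
    "10.0.128.0/19 dev ens5 proto kernel scope link src 10.0.128.153\n"
)
print(pick_interfaces([], ["ens5", "genev_sys_6081"], sample_routes))
```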

To see all available scenario options

krknctl run network-chaos --help

16 - Node Scenarios

This scenario disrupts the node(s) matching the label or node name(s) on a Kubernetes/OpenShift cluster. These scenarios are performed in one of two ways: through the cluster’s cloud CLI, or through common/generic commands that can be run on any cluster.

Actions

The following node chaos scenarios are supported:

node_start_scenario: Scenario to start the node instance. Need access to cloud provider
node_stop_scenario: Scenario to stop the node instance. Need access to cloud provider
node_stop_start_scenario: Scenario to stop and then start the node instance. Not supported on VMware. Need access to cloud provider
node_termination_scenario: Scenario to terminate the node instance. Need access to cloud provider
node_reboot_scenario: Scenario to reboot the node instance. Need access to cloud provider
stop_kubelet_scenario: Scenario to stop the kubelet of the node instance. Need access to cloud provider
stop_start_kubelet_scenario: Scenario to stop and start the kubelet of the node instance. Need access to cloud provider
restart_kubelet_scenario: Scenario to restart the kubelet of the node instance. Can be used with generic cloud type or when you don’t have access to cloud provider
node_crash_scenario: Scenario to crash the node instance. Can be used with generic cloud type or when you don’t have access to cloud provider
stop_start_helper_node_scenario: Scenario to stop and start the helper node and check service status. Need access to cloud provider
node_block_scenario: Scenario to block inbound and outbound traffic from other nodes to a specific node for a set duration (only for Azure). Need access to cloud provider
node_disk_detach_attach_scenario: Scenario to detach and reattach disks (only for baremetals).

Clouds

Supported cloud platforms:

Note

If the node does not recover from the node_crash_scenario injection, reboot the node to get it back to Ready state.

Note

node_start_scenario, node_stop_scenario, node_stop_start_scenario, node_termination_scenario, node_reboot_scenario and stop_start_kubelet_scenario are supported on:

AWS
Azure
OpenStack
BareMetal
GCP
VMware
Alibaba
IbmCloud
IbmCloudPower

Recovery Times

In each node scenario, the end telemetry details of the run show how long each node took to stop and recover, depending on the scenario.

The details printed in telemetry:

node_name: Node name
node_id: Node id
not_ready_time: Amount of time the node took to reach a not ready state after the cloud provider stopped the node
ready_time: Amount of time the node took to reach a ready state after the cloud provider reported the node as started
stopped_time: Amount of time the cloud provider took to stop a node
running_time: Amount of time the cloud provider took to get a node running
terminating_time: Amount of time the cloud provider took for node to become terminated

Example:

"affected_nodes": [
    {
        "node_name": "cluster-name-**.438115.internal",
        "node_id": "cluster-name-**",
        "not_ready_time": 0.18194103240966797,
        "ready_time": 0.0,
        "stopped_time": 140.74104499816895,
        "running_time": 0.0,
        "terminating_time": 0.0
    },
    {
        "node_name": "cluster-name-**-master-0.438115.internal",
        "node_id": "cluster-name-**-master-0",
        "not_ready_time": 0.1611928939819336,
        "ready_time": 0.0,
        "stopped_time": 146.72056317329407,
        "running_time": 0.0,
        "terminating_time": 0.0
    },
    {
        "node_name": "cluster-name-**.438115.internal",
        "node_id": "cluster-name-**",
        "not_ready_time": 0.0,
        "ready_time": 43.521320104599,
        "stopped_time": 0.0,
        "running_time": 12.305592775344849,
        "terminating_time": 0.0
    },
    {
        "node_name": "cluster-name-**-master-0.438115.internal",
        "node_id": "cluster-name-**-master-0",
        "not_ready_time": 0.0,
        "ready_time": 48.33336925506592,
        "stopped_time": 0.0,
        "running_time": 12.052034854888916,
        "terminating_time": 0.0
    }
]
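
In the example above, a stop-and-start run reports each node twice: once with the stop timings and once with the start timings. A short sketch of how those entries could be combined per node is shown below. The helper summarize_affected_nodes is hypothetical, and it assumes that entries for the same node share the same node_name, as they appear to in the sample.

```python
from collections import defaultdict

TIMING_FIELDS = (
    "not_ready_time",
    "ready_time",
    "stopped_time",
    "running_time",
    "terminating_time",
)


def summarize_affected_nodes(affected_nodes):
    """Sum the timing fields of all telemetry entries that share a node_name."""
    totals = defaultdict(lambda: dict.fromkeys(TIMING_FIELDS, 0.0))
    for entry in affected_nodes:
        for field in TIMING_FIELDS:
            totals[entry["node_name"]][field] += entry.get(field, 0.0)
    return dict(totals)


# Illustrative values, not taken from a real run.
affected_nodes = [
    {"node_name": "worker-0", "node_id": "w0", "not_ready_time": 0.5,
     "ready_time": 0.0, "stopped_time": 140.0, "running_time": 0.0,
     "terminating_time": 0.0},
    {"node_name": "worker-0", "node_id": "w0", "not_ready_time": 0.0,
     "ready_time": 43.5, "stopped_time": 0.0, "running_time": 12.25,
     "terminating_time": 0.0},
]

for name, timings in summarize_affected_nodes(affected_nodes).items():
    print(name, timings)
```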

How to Run Node Scenarios

Choose your preferred method to run node scenarios:

For any of the node scenarios, you’ll specify node_scenarios as the scenario type.

Example scenario files from scenarios-hub:

See example config here:

    chaos_scenarios:
        - node_scenarios: # List of chaos node scenarios to load
            - scenarios/***.yml
            - scenarios/***.yml # Can specify multiple files here

Note

You can specify multiple scenario files of the same type by adding additional paths to the list:

kraken:
    chaos_scenarios:
        - node_scenarios:
            - scenarios/node-reboot.yaml
            - scenarios/node-stop-start.yaml
            - scenarios/node-network.yaml

You can also combine multiple different scenario types in the same config.yaml file. Scenario types can be specified in any order, and you can include the same scenario type multiple times:

kraken:
    chaos_scenarios:
        - node_scenarios:
            - scenarios/node-reboot.yaml
        - pod_disruption_scenarios:
            - scenarios/pod-kill.yaml
        - container_scenarios:
            - scenarios/container-kill.yaml
        - node_scenarios:  # Same type can appear multiple times
            - scenarios/node-stop-start.yaml

Sample scenario file. You can specify multiple list items under node_scenarios; they are run serially.

node_scenarios:
  - actions:                   # node chaos scenarios to be injected
    - <action>                 # Can specify multiple actions here
    node_name: <node_name>     # node on which scenario has to be injected; can set multiple names separated by comma
    label_selector: <label>    # when node_name is not specified, a node with matching label_selector is selected for node chaos scenario injection; can specify multiple by a comma separated list
    exclude_label: <label>     # if label_selector is set, will exclude nodes marked by this label from the chaos scenario
    instance_count: <instance_number> # Number of nodes to perform action/select that match the label selector
    runs: <run_int>            # number of times to inject each scenario under actions (will perform on same node each time)
    timeout: <timeout>         # duration to wait for completion of node scenario injection
    duration: <duration>       # duration to stop the node before running the start action
    cloud_type: <cloud>        # cloud type on which Kubernetes/OpenShift runs  
    parallel: <true_or_false>  # Run action on label or node name in parallel or sequential, defaults to sequential
    kube_check: <true_or_false> # Run the kubernetes api calls to see if the node gets to a certain state during the node scenario
    disable_ssl_verification: <true_or_false> # Disable SSL verification, to avoid certificate errors

AWS

Cloud setup instructions can be found here. Sample scenario config can be found here.

The cloud type in the scenario yaml file needs to be aws

Baremetal

Sample scenario config can be found here.

The cloud type in the scenario yaml file needs to be bm

Note

Baremetal requires setting the IPMI user and password to power on, power off, and reboot nodes, using the config options bmc_user and bmc_password. They can be set either at the root of the entry in the scenarios config or per machine.

If no per-machine addresses are specified, kraken attempts to use the BMC value in the BareMetalHost object. To list them, run ‘oc get bmh -o wide --all-namespaces’. If the BMC values are blank, you must specify them per machine using the config option ‘bmc_addr’ as specified below.

For per-machine settings, add a “bmc_info” section to the entry in the scenarios config. Inside there, add a configuration section using the node name. In that, add per-machine settings. Valid settings are ‘bmc_user’, ‘bmc_password’, ‘bmc_addr’ and ‘disks’. See the example node scenario or the example below.

Note

Baremetal requires oc (openshift client) be installed on the machine running Kraken.

Note

Baremetal machines are fragile. Some node actions can occasionally corrupt the filesystem if the node does not shut down properly, and sometimes the kubelet does not start properly.

Docker

The Docker provider can be used to run node scenarios against kind clusters.

kind is a tool for running local Kubernetes clusters using Docker container “nodes”.

kind was primarily designed for testing Kubernetes itself, but may be used for local development or CI.

GCP

Cloud setup instructions can be found here. Sample scenario config can be found here.

The cloud type in the scenario yaml file needs to be gcp

Openstack

How to set up Openstack cli to run node scenarios is defined here.

The cloud type in the scenario yaml file needs to be openstack

The only node-level chaos scenarios supported on an OpenStack cloud are node_stop_start_scenario, stop_start_kubelet_scenario and node_reboot_scenario.

Note

For stop_start_helper_node_scenario, visit here to learn more about the helper node and its usage.

To execute the scenario, ensure the value for ssh_private_key in the node scenarios config file is set with the correct private key file path for ssh connection to the helper node. Ensure passwordless ssh is configured on the host running Kraken and the helper node to avoid connection errors.

Azure

Cloud setup instructions can be found here. Sample scenario config can be found here.

The cloud type in the scenario yaml file needs to be azure

Alibaba

How to set up Alibaba cli to run node scenarios is defined here.

Note

Alibaba has no concept of “terminating”, so any scenario that terminates a node will instead “release” it. Releasing a node takes two steps: stopping the node and then releasing it.

The cloud type in the scenario yaml file needs to be alibaba

VMware

How to set up VMware vSphere to run node scenarios is defined here

The cloud type in the scenario yaml file needs to be vmware

IBMCloud

How to set up IBMCloud to run node scenarios is defined here

See a sample of ibm cloud node scenarios example config file

The cloud type in the scenario yaml file needs to be ibm

Note

To avoid ssl certificate errors, set disable_ssl_verification to true in the scenario yaml file.

IBMCloud Power

How to set up IBMCloud Power to run node scenarios is defined here

See a sample of ibm cloud node scenarios example config file

The cloud type in the scenario yaml file needs to be ibmpower or ibmcloudpower

General

Note

The node_crash_scenario and stop_kubelet_scenario scenarios are supported independent of the cloud platform.

Use ‘generic’ or do not add the ‘cloud_type’ key to your scenario if your cluster is not set up using one of the current supported cloud types.

Run

python run_kraken.py --config config/config.yaml

This scenario disrupts the node(s) matching the label on a Kubernetes/OpenShift cluster. Actions/disruptions supported are listed here

Run

$ podman run \
  --name=<container_name> \
  --net=host \
  --env-host=true \
  -v <path-to-kube-config>:/home/krkn/.kube/config:Z \
  -d containers.krkn-chaos.dev/krkn-chaos/krkn-hub:node-scenarios
$ podman logs -f <container_name or container_id> # Streams Kraken logs
$ podman inspect <container-name or container-id> \
  --format "{{.State.ExitCode}}" # Outputs the exit code, which can be considered as pass/fail for the scenario

Note

$ docker run $(./get_docker_params.sh) \
  --name=<container_name> \
  --net=host \
  -v <path-to-kube-config>:/home/krkn/.kube/config:Z \
  -d containers.krkn-chaos.dev/krkn-chaos/krkn-hub:node-scenarios
$ docker run \
  -e <VARIABLE>=<value> \
  --net=host \
  -v <path-to-kube-config>:/home/krkn/.kube/config:Z \
  -d containers.krkn-chaos.dev/krkn-chaos/krkn-hub:node-scenarios

$ docker logs -f <container_name or container_id> # Streams Kraken logs
$ docker inspect <container-name or container-id> \
  --format "{{.State.ExitCode}}" # Outputs the exit code, which can be considered as pass/fail for the scenario

Tip

Because the container runs with a non-root user, ensure the kube config is globally readable before mounting it in the container. You can achieve this with the following commands:

kubectl config view --flatten > ~/kubeconfig && chmod 444 ~/kubeconfig && docker run $(./get_docker_params.sh) --name=<container_name> --net=host -v ~/kubeconfig:/home/krkn/.kube/config:Z -d containers.krkn-chaos.dev/krkn-chaos/krkn-hub:<scenario>

Supported parameters

The following environment variables can be set on the host running the container to tweak the scenario/faults being injected:

Example if --env-host is used:

export <parameter_name>=<value>

OR on the command line, for example:

-e <VARIABLE>=<value>

See list of variables that apply to all scenarios here that can be used/set in addition to these scenario specific variables

Parameter	Description	Type	Default
ACTION	Action can be one of the following	enum	node_stop_start_scenario
LABEL_SELECTOR	Node label to target	string	node-role.kubernetes.io/worker
EXCLUDE_LABEL	Nodes labeled with this value will be excluded from the chaos	string
NODE_NAME	Node name to inject faults in case of targeting a specific node; Can set multiple node names separated by a comma	string	""
INSTANCE_COUNT	Targeted instance count matching the label selector	number	1
RUNS	Iterations to perform action on a single node	number	1
CLOUD_TYPE	Cloud platform on top of which cluster is running, supported platforms - aws, vmware, ibmcloud, ibmcloudpower, bm	enum	aws
TIMEOUT	Duration to wait for completion of node scenario injection	number	180
DURATION	Duration to stop the node before running the start action - not supported for vmware and ibm cloud type	number	120
KUBE_CHECK	Connect to the kubernetes api to see if the node gets to a certain state during the node scenario. Supported values: `true`, `false`	enum	True
PARALLEL	Run action on label or node name in parallel or sequential. Supported values: `true`, `false`	enum	False
DISABLE_SSL_VERIFICATION	Disable SSL verification, to avoid certificate errors. Supported values: `true`, `false`	enum	False
VERIFY_SESSION	Verify the SSH session during node scenarios	string	false
SKIP_OPENSHIFT_CHECKS	Skip OpenShift-specific cluster checks (set to true for vanilla Kubernetes)	string	false
BMC_USER	Only needed for Baremetal ( bm ) - IPMI/bmc username	string	""
BMC_PASSWORD	Only needed for Baremetal ( bm ) - IPMI/bmc password	string	""
BMC_ADDR	Only needed for Baremetal ( bm ) - IPMI/bmc address	string	""
DISKS	Comma-separated list of disks for baremetal disk detach/attach scenarios	string	""

Note

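When using a custom metrics profile or alerts profile with CAPTURE_METRICS or ENABLE_ALERTS enabled, mount the metrics/alerts files from the host under /home/krkn/kraken/config/metrics-aggregated.yaml and /home/krkn/kraken/config/alerts. For example: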

$ podman run \
  --name=<container_name> \
  --net=host \
  --env-host=true \
  -v <path-to-custom-metrics-profile>:/home/krkn/kraken/config/metrics-aggregated.yaml \
  -v <path-to-custom-alerts-profile>:/home/krkn/kraken/config/alerts \
  -v <path-to-kube-config>:/home/krkn/.kube/config:Z \
  -d containers.krkn-chaos.dev/krkn-chaos/krkn-hub:node-scenarios

The following environment variables need to be set for scenarios that require interacting with the cloud platform API to perform the actions:

Amazon Web Services

$ export AWS_ACCESS_KEY_ID=<>
$ export AWS_SECRET_ACCESS_KEY=<>
$ export AWS_DEFAULT_REGION=<>

VMware vSphere

$ export VSPHERE_IP=<vSphere_client_IP_address>

$ export VSPHERE_USERNAME=<vSphere_client_username>

$ export VSPHERE_PASSWORD=<vSphere_client_password>

IBM Cloud

$ export IBMC_URL=https://<region>.iaas.cloud.ibm.com/v1

$ export IBMC_APIKEY=<ibmcloud_api_key>

Baremetal
Check Bare Metal Documentation

Google Cloud Platform

$ export GOOGLE_APPLICATION_CREDENTIALS=<GCP Json>

Azure

$ export AZURE_TENANT_ID=<>
$ export AZURE_CLIENT_SECRET=<>
$ export AZURE_CLIENT_ID=<>

OpenStack

export OS_USERNAME=username
export OS_PASSWORD=password
export OS_TENANT_NAME=projectName
export OS_AUTH_URL=https://identityHost:portNumber/v2.0
export OS_TENANT_ID=tenantIDString
export OS_REGION_NAME=regionName
export OS_CACERT=/path/to/cacertFile

krknctl run node-scenarios [--<parameter> <value>]

Can also set any global variable listed here

Scenario specific parameters: (be sure to scroll to right)

Parameter	Description	Type	Required	Default	Possible Values
`--action`	Action performed on the node; visit https://github.com/krkn-chaos/krkn/blob/main/docs/node_scenarios.md for more information	enum	Yes		node_start_scenario,node_stop_scenario,node_stop_start_scenario,node_termination_scenario,node_reboot_scenario,stop_kubelet_scenario,stop_start_kubelet_scenario,restart_kubelet_scenario,node_crash_scenario,stop_start_helper_node_scenario
`--label-selector`	Node label to target	string	No	node-role.kubernetes.io/worker
`--exclude-label`	excludes nodes marked by this label from chaos	string	No
`--node-name`	Node name to inject faults in case of targeting a specific node; Can set multiple node names separated by a comma	string	No
`--instance-count`	Targeted instance count matching the label selector	number	No	1
`--runs`	Iterations to perform action on a single node	number	No	1
`--cloud-type`	Cloud platform on top of which cluster is running, supported platforms - aws, azure, gcp, vmware, ibmcloud, bm	enum	No	aws
`--kube-check`	Connecting to the kubernetes api to check the node status, set to False for SNO	enum	No	true
`--timeout`	Duration to wait for completion of node scenario injection	number	No	180
`--duration`	Duration to stop the node before running the start action	number	No	120
`--vsphere-ip`	vSphere IP address	string	No
`--vsphere-username`	vSphere username	string (secret)	No
`--vsphere-password`	vSphere password	string (secret)	No
`--aws-access-key-id`	AWS Access Key Id	string (secret)	No
`--aws-secret-access-key`	AWS Secret Access Key	string (secret)	No
`--aws-default-region`	AWS default region	string	No
`--bmc-user`	Only needed for Baremetal ( bm ) - IPMI/bmc username	string(secret)	No
`--bmc-password`	Only needed for Baremetal ( bm ) - IPMI/bmc password	string(secret)	No
`--bmc-address`	Only needed for Baremetal ( bm ) - IPMI/bmc address	string	No
`--ibmc-address`	IBM Cloud URL	string	No
`--ibmc-api-key`	IBM Cloud API Key	string (secret)	No
`--ibmc-power-address`	IBM Power Cloud URL	string	No
`--ibmc-cnr`	IBM Cloud Power Workspace CNR	string	No
`--disable-ssl-verification`	Disable SSL verification, to avoid certificate errors	enum	Yes	false
`--azure-tenant`	Azure Tenant	string	No
`--azure-client-secret`	Azure Client Secret	string(secret)	No
`--azure-client-id`	Azure Client ID	string(secret)	No
`--azure-subscription-id`	Azure Subscription ID	string (secret)	No
`--gcp-application-credentials`	GCP application credentials file location	file	No

NOTE: The secret string types will be masked when the scenario is run

Parameter Dependencies

--node-name vs --label-selector: When --node-name is set, --label-selector is ignored. The scenario targets the named node(s) directly.
--instance-count: Only applies when using --label-selector. It limits how many of the matched nodes are targeted.
Cloud credentials: The --vsphere-*, --aws-*, --bmc-*, --ibmc-*, --azure-*, and --gcp-* parameters are only required for their respective --cloud-type value. For example, --aws-access-key-id is only needed when --cloud-type is aws.
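
The first two dependencies can be expressed as a small selection routine. This is a sketch under stated assumptions: select_target_nodes and matches are hypothetical names, the cluster is modelled as a plain dictionary of node labels, a selector is either `key` or `key=value`, and the first matching nodes are taken so the result is deterministic (the real tool may choose differently among matches).

```python
def matches(labels, selector):
    """True if the node labels satisfy a 'key' or 'key=value' selector."""
    key, separator, value = selector.partition("=")
    if key not in labels:
        return False
    return not separator or labels[key] == value


def select_target_nodes(nodes, node_name="", label_selector="",
                        exclude_label="", instance_count=1):
    """Pick target node names following the dependency rules above."""
    if node_name:
        # Named nodes win; label_selector and instance_count are not consulted.
        return [name.strip() for name in node_name.split(",") if name.strip()]
    candidates = [
        name for name, labels in nodes.items()
        if matches(labels, label_selector)
        and not (exclude_label and matches(labels, exclude_label))
    ]
    return candidates[:instance_count]


# Hypothetical cluster: node name -> labels.
nodes = {
    "worker-0": {"node-role.kubernetes.io/worker": ""},
    "worker-1": {"node-role.kubernetes.io/worker": "", "chaos": "skip"},
    "master-0": {"node-role.kubernetes.io/master": ""},
}

print(select_target_nodes(nodes, label_selector="node-role.kubernetes.io/worker",
                          exclude_label="chaos=skip", instance_count=2))
```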

To see all available scenario options

krknctl run node-scenarios --help

Demo

See a demo of this scenario:

16.1 - Node Scenarios on Bare Metal

Disrupts node(s) on a bare metal Kubernetes/OpenShift cluster by driving power state through the host's BMC (IPMI). Unlike the cloud-provider node scenarios, this flow requires IPMI credentials (either default or per-machine) and the OpenShift `oc` CLI on the runner host. Supported actions are inherited from the parent [Node Scenarios](../_index.md) page (start, stop, stop_start, terminate, reboot, kubelet stop/restart, disk detach/attach, and so on).

How to Run Node Scenarios on Bare Metal

Choose your preferred method to run baremetal node scenarios:

Example scenario file: baremetal_node_scenarios.yml

Configuration

For baremetal, set cloud_type: bm and provide IPMI credentials either at the root of the scenario entry (bmc_user / bmc_password) or per-machine inside bmc_info. If bmc_addr is omitted, Krkn falls back to the BMC value found on the matching BareMetalHost (oc get bmh -o wide --all-namespaces).

node_scenarios:
  - actions:
      - node_stop_start_scenario           # any action listed on the parent Node Scenarios page
    label_selector: node-role.kubernetes.io/worker
    instance_count: 1
    runs: 1
    timeout: 360
    duration: 120
    parallel: false
    cloud_type: bm
    kube_check: true
    bmc_user: defaultuser                  # default IPMI user; optional if every machine sets its own
    bmc_password: defaultpass              # default IPMI password; optional if every machine sets its own
    bmc_info:                              # per-machine overrides (optional)
      node-1:
        bmc_addr: mgmt-machine1.example.com
      node-2:
        bmc_addr: mgmt-machine2.example.com
        bmc_user: user
        bmc_password: pass

For the full set of node-scenario fields shared with other cloud providers (actions, node_name, label_selector, instance_count, etc.) see the parent Node Scenarios page.

Baremetal-specific fields

cloud_type — must be bm.
bmc_user, bmc_password — default IPMI credentials. May also be supplied via environment variables (BMC_USER, BMC_PASSWORD) — Krkn falls back to env when the YAML keys are absent.
bmc_info — per-machine overrides keyed by node name. Each entry accepts bmc_addr, bmc_user, bmc_password, and (for node_disk_detach_attach_scenario) a disks list.
For node_disk_detach_attach_scenario, bmc_info.<node>.disks is required and bmc_addr is not used.
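
The lookup order described in these fields can be sketched as follows. The helper resolve_bmc_settings is hypothetical and only mirrors the precedence stated above (per-machine entry, then the root of the scenario entry, then the BMC_USER and BMC_PASSWORD environment variables); the scenario is passed in as an already parsed dictionary.

```python
import os


def resolve_bmc_settings(scenario, node_name, env=None):
    """Resolve BMC settings for one node from a parsed node_scenarios entry."""
    env = os.environ if env is None else env
    machine = (scenario.get("bmc_info") or {}).get(node_name, {})

    def pick(key, env_key):
        # Per-machine value first, then the scenario root, then the environment.
        return machine.get(key) or scenario.get(key) or env.get(env_key)

    return {
        # None here means the address must come from the BareMetalHost object.
        "bmc_addr": machine.get("bmc_addr"),
        "bmc_user": pick("bmc_user", "BMC_USER"),
        "bmc_password": pick("bmc_password", "BMC_PASSWORD"),
        "disks": machine.get("disks", []),
    }


scenario = {
    "cloud_type": "bm",
    "bmc_user": "defaultuser",
    "bmc_password": "defaultpass",
    "bmc_info": {
        "node-1": {"bmc_addr": "mgmt-machine1.example.com"},
        "node-2": {"bmc_addr": "mgmt-machine2.example.com",
                   "bmc_user": "user", "bmc_password": "pass"},
    },
}

print(resolve_bmc_settings(scenario, "node-1", env={}))
print(resolve_bmc_settings(scenario, "node-2", env={}))
```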

Disk detach / attach

node_scenarios:
  - actions:
      - node_disk_detach_attach_scenario
    node_name: node-1
    instance_count: 1
    runs: 1
    timeout: 360
    duration: 120
    parallel: false
    cloud_type: bm
    bmc_info:
      node-1:
        disks: ["sda", "sdb"]

Usage

Enable baremetal node scenarios by adding the YAML file under node_scenarios in your kraken config:

kraken:
    chaos_scenarios:
        - node_scenarios:
            - scenarios/openshift/baremetal_node_scenarios.yml

Note

Baremetal requires oc (OpenShift client) installed on the host running Krkn. Some node actions can occasionally corrupt the filesystem if the node does not shut down cleanly — keep recovery procedures handy.

Run

python run_kraken.py --config config/config.yaml

Run

Unlike other krkn-hub scenarios, baremetal node scenarios require a base64-encoded scenario file rather than per-parameter env vars. Author your scenario locally following the scenario syntax, then pass it to the container via SCENARIO_BASE64.
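
If you prefer to prepare the value programmatically instead of with `base64 -w0`, a minimal Python equivalent looks like this. The function name encode_scenario and the sample YAML are illustrative only; Python's base64.b64encode produces a single unwrapped line, which matches what `-w0` does.

```python
import base64


def encode_scenario(yaml_text):
    """Base64-encode scenario YAML as a single line, like `base64 -w0`."""
    return base64.b64encode(yaml_text.encode("utf-8")).decode("ascii")


# Illustrative scenario content; use your own scenario file in practice.
scenario_yaml = """\
node_scenarios:
  - actions:
      - node_stop_start_scenario
    label_selector: node-role.kubernetes.io/worker
    cloud_type: bm
"""

encoded = encode_scenario(scenario_yaml)
print(encoded)  # value to pass as SCENARIO_BASE64
```

Decoding the value again with base64.b64decode is a quick way to confirm that the container will see the scenario you intended.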

If enabling Cerberus to monitor the cluster and pass/fail the scenario post chaos, refer to the docs. Make sure to start it before injecting the chaos and set CERBERUS_ENABLED for the chaos injection container to auto-connect.

$ podman run --name=<container_name> --net=host --pull=always --env-host=true \
    -e SCENARIO_BASE64="$(base64 -w0 <scenario_file>)" \
    -v <path-to-kube-config>:/home/krkn/.kube/config:Z -d quay.io/krkn-chaos/krkn-hub:node-scenarios-bm
$ podman logs -f <container_name or container_id> # Streams Kraken logs
$ podman inspect <container-name or container-id> --format "{{.State.ExitCode}}" # Outputs the exit code, which can be considered as pass/fail for the scenario

$ docker run $(./get_docker_params.sh) --name=<container_name> --net=host --pull=always \
    -e SCENARIO_BASE64="$(base64 -w0 <scenario_file>)" \
    -v <path-to-kube-config>:/home/krkn/.kube/config:Z -d quay.io/krkn-chaos/krkn-hub:node-scenarios-bm
OR
$ docker run -e SCENARIO_BASE64="$(base64 -w0 <scenario_file>)" \
    --net=host --pull=always -v <path-to-kube-config>:/home/krkn/.kube/config:Z -d quay.io/krkn-chaos/krkn-hub:node-scenarios-bm
$ docker logs -f <container_name or container_id> # Streams Kraken logs
$ docker inspect <container-name or container-id> --format "{{.State.ExitCode}}" # Outputs the exit code, which can be considered as pass/fail for the scenario

TIP: Because the container runs with a non-root user, ensure the kube config is globally readable before mounting it in the container:

kubectl config view --flatten > ~/kubeconfig && chmod 444 ~/kubeconfig && docker run $(./get_docker_params.sh) --name=<container_name> --net=host --pull=always -e SCENARIO_BASE64="$(base64 -w0 <scenario_file>)" -v ~/kubeconfig:/home/krkn/.kube/config:Z -d quay.io/krkn-chaos/krkn-hub:node-scenarios-bm

Supported parameters

See the list of variables that apply to all scenarios here; they can be set in addition to these scenario-specific variables.

Parameter	Description	Type	Default	Required
SCENARIO_BASE64	Base64-encoded contents of a baremetal node scenario YAML (`base64 -w0 baremetal_node_scenarios.yml`)	string		Yes
KRKN_DEBUG	When set to `True`, prints the decoded scenario and config files before running and enables `--debug True`	bool	`False`	No

The contents of SCENARIO_BASE64 are validated against the node-scenarios-bm JSON schema before Krkn starts — invalid scenarios fail fast with a schema error.

NOTE: If you use a custom metrics profile or alerts profile when CAPTURE_METRICS or ENABLE_ALERTS is enabled, mount the metrics/alerts files from the host at /home/krkn/kraken/config/metrics-aggregated.yaml and /home/krkn/kraken/config/alerts:

$ podman run --name=<container_name> --net=host --pull=always --env-host=true \
    -e SCENARIO_BASE64="$(base64 -w0 <scenario_file>)" \
    -v <path-to-custom-metrics-profile>:/home/krkn/kraken/config/metrics-aggregated.yaml \
    -v <path-to-custom-alerts-profile>:/home/krkn/kraken/config/alerts \
    -v <path-to-kube-config>:/home/krkn/.kube/config:Z -d quay.io/krkn-chaos/krkn-hub:node-scenarios-bm

krknctl run node-scenarios-bm --scenario-file-path <path-to-baremetal_node_scenarios.yml>

You can also set any global variable listed here.

Node Scenarios BM Parameters

Argument	Type	Description	Required	Default
`--scenario-file-path`	file_base64	Absolute path to the baremetal node-scenarios YAML file. krknctl base64-encodes the file and supplies it as `SCENARIO_BASE64` to the container.	true

The scenario YAML must follow the baremetal node scenario schema. See the Krkn tab on this page for an annotated example and the list of supported actions.

Example

krknctl run node-scenarios-bm \
  --scenario-file-path ~/krkn/scenarios/openshift/baremetal_node_scenarios.yml

Note

krknctl handles the base64 encoding for you — pass a plain filesystem path. The validation step inside the container (against config-schema.json) still applies, so invalid YAML is rejected before Krkn runs.

17 - Pod Network Scenarios

Pod outage

This scenario blocks the traffic (ingress/egress) of pods matching the given labels for a specified duration, so that you can observe how the service, and the other services that depend on it, behave during downtime. The results help with planning, for example improving timeouts or tuning alerts. With current network policies, it is not possible to explicitly block ports that are enabled by an allow rule in a network policy. This chaos scenario addresses that limitation by using OVS flow rules to block ports related to the pod. It supports OpenShiftSDN and OVNKubernetes based networks.

Excluding Pods from Network Outage

The pod outage scenario now supports excluding specific pods from chaos testing using the exclude_label parameter. This allows you to target a namespace or group of pods with your chaos testing while deliberately preserving certain critical workloads.

Why Use Pod Exclusion?

This feature addresses several common use cases:

Testing resiliency of an application while keeping critical monitoring pods operational
Preserving designated “control plane” pods within a microservice architecture
Allowing targeted chaos without affecting auxiliary services in the same namespace
Enabling more precise pod selection when network policies require all related services to be in the same namespace

How to Use the `exclude_label` Parameter

The exclude_label parameter works alongside existing pod selection parameters (label_selector and pod_name). The system will:

Identify all pods in the target namespace
Exclude pods matching the exclude_label criteria (in format “key=value”)
Apply the existing filters (label_selector or pod_name)
Apply the chaos scenario to the resulting pod list
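The selection order above can be sketched in Python as follows. This is an illustration only, not Krkn's implementation: pods are modeled as plain dictionaries, and the names has_label and select_targets are hypothetical.

```python
from typing import Dict, List, Optional


def has_label(pod: Dict, label: str) -> bool:
    """Check whether a pod carries a label given in "key=value" format."""
    key, _, value = label.partition("=")
    return pod["labels"].get(key) == value


def select_targets(
    pods: List[Dict],
    label_selector: Optional[str] = None,
    pod_name: Optional[str] = None,
    exclude_label: Optional[str] = None,
) -> List[str]:
    """Return the names of the pods that would receive the chaos."""
    # Start from all pods in the namespace and drop those matching exclude_label.
    candidates = [
        p for p in pods if not (exclude_label and has_label(p, exclude_label))
    ]
    # Apply the existing filters: label_selector, or pod_name when no label is given.
    if label_selector:
        candidates = [p for p in candidates if has_label(p, label_selector)]
    elif pod_name:
        candidates = [p for p in candidates if p["name"] == pod_name]
    return [p["name"] for p in candidates]


namespace_pods = [
    {"name": "my-service-a", "labels": {"app": "my-service"}},
    {"name": "my-service-b", "labels": {"app": "my-service", "critical": "true"}},
    {"name": "metrics", "labels": {"app": "metrics"}},
]
targets = select_targets(
    namespace_pods, label_selector="app=my-service", exclude_label="critical=true"
)
print(targets)
```

With the sample data, only my-service-a is selected: my-service-b is dropped by the exclude label, and metrics does not match the label selector.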

Example Configurations

Basic exclude configuration:

- id: pod_network_outage
  config:
    namespace: my-application
    label_selector: "app=my-service"
    exclude_label: "critical=true"
    direction:
      - egress
    test_duration: 600

In this example, network disruption is applied to all pods with the label app=my-service in the my-application namespace, except for those that also have the label critical=true.

Complete scenario example:

- id: pod_network_outage
  config:
    namespace: openshift-console
    direction:
      - ingress
    ingress_ports:
      - 8443
    label_selector: 'component=ui'
    exclude_label: 'excluded=true'
    test_duration: 600

This scenario blocks ingress traffic on port 8443 for pods matching the component=ui label in the openshift-console namespace, but skips any pods labeled excluded=true.

The exclude_label parameter is also supported in the pod network shaping scenarios (pod_egress_shaping and pod_ingress_shaping), allowing for the same selective application of network latency, packet loss, and bandwidth restriction.

How to Run Pod Network Scenarios

Choose your preferred method to run pod network scenarios:

Example scenario file: pod_network_outage.yml

Sample scenario config (using a plugin)

- id: pod_network_outage
  config:
    namespace: openshift-console   # Required - Namespace of the pod to which the filter needs to be applied
    direction:                     # Optional - List of directions to apply filters
        - ingress                  # Blocks ingress traffic, Default both egress and ingress
    ingress_ports:                 # Optional - List of ports to block traffic on
        - 8443                     # Blocks 8443, Default [], i.e. all ports.
    label_selector: 'component=ui' # Blocks access to openshift console
    exclude_label: 'critical=true' # Optional - Pods matching this label will be excluded from the chaos
    image: quay.io/krkn-chaos/krkn:tools

Pod Network shaping

Scenario to introduce network latency, packet loss, and bandwidth restriction in the Pod’s network interface. The purpose of this scenario is to observe faults caused by random variations in the network.

Sample scenario config for egress traffic shaping (using plugin)

- id: pod_egress_shaping
  config:
    namespace: openshift-console   # Required - Namespace of the pod to which the filter needs to be applied.
    label_selector: 'component=ui' # Applies traffic shaping to access openshift console.
    exclude_label: 'critical=true' # Optional - Pods matching this label will be excluded from the chaos
    network_params:
        latency: 500ms             # Add 500ms latency to egress traffic from the pod.
    image: quay.io/krkn-chaos/krkn:tools

Sample scenario config for ingress traffic shaping (using plugin)

- id: pod_ingress_shaping
  config:
    namespace: openshift-console   # Required - Namespace of the pod to which the filter needs to be applied.
    label_selector: 'component=ui' # Applies traffic shaping to access openshift console.
    exclude_label: 'critical=true' # Optional - Pods matching this label will be excluded from the chaos
    network_params:
        latency: 500ms             # Add 500ms latency to ingress traffic to the pod.
    image: quay.io/krkn-chaos/krkn:tools

Steps

Pick the pods to introduce the network anomaly either from label_selector or pod_name.
Identify the pod interface name on the node.
Set traffic shaping config on pod’s interface using tc and netem.
Wait for the duration time.
Remove traffic shaping config on pod’s interface.
Remove the job that spawned the pod.
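To make the shaping and cleanup steps more concrete, the sketch below builds tc/netem command lines from a network_params mapping. It is a simplified illustration: the commands Krkn actually issues may differ, the interface name veth1234 is made up, and only the latency key appears in the samples above (the loss and bandwidth keys are assumptions).

```python
from typing import Dict, List

# Assumed mapping from network_params keys to netem options; only `latency`
# appears in the sample configs, the other two keys are illustrative.
NETEM_OPTIONS = {"latency": "delay", "loss": "loss", "bandwidth": "rate"}


def build_shaping_commands(
    interface: str, network_params: Dict[str, str]
) -> Dict[str, List[str]]:
    """Build the tc commands that add and later remove a netem qdisc."""
    netem_args: List[str] = []
    for key, value in network_params.items():
        netem_args += [NETEM_OPTIONS[key], str(value)]
    return {
        "apply": ["tc", "qdisc", "add", "dev", interface, "root", "netem", *netem_args],
        "remove": ["tc", "qdisc", "del", "dev", interface, "root"],
    }


commands = build_shaping_commands("veth1234", {"latency": "500ms"})
print(" ".join(commands["apply"]))
print(" ".join(commands["remove"]))
```

The apply command corresponds to the "set traffic shaping config" step and the remove command to the cleanup step after the wait.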

How to Use Plugin Name

Add the plugin name to the chaos_scenarios list in the config/config.yaml file:

kraken:
    kubeconfig_path: ~/.kube/config                     # Path to kubeconfig
    ..
    chaos_scenarios:
        - pod_network_scenarios:
            - scenarios/<scenario_name>.yaml

Note

You can specify multiple scenario files of the same type by adding additional paths to the list:

kraken:
    chaos_scenarios:
        - pod_network_scenarios:
            - scenarios/pod-network-1.yaml
            - scenarios/pod-network-2.yaml
            - scenarios/pod-network-3.yaml

You can also combine multiple different scenario types in the same config.yaml file. Scenario types can be specified in any order, and you can include the same scenario type multiple times:

kraken:
    chaos_scenarios:
        - pod_network_scenarios:
            - scenarios/pod-network.yaml
        - pod_disruption_scenarios:
            - scenarios/pod-kill.yaml
        - container_scenarios:
            - scenarios/container-kill.yaml
        - pod_network_scenarios:  # Same type can appear multiple times
            - scenarios/pod-network-2.yaml

Run

python run_kraken.py --config config/config.yaml

This scenario runs network chaos at the pod level on a Kubernetes/OpenShift cluster.

Run

$ podman run \
  --name=<container_name> \
  --net=host \
  --pull=always \
  --env-host=true \
  -v <path-to-kube-config>:/home/krkn/.kube/config:Z \
  -d containers.krkn-chaos.dev/krkn-chaos/krkn-hub:pod-network-chaos
$ podman logs -f <container_name or container_id> # Streams Kraken logs
$ podman inspect <container-name or container-id> \
  --format "{{.State.ExitCode}}" # Outputs the exit code, which can be considered as pass/fail for the scenario

Note

$ docker run $(./get_docker_params.sh) \
  --name=<container_name> \
  --net=host \
  --pull=always \
  -v <path-to-kube-config>:/home/krkn/.kube/config:Z \
  -d containers.krkn-chaos.dev/krkn-chaos/krkn-hub:pod-network-chaos
$ docker run \
  -e <VARIABLE>=<value> \
  --name=<container_name> \
  --net=host \
  --pull=always \
  -v <path-to-kube-config>:/home/krkn/.kube/config:Z \
  -d containers.krkn-chaos.dev/krkn-chaos/krkn-hub:pod-network-chaos

$ docker logs -f <container_name or container_id> # Streams Kraken logs
$ docker inspect <container-name or container-id> \
  --format "{{.State.ExitCode}}" # Outputs the exit code, which can be considered as pass/fail for the scenario

Tip

Because the container runs with a non-root user, ensure the kube config is globally readable before mounting it in the container. You can achieve this with the following commands:

kubectl config view --flatten > ~/kubeconfig && chmod 444 ~/kubeconfig && docker run $(./get_docker_params.sh) --name=<container_name> --net=host --pull=always -v ~/kubeconfig:/home/krkn/.kube/config:Z -d containers.krkn-chaos.dev/krkn-chaos/krkn-hub:<scenario>
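Before mounting the file, you can confirm that the permission change took effect. The following Python sketch checks the "readable by others" bit; the helper name is_world_readable is hypothetical and the temporary file stands in for ~/kubeconfig.

```python
import os
import stat
import tempfile


def is_world_readable(path: str) -> bool:
    """Return True if the 'others' read bit is set on the file."""
    return bool(os.stat(path).st_mode & stat.S_IROTH)


# A temporary file stands in for ~/kubeconfig in this demo.
with tempfile.NamedTemporaryFile(delete=False) as handle:
    kubeconfig = handle.name

os.chmod(kubeconfig, 0o444)  # Same mode as `chmod 444` above
print(is_world_readable(kubeconfig))
```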

Supported parameters

The following environment variables can be set on the host running the container to tweak the scenario/faults being injected:

Example if --env-host is used:

export <parameter_name>=<value>

Or on the command line, for example:

-e <VARIABLE>=<value>

See the list of variables that apply to all scenarios here; they can be set in addition to these scenario-specific variables.

Parameter	Description	Default
NAMESPACE	Required - Namespace of the pod to which filter need to be applied	""
IMAGE	Image used to disrupt network on a pod	“quay.io/krkn-chaos/krkn:tools”
LABEL_SELECTOR	Label of the pod(s) to target	""
POD_NAME	When label_selector is not specified, pod matching the name will be selected for the chaos scenario	""
EXCLUDE_LABEL	Pods matching this label will be excluded from the chaos even if they match other criteria	""
INSTANCE_COUNT	Number of pods matching the label selector to target	1
TRAFFIC_TYPE	List of directions to apply filters - egress/ingress ( needs to be a list )	[ingress, egress]
INGRESS_PORTS	Ingress ports to block ( needs to be a list )	[] i.e all ports
EGRESS_PORTS	Egress ports to block ( needs to be a list )	[] i.e all ports
WAIT_DURATION	The duration (in seconds) that the network chaos (traffic shaping, packet loss, etc.) persists on the target pods. This is the actual time window where the network disruption is active. It must be longer than TEST_DURATION to ensure the fault is active for the entire test.	300
TEST_DURATION	Duration of the test run (e.g. workload or verification)	120

Note

For disconnected clusters, also mirror the helper image quay.io/krkn-chaos/krkn:tools and set the mirrored image path accordingly.

Note

For example:

$ podman run \
  --name=<container_name> \
  --net=host \
  --pull=always \
  --env-host=true \
  -v <path-to-custom-metrics-profile>:/home/krkn/kraken/config/metrics-aggregated.yaml \
  -v <path-to-custom-alerts-profile>:/home/krkn/kraken/config/alerts \
  -v <path-to-kube-config>:/home/krkn/.kube/config:Z \
  -d containers.krkn-chaos.dev/krkn-chaos/krkn-hub:pod-network-chaos

krknctl run pod-network-chaos [--<parameter> <value>]

You can also set any global variable listed here.

Scenario specific parameters:

Parameter	Description	Type	Required	Default
`--namespace`	Namespace of the pod to which filter need to be applied	string	Yes
`--image`	Image used to disrupt network on a pod	string	No	quay.io/krkn-chaos/krkn:tools
`--label-selector`	When pod_name is not specified, pod matching the label will be selected for the chaos scenario	string	No
`--exclude-label`	Pods matching this label will be excluded from the chaos even if they match other criteria	string	No	""
`--pod-name`	When label_selector is not specified, pod matching the name will be selected for the chaos scenario	string	No
`--instance-count`	Targeted instance count matching the label selector	number	No	1
`--traffic-type`	List of directions to apply filters - egress/ingress ( needs to be a list )	string	No	“[ingress,egress]”
`--ingress-ports`	Ingress ports to block ( needs to be a list )	string	No
`--egress-ports`	Egress ports to block ( needs to be a list )	string	No
`--wait-duration`	Duration in seconds that the network chaos persists; must be at least twice test_duration	number	No	300
`--test-duration`	Duration of the test run	number	No	120

Parameter Dependencies

--ingress-ports / --egress-ports: When left empty, all ports are blocked for that traffic direction. Specify port numbers to restrict the filter to only those ports.
--wait-duration: Must be at least 2× --test-duration to allow the network to stabilize before verification.
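The duration rule above can be checked before launching a run. The sketch below is a hypothetical pre-flight check, not something krknctl is known to perform; the function name check_durations is made up.

```python
def check_durations(wait_duration: int, test_duration: int) -> None:
    """Raise ValueError if wait_duration is less than twice test_duration."""
    if wait_duration < 2 * test_duration:
        raise ValueError(
            f"wait-duration ({wait_duration}) must be at least "
            f"2 x test-duration ({2 * test_duration})"
        )


# The documented defaults satisfy the rule: 300 >= 2 * 120.
check_durations(wait_duration=300, test_duration=120)
print("durations ok")
```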

To see all available scenario options

krknctl run pod-network-chaos --help

18 - Pod Scenarios

This scenario disrupts pods that match the label or pod name in the specified namespace on a Kubernetes/OpenShift cluster, skipping any pods that match the exclude label.

Why pod scenarios are important:

Modern applications demand high availability, low downtime, and resilient infrastructure. Kubernetes provides building blocks like Deployments, ReplicaSets, and Services to support fault tolerance, but understanding how these interact during disruptions is critical for ensuring reliability. Pod disruption scenarios test this reliability under various conditions, validating that the application and infrastructure respond as expected.

Use cases of pod scenarios

Deleting a single pod

Use Case: Simulates unplanned deletion of a single pod
Why It’s Important: Validates whether the ReplicaSet or Deployment automatically creates a replacement.
Customer Impact: Ensures continuous service even if a pod unexpectedly crashes.
Recovery Timing: Typically less than 10 seconds for stateless apps (seen in Krkn telemetry output).
HA Indicator: Pod is automatically rescheduled and becomes Ready without manual intervention.

kubectl delete pod <pod-name> -n <namespace>
kubectl get pods -n <namespace> -w # watch for new pods

Deleting multiple pods simultaneously

Use Case: Simulates a larger failure event, such as a node crash or AZ outage.
Why It’s Important: Tests whether the system has enough resources and policies to recover gracefully.
Customer Impact: If all pods of a service fail, user experience is directly impacted.
HA Indicator: Application can continue functioning from other replicas across zones/nodes.

Pod Eviction (Soft Disruption)

Use Case: Triggered by Kubernetes itself during node upgrades or scaling down.
Why It’s Important: Ensures graceful termination and restart elsewhere without user impact.
Customer Impact: Should be zero if readiness/liveness probes and PDBs are correctly configured.
HA Indicator: Rolling disruption does not take down the whole application.

</krkn-hub-scenario>

How to know if it is highly available

Multiple Replicas Exist: Confirmed by checking `kubectl get deploy -n <namespace>` and seeing more than one replica.
Pods Distributed Across Nodes/Availability Zones: Achieved using `topologySpreadConstraints`, or verified by observing pod distribution in `kubectl get pods -o wide`. See Health Checks (../../krkn/health-checks.md) for real-time visibility into the impact of chaos scenarios on application availability and performance.
Service Uptime Remains Unaffected: During the chaos test, verify app availability (synthetic probes, Prometheus alerts, etc.).
Recovery Is Automatic: No manual intervention is needed to restore service.
Krkn Telemetry Indicators: End-of-run data includes recovery times, pod reschedule latency, and service downtime, which are vital metrics for assessing HA.

Excluding Pods from Disruption

Use `exclude_label` to protect selected pods in a group while the rest of the pods in the namespace are subjected to chaos. Some common use cases are:

Disrupt the backend pods while leaving the highly available database replicas untouched.
Inject faults in the application layer without stopping the infrastructure/monitoring pods.
Run a rolling disruption experiment that leaves the control-plane or system-critical components unaffected.

Format:

exclude_label: "key=value"

Mechanism:

Pods are selected based on namespace_pattern + label_selector or name_pattern.
Before deletion, the pods that match exclude_label are removed from the list.
The rest of the pods are subjected to chaos.

Example: Protect the Leader While Other etcd Replicas Are Killed

- id: kill_pods
  config:
    namespace_pattern: ^openshift-etcd$
    label_selector: k8s-app=etcd
    exclude_label: role=etcd-leader
    krkn_pod_recovery_time: 120
    kill: 1

Example: Disrupt Backend, Skip Monitoring

- id: kill_pods
  config:
    namespace_pattern: ^production$
    label_selector: app=backend
    exclude_label: component=monitoring
    krkn_pod_recovery_time: 120
    kill: 2

Targeting Pods on Specific Nodes

By default, pod scenarios target all pods matching the namespace and label selectors regardless of which node they run on. However, you can narrow down the scope to only affect pods running on specific nodes using two options:

Option 1: Using Node Label Selector

Target pods running on nodes with specific labels (e.g., control-plane nodes, worker nodes, nodes in a specific zone).

Format:

node_label_selector: "key=value"

Use Cases:

Test resilience of control-plane workloads by disrupting pods only on master/control-plane nodes
Simulate zone-specific failures by targeting nodes in a particular availability zone
Test worker node failures without affecting control-plane components

Example: Target Pods on Control-Plane Nodes

- id: kill_pods
  config:
    namespace_pattern: ^kube-system$
    label_selector: k8s-app=kube-scheduler
    node_label_selector: node-role.kubernetes.io/control-plane=
    krkn_pod_recovery_time: 120

Example: Target Pods in a Specific Availability Zone

- id: kill_pods
  config:
    namespace_pattern: ^production$
    label_selector: app=backend
    node_label_selector: topology.kubernetes.io/zone=us-east-1a
    krkn_pod_recovery_time: 120

Option 2: Using Node Names

Target pods running on explicitly named nodes. This is useful for testing specific node scenarios or mixed node type environments.

Format:

node_names:
  - node-name-1
  - node-name-2

Use Cases:

Test failures on specific nodes (e.g., nodes with known hardware issues)
Simulate scenarios involving mixed node types (e.g., GPU nodes, high-memory nodes)
Validate pod distribution and failover between specific nodes

Example: Target Pods on Specific Nodes

- id: kill_pods
  config:
    namespace_pattern: ^kube-system$
    label_selector: k8s-app=kube-scheduler
    node_names:
      - ip-10-0-31-8.us-east-2.compute.internal
      - ip-10-0-48-188.us-east-2.compute.internal
    krkn_pod_recovery_time: 120

Mechanism:

Pods are selected based on namespace_pattern + label_selector or name_pattern
The selection is further filtered to only include pods running on the specified nodes
If exclude_label is also specified, it’s applied after node filtering
The remaining pods are subjected to chaos
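The filtering order described above can be sketched in Python as follows. It is an illustration under simplifying assumptions rather than Krkn's code: pods are plain dictionaries, node filtering is shown only for node_names (a node_label_selector would first have to be resolved to a list of node names), and the function names are hypothetical.

```python
import re
from typing import Dict, List, Optional


def matches(labels: Dict[str, str], selector: str) -> bool:
    """Check a "key=value" selector against a pod's labels."""
    key, _, value = selector.partition("=")
    return labels.get(key) == value


def select_pods(
    pods: List[Dict],
    namespace_pattern: str,
    label_selector: Optional[str] = None,
    name_pattern: Optional[str] = None,
    node_names: Optional[List[str]] = None,
    exclude_label: Optional[str] = None,
) -> List[str]:
    """Return the names of pods left after selection, node and exclude filtering."""
    selected = []
    for pod in pods:
        # Selection: namespace_pattern plus label_selector or name_pattern.
        if not re.search(namespace_pattern, pod["namespace"]):
            continue
        if label_selector:
            if not matches(pod["labels"], label_selector):
                continue
        elif name_pattern and not re.search(name_pattern, pod["name"]):
            continue
        # Node filtering: keep only pods running on the specified nodes.
        if node_names and pod["node"] not in node_names:
            continue
        # exclude_label is applied after node filtering.
        if exclude_label and matches(pod["labels"], exclude_label):
            continue
        selected.append(pod["name"])
    return selected


cluster_pods = [
    {"name": "backend-1", "namespace": "production", "node": "node-a",
     "labels": {"app": "backend"}},
    {"name": "backend-2", "namespace": "production", "node": "node-b",
     "labels": {"app": "backend"}},
    {"name": "backend-3", "namespace": "production", "node": "node-a",
     "labels": {"app": "backend", "component": "monitoring"}},
    {"name": "backend-4", "namespace": "staging", "node": "node-a",
     "labels": {"app": "backend"}},
]
remaining = select_pods(
    cluster_pods,
    namespace_pattern="^production$",
    label_selector="app=backend",
    node_names=["node-a"],
    exclude_label="component=monitoring",
)
print(remaining)
```

With the sample data, backend-2 is dropped by the node filter, backend-3 by the exclude label, and backend-4 by the namespace pattern, leaving backend-1.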

Recovery Time Metrics in Krkn Telemetry

Krkn tracks three key recovery time metrics for each affected pod:

pod_rescheduling_time - The time (in seconds) that the Kubernetes cluster took to reschedule the pod after it was killed. This measures the cluster’s scheduling efficiency and includes the time from pod deletion until the replacement pod is scheduled on a node.
pod_readiness_time - The time (in seconds) the pod took to become ready after being scheduled. This measures application startup time, including container image pulls, initialization, and readiness probe success.
total_recovery_time - The total amount of time (in seconds) from pod deletion until the replacement pod became fully ready and available to serve traffic. This is the sum of rescheduling time and readiness time.

Example telemetry output:

{
  "recovered": [
    {
      "pod_name": "backend-7d8f9c-xyz",
      "namespace": "production",
      "pod_rescheduling_time": 2.3,
      "pod_readiness_time": 5.7,
      "total_recovery_time": 8.0
    }
  ],
  "unrecovered": []
}
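As a quick sanity check on telemetry like the example above, the sketch below verifies that total_recovery_time equals the sum of the other two metrics for each recovered pod. The field names come from the example output; the helper name is_consistent and the tolerance are assumptions.

```python
import json
import math

telemetry = json.loads("""
{
  "recovered": [
    {
      "pod_name": "backend-7d8f9c-xyz",
      "namespace": "production",
      "pod_rescheduling_time": 2.3,
      "pod_readiness_time": 5.7,
      "total_recovery_time": 8.0
    }
  ],
  "unrecovered": []
}
""")


def is_consistent(pod: dict, tolerance: float = 1e-6) -> bool:
    """Check that total recovery time is rescheduling time plus readiness time."""
    expected = pod["pod_rescheduling_time"] + pod["pod_readiness_time"]
    return math.isclose(expected, pod["total_recovery_time"], abs_tol=tolerance)


for pod in telemetry["recovered"]:
    print(pod["pod_name"], is_consistent(pod))
```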

How to Run Pod Scenarios

Choose your preferred method to run pod scenarios:

Example Config

The following are the components of Kubernetes for which a basic chaos scenario config exists today.

Example scenario files:

pod.yml (Kubernetes)
regex_openshift_pod_kill.yml (OpenShift)

kraken:
  chaos_scenarios:
    - pod_disruption_scenarios:
      - path/to/scenario.yaml

Note

You can specify multiple scenario files of the same type by adding additional paths to the list:

kraken:
  chaos_scenarios:
    - pod_disruption_scenarios:
      - path/to/scenario1.yaml
      - path/to/scenario2.yaml
      - path/to/scenario3.yaml

You can also combine multiple different scenario types in the same config.yaml file. Scenario types can be specified in any order, and you can include the same scenario type multiple times:

kraken:
  chaos_scenarios:
    - pod_disruption_scenarios:
      - scenarios/pod-kill.yaml
      - scenarios/etcd-kill.yaml
    - container_scenarios:
      - scenarios/container-kill.yaml
    - node_scenarios:
      - scenarios/node-reboot.yaml
    - pod_disruption_scenarios:  # Same type can appear multiple times
      - scenarios/pod-kill-2.yaml

You can then create the scenario file with the following contents:

# yaml-language-server: $schema=../plugin.schema.json
- id: kill-pods
  config:
    namespace_pattern: ^kube-system$
    label_selector: k8s-app=kube-scheduler
    krkn_pod_recovery_time: 120
    # Not needed by default, but can be used if you want to target pods on specific nodes
    # Option 1: Target pods on nodes with specific labels [master/worker nodes]
    node_label_selector: node-role.kubernetes.io/control-plane=      # Target control-plane nodes (works on both k8s and openshift)
    exclude_label: 'critical=true' # Optional - Pods matching this label will be excluded from the chaos
    # Option 2: Target pods of specific nodes (testing mixed node types)
    node_names:
      - ip-10-0-31-8.us-east-2.compute.internal      # Worker node 1
      - ip-10-0-48-188.us-east-2.compute.internal    # Worker node 2
      - ip-10-0-14-59.us-east-2.compute.internal     # Master node 1

Please adjust the schema reference to point to the schema file. This file will give you code completion and documentation for the available options in your IDE.

Pod Chaos Scenarios

The following are the components of Kubernetes/OpenShift for which a basic chaos scenario config exists today.

Component	Description	Working
Basic pod scenario	Kill a pod.	✔️
Etcd	Kills a single/multiple etcd replicas.	✔️
Kube ApiServer	Kills a single/multiple kube-apiserver replicas.	✔️
ApiServer	Kills a single/multiple apiserver replicas.	✔️
Prometheus	Kills a single/multiple prometheus replicas.	✔️
OpenShift System Pods	Kills random pods running in the OpenShift system namespaces.	✔️

Run

python run_kraken.py --config config/config.yaml

This scenario disrupts the pods matching the label in the specified namespace on a Kubernetes/OpenShift cluster.

Run

$ podman run \
  --name=<container_name> \
  --net=host \
  --pull=always \
  --env-host=true \
  -v <path-to-kube-config>:/home/krkn/.kube/config:Z \
  -d containers.krkn-chaos.dev/krkn-chaos/krkn-hub:pod-scenarios
$ podman logs -f <container_name or container_id> # Streams Kraken logs
$ podman inspect <container-name or container-id> \
  --format "{{.State.ExitCode}}" # Outputs the exit code, which can be considered as pass/fail for the scenario

Note

$ docker run $(./get_docker_params.sh) \
  --name=<container_name> \
  --net=host \
  --pull=always \
  -v <path-to-kube-config>:/home/krkn/.kube/config:Z \
  -d containers.krkn-chaos.dev/krkn-chaos/krkn-hub:pod-scenarios
$ docker run \
  -e <VARIABLE>=<value> \
  --name=<container_name> \
  --net=host \
  --pull=always \
  -v <path-to-kube-config>:/home/krkn/.kube/config:Z \
  -d containers.krkn-chaos.dev/krkn-chaos/krkn-hub:pod-scenarios

$ docker logs -f <container_name or container_id> # Streams Kraken logs
$ docker inspect <container-name or container-id> \
  --format "{{.State.ExitCode}}" # Outputs the exit code, which can be considered as pass/fail for the scenario

Tip

Because the container runs with a non-root user, ensure the kube config is globally readable before mounting it in the container. You can achieve this with the following commands:

kubectl config view --flatten > ~/kubeconfig && chmod 444 ~/kubeconfig && docker run $(./get_docker_params.sh) --name=<container_name> --net=host --pull=always -v ~/kubeconfig:/home/krkn/.kube/config:Z -d containers.krkn-chaos.dev/krkn-chaos/krkn-hub:<scenario>

Supported parameters

The following environment variables can be set on the host running the container to tweak the scenario/faults being injected:

Example if --env-host is used:

export <parameter_name>=<value>

Or on the command line, for example:

-e <VARIABLE>=<value>

See the list of variables that apply to all scenarios here; they can be set in addition to these scenario-specific variables.

Parameter	Description	Type	Default
NAMESPACE	Targeted namespace in the cluster ( supports regex )	string	openshift-.*
POD_LABEL	Label of the pod(s) to target	string	""
EXCLUDE_LABEL	Pods matching this label will be excluded from the chaos even if they match other criteria	string	""
NAME_PATTERN	Regex pattern to match the pods in NAMESPACE when POD_LABEL is not specified	string	.*
DISRUPTION_COUNT	Number of pods to disrupt	number	1
KILL_TIMEOUT	Timeout to wait for the target pod(s) to be removed in seconds	number	180
EXPECTED_RECOVERY_TIME	Fails if the disrupted pods do not recover within the timeout set	number	120
NODE_LABEL_SELECTOR	Label of the node(s) to target	string	""
NODE_NAMES	Name of the node(s) to target. Example: [“worker-node-1”,“worker-node-2”,“master-node-1”]	string	[]

Note

Set the NAMESPACE environment variable to openshift-.* to pick and disrupt pods randomly in the OpenShift system namespaces. DAEMON_MODE can also be enabled to disrupt the pods every x seconds in the background to check reliability.

Note

For example:

$ podman run \
  --name=<container_name> \
  --net=host \
  --pull=always \
  --env-host=true \
  -v <path-to-custom-metrics-profile>:/home/krkn/kraken/config/metrics-aggregated.yaml \
  -v <path-to-custom-alerts-profile>:/home/krkn/kraken/config/alerts \
  -v <path-to-kube-config>:/home/krkn/.kube/config:Z \
  -d containers.krkn-chaos.dev/krkn-chaos/krkn-hub:pod-scenarios

krknctl run pod-scenarios [--<parameter> <value>]

You can also set any global variable listed here.

Scenario specific parameters:

Parameter	Description	Type	Required	Default
`--namespace`	Targeted namespace in the cluster ( supports regex )	string	No	openshift-*
`--pod-label`	Label of the pod(s) to target ex. “app=test”	string	No
`--exclude-label`	Pods matching this label will be excluded from the chaos even if they match other criteria	string	No	""
`--name-pattern`	Regex pattern to match the pods in NAMESPACE when POD_LABEL is not specified	string	No	.*
`--disruption-count`	Number of pods to disrupt	number	No	1
`--kill-timeout`	Timeout to wait for the target pod(s) to be removed in seconds	number	No	180
`--expected-recovery-time`	Fails if the disrupted pods do not recover within the timeout set	number	No	120
`--node-label-selector`	Label of the node(s) to target	string	No	""
`--node-names`	Name of the node(s) to target. Example: [“worker-node-1”,“worker-node-2”,“master-node-1”]	string	No	[]

Behavior Notes

Recovery monitoring: After disrupting pods, krkn monitors for recovery up to --expected-recovery-time seconds. If any pods remain unrecovered after the timeout, the scenario reports failure.
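The recovery monitoring behavior can be pictured as a polling loop with a deadline. The sketch below is a generic illustration, not krkn's implementation: is_ready stands in for whatever readiness check is used, and the function name, polling interval and demo values are assumptions.

```python
import time
from typing import Callable, Iterable, List


def wait_for_recovery(
    pods: Iterable[str],
    is_ready: Callable[[str], bool],
    timeout: float,
    interval: float = 1.0,
) -> List[str]:
    """Poll until every pod is ready or the timeout expires.

    Returns the pods that are still not ready when polling stops.
    """
    pending = list(pods)
    deadline = time.monotonic() + timeout
    while pending:
        pending = [pod for pod in pending if not is_ready(pod)]
        if not pending or time.monotonic() >= deadline:
            break
        time.sleep(interval)
    return pending


# Demo with a fake readiness check: backend-2 never becomes ready.
ready_pods = {"backend-1"}
unrecovered = wait_for_recovery(
    ["backend-1", "backend-2"],
    is_ready=lambda pod: pod in ready_pods,
    timeout=0.2,
    interval=0.05,
)
print("failure" if unrecovered else "success", unrecovered)
```

In this sketch, a non-empty result corresponds to the failure case described above, where pods remain unrecovered after the timeout.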

To see all available scenario options

krknctl run pod-scenarios --help

19 - Power Outage Scenarios

This scenario shuts down a Kubernetes/OpenShift cluster for the specified duration to simulate a power outage, brings it back online, and checks whether it is healthy.

How to Run Power Outage Scenarios

Choose your preferred method to run power outage scenarios:

The power outage/cluster shutdown scenario can be injected by placing the shut_down config file under the cluster_shut_down_scenario option in the kraken config. Refer to the cluster_shut_down_scenario config file.

Example scenario file: cluster_shut_down_scenario.yml

Refer to cloud setup to configure your cli properly for the cloud provider of the cluster you want to shut down.

Current accepted cloud types:

cluster_shut_down_scenario:                          # Scenario to stop all the nodes for specified duration and restart the nodes.
  runs: 1                                            # Number of times to execute the cluster_shut_down scenario.
  shut_down_duration: 120                            # Duration in seconds to shut down the cluster.
  cloud_type: aws                                    # Cloud type on which Kubernetes/OpenShift runs.

How to Use Plugin Name

Add the plugin name to the chaos_scenarios list in the config/config.yaml file:

kraken:
    kubeconfig_path: ~/.kube/config                     # Path to kubeconfig
    ..
    chaos_scenarios:
        - cluster_shut_down_scenarios:
            - scenarios/<scenario_name>.yaml

Note

You can specify multiple scenario files of the same type by adding additional paths to the list:

kraken:
    chaos_scenarios:
        - cluster_shut_down_scenarios:
            - scenarios/power-outage-1.yaml
            - scenarios/power-outage-2.yaml
            - scenarios/power-outage-3.yaml

You can also combine multiple different scenario types in the same config.yaml file. Scenario types can be specified in any order, and you can include the same scenario type multiple times:

kraken:
    chaos_scenarios:
        - cluster_shut_down_scenarios:
            - scenarios/power-outage.yaml
        - pod_disruption_scenarios:
            - scenarios/pod-kill.yaml
        - node_scenarios:
            - scenarios/node-reboot.yaml
        - cluster_shut_down_scenarios:  # Same type can appear multiple times
            - scenarios/power-outage-2.yaml

Run

python run_kraken.py --config config/config.yaml

This scenario shuts down a Kubernetes/OpenShift cluster for the specified duration to simulate a power outage, brings it back online, and checks that it is healthy. More information can be found here.

Right now, power outage and cluster shutdown are one and the same. We originally created this scenario to stop all the nodes and then start them back up, the way a customer would shut their cluster down.

In a real-life chaos scenario, though, we figured this was close to what would happen if the power went out on the AWS side, leaving all of our EC2 nodes stopped or powered off. We looked into whether the AWS CLI had a way to forcefully (not gracefully) power off the nodes, and it does not currently support this, so this scenario is as close as we can get to “pulling the plug”.

Run

$ podman run \
  --name=<container_name> \
  --net=host \
  --pull=always \
  --env-host=true \
  -v <path-to-kube-config>:/home/krkn/.kube/config:Z \
  -d containers.krkn-chaos.dev/krkn-chaos/krkn-hub:power-outages
$ podman logs -f <container_name or container_id> # Streams Kraken logs
$ podman inspect <container_name or container_id> \
  --format "{{.State.ExitCode}}" # Outputs the exit code, which can be considered as pass/fail for the scenario
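
As a sketch of how the exit code could be consumed by automation, the snippet below wraps the inspect command in Python. It assumes that an exit code of 0 means the scenario passed; the scenario_passed helper and its run parameter are hypothetical names used only for illustration.

```python
import subprocess
from typing import Callable


def scenario_passed(
    container: str,
    runtime: str = "podman",
    run: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> bool:
    """Return True when the finished container exited with code 0.

    Assumption: exit code 0 is treated as "pass", anything else as "fail".
    """
    result = run(
        [runtime, "inspect", container, "--format", "{{.State.ExitCode}}"],
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout.strip()) == 0
```

Passing runtime="docker" issues the equivalent docker inspect command. The check is only meaningful once the container has finished, since a container that is still running typically also reports 0.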

Note

$ docker run $(./get_docker_params.sh) \
  --name=<container_name> \
  --net=host \
  --pull=always \
  -v <path-to-kube-config>:/home/krkn/.kube/config:Z \
  -d containers.krkn-chaos.dev/krkn-chaos/krkn-hub:power-outages
$ docker run \
  -e <VARIABLE>=<value> \
  --name=<container_name> \
  --net=host \
  --pull=always \
  -v <path-to-kube-config>:/home/krkn/.kube/config:Z \
  -d containers.krkn-chaos.dev/krkn-chaos/krkn-hub:power-outages

$ docker logs -f <container_name or container_id> # Streams Kraken logs
$ docker inspect <container_name or container_id> \
  --format "{{.State.ExitCode}}" # Outputs the exit code, which can be considered as pass/fail for the scenario

Tip

Because the container runs with a non-root user, ensure the kube config is globally readable before mounting it in the container. You can achieve this with the following commands:

kubectl config view --flatten > ~/kubeconfig && chmod 444 ~/kubeconfig && docker run $(./get_docker_params.sh) --name=<container_name> --net=host --pull=always -v ~/kubeconfig:/home/krkn/.kube/config:Z -d containers.krkn-chaos.dev/krkn-chaos/krkn-hub:<scenario>

Supported parameters

The following environment variables can be set on the host running the container to tweak the scenario/faults being injected:

example: export <parameter_name>=<value>

See the list of variables that apply to all scenarios here; they can be set in addition to these scenario-specific variables.

Parameter	Description	Type	Default
SHUTDOWN_DURATION	Duration in seconds to shut down the cluster	number	1200
CLOUD_TYPE	Cloud platform on top of which cluster is running, supported cloud platforms	enum	aws
TIMEOUT	Time in seconds to wait for each node to be stopped or running after the cluster comes back	number	600

The following environment variables need to be set for scenarios that require interacting with the cloud platform API to perform the actions:

Amazon Web Services

$ export AWS_ACCESS_KEY_ID=<>
$ export AWS_SECRET_ACCESS_KEY=<>
$ export AWS_DEFAULT_REGION=<>

Google Cloud Platform

TBD

Azure

TBD

OpenStack

TBD

Baremetal

TBD

Note

For example:

$ podman run \
  --name=<container_name> \
  --net=host \
  --pull=always \
  --env-host=true \
  -v <path-to-custom-metrics-profile>:/home/krkn/kraken/config/metrics-aggregated.yaml \
  -v <path-to-custom-alerts-profile>:/home/krkn/kraken/config/alerts \
  -v <path-to-kube-config>:/home/krkn/.kube/config:Z \
  -d containers.krkn-chaos.dev/krkn-chaos/krkn-hub:power-outages

krknctl run power-outages [--<parameter> <value>]

Can also set any global variable listed here

Scenario specific parameters:

Parameter	Description	Type	Required	Default
`--cloud-type`	Cloud platform on top of which cluster is running, supported platforms - aws, azure, gcp, vmware, ibmcloud, bm	enum	No	aws
`--timeout`	Time in seconds to wait for each node to be stopped or running after the cluster comes back	number	No	180
`--shutdown-duration`	Duration in seconds to shut down the cluster	number	No	1200
`--vsphere-ip`	vSphere IP address	string	No
`--vsphere-username`	vSphere username	string (secret)	No
`--vsphere-password`	vSphere password	string (secret)	No
`--aws-access-key-id`	AWS Access Key Id	string (secret)	No
`--aws-secret-access-key`	AWS Secret Access Key	string (secret)	No
`--aws-default-region`	AWS default region	string	No
`--bmc-user`	Only needed for Baremetal (bm) - IPMI/bmc username	string (secret)	No
`--bmc-password`	Only needed for Baremetal (bm) - IPMI/bmc password	string (secret)	No
`--bmc-address`	Only needed for Baremetal ( bm ) - IPMI/bmc address	string	No
`--ibmc-address`	IBM Cloud URL	string	No
`--ibmc-api-key`	IBM Cloud API Key	string (secret)	No
`--azure-tenant`	Azure Tenant	string	No
`--azure-client-secret`	Azure Client Secret	string (secret)	No
`--azure-client-id`	Azure Client ID	string (secret)	No
`--azure-subscription-id`	Azure Subscription ID	string (secret)	No
`--gcp-application-credentials`	GCP application credentials file location	file	No

NOTE: Values of the secret string types are masked when the scenario is run.

To see all available scenario options

krknctl run power-outages --help

Demo

See a demo of this scenario:

20 - PVC Scenario

Scenario to fill up a given PersistentVolumeClaim by creating a temp file on the PVC from a pod associated with it. The purpose of this scenario is to fill up a volume to understand faults caused by the application using this volume.

How to Run PVC Scenarios

Choose your preferred method to run PVC scenarios:

Example scenario file: pvc_scenario.yaml

Sample scenario config

pvc_scenario:
  pvc_name: <pvc_name>          # Name of the target PVC.
  pod_name: <pod_name>          # Name of the pod where the PVC is mounted. It will be ignored if the pvc_name is defined.
  namespace: <namespace_name>   # Namespace where the PVC is.
  fill_percentage: 50           # Target percentage to fill up the PVC. Value must be higher than the current percentage. Valid values are between 0 and 99.
  duration: 60                  # Duration in seconds for the fault.

Steps

Get the pod name where the PVC is mounted.
Get the volume name mounted in the container pod.
Get the container name where the PVC is mounted.
Get the mount path where the PVC is mounted in the pod.
Get the PVC capacity and current used capacity.
Calculate file size to fill the PVC to the target fill_percentage.
Connect to the pod.
Create a temp file kraken.tmp with random data on the mount path:
- dd bs=1024 count=$file_size </dev/urandom > /mount_path/kraken.tmp
Wait for the duration time.
Remove the temp file created:
- rm kraken.tmp
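
The size calculation in the steps above can be sketched in Python. This is an illustration under stated assumptions, not Kraken's actual implementation: capacity and usage are taken in kilobytes so that the result maps directly onto count for dd with bs=1024, and the function name fill_file_size_kb is hypothetical.

```python
def fill_file_size_kb(capacity_kb: int, used_kb: int, fill_percentage: int) -> int:
    """Return how many 1 KiB blocks to write so usage reaches fill_percentage.

    Assumption: the target is computed against total capacity, and the data
    already on the volume counts toward it.
    """
    if not 0 <= fill_percentage <= 99:
        raise ValueError("fill_percentage must be between 0 and 99")
    current_percentage = used_kb * 100 / capacity_kb
    if fill_percentage <= current_percentage:
        raise ValueError("fill_percentage must be higher than the current usage")
    target_kb = capacity_kb * fill_percentage // 100
    return target_kb - used_kb


# A 10 GiB PVC that is 20% full, to be filled up to 50%.
capacity = 10 * 1024 * 1024  # 10 GiB expressed in KiB
used = 2 * 1024 * 1024       # 2 GiB expressed in KiB
print(fill_file_size_kb(capacity, used, 50))
```

The returned value plays the role of $file_size in the dd command shown in the steps.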

How to Use Plugin Name

Add the plugin name to the list of chaos_scenarios section in the config/config.yaml file

kraken:
    kubeconfig_path: ~/.kube/config                     # Path to kubeconfig
    ..
    chaos_scenarios:
        - pvc_scenarios:
            - scenarios/<scenario_name>.yaml

Note

You can specify multiple scenario files of the same type by adding additional paths to the list:

kraken:
    chaos_scenarios:
        - pvc_scenarios:
            - scenarios/pvc-fill-1.yaml
            - scenarios/pvc-fill-2.yaml
            - scenarios/pvc-fill-3.yaml

You can also combine multiple different scenario types in the same config.yaml file. Scenario types can be specified in any order, and you can include the same scenario type multiple times:

kraken:
    chaos_scenarios:
        - pvc_scenarios:
            - scenarios/pvc-fill.yaml
        - pod_disruption_scenarios:
            - scenarios/pod-kill.yaml
        - container_scenarios:
            - scenarios/container-kill.yaml
        - pvc_scenarios:  # Same type can appear multiple times
            - scenarios/pvc-fill-2.yaml

Run

python run_kraken.py --config config/config.yaml

This scenario fills up a given PersistentVolumeClaim by creating a temp file on the PVC from a pod associated with it. The purpose of this scenario is to fill up a volume to understand faults caused by the application using this volume. For more information, refer to the following documentation.

Run

$ podman run \
  --name=<container_name> \
  --net=host \
  --pull=always \
  --env-host=true \
  -v <path-to-kube-config>:/home/krkn/.kube/config:Z \
  -d containers.krkn-chaos.dev/krkn-chaos/krkn-hub:pvc-scenarios
$ podman logs -f <container_name or container_id> # Streams Kraken logs
$ podman inspect <container_name or container_id> \
  --format "{{.State.ExitCode}}" # Outputs the exit code, which can be considered as pass/fail for the scenario

Note

$ docker run $(./get_docker_params.sh) \
  --name=<container_name> \
  --net=host \
  --pull=always \
  -v <path-to-kube-config>:/home/krkn/.kube/config:Z \
  -d containers.krkn-chaos.dev/krkn-chaos/krkn-hub:pvc-scenarios
$ docker run \
  -e <VARIABLE>=<value> \
  --name=<container_name> \
  --net=host \
  --pull=always \
  -v <path-to-kube-config>:/home/krkn/.kube/config:Z \
  -d containers.krkn-chaos.dev/krkn-chaos/krkn-hub:pvc-scenarios

$ docker logs -f <container_name or container_id> # Streams Kraken logs
$ docker inspect <container_name or container_id> \
  --format "{{.State.ExitCode}}" # Outputs the exit code, which can be considered as pass/fail for the scenario

Tip

Because the container runs with a non-root user, ensure the kube config is globally readable before mounting it in the container. You can achieve this with the following commands:

kubectl config view --flatten > ~/kubeconfig && chmod 444 ~/kubeconfig && docker run $(./get_docker_params.sh) --name=<container_name> --net=host --pull=always -v ~/kubeconfig:/home/krkn/.kube/config:Z -d containers.krkn-chaos.dev/krkn-chaos/krkn-hub:<scenario>

Supported parameters

The following environment variables can be set on the host running the container to tweak the scenario/faults being injected:

Example if --env-host is used:

export <parameter_name>=<value>

Or on the command line, for example:

-e <VARIABLE>=<value>

If both PVC_NAME and POD_NAME are defined, the POD_NAME value is overridden by the Mounted By: value in the PVC definition.

See the list of variables that apply to all scenarios here; they can be set in addition to these scenario-specific variables.

Parameter	Description	Type	Default
PVC_NAME	Targeted PersistentVolumeClaim in the cluster (if null, POD_NAME is required)	string
POD_NAME	Targeted pod in the cluster (if null, PVC_NAME is required)	string
NAMESPACE	Targeted namespace in the cluster (required)	string
FILL_PERCENTAGE	Targeted percentage to be filled up in the PVC	number	50
DURATION	Duration in seconds with the PVC filled up	number	60
BLOCK_SIZE	Block size in bytes for the dd command used to fill the PVC	number	102400

Note

For example:

$ podman run \
  --name=<container_name> \
  --net=host \
  --pull=always \
  --env-host=true \
  -v <path-to-custom-metrics-profile>:/home/krkn/kraken/config/metrics-aggregated.yaml \
  -v <path-to-custom-alerts-profile>:/home/krkn/kraken/config/alerts \
  -v <path-to-kube-config>:/home/krkn/.kube/config:Z \
  -d containers.krkn-chaos.dev/krkn-chaos/krkn-hub:pvc-scenarios

krknctl run pvc-scenarios [--<parameter> <value>]

Can also set any global variable listed here

Scenario specific parameters:

Parameter	Description	Type	Required	Default
`--pvc-name`	Targeted PersistentVolumeClaim in the cluster (if null, POD_NAME is required)	string	No
`--pod-name`	Targeted pod in the cluster (if null, PVC_NAME is required)	string	No
`--namespace`	Targeted namespace in the cluster (required)	string	Yes
`--fill-percentage`	Targeted percentage to be filled up in the PVC	number	No	50
`--duration`	Duration in seconds with the PVC filled up	number	No	60

Parameter Dependencies

--pvc-name vs --pod-name: At least one is required. If both are set, --pvc-name takes precedence and --pod-name is ignored.

Behavior Notes

Automatic cleanup: After --duration expires, krkn automatically deletes the temporary fill file from the PVC.
PVC requirements: The target PVC must be in Bound state and mounted to an active pod. The scenario locates the mount path by inspecting the pod’s volume mounts.

To see all available scenario options

krknctl run pvc-scenarios --help

21 - Service Disruption Scenarios

Using this type of scenario configuration one is able to delete crucial objects in a specific namespace, or a namespace matching a certain regex string.

How to Run Service Disruption Scenarios

Choose your preferred method to run service disruption scenarios:

Example scenario files from scenarios-hub:

Configuration Options:

namespace: Specific namespace, or a regex matching the namespaces, that you want to delete. Gets all namespaces if not specified; set to "" if you want to use the label_selector field.

Set namespace to "^.*$" and label_selector to "" to randomly select any namespace in your cluster.

label_selector: Label on the namespace you want to delete. Set to "" if you are using the namespace variable.

delete_count: Number of namespaces to kill in each run, based on the matching namespace and label specified. Defaults to 1.

runs: Number of runs/iterations to kill namespaces. Defaults to 1.

sleep: Number of seconds to wait between each iteration/count of killing namespaces. Defaults to 10 seconds if not set.

Refer to namespace_scenarios_example config file.

scenarios:
- namespace: "^.*$"
  runs: 1
- namespace: "^.*ingress.*$"
  runs: 1
  sleep: 15
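
How the namespace and delete_count options interact can be illustrated with a short Python sketch. It assumes the namespace value is applied as a regular expression with re.search and that targets are drawn at random from the matches; the select_namespaces helper is hypothetical and only approximates the behavior described above.

```python
import random
import re
from typing import List, Optional


def select_namespaces(
    all_namespaces: List[str],
    pattern: str,
    delete_count: int = 1,
    rng: Optional[random.Random] = None,
) -> List[str]:
    """Pick up to delete_count namespaces whose names match the regex pattern."""
    rng = rng or random.Random()
    matching = [ns for ns in all_namespaces if re.search(pattern, ns)]
    # Never ask for more namespaces than actually matched.
    return rng.sample(matching, min(delete_count, len(matching)))


namespaces = ["default", "openshift-etcd", "openshift-ingress", "ingress-nginx"]
print(select_namespaces(namespaces, "^.*ingress.*$", delete_count=2))
```

With the pattern "^.*$" every namespace matches, so a delete_count of 1 picks one namespace at random from the whole cluster.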

Steps

This scenario selects one or more namespaces, depending on the configuration, kills all of the object types below in each selected namespace, and waits for them to be Running in the post action:

Services
Daemonsets
Statefulsets
Replicasets
Deployments

How to Use Plugin Name

Add the plugin name to the list of chaos_scenarios section in the config/config.yaml file

kraken:
    kubeconfig_path: ~/.kube/config                     # Path to kubeconfig
    ..
    chaos_scenarios:
        - service_disruption_scenarios:
            - scenarios/<scenario_name>.yaml

Note

You can specify multiple scenario files of the same type by adding additional paths to the list:

kraken:
    chaos_scenarios:
        - service_disruption_scenarios:
            - scenarios/service-disruption-1.yaml
            - scenarios/service-disruption-2.yaml
            - scenarios/service-disruption-3.yaml

You can also combine multiple different scenario types in the same config.yaml file. Scenario types can be specified in any order, and you can include the same scenario type multiple times:

kraken:
    chaos_scenarios:
        - service_disruption_scenarios:
            - scenarios/service-disruption.yaml
        - pod_disruption_scenarios:
            - scenarios/pod-kill.yaml
        - container_scenarios:
            - scenarios/container-kill.yaml
        - service_disruption_scenarios:  # Same type can appear multiple times
            - scenarios/service-disruption-2.yaml

Run

python run_kraken.py --config config/config.yaml

This scenario deletes main objects within a namespace in your Kubernetes/OpenShift cluster. More information can be found here.

Run

$ podman run \
  --name=<container_name> \
  --net=host \
  --pull=always \
  --env-host=true \
  -v <path-to-kube-config>:/home/krkn/.kube/config:Z \
  -d containers.krkn-chaos.dev/krkn-chaos/krkn-hub:service-disruption-scenarios
$ podman logs -f <container_name or container_id> # Streams Kraken logs
$ podman inspect <container_name or container_id> \
  --format "{{.State.ExitCode}}" # Outputs the exit code, which can be considered as pass/fail for the scenario

Note

$ docker run $(./get_docker_params.sh) \
  --name=<container_name> \
  --net=host \
  --pull=always \
  -v <path-to-kube-config>:/home/krkn/.kube/config:Z \
  -d containers.krkn-chaos.dev/krkn-chaos/krkn-hub:service-disruption-scenarios
$ docker run \
  -e <VARIABLE>=<value> \
  --name=<container_name> \
  --net=host \
  --pull=always \
  -v <path-to-kube-config>:/home/krkn/.kube/config:Z \
  -d containers.krkn-chaos.dev/krkn-chaos/krkn-hub:service-disruption-scenarios

$ docker logs -f <container_name or container_id> # Streams Kraken logs
$ docker inspect <container_name or container_id> \
  --format "{{.State.ExitCode}}" # Outputs the exit code, which can be considered as pass/fail for the scenario

Tip

Because the container runs with a non-root user, ensure the kube config is globally readable before mounting it in the container. You can achieve this with the following commands:

kubectl config view --flatten > ~/kubeconfig && chmod 444 ~/kubeconfig && docker run $(./get_docker_params.sh) --name=<container_name> --net=host --pull=always -v ~/kubeconfig:/home/krkn/.kube/config:Z -d containers.krkn-chaos.dev/krkn-chaos/krkn-hub:<scenario>

Supported parameters

The following environment variables can be set on the host running the container to tweak the scenario/faults being injected:

Example if --env-host is used:

export <parameter_name>=<value>

Or on the command line, for example:

-e <VARIABLE>=<value>

See the list of variables that apply to all scenarios here; they can be set in addition to these scenario-specific variables.

Parameter	Description	Type	Default
LABEL_SELECTOR	Label of the namespace to target. Set this parameter only if NAMESPACE is not set	string	""
NAMESPACE	Name of the namespace you want to target. Set this parameter only if LABEL_SELECTOR is not set	string	"openshift-etcd"
SLEEP	Number of seconds to wait before polling to see if namespace exists again	number	15
DELETE_COUNT	Number of namespaces to kill in each run, based on matching namespace and label specified	number	1
RUNS	Number of runs to execute the action	number	1

Note

For example:

$ podman run \
  --name=<container_name> \
  --net=host \
  --pull=always \
  --env-host=true \
  -v <path-to-custom-metrics-profile>:/home/krkn/kraken/config/metrics-aggregated.yaml \
  -v <path-to-custom-alerts-profile>:/home/krkn/kraken/config/alerts \
  -v <path-to-kube-config>:/home/krkn/.kube/config:Z \
  -d containers.krkn-chaos.dev/krkn-chaos/krkn-hub:service-disruption-scenarios

krknctl run service-disruption-scenarios [--<parameter> <value>]

Can also set any global variable listed here

Scenario specific parameters:

Parameter	Description	Type	Required	Default
`--namespace`	Targeted namespace in the cluster (required)	string	No	openshift-etcd
`--label-selector`	Label of the namespace to target. Set this parameter only if NAMESPACE is not set	string	No
`--delete-count`	Number of namespaces to kill in each run, based on matching namespace and label specified	number	No	1
`--runs`	Number of runs to execute the action	number	No	1

Behavior Notes

No automatic recovery: After krkn deletes the services, they are not automatically recreated. Services will only come back if managed by a controller (e.g. Helm release, operator, or GitOps pipeline). Verify your recovery mechanism before running this scenario.

To see all available scenario options

krknctl run service-disruption-scenarios --help

Demo

See a demo of this scenario:

22 - Service Hijacking Scenario

Service Hijacking Scenarios aim to simulate fake HTTP responses from a workload targeted by a Service already deployed in the cluster. This scenario is executed by deploying a custom-made web service and modifying the target Service selector to direct traffic to this web service for a specified duration.

The web service’s source code is available here.

It employs a time-based test plan from the scenario configuration file, which specifies the behavior of resources during the chaos scenario as follows:

The scenario will focus on the service_name within the service_namespace, substituting the selector with a randomly generated one, which is added as a label in the mock service manifest. This allows multiple scenarios to be executed in the same namespace, each targeting different services without causing conflicts.

The newly deployed mock web service will expose a service_target_port, which can be either a named or numeric port based on the service configuration. This ensures that the Service correctly routes HTTP traffic to the mock web service during the chaos run.

Each step will last for duration seconds from the deployment of the mock web service in the cluster. For each HTTP resource, defined as a top-level YAML property of the plan (it could be a specific resource, e.g., /list/index.php, or a path-based resource typical in MVC frameworks), one or more HTTP request methods can be specified. Both standard and custom request methods are supported.

During this time frame, the web service will respond with:

status: The HTTP status code (can be standard or custom).
mime_type: The MIME type (can be standard or custom).
payload: The response body to be returned to the client.

At the end of the step duration, the web service will proceed to the next step (if available) until the global chaos_duration concludes. At this point, the original service will be restored, and the custom web service and its resources will be undeployed.

NOTE: Some clients (e.g., cURL, jQuery) may optimize queries using lightweight methods (like HEAD or OPTIONS) to probe API behavior. If these methods are not defined in the test plan, the web service may respond with a 405 or 404 status code. If you encounter unexpected behavior, consider this use case.

How to Run Service Hijacking Scenarios

Choose your preferred method to run service hijacking scenarios:

Example scenario file: service_hijacking.yaml

Sample Scenario

service_target_port: http-web-svc # The port of the service to be hijacked (can be named or numeric, based on the workload and service configuration).
service_name: nginx-service # The name of the service that will be hijacked.
service_namespace: default # The namespace where the target service is located.
image: quay.io/krkn-chaos/krkn-service-hijacking:v0.1.3 # Image of the krkn web service to be deployed to receive traffic.
chaos_duration: 30 # Total duration of the chaos scenario in seconds.
privileged: True # Set to True if a privileged securityContext is needed to run, otherwise False.
plan:
  - resource: "/list/index.php" # Specifies the resource or path to respond to in the scenario. For paths, both the path and query parameters are captured but ignored. For resources, only query parameters are captured.
    steps:                      # A time-based plan consisting of steps can be defined for each resource.
      GET:                      # One or more HTTP methods can be specified for each step. Note: Non-standard methods are supported for fully custom web services (e.g., using NONEXISTENT instead of POST).
        - duration: 15          # Duration in seconds for this step before moving to the next one, if defined. Otherwise, this step will continue until the chaos scenario ends.
          status: 500           # HTTP status code to be returned in this step.
          mime_type: "application/json" # MIME type of the response for this step.
          payload: |            # The response payload for this step.
            {
              "status":"internal server error"
            }
        - duration: 15
          status: 201
          mime_type: "application/json"
          payload: |
            {
              "status":"resource created"
            }
      POST:
        - duration: 15
          status: 401
          mime_type: "application/json"
          payload: |
            {
               "status": "unauthorized"
            }
        - duration: 15
          status: 404
          mime_type: "text/plain"
          payload: "not found"

How to Use Plugin Name

Add the plugin name to the list of chaos_scenarios section in the config/config.yaml file

kraken:
    kubeconfig_path: ~/.kube/config                     # Path to kubeconfig
    ..
    chaos_scenarios:
        - service_hijacking_scenarios:
            - scenarios/<scenario_name>.yaml

Note

You can specify multiple scenario files of the same type by adding additional paths to the list:

kraken:
    chaos_scenarios:
        - service_hijacking_scenarios:
            - scenarios/service-hijack-1.yaml
            - scenarios/service-hijack-2.yaml
            - scenarios/service-hijack-3.yaml

You can also combine multiple different scenario types in the same config.yaml file. Scenario types can be specified in any order, and you can include the same scenario type multiple times:

kraken:
    chaos_scenarios:
        - service_hijacking_scenarios:
            - scenarios/service-hijack.yaml
        - pod_disruption_scenarios:
            - scenarios/pod-kill.yaml
        - network_chaos_scenarios:
            - scenarios/network-chaos.yaml
        - service_hijacking_scenarios:  # Same type can appear multiple times
            - scenarios/service-hijack-2.yaml

Run

python run_kraken.py --config config/config.yaml

This scenario reroutes traffic intended for a target service to a custom web service that is automatically deployed by Krkn. This web service responds with user-defined HTTP statuses, MIME types, and bodies. For more details, please refer to the following documentation.

Run

Unlike other krkn-hub scenarios, this one requires a specific configuration due to its unique structure. You must set up the scenario in a local file following the scenario syntax, and then pass this file’s base64-encoded content to the container via the SCENARIO_BASE64 variable.

$ podman run  --name=<container_name> \
              -e SCENARIO_BASE64="$(base64 -w0 <scenario_file>)" \
              -v <path_to_kubeconfig>:/home/krkn/.kube/config:Z containers.krkn-chaos.dev/krkn-chaos/krkn-hub:service-hijacking
              
$ podman logs -f <container_name or container_id> # Streams Kraken logs
$ podman inspect <container_name or container_id> \
  --format "{{.State.ExitCode}}" # Outputs the exit code, which can be considered as pass/fail for the scenario

Note

$ export SCENARIO_BASE64="$(base64 -w0 <scenario_file>)"
$ docker run $(./get_docker_params.sh) --name=<container_name> \
                                       --net=host --pull=always \
                                       -v <path-to-kube-config>:/home/krkn/.kube/config:Z \
                                       -d containers.krkn-chaos.dev/krkn-chaos/krkn-hub:service-hijacking
OR 
$ docker run --name=<container_name> -e SCENARIO_BASE64="$(base64 -w0 <scenario_file>)" \
                                     --net=host --pull=always \
                                     -v <path-to-kube-config>:/home/krkn/.kube/config:Z \
                                     -d containers.krkn-chaos.dev/krkn-chaos/krkn-hub:service-hijacking

$ docker logs -f <container_name or container_id> # Streams Kraken logs
$ docker inspect <container_name or container_id> \
  --format "{{.State.ExitCode}}" # Outputs the exit code, which can be considered as pass/fail for the scenario

Tip

Because the container runs with a non-root user, ensure the kube config is globally readable before mounting it in the container. You can achieve this with the following commands:

kubectl config view --flatten > ~/kubeconfig && chmod 444 ~/kubeconfig && docker run $(./get_docker_params.sh) --name=<container_name> --net=host --pull=always -v ~/kubeconfig:/home/krkn/.kube/config:Z -d containers.krkn-chaos.dev/krkn-chaos/krkn-hub:<scenario>

Supported parameters

The following environment variables can be set on the host running the container to tweak the scenario/faults being injected. For example:

export <parameter_name>=<value>

See the list of variables that apply to all scenarios here; they can be set in addition to these scenario-specific variables.

Parameter	Description
SCENARIO_BASE64	Base64 encoded service-hijacking scenario file. Note that the -w0 option in the command substitution `SCENARIO_BASE64="$(base64 -w0 <scenario_file>)"` is mandatory in order to remove line breaks from the base64 command output
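
The same single-line encoding can be produced without the base64 command, for example from a Python script that launches the container. The sketch below uses only the standard library: b64encode emits no line breaks, which matches the effect of -w0, whereas encodebytes wraps its output and would not be suitable. The encode_scenario helper name is hypothetical, and the temporary file merely stands in for a real scenario file.

```python
import base64
import tempfile
from pathlib import Path


def encode_scenario(path: str) -> str:
    """Return the file content as one unbroken base64 line (like base64 -w0)."""
    return base64.b64encode(Path(path).read_bytes()).decode("ascii")


# Stand-in scenario file, long enough that a wrapping encoder would add breaks.
with tempfile.NamedTemporaryFile("w", suffix=".yaml", delete=False) as handle:
    handle.write("service_name: nginx-service\n" * 10)

encoded = encode_scenario(handle.name)
print("\n" in encoded)
```

The returned string can then be passed as the value of SCENARIO_BASE64.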

A sample scenario file can be found here; customize it for the response codes you want the API calls to return.

Note

For example:

$ podman run -e SCENARIO_BASE64="$(base64 -w0 <scenario_file>)" \
             --name=<container_name> \
             --net=host --pull=always \
             --env-host=true \
             -v <path-to-custom-metrics-profile>:/home/krkn/kraken/config/metrics-aggregated.yaml \
             -v <path-to-custom-alerts-profile>:/home/krkn/kraken/config/alerts \
             -v <path-to-kube-config>:/home/krkn/.kube/config:Z \
             -d containers.krkn-chaos.dev/krkn-chaos/krkn-hub:service-hijacking

krknctl run service-hijacking [--<parameter> <value>]

Can also set any global variable listed here

Scenario specific parameters:

Parameter	Description	Type	Required	Default
`--scenario-file-path`	The absolute path of the scenario file compiled following the documentation	file_base64	Yes

A sample scenario file can be found here; customize it for the response codes you want the API calls to return.

Note

Note that the -w0 option in the command substitution SCENARIO_BASE64="$(base64 -w0 <scenario_file>)" is mandatory in order to remove line breaks from the base64 command output

To see all available scenario options

krknctl run service-hijacking --help

23 - Syn Flood Scenarios

This scenario generates a substantial amount of TCP traffic directed at one or more Kubernetes services within the cluster to test the server’s resiliency under extreme traffic conditions. It can also target hosts outside the cluster by specifying a reachable IP address or hostname. This scenario leverages the distributed nature of Kubernetes clusters to instantiate multiple instances of the same pod against a single host, significantly increasing the effectiveness of the attack. The configuration also allows for the specification of multiple node selectors, enabling Kubernetes to schedule the attacker pods on a user-defined subset of nodes to make the test more realistic.

The attacker container source code is available here.

How to Run Syn Flood Scenarios

Choose your preferred method to run syn flood scenarios:

Example scenario file: syn_flood.yaml

Sample scenario config

packet-size: 120 # hping3 packet size
window-size: 64 # hping3 TCP window size
duration: 10 # chaos scenario duration
namespace: default # namespace where the target service(s) are deployed
target-service: target-svc # target service name (if set, target-service-label must be empty)
target-port: 80 # target service TCP port
target-service-label: "" # target service label; can be used to target multiple services at the same time
                         # if they share the same label (if set, target-service must be empty)
number-of-pods: 2 # number of attacker pods instantiated per target
image: quay.io/krkn-chaos/krkn-syn-flood # syn flood attacker container image
attacker-nodes: # sets the node affinity used to schedule the attacker pods. Multiple values can be given
                # per node label selector, and multiple labels can be specified, so the kube scheduler can
                # place the attacker pods in the best way possible based on the provided labels.
  kubernetes.io/hostname:
    - host_1
    - host_2
  kubernetes.io/os:
    - linux

How to Use Plugin Name

Add the plugin name to the list of chaos_scenarios section in the config/config.yaml file

kraken:
    kubeconfig_path: ~/.kube/config                     # Path to kubeconfig
    ..
    chaos_scenarios:
        - syn_flood_scenarios:
            - scenarios/<scenario_name>.yaml

Note

You can specify multiple scenario files of the same type by adding additional paths to the list:

kraken:
    chaos_scenarios:
        - syn_flood_scenarios:
            - scenarios/syn-flood-1.yaml
            - scenarios/syn-flood-2.yaml
            - scenarios/syn-flood-3.yaml

You can also combine multiple different scenario types in the same config.yaml file. Scenario types can be specified in any order, and you can include the same scenario type multiple times:

kraken:
    chaos_scenarios:
        - syn_flood_scenarios:
            - scenarios/syn-flood.yaml
        - pod_disruption_scenarios:
            - scenarios/pod-kill.yaml
        - network_chaos_scenarios:
            - scenarios/network-chaos.yaml
        - syn_flood_scenarios:  # Same type can appear multiple times
            - scenarios/syn-flood-2.yaml
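
To make the structure of chaos_scenarios concrete, here is a small Python sketch that walks the list and prints each (scenario type, file) pair in listing order. It assumes the config has already been loaded into nested dictionaries and lists; flatten_chaos_scenarios is a hypothetical helper, not a krkn function, and it only illustrates the data layout, not how krkn schedules the runs.

```python
def flatten_chaos_scenarios(kraken_config: dict) -> list[tuple[str, str]]:
    """Return (scenario_type, scenario_file) pairs in the order they are listed."""
    runs = []
    for entry in kraken_config["kraken"]["chaos_scenarios"]:
        # Each list entry is a one-key mapping: scenario type -> list of files.
        for scenario_type, files in entry.items():
            for path in files:
                runs.append((scenario_type, path))
    return runs


# Same content as the combined example above, written as Python literals.
config = {
    "kraken": {
        "chaos_scenarios": [
            {"syn_flood_scenarios": ["scenarios/syn-flood.yaml"]},
            {"pod_disruption_scenarios": ["scenarios/pod-kill.yaml"]},
            {"network_chaos_scenarios": ["scenarios/network-chaos.yaml"]},
            {"syn_flood_scenarios": ["scenarios/syn-flood-2.yaml"]},
        ]
    }
}

for scenario_type, path in flatten_chaos_scenarios(config):
    print(f"{scenario_type}: {path}")
```

Because chaos_scenarios is a list of single-key mappings rather than one mapping, the same scenario type can appear more than once without one entry overwriting another.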

Run

python run_kraken.py --config config/config.yaml

Syn Flood scenario

This scenario simulates a user-defined surge of TCP SYN requests directed at one or more services deployed within the cluster or an external target reachable by the cluster. For more details, please refer to the following documentation.

Run

$ podman run --name=<container_name> \
  --net=host \
  --pull=always \
  --env-host=true \
  -v <path-to-kube-config>:/home/krkn/.kube/config:Z \
  -e TARGET_PORT=<target_port> \
  -e NAMESPACE=<target_namespace> \
  -e TOTAL_CHAOS_DURATION=<duration> \
  -e TARGET_SERVICE=<target_service> \
  -e NUMBER_OF_PODS=10 \
  -e NODE_SELECTORS="<key>=<value>;<key>=<othervalue>" \
  -d containers.krkn-chaos.dev/krkn-chaos/krkn-hub:syn-flood

$ podman logs -f <container_name or container_id>

$ podman inspect <container-name or container-id> \
  --format "{{.State.ExitCode}}"

Note

$ docker run $(./get_docker_params.sh) \
  --name=<container_name> \
  --net=host \
  --pull=always \
  -v <path-to-kube-config>:/home/krkn/.kube/config:Z \
  -e TARGET_PORT=<target_port> \
  -e NAMESPACE=<target_namespace> \
  -e TOTAL_CHAOS_DURATION=<duration> \
  -e TARGET_SERVICE=<target_service> \
  -e NUMBER_OF_PODS=10 \
  -e NODE_SELECTORS="<key>=<value>;<key>=<othervalue>" \
  -d containers.krkn-chaos.dev/krkn-chaos/krkn-hub:syn-flood

$ docker logs -f <container_name or container_id>

$ docker inspect <container-name or container-id> \
  --format "{{.State.ExitCode}}"

TIP: Because the container runs with a non-root user, ensure the kube config is globally readable before mounting it in the container. You can achieve this with the following commands:

kubectl config view --flatten > ~/kubeconfig && \
chmod 444 ~/kubeconfig && \
docker run $(./get_docker_params.sh) \
  --name=<container_name> \
  --net=host \
  --pull=always \
  -v ~/kubeconfig:/home/krkn/.kube/config:Z \
  -d containers.krkn-chaos.dev/krkn-chaos/krkn-hub:<scenario>

Supported parameters

The following environment variables can be set on the host running the container to tweak the scenario/faults being injected:

Example if --env-host is used:

export <parameter_name>=<value>

Or on the command line, for example:

-e <VARIABLE>=<value>

See the list of variables that apply to all scenarios here; they can be set in addition to these scenario-specific variables.

Parameter	Description	Default
PACKET_SIZE	The size in bytes of the SYN packet	120
WINDOW_SIZE	The TCP window size between packets in bytes	64
TOTAL_CHAOS_DURATION	The number of seconds the chaos will last	120
NAMESPACE	The namespace containing the target service and where the attacker pods will be deployed	default
TARGET_SERVICE	The service name (or the hostname/IP address in case an external target will be hit) that will be affected by the attack. Must be empty if TARGET_SERVICE_LABEL will be set
TARGET_PORT	The TCP port that will be targeted by the attack
TARGET_SERVICE_LABEL	The label that will be used to select one or more services. Must be left empty if TARGET_SERVICE variable is set
NUMBER_OF_PODS	The number of attacker pods that will be deployed	2
IMAGE	The container image that will be used to perform the scenario	quay.io/krkn-chaos/krkn-syn-flood:latest
NODE_SELECTORS	The node selectors are used to guide the cluster on where to deploy attacker pods. You can specify one or more labels in the format key=value;key=value2 (even using the same key) to choose one or more node categories. If left empty, the pods will be scheduled on any available node, depending on the cluster’s capacity.
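
The NODE_SELECTORS format (key=value;key=value2, with repeated keys allowed) can be illustrated with a short Python sketch that groups the values by label key. The helper parse_node_selectors is hypothetical, and the assumption that the result corresponds to the attacker-nodes mapping of the scenario file is an inference from the two examples, not something the documentation states.

```python
def parse_node_selectors(raw: str) -> dict[str, list[str]]:
    """Group 'key=value;key=value2' pairs into {key: [values, ...]}."""
    selectors: dict[str, list[str]] = {}
    for item in raw.split(";"):
        item = item.strip()
        if not item:
            continue  # tolerate an empty string or a trailing ';'
        key, sep, value = item.partition("=")
        if not sep or not key or not value:
            raise ValueError(f"expected key=value, got {item!r}")
        selectors.setdefault(key, []).append(value)
    return selectors


raw = "kubernetes.io/hostname=host_1;kubernetes.io/hostname=host_2;kubernetes.io/os=linux"
print(parse_node_selectors(raw))
```

An empty input yields an empty mapping, which matches the documented case where no selectors are given and the pods can be scheduled on any available node.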

$ podman run \
  --name=<container_name> \
  --net=host \
  --pull=always \
  --env-host=true \
  -v <path-to-custom-metrics-profile>:/home/krkn/kraken/config/metrics-aggregated.yaml \
  -v <path-to-custom-alerts-profile>:/home/krkn/kraken/config/alerts \
  -v <path-to-kube-config>:/home/krkn/.kube/config:Z \
  -d containers.krkn-chaos.dev/krkn-chaos/krkn-hub:syn-flood

krknctl run syn-flood [--<parameter> <value>]

You can also set any global variable listed here.

Scenario specific parameters:

Parameter	Description	Type	Required	Default
`--packet-size`	The size in bytes of the SYN packet	number	No	120
`--window-size`	The TCP window size between packets in bytes	number	No	64
`--chaos-duration`	The number of seconds the chaos will last	number	No	120
`--namespace`	The namespace containing the target service and where the attacker pods will be deployed	string	No	default
`--target-service`	The service name (or the hostname/IP address in case an external target will be hit) that will be affected by the attack. Must be empty if `--target-service-label` is set	string	No
`--target-port`	The TCP port that will be targeted by the attack	number	Yes
`--target-service-label`	The label that will be used to select one or more services. Must be left empty if `--target-service` is set	string	No
`--number-of-pods`	The number of attacker pods that will be deployed	number	No	2
`--image`	The container image that will be used to perform the scenario	string	No	quay.io/krkn-chaos/krkn-syn-flood:latest
`--node-selectors`	The node selectors are used to guide the cluster on where to deploy attacker pods. You can specify one or more labels in the format key=value;key=value2 (even using the same key) to choose one or more node categories. If left empty, the pods will be scheduled on any available node, depending on the cluster's capacity.	string	No

To see all available scenario options

krknctl run syn-flood --help
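
When krknctl is driven from a script, the parameter table above maps naturally onto an argument list. The sketch below is a hedged illustration: krknctl_command is a hypothetical helper, the parameter values are made up, and the flag names are the ones listed in the table. It uses shlex.join so that values containing a semicolon, such as node selectors, are quoted for the shell.

```python
import shlex


def krknctl_command(scenario: str, params: dict) -> list[str]:
    """Build the argv for 'krknctl run <scenario> --<parameter> <value> ...'."""
    argv = ["krknctl", "run", scenario]
    for name, value in params.items():
        argv += [f"--{name}", str(value)]
    return argv


cmd = krknctl_command(
    "syn-flood",
    {
        "target-port": 80,
        "target-service": "target-svc",
        "number-of-pods": 2,
        # The ';' separator must be quoted when the command is typed in a shell.
        "node-selectors": "kubernetes.io/os=linux;kubernetes.io/hostname=host_1",
    },
)
print(shlex.join(cmd))
```

Passing the list form directly to a process launcher avoids shell quoting altogether; the joined string is only needed when the command is pasted into a terminal.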

24 - Time Scenarios

Using this type of scenario configuration, one is able to change the time and/or date of the system for pods or nodes.

How to Run Time Scenarios

Choose your preferred method to run time scenarios:

Example scenario file: time_scenarios_example.yml

Configuration Options:

action: skew_time or skew_date.

object_type: pod or node.

namespace: namespace of the pods you want to skew. Needs to be set if setting a specific pod name.

label_selector: Label on the nodes or pods you want to skew.

container_name: Container name in pod you want to reset time on. If left blank it will randomly select one.

object_name: List of the names of pods or nodes you want to skew.

Refer to time_scenarios_example config file.

time_scenarios:
  - action: skew_time
    object_type: pod
    object_name:
      - apiserver-868595fcbb-6qnsc
      - apiserver-868595fcbb-mb9j5
    namespace: openshift-apiserver
    container_name: openshift-apiserver
  - action: skew_date
    object_type: node
    label_selector: node-role.kubernetes.io/worker
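
The configuration options listed above imply a few constraints that can be checked before a run. The following Python sketch encodes only the rules stated on this page (allowed actions, allowed object types, and the namespace requirement when pod names are given). The function validate_time_scenario is hypothetical and the entries are assumed to be already parsed from YAML.

```python
VALID_ACTIONS = {"skew_time", "skew_date"}
VALID_OBJECT_TYPES = {"pod", "node"}


def validate_time_scenario(entry: dict) -> list[str]:
    """Return a list of problems found in one time_scenarios entry."""
    problems = []
    if entry.get("action") not in VALID_ACTIONS:
        problems.append("action must be skew_time or skew_date")
    if entry.get("object_type") not in VALID_OBJECT_TYPES:
        problems.append("object_type must be pod or node")
    # The namespace is documented as required when specific pod names are set.
    if (
        entry.get("object_type") == "pod"
        and entry.get("object_name")
        and not entry.get("namespace")
    ):
        problems.append("namespace is required when pod names are listed")
    return problems


# The two entries from the example config above.
time_scenarios = [
    {
        "action": "skew_time",
        "object_type": "pod",
        "object_name": ["apiserver-868595fcbb-6qnsc", "apiserver-868595fcbb-mb9j5"],
        "namespace": "openshift-apiserver",
        "container_name": "openshift-apiserver",
    },
    {
        "action": "skew_date",
        "object_type": "node",
        "label_selector": "node-role.kubernetes.io/worker",
    },
]

for entry in time_scenarios:
    print(entry["action"], validate_time_scenario(entry) or "ok")
```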

How to Use Plugin Name

Add the plugin name to the chaos_scenarios list in the config/config.yaml file:

kraken:
    kubeconfig_path: ~/.kube/config                     # Path to kubeconfig
    ..
    chaos_scenarios:
        - time_scenarios:
            - scenarios/<scenario_name>.yaml

Note

You can specify multiple scenario files of the same type by adding additional paths to the list:

kraken:
    chaos_scenarios:
        - time_scenarios:
            - scenarios/time-skew-1.yaml
            - scenarios/time-skew-2.yaml
            - scenarios/time-skew-3.yaml

You can also combine multiple different scenario types in the same config.yaml file. Scenario types can be specified in any order, and you can include the same scenario type multiple times:

kraken:
    chaos_scenarios:
        - time_scenarios:
            - scenarios/time-skew.yaml
        - pod_disruption_scenarios:
            - scenarios/pod-kill.yaml
        - node_scenarios:
            - scenarios/node-reboot.yaml
        - time_scenarios:  # Same type can appear multiple times
            - scenarios/time-skew-2.yaml

Run

python run_kraken.py --config config/config.yaml

This scenario skews the date and time of the nodes and pods matching the label on a Kubernetes/OpenShift cluster. More information can be found here.

Run

$ podman run \
  --name=<container_name> \
  --net=host \
  --pull=always \
  --env-host=true \
  -v <path-to-kube-config>:/home/krkn/.kube/config:Z \
  -d containers.krkn-chaos.dev/krkn-chaos/krkn-hub:time-scenarios
$ podman logs -f <container_name or container_id> # Streams Kraken logs
$ podman inspect <container-name or container-id> \
  --format "{{.State.ExitCode}}" # Outputs the exit code, which can be considered as pass/fail for the scenario
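
In automation, the exit code printed by the inspect command can be turned into a pass/fail result. The sketch below is one way to do that in Python. The helper names are hypothetical, and treating 0 as pass and anything else as fail is the usual process convention, assumed here rather than taken from the krkn documentation.

```python
import subprocess


def inspect_command(container: str, runtime: str = "podman") -> list[str]:
    """Build the inspect command shown above for podman or docker."""
    return [runtime, "inspect", container, "--format", "{{.State.ExitCode}}"]


def scenario_passed(inspect_output: str) -> bool:
    """Interpret the printed exit code: 0 is treated as pass (assumption)."""
    return int(inspect_output.strip()) == 0


def check_container(container: str, runtime: str = "podman") -> bool:
    """Run the inspect command and report whether the scenario passed."""
    result = subprocess.run(
        inspect_command(container, runtime),
        capture_output=True,
        text=True,
        check=True,  # raise if the inspect command itself fails
    )
    return scenario_passed(result.stdout)


print(" ".join(inspect_command("my-krkn-run")))
```

Call check_container only after the container has finished; while it is still running, the reported exit code does not reflect the scenario result.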

Note

$ docker run $(./get_docker_params.sh) \
  --name=<container_name> \
  --net=host \
  --pull=always \
  -v <path-to-kube-config>:/home/krkn/.kube/config:Z \
  -d containers.krkn-chaos.dev/krkn-chaos/krkn-hub:time-scenarios
$ docker run \
  -e <VARIABLE>=<value> \
  --name=<container_name> \
  --net=host \
  --pull=always \
  -v <path-to-kube-config>:/home/krkn/.kube/config:Z \
  -d containers.krkn-chaos.dev/krkn-chaos/krkn-hub:time-scenarios

$ docker logs -f <container_name or container_id> # Streams Kraken logs
$ docker inspect <container-name or container-id> \
  --format "{{.State.ExitCode}}" # Outputs the exit code, which can be considered as pass/fail for the scenario

Tip

Because the container runs with a non-root user, ensure the kube config is globally readable before mounting it in the container. You can achieve this with the following commands:

kubectl config view --flatten > ~/kubeconfig && chmod 444 ~/kubeconfig && docker run $(./get_docker_params.sh) --name=<container_name> --net=host --pull=always -v ~/kubeconfig:/home/krkn/.kube/config:Z -d containers.krkn-chaos.dev/krkn-chaos/krkn-hub:<scenario>

Supported parameters

The following environment variables can be set on the host running the container to tweak the scenario/faults being injected:

example:

export <parameter_name>=<value>

See the list of variables that apply to all scenarios here; they can be set in addition to these scenario-specific variables.

Parameter	Description	Default
OBJECT_TYPE	Object to target. Supported options: pod, node	pod
LABEL_SELECTOR	Label of the container(s) or nodes to target	k8s-app=etcd
ACTION	Action to run. Supported actions: skew_time, skew_date	skew_date
OBJECT_NAME	List of the names of pods or nodes you want to skew (optional)	[]
CONTAINER_NAME	Container in the specified pod to target in case the pod has multiple containers running. Random container is picked if empty	""
NAMESPACE	Namespace of the pods you want to skew; needs to be set only if setting a specific pod name	""

Note

For example:

$ podman run \
  --name=<container_name> \
  --net=host \
  --pull=always \
  --env-host=true \
  -v <path-to-custom-metrics-profile>:/home/krkn/kraken/config/metrics-aggregated.yaml \
  -v <path-to-custom-alerts-profile>:/home/krkn/kraken/config/alerts \
  -v <path-to-kube-config>:/home/krkn/.kube/config:Z \
  -d containers.krkn-chaos.dev/krkn-chaos/krkn-hub:time-scenarios

krknctl run time-scenarios [--<parameter> <value>]

You can also set any global variable listed here.

Scenario specific parameters:

Parameter	Description	Type	Required	Default
`--object-type`	Object to target. Supported options `pod` or `node`	enum	No	pod
`--label-selector`	Label of the container(s) or nodes to target	string	No	k8s-app=etcd
`--action`	Action to run. Supported actions: `skew_time` or `skew_date`	enum	No	skew_date
`--object-names`	List of the names of pods or nodes you want to skew	string	No
`--container-name`	Container in the specified pod to target in case the pod has multiple containers running. Random container is picked if empty	string	No
`--namespace`	Namespace of the pods you want to skew; needs to be set only if setting a specific pod name	string	No

To see all available scenario options

krknctl run time-scenarios --help

Demo

See a demo of this scenario:

25 - Zone Outage Scenarios

This scenario creates an outage in a targeted zone of the public cloud to understand the impact on both the Kubernetes/OpenShift control plane and the applications running on the worker nodes in that zone.

These scenarios run in one of two ways. For AWS, the scenario tweaks the network ACL of the zone to simulate the failure. This stops both ingress and egress traffic for all the nodes in that zone for the specified duration, after which the ACL is reverted to its previous state.

For GCP, the scenario finds the nodes (master, worker, and infra) in the zone you want to target, stops them for the set duration, and then starts them back up. It works this way because any edit to a node requires the node to be stopped first, so editing the network as is done on AWS would still require stopping the nodes.

How to Run Zone Outage Scenarios

Choose your preferred method to run zone outage scenarios:

A zone outage can be injected by listing the zone outage scenario file under the zone_outages_scenarios option in the kraken config. Refer to the zone_outage_scenario config file for the parameters that need to be defined.

Example scenario files from scenarios-hub:

zone_outage.yaml (AWS)
zone_outage_gcp.yaml (GCP)

Refer to cloud setup to configure your CLI properly for the cloud provider of the cluster you want to run zone outages on.

Currently accepted cloud types: AWS, GCP

Sample scenario config

zone_outage:                                         # Scenario to create an outage of a zone by tweaking network ACL.
  cloud_type: aws                                    # Cloud type on which Kubernetes/OpenShift runs. Supported values: aws, gcp.
  duration: 600                                      # Duration in seconds after which the zone will be back online.
  vpc_id:                                            # Cluster virtual private network to target.
  subnet_id: [subnet1, subnet2]                      # List of subnet IDs for which both ingress and egress traffic is denied.

Note

vpc_id and subnet_id can be obtained from the cloud web console by selecting one of the instances in the targeted zone (us-west-2a, for example).

zone_outage:                                         # Scenario to create an outage of a zone by stopping its nodes
  cloud_type: gcp                                    # Cloud type on which Kubernetes/OpenShift runs. Supported values: aws, gcp.
  duration: 600                                      # Duration in seconds after which the zone will be back online
  zone: <zone>                                       # Zone of nodes to stop and then restart after the duration ends
  kube_check: True                                   # Run Kubernetes API calls to see if the node gets to a certain state during the scenario

Note

Targeting multiple subnets causes downtime in multiple zones, which might affect cluster health, especially if those zones host control plane components.
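
The two sample configs need different fields depending on cloud_type. The Python sketch below checks for the fields this page marks as required (vpc_id and subnet_id for AWS, zone for GCP). It assumes the scenario file is already parsed into a dictionary, and missing_zone_outage_fields is a hypothetical helper rather than part of krkn.

```python
REQUIRED_FIELDS = {
    "aws": ("vpc_id", "subnet_id"),
    "gcp": ("zone",),
}


def missing_zone_outage_fields(scenario: dict) -> list[str]:
    """Return the required fields that are absent or empty for the cloud type."""
    zone_outage = scenario.get("zone_outage", {})
    cloud_type = zone_outage.get("cloud_type")
    if cloud_type not in REQUIRED_FIELDS:
        raise ValueError(f"unsupported cloud_type: {cloud_type!r}")
    return [name for name in REQUIRED_FIELDS[cloud_type] if not zone_outage.get(name)]


# The AWS sample leaves vpc_id blank, so it is reported as missing.
aws_sample = {
    "zone_outage": {
        "cloud_type": "aws",
        "duration": 600,
        "vpc_id": None,
        "subnet_id": ["subnet1", "subnet2"],
    }
}
# Hypothetical zone name used only for illustration.
gcp_sample = {
    "zone_outage": {"cloud_type": "gcp", "duration": 600, "zone": "example-zone-a"}
}

print(missing_zone_outage_fields(aws_sample))
print(missing_zone_outage_fields(gcp_sample))
```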

AWS - Debugging steps in case of failures

If the steps that revert the network ACL to allow traffic and bring back the cluster nodes in the zone fail, the nodes in that zone will remain in NotReady condition. Here is how to fix it:

OpenShift by default deploys the nodes in different zones for fault tolerance, for example us-west-2a, us-west-2b, us-west-2c. The cluster is associated with a virtual private network, and each zone has its own subnet with a network ACL that defines the ingress and egress traffic rules at the zone level, unlike security groups, which apply at the instance level.
From the cloud web console, select one of the instances in the zone that is down and go to the subnet_id specified in the config.
Look at the network ACL associated with the subnet. Both ingress and egress traffic will be denied, which is expected because Kraken deliberately injects this.
Kraken only switches the network ACL and keeps the original or default network ACL around. Switching back to the default network ACL from the drop-down menu returns the nodes in the targeted zone to Ready state.

GCP - Debugging steps in case of failures

If the steps that bring back the cluster nodes in the zone fail, the nodes in that zone will remain in NotReady condition. Here is how to fix it:

From the GCP web console, select one of the instances in the zone that is down.
Kraken only stops the nodes, so you just have to select the stopped nodes and START them. This returns the nodes in the targeted zone to Ready state.

How to Use Plugin Name

Add the plugin name to the chaos_scenarios list in the config/config.yaml file:

kraken:
    kubeconfig_path: ~/.kube/config                     # Path to kubeconfig
    ..
    chaos_scenarios:
        - zone_outages_scenarios:
            - scenarios/<scenario_name>.yaml

Note

You can specify multiple scenario files of the same type by adding additional paths to the list:

kraken:
    chaos_scenarios:
        - zone_outages_scenarios:
            - scenarios/zone-outage-1.yaml
            - scenarios/zone-outage-2.yaml
            - scenarios/zone-outage-3.yaml

You can also combine multiple different scenario types in the same config.yaml file. Scenario types can be specified in any order, and you can include the same scenario type multiple times:

kraken:
    chaos_scenarios:
        - zone_outages_scenarios:
            - scenarios/zone-outage.yaml
        - pod_disruption_scenarios:
            - scenarios/pod-kill.yaml
        - node_scenarios:
            - scenarios/node-reboot.yaml
        - zone_outages_scenarios:  # Same type can appear multiple times
            - scenarios/zone-outage-2.yaml

Run

python run_kraken.py --config config/config.yaml

Run

$ podman run \
  --name=<container_name> \
  --net=host \
  --pull=always \
  --env-host=true \
  -v <path-to-kube-config>:/home/krkn/.kube/config:Z \
  -d containers.krkn-chaos.dev/krkn-chaos/krkn-hub:zone-outages
$ podman logs -f <container_name or container_id> # Streams Kraken logs
$ podman inspect <container-name or container-id> \
  --format "{{.State.ExitCode}}" # Outputs the exit code, which can be considered as pass/fail for the scenario

Note

$ docker run $(./get_docker_params.sh) \
  --name=<container_name> \
  --net=host \
  --pull=always \
  -v <path-to-kube-config>:/home/krkn/.kube/config:Z \
  -d containers.krkn-chaos.dev/krkn-chaos/krkn-hub:zone-outages
$ docker run \
  -e <VARIABLE>=<value> \
  --name=<container_name> \
  --net=host \
  --pull=always \
  -v <path-to-kube-config>:/home/krkn/.kube/config:Z \
  -d containers.krkn-chaos.dev/krkn-chaos/krkn-hub:zone-outages

$ docker logs -f <container_name or container_id> # Streams Kraken logs
$ docker inspect <container-name or container-id> \
  --format "{{.State.ExitCode}}" # Outputs the exit code, which can be considered as pass/fail for the scenario

Tip

Because the container runs with a non-root user, ensure the kube config is globally readable before mounting it in the container. You can achieve this with the following commands:

kubectl config view --flatten > ~/kubeconfig && chmod 444 ~/kubeconfig && docker run $(./get_docker_params.sh) --name=<container_name> --net=host --pull=always -v ~/kubeconfig:/home/krkn/.kube/config:Z -d containers.krkn-chaos.dev/krkn-chaos/krkn-hub:<scenario>

Supported parameters

The following environment variables can be set on the host running the container to tweak the scenario/faults being injected:

Example if --env-host is used:

export <parameter_name>=<value>

Or on the command line, for example:

-e <VARIABLE>=<value>

See the list of variables that apply to all scenarios here; they can be set in addition to these scenario-specific variables.

Parameter	Description	Type	Default
CLOUD_TYPE	Cloud platform on top of which the cluster is running. Supported platforms: aws, gcp	enum	aws
DURATION	Duration in seconds after which the zone will be back online	number	600
VPC_ID	cluster virtual private network to target ( REQUIRED for AWS )	string	""
SUBNET_ID	subnet-id to deny both ingress and egress traffic ( REQUIRED for AWS ). Format: [subnet1, subnet2]	string	""
ZONE	zone you want to target ( REQUIRED for GCP )	string	""
DEFAULT_ACL_ID	(Optional) AWS Network ACL ID to reuse instead of creating a new one. If provided, this ACL will not be deleted after the scenario	string	""

The following environment variables need to be set for the scenarios that require interacting with the cloud platform API to perform the actions:

Amazon Web Services

$ export AWS_ACCESS_KEY_ID=<>
$ export AWS_SECRET_ACCESS_KEY=<>
$ export AWS_DEFAULT_REGION=<>

Google Cloud Platform

export GOOGLE_APPLICATION_CREDENTIALS="<serviceaccount.json>"
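
A quick pre-flight check can confirm that the credentials listed above are present before the scenario starts. The Python sketch below only tests that the variables are set and non-empty; it does not validate the credentials themselves. The helper missing_cloud_env is hypothetical, and the optional environ argument exists so the check can be exercised without touching the real environment.

```python
import os

REQUIRED_ENV = {
    "aws": ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_DEFAULT_REGION"),
    "gcp": ("GOOGLE_APPLICATION_CREDENTIALS",),
}


def missing_cloud_env(cloud_type: str, environ=None) -> list[str]:
    """Return the credential variables that are unset or empty for a cloud type."""
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_ENV[cloud_type] if not env.get(name)]


# Example with an explicit mapping instead of the real process environment.
fake_env = {"AWS_ACCESS_KEY_ID": "example-id", "AWS_DEFAULT_REGION": "us-west-2"}
print(missing_cloud_env("aws", fake_env))
```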

Note

For example:

$ podman run \
  --name=<container_name> \
  --net=host \
  --pull=always \
  --env-host=true \
  -v <path-to-custom-metrics-profile>:/home/krkn/kraken/config/metrics-aggregated.yaml \
  -v <path-to-custom-alerts-profile>:/home/krkn/kraken/config/alerts \
  -v <path-to-kube-config>:/home/krkn/.kube/config:Z \
  -d containers.krkn-chaos.dev/krkn-chaos/krkn-hub:zone-outages

krknctl run zone-outages [--<parameter> <value>]

You can also set any global variable listed here.

Scenario specific parameters:

Parameter	Description	Type	Required	Default
`--cloud-type`	Cloud platform on top of which cluster is running, supported platforms - aws, gcp	enum	No	aws
`--duration`	Duration in seconds after which the zone will be back online	number	No	600
`--vpc-id`	Cluster virtual private network to target ( REQUIRED for AWS )	string	No
`--subnet-id`	subnet-id to deny both ingress and egress traffic ( REQUIRED for AWS ). Format: [subnet1, subnet2]	string	No
`--zone`	Cluster zone to target ( REQUIRED for GCP )	string	No
`--kube-check`	Connects to the Kubernetes API to check the node status; set to False for SNO	enum	No
`--aws-access-key-id`	AWS Access Key Id	string (secret)	No
`--aws-secret-access-key`	AWS Secret Access Key	string (secret)	No
`--aws-default-region`	AWS default region	string	No
`--gcp-application-credentials`	GCP application credentials file location	file	No

NOTE: Values of the secret string type will be masked when the scenario is run.

To see all available scenario options

krknctl run zone-outages --help

Demo

See a demo of this scenario: