Power Outage Scenarios

    This scenario shuts down a Kubernetes/OpenShift cluster for the specified duration to simulate a power outage, brings it back online, and checks that it is healthy.

    How to Run Power Outage Scenarios

    Choose your preferred method to run power outage scenarios:

    A power outage (cluster shutdown) scenario can be injected by placing the shut_down config file under the cluster_shut_down_scenarios option in the kraken config. Refer to the cluster_shut_down_scenario config file.

    Example scenario file: cluster_shut_down_scenario.yml

    Refer to cloud setup to configure your CLI properly for the cloud provider of the cluster you want to shut down.

    Current accepted cloud types:

    cluster_shut_down_scenario:                          # Scenario to stop all the nodes for specified duration and restart the nodes.
      runs: 1                                            # Number of times to execute the cluster_shut_down scenario.
      shut_down_duration: 120                            # Duration in seconds to shut down the cluster.
      cloud_type: aws                                    # Cloud type on which Kubernetes/OpenShift runs.
    

    How to Use Plugin Name

    Add the plugin name (cluster_shut_down_scenarios) to the chaos_scenarios list in the config/config.yaml file:

    kraken:
        kubeconfig_path: ~/.kube/config                     # Path to kubeconfig
        ..
        chaos_scenarios:
            - cluster_shut_down_scenarios:
                - scenarios/<scenario_name>.yaml
    

    Run

    python run_kraken.py --config config/config.yaml
    

    This scenario shuts down a Kubernetes/OpenShift cluster for the specified duration to simulate a power outage, brings it back online, and checks that it is healthy. More information can be found here.

    Right now, power outage and cluster shutdown are one and the same. We originally created this scenario to stop all the nodes and then start them back up, the way a customer would shut their cluster down.

    In a real-life chaos scenario, though, we figured this was close to what would happen if the power went out on the AWS side and all of our EC2 nodes were stopped or powered off. We looked into whether the AWS CLI had a way to forcefully (not gracefully) power off the nodes; it does not currently support this, so this scenario is as close as we can get to “pulling the plug”.
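
    The stop-then-start cycle described above can be sketched with the AWS CLI. This is a minimal illustration, not how Kraken implements the scenario: the power_cycle function and the AWS_CMD override are hypothetical names introduced here, and the instance IDs are supplied by the caller. It relies only on the aws ec2 stop-instances, start-instances, and wait subcommands, which stop the instances gracefully.

```shell
# Hypothetical helper: stop the given EC2 instances, keep them down for
# DURATION seconds, then start them again and wait until they are running.
# Usage: power_cycle DURATION INSTANCE_ID...
# Set AWS_CMD="echo aws" to print the commands instead of running them.
power_cycle() {
  duration="$1"
  shift
  ${AWS_CMD:-aws} ec2 stop-instances --instance-ids "$@" || return 1
  # Block until every instance reports the "stopped" state.
  ${AWS_CMD:-aws} ec2 wait instance-stopped --instance-ids "$@" || return 1
  # Simulated outage window.
  sleep "$duration"
  ${AWS_CMD:-aws} ec2 start-instances --instance-ids "$@" || return 1
  # Block until every instance reports the "running" state.
  ${AWS_CMD:-aws} ec2 wait instance-running --instance-ids "$@"
}
```

    Setting AWS_CMD="echo aws" before calling power_cycle gives a dry run that prints the four commands in order without touching any instance.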

    Run

    If you enable Cerberus to monitor the cluster and pass/fail the scenario post chaos, refer to the docs. Make sure to start it before injecting the chaos, and set the CERBERUS_ENABLED environment variable for the chaos injection container to autoconnect.

    $ podman run \
      --name=<container_name> \
      --net=host \
      --pull=always \
      --env-host=true \
      -v <path-to-kube-config>:/home/krkn/.kube/config:Z \
      -d containers.krkn-chaos.dev/krkn-chaos/krkn-hub:power-outages
    $ podman logs -f <container_name or container_id> # Streams Kraken logs
    $ podman inspect <container-name or container-id> \
      --format "{{.State.ExitCode}}" # Outputs the exit code, which can be considered as pass/fail for the scenario
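
    To turn the exit code into an explicit verdict, something like the following could be used. It is a sketch under the assumption that an exit code of 0 means the scenario passed and anything else means it failed; verdict_from_exit_code and check_scenario are hypothetical helper names. podman wait blocks until the container stops and prints its exit code.

```shell
# Hypothetical helper: map a container exit code to a pass/fail verdict.
# Assumption: 0 means the scenario passed, any other value means it failed.
verdict_from_exit_code() {
  if [ "$1" -eq 0 ]; then
    echo "PASS"
  else
    echo "FAIL (exit code $1)"
  fi
}

# Hypothetical helper: wait for the chaos container to finish, then report.
# Usage: check_scenario <container_name or container_id>
check_scenario() {
  # `podman wait` blocks until the container stops, then prints its exit code.
  verdict_from_exit_code "$(podman wait "$1")"
}
```

    Calling check_scenario with the container name after podman run replaces the manual podman inspect step with a single line of output.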
    
    $ docker run $(./get_docker_params.sh) \
      --name=<container_name> \
      --net=host \
      --pull=always \
      -v <path-to-kube-config>:/home/krkn/.kube/config:Z \
      -d containers.krkn-chaos.dev/krkn-chaos/krkn-hub:power-outages
    $ docker run \
      -e <VARIABLE>=<value> \
      --name=<container_name> \
      --net=host \
      --pull=always \
      -v <path-to-kube-config>:/home/krkn/.kube/config:Z \
      -d containers.krkn-chaos.dev/krkn-chaos/krkn-hub:power-outages
    
    $ docker logs -f <container_name or container_id> # Streams Kraken logs
    $ docker inspect <container-name or container-id> \
      --format "{{.State.ExitCode}}" # Outputs the exit code, which can be considered as pass/fail for the scenario
    

    Supported parameters

    The following environment variables can be set on the host running the container to tweak the scenario/faults being injected:

    example: export <parameter_name>=<value>

    See the list of variables that apply to all scenarios here; they can be set in addition to these scenario-specific variables.

    | Parameter | Description | Default |
    | --- | --- | --- |
    | SHUTDOWN_DURATION | Duration in seconds to shut down the cluster | 1200 |
    | CLOUD_TYPE | Cloud platform on top of which cluster is running, supported cloud platforms | aws |
    | TIMEOUT | Time in seconds to wait for each node to be stopped or running after the cluster comes back | 600 |

    The following environment variables need to be set for scenarios that require interacting with the cloud platform API to perform the actions:

    Amazon Web Services

    $ export AWS_ACCESS_KEY_ID=<>
    $ export AWS_SECRET_ACCESS_KEY=<>
    $ export AWS_DEFAULT_REGION=<>
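
    Before starting the container it can help to confirm that these variables are actually set on the host. The following is a small sketch; missing_aws_vars is a hypothetical helper, and it only checks that each variable is non-empty, not that the credentials are valid.

```shell
# Hypothetical helper: print the names of required AWS variables that are
# unset or empty, separated by spaces. Prints an empty line when all are set.
# Note: the variables must also be exported to reach the container.
missing_aws_vars() {
  missing=""
  for name in AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY AWS_DEFAULT_REGION; do
    # Indirect lookup: read the value of the variable whose name is in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      missing="$missing $name"
    fi
  done
  # Drop the leading space left by the concatenation above.
  printf '%s\n' "${missing# }"
}
```

    A guard such as [ -z "$(missing_aws_vars)" ] || echo "Missing: $(missing_aws_vars)" can then run ahead of the podman or docker command.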
    

    Google Cloud Platform

    TBD
    

    Azure

    TBD
    

    OpenStack

    TBD
    

    Baremetal

    TBD
    

    For example, to mount a custom metrics profile and alerts profile into the container:

    $ podman run \
      --name=<container_name> \
      --net=host \
      --pull=always \
      --env-host=true \
      -v <path-to-custom-metrics-profile>:/home/krkn/kraken/config/metrics-aggregated.yaml \
      -v <path-to-custom-alerts-profile>:/home/krkn/kraken/config/alerts \
      -v <path-to-kube-config>:/home/krkn/.kube/config:Z \
      -d containers.krkn-chaos.dev/krkn-chaos/krkn-hub:power-outages
    
    krknctl run power-outages (optional: --<parameter>:<value> )
    

    You can also set any global variable listed here.

    Scenario specific parameters:

    | Parameter | Description | Type | Default |
    | --- | --- | --- | --- |
    | --cloud-type | Cloud platform on top of which cluster is running, supported platforms - aws, azure, gcp, vmware, ibmcloud, bm | enum | aws |
    | --timeout | Duration to wait for completion of node scenario injection | number | 180 |
    | --shutdown-duration | Duration in seconds to shut down the cluster | number | 1200 |
    | --vsphere-ip | vSphere IP address | string | |
    | --vsphere-username | vSphere username | string (secret) | |
    | --vsphere-password | vSphere password | string (secret) | |
    | --aws-access-key-id | AWS Access Key Id | string (secret) | |
    | --aws-secret-access-key | AWS Secret Access Key | string (secret) | |
    | --aws-default-region | AWS default region | string | |
    | --bmc-user | Only needed for Baremetal ( bm ) - IPMI/bmc username | string (secret) | |
    | --bmc-password | Only needed for Baremetal ( bm ) - IPMI/bmc password | string (secret) | |
    | --bmc-address | Only needed for Baremetal ( bm ) - IPMI/bmc address | string | |
    | --ibmc-address | IBM Cloud URL | string | |
    | --ibmc-api-key | IBM Cloud API Key | string (secret) | |
    | --azure-tenant | Azure Tenant | string | |
    | --azure-client-secret | Azure Client Secret | string (secret) | |
    | --azure-client-id | Azure Client ID | string (secret) | |
    | --azure-subscription-id | Azure Subscription ID | string (secret) | |
    | --gcp-application-credentials | GCP application credentials file location | file | |

    NOTE: Values of the secret string types will be masked when the scenario is run.

    To see all available scenario options

    krknctl run power-outages --help
    
