My road to Gremlin Chaos Engineering Practitioner Certificate

  • Posted by lakhera2020
  • Date October 16, 2021

Chaos Engineering is a field that has always drawn my attention. I first learned about it through the Netflix Simian Army toolkit https://github.com/Netflix/SimianArmy . At first glance, it is hard to believe that anyone would run a tool in production that randomly shuts down production servers (Chaos Monkey). Later, I watched a Tammy Bryant Butow video on YouTube and learned about Gremlin, a hosted service that lets you run chaos experiments. Finally, after one week of study, I am now Gremlin Chaos Engineering Practitioner certified.

Exam Resources

I followed only the two resources below to prepare for the exam.

  • Gremlin Tutorial: https://www.gremlin.com/community/tutorials/?ref=nav 
  • Gremlin YouTube Channel: https://www.youtube.com/channel/UC6PAoCqf2LSw6Hth-4M4yEQ
  • If you need more practice and hands-on experience, you can attend Gremlin Bootcamp https://www.gremlin.com/bootcamps/?ref=nav

Exam Format

  • Number of Questions: 20
  • Question Type: Single and Multiple Choice, Drag and Drop
  • If you still have any doubts about the exam format, please watch this video https://www.youtube.com/watch?v=TL1j2MJBE0A&t=1248s.

NOTE: The exam is free of cost; you can register via the link below: https://gremlin.coassemble.com/unlock/7Jan8Su

Exam Preparation

  • To prepare for the exam, first create a free account on the Gremlin website https://app.gremlin.com/?ref=nav
  1. Get familiar with how to install the Gremlin agent
  • To attack a host, the Gremlin agent must be installed on that host. Gremlin supports various operating systems (Ubuntu, CentOS, RHEL, Windows); you can also download the Docker image https://hub.docker.com/r/gremlin/gremlin or use the Helm repo.
helm repo add gremlin https://helm.gremlin.com
  • This is what the architecture looks like:
[Figure: Gremlin architecture diagram]
  • On Ubuntu, these are the steps you need to follow, as shown in the diagram above.
$ echo "deb https://deb.gremlin.com/ release non-free" | sudo tee /etc/apt/sources.list.d/gremlin.list
$ sudo apt-key adv --keyserver keyserver.ubuntu.com --recv-keys XXXX
$ sudo apt-get update && sudo apt-get install -y gremlin gremlind
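Before registering the agent, it can help to confirm that the packages actually landed on the host. The following is a minimal sketch, not part of Gremlin's tooling: gremlin_agent_installed is a hypothetical helper name, and it assumes only that the two packages installed above put binaries named gremlin and gremlind on the PATH.

```shell
#!/bin/sh
# Hypothetical helper: report whether the Gremlin CLI and daemon binaries
# are reachable on PATH. It does not contact the Gremlin service.
gremlin_agent_installed() {
    for bin in gremlin gremlind; do
        if ! command -v "$bin" >/dev/null 2>&1; then
            echo "missing: $bin"
            return 1
        fi
    done
    echo "gremlin agent binaries found"
}

# Usage (run on the host after the apt-get step):
#   gremlin_agent_installed || echo "re-run the install steps"
```

If gremlind is reported missing for a non-root user, check whether /usr/sbin is on that user's PATH before reinstalling.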
  • Once these steps are done, register the installed agent with the Gremlin Control Plane using your Team ID and Secret Key. To find them, go to the Team Settings page and make a note of the Team ID and Secret Key (if you don't know the Secret Key, click the Reset button).
  • Run the gremlin init command and enter the Team ID and Secret Key you noted in the previous step.
$ gremlin init
Metadata set for [ gremlin-client-version: 2.20.0 ]
Metadata set for [ os-type: Linux ]
Metadata set for [ os-name: Ubuntu ]
AWS metadata may be present
Metadata set for [ instance-id: i-0550fdb260931639b ]
Metadata set for [ local-hostname: ip-172-31-28-103.ec2.internal ]
Metadata set for [ local-ip: 172.31.28.103 ]
Metadata set for [ public-hostname: ec2-184-73-139-79.compute-1.amazonaws.com ]
Metadata set for [ public-ip: 184.73.139.79 ]
Metadata set for [ azid: use1-az4 ]
Metadata set for [ cloud: AWS ]
Metadata set for [ image-id: ami-09e67e426f25ce0d7 ]
Metadata set for [ instance-type: t2.micro ]
Metadata set for [ region: us-east-1 ]
Metadata set for [ zone: us-east-1c ]
Unable to describe AWS tags.  The error message is: No such file or directory (os error 2)
Azure metadata may be present
Please input your Team ID: <--------
XXXXXXXX
Please input your Team Secret: <--------
Using XXXXXX for Team Id
Using 172.31.28.103 for Gremlin identifier
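The init output above lists the metadata the agent attaches to the host, which is handy for confirming that the host registered with the region or instance ID you expect. Here is a minimal sketch that turns those lines into key=value pairs. It assumes only the "Metadata set for [ key: value ]" line format shown above; init_metadata, metadata_value, and the init.log file name are hypothetical.

```shell
#!/bin/sh
# Convert "Metadata set for [ key: value ]" lines (read from stdin)
# into key=value pairs; every other line is ignored.
init_metadata() {
    sed -n 's/^Metadata set for \[ \([^:]*\): \(.*\) ]$/\1=\2/p'
}

# Print the value recorded for a single key, e.g. "region".
metadata_value() {
    init_metadata | awk -F= -v key="$1" '$1 == key { print substr($0, length(key) + 2) }'
}

# Usage, with the init output saved to a file first:
#   metadata_value region < init.log
```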
  • Go to the Gremlin dashboard, and you will see your newly added host.
  • You are now all set to perform various attacks by clicking the Attack button.

2. Get familiar with the various types of attacks you can perform via Gremlin

Using Gremlin, you can trigger various attacks depending on the infrastructure you target (Hosts, Containers, or Kubernetes).

For Hosts

Resource: Test against sudden changes in consumption of computing resources.

  • CPU: Test that your application behaves as expected even when CPU capacity is limited or exhausted
  • Disk: Test system and application behavior when storage space is limited or unavailable, and validate dynamic storage provisioning systems
  • IO: Test against heavy IO operations to understand their effect on your applications
  • Memory: Test your systems against memory consumption to ensure they can tolerate and perform given a sudden increase in usage

State: Test against unexpected changes in your environment, such as power outages, node failures, clock drift, or application crashes.

  • Process Killer: Test against application crashes and similar events by terminating specific sets of processes
  • Shutdown: Test resilience to host failures by rebooting or shutting down targeted host operating systems
  • Time Travel: Test for scenarios such as Daylight Saving Time (DST), clock drift between hosts, and expiring SSL/TLS certificates

Network: Test against unreliable network conditions.

  • Blackhole: Test against unreachable dependencies by dropping network traffic between services
  • DNS: Test against DNS outages, and validate both fallback DNS servers and DNS resolver configurations
  • Latency: Test your system’s responsiveness under varying network conditions by injecting a controlled delay into outbound network traffic
  • Packet Loss: Test your system’s end user experience when a percentage of outbound network packets are dropped or corrupted
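
As a study aid, the three categories above can be written down as a small lookup. This is only a sketch for memorizing the grouping. The function name attack_category is hypothetical, and the underscore spellings (process_killer, time_travel, packet_loss) are my assumption about how the attack names are written on the command line, so check them against the CLI help before relying on them.

```shell
#!/bin/sh
# Map an attack name to the category it is listed under above.
# Underscore spellings are an assumption, not confirmed by this post.
attack_category() {
    case "$1" in
        cpu|disk|io|memory)                 echo resource ;;
        process_killer|shutdown|time_travel) echo state ;;
        blackhole|dns|latency|packet_loss)  echo network ;;
        *) echo unknown; return 1 ;;
    esac
}

attack_category latency   # prints: network
```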

Try to perform some of these attacks before the exam. For example, to test a shutdown, go to State and click Shutdown; you have the option to introduce a delay and to reboot the host after shutdown.

  • You can go to the host and see which command it is executing.
$ ps aux|grep -i gremlin
gremlin     2142  0.0  0.9  23420  9328 ?        Ssl  04:42   0:00 /usr/sbin/gremlind
gremlin     2362  0.0  0.8  23612  8516 ?        Sl   05:07   0:00 gremlin attack shutdown -d 1 -r
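The second line of that output is the attack itself (the -d 1 and -r arguments presumably correspond to the delay and reboot options chosen in the UI). If you want only the attack commands rather than every process that mentions gremlin, a small filter helps. This is a sketch under one assumption: the input is in the ps aux layout shown above, where the command starts at the 11th column. The name active_gremlin_attacks is hypothetical.

```shell
#!/bin/sh
# Read `ps aux`-style lines on stdin and print the full command line of
# every process whose command starts with "gremlin attack".
active_gremlin_attacks() {
    awk '{
        cmd = ""
        # Fields 1-10 are USER, PID, %CPU, ... TIME; the command follows.
        for (i = 11; i <= NF; i++) cmd = cmd (i > 11 ? " " : "") $i
        if (cmd ~ /^gremlin attack /) print cmd
    }'
}

# Usage on the host:
#   ps aux | active_gremlin_attacks
```

Unlike grep -i gremlin, this skips the gremlind daemon line and the grep process itself.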
  • Gremlin also provides a friendly UI where you can view this.
  • Similarly, you can perform other kinds of attacks, such as a CPU attack. In this scenario, we run the test for 60 seconds at 50% CPU utilization on all cores.
  • You can go back to the host and check the CPU utilization using the top command.
  • To use Gremlin with EKS, see this tutorial: https://www.gremlin.com/community/tutorials/how-to-install-and-use-gremlin-with-eks/
  • To use Gremlin with RDS, see this tutorial: https://www.gremlin.com/community/tutorials/how-to-use-gremlin-with-amazon-rds/
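
The CPU attack described above (60 seconds, 50%, all cores) can presumably also be started from the command line on the host. The sketch below only builds and prints the command instead of running it. The flags -l (length), -p (percent), and -a (all cores) reflect my recollection of Gremlin's CPU attack options and are an assumption, not something this post confirms; verify them with gremlin help attack first. The name cpu_attack_cmd is hypothetical.

```shell
#!/bin/sh
# Build (but do not run) a CPU attack command line.
# $1 = length in seconds (default 60), $2 = CPU percent (default 50).
# Flag names -l/-p/-a are assumed; confirm with `gremlin help attack`.
cpu_attack_cmd() {
    length=${1:-60}
    percent=${2:-50}
    case "$length$percent" in
        *[!0-9]*) echo "length and percent must be whole numbers" >&2; return 1 ;;
    esac
    if [ "$percent" -lt 1 ] || [ "$percent" -gt 100 ]; then
        echo "percent must be between 1 and 100" >&2
        return 1
    fi
    echo "gremlin attack cpu -l $length -p $percent -a"
}

cpu_attack_cmd 60 50   # prints: gremlin attack cpu -l 60 -p 50 -a
```

Printing the command first gives you a chance to review it before pasting it into a shell on the target host.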

3. Get familiar with the gremlin command line.

$ gremlin -h
gremlin
USAGE:
gremlin <SUBCOMMAND>
FLAGS:
-h, --help    Prints help information
SUBCOMMANDS:
attack                Run a new gremlin attack against this host
attack-container      Run a new gremlin attack against the specified container
check                 Show runtime troubleshooting data
help                  Prints this message or the help of the given subcommand(s)
init                  Initialize a new client session with the Gremlin service
logout                Remove this client from the Gremlin service
measure               Measure then report dynamic system data
rollback              Interrupt an active attack, or revert the last impact
rollback-container    Interrupt an active attack against a Docker container
status                Show the status of all gremlins or a specific attack
syscheck              System check was a feature in Gremlin 2.8.x and is no longer supported
validate              Validate a gremlin
version               Show version information for the gremlin binary
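
When scripting around the CLI, it can be useful to check that a subcommand such as rollback exists in the installed version before depending on it. The following sketch parses help text in the layout shown above (a SUBCOMMANDS: header followed by one subcommand per line). The function names list_subcommands and has_subcommand are hypothetical.

```shell
#!/bin/sh
# Print the subcommand names found after the "SUBCOMMANDS:" header
# in help text read from stdin.
list_subcommands() {
    awk 'seen && NF { print $1 } /^SUBCOMMANDS:/ { seen = 1 }'
}

# Succeed if the named subcommand appears in the help text on stdin.
has_subcommand() {
    list_subcommands | grep -qxF -- "$1"
}

# Usage on a host with the agent installed:
#   gremlin -h | has_subcommand rollback && echo "rollback is available"
```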

In the end, I would say this exam is straightforward: go through the Gremlin docs and YouTube channel (bonus: attend their Bootcamp if you can), and you should be good to go.

The best way to connect with me is via any of the channels below.

  • Website: https://101daysofdevops.com/
  • Linkedin: https://www.linkedin.com/in/prashant-lakhera-696119b/
  • Twitter: @100daysofdevops OR @lakhera2015
  • Facebook: https://www.facebook.com/groups/795382630808645/
  • Medium: https://medium.com/@devopslearning
  • GitHub: https://github.com/100daysofdevops/100daysofdevops
  • YouTube Channel: https://www.youtube.com/user/laprashant/videos
  • Slack: https://join.slack.com/t/100daysofdevops/shared_invite/zt-au03logz-YfDUp_FJF4rAUeDEbgWmsg
  • Reddit: r/101DaysofDevops
  • Meetup: https://www.meetup.com/100daysofdevops/

Tag:chaos, cloud, devops, gremlin, netflix
