Becoming an SRE

This is a continuation of an earlier post about SRE. In that post we saw what an SRE is and the key skills required to become one. In this post we look at the tools and technologies to learn on the way to becoming an SRE.

Cloud

  • AWS (recommended)
  • Azure
  • Google Cloud

Operating Systems

  • Linux (recommended)
  • Windows

Programming

  • Python (recommended)
  • Golang (recommended)
  • NodeJS

IaC – Infrastructure as Code

  • Terraform (recommended)
  • Container Orchestration (recommended)
  • Configuration Management

CI & CD Tools

  • Jenkins (recommended)
  • Git & GitHub (recommended)
  • GitLab
  • Circle CI
  • GoCD (Go continuous delivery)
  • Bamboo

Continuous Monitoring

  • Prometheus (recommended)
  • AppDynamics (recommended)
  • Nagios
  • Zabbix
  • New Relic

Networking/Connectivity

  • Protocols
  • Subnet/CIDR
  • Network Components (TGW, VPC, SG, etc.)
  • API (REST, SOAP, XML-RPC)

Site Reliability Engineering (SRE)

I’m a bit late posting to this blog in 2022 due to some personal exigencies. With three months of the year already gone, and considering how widespread the term Site Reliability Engineering has become, I believe SRE is a good topic to start the year with. Here I try to convey what I’ve learned about SRE from more than a decade as a system admin and another half a decade as an SRE.

According to Ben Treynor Sloss, the senior VP overseeing technical operations at Google who coined the term, SRE is

“what happens when a software engineer is tasked with what used to be called operations.”

In other words, Site Reliability Engineering (SRE) is a discipline that incorporates aspects of software engineering and applies them to infrastructure and operations problems. To summarise, an SRE is a professional with a solid background in coding and automation who uses that experience to solve problems in infrastructure and operations.

If you think of DevOps as a philosophy and an approach to working, you can argue that SRE implements some of the philosophy that DevOps describes, and is somewhat closer to a concrete definition of a job or role than, say, “DevOps engineer”. So, in a way, we can say:

class SRE implements DevOps;

abstract class DevOps {
  // Reduce organization silos
  abstract reduceOrganizationSilos(): BetterCollaboration;

  // Accept failure as normal
  abstract acceptFailureAsNormal(): ReliabilityGoal;

  // Implement gradual change
  abstract implementGradualChange(): ErrorBudget;

  // Leverage tooling and automation
  abstract leverageAutomation(): LongTermValue;

  // Measure everything
  abstract measureEverything(): BetterObservability;
}

class SRE implements DevOps {
  ...
}

I will explain more about SRE in this blog post by quoting from the Introduction of the SRE book (Site Reliability Engineering: How Google Runs Production Systems), written by Ben Treynor Sloss and edited by Betsy Beyer.

“Hope is not a strategy.”
-Traditional SRE saying
It is a truth universally acknowledged that systems do not run themselves. How, then, should a system — particularly a complex computing system that operates at a large scale — be run?

https://sre.google/sre-book/introduction/

When we say “Hope is not a strategy”, we mean that we need to apply best practices instead of just launching software and new features and trusting that they will be successful. We use the phrase to call out anyone who lets something happen (such as a launch or the running of a system) without applying the proper principles and best practices. The book clearly defines the principles, practices and management of Site Reliability Engineering.

A site reliability engineer can be a generalist or a specialist. Depending on the individual’s skill set, organizations can engage an SRE in a number of general or specialist roles such as educator, SLO guard, infra architect or incident response leader. Details about SLA, SLO and SLI can be found in a previous post here. SREs may contribute to the code base of a product or write development policies and procedures as and when needed. Workflows, priorities and day-to-day operations for SREs vary from team to team. They all share a set of basic responsibilities for the services, products or platforms they support, and always adhere to the core responsibility for availability, latency, performance, monitoring, efficiency, change management, emergency response and capacity planning. As defined in the SRE book, Google caps operational work for SREs at 50% of their time; the remainder should be spent on coding and project work. They achieve this by reintegrating developers into on-call rotations, routing excess operational work to the product development team and even reassigning bugs and tickets to development or engineering managers.

One of the key responsibilities of an SRE is to quantify confidence in the systems they maintain. Confidence can be measured by both past reliability and future reliability. Past reliability is captured by analysing historical monitoring data, and future reliability by predictions based on past system behavior. We will discuss the principles, practices and management of Site Reliability Engineering in more detail in later posts, which will follow shortly after this one.
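To make the idea of past and future reliability a little more concrete, here is a minimal sketch. It assumes the only data available is a list of outage durations over an observation window; the function names, the sample numbers and the naive "failures keep arriving at the historical rate" forecast are all assumptions for illustration, not something taken from the SRE book.

```python
def past_availability(window_hours, outage_hours):
    """Fraction of the observation window during which the service was up."""
    return (window_hours - sum(outage_hours)) / window_hours


def projected_downtime(window_hours, outage_hours, next_window_hours):
    """Naive forecast: assume failures keep arriving at the historical rate."""
    if not outage_hours:
        return 0.0
    uptime = window_hours - sum(outage_hours)
    mtbf = uptime / len(outage_hours)              # mean time between failures
    mttr = sum(outage_hours) / len(outage_hours)   # mean time to repair
    expected_failures = next_window_hours / mtbf
    return expected_failures * mttr


# Made-up history: two outages (0.5 h and 1.5 h) in a 30-day window.
history = [0.5, 1.5]
window = 30 * 24

print(f"past availability: {past_availability(window, history):.4%}")
print(f"projected downtime, next 30 days: "
      f"{projected_downtime(window, history, window):.2f} h")
```

A real prediction would use far richer monitoring data, but even this rough calculation turns "how confident are we?" into numbers that can be tracked over time.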

An SRE is responsible for all of these areas:

  • General system uptime
  • System performance
  • Latency
  • Incident and outage management
  • Systems and application monitoring
  • Change management
  • Capacity planning

In a nutshell, the Service Reliability Hierarchy is as follows:

Service Reliability Hierarchy

It’s easy to define what site reliability engineers do, but exactly which skills SREs need to perform their jobs is a much less well-defined question. As mentioned earlier, SRE skills vary widely from team to team depending on factors such as the types of systems managed and the types of reliability challenges faced. Even so, modern and aspiring SREs need a core set of standard skills that help them understand, manage and deploy the complex distributed systems found at any typical organization today.

Now we can look into the skill sets that an SRE should master:

Coding:

Coding is an essential skill to master for an SRE role. Depending on the role, an understanding of development and coding can go a long way. As the day-to-day tasks of an SRE include automating processes and dealing with systems, knowing Bash, Python, YAML and Golang can help you in the long run.

Version Control Tools:

As an SRE working with code, you’ll be using Git or some other kind of version control tool. So it makes sense to learn about version control tools, mainly distributed version control systems, and to have a good understanding of Git and GitHub.

Cloud Computing:

Cloud computing is one of the key skills that modern SREs can’t live without. Around 90% of businesses use the cloud in some form: private, public or hybrid. The reliability of a cloud platform cannot be managed if you don’t understand cloud architecture, cloud networking, data storage, observability and so on.

Distributed Computing:

Knowing how distributed computing works and understanding the concept of microservices are both significant advantages for an SRE. You’ll be handling large, distributed systems, so having some experience with these topics can really help you progress as an SRE.

Agile & DevOps:

As we mentioned earlier, class SRE implements DevOps. Many would say that SRE is to DevOps what Scrum is to Agile. DevOps is not a role; it is more of a cultural aspect that can’t be assigned to a person but should be practised as a team. “DevOps engineer” is most of the time just a title used to hire system admins. SREs focus more on system availability, observability and scaling. DevOps is a practice of bringing development and operations teams together, whereas Agile is an iterative approach that focuses on collaboration, customer feedback and small rapid releases. DevOps focuses on constant testing and delivery, while the Agile process focuses on constant change. Automation is the key to DevOps, and we need some tools to do DevOps. Understanding these toolsets and the aforementioned cultural aspect of DevOps is very much needed to be an SRE.

Operating Systems:

A good understanding of operating systems, usually Linux or Windows as these are common in most organisations, will be helpful. In this Cloud & DevOps era, most public cloud management tools and DevOps toolsets follow the conventions of the Linux CLI. Cloud native systems like Kubernetes and containers follow the same CLI principles even if you run them in a Windows environment. So working with Linux or *NIX systems is an essential skill for any SRE, even if you come from a Windows background.

Understanding of Databases:

There are many types of NoSQL databases, and each has fairly specific use cases where it excels. Compare and contrast them with relational databases like MySQL. This is an excellent time to dive into understanding what a data model is, why data models are necessary, and how the data model should inform your choice of database and your service architecture.

Cloud Native Applications:

Knowing cloud native applications is another skill to master as an SRE. You don’t have to know them in depth, but some knowledge areas can help you and your organization as you get on the road to becoming a successful SRE: knowing what Docker is, having some idea about how containers work, and understanding how to run a secure application using Kubernetes.

Networking:

In modern distributed environments at scale, networking plays a pivotal role. It is also often considered the culprit when something goes wrong. Even if the organization has separate network engineers or a connectivity team, SREs need an in-depth understanding of networking and of the protocols and topologies used in modern system design, so that they know when the network is the root cause of an incident and how to resolve such issues efficiently and effectively.

Monitoring:

As we mentioned earlier, monitoring is an integral part of the Service Reliability Hierarchy. Monitoring tools make your life easier when you’re an SRE. They give you a quick look into your system’s performance and the issues it is dealing with. Implementing these tools and getting insights from them is a primary goal of an SRE, so that the system experiences as little downtime as possible. Prometheus and Grafana are widely used monitoring solutions, so it makes sense to learn those.

CI/CD Pipelines:

It’s hard to address reliability problems that emerge from the source code or deployment process if an SRE doesn’t have a good understanding of how the CI/CD process works and which tools are used in that area. Even though SREs don’t typically develop software, they must know how software is written and deployed. Most organizations today rely on a CI/CD pipeline for this, so this is also a key skill for SREs.

Security Engineering/Response:

SREs who don’t understand security fundamentals are at risk of implementing solutions that are effective from a reliability standpoint but not really secure. Though SREs don’t own this domain, they require significant skills in this area.

Incident Management:

SREs must know how incident response roles are structured and have to take the lead in organizing the incident response team, communicating with stakeholders and devising the best strategy to ensure rapid and effective incident resolution.

Problem Management:

As we mentioned earlier in the Service Reliability Hierarchy, postmortem/root cause analysis is a must for reliability engineering. Knowing how to run a postmortem and derive an RCA is an important skill for an SRE to possess.

Communication:

As an SRE, you’ll need to report critical incidents that affect applications, and you’ll be working with software engineers and others. In all these situations, having effective, well-developed communication skills makes life much easier. To ensure there are no miscommunications while reporting incidents, this is also a skill to master if you are on the path to becoming an SRE.

The list of SRE skills could go on indefinitely, but the skills mentioned here are the best ones to have if you want to transition to an SRE role or excel in your current role as an SRE.

I have worked as a system admin, architect and more, and the tenure I enjoyed most was as an SRE and SRE lead. If you enjoy working on the backend and want to get closer to your system’s performance, reliability, and scalability, then an SRE role might just be perfect for you!

RHEL 8 What’s new?

Red Hat Enterprise Linux 8 was released in Beta on November 14, 2018. There are many features and improvements that distinguish it from its predecessor, RHEL 7. In this blog, I’m attempting to provide a quick look at those improvements, the deprecations and the upgrade path.

Improvements:

  • DNF replaces YUM as the package manager; the yum command remains only as an alias for dnf. If you’ve worked on Fedora, DNF is already the default package manager there.
  • chronyd is the default NTP (Network Time Protocol) daemon instead of ntpd.
  • Repository channel names have changed, but their content is mostly the same. The CodeReady Linux Builder repository was added. It is similar to EPEL and supplies additional packages, which are not supported for production use.
  • One of the biggest improvements in RHEL 8 is the new upper limit on physical memory capacity: 4 PB, compared to 64 TB of system memory in RHEL 7.
  • The RPM command is also upgraded. The rpmbuild command can now do all build steps directly from a source package. The new --reinstall option allows reinstalling a previously installed package. There is a new rpm2archive utility for converting an rpm payload to a tar archive.
  • The TCP networking stack is improved. Red Hat claims that version 4.18 provides higher performance, better scalability, and more stability.
  • RHEL 8 supports OpenSSL 1.1.1 and the TLS 1.3 cryptographic standard by default.
  • BIND version is upgraded to 9.11 by default and introduces new features and feature changes compared to version 9.10.
  • Apache HTTP Server has been updated from version 2.4.6 to version 2.4.37 between RHEL 7 and RHEL 8. This updated version includes several new features, but maintains backwards compatibility with the RHEL 7 version.
  • RHEL 8 introduces nginx 1.14, a web and proxy server supporting HTTP and other protocols
  • OpenSSH was upgraded to version 7.8p1.
  • Vim runs default.vim script, if no ~/.vimrc file is available.
  • The ‘nobody’ and ‘nfsnobody’ users and groups are merged into the ‘nobody’ ID (65534).
  • In RHEL 8, some daemons such as cups no longer store their logs in specific files within the /var/log directory, as they did in RHEL 7. Instead, they are stored only in systemd-journald.
  • You are now forced to switch to chronyd. The old NTP implementation (ntpd) is not supported in RHEL 8.
  • NFS over UDP (NFS3) is no longer supported. The NFS configuration file has moved to “/etc/nfs.conf”. When upgrading from RHEL 7, the file is moved automatically.
  • For desktop users, Wayland is the default display server as a replacement for the X.org server. Yet X.Org is still available. Legacy X11 applications that cannot be ported to Wayland automatically use Xwayland as a proxy between the X11 legacy clients and the Wayland compositor.
  • nftables replaces iptables as the default network packet filtering framework. This update adds the iptables-translate and ip6tables-translate tools to convert existing iptables or ip6tables rules into the equivalent ones for nftables.
  • The GCC toolchain is based on GCC 8.2.
  • The Python version installed by default is 3.6, which introduces incompatibilities with scripts written for Python 2.x, but Python 2.7 is available in the python2 package.
  • Perl 5.26 is distributed with RHEL 8. The current directory . has been removed from the @INC module search path for security reasons. PHP 7.2 is also added.
  • For working with containers, Red Hat expects you to use the podman, buildah, skopeo, and runc tools. The podman tool manages pods, container images, and containers on a single node. It is built on the libpod library, which enables management of containers and groups of containers, called pods.
  • The basic installation provides a new version of the ifup and ifdown scripts which call NetworkManager through the nmcli tool. The NetworkManager-config-server package is only installed by default if you select either the Server or Server with GUI base environment during the setup. If you selected a different environment, use the yum install NetworkManager-config-server command to install the package.
  • Node.js, a software development platform in the JavaScript programming language, is provided for the first time in RHEL. It was previously available only as a Software Collection. RHEL 8 provides Node.js 10.
  • DNF modules improve package management.
  • A new tool called Image Builder enables users to create customized RHEL images. Image Builder is available in AppStream in the lorax-composer package. Among other things, it can create live ISO disk images and images for Azure, VMware and AWS. See Composing a customized RHEL system image.
  • Some new storage management capabilities were introduced. Stratis is a new local storage manager. It provides managed file systems on top of pools of storage, with additional features for the user. RHEL 8 also supports file system snapshots and LUKSv2 disk encryption with Network-Bound Disk Encryption (NBDE).
  • VMs are managed via Cockpit by default. If required, virt-manager can also be installed. The Cockpit web console provides basic stats of the server, much like Nagios, and access to logs. Packages for the RHEL 8 web console, also known as Cockpit, are now part of the Red Hat Enterprise Linux default repositories, and can therefore be installed immediately on a registered RHEL 8 system. (You should be using this extensively if you’re using KVM implementations of RHEL 8 virtual machines.)

Deprecations:

  • Yum package is deprecated and Yum command is just a symbolic link to dnf.
  • The NTP implementation (ntpd) is not supported in RHEL 8.
  • Network scripts are deprecated; ifup and ifdown map to nmcli.
  • Digital Signature Algorithm (DSA) is considered deprecated. Authentication mechanisms that depend on DSA keys do not work in the default configuration.
  • rdist is removed as well as rsh and all r-utilities.
  • The X.Org display server was replaced by Wayland as the default in GNOME.
  • tcp_wrappers were removed. Not clear what happened with programs previously compiled with tcp-wrapper support such as Postfix.
  • Iptables are deprecated.
  • Python 2.7 has only limited support.
  • KDE support has been deprecated.
  • Upgrading from KDE on RHEL 7 to GNOME on RHEL 8 is unsupported.
  • Removal of Btrfs support.
  • Docker is not included in RHEL 8.0.

Upgrade:

The release of RHEL 8 gives those still using RHEL 6 an opportunity to skip RHEL 7 completely for new server installations. RHEL 7 has five years left before EOL (June 30, 2024), while many servers now last more than five years. Theoretically, an upgrade from RHEL 6 to RHEL 8 is possible by upgrading to RHEL 7 first, but it is too risky. Please follow the RHEL 8 upgrade path. RHEL 8 is distributed through two main repositories:

Base OS

Content in the BaseOS repository is intended to provide the core set of the underlying OS functionality that provides the foundation for all installations. This content is available in the RPM format and is subject to support terms similar to those in previous releases of RHEL. For a list of packages distributed through BaseOS, see the RHEL 8 Package manifest.

AppStream

Content in the Application Stream repository includes additional user space applications, runtime languages, and databases in support of the varied workloads and use cases. Application Streams are available in the familiar RPM format, as an extension to the RPM format called modules, or as Software Collections. For a list of packages available in AppStream, see the RHEL 8 Package manifest.

In addition, the CodeReady Linux Builder repository is available with all RHEL subscriptions. It provides additional packages for use by developers. Packages included in the CodeReady Linux Builder repository are unsupported. Please check RHEL 8 Package manifest.

With the idea of Application Streams, RHEL 8 follows the lead of Fedora Modularity. Fedora 28, released earlier this year by the Fedora project (Fedora is considered the bleeding-edge community edition of RHEL), introduced the concept of modularity. Userspace components can be updated more frequently than core operating system packages, without waiting for the next version of the operating system. Application Streams also make it possible to install multiple versions of the same package (such as an interpreted language or a database).

Theoretically RHEL 8 will be able to withstand more heavy loads due to optimized TCP/IP stack and improvements in memory handling.

Installation has not changed much from RHEL 7. RHEL 8 still pushes LVM for the root filesystem in the default installation. Without a subscription you can still install packages from the ISO, either directly or by making it a repo. The default filesystem remains XFS. Red Hat Enterprise Linux 8 supports installing from a repository on a local hard drive; you only need to specify the directory instead of the ISO image.

For example:

inst.repo=hd::.

Kickstart has also changed, but not by much (auth and authconfig are deprecated; use authselect instead).

Source: Red Hat RHEL8 release notes, Red Hat Blogs, Linux Journal etc

Understanding API

API stands for Application Programming Interface. An API is a software intermediary that allows two applications to talk to each other. In other words, an API is the messenger that delivers your request to the provider that you’re requesting it from and then delivers the response back to you. We can say that an API is a set of programming instructions and standards for accessing a Web-based software application or Web tool.

As we have seen, an API is a software-to-software interface, not a user interface. The most important part of the name is “interface,” because an API essentially talks to a program for you. You still need to know the language to communicate with the program, but without an API, you won’t get far. With APIs, applications talk to each other without any user knowledge or intervention. When programmers decide to make some of their data available to the public, they “expose endpoints,” meaning they publish a portion of the language they’ve used to build their program. Other programmers can then pull data from the application by building URLs or using HTTP clients (special programs that build the URLs for you) to request data from those endpoints.
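As a small illustration of "building URLs" for an endpoint, the sketch below assembles a request URL from an endpoint path and query parameters using Python's standard library. The host api.example.com, the /products endpoint and its parameters are made up for the example; they do not belong to any real provider.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

BASE_URL = "https://api.example.com"  # hypothetical provider, not a real API


def build_url(endpoint, **params):
    """Join the base URL, an endpoint path and optional query parameters."""
    query = urlencode(params)
    return f"{BASE_URL}{endpoint}?{query}" if query else f"{BASE_URL}{endpoint}"


url = build_url("/products", category="books", limit=5)
print(url)

# The provider's side does the reverse: split the URL and read the parameters.
parts = urlsplit(url)
print(parts.path, parse_qs(parts.query))
```

An HTTP client library does exactly this kind of assembly for you before sending the request over the network.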

Endpoints return text that’s meant for computers to read, so it won’t make complete sense if you don’t understand the computer code used to write it. A software company releases its API to the public so that other software developers can design products that are powered by its service.

Examples:

When bloggers put their Twitter handle on their blog’s sidebar, WordPress enables this by using Twitter’s API.

Amazon.com released its API so that Web site developers could more easily access Amazon’s product information. Using the Amazon API, a third party Web site can post direct links to Amazon products with updated prices and an option to “buy now.”

When you buy movie tickets online and enter your credit card information, the movie ticket Web site uses an API to send your credit card information to a remote application that verifies whether your information is correct. Once payment is confirmed, the remote application sends a response back to the movie ticket Web site saying it’s OK to issue the tickets. As a user, you only see one interface (the movie ticket Web site), but behind the scenes, many applications are working together using APIs. This type of integration is called seamless, since the user never notices when software functions are handed from one application to another.

Docker Engine comes with an API. Docker provides an API for interacting with the Docker daemon (called the Docker Engine API), as well as SDKs for Go and Python. The SDKs allow you to build and scale Docker apps and solutions quickly and easily. If Go or Python don’t work for you, you can use the Docker Engine API directly. The Docker Engine API is a RESTful API accessed by an HTTP client such as wget or curl, or by the HTTP library that is part of most modern programming languages.

Types of APIs

There are four main types of APIs:

Open APIs: Also known as Public APIs, these have no access restrictions because they are publicly available.
Partner APIs: A developer needs specific rights or licenses in order to access this type of API because they are not available to the public.
Internal APIs: Also known as Private APIs, only internal systems expose this type of API. These are usually designed for internal use within a company. The company uses this type of API among the different internal teams to be able to improve its products and services.
Composite APIs: This type of API combines different data and service APIs. It runs a sequence of tasks synchronously as the result of a single call, rather than at the request of each individual task. Its main uses are to speed up execution and to improve the performance of listeners in web interfaces.

API architecture types:

APIs can vary by architecture type but are generally used for one of three purposes:

System APIs access and maintain data. These types of APIs are responsible for managing all of the configurations within a system. To use an example, a system API unlocks data from a company’s billing database.

Process APIs take the data accessed with system APIs and synthesize it to create a new way to view or act on data across systems. To continue the example, a process API would take the billing information and combine it with inventory information and other data to fulfill an order.

Experience APIs add context to system and process APIs. These types of APIs make the information collected by system and process APIs understandable to a specified audience. Following the same example, an experience API could translate the data from the process and system APIs into an order status tracker that displays information about when the order was placed and when the customer should expect to receive it.
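The billing and order example above can be sketched as three thin layers. This is only an illustration: the in-memory dictionaries stand in for real back-end systems, and every function name, field and record here is hypothetical.

```python
# Hypothetical in-memory "databases" standing in for real back-end systems.
BILLING_DB = {"cust-1": {"name": "Asha", "card_ok": True}}
INVENTORY_DB = {"sku-42": {"title": "Keyboard", "in_stock": 3}}


def system_get_billing(customer_id):
    """System API: unlocks raw data from the billing database."""
    return BILLING_DB[customer_id]


def system_get_inventory(sku):
    """System API: unlocks raw data from the inventory database."""
    return INVENTORY_DB[sku]


def process_place_order(customer_id, sku):
    """Process API: combines billing and inventory data to fulfil an order."""
    billing = system_get_billing(customer_id)
    item = system_get_inventory(sku)
    accepted = billing["card_ok"] and item["in_stock"] > 0
    return {"customer": billing["name"], "item": item["title"], "accepted": accepted}


def experience_order_status(customer_id, sku):
    """Experience API: presents the result in a form meant for the customer."""
    order = process_place_order(customer_id, sku)
    state = "confirmed" if order["accepted"] else "on hold"
    return f"{order['customer']}, your order for {order['item']} is {state}."


print(experience_order_status("cust-1", "sku-42"))
```

Each layer only talks to the one below it, which is what lets the presentation change without touching how the data is stored.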

Apart from the main web APIs, there are also web service APIs:

The following are the most common types of web service APIs:

SOAP (Simple Object Access Protocol): This is a protocol that uses XML as the format to transfer data. Its main function is to define the structure of the messages and the methods of communication. It also uses WSDL, the Web Services Description Language, to publish a definition of its interface in a machine-readable document.

XML-RPC: This is a protocol that uses a specific, simple XML format to transfer data, whereas SOAP defines its own, more elaborate XML envelope. It is also older than SOAP. XML-RPC uses minimal bandwidth and is much simpler than SOAP. Example: the YUM command in Linux uses XML-RPC calls.

JSON-RPC: This protocol is similar to XML-RPC, but instead of using XML to transfer data it uses JSON (JavaScript Object Notation).

REST (Representational State Transfer): REST is not a protocol like the other web services; instead, it is a set of architectural principles. A REST service needs to have certain characteristics, including simple interfaces, resources that are easily identified within the request, and manipulation of resources through the interface.
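To see the difference between the XML-RPC and JSON-RPC wire formats, the sketch below encodes the same remote call both ways with Python's standard library. The method name add and its arguments are made up, and the JSON-RPC envelope follows the JSON-RPC 2.0 layout as an assumption about which version is in use; nothing is sent over the network.

```python
import json
import xmlrpc.client

# The same hypothetical call, add(2, 3), encoded in both formats.
xml_payload = xmlrpc.client.dumps((2, 3), methodname="add")
json_payload = json.dumps(
    {"jsonrpc": "2.0", "method": "add", "params": [2, 3], "id": 1}
)

print(xml_payload)
print(json_payload)
print(len(xml_payload), "bytes of XML vs", len(json_payload), "bytes of JSON")

# Either payload can be decoded back into the method name and its arguments.
params, method = xmlrpc.client.loads(xml_payload)
print(method, params)
```

Both payloads describe a function call, which is the "driven by function" style; a REST API would instead expose a resource URL and rely on the HTTP method to express the operation.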

| SOAP | REST |
| --- | --- |
| It has strict rules and advanced security to follow. | There are loose guidelines to follow, allowing developers to make recommendations easily. |
| It is driven by function. | It is driven by data. |
| It requires more bandwidth. | It requires minimum bandwidth. |
SOAP vs REST

| JSON | XML |
| --- | --- |
| Supports only text and numbers. | Supports various types of data, for example text, numbers, images, graphs, charts etc. |
| Focuses mainly on data. | Focuses mainly on the document. |
| It has low security. | It has more security. |
JSON vs XML

Web service APIs honor all the HTTP methods, such as POST, GET, PUT, PATCH and DELETE. Compared with the CRUD operations, they map as follows:

| HTTP Method | CRUD |
| --- | --- |
| POST | Create |
| GET | Read |
| PUT | Update/Replace |
| PATCH | Update/Modify |
| DELETE | Delete |
HTTP methods vs CRUD operations

POST – The POST verb is most often used to create new resources. On success it returns HTTP status 201, along with a Location header linking to the newly created resource. POST is neither safe nor idempotent.

GET – The HTTP GET method is used to read (or retrieve) a representation of a resource. GET returns a representation in XML or JSON and an HTTP response code of 200 (OK). GET is both safe and idempotent.

PUT – PUT is most often used for update capabilities. PUT is not a safe operation, in that it modifies (or creates) state on the server, but it is idempotent.

PATCH – PATCH is used for modify capabilities. The PATCH request only needs to contain the changes to the resource, not the complete resource. PATCH is neither safe nor idempotent. However, a PATCH request can be issued in such a way as to be idempotent.

DELETE – DELETE is pretty easy to understand. It is used to delete a resource identified by a URI. There is a caveat about DELETE idempotence: calling DELETE on a resource a second time will often return a 404 (NOT FOUND), since the resource was already removed and is therefore no longer available. The state on the server is nevertheless the same after the second call.
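The difference between idempotent and non-idempotent methods is easier to see with a toy example. The class below is a hypothetical in-memory stand-in for a resource collection, not the API built later in this post; the status codes follow the descriptions above, and returning 200 for a successful DELETE is an assumption.

```python
import itertools


class EventStore:
    """Toy in-memory resource collection used only to illustrate idempotence."""

    def __init__(self):
        self.events = {}
        self._next_id = itertools.count(1)

    def post(self, title):
        # Every POST creates a new resource, so repeating it changes state again.
        event_id = next(self._next_id)
        self.events[event_id] = {"title": title}
        return 201, event_id

    def put(self, event_id, title):
        # PUT replaces the resource; repeating it leaves the same final state.
        self.events[event_id] = {"title": title}
        return 200

    def delete(self, event_id):
        # A repeated DELETE reports 404, but the stored state no longer changes.
        if event_id not in self.events:
            return 404
        del self.events[event_id]
        return 200  # assumed success code for this sketch


store = EventStore()
print(store.post("launch review"), store.post("launch review"))  # two resources
print(store.put(1, "postmortem"), store.put(1, "postmortem"))    # same state twice
print(store.delete(2), store.delete(2))                          # second call: 404
print(store.events)
```

Repeating the POST leaves two events behind, while repeating the PUT or the DELETE leaves the store exactly as it was after the first call.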

We will now try creating a RESTful API with Golang. Those who haven’t tried their hand at Golang yet can click here to follow the basics of Golang.

We start the Go program, which creates an API. The source code for this is available in this Git repository. A discussion of the implementation of the API is out of scope for this blog post; we can discuss that in another post.

Here we concentrate only on the API and on the requests we send to, and the responses we receive from, the API endpoints. So, we can start our dummy API interface.

executing the code

Now we can check whether our API is accessible and responds to our requests. We do this on the command line with a simple curl request. Our application is listening on port 8001, which can be changed in the code as you wish. We are now hitting the endpoint ‘/’.

curl request

Next we try hitting the endpoint /events to get all the events in the dummy database, which is created with a slice and a struct in the main.go file.

curl request to get all events

Now we will simulate the requests with RESTer, an open-source Firefox extension. It is available for Chrome too. To hit the endpoint “/events” we use the GET method. The response code 200 states that the request was successful.

GET method

In this simulation, we are using the POST method to create another event in the dummy database inside the application programmatically. Response code 201 is provided for successful creation of the event.

POST method to create new event

Now we can try hitting “/events/{id}”, the endpoint that retrieves a single event, with a GET method. Requesting “/events/2” displays the newly added event.

GET method on /events/2

We will now hit the “/events” endpoint and see whether both events are in the response.

GET method on /events endpoint

In the next simulation we are using the PATCH method to modify/update an existing event. In this case we are hitting the endpoint “/events/2” to modify the event id 2.

PATCH method to modify event id 2

We then GET the endpoint “/events” again to verify that our PATCH request produced the expected result.

GET method to verify PATCH request

In the final simulation we are hitting the endpoint “/events/1” with a DELETE method so that event id 1 will be removed/deleted.

DELETE Method to delete event id 1

Voila..!! We just created an API and tested it with dummy data. I believe we got a quick overview of what an API is and how we can use different HTTP methods to retrieve data or modify data using an API.
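To make the walk-through above concrete without the Go source at hand, here is a minimal Python sketch of the same idea: an in-memory list of events and a `handle` function that maps an HTTP method and path to a status code and a payload. The field names (`ID`, `Title`, `Description`), the `handle` and `find` helpers, the welcome text, and the status codes used for PATCH, DELETE and error cases are all assumptions for illustration; the actual Go API in the repository may behave differently.

```python
import json

# Dummy database: a list of dicts, standing in for the slice of structs in main.go.
# Field names are assumptions made for this sketch.
events = [{"ID": "1", "Title": "First event", "Description": "Seed data"}]


def find(event_id):
    """Return the event with the given ID, or None if it does not exist."""
    return next((e for e in events if e["ID"] == event_id), None)


def handle(method, path, body=None):
    """Map (method, path, JSON body) to a (status code, payload) pair."""
    parts = [p for p in path.split("/") if p]
    if not parts:  # the "/" endpoint
        if method == "GET":
            return 200, "Welcome"
    elif parts == ["events"]:
        if method == "GET":
            return 200, list(events)
        if method == "POST":
            event = json.loads(body)
            events.append(event)
            return 201, event  # 201: created
    elif len(parts) == 2 and parts[0] == "events":
        event = find(parts[1])
        if event is None:
            return 404, {"error": "event not found"}
        if method == "GET":
            return 200, event
        if method == "PATCH":
            event.update(json.loads(body))  # modify only the supplied fields
            return 200, event
        if method == "DELETE":
            events.remove(event)
            return 200, {"deleted": parts[1]}
    return 405, {"error": "method not allowed"}


# Replay the walk-through: create, read one, update, delete, read all.
print(handle("POST", "/events", '{"ID": "2", "Title": "Second", "Description": "New"}'))
print(handle("GET", "/events/2"))
print(handle("PATCH", "/events/2", '{"Title": "Second (edited)"}'))
print(handle("DELETE", "/events/1"))
print(handle("GET", "/events"))
```

The sketch keeps all state in memory, just like the dummy database in the post, so every run starts from the same seed event.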

Site Reliability: SLI, SLO & SLA

Service Level Indicator (SLI), Service Level Objective (SLO) and Service Level Agreement (SLA) are parameters with which the reliability, availability and performance of a service are measured. SLA, SLO and SLI are related but distinct concepts.

It’s easy to get lost in a fog of acronyms, so before we dig in, here is a quick and easy definition:

  • SLA or Service Level Agreement is a contract in which the service provider makes promises to customers about service availability, performance, etc.
  • SLO or Service Level Objective is a goal that the service provider wants to reach.
  • SLI or Service Level Indicator is a measurement the service provider uses to track that goal.

Service Level Indicator
SLIs are the parameters that indicate the successful transactions and requests served by the service over predefined intervals of time. These parameters make it possible to measure the performance and availability of the service. Measuring them also makes it possible to improve them gradually.

Key Examples are:

  • Availability/Uptime of the service.
  • Number of successful transactions/requests.
  • Consistency and durability of the data.

Service Level Objective
An SLO defines the acceptable downtime of the service. For a service with multiple components, there can be different parameters that define the acceptable downtime of each. It is a common pattern to start with a low SLO and gradually increase it.

Key Examples are:

  • Durability of disks should be 99.9%.
  • Availability of service should be 99.95%.
  • Service should successfully serve 99.999% of requests/transactions.
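To get a feel for what such percentages mean in practice, the following Python sketch converts an availability target into the downtime it allows. The 30-day period and the function name `allowed_downtime_minutes` are assumptions chosen for illustration; use whatever measurement window your own objectives are defined over.

```python
def allowed_downtime_minutes(slo_percent, period_days=30):
    """Minutes of downtime permitted by an availability target over a period."""
    total_minutes = period_days * 24 * 60
    return total_minutes * (1 - slo_percent / 100)


# Compare a few availability targets over an assumed 30-day window.
for slo in (99.0, 99.5, 99.95, 99.999):
    minutes = allowed_downtime_minutes(slo)
    print(f"{slo}% over 30 days allows about {minutes:.2f} minutes of downtime")
```

Under these assumptions a 99.95% availability target leaves roughly 21.6 minutes of downtime per 30 days, while 99% leaves about 7.2 hours.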

Service Level Agreement
An SLA defines the penalty that the service provider should pay in the event of service unavailability for a pre-defined period of time. The service provider should clearly define the failure factors for which they will be accountable (the domain of responsibility). It is a common pattern to have a looser SLA than SLO, for instance an SLA of 99% and an SLO of 99.5%. If the service is more available than the SLO requires, the remaining headroom can be used as an error budget to deploy complex releases to production.

Key Examples of Penalty are:

  • Partial refund of service subscription fee.
  • Additional subscription time added for free.

So here is the relationship. The service provider needs to collect metrics based on the SLIs, define thresholds for those metrics based on the SLOs, and monitor the thresholds so that the SLA is not broken. In practice, the SLIs are the metrics in the monitoring system, the SLOs are the alerting rules on those metrics, and the SLAs are the contractual thresholds for the same metrics.

Usually the SLO and the SLA are similar, but the SLO is tighter than the SLA. SLOs are generally for internal use only, while SLAs are external. If service availability violates the SLO, operations need to react quickly to avoid breaking the SLA; otherwise, the company might need to refund some money to customers.
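The relationship between the three can be sketched in a few lines of Python: compute an availability SLI from request counts, then compare it with an internal SLO and an external SLA. The thresholds reuse the 99.5% / 99% example from this post; the function names, the status strings and the request counts are assumptions for illustration only.

```python
def availability_sli(successful, total):
    """SLI: percentage of requests served successfully."""
    return 100.0 * successful / total if total else 100.0


def evaluate(sli, slo=99.5, sla=99.0):
    """Classify an SLI value against the (tighter) SLO and the (looser) SLA."""
    if sli < sla:
        return "SLA breached"   # contractual penalty may apply
    if sli < slo:
        return "SLO violated"   # internal alert: react before the SLA breaks
    return "OK"


def error_budget_remaining(successful, total, slo=99.5):
    """Number of additional failed requests the SLO still tolerates."""
    allowed_failures = total * (1 - slo / 100)
    return allowed_failures - (total - successful)


sli = availability_sli(successful=9960, total=10000)
print(sli, evaluate(sli), error_budget_remaining(9960, 10000))
```

A positive error budget is the headroom that can be spent on risky releases; once it reaches zero, further failures push the service below its SLO.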

The SLA, SLO and SLI are all based on the assumption that the service will not be available 100% of the time. Instead, we guarantee that the system’s availability will stay above a certain number, for example 99.5%.

When we apply this definition to availability, for example, SLIs are the key measurements of the availability of a system; SLOs are the goals we set for how much availability we expect out of a system; and SLAs are the legal contracts that explain what happens if our system doesn’t meet its SLO.

SLIs exist to help engineering teams make better decisions. Your SLO performance is critical information to have when you’re making decisions about how hard and fast you can push your systems. SLOs are also important data points for other engineers when they’re making assumptions about their dependencies on your service or system. Lastly, your larger organization should use your SLIs and SLOs to make informed decisions about investment levels and about balancing reliability work against engineering velocity.

Note: this extract is taken from SRE Fundamentals, CRE and the book Site Reliability Engineering: How Google Runs Production Systems.

Unikernel: Another paradigm for the cloud

In this cloud era it is hard to imagine a world without access to services in the cloud. From contacting someone through mail to storing work-related documents on an online drive and accessing them across devices, there are a lot of services we use on a daily basis that live in the cloud.

To reduce the cost of compute power, virtualization has been adopted to offer more services with less hardware. Then came the concept of containers, where you deploy the application in isolated containers built from lightweight images that carry only the few binaries and libraries needed to run your application. But we still need the underlying VMs to deploy such solutions, and all these VMs come at a cost. While large data centers offer services in the cloud, they are also hungry for electric power, which is a growing concern as our planet is being drained of its resources. So what we need now are less power-hungry solutions.

What if, instead of virtualizing an entire operating system, you were to load an application with only the required components from the operating system, effectively reducing the size of the virtual machine to its bare minimum resource footprint? This is where unikernels come into play.

Unikernel

Unikernel is a relatively new concept that was first introduced around 2013 by Anil Madhavapeddy in a paper titled “Unikernels: Library Operating Systems for the Cloud” (Madhavapeddy, et al., 2013).

You can find more details on unikernels by searching for scholarly articles on Google.

Unikernels are defined by the community at Unikernel.org as follows.

“Unikernels are specialized, single-address-space machine images constructed by using library operating systems.”

For more detailed reading about the concepts behind unikernels, please follow this link.

A unikernel is an application that has been boiled down to a small, secure, lightweight virtual machine, eliminating general-purpose operating systems such as Linux or Windows. Unikernels aim to be much more secure than Linux, and they pursue this in several ways: they have no notion of users, they run a single process per VM, and they limit the amount of code incorporated into each VM. This means that there are no users and no shell to log in to and, more importantly, you can’t run anything other than the one program you want to run.

Despite their relatively young age, unikernels borrow from age-old concepts rooted in the dawn of the computer era: microkernels and library operating systems. Library operating systems are the core of unikernel systems, and each unikernel holds a single application. Single-address-space means that, at its core, the unikernel does not have separate user and kernel address spaces. Unikernels are provisioned directly on the hypervisor without a traditional system like Linux, so a server can reportedly run up to 1000x more VMs.

See here for details about microkernel, monolithic and library operating systems.

Virtual Machines VS Linux Containers VS Unikernel

Virtualization of services can be implemented in various ways. One of the most widespread methods today is through virtual machines, hosted on hypervisors such as VMware’s ESXi or the Linux Foundation’s Xen Project.

Hypervisors allow hosting multiple guest operating systems on a single physical machine. These guest operating systems are executed in what are called virtual machines. The widespread use of hypervisors is due to their ability to better distribute and optimize the workload on the physical servers, as opposed to legacy infrastructures of one operating system per physical server.

Containers are another method of virtualization, which differs from hypervisors by creating virtualized environments that share the host’s kernel. This is a lighter approach than hypervisors, which require each guest to have its own copy of the operating system kernel; that makes a hypervisor-virtualized environment resource heavy in contrast to containers, which share parts of the existing operating system.

As mentioned above, unikernels leverage the abstraction of hypervisors and additionally use library operating systems to include only the required kernel routines alongside the application, presenting the lightest of all three solutions.

The figure above shows the major difference between the three virtualization technologies. Here we can clearly see that virtual machines present a much larger load on the infrastructure as opposed to containers and unikernels.

Additionally, unikernels are in direct “competition” with containers. By providing services in the form of reduced virtual machines, unikernels improve on the container model through increased security. Because they share the host kernel, containerized applications share the same vulnerabilities as the host operating system. Furthermore, containers do not possess the same level of host/guest isolation as hypervisors and virtual machines, potentially making container breaches more damaging than breaches of either virtual machines or unikernels.

Technology: Virtual Machines
Pros:
  • Allows deploying different operating systems on a single host
  • Complete isolation from host
  • Orchestration solutions available
Cons:
  • Requires compute power proportional to number of instances
  • Requires large infrastructures
  • Each instance loads an entire operating system

Technology: Linux Containers
Pros:
  • Lightweight virtualization
  • Fast boot times
  • Orchestration solutions
  • Dynamic resource allocation
Cons:
  • Reduced isolation between host and guest due to shared kernel
  • Less flexible (i.e.: dependent on host kernel)
  • Network is less flexible

Technology: Unikernels
Pros:
  • Lightweight images
  • Specialized application
  • Complete isolation from host
  • Higher security against absent functionalities (e.g.: remote command execution)
Cons:
  • Not mature enough yet for production
  • Requires developing applications from the ground up
  • Limited deployment possibilities
  • Lack of complete IDE support
  • Static resource allocation
  • Lack of orchestration tools

A comparison of solutions

Docker and containerization technology, together with container orchestrators like Kubernetes and OpenShift, are two steps forward for the world of DevOps, and the principles they promote are forward-thinking and largely on target for a more secure, performance-oriented and easy-to-manage cloud future. However, an alternative approach leveraging unikernels and immutable servers will result in smaller, easier-to-manage, more secure containers that will be simpler for existing enterprises to adopt. As DevOps matures, the shortcomings of cloud application deployment and management are becoming clear. Virtual machine image bloat, large attack surfaces, legacy executables, base-OS fragmentation, and an unclear division of responsibilities between development and IT for cloud deployments are all causing significant friction (and creating opportunities for the future).

For example, it remains virtually impossible to create a Ruby or Python web server virtual machine image that DOESN’T include build tools (gcc), ssh, and multiple latent shell executables. All of these components are detrimental to production systems, as they increase image size, attack surface, and maintenance overhead.

Compared to VMs running operating systems like Windows and Linux, a unikernel has only a tenth of 1% of the attack surface. In a unikernel, tools such as sysdig, tcpdump and mysql-client are not installed, and you can’t just “apt-get install” them either; an attacker has to bring them along with the exploit. Taking it further, even a simple cat /etc/hosts or a grep of /var/log/nginx/access.log won’t work, because each of those would be a separate process and a unikernel runs only one.
So unikernels are highly resistant to remote code execution attacks, and more specifically to shell code exploits.

Immutable Servers & Unikernels

Immutable servers are a deployment model that mandates that no application updates, security patches, or configuration changes happen on production systems. If any of these layers needs to be modified, a new image is constructed, pushed and cycled into production. Heroku is a great example of immutable servers in action: every change to your application requires a ‘git push’ to overwrite the existing version. The advantages of this approach include higher confidence in the code that is running in production, integration of testing into deployment workflows, and easy verification that systems have not been compromised.

Once you become a believer in the concept of immutable servers, then speed of deployment and minimizing vulnerability surface area become objectives. Containers promote the idea of single-service-per-container (microservices), and unikernels take this idea even further.

Unikernels allow you to compile and link your application code all the way down to, and including, the operating system. For example, if your application doesn’t require persistent disk access, no device drivers or OS facilities for disk access would even be included in the final production images. Since unikernels are designed to run on hypervisors such as Xen, they only need interfaces to standardized resources such as networking and persistence. Device drivers for thousands of displays, disks and network cards are completely unnecessary. Production systems become minimalist, requiring only the application code, the runtime environment, and the OS facilities required by the application. The net effect is smaller VM images with less surface area that can be deployed faster and maintained more easily.

Traditional operating systems (Linux, Windows) will become extinct on servers. They will be replaced with single-user, bare-metal hypervisors optimized for the specific hardware, taking decades of multi-user, hardware-agnostic code cruft with them. A more mature build-deploy-manage tool set based on these technologies will be truly game changing for hosted and enterprise clouds alike.

Unikernel | Language | Targets | Functions
ClickOS | C++ | Xen | Network Function Virtualization
HalVM | Haskell | Xen | -
IncludeOS | C++ | KVM, VirtualBox, ESXi, Google Cloud, OpenStack | Orchestration tool available
MirageOS | OCaml | KVM, Xen, RTOS/MCU | -
Nanos Unikernel | C, C++, Go, Java, Node.js, Python, Rust, Ruby, PHP, etc | QEMU/KVM | Orchestration tool available
OSv | Java, C, C++, Node, Ruby | VirtualBox, ESXi, KVM, Amazon EC2, Google Cloud | Cloud and IoT (ARM)
Rumprun | C, C++, Erlang, Go, Java, JavaScript, Node.js, Python, Ruby, Rust | Xen, KVM | -
Unik | Go, Node.js, Java, C, C++, Python, OCaml | VirtualBox, ESXi, KVM, XEN, Amazon EC2, Google Cloud, OpenStack, PhotonController | Unikernel compiler toolbox with orchestration possible through Kubernetes and Cloud Foundry
ToroKernel | FreePascal | VirtualBox, KVM, XEN, HyperV | Unikernel dedicated to run microservices

Comparing a few unikernel solutions from active projects

Out of the various existing projects, some stand out due to their wide range of supported languages. For the active projects, the above table lists the languages they support, the hypervisors they can run on, and remarks concerning their functionality.

I am currently experimenting with unikernels on AWS and Google Cloud Platform and will share an update in another post soon.

Source: Medium, github, containerjournal, linuxjournal

Helm 3 – Sans tiller – Really?

Helm has recently announced its much-awaited version 3. The surprise factor in this release is that the server component added in the Helm 2 release is missing. Yeah, you got it right: Tiller is missing in Helm 3, which means we have a server-less Helm. In this post, let us check out why it was removed and why that matters.

As an introduction, Helm, the package manager for Kubernetes, is a useful tool for installing, upgrading and managing applications on a Kubernetes cluster. Helm 2 has two parts: a client (helm) and a server (Tiller). Tiller runs inside your Kubernetes cluster as a pod in the kube-system namespace, in the current context specified in your kubeconfig (which can be overridden with the --kube-context flag). Tiller manages both the releases (installations) and the revisions (versions) of charts deployed on the cluster. When you run helm commands, your local Helm client sends instructions to Tiller in the cluster, which in turn makes the requested changes. In short, the helm CLI is the client tool we use for all of our commands, and Tiller is the service that actually communicates with the Kubernetes API to manage our Helm packages.

Helm packages are called charts. Charts are Helm’s deployable artifacts. Charts are always versioned using semantic versioning, and come either packed, in versioned .tgz files, or in a flat directory structure. They are abstractions describing how to install packages onto a Kubernetes cluster. When a chart is deployed, Helm works as a templating engine: it populates the chart’s YAML templates, including those of package dependencies, with the required variables, and then applies the rendered configuration to the cluster through the Kubernetes API to install the package.
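The templating step can be illustrated with a small Python sketch. Helm itself uses Go templates and a values file; Python’s `string.Template` is used here purely as a stand-in to show the idea of filling a YAML template with values. The ConfigMap manifest and the `release` and `replicas` keys are hypothetical and do not come from any real chart.

```python
from string import Template

# A hypothetical manifest template; real charts use Go template syntax instead.
manifest_template = Template("""\
apiVersion: v1
kind: ConfigMap
metadata:
  name: ${release}-config
data:
  replicas: "${replicas}"
""")

# Values that would normally come from the chart's values file or the command line.
values = {"release": "demo", "replicas": 2}

# Render the template; substitute() raises KeyError if a value is missing.
rendered = manifest_template.substitute(values)
print(rendered)
```

The rendered text is plain YAML, which is what finally gets sent to the cluster.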

As already mentioned, Tiller is the component Helm uses to deploy almost any Kubernetes resource. To do this, Tiller is granted the maximum permissions to make changes in Kubernetes. Because of this, anyone who can talk to Tiller can deploy or modify any resource on the Kubernetes cluster, just like a system admin (think of the ‘root’ user on a Linux host). This can cause security issues in the cluster if Helm has not been deployed properly, following certain security measures. Also, authentication is not enabled in Tiller by default, so if any pod is compromised and has permission to talk to Tiller, the complete cluster in which Tiller is running is compromised. This is the main reason for the removal of Tiller.

The Tiller method:

Tiller was used by Helm as an in-cluster operator to maintain the state of a Helm release. It was also used to save the release information of all releases done through Tiller; it used ConfigMaps to save this information in the same namespace in which Tiller was deployed. Helm needed this release information whenever a release was updated or its state changed. So whenever a helm upgrade command was used, Tiller compared the new manifest with the old manifest of the release and made changes accordingly. Thus Helm depended on Tiller to provide the previous state of the release.

The Tiller-less method:

The main need for Tiller was to store release information, for which Helm now uses Secrets, saved in the same namespace as the release. Whenever Helm needs the release information, it gets it from the namespace of the release. To make a change, Helm now just fetches information from the Kubernetes API server, makes the changes on the client side, and stores a record of the installation in Kubernetes. The benefit of Tiller-less Helm is that, since Helm now makes changes in the cluster from the client side, it can only make the changes that the client has been granted permission to make.

Tiller was a good addition in Helm 2, but to run it in production it had to be properly secured, which added learning steps for DevOps and SRE teams. With Helm 3 those learning steps are reduced, and security management is left in the hands of Kubernetes. Helm can now focus on package management.

Data courtesy : Medium, codecentric, Helm

Bare-Metal K8s Cluster with Raspberry Pi – Part 3

This is a continuation of the post series Bare-metal K8s cluster with Raspberry Pi, Part 1 & Part 2.

Another option for running a bare-metal K8s cluster on the Raspberry Pi that I tried and tested is MicroK8s, which we discuss in this post.

MicroK8s is a lightweight upstream K8s: the smallest, simplest, pure production K8s, for clusters, laptops, IoT and edge, on Intel and ARM.

MicroK8s is a CNCF certified upstream Kubernetes deployment that runs entirely on your workstation or edge device. Being a snap it runs all Kubernetes services natively (i.e. no virtual machines) while packing the entire set of libraries and binaries needed. Installation is limited by how fast you can download a couple of hundred megabytes and the removal of MicroK8s leaves nothing behind.

To give some context on snaps: snaps are app packages for desktop, cloud and IoT that are easy to install, secure, cross‐platform and dependency‐free. Snaps are discoverable and installable from the Snap Store, the app store for Linux with an audience of millions.

A snap is a bundle of an app and its dependencies that works without modification across Linux distributions.

We are going to use the same list of components as described in Part 1 of this series.

Each Pi is going to need an Ubuntu server image and you’ll need to be able to SSH into them. The section linked here will help you reach this stage:

Kubernetes Cluster Preparation with SSH connection to the Pi from your terminal

Installing MicroK8s
Follow this section for each of your Pis. Once completed you will have MicroK8s installed and running everywhere.

SSH to your first Pi and install the MicroK8s snap:

sudo snap install microk8s --classic

MicroK8s is a snap, so it is automatically updated to newer releases of the package, which closely follow upstream Kubernetes releases; we don’t need to worry about the K8s version we’re installing. To follow a specific upstream series instead, select a channel during installation, for example the 1.15 series:

sudo snap install microk8s --classic --channel=1.15/stable
Channels are made up of a track (or series) and an expected level of stability, based on MicroK8s releases (Stable, Candidate, Beta, Edge). For more information about which releases are available, run:

snap info microk8s

Cheat Sheet for MicroK8s
Before going further here is a quick intro to the MicroK8s command line:

  • microk8s.start – start all enabled Kubernetes services
  • microk8s.inspect – status of services
  • microk8s.stop – stop all Kubernetes services
  • microk8s.enable dns – enable a Kubernetes add-on, here “kubedns”
  • microk8s.kubectl cluster-info – status of the cluster

MicroK8s is easy to use and comes with plenty of Kubernetes add-ons you can enable or disable.

Master node and leaf nodes
Now that you have MicroK8s installed on all boards, pick one to be the master node of your cluster.

On the chosen Master node, run the following command:

sudo microk8s.add-node
This command will generate a connection string in the form of <master_ip>:<port>/<token>.

Adding a node
Now, you need to run the join command from another Pi you want to add to the cluster:

microk8s.join 10.55.60.14:25000/JHpbBYMIevZSAMnmjMHmFwanrOYCWZLu
You should be able to see the new node in a few seconds on the master with the following command:

microk8s.kubectl get node

For each new node, you need to run the microk8s.add-node command on the master, copy the output, then run microk8s.join on the leaf.
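When several Pis need to join, it can help to pull the connection string apart before pasting it around. The Python sketch below splits a string shaped like the example join command above into address, port and token. The `parse_join_string` helper is hypothetical, and the three-part layout is inferred from that example rather than from any formal specification.

```python
def parse_join_string(conn):
    """Split 'ip:port/token' into (host, port, token)."""
    address, token = conn.split("/", 1)    # everything after the first '/' is the token
    host, port = address.rsplit(":", 1)    # the last ':' separates host and port
    return host, int(port), token


# The connection string from the example join command in this post.
host, port, token = parse_join_string("10.55.60.14:25000/JHpbBYMIevZSAMnmjMHmFwanrOYCWZLu")
print(f"master={host} port={port} token={token}")

# Rebuild the command to run on the leaf node.
print(f"microk8s.join {host}:{port}/{token}")
```

Remember that each new node needs a freshly generated connection string from the master.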

Removing nodes
To remove a node, run the following command on the master:

sudo microk8s remove-node <node name>
The names of the nodes are available on the master by running microk8s.kubectl get node

Alternatively, you can leave the cluster from a leaf node by running:

sudo microk8s.leave

Once the Pis are set up with MicroK8s, adding and removing nodes is easy and you can scale up or down as you go.

Voila..!!
If you have followed this series, you are now in control of your Kubernetes cluster: one built with native Kubernetes and Docker, and the other with MicroK8s, which is easier to install and manage.

This completes the 3-part series on K8s on Raspberry Pi. In a follow-up blog post we will see how to use kubectl and Helm charts to deploy an Nginx service, Prometheus and a few other services from the DevOps tool set to the cluster.

Bare-Metal K8s Cluster with Raspberry Pi – Part 2

This is a continuation from the post series Bare-metal K8s cluster with Raspberry Pi

Now that we have 1 master node and 3 worker nodes set up, we continue by installing Kubernetes.

Install Kubernetes

We are using version 1.15.3. There shouldn’t be any errors; however, during my installation the repos were down and I had to retry a few times.

curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | \
sudo apt-key add - && echo "deb http://apt.kubernetes.io/ kubernetes-xenial main" | \
sudo tee /etc/apt/sources.list.d/kubernetes.list && sudo apt-get update -q

sudo apt-get install -qy kubelet=1.15.3-00 kubectl=1.15.3-00 kubeadm=1.15.3-00

Repeat steps for all of the Raspberry Pis.

Kubernetes Master Node Configuration
Note: You only need to do this for the master node (in this deployment I recommend only 1 master node). Each Raspberry Pi is a node.

Initiate Master Node

sudo kubeadm init
Enable Connections to Port 8080
Without this Kubernetes services won’t work

mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config

Add Container Network Interface (CNI)
I’ve chosen to use Weave Net; however, you can get others working, such as Flannel (I’ve verified this works with this cluster).

Get Join Command
Run the following command on the master node. Its output will be used in the next section to join the worker nodes to the cluster.

kubeadm token create --print-join-command

It will return something like:

kubeadm join 192.168.0.101:6443 --token X.Y --discovery-token-ca-cert-hash sha256:XYZ

Kubernetes Worker Node Configuration
Note: You only need to do this for the worker nodes (in this deployment I recommend 3 worker nodes).

Join Cluster
Use the join command provided at the end of the previous section
sudo kubeadm join 192.168.0.101:6443 --token X.Y --discovery-token-ca-cert-hash sha256:XYZ

Verify Node Added Successfully (SSH on Master Node)
Each node should have the status Ready after ~30 seconds
kubectl get nodes
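One way to check node readiness is `kubectl get nodes`, which prints a table with NAME, STATUS, ROLES, AGE and VERSION columns. The Python sketch below parses output of that shape and lists the nodes that are not yet Ready. The sample output, the host names and the `not_ready_nodes` helper are made up for illustration; on a real cluster you would feed in the actual command output.

```python
# Hypothetical sample of `kubectl get nodes` output; names and ages are invented.
SAMPLE = """\
NAME      STATUS     ROLES    AGE   VERSION
master    Ready      master   10m   v1.15.3
node-1    Ready      <none>   2m    v1.15.3
node-2    NotReady   <none>   20s   v1.15.3
"""


def not_ready_nodes(output):
    """Return the names of nodes whose STATUS column is not exactly 'Ready'."""
    rows = [line.split() for line in output.strip().splitlines()[1:]]  # skip header
    return [cols[0] for cols in rows if len(cols) >= 2 and cols[1] != "Ready"]


pending = not_ready_nodes(SAMPLE)
print("all nodes ready" if not pending else f"waiting for: {', '.join(pending)}")
```

A node that has just joined typically shows NotReady for a short while, so re-running the check after a few seconds is normal.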

Another option for running a bare-metal K8s cluster on the Raspberry Pi that I tried and tested is MicroK8s, which will be covered in Part 3 of this series.

A sneak peek into the K8s cluster on Raspberry Pi

Bare-Metal K8s Cluster with Raspberry Pi – Part 1

There are multiple ways to use a Kubernetes cluster to deploy our applications. Most of us opt for a managed Kubernetes service from a public cloud provider; GKE, EKS and AKS are the most prominent ones. Deploying a Kubernetes cluster on a public cloud provider is relatively easy, but what if you want a private bare-metal K8s cluster? Having worked extensively in the data center and started my career as a Sys Admin, I personally prefer a piece of tangible hardware to get the feel of building it. This blog post walks you through the steps I took to get a bare-metal K8s cluster to play with.

K8s is an open source container orchestration platform that helps manage distributed, containerized applications at a massive scale. It grew out of Google’s experience with its internal Borg system, and version 1.0 was released in July 2015. It has continued to evolve and mature and is now offered as a managed service by all of the major cloud vendors.

Google has been running containerized workloads in production for more than a decade. Whether it’s service jobs like web front-ends and stateful servers, infrastructure systems like Bigtable and Spanner, or batch frameworks like MapReduce and Millwheel, virtually everything at Google runs as a container.

You can find the paper here.

Kubernetes traces its lineage directly from Borg. Many of the developers at Google working on Kubernetes were formerly developers on the Borg project. We’ve incorporated the best ideas from Borg in Kubernetes, and have tried to address some pain points that users identified with Borg over the years.

More than just enabling a containerized application to scale, Kubernetes has release-management features that enable updates with near-zero downtime, version rollback, and clusters that can ‘self-heal’ when there is a problem. Load balancing, auto-scaling and SSL can easily be implemented. Helm, a package manager for Kubernetes, has revolutionized the world of server management by making multi-node data stores like Redis and MongoDB incredibly easy to deploy. Kubernetes gives you the flexibility to move your workload to where it is best suited. This complements the hybrid cloud story, and in my career it has become more apparent that my customers see this as well, as it helps them resolve issues like cost, availability and compliance. In parallel, software vendors are starting to embrace containers as a standard deployment model, leading to a recent increase in requests for container solutions.

As you can see in the workflow comparison below, there is greater room for error when deploying on-premises. Public clouds provide the automation and reduce the risk of error, as fewer steps are required. But as mentioned above, a private cloud gives you more options when you have unique requirements.

Pros:

  • Using Kubernetes and its huge ecosystem can improve your productivity
  • Kubernetes and a cloud-native tech stack attract talent
  • Kubernetes is a future-proof solution
  • Kubernetes helps your application run more stably
  • Kubernetes can be cheaper than its alternatives

Cons:

  • Kubernetes can be an overkill for simple applications
  • Kubernetes is very complex and can reduce productivity
  • The transition to Kubernetes can be cumbersome
  • Kubernetes can be more expensive than its alternatives

Pre-requisites:

Compute:

3 x Raspberry Pi 4 Model B with 2 GB RAM
1 x Raspberry Pi 3 Model B+ with 1 GB RAM

Storage:

4 x 16GB High Speed SanDisk Micro-SD Cards

Network:

1 x Network Switch – for local LAN for k8s internal connectivity
1 x Network Router – for Wifi (my default ISP router was used here); only the master node had internet connectivity once the setup was completed
4 x Ethernet Cables
1 x Keyboard, HDMI, Mouse (for initial setup only)

Initial Raspberry Pi Configuration:

Flash Raspbian to the Micro-SD Cards

Download image from the below link,

Raspbian OS

I have used BalenaEtcher to flash image onto micro-SD card

Perform Initial Setup on Boot
We need to connect a keyboard, monitor and mouse for this setup. On the startup screen:

Choose Country, Language, Timezone
Define new password for user ‘pi’
Connect to WiFi or skip if using ethernet
Skip update software (We will perform this activity manually later).
Choose restart later

Configure Additional Settings
Click the Raspberry Pi icon (top left of screen) > Preferences > Raspberry Pi Configuration

System

Configure Hostname
Boot: To CLI

Interfaces
SSH: Enable

Choose restart later

Configure Static Network
Perform one of the following:
Define Static IP on Raspberry Pi: Right-click the arrow logo at the top right of the screen and select ‘Wireless & Wired Network Settings’

Define Static IP on DHCP Server: Configure your DHCP server to assign a static IP to the Raspberry Pi’s MAC address.

Reboot and Test SSH
Username: pi

Password: Defined in step 2 above

From the terminal ssh pi@[IP Address]

Repeat steps for all of the Raspberry Pis.

Kubernetes Cluster Preparation with SSH connection to the Pi from your terminal
I am using a 12-year-old Lenovo laptop running MX Linux. Open a terminal and establish an SSH connection to the Pi.

Perform Updates

apt-get update: Updates the package indexes
apt-get upgrade: Performs the upgrades

Configure net.ipv4 settings
Edit /etc/sysctl.conf (sudo vi /etc/sysctl.conf), uncomment net.ipv4.ip_forward=1 and add net.ipv4.ip_nonlocal_bind=1.
Note: This is required to allow for traffic forwarding, for example Node Ports from containers to/from non-cluster devices.
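If you prefer to script this edit instead of doing it by hand in vi, the change can be sketched in Python as a pure text transformation. The `enable_forwarding` helper and the sample file content are assumptions for illustration; the sketch only works on a string, so reading and writing /etc/sysctl.conf (with root privileges) is left to you.

```python
import re


def enable_forwarding(conf_text):
    """Uncomment net.ipv4.ip_forward=1 if present and ensure ip_nonlocal_bind=1 is set."""
    lines = []
    for line in conf_text.splitlines():
        # Turn "#net.ipv4.ip_forward=1" (spaces allowed) into an active setting.
        if re.match(r"^\s*#\s*net\.ipv4\.ip_forward\s*=\s*1\s*$", line):
            line = "net.ipv4.ip_forward=1"
        lines.append(line)
    # Append the nonlocal bind setting only if no active line for it exists yet.
    if not any(re.match(r"^\s*net\.ipv4\.ip_nonlocal_bind\s*=", l) for l in lines):
        lines.append("net.ipv4.ip_nonlocal_bind=1")
    return "\n".join(lines) + "\n"


# Hypothetical excerpt of a sysctl.conf file.
sample = (
    "# Uncomment the next line to enable packet forwarding for IPv4\n"
    "#net.ipv4.ip_forward=1\n"
)
print(enable_forwarding(sample))
```

The function is idempotent, so running it twice leaves the text unchanged. After editing the real file, `sudo sysctl -p` reloads the settings without a reboot.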

Install Docker

curl -sSL get.docker.com | sh

Grant privilege for user pi to execute docker commands

sudo usermod pi -aG docker

Disable Swap

sudo systemctl disable dphys-swapfile.service
sudo reboot

We can verify this with the top command: the value next to MiB Swap, near the top left corner, should be 0.0.
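The same check can be done programmatically by reading the SwapTotal line that Linux exposes in /proc/meminfo. The Python sketch below parses text in that format; the `swap_total_kb` helper and the sample values are assumptions for illustration.

```python
def swap_total_kb(meminfo_text):
    """Return the SwapTotal value (in kB) from /proc/meminfo-style text."""
    for line in meminfo_text.splitlines():
        if line.startswith("SwapTotal:"):
            return int(line.split()[1])
    raise ValueError("SwapTotal not found")


# Hypothetical excerpt of /proc/meminfo after swap has been disabled.
sample = (
    "MemTotal:        1949048 kB\n"
    "SwapTotal:             0 kB\n"
    "SwapFree:              0 kB\n"
)
print("swap disabled" if swap_total_kb(sample) == 0 else "swap still enabled")
```

On the Pi itself, pass the content of /proc/meminfo to the function, for example `swap_total_kb(open("/proc/meminfo").read())`.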

Now that we have completed the initial steps for our K8s bare-metal cluster, we will see how to build the cluster in Part 2 of this blog post.