A Beginner's Guide to Kubernetes: Exploring the Building Blocks


I am a software engineer who helps startups implement their ideas.

This article serves as an introduction to Kubernetes. Kubernetes, often abbreviated as K8s, is an open-source platform designed to automate the deployment, scaling, and operation of application containers. Originally developed by Google and now maintained by the Cloud Native Computing Foundation (CNCF), Kubernetes has become the de facto standard for container orchestration. It allows developers to manage containerized applications in various environments, providing a highly resilient, scalable system for modern application deployment.

Why Do We Need Kubernetes?

With the rise of microservices, managing applications at scale has become increasingly complex. Kubernetes addresses this complexity by providing:

  • Automation: Simplifies the deployment, scaling, and operations of application containers.

  • Scalability: Easily scale applications up or down based on demand.

  • Resilience: Automatically handles failures, ensuring high availability.

  • Portability: Runs on various environments including on-premises, cloud, and hybrid setups.

What are the fundamental components of Kubernetes?

Pods:

Pods are the smallest deployable units of computing that you can create and manage in Kubernetes.
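
As a minimal sketch of what such a unit looks like, the snippet below builds a Pod definition as a Python dictionary and prints it as JSON (kubectl accepts JSON manifests as well as YAML). The Pod name `web`, the label `app: web`, and the image `nginx:1.27` are illustrative assumptions, not values from a real cluster.

```python
import json

# Hypothetical Pod: the name, label, and image below are illustrative only.
pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "web", "labels": {"app": "web"}},
    "spec": {
        "containers": [
            {
                "name": "web",
                "image": "nginx:1.27",
                "ports": [{"containerPort": 80}],
            }
        ]
    },
}

# The output could be saved to a file and applied with `kubectl apply -f pod.json`.
print(json.dumps(pod, indent=2))
```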

Pods are fundamental to Kubernetes, providing a higher level of abstraction over containers and enabling them to be managed more efficiently in a clustered environment. Pods are ephemeral; they can be created, destroyed, and replaced dynamically as needed by the application. Each Pod gets its own IP address, but a replacement Pod gets a new one, so clients cannot rely on a single stable Pod IP. This raises the question: how should we communicate with them?

Service:

A Kubernetes Service is an abstraction that defines a logical set of Pods and a policy by which to access them. Services enable communication between different components of an application without requiring clients to track the dynamic changes in Pod IP addresses. This is crucial for maintaining a stable interface for applications to communicate within and outside the cluster.

Key Characteristics of a Kubernetes Service:

  1. Permanent IP Address:

    • A Service provides a stable IP address that remains constant regardless of changes in the underlying Pods.

    • This IP address is often referred to as the "ClusterIP."

  2. Decoupled Lifecycle:

    • The lifecycle of a Service is independent of the Pods it routes traffic to.

    • If a Pod dies and is replaced by a new one, the Service’s IP remains unchanged, ensuring consistent access.

Types of Kubernetes Services:

ClusterIP (Internal Service):

  • Definition: The default type of Service, providing an internal IP address accessible only within the cluster.

  • Use Case: Ideal for internal communication between different microservices, such as a backend service communicating with a database.

NodePort (External Service):

  • Definition: Exposes the Service on each Node’s IP at a static port (the NodePort). This allows external traffic to access the Service.

  • Use Case: Useful for exposing applications to the outside world for direct access.
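
The difference between the two Service types can be sketched with a small helper that builds either manifest. The function name `make_service`, the Service names, labels, and port numbers are all hypothetical; the only Kubernetes-specific fact relied on is that NodePort values come from the 30000-32767 range by default.

```python
import json


def make_service(name, selector, port, target_port, node_port=None):
    """Build a Service manifest; it becomes a NodePort Service if node_port is given."""
    port_spec = {"port": port, "targetPort": target_port}
    service_type = "ClusterIP"
    if node_port is not None:
        # The default NodePort range is 30000-32767.
        if not 30000 <= node_port <= 32767:
            raise ValueError("node_port outside the default 30000-32767 range")
        port_spec["nodePort"] = node_port
        service_type = "NodePort"
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name},
        # The selector picks the Pods (by label) that receive the traffic.
        "spec": {"type": service_type, "selector": selector, "ports": [port_spec]},
    }


# Internal Service for a database, reachable only inside the cluster.
internal = make_service("db", {"app": "db"}, port=5432, target_port=5432)
# External Service for a web app, reachable on port 30080 of every node.
external = make_service("web", {"app": "web"}, port=80, target_port=8080, node_port=30080)

print(json.dumps(external, indent=2))
```

Because the Service selects Pods by label rather than by IP, Pods can come and go without the Service definition changing.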

Ingress:

Kubernetes Ingress is a powerful API object that manages external access to services within a cluster, typically using HTTP and HTTPS. Ingress provides a way to define rules for routing traffic to the appropriate services based on the request's host and path. It helps in presenting a more user-friendly URL structure and handling SSL termination.

Key Characteristics of K8s Ingress

  1. User-Friendly URLs:

    • Example: Instead of accessing your application via an IP address and port like http://124.91.105.3:8080, you can use a more practical and user-friendly URL like https://application.com.

    • Functionality: Ingress maps these friendly URLs to the appropriate backend services.

  2. Request Routing:

    • Process: The request first comes to the Ingress, and the Ingress controller forwards it to the relevant service based on defined rules.

    • Example: Request -> Ingress -> Service

  3. SSL/TLS Termination:

    • Handles SSL termination, allowing you to use HTTPS without needing each service to manage its own certificates.
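
The characteristics above map onto an Ingress definition roughly as follows. This is a sketch under assumptions: the host `application.com` comes from the example URL earlier, while the Ingress name `web-ingress`, the backend Service `web`, and the TLS Secret `application-tls` are hypothetical. An Ingress controller must also be running in the cluster for the rules to take effect.

```python
import json

# Hypothetical Ingress routing https://application.com/ to a Service named "web".
ingress = {
    "apiVersion": "networking.k8s.io/v1",
    "kind": "Ingress",
    "metadata": {"name": "web-ingress"},
    "spec": {
        # TLS termination: the certificate is read from a Secret (assumed name).
        "tls": [{"hosts": ["application.com"], "secretName": "application-tls"}],
        "rules": [
            {
                "host": "application.com",
                "http": {
                    "paths": [
                        {
                            "path": "/",
                            "pathType": "Prefix",
                            # Request -> Ingress -> Service "web" on port 80
                            "backend": {
                                "service": {"name": "web", "port": {"number": 80}}
                            },
                        }
                    ]
                },
            }
        ],
    },
}

print(json.dumps(ingress, indent=2))
```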

ConfigMaps and Secrets:

What is a ConfigMap:

A ConfigMap is a Kubernetes object used to store non-sensitive configuration data in key-value pairs. It allows you to decouple configuration artifacts from container images, making applications more portable and easier to manage. This way, you can update the configuration without needing to rebuild and redeploy your container images.

Key Characteristics of ConfigMap:

  1. External Configuration:

    • Stores configuration data such as database connection strings, feature flags, or external service URLs.

    • The configuration can be updated without altering the container image.
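
A minimal sketch of this decoupling: a ConfigMap holding two settings, and a container spec that loads them as environment variables. The names `app-config`, `DATABASE_HOST`, and `FEATURE_NEW_UI`, as well as the image, are assumptions made for illustration.

```python
import json

# Hypothetical ConfigMap; note that all values in "data" must be strings.
config_map = {
    "apiVersion": "v1",
    "kind": "ConfigMap",
    "metadata": {"name": "app-config"},
    "data": {"DATABASE_HOST": "db", "FEATURE_NEW_UI": "true"},
}

# Container spec fragment that imports every key of the ConfigMap
# as an environment variable, leaving the image itself untouched.
container = {
    "name": "web",
    "image": "nginx:1.27",
    "envFrom": [{"configMapRef": {"name": config_map["metadata"]["name"]}}],
}

print(json.dumps(config_map, indent=2))
print(json.dumps(container, indent=2))
```

Changing `DATABASE_HOST` then only requires updating the ConfigMap, not rebuilding the image.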

What are Secrets:

A Secret is similar to a ConfigMap but is specifically designed to store sensitive information such as passwords, OAuth tokens, and SSH keys. Secrets ensure that sensitive data is handled more securely and is not exposed directly in the Pod definition or source code.

Key Characteristics of Secret:

  1. Sensitive Data Handling:

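    • Secret values are base64-encoded. Base64 is an encoding, not encryption: by default Secrets are stored unencrypted in etcd, so encryption at rest should be enabled for real protection.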

    • Access to Secrets can be tightly controlled using Kubernetes RBAC (Role-Based Access Control).

Example: Storing DB credentials
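
A sketch of what storing DB credentials could look like. The Secret name `db-credentials` and the credential values are made up for illustration. The snippet also decodes one value again to show that base64 is trivially reversible.

```python
import base64
import json


def encode(value: str) -> str:
    """Base64-encode a string the way Secret 'data' fields expect."""
    return base64.b64encode(value.encode("utf-8")).decode("ascii")


# Hypothetical Secret holding database credentials.
secret = {
    "apiVersion": "v1",
    "kind": "Secret",
    "metadata": {"name": "db-credentials"},
    "type": "Opaque",
    "data": {"username": encode("admin"), "password": encode("s3cr3t")},
}

print(json.dumps(secret, indent=2))

# Anyone who can read the Secret can decode it again.
decoded = base64.b64decode(secret["data"]["username"]).decode("utf-8")
print(decoded)
```

This is why limiting who can read Secrets through RBAC matters as much as the encoding itself.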

Kubernetes Volumes:

A Kubernetes Volume is a directory that is accessible to the containers in a Pod. Unlike a container's own filesystem, which is lost when the container is terminated, a Volume survives container restarts, and a Volume backed by persistent storage keeps its data available even if the Pod dies and is recreated.

Key Characteristics of Kubernetes Volumes:

  1. Persistence:

    • Data stored in a Volume is preserved across Pod restarts.

    • Ensures that critical application data, such as database files, are not lost when containers are restarted or moved.

  2. Local vs. Remote Storage:

    • Local Storage: The storage lives on a node inside the K8s cluster (e.g., a disk attached to one of the nodes), so it is tied to that node.

    • Remote Storage: The storage is provided by a remote service (e.g., AWS EBS, NFS). This can provide greater resilience and scalability as the storage is independent of the node’s lifecycle.

  3. Stateful Applications:

    • Volumes are essential for stateful applications, such as databases, which need to retain data even when Pods are restarted or rescheduled.

    • Kubernetes itself does not manage database activities like replication or backups. These need to be handled by the database software or external tools.

Example Scenario:

Consider a database Pod that requires persistent storage. Without a Volume, if the Pod dies, all data stored in the container’s filesystem would be lost. By attaching a Volume, the data is stored persistently, ensuring it is retained across Pod restarts.
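
The scenario can be sketched with two objects: a PersistentVolumeClaim that requests storage, and a database Pod that mounts it. All names (`db-data`, `db`), the 1Gi size, the `postgres:16` image, and the mount path are assumptions chosen for illustration.

```python
import json

# Hypothetical claim for 1Gi of storage that a single node can mount read-write.
pvc = {
    "apiVersion": "v1",
    "kind": "PersistentVolumeClaim",
    "metadata": {"name": "db-data"},
    "spec": {
        "accessModes": ["ReadWriteOnce"],
        "resources": {"requests": {"storage": "1Gi"}},
    },
}

# Hypothetical database Pod that mounts the claim at its data directory.
db_pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "db"},
    "spec": {
        "containers": [
            {
                "name": "db",
                "image": "postgres:16",
                "volumeMounts": [
                    {"name": "data", "mountPath": "/var/lib/postgresql/data"}
                ],
            }
        ],
        # The volume name links the mount above to the claim below.
        "volumes": [
            {"name": "data", "persistentVolumeClaim": {"claimName": "db-data"}}
        ],
    },
}

print(json.dumps(pvc, indent=2))
print(json.dumps(db_pod, indent=2))
```

If the Pod is deleted and recreated with the same claim, the data directory is reattached rather than starting empty.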

Deployments and StatefulSets:

What is a Deployment:

A Deployment in Kubernetes is a higher-level abstraction that manages the desired state of a set of Pods. It provides mechanisms to deploy, update, and scale applications without manual intervention, ensuring high availability and fault tolerance.

Key Characteristics of Deployments:

  1. Replica Management: Specify the desired number of Pod replicas. Kubernetes ensures the specified number is running at all times.

  2. Rolling Updates: Updates the application by gradually replacing old Pods with new ones, so the application can stay available to end users during the update.

  3. Rollback: If a new deployment causes issues, you can easily roll back to a previous version.

  4. Self-Healing: If a Pod dies, the Deployment automatically creates a new Pod to maintain the desired number of replicas.
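
These characteristics correspond to fields in a Deployment definition. The sketch below assumes a hypothetical app called `web` with three replicas; the rolling-update settings shown (`maxUnavailable: 0`, `maxSurge: 1`) are one reasonable choice, not the only one.

```python
import json

# Hypothetical label shared by the selector and the Pod template.
labels = {"app": "web"}

deployment = {
    "apiVersion": "apps/v1",
    "kind": "Deployment",
    "metadata": {"name": "web"},
    "spec": {
        # Replica management: keep three Pods running.
        "replicas": 3,
        # The selector must match the labels in the Pod template.
        "selector": {"matchLabels": labels},
        # Rolling update: add one new Pod at a time, never drop below 3 ready Pods.
        "strategy": {
            "type": "RollingUpdate",
            "rollingUpdate": {"maxUnavailable": 0, "maxSurge": 1},
        },
        "template": {
            "metadata": {"labels": labels},
            "spec": {"containers": [{"name": "web", "image": "nginx:1.27"}]},
        },
    },
}

print(json.dumps(deployment, indent=2))
```

Changing the image in the template and re-applying the manifest triggers a rolling update; `kubectl rollout undo deployment/web` would roll it back.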

What is a StatefulSet:

A StatefulSet is a Kubernetes object used to manage stateful applications. Unlike Deployments, StatefulSets provide guarantees about the ordering and uniqueness of Pods, making them ideal for applications that require stable, unique network identifiers or stable storage.

Key Characteristics of StatefulSets:

  1. Stable, Unique Pod Identities: Each Pod gets a unique, stable network identity (hostname). Pods are named in a predictable, consistent manner.

  2. Ordered, Graceful Deployment and Scaling: Pods are created, deleted, and scaled in a specific, defined order, ensuring that dependencies are respected.

  3. Persistent Storage: Each Pod in a StatefulSet can have its own persistent storage, defined via PersistentVolumeClaims. This ensures data is preserved across Pod restarts.

  4. Pod Management Policy: Pods can be managed in either OrderedReady (Pods are started sequentially) or Parallel (Pods are started simultaneously) fashion.
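
The predictable naming and ordering can be illustrated with a few lines of Python. StatefulSet Pods are named `<statefulset-name>-<ordinal>` starting at 0; with the OrderedReady policy they start in ascending order and are removed in descending order. The StatefulSet name `mysql`, the headless Service name `mysql`, the `default` namespace, and the `cluster.local` cluster domain below are assumptions for the example.

```python
def pod_names(statefulset: str, replicas: int) -> list[str]:
    """Pod names in creation order: <name>-0, <name>-1, ..."""
    return [f"{statefulset}-{ordinal}" for ordinal in range(replicas)]


def scale_down_order(statefulset: str, replicas: int) -> list[str]:
    """Pods are removed in reverse ordinal order."""
    return list(reversed(pod_names(statefulset, replicas)))


def stable_hostname(pod: str, service: str, namespace: str = "default") -> str:
    """DNS name of a Pod behind a headless Service (assumes the default cluster domain)."""
    return f"{pod}.{service}.{namespace}.svc.cluster.local"


print(pod_names("mysql", 3))
print(scale_down_order("mysql", 3))
print(stable_hostname("mysql-0", "mysql"))
```

Because `mysql-0` keeps its name and hostname across restarts, other replicas can always find it at the same address.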

In a nutshell, Deployments manage stateless application Pods, ensuring high availability by distributing them across nodes and using Services for load balancing. StatefulSets, by contrast, manage stateful Pods, ensuring each Pod has a stable network identity and persistent storage. They are often used for databases that require consistent and stable storage.

Now that we've explored most of the K8s components, let's dive deep into the K8s architecture.

K8s Architecture:

In Kubernetes, the architecture is divided into two main components: the Control Plane and the Worker Nodes. The Worker Nodes are the backbone of Kubernetes, responsible for running the actual applications in the form of Pods. Each Worker Node is a machine that performs the necessary operations to run containers.

Worker Nodes

Each worker node runs three processes:

  1. Container Runtime

  2. Kubelet

  3. Kube Proxy

1. Container Runtime

The Container Runtime is responsible for running the containers on each Worker Node. It is the software component that executes containerized applications and manages their lifecycle.

Key Functions:

  • Container Execution: Starts and stops containers based on the instructions from the Kubelet.

  • Image Management: Pulls container images from container registries and caches them locally.

  • Resource Isolation: Ensures that containers have the required resources (CPU, memory, etc.) and isolates them from each other using namespaces and control groups (cgroups).

Examples of Container Runtimes:

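  • Docker: Widely adopted for building and running containers. Since Kubernetes 1.24 removed the built-in dockershim, Docker Engine needs the cri-dockerd adapter to serve as a cluster's container runtime.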

  • containerd: A lightweight runtime that provides the core container functionality.

  • CRI-O: An Open Container Initiative (OCI) compatible runtime optimized for Kubernetes.

2. Kubelet

Kubelet is an agent that runs on each Worker Node and ensures that containers are running in a Pod as expected. It acts as the bridge between the Kubernetes Control Plane and the Worker Node.

Key Functions:

  • Pod Management: Receives Pod specifications from the Control Plane and ensures that the specified containers are running and healthy.

  • Node Status: Continuously reports the status of the node and its workloads back to the Control Plane.

  • Health Monitoring: Monitors the health of the containers and takes corrective actions, such as restarting containers if they fail.

Responsibilities:

  • Node Registration: Registers the node with the Kubernetes API server.

  • Pod Lifecycle Management: Creates, updates, and destroys Pods as per the instructions from the Control Plane.

  • Resource Monitoring: Tracks the resource usage of Pods and containers on the node.
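
Health monitoring is something you configure per container, and the Kubelet acts on it. As a hedged sketch, the Pod below declares a liveness probe: the Kubelet would call the given HTTP endpoint periodically and restart the container if the probe keeps failing. The Pod name, image, probe path `/healthz`, and timing values are assumptions for illustration.

```python
import json

# Hypothetical Pod with a liveness probe that the Kubelet evaluates.
probed_pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "web"},
    "spec": {
        "containers": [
            {
                "name": "web",
                "image": "nginx:1.27",
                "livenessProbe": {
                    # The Kubelet sends GET /healthz to port 80 of the container.
                    "httpGet": {"path": "/healthz", "port": 80},
                    "initialDelaySeconds": 5,  # wait before the first probe
                    "periodSeconds": 10,       # probe every 10 seconds
                    "failureThreshold": 3,     # restart after 3 consecutive failures
                },
            }
        ]
    },
}

print(json.dumps(probed_pod, indent=2))
```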

3. Kube Proxy

Kube Proxy is a network proxy that runs on each Worker Node and maintains network rules for communication within the Kubernetes cluster.

Key Functions:

  • Network Routing: Forwards requests to the appropriate Pods across nodes in the cluster.

  • Service Discovery: Enables Pods to find and communicate with each other using Kubernetes Services.

  • Load Balancing: Distributes traffic among the Pods in a Service to ensure even workload distribution.

Responsibilities:

  • iptables Management: Manages iptables rules (or IPVS rules, depending on the configured proxy mode) to ensure traffic is routed to the correct Pods.

  • Service VIP Management: Handles the virtual IP addresses assigned to Services, making it possible for clients to access them without knowing the specifics of Pod IP addresses.
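
Conceptually, the result is a lookup from a Service's virtual IP to one of its backing Pods. The toy model below is not how kube-proxy is implemented (it programs kernel rules rather than handling each request in user code); it only illustrates the mapping, with made-up IP addresses and a random backend choice.

```python
import random

# Toy service table: (Service virtual IP, port) -> Pod endpoints. All addresses are made up.
service_table = {
    ("10.96.0.10", 80): ["10.244.1.5:8080", "10.244.2.7:8080", "10.244.3.9:8080"],
}


def pick_backend(cluster_ip: str, port: int, rng=random) -> str:
    """Return one Pod endpoint for the given Service address, chosen at random."""
    endpoints = service_table.get((cluster_ip, port))
    if not endpoints:
        raise LookupError(f"no endpoints for {cluster_ip}:{port}")
    return rng.choice(endpoints)


# The client only ever addresses the Service IP; the Pod behind it may differ per connection.
for _ in range(3):
    print(pick_backend("10.96.0.10", 80))
```

When a Pod is replaced, only the endpoint list changes; the Service address that clients use stays the same.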

OK, so now we understand the worker nodes in the K8s architecture. But we still have some unanswered questions. Let's analyze them one by one:

  1. How would one interact with this cluster?

  2. On which node should a new pod be scheduled?

  3. If a replica pod dies, who monitors it and reschedules it?

All of these tasks are handled by the master nodes (control plane). Let's analyze the control plane in depth.

Master Nodes:

Worker nodes handle the grunt work of running containerized applications within Kubernetes. However, the brains of the operation reside in the master nodes, which collectively form the Kubernetes control plane. The control plane is responsible for managing the entire cluster, ensuring worker nodes are utilized efficiently and applications run smoothly.

There are four key processes that work together in the control plane:

  1. API Server: This is the front-end for the control plane. It acts as the single point of entry, accepting requests from tools like kubectl (the Kubernetes command-line tool) or programmatic interactions from applications. The API server validates these requests against Kubernetes API definitions and then interacts with other control plane components to fulfill them.

  2. Scheduler: As the name suggests, the scheduler is responsible for placing new or rescheduled pods onto worker nodes. It watches the API server for pods that have not yet been assigned to a node and considers factors like resource availability, node health, and the workloads already running on each node to determine a suitable placement for each pod.

  3. Controller Manager: This is the workhorse of the control plane, running multiple controllers in the background. Each controller is responsible for maintaining the desired state of a specific Kubernetes resource (e.g., pods, deployments, services). The controller manager constantly monitors the cluster state through the API server and takes corrective actions if any resource deviates from its desired state. For instance, if a pod dies unexpectedly, the ReplicaSet controller creates a replacement pod, which the scheduler then assigns to a node, to maintain the desired number of running pods.

  4. etcd: Unlike the other three components, etcd is a separate process that acts as the distributed key-value store for Kubernetes. It stores all the cluster state information, including pod definitions, node statuses, and configuration data. Only the API server talks to etcd directly; the scheduler and controller manager read and update this critical information through the API server.

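Another important consideration is what happens if the master node (control plane) crashes. To keep the cluster running as intended, production clusters generally run multiple control plane replicas, typically an odd number such as three so that etcd can still reach a majority if one replica fails.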
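
The replica count matters because etcd needs a majority (quorum) of its members to agree before it accepts writes. The arithmetic below assumes one etcd member per control plane node.

```python
def quorum(members: int) -> int:
    """Smallest majority of an etcd cluster with the given number of members."""
    return members // 2 + 1


def failures_tolerated(members: int) -> int:
    """How many members can fail while a majority is still reachable."""
    return members - quorum(members)


for n in (1, 2, 3, 5):
    print(f"{n} member(s): quorum={quorum(n)}, tolerates {failures_tolerated(n)} failure(s)")
```

Two members tolerate no failures, the same as one, which is why odd sizes such as three or five are the usual choice.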

Thanks a lot for reading. By understanding these fundamental components and the architecture, you'll gain a solid foundation for exploring Kubernetes and its capabilities in managing containerized applications.