Identify the most important steps to secure your Kubernetes clusters.
This action plan is your guide to best practices for deploying secure, scalable, and highly available Kubernetes infrastructure. Accelerate your team's development capabilities by building security guardrails and best practices into your Kubernetes clusters.
Your customized action plan is prioritized so the most important and most impactful changes come first. It is provider-specific, with detailed steps to improve your Kubernetes cluster whether you are running on EKS, AKS, or GKE.
Cloud Infrastructure Security
For each cluster, determine whether it shares a network and corresponding network access controls with unrelated resources. Consider iterating on the architecture to separate clusters from each other instead of sharing subnets and firewall rules.
Cloud Infrastructure Security
Review all accounts to ensure 2FA/2SV is enabled in your identity management solution, and after distributing software or hardware (preferred) tokens to all users, enable enforcement across all accounts.
Cloud Infrastructure Security
Review the IAM Role bindings in your cloud account to see if permissions are assigned to groups or directly to users, and review the permissions to see how many users are granted control of the underlying resources used by the cluster or full control over the cluster via the API server.
Cloud Infrastructure Security
Ensure that cloud provider audit logs are fully enabled, all managed cloud services have the native logging features enabled, all virtual machines have a logging agent that sends operating system and application logs centrally, and leverage either cloud-native or third-party SIEM tools to aggregate, filter, prioritize, and handle issues as they occur.
Kubernetes Cluster Security
Ensure that worker nodes and control plane instances are not assigned public IP addresses, if possible. Validate that the network access control or firewall rules do not permit any system (0.0.0.0/0) to attempt a connection to the API server or the SSH daemon on the nodes.
Kubernetes Cluster Security
Review the configuration of the production cluster control planes to ensure each spans three availability zones, and ensure that worker nodes span two or more availability zones. This reduces the chance that a single zone failure brings down the applications running inside the cluster.
Kubernetes Cluster Security
Review the configuration of the control plane logging settings and make sure every available audit log is enabled and shipped to a central location. Validate these logs are capturing both successful and unsuccessful actions to ensure the complete history is present in case of a security incident.
Kubernetes Cluster Security
Review cluster configurations to ensure RBAC is enabled, support for dynamic admission controller webhooks is enabled, a container network interface (CNI) plugin that supports NetworkPolicy objects is installed on worker nodes, and "pod identity" or "workload identity" features are configured to natively map cloud IAM service accounts to Kubernetes service accounts. In many cases, clusters must be rebuilt to take advantage of these features, so it's best to define them as part of the organization's standard build.
Kubernetes Cluster Security
Review the instance permissions assigned to the worker nodes and ensure they provide only the minimum permissions required to function, and consider implementing a solution to block metadata API access from workloads.
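One common way to block metadata API access is an egress NetworkPolicy that allows all outbound traffic except the metadata endpoint. The sketch below is a hypothetical example (the namespace name is an assumption, and it requires a CNI plugin that enforces egress policies):

```yaml
# Hypothetical NetworkPolicy: permit all egress from pods in this namespace
# except traffic to the cloud instance metadata endpoint.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: deny-metadata-access
  namespace: my-app            # assumed namespace
spec:
  podSelector: {}              # applies to every pod in the namespace
  policyTypes:
    - Egress
  egress:
    - to:
        - ipBlock:
            cidr: 0.0.0.0/0
            except:
              - 169.254.169.254/32   # cloud instance metadata API
```

Note that this blocks only direct IP access; pairing it with provider features such as GKE Workload Identity or EKS IMDSv2 hop-limit settings gives defense in depth.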
Workload Security
Ensure that all worker nodes are configured to ship both host and container logs and metrics to a central aggregation solution such that it’s possible to see artifacts created by all malicious activity within the cluster, the nodes, and the cloud APIs leveraged.
Workload Security
For each cluster, review the namespaces to ensure that workloads are running in dedicated namespaces and are grouped and named in a way that makes their intent and purpose clear. Pods exposed externally via ingress or load balancers should live in separate namespaces from database or caching applications, for example. Workloads owned and managed by different teams should be separated to avoid overgranting privileges and accidental disruption.
Workload Security
Review IAM access provided by the cloud provider to ensure permissions properly separate cluster administrators from developers in terms of cluster-wide capabilities and access to sensitive namespaces like kube-system. Ensure service accounts used by workloads are bound to custom Roles and ClusterRoles that restrict permissions to only those resources needed instead of the common built-in Roles and ClusterRoles. Focus specifically on restricting the ability to get/list secrets at the cluster scope and the ability to modify resources that control in-cluster security policies.
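A least-privilege binding might look like the following sketch: a namespaced Role that grants read access to ConfigMaps and pod logs while deliberately omitting any verbs on Secrets. The namespace, Role, and ServiceAccount names here are illustrative assumptions:

```yaml
# Hypothetical namespaced Role: read-only access to ConfigMaps and pod logs;
# note there is no rule for "secrets", so get/list on Secrets is denied.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: app-reader            # assumed name
  namespace: team-a           # assumed namespace
rules:
  - apiGroups: [""]
    resources: ["configmaps", "pods", "pods/log"]
    verbs: ["get", "list", "watch"]
---
# Bind the Role to the workload's ServiceAccount rather than to users
# or the built-in view/edit ClusterRoles.
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: app-reader-binding
  namespace: team-a
subjects:
  - kind: ServiceAccount
    name: app-sa              # assumed workload service account
    namespace: team-a
roleRef:
  kind: Role
  name: app-reader
  apiGroup: rbac.authorization.k8s.io
```

Because the binding is namespaced, a compromised workload token cannot enumerate secrets cluster-wide, which addresses the highest-risk permission called out above.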
Workload Security
Ensure that either PodSecurityPolicy or a dynamic admission controller such as Gatekeeper or k-rail is installed and configured to prevent workloads that do not need elevated access to the host from using those specifications. Look specifically for policies that allow pods to run as root, run as privileged, or mount the host filesystem inside the pod.
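Whichever enforcement mechanism is used, compliant workloads end up declaring a restrictive security context. The following hypothetical pod fragment (names and image are assumptions) shows the settings such a policy would typically require:

```yaml
# Hypothetical pod spec illustrating a restricted security context:
# non-root, no privilege escalation, no privileged mode, no host mounts.
apiVersion: v1
kind: Pod
metadata:
  name: restricted-app                        # assumed name
spec:
  securityContext:
    runAsNonRoot: true
    runAsUser: 10001                          # arbitrary non-root UID
  containers:
    - name: app
      image: registry.example.com/app:1.0.0   # assumed image path
      securityContext:
        allowPrivilegeEscalation: false
        privileged: false
        readOnlyRootFilesystem: true
        capabilities:
          drop: ["ALL"]
```

An admission policy that rejects pods missing these fields catches the root/privileged/hostPath cases listed above before they are scheduled.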
Workload Security
Review each namespace, its workloads, and the current network policies defined. Do they exist? If so, do they adequately support just the necessary traffic flows? Applying network policies should be prioritized to workloads that are exposed externally, those that handle external inputs, and to workloads that store or process sensitive information.
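A common pattern is a namespace-wide default-deny policy paired with narrow allow rules for the required flows. This sketch assumes a `team-a` namespace and `app: frontend`/`app: database` labels, all of which are illustrative:

```yaml
# Hypothetical default-deny: block all ingress to pods in the namespace.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: team-a            # assumed namespace
spec:
  podSelector: {}              # selects every pod in the namespace
  policyTypes:
    - Ingress
---
# Hypothetical allow rule: only frontend pods may reach the database, on 5432.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend-to-db
  namespace: team-a
spec:
  podSelector:
    matchLabels:
      app: database            # assumed label on database pods
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: frontend    # assumed label on frontend pods
      ports:
        - protocol: TCP
          port: 5432
```

Starting from default-deny and adding allow rules flow by flow makes it easy to verify that only the necessary traffic paths exist.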
Workload Security
For an open source behavior monitoring and alerting solution, consider deploying Falco. For combined monitoring and prevention, consider evaluating commercial solutions from Aqua Security, Palo Alto Networks (formerly Twistlock), Sysdig Secure, and others.
Container Image Security
Consider standardizing on a minimal operating system image and maintaining a small number of base image variants that support the main categories of needs in the organization (e.g., Python, Go, Node.js), and enforce their usage as the base image for all application containers. Implement automation to handle vulnerability scanning of images both in the container registry and in use, and use build triggers to automatically rebuild application images when their base images are updated.
Container Image Security
Review the container images referenced by all Pod specifications inside the cluster, and look for container images referencing external sources and Docker Hub locations that are not operated by trusted organizations. Consider rebuilding those images from source, hosting them in a private container registry owned by your organization, and updating the deployments to reference that private registry path.
Container Image Security
While there is no easy, direct way to know whether an existing application built into a container image exposes all of its configuration settings and secrets, reviewing the Pod specifications of all deployed workloads for consistent use of ConfigMaps and Secrets mounted as volumes or environment variables is a quick litmus test. Another potential indicator is container images named or tagged with an environment designation such as "dev", "staging", or "prod", as that may hint that certain variables or settings are "baked in" to the container.
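A pod that passes this litmus test looks something like the following sketch, where configuration comes from a ConfigMap and credentials from a Secret rather than being baked into the image. All resource names and the image path are assumptions:

```yaml
# Hypothetical pod spec showing externalized configuration: environment
# settings come from a ConfigMap, credentials from a Secret volume.
apiVersion: v1
kind: Pod
metadata:
  name: configured-app                        # assumed name
spec:
  containers:
    - name: app
      image: registry.example.com/app:1.0.0   # assumed image path
      envFrom:
        - configMapRef:
            name: app-config                  # assumed ConfigMap
      volumeMounts:
        - name: db-credentials
          mountPath: /etc/secrets
          readOnly: true
  volumes:
    - name: db-credentials
      secret:
        secretName: app-db-secret             # assumed Secret
```

The same image can then be promoted unchanged from "dev" to "prod", with only the referenced ConfigMap and Secret differing per environment.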
Container Image Security
Review application code repositories and image build pipelines, and look for opportunities to bake in automated security checks that serve as guardrails and catch simple issues early.