External Data Sources

Use data from ConfigMaps, the Kubernetes API server, and image registries in Kyverno policies.

The Variables section discusses how variables can help create smarter, reusable policy definitions and introduces the concept of a rule context that stores all variables.

This section provides details on using ConfigMaps, API calls, and image registries to reference external data as variables in policies.

Variables from ConfigMaps

A ConfigMap resource in Kubernetes is commonly used as a source of configuration details that applications can consume. This data can be written in multiple formats, stored in a Namespace, and accessed easily. Kyverno supports using a ConfigMap as a data source for variables. When a policy that references a ConfigMap is evaluated, the ConfigMap data is checked at that time, so references to the ConfigMap are always dynamic. If the ConfigMap is updated, subsequent policy lookups pick up the latest data.

To consume data from a ConfigMap in a rule, a context is required: each rule that needs ConfigMap data must define its own context. The context data can then be referenced in the policy rule using JMESPath notation.

Looking up ConfigMap values

A ConfigMap that is defined in a rule’s context can be referred to using its unique name within the context. ConfigMap values can be referenced using a JMESPath style expression.

{{ <context-name>.data.<key-name> }}

Consider a simple ConfigMap definition like so.

apiVersion: v1
kind: ConfigMap
metadata:
  name: some-config-map
  namespace: some-namespace
data:
  env: production

To refer to values from a ConfigMap inside a rule, define a context inside the rule with one or more ConfigMap declarations. Using the sample ConfigMap snippet referenced above, the below rule defines a context which references this specific ConfigMap by name.

rules:
  - name: example-lookup
    # Define a context for the rule
    context:
    # A unique name for the context variable under which the below contents will later be accessible
    - name: dictionary
      configMap:
        # Name of the ConfigMap which will be looked up
        name: some-config-map
        # Namespace in which this ConfigMap is stored
        namespace: some-namespace

Based on the example above, we can now refer to a ConfigMap value using {{dictionary.data.env}}. The variable will be substituted with the value production during policy execution.

Put into context of a full ClusterPolicy, referencing a ConfigMap as a variable looks like the following.

apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: cm-variable-example
  annotations:
    pod-policies.kyverno.io/autogen-controllers: DaemonSet,Deployment,StatefulSet
spec:
  rules:
  - name: example-configmap-lookup
    context:
    - name: dictionary
      configMap:
        name: some-config-map
        namespace: some-namespace
    match:
      any:
      - resources:
          kinds:
          - Pod
    mutate:
      patchStrategicMerge:
        metadata:
          labels:
            my-environment-name: "{{dictionary.data.env}}"

In the above ClusterPolicy, a mutate rule matches all incoming Pod resources and adds a label named my-environment-name to them. Because we have defined a context which points to our earlier ConfigMap named some-config-map, we can reference the value with the expression {{dictionary.data.env}}. A new Pod will then receive the label my-environment-name=production.

Kyverno also has the ability to cache ConfigMaps commonly used by policies to reduce the number of API calls made. This both decreases the load on the API server and increases the speed of policy evaluation. Assign the label cache.kyverno.io/enabled: "true" to any ConfigMap and Kyverno will automatically cache it. Policy decisions will fetch the data from cache rather than querying the API server.
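For illustration, here is the earlier some-config-map with the caching label from the previous paragraph added. This is a minimal sketch; only the label key and value come from the text above, and the rest of the ConfigMap is unchanged.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: some-config-map
  namespace: some-namespace
  labels:
    # Opt this ConfigMap into Kyverno's cache
    cache.kyverno.io/enabled: "true"
data:
  env: production
```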

Handling ConfigMap Array Values

In addition to simple string values, Kyverno has the ability to consume array values from a ConfigMap stored as either JSON- or YAML-formatted values. Depending on how you choose to store an array, the policy which consumes the values in a variable context will need to be written accordingly.

For example, let’s say you wanted to define a list of allowed roles in a ConfigMap. A Kyverno policy can refer to this list to deny a request where the role, defined as an annotation, does not match one of the values in the list.

Consider a ConfigMap with the following content written as a JSON array. You may also store array values in a YAML block scalar (in which case the parse_yaml() filter will be necessary in a policy definition).

apiVersion: v1
kind: ConfigMap
metadata:
  name: roles-dictionary
  namespace: default
data:
  allowed-roles: '["cluster-admin", "cluster-operator", "tenant-admin"]'

Now that the array data is saved in the allowed-roles key, here is a sample ClusterPolicy containing a single rule that blocks a Deployment if the value of the annotation named role is not in the allowed list. Notice how the parse_json() JMESPath filter is used to interpret the value of the ConfigMap’s allowed-roles key into an array of strings.

apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: cm-array-example
spec:
  validationFailureAction: Enforce
  background: false
  rules:
  - name: validate-role-annotation
    context:
      - name: roles-dictionary
        configMap:
          name: roles-dictionary
          namespace: default
    match:
      any:
      - resources:
          kinds:
          - Deployment
    validate:
      message: "The role {{ request.object.metadata.annotations.role }} is not in the allowed list of roles: {{ \"roles-dictionary\".data.\"allowed-roles\" }}."
      deny:
        conditions:
          any:
          - key: "{{ request.object.metadata.annotations.role }}"
            operator: AnyNotIn
            value:  "{{ \"roles-dictionary\".data.\"allowed-roles\" | parse_json(@) }}"

This rule denies the request for a new Deployment if the annotation role is not found in the array we defined in the earlier ConfigMap named roles-dictionary.

After creating this sample ClusterPolicy, attempt to create a new Deployment with the annotation role: super-user and check the result.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: busybox
  annotations:
    role: super-user
  labels:
    app: busybox
spec:
  replicas: 1
  selector:
    matchLabels:
      app: busybox
  template:
    metadata:
      labels:
        app: busybox
    spec:
      containers:
      - image: busybox:1.28
        name: busybox
        command: ["sleep", "9999"]

Submit the manifest and see how Kyverno reacts.

kubectl create -f deploy.yaml

Error from server: error when creating "deploy.yaml": admission webhook "validate.kyverno.svc" denied the request:

resource Deployment/default/busybox was blocked due to the following policies

cm-array-example:
  validate-role-annotation: 'The role super-user is not in the allowed list of roles: ["cluster-admin", "cluster-operator", "tenant-admin"].'

Changing the role annotation to one of the values present in the ConfigMap, for example tenant-admin, allows the Deployment resource to be created.
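The roles-dictionary ConfigMap above stores the array as a JSON string. As noted earlier, the same array can instead be stored as a YAML block scalar. A sketch of that variant, assuming the same name, Namespace, and key:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: roles-dictionary
  namespace: default
data:
  # The array stored as a YAML block scalar instead of a JSON string
  allowed-roles: |
    - cluster-admin
    - cluster-operator
    - tenant-admin
```

With this format, the value field of the deny condition would pipe the key through parse_yaml(@) in place of parse_json(@), so that "{{ \"roles-dictionary\".data.\"allowed-roles\" | parse_yaml(@) }}" yields the array of strings.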

Variables from Kubernetes API Server Calls

Kubernetes is powered by a declarative API that allows querying and manipulating resources. Kyverno policies can use the Kubernetes API to fetch a resource, or even collections of resource types, for use in a policy. Additionally, Kyverno allows applying JMESPath (JSON Match Expression) to the resource data to extract and transform values into a format that is easy to use within a policy.

A Kyverno Kubernetes API call works just like a call made with kubectl or any other API client, so it can be tested using existing tools.

For example, here is a command line that uses kubectl to fetch the list of Pods in a Namespace and then pipes the output to kyverno jp which counts the number of Pods:

kubectl get --raw /api/v1/namespaces/kyverno/pods | kyverno jp "items | length(@)"

The corresponding API call in Kyverno is defined as below. It uses a variable {{request.namespace}} to use the Namespace of the object being operated on, and then applies the same JMESPath to store the count of Pods in the Namespace in the context as the variable podCount. Variables may be used in both fields. This new resulting variable podCount can then be used in the policy rule.

rules:
- name: example-api-call
  context:
  - name: podCount
    apiCall:
      urlPath: "/api/v1/namespaces/{{request.namespace}}/pods"
      jmesPath: "items | length(@)"
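As a hypothetical sketch of how podCount might then be consumed, the rule below rejects new Pods once the Namespace already holds ten. The limit of 10, the message, and the match block are illustrative assumptions; only the context entry comes from the example above.

```yaml
rules:
- name: example-api-call
  match:
    any:
    - resources:
        kinds:
        - Pod
  context:
  - name: podCount
    apiCall:
      urlPath: "/api/v1/namespaces/{{request.namespace}}/pods"
      jmesPath: "items | length(@)"
  preconditions:
    any:
    # Only count on creation; updates do not add Pods
    - key: "{{ request.operation }}"
      operator: Equals
      value: CREATE
  validate:
    message: "This Namespace already has {{ podCount }} Pods (illustrative limit: 10)."
    deny:
      conditions:
        any:
        - key: "{{ podCount }}"
          operator: GreaterThanOrEquals
          value: 10
```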

URL Paths

The Kubernetes API organizes resources under groups and versions. For example, the resource type Deployment is available in the API Group apps with a version v1.

The HTTP URL paths of the API calls are based on the group, version, and resource type as follows:

  • /apis/{GROUP}/{VERSION}/{RESOURCETYPE}: get a collection of resources
  • /apis/{GROUP}/{VERSION}/{RESOURCETYPE}/{NAME}: get a resource

For Namespaced resources, to get a specific resource by name or to get all resources in a Namespace, the Namespace name must also be provided as follows:

  • /apis/{GROUP}/{VERSION}/namespaces/{NAMESPACE}/{RESOURCETYPE}: get a collection of resources in the namespace
  • /apis/{GROUP}/{VERSION}/namespaces/{NAMESPACE}/{RESOURCETYPE}/{NAME}: get a resource in a namespace

For historical reasons, resources in the Kubernetes core API group are served under /api/v1 rather than under /apis. For example, to query all Namespace resources the path /api/v1/namespaces is used.
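For instance, a context entry that fetches a single core resource by name, here the Namespace of the incoming request, and keeps only its labels might look like the following sketch. The context name nsLabels is hypothetical.

```yaml
context:
- name: nsLabels
  apiCall:
    # Core group, single resource: /api/v1/namespaces/{NAME}
    urlPath: "/api/v1/namespaces/{{request.namespace}}"
    # Store only the labels map of the Namespace
    jmesPath: "metadata.labels"
```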

The Kubernetes API groups are defined in the API reference documentation for v1.22 and can also be retrieved via the kubectl api-resources command shown below:

$ kubectl api-resources
NAME                              SHORTNAMES   APIGROUP                       NAMESPACED   KIND
bindings                                                                      true         Binding
componentstatuses                 cs                                          false        ComponentStatus
configmaps                        cm                                          true         ConfigMap
endpoints                         ep                                          true         Endpoints
events                            ev                                          true         Event
limitranges                       limits                                      true         LimitRange
namespaces                        ns                                          false        Namespace
nodes                             no                                          false        Node
persistentvolumeclaims            pvc                                         true         PersistentVolumeClaim

...

The kubectl api-versions command prints out the available versions for each API group. Here is a sample:

$ kubectl api-versions
admissionregistration.k8s.io/v1
admissionregistration.k8s.io/v1beta1
apiextensions.k8s.io/v1
apiextensions.k8s.io/v1beta1
apiregistration.k8s.io/v1
apiregistration.k8s.io/v1beta1
apps/v1
authentication.k8s.io/v1
authentication.k8s.io/v1beta1
authorization.k8s.io/v1
authorization.k8s.io/v1beta1
autoscaling/v1
autoscaling/v2beta1
autoscaling/v2beta2
batch/v1
...

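You can use these commands together to find the URL path for a resource. For example, kubectl api-resources shows that Deployments belong to the apps group and kubectl api-versions lists apps/v1, so the path for the Deployments in a Namespace is /apis/apps/v1/namespaces/{NAMESPACE}/deployments.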
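Applying the namespaced collection pattern to Deployments (group apps, version v1), a context entry that counts the Deployments in the request's Namespace could look like this sketch. The context name deploymentCount is hypothetical.

```yaml
context:
- name: deploymentCount
  apiCall:
    # Pattern: /apis/{GROUP}/{VERSION}/namespaces/{NAMESPACE}/{RESOURCETYPE}
    urlPath: "/apis/apps/v1/namespaces/{{request.namespace}}/deployments"
    jmesPath: "items | length(@)"
```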

Kyverno can also fetch data from other API locations such as /version and aggregated APIs.

For example, fetching from /version might return something similar to what is shown below.

$ kubectl get --raw /version
{
  "major": "1",
  "minor": "23",
  "gitVersion": "v1.23.8+k3s1",
  "gitCommit": "53f2d4e7d80c09a7db1858e3f4e7ddfa13256c45",
  "gitTreeState": "clean",
  "buildDate": "2022-06-27T21:48:01Z",
  "goVersion": "go1.17.5",
  "compiler": "gc",
  "platform": "linux/amd64"
}
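As a sketch, a context entry could read just the minor version from /version; against the output above it would store "23". The context name k8sMinorVersion is hypothetical. Note that the field is a string, and some managed distributions append a suffix such as "+" to it, so compare it with care.

```yaml
context:
- name: k8sMinorVersion
  apiCall:
    # Non-resource path served by the API server
    urlPath: "/version"
    # Keep only the "minor" field of the response
    jmesPath: "minor"
```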

Fetching from an aggregated API, for example the metrics.k8s.io group, can be done with /apis/metrics.k8s.io/<api_version>/<resource_type> as shown below.

$ kubectl get --raw /apis/metrics.k8s.io/v1beta1/nodes | jq
{
  "kind": "NodeMetricsList",
  "apiVersion": "metrics.k8s.io/v1beta1",
  "metadata": {},
  "items": [
    {
      "metadata": {
        "name": "k3d-kyv180rc1-server-0",
        "creationTimestamp": "2022-09-11T13:37:39Z",
        "labels": {
          "beta.kubernetes.io/arch": "amd64",
          "beta.kubernetes.io/instance-type": "k3s",
          "beta.kubernetes.io/os": "linux",
          "egress.k3s.io/cluster": "true",
          "kubernetes.io/arch": "amd64",
          "kubernetes.io/hostname": "k3d-kyv180rc1-server-0",
          "kubernetes.io/os": "linux",
          "node-role.kubernetes.io/control-plane": "true",
          "node-role.kubernetes.io/master": "true",
          "node.kubernetes.io/instance-type": "k3s"
        }
      },
      "timestamp": "2022-09-11T13:37:24Z",
      "window": "10.059s",
      "usage": {
        "cpu": "298952967n",
        "memory": "1311340Ki"
      }
    }
  ]
}

Query parameters are also accepted in the urlPath field. This allows, for example, making API calls with a label selector or a limit on the number of returned items, which offloads some of the processing to the Kubernetes API server instead of requiring Kyverno to do it in JMESPath expressions. The following shows a context variable set from an API call that uses label selector and limit query parameters.

context:
- name: serviceCount
  apiCall:
    # Multiple query parameters are separated with "&"
    urlPath: "/api/v1/namespaces/{{ request.namespace }}/services?labelSelector=foo=bar&limit=5"
    jmesPath: "items[?spec.type == 'LoadBalancer'] | length(@)"

Handling collections

The API server response to an HTTP GET on a URL path that requests a collection of resources is an object containing a list of items (resources).

Here is an example that fetches all Namespace resources:

kubectl get --raw /api/v1/namespaces | jq

This will return a NamespaceList object with a property items that contains the list of Namespaces:

{
    "kind": "NamespaceList",
    "apiVersion": "v1",
    "metadata": {
      "selfLink": "/api/v1/namespaces",
      "resourceVersion": "2009258"
    },
    "items": [
      {
        "metadata": {
          "name": "default",
          "selfLink": "/api/v1/namespaces/default",
          "uid": "5011b5d5-abb7-4fef-93f9-8b5fa4b2eba9",
          "resourceVersion": "155",
          "creationTimestamp": "2021-01-19T20:20:37Z",
          "managedFields": [
            {
              "manager": "kube-apiserver",
              "operation": "Update",
              "apiVersion": "v1",
              "time": "2021-01-19T20:20:37Z",
              "fieldsType": "FieldsV1",
              "fieldsV1": {
                "f:status": {
                  "f:phase": {}
                }
              }
            }
          ]
        },
        "spec": {
          "finalizers": [
            "kubernetes"
          ]
        },
        "status": {
          "phase": "Active"
        }
      },
      ...

To process this data in JMESPath, reference the items. Here is an example which extracts a few metadata fields across all Namespace resources:

kubectl get --raw /api/v1/namespaces | kyverno jp "items[*].{name: metadata.name, creationTime: metadata.creationTimestamp}"

This produces a new JSON list of objects with properties name and creationTime.

[
  {
    "creationTime": "2021-01-19T20:20:37Z",
    "name": "default"
  },
  {
    "creationTime": "2021-01-19T20:20:36Z",
    "name": "kube-node-lease"
  },
  ...

To find an item in the list you can use JMESPath filters. For example, this command will match a Namespace by its name:

kubectl get --raw /api/v1/namespaces | kyverno jp "items[?metadata.name == 'default'].{uid: metadata.uid, creationTimestamp: metadata.creationTimestamp}"
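The same filter can be used inside a policy context. A sketch, assuming a hypothetical context name defaultNamespaceUid, that filters the Namespace list, takes the first match, and keeps only its UID:

```yaml
context:
- name: defaultNamespaceUid
  apiCall:
    urlPath: "/api/v1/namespaces"
    # Filter by name, take the first (only) match, then project its UID
    jmesPath: "items[?metadata.name == 'default'] | [0].metadata.uid"
```

When the name is known in advance, requesting /api/v1/namespaces/default directly avoids fetching the whole list; the filter form is more useful when matching on fields other than the name.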

In addition to wildcards and filters, JMESPath has many other powerful features, including several useful functions. Be sure to go through the JMESPath tutorial and try its interactive examples, and see the Kyverno JMESPath page as well.

Sample Policy: Limit Services of type LoadBalancer in a Namespace

Here is a complete sample policy that limits each namespace to a single service of type LoadBalancer.

apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: limits
spec:
  validationFailureAction: Enforce
  rules:
  - name: limit-lb-svc
    match:
      any:
      - resources:
          kinds:
          - Service
    context:
    - name: serviceCount
      apiCall:
        urlPath: "/api/v1/namespaces/{{ request.namespace }}/services"
        jmesPath: "items[?spec.type == 'LoadBalancer'] | length(@)"
    preconditions:
      all:
      - key: "{{ request.operation }}"
        operator: Equals
        value: CREATE
      # Only evaluate Services that are themselves of type LoadBalancer
      - key: "{{ request.object.spec.type || '' }}"
        operator: Equals
        value: LoadBalancer
    validate:
      message: "Only one LoadBalancer service is allowed per namespace"
      deny:
        conditions:
          any:
          # The Service being created is not yet in the list fetched above,
          # so deny if one LoadBalancer Service already exists
          - key: "{{ serviceCount }}"
            operator: GreaterThanOrEquals
            value: 1

This sample policy retrieves the list of Services in the Namespace and stores the number of Services of type LoadBalancer in a variable called serviceCount. A deny rule is then used to ensure that the count cannot exceed one.

Variables from Image Registries

A context can also be used to store metadata about an OCI image by using the imageRegistry context type. With this external data source, a Kyverno policy can make decisions based on details of a container image referenced in an incoming resource.

For example, given an imageRegistry context like the one shown below:

context:
- name: imageData
  imageRegistry:
    reference: "ghcr.io/kyverno/kyverno"

the output imageData variable will have a structure which looks like the following:

{
    "image":         "ghcr.io/kyverno/kyverno",
    "resolvedImage": "ghcr.io/kyverno/kyverno@sha256:17bfcdf276ce2cec0236e069f0ad6b3536c653c73dbeba59405334c0d3b51ecb",
    "registry":      "ghcr.io",
    "repository":    "kyverno/kyverno",
    "identifier":    "latest",
    "manifest":      manifest,
    "configData":    config
}

The manifest and configData keys contain the output of crane manifest <image> and crane config <image>, respectively.

For example, one could inspect the labels, entrypoint, volumes, history, layers, etc. of a given image. Using the crane tool, show the config of the ghcr.io/kyverno/kyverno:latest image:

$ crane config ghcr.io/kyverno/kyverno:latest | jq
{
  "architecture": "amd64",
  "author": "github.com/ko-build/ko",
  "created": "2023-01-08T00:10:08Z",
  "history": [
    {
      "author": "apko",
      "created": "2023-01-08T00:10:08Z",
      "created_by": "apko",
      "comment": "This is an apko single-layer image"
    },
    {
      "author": "ko",
      "created": "0001-01-01T00:00:00Z",
      "created_by": "ko build ko://github.com/kyverno/kyverno/cmd/kyverno",
      "comment": "kodata contents, at $KO_DATA_PATH"
    },
    {
      "author": "ko",
      "created": "0001-01-01T00:00:00Z",
      "created_by": "ko build ko://github.com/kyverno/kyverno/cmd/kyverno",
      "comment": "go build output, at /ko-app/kyverno"
    }
  ],
  "os": "linux",
  "rootfs": {
    "type": "layers",
    "diff_ids": [
      "sha256:c9770b71bc04d50fb006eaacea8180b5f7c0fc72d16618590ec5231f9cec2525",
      "sha256:ffe56a1c5f3878e9b5f803842adb9e2ce81584b6bd027e8599582aefe14a975b",
      "sha256:de3816af2ab66f6b306277c83a7cc9af74e5b0e235021a37f2fc916882751819"
    ]
  },
  "config": {
    "Entrypoint": [
      "/ko-app/kyverno"
    ],
    "Env": [
      "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/ko-app",
      "SSL_CERT_FILE=/etc/ssl/certs/ca-certificates.crt",
      "KO_DATA_PATH=/var/run/ko"
    ],
    "User": "65532"
  }
}

In the output above, config.User shows that this image is configured to run as user 65532 (the equivalent of a Dockerfile USER instruction). A Kyverno policy can use this information to, for example, validate that the USER of an image is non-root.

apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: imageref-demo
spec:
  validationFailureAction: Enforce
  rules:
  - name: no-root-images
    match:
      any:
      - resources:
          kinds:
          - Pod
    preconditions:
      all:
      - key: "{{request.operation}}"
        operator: NotEquals
        value: DELETE
    validate:
      message: "Images run as root are not allowed."
      foreach:
      - list: "request.object.spec.containers"
        context:
        - name: imageData
          imageRegistry:
            reference: "{{ element.image }}"
        deny:
          conditions:
            any:
              - key: "{{ imageData.configData.config.User || ''}}"
                operator: Equals
                value: ""

In the above sample policy, a new context named imageData uses the imageRegistry type. The reference key tells Kyverno where the image metadata is stored. In this case, the location is the image itself, hence {{ element.image }}, where element is each entry in the Pod's containers list. The value can then be referenced in an expression, for example in deny.conditions via the key {{ imageData.configData.config.User || ''}}, where the || '' default yields an empty string when the image sets no user.

When a “bad” Pod that violates this policy, such as the one below, is submitted, it is blocked.

apiVersion: v1
kind: Pod
metadata:
  name: badpod
spec:
  containers:
  - name: ubuntu
    image: ubuntu:latest

$ kubectl apply -f bad.yaml
Error from server: error when creating "bad.yaml": admission webhook "validate.kyverno.svc-fail" denied the request:

resource Pod/default/badpod was blocked due to the following policies

imageref-demo:
  no-root-images: 'validation failure: Images run as root are not allowed.'

By contrast, when using a “good” Pod, such as the Kyverno container image referenced above, the resource is allowed.

apiVersion: v1
kind: Pod
metadata:
  name: goodpod
spec:
  containers:
  - name: kyverno
    image: ghcr.io/kyverno/kyverno:latest

$ kubectl apply -f good.yaml
pod/goodpod created
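The deny condition in imageref-demo only catches images that set no USER at all; an image that explicitly sets USER to 0 or root would still be admitted. A sketch of a stricter validate block, assuming the rest of the policy stays the same, adds conditions for those two values. It is still not exhaustive (for example, it does not parse user:group forms such as 0:0).

```yaml
validate:
  message: "Images run as root are not allowed."
  foreach:
  - list: "request.object.spec.containers"
    context:
    - name: imageData
      imageRegistry:
        reference: "{{ element.image }}"
    deny:
      conditions:
        any:
        # No USER set: the container runs as root by default
        - key: "{{ imageData.configData.config.User || '' }}"
          operator: Equals
          value: ""
        # USER explicitly set to root by UID or by name
        - key: "{{ imageData.configData.config.User || '' }}"
          operator: Equals
          value: "0"
        - key: "{{ imageData.configData.config.User || '' }}"
          operator: Equals
          value: root
```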

The imageRegistry context type also has an optional property called jmesPath which can be used to apply a JMESPath expression to contents returned by imageRegistry prior to storing as the context value. For example, the below snippet stores the total size of an image in a context named imageSize by summing up all the constituent layers of the image as reported by its manifest (visible with, for example, crane by using the crane manifest command). The value of the context variable can then be evaluated in a later expression.

context:
  - name: imageSize
    imageRegistry:
      reference: "{{ element.image }}"
      # Note that we need to use `to_string` here to allow Kyverno to treat it like a resource quantity of type memory
      # the total size of an image as calculated by docker is the total sum of its layer sizes
      jmesPath: "to_string(sum(manifest.layers[*].size))"
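As a sketch of that later evaluation, the imageSize context can sit inside a foreach over the Pod's containers and be compared against a size limit. The 2Gi threshold and the message are illustrative assumptions; the comparison relies on Kyverno treating both sides as resource quantities, which is why the context value is converted with to_string.

```yaml
validate:
  message: "Images larger than 2Gi are not allowed."
  foreach:
  - list: "request.object.spec.containers"
    context:
    - name: imageSize
      imageRegistry:
        reference: "{{ element.image }}"
        # Sum of layer sizes in bytes, as a string
        jmesPath: "to_string(sum(manifest.layers[*].size))"
    deny:
      conditions:
        any:
        # Compared as quantities: byte count vs. 2Gi
        - key: "{{ imageSize }}"
          operator: GreaterThan
          value: 2Gi
```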

To access images stored on private registries, see the documentation on using private registries.

For more examples of using an imageRegistry context, see the samples page.

When the policy-level setting failurePolicy is set to Ignore, failing calls to image registries are also ignored. This keeps Pods from being blocked while the registry is offline, which is useful in situations where the images already exist on the nodes.
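A minimal sketch of where this setting goes, shown as a spec fragment; only failurePolicy: Ignore comes from the text above, and the other field mirrors the earlier imageref-demo policy.

```yaml
spec:
  validationFailureAction: Enforce
  # Policy-level setting: ignore failures, including failed image registry lookups
  failurePolicy: Ignore
```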

Last modified January 17, 2023 at 11:46 AM PST: 1.9 documentation updates (#733) (702f6d2)