Kubernetes Client Facilities

This chapter describes the types and functions provided by k8s core and client modules that are leveraged by the MCM - it only covers what is required to understand MCM code and is simply meant to be a helpful review. References are provided for further reading.

K8s apimachinery

K8s objects

A K8s object represents a persistent entity. When using the K8s client-go framework to define such an object, one should follow these rules:

  1. A Go type representing an object must embed the k8s.io/apimachinery/pkg/apis/meta/v1.ObjectMeta struct. ObjectMeta is metadata that all persisted resources must have, which includes all objects users must create.
  2. A Go type representing a singular object must embed k8s.io/apimachinery/pkg/apis/meta/v1.TypeMeta which describes an individual object in an API response or request with strings representing the Kind of the object and its API schema version, called APIVersion.
graph TB
    subgraph CustomType

    ObjectMeta
    TypeMeta
    end
  3. A Go type representing a list of a custom type must embed k8s.io/apimachinery/pkg/apis/meta/v1.ListMeta
graph TB
    subgraph CustomTypeList

    TypeMeta
    ListMeta
    end

TypeMeta

type TypeMeta struct {
	// Kind is a string value representing the REST resource this object represents.
	Kind string 

	// APIVersion defines the versioned schema of this representation of an object.
	APIVersion string
}

ObjectMeta

A snippet of the ObjectMeta struct is shown below for convenience, limited to the MCM-relevant fields used by controller code.

type ObjectMeta struct { //snippet 
    // Name must be unique within a namespace. It is required when creating resources.
    Name string 

    // Namespace defines the space within which each name must be unique. An empty namespace is equivalent to the "default" namespace.
    Namespace string

    // An opaque value that represents the internal version of this object that can be used by clients to determine when objects have changed.
    ResourceVersion string

	// A sequence number representing a specific generation of the desired state.
	Generation int64 

    // UID is the unique in time and space value for this object. It is typically generated by the API server on successful creation of a resource and is not allowed to change on PUT operations.
    UID types.UID 

    // CreationTimestamp is a timestamp representing the server time when this object was created.
    CreationTimestamp Time 

    // DeletionTimestamp is RFC 3339 date and time at which this resource will be deleted. This field is set by the server when a graceful deletion is requested by the user.  The resource is expected to be deleted (no longer reachable via APIs) after the time in this field, once the finalizers list is empty.
    DeletionTimestamp *Time


    // Must be empty before the object is deleted from the registry by the API server. Each entry is an identifier for the responsible controller that will remove the entry from the list.
    Finalizers []string 

    // Map of string keys and values that can be used to organize and categorize (scope and select) objects. Valid label keys have two segments: an optional prefix and name, separated by a slash (/).  Meant to be meaningful and relevant to USERS.
    Labels map[string]string

    // Annotations is an unstructured key value map stored with a resource that may be set by controllers/tools to store and retrieve arbitrary metadata. Meant for TOOLS.
    Annotations map[string]string 

    // References to owner objects. Ex: Pod belongs to its owning ReplicaSet. A Machine belongs to its owning MachineSet.
    OwnerReferences []OwnerReference

    // The name of the cluster which the object belongs to. This is used to distinguish resources with same name and namespace in different clusters.
    ClusterName string

    //... other fields omitted.
}
OwnerReferences

k8s.io/apimachinery/pkg/apis/meta/v1.OwnerReference is a struct that contains TypeMeta fields and a small subset of the ObjectMeta fields - enough to let you identify an owning object. An owning object must be in the same namespace as the dependent, or be cluster-scoped, so there is no namespace field.

type OwnerReference struct {
   APIVersion string
   Kind string 
   Name string 
   UID types.UID 
   //... other fields omitted. TODO: check for usages.
}
Finalizers and Deletion

Every k8s object has a Finalizers []string field that can be explicitly assigned by a controller. Every k8s object has a DeletionTimestamp *Time that is set by API Server when graceful deletion is requested.

These are part of the k8s.io/apimachinery/pkg/apis/meta/v1.ObjectMeta struct type which is embedded in all k8s objects.

When you tell Kubernetes to delete an object that has finalizers specified for it, the Kubernetes API marks the object for deletion by populating .metadata.deletionTimestamp (the DeletionTimestamp field), and returns a 202 status code (HTTP Accepted). The target object remains in a terminating state while the control plane takes the actions defined by the finalizers. After these actions are complete, the controller should remove the relevant finalizers from the target object. When the metadata.finalizers field is empty, Kubernetes considers the deletion complete and deletes the object.

Difference between Labels and Annotations

Labels are used in conjunction with selectors to identify groups of related resources and are meant to be meaningful to users. Because selectors are used to query labels, this operation needs to be efficient. To ensure efficient queries, labels are constrained by RFC 1123. RFC 1123, among other constraints, restricts labels to a maximum 63 character length. Thus, labels should be used when you want Kubernetes to group a set of related resources. See https://kubernetes.io/docs/concepts/overview/working-with-objects/labels/ for label key and value restrictions.
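As a sketch of these constraints, a minimal validity check for label values (approximating apimachinery's validation regex; not the actual library code) might look like:

```go
package main

import (
	"fmt"
	"regexp"
)

// labelValueRE approximates the k8s rule for label values: empty, or an
// alphanumeric string that may contain '-', '_' or '.' in the middle.
var labelValueRE = regexp.MustCompile(`^(([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9])?$`)

// isValidLabelValue also enforces the 63-character maximum length.
func isValidLabelValue(v string) bool {
	return len(v) <= 63 && labelValueRE.MatchString(v)
}

func main() {
	fmt.Println(isValidLabelValue("nvidia"))        // true
	fmt.Println(isValidLabelValue("-leading-dash")) // false: must start alphanumeric
	fmt.Println(isValidLabelValue(""))              // true: empty label values are allowed
}
```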

Annotations are used for “non-identifying information” i.e., metadata that Kubernetes does not care about. As such, annotation values are not constrained the way label values are and can include characters not permitted in labels.

RawExtension

k8s.io/apimachinery/pkg/runtime.RawExtension is used to hold extension objects whose structures can be arbitrary. An example of MCM type that leverages this is the MachineClass type whose ProviderSpec field is of type runtime.RawExtension and whose structure can vary according to the provider.

One can use a different custom structure type for each extension variant and then decode the field into an instance of that extension structure type using the standard Go json decoder.

Example:

var providerSpec *api.AWSProviderSpec
if err := json.Unmarshal(machineClass.ProviderSpec.Raw, &providerSpec); err != nil {
	// handle the decode error
}


API Errors

k8s.io/apimachinery/pkg/api/errors provides detailed error types and IsXXX methods for k8s api errors.

errors.IsNotFound

k8s.io/apimachinery/pkg/api/errors.IsNotFound returns true if the specified error was due to a k8s object not being found (an error, or wrapped error, created by errors.NewNotFound).

errors.IsTooManyRequests

k8s.io/apimachinery/pkg/api/errors.IsTooManyRequests determines if err (or any wrapped error) is an error which indicates that there are too many requests that the server cannot handle.

API Machinery Utilities

wait.Until

k8s.io/apimachinery/pkg/util/wait.Until loops until the stop channel is closed, running f every period.

func Until(f func(), period time.Duration, stopCh <-chan struct{})

wait.PollImmediate

k8s.io/apimachinery/pkg/util/wait.PollImmediate tries a condition func until it returns true, an error, or the timeout is reached.

func PollImmediate(interval, timeout time.Duration, condition ConditionFunc) error

wait.Backoff

k8s.io/apimachinery/pkg/util/wait.Backoff holds parameters applied to a Backoff function. There are many retry functions in client-go and MCM that take an instance of this struct as parameter.

type Backoff struct {
	Duration time.Duration
	Factor float64
	Jitter float64
	Steps int
	Cap time.Duration
}
  • Duration is the initial backoff duration.
  • Duration is multiplied by Factor for the next iteration.
  • Jitter is the random amount added to each duration (resulting in a duration between Duration and Duration*(1+Jitter)).
  • Steps is the remaining number of iterations in which the duration may increase.
  • Cap limits the duration; the backoff duration may not exceed this value.

Errors

k8s.io/apimachinery/pkg/util/errors.Aggregate represents an object that contains multiple errors.

Use k8s.io/apimachinery/pkg/util/errors.NewAggregate to construct the aggregate error from a slice of errors.

type Aggregate interface {
	error
	Errors() []error
	Is(error) bool
}
func NewAggregate(errlist []error) Aggregate {//...}

K8s API Core

The MCM leverages several types from https://pkg.go.dev/k8s.io/api/core/v1

Node

k8s.io/api/core/v1.Node represents a worker node in Kubernetes.

type Node struct {
    metav1.TypeMeta
    metav1.ObjectMeta 
    Spec NodeSpec
    // Most recently observed status of the node.
    Status NodeStatus
}

NodeSpec

k8s.io/api/core/v1.NodeSpec describes the attributes that a node is created with. Both Node and MachineSpec use this. A snippet of MCM-relevant NodeSpec struct fields is shown below for convenience.

type NodeSpec struct {
    // ID of the node assigned by the cloud provider in the format: <ProviderName>://<ProviderSpecificNodeID>
    ProviderID string 
    // podCIDRs represents the IP ranges assigned to the node for usage by Pods on that node.
    PodCIDRs []string 

    // Unschedulable controls node schedulability of new pods. By default, node is schedulable.
    Unschedulable bool

    // Taints represents the Node's Taints. (A taint is the opposite of affinity: it allows a Node to repel pods rather than attract them.)
    Taints []Taint
}
Node Taints

See Taints and Tolerations

k8s.io/api/core/v1.Taint is a Kubernetes Node property that enables specific nodes to repel pods. Tolerations are a Kubernetes Pod property that overcomes this and allows a pod to be scheduled on a node with a matching taint.

Instead of applying a label to a node, we apply a taint that tells the scheduler to repel Pods from this node if they do not tolerate the taint. Only those Pods that have a toleration for the taint can be scheduled onto the node with that taint.

kubectl taint nodes <node name> <taint key>=<taint value>:<taint effect>

Example:

kubectl taint nodes node1 gpu=nvidia:NoSchedule

Users can specify any arbitrary string for the taint key and value. The taint effect defines how a tainted node reacts to a pod without an appropriate toleration. It must be one of the following effects:

  • NoSchedule: The pod will not get scheduled to the node without a matching toleration.

  • NoExecute: This will immediately evict all the pods without a matching toleration from the node.

  • PreferNoSchedule: This is a softer version of NoSchedule where the scheduler will try not to schedule a pod onto the tainted node. However, it is not a strict requirement.

type Taint struct {
	// Key of taint to be applied to a node.
	Key string
	// Value of taint corresponding to the taint key.
	Value string 
	// Effect represents the effect of the taint on pods
	// that do not tolerate the taint.
	// Valid effects are NoSchedule, PreferNoSchedule and NoExecute.
	Effect TaintEffect 
	// TimeAdded represents the time at which the taint was added.
	// It is only written for NoExecute taints.
	// +optional
	TimeAdded *metav1.Time 
}

Example of a PodSpec with toleration below:

apiVersion: v1
kind: Pod
metadata:
  name: pod-1
  labels:
    security: s1
spec:
  containers:
  - name: bear
    image: supergiantkir/animals:bear
  tolerations:
  - key: "gpu"
    operator: "Equal"
    value: "nvidia"
    effect: "NoSchedule"

Example use case for a taint/toleration: if you have nodes with special hardware (e.g. GPUs) you want to repel Pods that do not need this hardware and attract Pods that do need it. This can be done by tainting the nodes that have the specialized hardware (e.g. kubectl taint nodes nodename gpu=nvidia:NoSchedule) and adding a corresponding toleration to Pods that must use this special hardware.

NodeStatus

See Node status

k8s.io/api/core/v1.NodeStatus represents the current status of a node and is illustrated below:

type NodeStatus struct {
	Capacity ResourceList 
	// Allocatable represents the resources of a node that are available for scheduling. Defaults to Capacity.
	Allocatable ResourceList
	// Conditions is an array of current observed node conditions.
	Conditions []NodeCondition 
	// List of addresses reachable to the node. Queried from the cloud provider, if available.
	Addresses []NodeAddress 
	// Set of ids/uuids to uniquely identify the node.
	NodeInfo NodeSystemInfo 
	// List of container images on this node
	Images []ContainerImage 
	// List of attachable volumes that are in use (mounted) by the node.
	// UniqueVolumeName is just a typedef for string.
	VolumesInUse []UniqueVolumeName 
	// List of volumes that are attached to the node.
	VolumesAttached []AttachedVolume 
}
Capacity

The fields in the capacity block indicate the total amount of resources that a Node has.

  • Allocatable indicates the amount of resources on a Node that is available to be consumed by normal Pods. Defaults to Capacity.

A Node's Capacity is of type k8s.io/api/core/v1.ResourceList which is effectively a set of (resource name, quantity) pairs.

type ResourceList map[ResourceName]resource.Quantity

ResourceNames include cpu, memory and storage:

const (
	// CPU, in cores. (500m = .5 cores)
	ResourceCPU ResourceName = "cpu"
	// Memory, in bytes. (500Gi = 500GiB = 500 * 1024 * 1024 * 1024)
	ResourceMemory ResourceName = "memory"
	// Volume size, in bytes (e,g. 5Gi = 5GiB = 5 * 1024 * 1024 * 1024)
	ResourceStorage ResourceName = "storage"
	// Local ephemeral storage, in bytes. (500Gi = 500GiB = 500 * 1024 * 1024 * 1024)
	// The resource name for ResourceEphemeralStorage is alpha and it can change across releases.
	ResourceEphemeralStorage ResourceName = "ephemeral-storage"
)

A k8s.io/apimachinery/pkg/api/resource.Quantity is a serializable/de-serializable number with an SI unit.

Conditions

Conditions are valid conditions of nodes.

https://pkg.go.dev/k8s.io/api/core/v1.NodeCondition contains condition information for a node.

type NodeCondition struct {
	// Type of node condition.
	Type NodeConditionType 
	// Status of the condition, one of True, False, Unknown.
	Status ConditionStatus 
	// Last time we got an update on a given condition.
	LastHeartbeatTime metav1.Time 
	// Last time the condition transitioned from one status to another.
	LastTransitionTime metav1.Time 
	// (brief) reason for the condition's last transition.
	Reason string 
	// Human readable message indicating details about last transition.
	Message string 
}
NodeConditionType

NodeConditionType is one of the following:

const (
	// NodeReady means kubelet is healthy and ready to accept pods.
	NodeReady NodeConditionType = "Ready"
	// NodeMemoryPressure means the kubelet is under pressure due to insufficient available memory.
	NodeMemoryPressure NodeConditionType = "MemoryPressure"
	// NodeDiskPressure means the kubelet is under pressure due to insufficient available disk.
	NodeDiskPressure NodeConditionType = "DiskPressure"
	// NodePIDPressure means the kubelet is under pressure due to insufficient available PID.
	NodePIDPressure NodeConditionType = "PIDPressure"
	// NodeNetworkUnavailable means that network for the node is not correctly configured.
	NodeNetworkUnavailable NodeConditionType = "NetworkUnavailable"
)

Note: The MCM extends the above with further custom node condition types of its own. (It is unclear whether this is sanctioned - it could break later if k8s enforces validation on condition types.)

ConditionStatus

These are valid condition statuses. ConditionTrue means a resource is in the condition. ConditionFalse means a resource is not in the condition. ConditionUnknown means kubernetes can't decide if a resource is in the condition or not.

type ConditionStatus string

const (
	ConditionTrue    ConditionStatus = "True"
	ConditionFalse   ConditionStatus = "False"
	ConditionUnknown ConditionStatus = "Unknown"
)
Addresses

See Node Addresses

k8s.io/api/core/v1.NodeAddress contains information for the node's address.

type NodeAddress struct {
	// Node address type, one of Hostname, ExternalIP or InternalIP.
	Type NodeAddressType 
	// The node address string.
	Address string 
}
NodeSystemInfo

Describes general information about the node, such as machine id, kernel version, Kubernetes version (kubelet and kube-proxy version), container runtime details, and which operating system the node uses. The kubelet gathers this information from the node and publishes it into the Kubernetes API.

type NodeSystemInfo struct {
	// MachineID reported by the node. For unique machine identification
	// in the cluster this field is preferred. 
	MachineID string 
	// Kernel Version reported by the node from 'uname -r' (e.g. 3.16.0-0.bpo.4-amd64).
	KernelVersion string 
	// OS Image reported by the node from /etc/os-release (e.g. Debian GNU/Linux 7 (wheezy)).
	OSImage string 
	// ContainerRuntime Version reported by the node through runtime remote API (e.g. docker://1.5.0).
	ContainerRuntimeVersion string 
	// Kubelet Version reported by the node.
	KubeletVersion string 
	// KubeProxy Version reported by the node.
	KubeProxyVersion string 
	// The Operating System reported by the node
	OperatingSystem string 
	// The Architecture reported by the node
	Architecture string 
}
  • The MachineID is a single newline-terminated, hexadecimal, 32-character, lowercase ID from /etc/machine-id.
Images

A slice of k8s.io/api/core/v1.ContainerImage which describes a container image.

type ContainerImage struct {
	Names []string 
	SizeBytes int64 
}
  • Names is the names by which this image is known. e.g. ["kubernetes.example/hyperkube:v1.0.7", "cloud-vendor.registry.example/cloud-vendor/hyperkube:v1.0.7"]
  • SizeBytes is the size of the image in bytes.
Attached Volumes

k8s.io/api/core/v1.AttachedVolume describes a volume attached to a node.

type UniqueVolumeName string
type AttachedVolume struct {
	Name UniqueVolumeName 
	DevicePath string 
}
  • Name is the Name of the attached volume. UniqueVolumeName is just a typedef for a Go string.
  • DevicePath represents the device path where the volume should be available

Pod

k8s.io/api/core/v1.Pod struct represents a k8s Pod, which is a collection of containers that can run on a host. This resource is created by clients and scheduled onto hosts.

type Pod struct {
	metav1.TypeMeta
	metav1.ObjectMeta

	// Specification of the desired behavior of the pod.
	Spec PodSpec 

	// Most recently observed status of the pod.
	Status PodStatus
}

See k8s.io/api/core/v1.PodSpec. Each PodSpec has a priority value where the higher the value, the higher the priority of the Pod.

The PodSpec.Volumes slice is the list of volumes that can be mounted by containers belonging to the pod; it is relevant to MC code that attaches/detaches volumes.

type PodSpec struct {
	 Volumes []Volume 
	 //TODO: describe other PodSpec fields used by MC
}

Pod Eviction

A k8s.io/api/policy/v1.Eviction can be used to evict a Pod from its Node - eviction is the graceful termination of Pods on nodes. See API Eviction

type Eviction struct {
	metav1.TypeMeta 

	// ObjectMeta describes the pod that is being evicted.
	metav1.ObjectMeta 

	// DeleteOptions may be provided
	DeleteOptions *metav1.DeleteOptions 
}

Construct the Eviction's ObjectMeta from the Pod name and namespace, then, using an instance of the typed Kubernetes client Interface, obtain the PolicyV1Interface, get the EvictionInterface for the namespace, and invoke its Evict method.

Example

client.PolicyV1().Evictions(eviction.Namespace).Evict(ctx, eviction)

Pod Disruption Budget

A k8s.io/api/policy/v1.PodDisruptionBudget is a struct type that represents a Pod Disruption Budget, which is the max disruption that can be caused to a collection of pods.

type PodDisruptionBudget struct {
	metav1.TypeMeta 
	metav1.ObjectMeta

	// Specification of the desired behavior of the PodDisruptionBudget.
	Spec PodDisruptionBudgetSpec 
	// Most recently observed status of the PodDisruptionBudget.
	Status PodDisruptionBudgetStatus 
}
PodDisruptionBudgetSpec
type PodDisruptionBudgetSpec struct {
	MinAvailable *intstr.IntOrString 
	Selector *metav1.LabelSelector 
	MaxUnavailable *intstr.IntOrString 
}
  • Selector specifies Label query over pods whose evictions are managed by the disruption budget. A null selector will match no pods, while an empty ({}) selector will select all pods within the namespace.
  • An eviction is allowed if at least MinAvailable pods selected by Selector will still be available after the eviction, i.e. even in the absence of the evicted pod. So for example you can prevent all voluntary evictions by specifying "100%".
  • An eviction is allowed if at most MaxUnavailable pods selected by Selector are unavailable after the eviction, i.e. even in absence of the evicted pod. For example, one can prevent all voluntary evictions by specifying 0.
  • MinAvailable and MaxUnavailable are mutually exclusive settings
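The MinAvailable rule above reduces to a simple arithmetic check, sketched below (an illustration of the budget logic only, not the disruption controller's actual code, which also handles percentages and MaxUnavailable):

```go
package main

import "fmt"

// evictionAllowed sketches the MinAvailable check a disruption budget
// enforces: after evicting one pod, at least minAvailable of the
// selected pods must still be available.
func evictionAllowed(currentHealthy, minAvailable int) bool {
	return currentHealthy-1 >= minAvailable
}

func main() {
	// 5 healthy pods, budget requires 4 available: one eviction is fine.
	fmt.Println(evictionAllowed(5, 4)) // true
	// 4 healthy pods, budget requires 4: eviction would violate the PDB.
	fmt.Println(evictionAllowed(4, 4)) // false
}
```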
PodDisruptionBudgetStatus

See godoc for k8s.io/api/policy/v1.PodDisruptionBudgetStatus

Pod Volumes

A k8s.io/api/core/v1.Volume represents a named volume in a pod - which is a directory that may be accessed by any container in the pod. See Pod Volumes

A Volume has a Name and embeds a VolumeSource as shown below. A VolumeSource represents the location and type of the mounted volume.

type Volume struct {
	Name string 
	VolumeSource
}

VolumeSource, which represents the source of a volume to mount, should only have ONE of its fields populated. The MC uses the PersistentVolumeClaim field, which is a pointer to a PersistentVolumeClaimVolumeSource representing a reference to a PersistentVolumeClaim in the same namespace.

type VolumeSource struct {
	PersistentVolumeClaim *PersistentVolumeClaimVolumeSource
}
type PersistentVolumeClaimVolumeSource struct  {
	ClaimName string
	ReadOnly bool
}

PersistentVolume

A k8s.io/api/core/v1.PersistentVolume (PV) represents a piece of storage in the cluster. See K8s Persistent Volumes

PersistentVolumeClaim

A k8s.io/api/core/v1.PersistentVolumeClaim represents a user's request for and claim to a persistent volume.

type PersistentVolumeClaim struct {
	metav1.TypeMeta 
	metav1.ObjectMeta
	Spec PersistentVolumeClaimSpec
	Status PersistentVolumeClaimStatus
}
type PersistentVolumeClaimSpec struct {
	StorageClassName *string 
	//...
	VolumeName string
	//...
}

Note that PersistentVolumeClaimSpec.VolumeName, which represents the binding reference to the PersistentVolume backing this claim, is of interest to the MC. Please note that this is different from Pod.Spec.Volumes[*].Name, which is more like a label for the volume directory.

Secret

A k8s.io/api/core/v1.Secret holds secret data of a given secret type, with a total size of less than 1MB. See K8s Secrets

  • Secret data is kept in Secret.Data which is a map[string][]byte, where the byte slice is the secret value and the key must consist of alphanumeric characters, '-', '_' or '.'.
type Secret struct {
	metav1.TypeMeta 
	metav1.ObjectMeta 
	Data map[string][]byte 
	Type SecretType 
	//... omitted for brevity
}

SecretType can be of many types: SecretTypeOpaque, which represents user-defined secrets; SecretTypeServiceAccountToken, which contains a token that identifies a service account to the API; etc.

client-go

k8s clients have the type k8s.io/client-go/kubernetes.Clientset which is actually a high-level client set facade encapsulating clients for the core, appsv1, discoveryv1, eventsv1, networkingv1, nodev1, policyv1, storagev1 api groups. These individual clients are available via accessor methods, i.e. use clientset.AppsV1() to get the AppsV1 client.

// Clientset contains the clients for groups. Each group has exactly one
// version included in a Clientset.
type Clientset struct {
    appsV1                       *appsv1.AppsV1Client
    coreV1                       *corev1.CoreV1Client
    discoveryV1                  *discoveryv1.DiscoveryV1Client
    eventsV1                     *eventsv1.EventsV1Client
    // ...

  // AppsV1 retrieves the AppsV1Client
  func (c *Clientset) AppsV1() appsv1.AppsV1Interface {
    return c.appsV1
  }
    // ...
}

As can be noticed from the above snippet, each of these clients associated with an api group exposes an interface named <GroupVersion>Interface that in turn provides access to a generic REST Interface as well as to a typed interface containing getter/setter methods for objects within that API group.

For example EventsV1Client which is used to interact with features provided by the events.k8s.io group implements EventsV1Interface

type EventsV1Interface interface {
	RESTClient() rest.Interface // generic REST API access
	EventsGetter // typed interface access
}
// EventsGetter has a method to return a EventInterface.
type EventsGetter interface {
	Events(namespace string) EventInterface
}

The Clientset struct implements the kubernetes.Interface facade which is a high-level facade containing all the methods to access the individual group-version facade clients.

type Interface interface {
	Discovery() discovery.DiscoveryInterface
	AppsV1() appsv1.AppsV1Interface
	CoreV1() corev1.CoreV1Interface
	EventsV1() eventsv1.EventsV1Interface
	NodeV1() nodev1.NodeV1Interface
 // ... other facade accessor
}

One can generate k8s clients for custom k8s objects that follow the same pattern as core k8s objects. The details are not covered here. Please refer to How to generate client codes for Kubernetes Custom Resource Definitions, gengo, k8s.io code-generator and the slightly-old article: Kubernetes Deep Dive: Code Generation for CustomResources

client-go Shared Informers

The vital role of a Kubernetes controller is to watch objects for the desired state and the actual state, then send instructions to make the actual state more like the desired state. The controller thus first needs to retrieve the object's information. Instead of making direct API calls using k8s listers/watchers, client-go controllers should use SharedInformers.

cache.SharedInformer is a primitive exposed by the client-go library that maintains a local cache of k8s objects of a particular API group and kind/resource (restrictable by namespace/label/field selectors), kept in sync with the authoritative state of the corresponding objects in the API server.

Informers are used to reduce the load on the API Server and etcd.

All that is needed to be known at this point is that Informers internally watch for k8s object changes, update an internal indexed store and invoke registered event handlers. Client code must construct event handlers to inject the logic that one would like to execute when an object is Added/Updated/Deleted.

type SharedInformer interface {
	// AddEventHandler adds an event handler to the shared informer using the shared informer's resync period.  Events to a single handler are delivered sequentially, but there is no coordination between different handlers.
	AddEventHandler(handler cache.ResourceEventHandler)

	// HasSynced returns true if the shared informer's store has been
	// informed by at least one full LIST of the authoritative state
	// of the informer's object collection.  This is unrelated to "resync".
	HasSynced() bool

	// Run starts and runs the shared informer, returning after it stops.
	// The informer will be stopped when stopCh is closed.
	Run(stopCh <-chan struct{})
	//..
}

Note: the resync period tells the informer to rebuild its cache every time the period expires.

cache.ResourceEventHandler handle notifications for events that happen to a resource.

type ResourceEventHandler interface {
	OnAdd(obj interface{})
	OnUpdate(oldObj, newObj interface{})
	OnDelete(obj interface{})
}

cache.ResourceEventHandlerFuncs is an adapter to let you easily specify as many or as few of the notification functions as you want while still implementing ResourceEventHandler. Nearly all controller code uses an instance of this adapter struct to create event handlers to register on shared informers.

type ResourceEventHandlerFuncs struct {
	AddFunc    func(obj interface{})
	UpdateFunc func(oldObj, newObj interface{})
	DeleteFunc func(obj interface{})
}
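The adapter pattern is simple enough to sketch in plain Go (a model of how the client-go adapter dispatches, with lowercase stand-in names):

```go
package main

import "fmt"

// resourceEventHandler mirrors cache.ResourceEventHandler.
type resourceEventHandler interface {
	OnAdd(obj interface{})
	OnUpdate(oldObj, newObj interface{})
	OnDelete(obj interface{})
}

// resourceEventHandlerFuncs mirrors the adapter: each method forwards to
// the corresponding func field only if it is set, so callers implement
// just the notifications they care about.
type resourceEventHandlerFuncs struct {
	AddFunc    func(obj interface{})
	UpdateFunc func(oldObj, newObj interface{})
	DeleteFunc func(obj interface{})
}

func (r resourceEventHandlerFuncs) OnAdd(obj interface{}) {
	if r.AddFunc != nil {
		r.AddFunc(obj)
	}
}

func (r resourceEventHandlerFuncs) OnUpdate(oldObj, newObj interface{}) {
	if r.UpdateFunc != nil {
		r.UpdateFunc(oldObj, newObj)
	}
}

func (r resourceEventHandlerFuncs) OnDelete(obj interface{}) {
	if r.DeleteFunc != nil {
		r.DeleteFunc(obj)
	}
}

func main() {
	var h resourceEventHandler = resourceEventHandlerFuncs{
		AddFunc: func(obj interface{}) { fmt.Println("added:", obj) },
		// UpdateFunc and DeleteFunc omitted: those events are ignored.
	}
	h.OnAdd("machine-1")    // added: machine-1
	h.OnDelete("machine-1") // no-op, DeleteFunc is nil
}
```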

Shared informers for standard k8s objects can be obtained using the k8s.io/client-go/informers.NewSharedInformerFactory or one of the variant factory methods in the same package.

Informers and their factory functions for custom k8s objects are usually found in the generated factory code usually in a factory.go file.

See github.com/machine-controller-manager/pkg/client/informers/externalversions/externalversions.NewSharedInformerFactory

client-go workqueues

The basic workqueue.Interface has the following methods:

type Interface interface {
	Add(item interface{})
	Len() int
	Get() (item interface{}, shutdown bool)
	Done(item interface{})
	ShutDown()
	ShutDownWithDrain()
	ShuttingDown() bool
}

This is extended with the ability to add an item at a later time using the workqueue.DelayingInterface.

type DelayingInterface interface {
	Interface
	// AddAfter adds an item to the workqueue after the indicated duration has passed
	// Used to requeue items after failures to avoid ending up in a hot loop
	AddAfter(item interface{}, duration time.Duration)
}

This is further extended with rate limiting using workqueue.RateLimiter

type RateLimiter interface {
 	// When gets an item and gets to decide how long that item should wait
	When(item interface{}) time.Duration
	// Forget indicates that an item is finished being retried.  Doesn't matter whether its for perm failing
	// or for success, we'll stop tracking it
	Forget(item interface{})
	// NumRequeues returns back how many failures the item has had
	NumRequeues(item interface{}) int
}

client-go controller steps

The basic high-level contract for a k8s-client controller leveraging work-queues goes like the below:

  1. Create rate-limited work queue(s) using workqueue.NewNamedRateLimitingQueue
  2. Define lifecycle callback functions (Add/Update/Delete) which accept k8s objects and enqueue k8s object keys (namespace/name) on these rate-limited work queue(s).
  3. Create informers using the shared informer factory functions.
  4. Add event handlers to the informers specifying these callback functions.
    1. When informers are started, they will invoke the appropriate registered callbacks when k8s objects are added/updated/deleted.
  5. The controller Run loop then picks up objects from the work queue using Get and reconciles them by invoking the appropriate reconcile function, ensuring that Done is called after reconcile to mark it as done processing.

Example:

%%{init: {'themeVariables': { 'fontSize': '10px'}, "flowchart": {"useMaxWidth": false }}}%%
flowchart TD

CreateWorkQueue["machineQueue:=workqueue.NewNamedRateLimitingQueue(workqueue.DefaultControllerRateLimiter(), 'machine')"]
-->
CreateInformer["machineInformerFactory := externalversions.NewSharedInformerFactory(...)
machineInformer := machineInformerFactory.Machine().V1alpha1().Machines()"]
-->
AddEventHandler["machineInformer.Informer().AddEventHandler(cache.ResourceEventHandlerFuncs{
		AddFunc:    controller.machineToMachineClassAdd,
		UpdateFunc: controller.machineToMachineClassUpdate,
		DeleteFunc: controller.machineToMachineClassDelete,
	})"]
-->Run["while (!shutdown) {
  key, shutdown := queue.Get()
  defer queue.Done(key)
  reconcile(key.(string))
}
"]
-->Z(("End"))

A more elaborate example of the basic client-go controller flow is demonstrated in the client-go workqueue example

client-go utilities

k8s.io/client-go/util/retry.RetryOnConflict

func RetryOnConflict(backoff wait.Backoff, fn func() error) error
  • retry.RetryOnConflict is used to make an update to a resource when you have to worry about conflicts caused by other code making unrelated updates to the resource at the same time.
  • fn should fetch the resource to be modified, make appropriate changes to it, try to update it, and return (unmodified) the error from the update function.
  • On a successful update, RetryOnConflict will return nil. If the update fn returns a Conflict error, RetryOnConflict will wait some amount of time as described by backoff, and then try again.
  • On a non-Conflict error, or if it retries too many times (backoff.Steps has reached zero) and gives up, RetryOnConflict will return an error to the caller.

References