Software Lifecycle Management
Beta
The gNOI operations, write-class actions, and software lifecycle features
described on this page are Beta. They are functional and tested on
supported platforms but carry v1alpha1 API versions and schemas that
may change between releases. Write-class gNOI actions and software
upgrades are disabled by default and require explicit runtime gates before
use. Evaluate in non-production environments first.
This section covers the gNOI control plane for device operations and IOS-XE software lifecycle management. It is separate from the pod app-hosting lifecycle: pods still use RESTCONF app-hosting RPCs, while gNOI handles device-level probes, file access, reboot, factory reset, and OS upgrade.
Responsibility Split
CVK separates gNOI workflows by trust level so operators can grant narrow RBAC.
| Surface | CRD | Purpose | Runtime gate |
|---|---|---|---|
| Read-only operations | DeviceOperation | Show commands, config diff, packet capture, and read-only gNOI probes | None beyond normal CRD/RBAC access |
| Write-class actions | IOSXEOperationalAction | Reboot, cancel reboot, kill process, file put/remove, factory reset | --enable-write-class-gnoi or CISCO_VK_ENABLE_WRITE_CLASS_GNOI |
| Software lifecycle | IOSXESoftwareUpgrade | Install, activate, verify, and roll back IOS-XE software | --enable-iosxesoftwareupgrade or CISCO_VK_ENABLE_IOSXE_SOFTWARE_UPGRADE |
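The write-class and software-upgrade gates are intentionally separate from each other. Read-only gNOI access needs no gate, and granting it does not enable reboot, file writes, factory reset, or OS activation; enabling the write-class gate likewise does not enable OS activation.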
Helm exposes the same controls under the gnoi values block:
| Helm value | Environment rendered into VK pods | Effect |
|---|---|---|
| gnoi.insecure | CISCO_VK_GNOI_INSECURE=1 | Use the insecure IOS-XE gNxI listener. |
| gnoi.port | CISCO_VK_GNOI_PORT=<port> | Pin the gNOI listener port. When empty, CVK infers 50052 for insecure and 9339 for secure gNOI. |
| gnoi.disabled | CISCO_VK_GNOI_DISABLED=1 | Prevent the per-device gNOI client from being constructed. |
| gnoi.enableSoftwareUpgrade | CISCO_VK_ENABLE_IOSXE_SOFTWARE_UPGRADE=1 | Enable IOSXESoftwareUpgrade reconciliation. |
| gnoi.enableWriteClass | CISCO_VK_ENABLE_WRITE_CLASS_GNOI=1 | Enable destructive/write-class IOSXEOperationalAction reconciliation. |
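To make the mapping concrete, here is a minimal Python sketch of how a gnoi values block could turn into the environment variables in the table, together with the documented port inference (50052 insecure, 9339 secure when gnoi.port is empty). The function names and the dict shape of the values are hypothetical; the actual chart template and CVK's inference code may differ.

```python
from typing import Dict, Mapping

# Ports the page documents CVK inferring when gnoi.port is empty.
INSECURE_GNOI_PORT = 50052
SECURE_GNOI_PORT = 9339

# Helm boolean value -> environment variable, rendered as "=1" only when true.
BOOLEAN_VALUES = {
    "insecure": "CISCO_VK_GNOI_INSECURE",
    "disabled": "CISCO_VK_GNOI_DISABLED",
    "enableSoftwareUpgrade": "CISCO_VK_ENABLE_IOSXE_SOFTWARE_UPGRADE",
    "enableWriteClass": "CISCO_VK_ENABLE_WRITE_CLASS_GNOI",
}


def render_gnoi_env(gnoi: Mapping[str, object]) -> Dict[str, str]:
    """Hypothetical rendering of the Helm `gnoi` values block into pod env vars."""
    env: Dict[str, str] = {}
    for value_name, env_var in BOOLEAN_VALUES.items():
        if gnoi.get(value_name) is True:
            env[env_var] = "1"
    port = gnoi.get("port")
    if port not in (None, ""):
        # A pinned port is passed through; an empty one is left for CVK to infer.
        env["CISCO_VK_GNOI_PORT"] = str(port)
    return env


def effective_gnoi_port(env: Mapping[str, str]) -> int:
    """Port a per-device client would dial under the documented inference rule."""
    pinned = env.get("CISCO_VK_GNOI_PORT")
    if pinned:
        return int(pinned)
    if env.get("CISCO_VK_GNOI_INSECURE") == "1":
        return INSECURE_GNOI_PORT
    return SECURE_GNOI_PORT
```

With only gnoi.insecure and gnoi.enableSoftwareUpgrade set, the write-class variable is never rendered, so IOSXEOperationalAction reconciliation stays disabled even though upgrades are allowed.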
Connection Model
The IOS-XE driver uses a workload-classed gRPC connection pool for gNOI and gNMI work:
| Class | Used by | Why it is separate |
|---|---|---|
| ClassControl | Unary gNOI RPCs such as time, ping, reboot status, cert get, and OS verify | Keeps small control RPCs responsive. |
| ClassTelemetry | gNMI Subscribe streams | Keeps telemetry streams independent from operations. |
| ClassBulkTransfer | OS install and file put/get | Prevents large file transfers from back-pressuring control or telemetry traffic. |
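The table implies a routing decision for every RPC. A minimal Python sketch of that decision, assuming RPCs are named as "Service.Method" strings: each workload class keeps its own call queue, standing in for a separate connection, so an OS.Install transfer never sits in front of a System.Time call. The class names come from the table; the WorkloadRouter type and the exact RPC-to-class assignments are hypothetical, not CVK's driver code.

```python
from enum import Enum
from typing import Dict, List


class WorkloadClass(Enum):
    CONTROL = "ClassControl"
    TELEMETRY = "ClassTelemetry"
    BULK_TRANSFER = "ClassBulkTransfer"


# Illustrative assignment following the table above.
RPC_CLASS: Dict[str, WorkloadClass] = {
    "System.Time": WorkloadClass.CONTROL,
    "System.Ping": WorkloadClass.CONTROL,
    "System.RebootStatus": WorkloadClass.CONTROL,
    "Cert.GetCertificates": WorkloadClass.CONTROL,
    "OS.Verify": WorkloadClass.CONTROL,
    "gNMI.Subscribe": WorkloadClass.TELEMETRY,
    "OS.Install": WorkloadClass.BULK_TRANSFER,
    "File.Put": WorkloadClass.BULK_TRANSFER,
    "File.Get": WorkloadClass.BULK_TRANSFER,
}


class WorkloadRouter:
    """Hypothetical router: one queue per class stands in for one connection."""

    def __init__(self) -> None:
        self.queues: Dict[WorkloadClass, List[str]] = {c: [] for c in WorkloadClass}

    def dispatch(self, rpc: str) -> WorkloadClass:
        # Unlisted RPCs fall back to the control class in this sketch.
        cls = RPC_CLASS.get(rpc, WorkloadClass.CONTROL)
        self.queues[cls].append(rpc)
        return cls
```

Keeping the queues disjoint is the point of the design: a slow bulk stream can only delay other bulk work.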
The gNOI client validates IOS-XE filesystem prefixes for file paths and caches
per-service capability probes. gNMI capabilities do not enumerate gNOI
services, so CVK learns support by observing gNOI responses. A
codes.Unimplemented response marks that service unsupported in the in-process
cache and later calls fail fast with ErrServiceUnsupported until the cache
expires or the process restarts.
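The fail-fast behavior can be sketched as a small per-service cache. This Python version assumes a TTL-based expiry and uses a hypothetical UnimplementedError in place of a real gRPC status with codes.Unimplemented; ErrServiceUnsupported borrows the name from the text, but the class here is only an illustration of the idea.

```python
import time
from typing import Callable, Dict, TypeVar

T = TypeVar("T")


class ErrServiceUnsupported(Exception):
    """Fail-fast error for a service the device previously reported as unimplemented."""


class UnimplementedError(Exception):
    """Hypothetical stand-in for a gRPC error with status codes.Unimplemented."""


class CapabilityCache:
    """Per-service support cache learned from observed gNOI responses (sketch)."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._unsupported_until: Dict[str, float] = {}  # service -> expiry time

    def call(self, service: str, rpc: Callable[[], T]) -> T:
        expiry = self._unsupported_until.get(service)
        if expiry is not None:
            if self._clock() < expiry:
                raise ErrServiceUnsupported(service)  # fail fast; no RPC is sent
            del self._unsupported_until[service]  # entry expired: probe the device again
        try:
            return rpc()
        except UnimplementedError:
            # Learn "unsupported" from the response itself, since gNMI
            # capabilities do not list gNOI services.
            self._unsupported_until[service] = self._clock() + self._ttl
            raise ErrServiceUnsupported(service) from None
```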
Read-Only Operations
DeviceOperation contains the low-trust operational surface. gNOI-backed kinds
return structured output through the same status path as read-only show
commands and packet captures.
| Kind | gNOI service | Typical use |
|---|---|---|
| GNOIPing | System | Reachability probe from the device. |
| GNOITraceroute | System | Hop-by-hop path check from the device. |
| GNOITime | System | Device clock check. |
| GNOIFileGet | File | Read a bounded file preview or spill to ConfigMap. |
| GNOIFileStat | File | Validate staged files and metadata. |
| GNOICertGet | Cert | List installed certificates. |
| GNOICanGenerateCSR | Cert | Check CSR support for a key/certificate profile. |
| GNOIRebootStatus | System | Inspect pending or active reboot state. |
| GNOIOSVerify | OS | Verify the current running version and activation state. |
For concrete examples, see the DeviceOperation runbook.
Write-Class Actions
IOSXEOperationalAction supports one-shot device-mutating gNOI actions:
Reboot, CancelReboot, KillProcess, FilePut, FileRemove, and
FactoryReset.
Every action targets exactly one CiscoDevice and must set spec.confirm to
the target device name. The spec is immutable after creation, the request must
contain exactly the args block matching spec.action.kind, and a Running
action is not dispatched a second time after controller restart. This gives the
operation an audit trail without turning transient controller restarts into
duplicate destructive RPCs.
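The admission rules above fit in a few checks. In this minimal Python sketch, the OperationalAction dataclass, the args-block naming convention ("Reboot" maps to "reboot", as in the reboot example later on this page), and the empty-phase meaning "never dispatched" are all assumptions; spec immutability is left to the API server and is not modeled.

```python
from dataclasses import dataclass, field
from typing import Any, Dict

ACTION_KINDS = {"Reboot", "CancelReboot", "KillProcess", "FilePut", "FileRemove", "FactoryReset"}


@dataclass
class OperationalAction:
    device: str                                          # spec.deviceRef.name
    confirm: str                                         # spec.confirm
    kind: str                                            # spec.action.kind
    args: Dict[str, Any] = field(default_factory=dict)   # args blocks present in the request
    phase: str = ""                                      # status phase; "" = never dispatched


def args_block_name(kind: str) -> str:
    # Assumed naming: "Reboot" -> "reboot", "FilePut" -> "filePut".
    return kind[:1].lower() + kind[1:]


def validate(action: OperationalAction) -> None:
    """Raise ValueError if the action breaks one of the documented rules."""
    if action.kind not in ACTION_KINDS:
        raise ValueError(f"unknown action kind {action.kind!r}")
    if action.confirm != action.device:
        raise ValueError("spec.confirm must equal the target device name")
    expected = args_block_name(action.kind)
    if set(action.args) != {expected}:
        raise ValueError(f"request must contain exactly the {expected!r} args block")


def should_dispatch(action: OperationalAction) -> bool:
    # Once a phase such as Running is recorded, a restarted controller
    # must not send the RPC again; only never-dispatched actions qualify.
    return action.phase == ""
```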
FilePut is intentionally ConfigMap-backed in the current API: the bytes come
from binaryData["content"] in a same-namespace ConfigMap. File write/remove
paths must use IOS-XE filesystem prefixes such as flash:, bootflash:,
harddisk:, usbflash0:, or usbflash1:.
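Both FilePut inputs can be checked before any RPC is sent. This Python sketch assumes the allow-list contains only the prefixes named above (the real list may be longer) and relies on the standard Kubernetes representation, where ConfigMap binaryData values are base64-encoded. The function names are hypothetical.

```python
import base64
from typing import Any, Mapping

# Prefixes named on this page; CVK's real allow-list may include others.
ALLOWED_PREFIXES = ("flash:", "bootflash:", "harddisk:", "usbflash0:", "usbflash1:")


def validate_device_path(path: str) -> str:
    """Reject paths without a known IOS-XE filesystem prefix or without a file name."""
    if not path.startswith(ALLOWED_PREFIXES):
        raise ValueError(f"{path!r} must start with one of {ALLOWED_PREFIXES}")
    if not path.split(":", 1)[1]:
        raise ValueError(f"{path!r} names a filesystem but no file")
    return path


def fileput_payload(configmap: Mapping[str, Any]) -> bytes:
    """Return the bytes FilePut would send, taken from binaryData["content"]."""
    encoded = configmap.get("binaryData", {}).get("content")
    if encoded is None:
        raise ValueError('ConfigMap must carry binaryData["content"]')
    # binaryData values are base64-encoded in the Kubernetes API object.
    return base64.b64decode(encoded)
```

A Linux-style path such as /tmp/cat9k.bin, or a bare filename with no prefix, is rejected up front instead of failing on the device.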
Software Lifecycle
IOSXESoftwareUpgrade manages the device OS lifecycle as an auditable
Kubernetes object. Operators provide exactly one image source:
| Source | Required fields | Use case |
|---|---|---|
| URL | imageSource.url and imageSource.sha256 | Fetch an image from http, https, tftp, ftp, scp, or sftp and verify the digest. |
| URL with credentials | URL fields plus imageSource.urlSecretRef | Fetch from authenticated FTP/SCP/SFTP sources. SCP/SFTP can use knownHosts unless the URL opts out with insecureSkipHostKey=true. |
| ConfigMap | imageSource.configMapRef | Stage small test artifacts from Kubernetes data. This is not for production-sized IOS-XE images. |
| Local path | imageSource.localPath | Activate an image already present on device storage. |
For localPath, use localPathSHA256 when the device can report file hashes
through gNOI File.Get. Without that hash, CVK can activate a staged image but
cannot verify the local file before activation.
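The exactly-one-source rule and the localPath caveat can be expressed as a pre-apply check. This is a Python sketch under stated assumptions: the field names follow the table and the localPath example on this page, a missing localPathSHA256 is reported as a warning rather than an error, and validate_image_source is a hypothetical helper, not part of CVK.

```python
import re
from typing import Any, List, Mapping
from urllib.parse import urlsplit

SOURCE_FIELDS = ("url", "configMapRef", "localPath")
URL_SCHEMES = {"http", "https", "tftp", "ftp", "scp", "sftp"}
SHA256_HEX = re.compile(r"[0-9a-fA-F]{64}")  # a SHA-256 hex digest is 64 characters


def validate_image_source(src: Mapping[str, Any]) -> List[str]:
    """Raise ValueError for an invalid imageSource; return non-fatal warnings."""
    chosen = [name for name in SOURCE_FIELDS if src.get(name)]
    if len(chosen) != 1:
        raise ValueError(f"set exactly one of {SOURCE_FIELDS}; found {chosen}")
    warnings: List[str] = []
    if chosen[0] == "url":
        scheme = urlsplit(str(src["url"])).scheme
        if scheme not in URL_SCHEMES:
            raise ValueError(f"unsupported URL scheme {scheme!r}")
        if not SHA256_HEX.fullmatch(str(src.get("sha256", ""))):
            raise ValueError("imageSource.sha256 must be a 64-character hex digest")
    elif chosen[0] == "localPath":
        digest = src.get("localPathSHA256")
        if digest is None:
            warnings.append("no localPathSHA256: image will be activated without local verification")
        elif not SHA256_HEX.fullmatch(str(digest)):
            raise ValueError("localPathSHA256 must be a 64-character hex digest")
    return warnings
```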
The upgrade strategy controls activation:
| Strategy | Behavior |
|---|---|
| Reload | Default. Calls gNOI OS.Activate with reboot allowed, then waits for the device to return and verifies the running version. |
| ISSU | Requests the normal activate path, then asserts after verify that the device selected the ISSU path. IOS-XE still makes the final choice based on platform and version compatibility. |
| NoReboot | Calls gNOI OS.Activate with NoReboot=true, stages the image for a later reload, and ends as Succeeded. Trigger the reload separately through IOSXEOperationalAction when ready. |
The normal lifecycle is:
| Phase | Meaning |
|---|---|
| Pending | CR accepted; no device operation started yet. Maintenance windows are checked here. |
| Resolving | Image source and target version are resolved. |
| Transferring | Image is copied or staged when needed. |
| TransferInterrupted | Transfer failed with a retryable error and can re-enter Transferring. |
| Validating | Staged image and preflight requirements are checked. |
| Activating | gNOI OS activation is requested. |
| AwaitingReachability | Device may be rebooting after activation. |
| Verifying | Running version and activation result are verified. |
| RollingBack | Previous version is being re-activated after a verify mismatch. |
| Succeeded | Requested version is active and verified. |
Terminal failure phases include Failed, PreflightFailed,
ValidationFailed, RolledBack, RebootTimeout, and Cancelled.
OS.Activate reboots the device when the chosen strategy requires it; CVK does
not issue a separate System.Reboot after activation. With rollback enabled,
CVK re-activates the previously observed running version if post-activation
verification does not match the requested target.
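The phase list and the rollback rule can be read as a state machine. The following Python sketch encodes the happy path and the side edges the descriptions imply; several edges are assumptions and are marked as such in the code (skipping transfer for localPath, NoReboot ending straight from Activating, and Failed or Cancelled being reachable from any non-terminal phase). It illustrates the lifecycle and is not CVK's transition table.

```python
from typing import Dict, FrozenSet

HAPPY_PATH = ("Pending", "Resolving", "Transferring", "Validating",
              "Activating", "AwaitingReachability", "Verifying", "Succeeded")

TERMINAL: FrozenSet[str] = frozenset({
    "Succeeded", "Failed", "PreflightFailed", "ValidationFailed",
    "RolledBack", "RebootTimeout", "Cancelled",
})

# Side edges inferred from the phase descriptions; some are assumptions.
SIDE_EDGES: Dict[str, FrozenSet[str]] = {
    "Resolving": frozenset({"Validating"}),             # assumed: localPath skips transfer
    "Transferring": frozenset({"TransferInterrupted"}),
    "TransferInterrupted": frozenset({"Transferring"}),
    "Validating": frozenset({"PreflightFailed", "ValidationFailed"}),
    "Activating": frozenset({"Succeeded"}),             # assumed: NoReboot stage-only path
    "AwaitingReachability": frozenset({"RebootTimeout"}),
    "Verifying": frozenset({"RollingBack"}),            # verify mismatch with rollback enabled
    "RollingBack": frozenset({"RolledBack"}),
}


def allowed(src: str, dst: str) -> bool:
    """Return True if this sketch permits moving from phase src to phase dst."""
    if src in TERMINAL:
        return False
    if dst in ("Failed", "Cancelled"):
        return True  # assumed reachable from any non-terminal phase
    if src in HAPPY_PATH:
        i = HAPPY_PATH.index(src)
        if HAPPY_PATH[i + 1] == dst:  # safe: Succeeded is terminal and excluded above
            return True
    return dst in SIDE_EDGES.get(src, frozenset())
```

Note that a verify mismatch with rollbackOnFailure disabled simply moves to Failed here, matching the field note later on this page that the device stays on whatever version it landed on.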
Important defaults:
| Field | Default | Notes |
|---|---|---|
| strategy | Reload | Use NoReboot when activation should stage only. |
| rollbackOnFailure | true | Attempts to restore the previously observed version after a verify mismatch. |
| resumePolicy | Retry | Abort makes transfer interruptions terminal. |
| maxRetries | 3 | Applies to transfer retries. |
| rebootTimeoutSeconds | 1800 | Controls how long AwaitingReachability waits. |
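The retry and timeout defaults reduce to two small decisions. In this Python sketch, the name of the terminal phase after an aborted or exhausted transfer ("Failed") and the strict greater-than boundary for the reboot timeout are assumptions, as are both function names.

```python
def after_interruption(retries_used: int, resume_policy: str = "Retry", max_retries: int = 3) -> str:
    """Phase to enter after a retryable transfer error, using the documented defaults."""
    if resume_policy == "Abort":
        return "Failed"  # assumed terminal phase for an aborted transfer
    if resume_policy != "Retry":
        raise ValueError(f"unknown resumePolicy {resume_policy!r}")
    # Re-enter Transferring while retries remain; otherwise give up.
    return "Transferring" if retries_used < max_retries else "Failed"


def reboot_timed_out(seconds_waiting: float, reboot_timeout_seconds: int = 1800) -> bool:
    """True once AwaitingReachability has waited longer than rebootTimeoutSeconds."""
    return seconds_waiting > reboot_timeout_seconds
```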
targetVersion accepts IOS-XE version shapes such as 17.15.01a,
26.01.01, 26.01.01.0.340, and 17.18.02.0.4112.1766116039. Verification
uses a prefix-aware comparison, so operators may use the shortest unambiguous
form for the staged image.
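"Prefix-aware" matters because a plain string prefix test is wrong for dotted versions: "26.10.01".startswith("26.1") is true. A minimal Python sketch of a component-wise comparison follows, assuming components are compared as exact strings. Under that assumption 01 and 01a are distinct, so 17.15.01 does not match 17.15.01a. CVK's own normalization may differ.

```python
def version_matches(target: str, running: str) -> bool:
    """Prefix-aware match on whole dot-separated components (sketch)."""
    target_parts = target.strip().split(".")
    running_parts = running.strip().split(".")
    if not target.strip() or len(target_parts) > len(running_parts):
        return False
    # Every target component must equal the running component at the same position.
    return running_parts[: len(target_parts)] == target_parts
```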
Upgrade Manifest Walkthrough
Each upgrade is a single Kubernetes CR. Create one manifest per device and apply it when ready: the controller picks the CR up as soon as it is created (subject to any maintenance window checked in the Pending phase), and the CR records the full audit trail.
Preparing the image
Before creating the manifest, compute the image digests and size. Use the sha256 digest for imageSource.sha256; CVK verifies the image against sha256 only. The MD5 and size are kept purely as human-reference annotations:
```bash
# On a Linux host that holds the image:
sha256sum cat9k_iosxe.26.01.01.SPA.bin
md5sum cat9k_iosxe.26.01.01.SPA.bin
stat --format='%s' cat9k_iosxe.26.01.01.SPA.bin
```
Serve the image from a source the device can reach — TFTP and HTTP are the most common options in lab and production environments respectively.
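Where coreutils is not available, the same three values can be computed with Python's standard library. This sketch streams the file in chunks so a 1.2 GB image is never loaded into memory at once; image_digests is a hypothetical helper name. The usedforsecurity=False flag (Python 3.9+) keeps MD5 usable on FIPS-restricted hosts, which fits its role here as a human reference only.

```python
import hashlib
import os
from typing import Dict, Union


def image_digests(path: str, chunk_size: int = 1 << 20) -> Dict[str, Union[str, int]]:
    """Return sha256, md5, and size in bytes, matching sha256sum/md5sum/stat output."""
    sha256 = hashlib.sha256()
    md5 = hashlib.md5(usedforsecurity=False)  # reference annotation only, not verification
    with open(path, "rb") as f:
        # Read fixed-size chunks until f.read returns b"" at end of file.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            sha256.update(chunk)
            md5.update(chunk)
    return {"sha256": sha256.hexdigest(), "md5": md5.hexdigest(), "size": os.path.getsize(path)}
```

For example, image_digests("cat9k_iosxe.26.01.01.SPA.bin")["sha256"] yields the value to place in imageSource.sha256.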
Example — TFTP upgrade (Catalyst 9300, IOS-XE 26.01.01)
This is a real-world production manifest. The in-cluster TFTP service
(rust-tftp-otel-headless) serves images to devices over the management
network. Labels and annotations carry out-of-band metadata that is useful
for audit queries, dashboards, and GitOps tooling — they are not processed
by CVK.
```yaml
# 9300-4 IOS-XE upgrade manifest for Cisco Virtual Kubelet software lifecycle.
#
# Target device:
#   CiscoDevice:   default/cat9000-4
#   Management IP: 198.51.100.103
#   Hardware:      C9300-24P, serial FOC2416U0MV
#
# Image source served by the in-cluster TFTP service:
#   tftp://rust-tftp-otel-headless.ng-pnp.svc.cluster.local:6969/images/cat9k_iosxe.26.01.01.SPA.bin
#   size:   1260618344 bytes
#   md5:    fd4a41c41a7de1a9d907c4f35f46e334
#   sha256: 7de3c6875e3c1c96d5920e8542c72b1bcb5d913d99645ef5687f44dd4024cdf4
#
# Apply when ready to perform the upgrade/reload:
#   kubectl apply -f 9300-4-upgrade-26.01.01.yaml
#   kubectl get iosxesoftwareupgrade upgrade-cat9000-4-26-01-01 -n default -o wide -w
apiVersion: ops.cisco.vk/v1alpha1
kind: IOSXESoftwareUpgrade
metadata:
  name: upgrade-cat9000-4-26-01-01
  namespace: default
  labels:
    app.kubernetes.io/part-of: lifecycle-mgmt
    lifecycle.cisco.com/device: cat9000-4
    lifecycle.cisco.com/operation: iosxe-upgrade
  annotations:
    lifecycle.cisco.com/source-protocol: tftp
    lifecycle.cisco.com/tftp-url: tftp://rust-tftp-otel-headless.ng-pnp.svc.cluster.local:6969/images/cat9k_iosxe.26.01.01.SPA.bin
    lifecycle.cisco.com/image-file: cat9k_iosxe.26.01.01.SPA.bin
    lifecycle.cisco.com/image-size-bytes: "1260618344"
    lifecycle.cisco.com/image-md5: fd4a41c41a7de1a9d907c4f35f46e334
    lifecycle.cisco.com/image-sha256: 7de3c6875e3c1c96d5920e8542c72b1bcb5d913d99645ef5687f44dd4024cdf4
spec:
  deviceRef:
    name: cat9000-4
  imageSource:
    url: tftp://rust-tftp-otel-headless.ng-pnp.svc.cluster.local:6969/images/cat9k_iosxe.26.01.01.SPA.bin
    sha256: 7de3c6875e3c1c96d5920e8542c72b1bcb5d913d99645ef5687f44dd4024cdf4
  targetVersion: 26.01.01
  strategy: Reload
  rollbackOnFailure: false
  resumePolicy: Retry
  maxRetries: 3
  rebootTimeoutSeconds: 3600
```
Field notes:
- targetVersion: 26.01.01 is the shortest unambiguous prefix. CVK's prefix-aware comparison matches 26.01.01.0.340 or 26.01.01.0.340.1766116039.
- strategy: Reload calls OS.Activate with reboot allowed; CVK waits for the device to come back (up to rebootTimeoutSeconds) and then verifies the running version.
- rollbackOnFailure: false leaves the device on whatever version it landed on after a verify mismatch. Set it to true in environments where automatic rollback is preferred.
- rebootTimeoutSeconds: 3600 allows a full hour for the device to reload, which is appropriate for Catalyst 9000; use the default of 1800 for faster platforms.
Monitoring the upgrade
```bash
# Watch the phase progress in real time:
kubectl get iosxesoftwareupgrade upgrade-cat9000-4-26-01-01 -n default -o wide -w
NAME                         DEVICE      PHASE                  TARGET     AGE
upgrade-cat9000-4-26-01-01   cat9000-4   Pending                26.01.01   0s
upgrade-cat9000-4-26-01-01   cat9000-4   Resolving              26.01.01   2s
upgrade-cat9000-4-26-01-01   cat9000-4   Transferring           26.01.01   8s
upgrade-cat9000-4-26-01-01   cat9000-4   Validating             26.01.01   4m22s
upgrade-cat9000-4-26-01-01   cat9000-4   Activating             26.01.01   4m35s
upgrade-cat9000-4-26-01-01   cat9000-4   AwaitingReachability   26.01.01   4m38s
upgrade-cat9000-4-26-01-01   cat9000-4   Verifying              26.01.01   16m14s
upgrade-cat9000-4-26-01-01   cat9000-4   Succeeded              26.01.01   16m31s
```
```bash
# See full status detail including conditions:
kubectl describe iosxesoftwareupgrade upgrade-cat9000-4-26-01-01 -n default
...
Status:
  Phase:            Succeeded
  Target Version:   26.01.01
  Running Version:  26.01.01.0.340
  Conditions:
    Type                 Status   Reason
    ----                 ------   ------
    TransferComplete     True     ImageTransferred
    ValidationPassed     True     PreflightOK
    ActivationComplete   True     OSActivated
    VerificationPassed   True     VersionMatch
Events:
  Normal  PhaseTransition  12m  Transferring → Validating: image hash verified
  Normal  PhaseTransition  11m  Activating → AwaitingReachability: OS.Activate sent
  Normal  PhaseTransition  0s   Verifying → Succeeded: running 26.01.01.0.340
```
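For scripted gates (CI jobs, change-management checks), the same status can be evaluated programmatically. This Python sketch assumes the object's JSON status exposes phase and a standard conditions list of type/status pairs, mirroring the describe output above (for instance, the status field of kubectl get ... -o json). The required conditions default to the four shown in the sample; a NoReboot run may report fewer, so the list is a parameter. upgrade_outcome is a hypothetical helper.

```python
from typing import Any, Iterable, Mapping, Optional

TERMINAL_FAILURES = {"Failed", "PreflightFailed", "ValidationFailed",
                     "RolledBack", "RebootTimeout", "Cancelled"}
# Conditions seen in the sample describe output; a NoReboot run may report fewer.
DEFAULT_CONDITIONS = ("TransferComplete", "ValidationPassed",
                      "ActivationComplete", "VerificationPassed")


def upgrade_outcome(status: Mapping[str, Any],
                    required: Iterable[str] = DEFAULT_CONDITIONS) -> Optional[bool]:
    """None while in progress, True on a verified success, False on a terminal failure."""
    phase = status.get("phase", "")
    if phase in TERMINAL_FAILURES:
        return False
    if phase != "Succeeded":
        return None
    observed = {c.get("type"): c.get("status") for c in status.get("conditions", [])}
    # Condition status values are the strings "True"/"False"/"Unknown" in Kubernetes.
    return all(observed.get(name) == "True" for name in required)
```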
Example — local flash path (image already on device)
Use localPath when the image has been pre-staged to device storage (e.g.
via ZTP or a previous TFTP transfer):
```yaml
spec:
  deviceRef:
    name: cat9000-3
  imageSource:
    localPath: flash:cat9k_iosxe.17.18.02.SPA.bin
    localPathSHA256: a1b2c3d4e5f6...   # optional; from gNOI File.Get
  targetVersion: 17.18.02
  strategy: Reload
  rollbackOnFailure: true
  rebootTimeoutSeconds: 1800
```
Example — stage only, reload later (NoReboot)
NoReboot stages the image for activation without triggering an immediate
reload. Schedule the reload separately through IOSXEOperationalAction during
a maintenance window:
```yaml
spec:
  deviceRef:
    name: cat9000-5
  imageSource:
    url: tftp://rust-tftp-otel-headless.ng-pnp.svc.cluster.local:6969/images/cat9k_iosxe.26.01.01.SPA.bin
    sha256: 7de3c6875e3c1c96d5920e8542c72b1bcb5d913d99645ef5687f44dd4024cdf4
  targetVersion: 26.01.01
  strategy: NoReboot          # stage only; does not reboot
  rebootTimeoutSeconds: 0     # not applicable for NoReboot
```
The CR ends in Succeeded once the image is staged and activated
(without reload). Trigger the reload during maintenance:
```yaml
apiVersion: ops.cisco.vk/v1alpha1
kind: IOSXEOperationalAction
metadata:
  name: reload-cat9000-5-maint-window
  namespace: default
spec:
  deviceRef:
    name: cat9000-5
  confirm: cat9000-5   # must match deviceRef.name exactly
  action:
    kind: Reboot
    reboot:
      message: "Scheduled maintenance window reload"
      delay: 0
```
Operator Workflow
- Confirm the device exposes the gNxI listener (gnxi server is enabled).
- Enable the software upgrade gate on the per-device VK pod via Helm (gnoi.enableSoftwareUpgrade: true) or the env var CISCO_VK_ENABLE_IOSXE_SOFTWARE_UPGRADE=1.
- Grant RBAC: upgrade operators need create/get/list/watch on IOSXESoftwareUpgrade (list and watch are required for kubectl get -w), plus delete if they remove completed records. Read-only users get DeviceOperation only.
- Compute the image sha256 and prepare the manifest (see examples above).
- Apply the manifest and monitor with kubectl get iosxesoftwareupgrade -w.
- After Succeeded, delete the CR when the audit record is no longer needed, or retain it as an immutable upgrade record.
For DeviceOperation show-command and diagnostic examples see the
Operations Runbook.