Architecture
This document describes the technical architecture of Cisco Virtual Kubelet.
Overview
Cisco Virtual Kubelet implements the Virtual Kubelet provider interface so Kubernetes can treat Cisco IOS-XE devices as compute nodes. Each device appears as a node in the cluster; pods scheduled to that node are deployed as IOx App-Hosting containers on the device over RESTCONF.
The two-binary split
Everything the project does comes out of a single cisco-vk CLI with two subcommands:
- `cisco-vk manager`: the Kubernetes controller. Watches `CiscoDevice` custom resources and, for each one, creates a ConfigMap + Deployment. It knows nothing about individual devices; it just reconciles CR state into Kubernetes resources.
- `cisco-vk run`: the Virtual Kubelet provider. One process per device. Reads its config, registers a virtual node in the cluster, and drives the device via RESTCONF when pods come and go.
This split keeps device credentials and per-device logic out of the controller. The controller holds minimal cluster-level RBAC; each VK pod holds only the credentials for its one device.
Component architecture
%%{init: {'theme': 'base', 'themeVariables': {'background': '#ffffff', 'mainBkg': '#f8fafc', 'clusterBkg': '#f1f5f9', 'clusterBorder': '#94a3b8', 'lineColor': '#64748b', 'edgeLabelBackground': '#f8fafc', 'titleColor': '#1e293b'}}}%%
flowchart TB
subgraph Cluster["Kubernetes cluster"]
API["API server"]
CRD["CiscoDevice CR"]
Controller["CiscoDevice controller / cisco-vk manager"]
CM["ConfigMap / device config"]
Dep["Deployment / device VK"]
VK["VK pod / cisco-vk run"]
Node["Virtual node"]
Pod["User pod"]
end
subgraph IOSXE["Cisco IOS-XE device"]
IOx["IOx platform"]
Container["App-hosting container"]
end
API --> CRD
CRD --> Controller
Controller --> CM
Controller --> Dep
Dep --> VK
VK --> Node
Pod -->|"scheduled to"| Node
VK -->|"RESTCONF over HTTPS"| IOx
IOx --> Container
style Controller fill:#ede9fe,stroke:#7c3aed,color:#3b0764
style VK fill:#dbeafe,stroke:#2563eb,color:#1e3a8a
style IOx fill:#dcfce7,stroke:#16a34a,color:#14532d
style Container fill:#d1fae5,stroke:#059669,color:#064e3b
style API fill:#f1f5f9,stroke:#64748b,color:#1e293b
style CRD fill:#f1f5f9,stroke:#64748b,color:#1e293b
style CM fill:#f1f5f9,stroke:#64748b,color:#1e293b
style Dep fill:#f1f5f9,stroke:#64748b,color:#1e293b
style Node fill:#f1f5f9,stroke:#64748b,color:#1e293b
style Pod fill:#e2e8f0,stroke:#94a3b8,color:#334155
Core components
AppHostingProvider
Implements the Virtual Kubelet nodeutil.Provider interface — the main entry point for pod operations.
| Method | Purpose |
|---|---|
| `CreatePod(ctx, pod)` | Delegates to `driver.DeployPod`, then forces a node status update |
| `UpdatePod(ctx, pod)` | Oper-state-aware redeploy: skips apps already RUNNING; reinstalls apps that are stuck in unexpected states |
| `DeletePod(ctx, pod)` | Delegates to `driver.DeletePod`, then forces a node status update |
| `GetPod(ctx, ns, name)` | Fast path: informer cache. Fallback: device query for cleanup |
| `GetPodStatus(ctx, ns, name)` | Queries live pod status from the device; returns PullingImage waiting state while a copy-then-install fallback is in progress |
| `GetPods(ctx)` | Lists all managed pods discovered on the device |
| `GetStatsSummary(ctx)` | Returns Kubernetes stats/summary data (see Observability) |
| `GetMetricsResource(ctx)` | Returns Prometheus metrics (see Observability) |
Operations that don't make sense for app-hosting — RunInContainer, AttachToContainer, GetContainerLogs, PortForward — return HTTP 501.
AppHostingNode
Implements node.NodeProvider for node registration, heartbeat, and status.
- `Ping()`: throttled to a 30-second minimum interval; triggers `syncNodeStatus` asynchronously
- `NotifyNodeStatus(cb)`: fires the callback whenever node status changes (labels, annotations, capacity, conditions)
- `ForceStatusUpdate()`: called after every pod lifecycle event so resource accounting stays fresh
The node's labels include the standard topology labels (topology.kubernetes.io/zone=cisco-iosxe, topology.kubernetes.io/region=cisco-iosxe) plus type=virtual-kubelet. Node annotations are populated dynamically on every status sync from the driver's TopologyProvider data; see Observability → Node annotations.
Conditions published:
- `Ready`: true when IOx is enabled on the device
- `DiskPressure`: true when less than 5% of device storage is available
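As a rough illustration of the two conditions, the sketch below derives them from a hypothetical `deviceHealth` struct. The struct, the simplified `condition` type, and the treatment of unknown capacity are assumptions made for the example; only the IOx and 5% rules come from the text above.

```go
package main

import "fmt"

// deviceHealth is a hypothetical summary of what a driver reports.
type deviceHealth struct {
	IOxEnabled   bool
	StorageTotal uint64 // bytes
	StorageFree  uint64 // bytes
}

// condition is a simplified stand-in for a Kubernetes node condition.
type condition struct {
	Type   string
	Status bool
}

// nodeConditions applies the two rules: Ready follows IOx, DiskPressure
// fires when less than 5% of storage is available.
func nodeConditions(h deviceHealth) []condition {
	// free/total < 0.05, written with integers to avoid float rounding.
	// Assumption: unknown capacity (total == 0) does not raise pressure.
	diskPressure := h.StorageTotal > 0 && h.StorageFree*100 < h.StorageTotal*5
	return []condition{
		{Type: "Ready", Status: h.IOxEnabled},
		{Type: "DiskPressure", Status: diskPressure},
	}
}

func main() {
	h := deviceHealth{IOxEnabled: true, StorageTotal: 1000, StorageFree: 40}
	for _, c := range nodeConditions(h) {
		fmt.Printf("%s=%t\n", c.Type, c.Status)
	}
}
```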
Driver factory
Selects a driver based on spec.Driver:
- `XE` → IOS-XE driver (production)
- `FAKE` → mock driver for testing
- `XR`, `NXOS`, `OPENCONFIG` → placeholders, currently unsupported
Driver interfaces
Every driver must implement CiscoKubernetesDeviceDriver:
type CiscoKubernetesDeviceDriver interface {
GetDeviceResources(ctx) (*v1.ResourceList, error)
GetDeviceInfo(ctx) (*common.DeviceInfo, error)
DeployPod(ctx, pod)
UpdatePod(ctx, pod)
DeletePod(ctx, pod)
GetPodStatus(ctx, pod) (*v1.Pod, error)
ListPods(ctx) ([]*v1.Pod, error)
GetGlobalOperationalData(ctx) (*common.AppHostingOperData, error)
}
Drivers may additionally implement the optional TopologyProvider interface — this is what enables OTEL topology export, the cisco_device_cdp_*/ospf_*/interface_* metrics, and the cisco.io/* node annotations.
type TopologyProvider interface {
GetCDPNeighbors(ctx) ([]common.CDPNeighbor, error)
GetOSPFNeighbors(ctx) ([]common.OSPFNeighbor, error)
GetInterfaceStats(ctx) ([]common.InterfaceStats, error)
GetInterfaceIPs(ctx) ([]common.InterfaceIP, error)
GetHostedApps(ctx) ([]common.HostedApp, error)
}
The IOS-XE driver implements it. Drivers without topology support still work — their VKs just skip the optional metrics/traces/annotations.
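In Go, an optional interface like this is usually detected with a type assertion at the call site. The sketch below shows the pattern with deliberately simplified, hypothetical interfaces (no context argument, string slices instead of the `common` types); it is not the project's code.

```go
package main

import "fmt"

// deviceDriver and topologyProvider are trimmed-down, hypothetical versions
// of the required and optional interfaces.
type deviceDriver interface {
	ListPods() ([]string, error)
}

type topologyProvider interface {
	GetCDPNeighbors() ([]string, error)
}

// plainDriver implements only the required interface.
type plainDriver struct{}

func (plainDriver) ListPods() ([]string, error) { return nil, nil }

// topoDriver additionally implements the optional interface.
type topoDriver struct{ plainDriver }

func (topoDriver) GetCDPNeighbors() ([]string, error) {
	return []string{"switch-1", "switch-2"}, nil
}

// cdpNeighborCount returns (count, true) when the driver supports topology
// and (0, false) when the optional interface is missing or the query fails.
func cdpNeighborCount(d deviceDriver) (int, bool) {
	tp, ok := d.(topologyProvider)
	if !ok {
		return 0, false // driver has no topology support; skip the metric
	}
	neighbors, err := tp.GetCDPNeighbors()
	if err != nil {
		return 0, false
	}
	return len(neighbors), true
}

func main() {
	fmt.Println(cdpNeighborCount(topoDriver{}))
	fmt.Println(cdpNeighborCount(plainDriver{}))
}
```

The caller never needs a capability flag: the presence of the methods is the capability.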
IOS-XE driver — internal layering
internal/drivers/iosxe/ is organised by responsibility. Each layer is independently testable.
| File | Responsibility |
|---|---|
| `driver.go` | Driver construction, marshallers, config hooks, recovery state helpers |
| `device.go` | Device-level queries (connectivity, resources, device info) |
| `client.go` | App-hosting RPCs, image delivery (device-native HTTP pull + copy-then-install fallback), Kubernetes event emission |
| `reconciler.go` | App lifecycle state machine (see below) |
| `pod_lifecycle.go` | DeployPod / DeletePod / GetPodStatus / ListPods |
| `pod_transforms.go` | Pod.Spec → IOS-XE AppHostingConfig |
| `status_transforms.go` | Device oper-data → Pod.Status |
| `ip_discovery.go` | Pod IP discovery (oper-data first, ARP table fallback) |
| `topology.go` | TopologyProvider implementation: CDP, OSPF, interfaces |
| `models.go` | YANG structs, auto-generated via ygot |
IOS-XE Config Driver and NetAsCode
The IOS-XE configuration plane is separate from App Hosting. IOSXEConfig
resources carry NetAsCode-shaped intent, and the configdriver resolves the same
kind of hierarchy NetAsCode uses: defaults, device groups, interface groups,
templates, and per-device configuration. The resolved intent remains plain
YAML/JSON data until a family writer owns it.
Family writers are the translation boundary. They take canonical NetAsCode
fields, apply release-aware overrides, and emit transport.Op values addressed
to IOS-XE YANG paths. The engine then validates and applies those operations
over RESTCONF, NETCONF, or gNMI.
IOSXEConfig source
-> intent resolver
-> family writer
-> version override table
-> YANG validation boundary
-> RESTCONF, NETCONF, or gNMI transport
-> IOS-XE device
-> status, revision, and apply log
The validation boundary is deliberately device-facing. CONFIG_YANG_VALIDATION
controls it:
| Value | Behaviour |
|---|---|
| `disabled` | no validation gate; default for backward compatibility |
| `warn` | log validation failures and continue |
| `strict` | fail before mutation |
This preserves NetAsCode as the stable public model while giving CVK a place to use release-specific ygot/ytypes validation as those generated model packages are added.
Data flow
Controller reconciliation
sequenceDiagram
actor User
participant API as K8s API
participant Ctrl as CiscoDevice Controller
participant CM as ConfigMap
participant Dep as Deployment
participant VK as VK Pod
User->>API: kubectl apply ciscodevice.yaml
API->>Ctrl: Reconcile event
Ctrl->>Ctrl: Render device config (strip password + secretRef)
Ctrl->>CM: Create/Update {name}-config
Ctrl->>Dep: Create or update {name}-vk with VK_DEVICE_PASSWORD from Secret
Dep->>VK: Start pod
VK->>API: Register virtual node
The controller never persists credentials. password and credentialSecretRef are both stripped from the DeviceSpec before it is marshalled into the ConfigMap. Credentials reach the VK pod via a valueFrom.secretKeyRef env var injected into the Deployment spec. See Security.
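One way to picture the stripping step is to clear the two credential fields on a copy of the spec before marshalling. The struct below is hypothetical: only the `password` and `credentialSecretRef` field names come from the text, while `Address`, `secretRef`, and `renderForConfigMap` are invented for the sketch.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// secretRef and deviceSpec are hypothetical, reduced versions of the API types.
type secretRef struct {
	Name string `json:"name"`
}

type deviceSpec struct {
	Address             string     `json:"address"`
	Password            string     `json:"password,omitempty"`
	CredentialSecretRef *secretRef `json:"credentialSecretRef,omitempty"`
}

// renderForConfigMap returns the JSON that would be stored in the ConfigMap.
// spec is passed by value, so the caller's copy keeps its credentials.
func renderForConfigMap(spec deviceSpec) (string, error) {
	spec.Password = ""
	spec.CredentialSecretRef = nil
	out, err := json.Marshal(spec)
	return string(out), err
}

func main() {
	spec := deviceSpec{
		Address:             "192.0.2.10",
		Password:            "hunter2",
		CredentialSecretRef: &secretRef{Name: "router-creds"},
	}
	out, err := renderForConfigMap(spec)
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
	fmt.Println("leaks password:", strings.Contains(out, "hunter2"))
}
```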
Pod creation flow
sequenceDiagram
actor User
participant API as Kubernetes API
participant VK as AppHostingProvider
participant Drv as IOS-XE driver
participant Dev as IOS-XE device
User->>API: Create Pod targeting the virtual node
API->>VK: CreatePod
VK->>Drv: DeployPod
Drv->>Dev: POST app-hosting config with RESTCONF
alt Flash package already on the device
Drv->>Dev: Install from flash path
Dev-->>Drv: DEPLOYED or RUNNING
else HTTP URL supported by the device
Drv->>Dev: Install from HTTP URL
Dev-->>Drv: DEPLOYED
else HTTP URL requires copy fallback
Drv->>Dev: Copy image to flash
Note over VK,Drv: Pod status reports PullingImage while the copy runs
Drv->>Dev: Install from copied flash path
Dev-->>Drv: DEPLOYED
end
opt App is not already running
Drv->>Dev: RPC activate
Drv->>Dev: RPC start
Dev-->>Drv: RUNNING
end
Drv->>Dev: Query oper-data for pod IP
alt IP in oper-data
Dev-->>Drv: IPv4
else fallback
Drv->>Dev: Query ARP for container MAC
Dev-->>Drv: IPv4
end
Drv-->>VK: Pod ready
Pod deletion flow
Reverse of creation: RUNNING → stop → ACTIVATED → deactivate → DEPLOYED → uninstall → config DELETE. If a state is skipped (e.g. app already stopped), the reconciler picks up from wherever the observed state actually is.
App lifecycle state machine
The IOS-XE driver's reconciler (reconciler.go) drives each app toward its desired state by observing live state and issuing a single RPC per pass.
stateDiagram-v2
[*] --> NoConfig
NoConfig --> INSTALLING: POST config + install RPC
INSTALLING --> DEPLOYED: validation passes
INSTALLING --> Error: pkg-policy invalid + notification
DEPLOYED --> ACTIVATED: activate RPC
ACTIVATED --> RUNNING: start RPC
STOPPED --> RUNNING: start RPC
RUNNING --> [*]
Error --> [*]
RUNNING --> STOPPED: stop RPC (reverse)
STOPPED --> DEPLOYED: deactivate RPC
DEPLOYED --> NoConfig: uninstall + DELETE config
Key behaviours:
- `INSTALLING`: normally transient. The reconciler waits unless `pkg-policy = iox-pkg-policy-invalid` and a confirming install notification has been received from the device. During the first few seconds of every install the device reports `pkg-policy = invalid` as a YANG default; waiting for the notification prevents this transient value from being treated as a fatal error. If `spec.allowUnsignedApps = true`, the check is skipped entirely; use this when you're running unsigned packages (e.g. your own custom application builds, or devices not enforcing signed-verification).
- `STOPPED`: restartable directly via `start` (no re-activate needed).
- No oper data with config present: the reconciler re-issues the install RPC.
- Error: surfaces as Pod `Failed` with reason `PackagePolicyInvalid` and a message from the device's notification.
Reverse path (desired = Deleted)
| Observed state | Action |
|---|---|
| `RUNNING` | `stop` |
| `ACTIVATED` or `STOPPED` | `deactivate` |
| `DEPLOYED` | `uninstall` |
| No oper data | DELETE config |
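Because the reconciler issues one RPC per pass, the reverse path reduces to a lookup from observed state to next action. The sketch below is a hypothetical rendering of the table: the function name, the use of an empty string for "no oper data", and the `wait` default for other states are assumptions.

```go
package main

import "fmt"

// reverseAction returns the single action to issue on this pass when the
// desired state is Deleted. An empty observed state means no oper data.
func reverseAction(observed string) string {
	switch observed {
	case "RUNNING":
		return "stop"
	case "ACTIVATED", "STOPPED":
		return "deactivate"
	case "DEPLOYED":
		return "uninstall"
	case "":
		return "delete-config"
	default:
		// Assumption: transient states are left alone until the next pass.
		return "wait"
	}
}

func main() {
	for _, s := range []string{"RUNNING", "STOPPED", "ACTIVATED", "DEPLOYED", ""} {
		fmt.Printf("%q -> %s\n", s, reverseAction(s))
	}
}
```

Since each pass starts from the observed state, an app that is already stopped simply enters the table one row further down.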
Pod status and discovery
Pod-to-app naming
App IDs follow cvk{index}_{podUID} where the UID has hyphens stripped. Up to 10 containers per pod (index 0-9). Example: cvk00000_a1b2c3d4e5f6a1b2c3d4e5f6a1b2c3d4.
This lets the driver rebuild pod identity when listing apps from the device — the UID is embedded in the app ID, and pod metadata (name, namespace, container name) is stored as key-value labels in the app's --run-opts:
io.kubernetes.pod.name=<name>
io.kubernetes.pod.namespace=<namespace>
io.kubernetes.pod.uid=<uid>
io.kubernetes.container.name=<container>
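The naming scheme can be sketched as a pair of helpers that build an app ID and recover the pod UID from it. This is an illustration under assumptions: the five-digit zero-padded index is inferred from the example ID above, and both function names are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// appID builds cvk{index}_{podUID} with hyphens stripped from the UID.
// Assumption: the index is zero-padded to five digits, as in the example.
func appID(index int, podUID string) string {
	return fmt.Sprintf("cvk%05d_%s", index, strings.ReplaceAll(podUID, "-", ""))
}

// podUIDFromAppID restores the hyphenated 8-4-4-4-12 UID from an app ID.
// It returns false for IDs that do not follow the scheme.
func podUIDFromAppID(id string) (string, bool) {
	if !strings.HasPrefix(id, "cvk") {
		return "", false
	}
	i := strings.IndexByte(id, '_')
	if i < 0 {
		return "", false
	}
	h := id[i+1:]
	if len(h) != 32 {
		return "", false
	}
	parts := []string{h[0:8], h[8:12], h[12:16], h[16:20], h[20:32]}
	return strings.Join(parts, "-"), true
}

func main() {
	id := appID(0, "a1b2c3d4-e5f6-a1b2-c3d4-e5f6a1b2c3d4")
	fmt.Println(id)
	uid, ok := podUIDFromAppID(id)
	fmt.Println(uid, ok)
}
```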
Container state mapping
| IOS-XE state | Container state | Ready |
|---|---|---|
| `RUNNING` | Running | true |
| `DEPLOYED`, `ACTIVATED`, `INSTALLING` | Pending | false |
| `STOPPED` | Terminated (exit 0) | false |
| Error / `pkg-policy-invalid` confirmed | Terminated with exit code + reason | false |
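The mapping above translates directly into a switch. In the sketch below, `containerStatus` is a simplified, hypothetical stand-in for the Kubernetes container status types, the exit code 1 used for the error case is a placeholder, and treating unknown states as Pending is an assumption.

```go
package main

import "fmt"

// containerStatus is a simplified, hypothetical stand-in for the Kubernetes
// container state and readiness fields.
type containerStatus struct {
	State    string
	Ready    bool
	ExitCode int
	Reason   string
}

// mapState converts an IOS-XE app state into a container status.
func mapState(iosxeState string) containerStatus {
	switch iosxeState {
	case "RUNNING":
		return containerStatus{State: "Running", Ready: true}
	case "DEPLOYED", "ACTIVATED", "INSTALLING":
		return containerStatus{State: "Pending"}
	case "STOPPED":
		return containerStatus{State: "Terminated", ExitCode: 0}
	case "Error":
		// Placeholder exit code; the real value comes from the device.
		return containerStatus{State: "Terminated", ExitCode: 1, Reason: "PackagePolicyInvalid"}
	default:
		// Assumption: states not listed in the table are reported as Pending.
		return containerStatus{State: "Pending"}
	}
}

func main() {
	for _, s := range []string{"RUNNING", "INSTALLING", "STOPPED", "Error"} {
		fmt.Printf("%-10s -> %+v\n", s, mapState(s))
	}
}
```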
IP discovery
- Query app-hosting oper-data for `ipv4-addr`
- Fallback: ARP table lookup by the container's MAC address
- Default: `0.0.0.0` (no IP yet)
Pod recovery loop
A background goroutine in cisco-vk run periodically lists all pods on this node in Failed phase and resets them to Pending if their status.Reason is one of:
- `NotFound`
- `ProviderFailed`
- `PackagePolicyInvalid`
This lets the VK controller reprocess the pod without manual kubectl delete/apply. The recovery loop uses exponential backoff: 15 s → 30 s → 60 s → 120 s → 240 s → 300 s (5 min cap). On any successful recovery the interval resets to 15 s.
Pods with a DeletionTimestamp are skipped — the normal deletion path owns those.
Networking
Three interface modes are supported (see Configuration for field reference, and the platform-specific pages Catalyst 8000V / Catalyst 9000 for device-side setup):
| Mode | IP source | Use case |
|---|---|---|
| VirtualPortGroup | DHCP pool on VPG or static from podCIDR | Catalyst 8000V; containers share a private L3 to the device |
| AppGigabitEthernet | DHCP (access) or DHCP in VLAN (trunk) | Catalyst 9000; containers on a dedicated front-panel interface with optional VLAN tagging |
| Management | DHCP or static on the management interface | Containers sharing the device management network |
DHCP flow (VirtualPortGroup example)
┌────────────┐ ┌──────────────────┐ ┌──────────────────┐
│ Container │───▶│ VirtualPortGroup0│───▶│ DHCP Pool │
│ (eth0) │ │ gateway .254 │ │ /24 subnet │
│ │◀───│ │◀───│ assigns IP │
└────────────┘ └──────────────────┘ └──────────────────┘
Provider-side IP discovery always happens after deploy completes — it polls oper-data first, then falls back to ARP, then retries on the next reconcile.
Observability
See Observability for the full reference. At a high level:
- Prometheus metrics are served on the standard kubelet `/metrics/resource` endpoint and cover device CPU, memory, storage, per-interface rates/state, and CDP/OSPF neighbor counts.
- Kubernetes stats/summary is served on `/stats/summary`, enabling `kubectl top node`.
- OpenTelemetry topology traces are emitted on a configurable interval (default 60 s) and include a device root span with child link spans (one per CDP/OSPF neighbor) and app spans (one per hosted container).
- Node annotations (`cisco.io/router-id`, `cisco.io/hostname`, `cisco.io/cdp-neighbor-count`, `cisco.io/ospf-neighbor-count`, `cisco.io/protocols`) make topology context queryable via `kubectl describe node`.
gNOI and software lifecycle
gNOI operations and IOS-XE software lifecycle management have their own operator-facing section: gNOI and Software Lifecycle. At the architecture level, the important boundary is that read-only operations, write-class actions, and software upgrades use separate CRDs, separate runtime gates, and separate RBAC grants. Read-only gNOI access does not implicitly enable reboot, file write, factory reset, or OS activation.
The per-device gNOI client uses a workload-classed gRPC connection pool so
small control RPCs, gNMI telemetry streams, and bulk OS/file transfers do not
block each other. The client also caches per-service capability probes: when a
device returns codes.Unimplemented, later calls to that service fail fast
with ErrServiceUnsupported instead of repeatedly attempting an unsupported
RPC.
RESTCONF endpoints
| Operation | Method | Endpoint |
|---|---|---|
| App config (create/list) | POST/GET | /restconf/data/Cisco-IOS-XE-app-hosting-cfg:app-hosting-cfg-data/apps |
| App config (delete) | DELETE | /restconf/data/Cisco-IOS-XE-app-hosting-cfg:app-hosting-cfg-data/apps/app={id} |
| App oper-data | GET | /restconf/data/Cisco-IOS-XE-app-hosting-oper:app-hosting-oper-data/app |
| Lifecycle RPCs (install/activate/start/stop/deactivate/uninstall) | POST | /restconf/operations/Cisco-IOS-XE-app-hosting-rpcs:app-* |
| Device version | GET | /restconf/data/Cisco-IOS-XE-native:native/version |
| CDP neighbors | GET | /restconf/data/Cisco-IOS-XE-cdp-oper:cdp-neighbor-details |
| OSPF neighbors | GET | /restconf/data/Cisco-IOS-XE-ospf-oper:ospf-oper-data |
| Interfaces | GET | /restconf/data/ietf-interfaces:interfaces |
| ARP table | GET | /restconf/data/Cisco-IOS-XE-arp-oper:arp-data |
YANG models
- `Cisco-IOS-XE-app-hosting-cfg`: app config
- `Cisco-IOS-XE-app-hosting-oper`: app runtime state
- `Cisco-IOS-XE-app-hosting-rpcs`: lifecycle RPCs
- `Cisco-IOS-XE-cdp-oper`: CDP neighbor discovery
- `Cisco-IOS-XE-ospf-oper`: OSPF neighbor state
- `Cisco-IOS-XE-arp-oper`: ARP-based IP discovery
Project layout
cisco-virtual-kubelet/
├── api/v1alpha1/ # CiscoDevice API types (DeviceSpec, XEConfig, OTELConfig)
├── api/config/v1alpha1/ # IOS-XE config, telemetry, diagnostic API types
├── api/ops/v1alpha1/ # DeviceOperation, gNOI action, and software-upgrade API types
├── cmd/cisco-vk/ # Unified binary
│ ├── main.go # cobra root
│ ├── run.go # 'run' subcommand — the VK provider
│ └── manager.go # 'manager' subcommand — the controller
├── charts/cisco-virtual-kubelet/ # Helm chart (controller + RBAC + CRDs)
├── config/crd/ # Generated CRD manifests
├── internal/
│ ├── config/ # YAML/viper loader
│ ├── controller/ # CiscoDevice reconciler
│ ├── provider/ # AppHostingProvider, metrics, OTEL topology
│ └── drivers/ # Driver implementations
│ ├── common/ # Shared types (DeviceInfo, CDPNeighbor, etc.)
│ ├── iosxe/ # IOS-XE driver (see layering table above)
│ └── fake/ # Mock driver for tests
├── examples/ # Example device configs and pod manifests
└── docs/ # This documentation