The Zero-Drift Frontier: Modern Edge Demands on Kubernetes

Edge computing has come to the fore. Too often, it is conflated with the Internet of Things (IoT), the umbrella term for devices ranging from smart city sensors to kiosk computers to wind turbine telemetry units.

Today, the notion of edge has become a higher-stakes component of the enterprise IT stack. Where we used to think of edge devices as nice-to-have additions to the core enterprise server estate, they have become first-class citizens in terms of operational importance. When an edge device goes down today, a car fails to make it along the production line, or a traffic light stops working during rush hour and causes havoc in the city center.

Many in the technology industry now regard edge computing as the frontline for digital sovereignty. Edge today means running workloads outside traditional datacenters in order to reduce latency and improve reliability. But life out in the real world (in the field) is tough. It’s rather like the moment you take your smartphone into an elevator and lose the signal: a disconnected edge device leaves a computing transaction hanging in limbo, and that’s just not acceptable for many modern businesses.

In a retail outfit or a manufacturing site, you simply cannot depend on 100% connectivity. At Tesla (where I worked previously), the production line is engineered to enable the company to make a car per minute. Every second that the organization’s central management software isn’t working, whether because of a cloud connectivity issue or some kind of misconfiguration or mismanagement, represents a significant portion of a car that the company is unable to make.

All of this leads us to something of a paradox. In a deployment like the Tesla car factory example, there are thousands of computing endpoints across the edge estate, along with the central cloud-native technology deployment that this type of operation necessitates. Bringing it all together is a job ripe for Kubernetes, the container orchestration technology that now represents something of a de facto standard for the industry.

But traditional Kubernetes is a bear to manage. A bear is weighty, clunky, cumbersome, rarely agile and occasionally dangerous; Kubernetes at this level is too heavy in its computing resource requirements and too risky to consider as a platform for a fine-tuned, fragile edge deployment. Consider a Raspberry Pi stationed on an outcrop on a coral reef or, closer to home, a kiosk computer in a rural drug store: neither has the resources, nor the on-site staff, to carry a full-scale cluster.

The traditional approach to scaling Kubernetes has been to create massive clusters and carve them into namespaces, the mechanism for creating a virtual cluster within a physical cluster. The problems start to arise when we have 100 tenants on the same cluster and then need to upgrade the whole thing. In practice, it’s impossible to coordinate an upgrade window, because someone is always using the system. We quickly reach a point where lifecycle management becomes impossible.
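As a sketch of what that carving looks like in practice (the tenant name and quota values here are illustrative, not from the article), one namespace per tenant is created on the shared cluster and then fenced off with a resource quota:

```shell
# Carve a "virtual cluster" for one tenant out of the shared physical cluster.
kubectl create namespace tenant-a

# Cap what the tenant can consume so its neighbours are not starved.
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: ResourceQuota
metadata:
  name: tenant-a-quota
  namespace: tenant-a
spec:
  hard:
    requests.cpu: "4"
    requests.memory: 8Gi
    pods: "20"
EOF
```

Note that this isolates workloads, not the cluster itself: all 100 tenants still share one control plane and one upgrade cycle, which is exactly where the coordination problem described above begins.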

Software Snowflakes

All of which gives us something we can call the snowflake problem (like snowflakes, no two are alike): hundreds of edge locations that all look slightly different because they couldn’t be updated simultaneously. When a security patch or a new retail feature needs to roll out, some boxes restart, some hang and others are blocked by local firewalls. Without a centralized way to manage this drift, the edge becomes a graveyard of unpatched, out-of-sync hardware.

But there are ways forward, so let’s not give up hope. We can look at the evolution of zero-dependency distributions, which strip Kubernetes down to a single binary. The footprint becomes small enough to run in air-gapped environments and in locations with intermittent, inconsistent Internet connectivity: a Navy warship at the extreme end, or, slightly closer to home, a remote sensor on an oil rig.
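As one illustration, a single-binary distribution such as Mirantis’s k0s (used here purely as an example; exact commands may vary by version, and the download step obviously assumes connectivity, so in a truly air-gapped site the binary would be copied over by hand) can stand up a complete single-node cluster like this:

```shell
# Fetch the single k0s binary via the standard installer script.
curl -sSLf https://get.k0s.sh | sudo sh

# Install and start a combined controller+worker node; the entire control
# plane and runtime live inside that one binary, with no external packages.
sudo k0s install controller --single
sudo k0s start

# Verify the node came up, using the bundled kubectl.
sudo k0s kubectl get nodes
```

The point is not the specific tool but the operational model: one artifact to ship, one process to supervise, nothing else to keep patched on the box.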

To solve this management issue, organizations need a holistic control plane designed for distributed execution: the enterprise IT team manages centrally while execution happens on a distributed basis. The tech team needs visibility into the outliers, so that they know which three out of 500 locations didn’t update, and why. Then remediation can start, as the team works out whether an update failed due to a network issue, a botched restart or some other reason. Either way, they will need an audit trail to maintain compliance at scale.
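A toy sketch of the kind of fleet report such a control plane needs to surface (the file path, site names, versions and status fields are all hypothetical):

```shell
# Hypothetical fleet status feed: site, Kubernetes version, result, reason.
cat > /tmp/fleet-status.txt <<'EOF'
store-001 v1.29.4 ok -
store-002 v1.29.4 ok -
store-003 v1.28.9 failed network-timeout
store-004 v1.29.4 ok -
store-005 v1.28.9 failed restart-hung
EOF

# Surface the outliers: which sites did not converge, and why.
awk '$3 != "ok" { print $1, $4 }' /tmp/fleet-status.txt
# -> store-003 network-timeout
#    store-005 restart-hung
```

A real control plane does far more (scheduling, retries, the audit trail mentioned above), but the core question it answers is exactly this one-liner’s: which locations drifted, and for what reason.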

Cloud Connectivity Conundrums

As the Broadcom-VMware acquisition sends shockwaves through IT departments, many CIOs are looking for an escape route to manage their increasingly hybrid multi-cloud computing deployments. While the knee-jerk reaction is to “move it all to the cloud”, that doesn’t always work because (as we have hopefully established here) cloud and connectivity are not always a given.

To people running VMware right now, the thought of running Kubernetes at the edge is daunting; many assume it’s simply not an option. But the days of having to ‘roll your own’ are gone.

Today’s edge isn’t just about occasionally connected pipes, it’s about creating a standardized layer that allows a lean team to manage thousands of instances with the same ease as a single cloud cluster. Whether it’s a diagnostic tool plugged into a car’s OBD-II port or a 3D printer churning out Invisalign aligners, the goal is the same: robustness through simplification.

The Future: Performance Anywhere

Looking to the future, enterprise IT departments shouldering edge computing will operate with new goals defined by the architectural constructs described here. By building on Cloud Native Computing Foundation (CNCF)-validated, open source foundations, companies can finally stop treating the edge like a specialized exception and start treating it like the robust, automated extension of the datacenter it was always meant to be.

Edge is no longer a place where software goes to get stuck; it’s where it goes to work.

#  #  #

Jerry Ibrahim, head of engineering at Mirantis, has more than 30 years of experience innovating in technology. He worked previously as IT CTO at VMware, and has held executive roles at Tesla, Align Technology, and Juniper Networks.

This article was submitted by Mirantis.