In today’s rapidly evolving tech landscape, modern systems like Kubernetes are often seen as the cutting edge of infrastructure and application management. However, even in this world of containerization and dynamic scaling, there are moments when these advanced technologies need to interface with more traditional systems and practices.
One such area is the need for predictable network connectivity, especially when dealing with legacy firewalls and external services that require traffic to originate from a specific IP address. As we have faced this requirement multiple times at Nine, we had to come up with a solution. Let’s take a closer look at our new static egress feature.
Why do we need a solution for this?
When we designed our NKE (Nine Kubernetes Engine) product, two of the main goals were automation and the ability to offer self-service. Kubernetes and the corresponding add-ons already provide the foundation for that. For example, the cluster autoscaler adds and removes nodes based on the resource requirements of the workloads running in the cluster. One thing that is not really taken care of by Kubernetes out of the box, though, is maintaining the underlying nodes.
Traditionally, we update the installed software and the Linux kernels of our VMs with the help of a configuration management system and the package manager of the underlying Linux distribution. This also involves upgrades from one major distribution release to the next (e.g. Ubuntu 22.04 to 24.04), which requires a lot of preparation and work in advance. When we started with NKE, we did not want to go down that path and looked for a different solution.
Luckily, when designing NKE, we already had quite some experience with managing Google Kubernetes Engine (GKE) clusters and had generally been very happy with how node upgrades are done there. In a GKE cluster, node software is not updated on an already running instance; instead, machines are replaced with newer, updated ones, one after another. Our implementation of this workflow uses Flatcar Container Linux, a rolling-release Linux distribution with enhanced security. One of its added security mechanisms is an immutable file system that doesn’t allow changes to the already installed software, which prevents version and configuration drift.
But let’s get back to the upgrade mechanism itself. As the sequential upgrade workflow first adds a new machine before removing the old one, both machines run in parallel for a short amount of time. One consequence of this is that the two nodes need to use different IP addresses, which in turn affects the workloads running in the Kubernetes cluster: the outgoing IP address of cluster-external traffic depends on the IP of the node itself, as workload-initiated traffic to external targets is source-NATed to the node’s address.
After we implemented and introduced our node upgrade mechanism, some of our customers asked for the IP addresses of their NKE cluster nodes, as they wanted to configure them on external firewalls. But the node IP addresses couldn’t really be predicted anymore, as they might change during the next maintenance window. And even if they didn’t change during maintenance, cluster-autoscaled node pools could bring up new nodes with new IP addresses as well. It was clear that the dynamic world of Kubernetes doesn’t really fit into existing, more static environments.
For quite some time, the only solution to this problem was to allow all NKE subnets on external firewalls. Although this still lowers the attack risk for external services compared to allowing traffic from anywhere, I think one can agree that this was not really a satisfying solution. Moreover, we add new NKE subnets over time, which complicates things even more, as the firewall rules then need to be kept in sync with a growing list.
How did we solve it?
When we investigated how this problem could be solved, we noticed that our CNI provider, Cilium, already offers a feature called Egress Gateway. The documentation describes it like this:
“The egress gateway feature routes all IPv4 connections originating from pods and destined to specific cluster-external CIDRs through particular nodes, from now on called ‘gateway nodes’. When the egress gateway feature is enabled and egress gateway policies are in place, matching packets that leave the cluster are masqueraded with selected, predictable IPs associated with the gateway nodes.”
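To make this a bit more tangible, here is a rough sketch of what such a policy could look like, created programmatically with client-go’s dynamic client. The pod selector, the node label key and the egress IP are purely illustrative values, not our actual configuration:

```go
// Sketch: creating a CiliumEgressGatewayPolicy (cilium.io/v2) with the
// dynamic client. All names, labels and IPs are illustrative only.
package egress

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/client-go/dynamic"
	"k8s.io/client-go/rest"
)

var egressPolicyGVR = schema.GroupVersionResource{
	Group:    "cilium.io",
	Version:  "v2",
	Resource: "ciliumegressgatewaypolicies",
}

func createEgressPolicy(ctx context.Context, cfg *rest.Config) error {
	client, err := dynamic.NewForConfig(cfg)
	if err != nil {
		return err
	}

	policy := &unstructured.Unstructured{Object: map[string]interface{}{
		"apiVersion": "cilium.io/v2",
		"kind":       "CiliumEgressGatewayPolicy",
		"metadata":   map[string]interface{}{"name": "static-egress"},
		"spec": map[string]interface{}{
			// Match all pods in the cluster (empty pod selector).
			"selectors": []interface{}{
				map[string]interface{}{"podSelector": map[string]interface{}{}},
			},
			// Apply the policy to any cluster-external destination.
			"destinationCIDRs": []interface{}{"0.0.0.0/0"},
			// Masquerade matching traffic on the labelled gateway node
			// with the configured static egress IP.
			"egressGateway": map[string]interface{}{
				"nodeSelector": map[string]interface{}{
					"matchLabels": map[string]interface{}{
						"egress.example/gateway": "true",
					},
				},
				"egressIP": "203.0.113.10",
			},
		},
	}}

	// CiliumEgressGatewayPolicy is cluster-scoped, so no namespace is needed.
	_, err = client.Resource(egressPolicyGVR).Create(ctx, policy, metav1.CreateOptions{})
	return err
}
```

Conceptually, the policy ties three things together: which pods are affected, which external destinations are covered, and which node (and IP) the traffic should leave the cluster from.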
This feature could pretty much solve our problem, but there were still some issues which we needed to tackle.
First, the “gateway node” might simply disappear during a maintenance window (as it gets replaced), and another node then has to take over the “gateway node” role. This new “gateway node” might have an IP address from a completely different subnet than the previous one. So it was clear that we could not use IP addresses from the existing NKE subnets as static egress IPs.
Second, the IP address to be used as the egress IP needs to be configured on the “gateway node” by some mechanism. Cilium doesn’t assign it automatically, but instead expects it to already be present on a network interface of that node.
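Assigning that IP is essentially a one-liner on the node. As a minimal, hypothetical sketch (assuming the netlink library and an agent running with the necessary privileges on the gateway node; interface name and IP are placeholders):

```go
// Sketch: assigning the static egress IP to an interface on the gateway
// node, so Cilium can use it for masquerading. Interface name and IP are
// placeholders, not our actual values.
package egress

import (
	"github.com/vishvananda/netlink"
)

func ensureEgressIP(ifaceName, egressIP string) error {
	link, err := netlink.LinkByName(ifaceName)
	if err != nil {
		return err
	}

	addr, err := netlink.ParseAddr(egressIP + "/32")
	if err != nil {
		return err
	}

	// AddrReplace is idempotent: it adds the address if it is missing and
	// leaves it untouched if it is already configured.
	return netlink.AddrReplace(link, addr)
}
```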
To solve the first problem, we use BGP (Border Gateway Protocol) to announce routes for dedicated, independent egress IPs to our routers. We already use the same mechanism for Kubernetes services of type LoadBalancer (for ingress traffic) in our NKE clusters, so it was only logical to use it again. This makes the static egress IPs completely independent of the subnets we use for our NKE nodes.
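As an illustration of the idea (not our actual implementation), announcing such an independent egress IP as a host route could look roughly like this with GoBGP v3; the ASN, peer address and IPs are made up:

```go
// Sketch: announcing a /32 route for a static egress IP via BGP, using
// GoBGP v3. ASN, router ID, peer and IP addresses are made-up examples.
package egress

import (
	"context"

	api "github.com/osrg/gobgp/v3/api"
	"github.com/osrg/gobgp/v3/pkg/server"
	"google.golang.org/protobuf/types/known/anypb"
)

func announceEgressIP(ctx context.Context, egressIP, nodeIP string) error {
	s := server.NewBgpServer()
	go s.Serve()

	// Start a local BGP speaker on the gateway node.
	if err := s.StartBgp(ctx, &api.StartBgpRequest{
		Global: &api.Global{Asn: 64512, RouterId: nodeIP, ListenPort: -1},
	}); err != nil {
		return err
	}

	// Peer with the upstream router that should learn the egress route.
	if err := s.AddPeer(ctx, &api.AddPeerRequest{
		Peer: &api.Peer{Conf: &api.PeerConf{
			NeighborAddress: "192.0.2.1",
			PeerAsn:         64512,
		}},
	}); err != nil {
		return err
	}

	// Announce the egress IP as a host route with the gateway node as next hop.
	nlri, _ := anypb.New(&api.IPAddressPrefix{Prefix: egressIP, PrefixLen: 32})
	origin, _ := anypb.New(&api.OriginAttribute{Origin: 0})
	nextHop, _ := anypb.New(&api.NextHopAttribute{NextHop: nodeIP})

	_, err := s.AddPath(ctx, &api.AddPathRequest{
		Path: &api.Path{
			Family: &api.Family{Afi: api.Family_AFI_IP, Safi: api.Family_SAFI_UNICAST},
			Nlri:   nlri,
			Pattrs: []*anypb.Any{origin, nextHop},
		},
	})
	return err
}
```

The upstream routers then forward traffic for the egress IP to whichever node currently announces it, no matter which NKE subnet that node lives in.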
Once we had finished the investigation and done some preliminary tests, we knew what we had to do. The task was to write an agent that runs on the control-plane nodes (we run three of them) in every NKE cluster and selects one of the nodes to be the “gateway node”. It then adds a specific Kubernetes label to that node, does all the BGP work to establish proper routing for the selected static egress IP, and configures that IP on the node itself. If the selected egress node gets removed during maintenance (or just stops working because of other issues), the agent chooses a different node and reconfigures everything accordingly. Luckily, Kubernetes already offers well-established building blocks for leader election (the leader being the selected egress node) which we were able to reuse. After a month of work and testing, the agent was finished.
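The core of the agent is therefore little more than leader election plus reconciliation. Here is a condensed, hypothetical sketch using the leader election helpers from client-go (lock name, namespace and label key are illustrative); the elected leader labels the chosen node as the gateway and triggers the IP and BGP configuration shown above:

```go
// Sketch: leader election between the control-plane agents; the elected
// leader labels the gateway node and (re)configures the egress IP and BGP.
// Lock name, namespace and label key are illustrative only.
package egress

import (
	"context"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/types"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/leaderelection"
	"k8s.io/client-go/tools/leaderelection/resourcelock"
)

func runAgent(ctx context.Context, client kubernetes.Interface, nodeName string) {
	lock := &resourcelock.LeaseLock{
		LeaseMeta:  metav1.ObjectMeta{Name: "static-egress-agent", Namespace: "kube-system"},
		Client:     client.CoordinationV1(),
		LockConfig: resourcelock.ResourceLockConfig{Identity: nodeName},
	}

	leaderelection.RunOrDie(ctx, leaderelection.LeaderElectionConfig{
		Lock:            lock,
		LeaseDuration:   15 * time.Second,
		RenewDeadline:   10 * time.Second,
		RetryPeriod:     2 * time.Second,
		ReleaseOnCancel: true,
		Callbacks: leaderelection.LeaderCallbacks{
			OnStartedLeading: func(ctx context.Context) {
				// This agent won the election: label the selected gateway
				// node, then configure the egress IP and BGP announcement
				// (see the sketches above) and keep reconciling them.
				_ = labelGatewayNode(ctx, client, nodeName)
			},
			OnStoppedLeading: func() {
				// Leadership lost (e.g. the node is being replaced during
				// maintenance): another control-plane agent takes over.
			},
		},
	})
}

func labelGatewayNode(ctx context.Context, client kubernetes.Interface, node string) error {
	// The egress gateway policy selects the gateway node via this label.
	patch := []byte(`{"metadata":{"labels":{"egress.example/gateway":"true"}}}`)
	_, err := client.CoreV1().Nodes().Patch(ctx, node, types.StrategicMergePatchType, patch, metav1.PatchOptions{})
	return err
}
```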
The rest of the static egress feature consisted of integrating Cilium’s “egress gateway” and our agent configuration into one setup workflow and exposing it via our API. Because Deploio is also based on NKE, it was easy to make the static egress feature available there as well, and the same goes for vClusters. Once the documentation was ready, it was time to make the feature available to customers.
Having run the feature in production for quite some time now, we can say it has proven to be a reliable solution for offering predictable egress identities.