Walking from K8s Service name resolution through kube-proxy's iptables load balancing to the Pods

kubernetes

A Kubernetes Service has an FQDN like my-svc.my-namespace.svc.cluster-domain.example, and a request to it reaches the Pods matched by the Service's selector. Let's walk through how this works. The Kubernetes version is 1.28 and the cluster is set up with EKS.

First, name resolution from the Service's FQDN to its ClusterIP is done by querying kube-dns (CoreDNS), according to the nameserver setting in each Pod's /etc/resolv.conf. Incidentally, this nameserver IP is itself the ClusterIP of the kube-dns Service, so queries to it are routed in the same way described below.

$ cat /etc/resolv.conf
search testapp.svc.cluster.local svc.cluster.local cluster.local ap-northeast-1.compute.internal
nameserver 172.20.0.10
options ndots:5
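
The resolv.conf above also shapes how a short Service name gets resolved: with options ndots:5, a name containing fewer than five dots is first tried with each search suffix appended, which is why plain my-svc resolves within the Pod's own namespace. A minimal sketch of that expansion logic (modeling the resolver's candidate ordering only, not an actual DNS query; the search list and ndots value are taken from the resolv.conf above):

```python
# Sketch of glibc-style search-list expansion under "options ndots:5".
# Names with fewer than `ndots` dots are tried with each search suffix
# first, and only then as-is; names with >= ndots dots go as-is first.

SEARCH = [
    "testapp.svc.cluster.local",
    "svc.cluster.local",
    "cluster.local",
    "ap-northeast-1.compute.internal",
]
NDOTS = 5

def candidates(name: str, search=SEARCH, ndots=NDOTS):
    if name.endswith("."):           # trailing dot: absolute name, no expansion
        return [name]
    expanded = [f"{name}.{s}" for s in search]
    if name.count(".") >= ndots:
        return [name] + expanded     # enough dots: try the name as-is first
    return expanded + [name]         # otherwise walk the search list first

# A short name like "my-svc" hits the namespace-local suffix first:
print(candidates("my-svc")[0])       # my-svc.testapp.svc.cluster.local
```

This is also why ndots:5 can generate several extra DNS queries for external names: something like example.com (one dot) walks the whole search list before being tried as-is.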

A Service provides load balancing, which is implemented with iptables. kube-proxy, which runs as a DaemonSet, keeps the iptables rules up to date. From the name you might imagine something like a proxy server, but since it does not actually receive traffic, it runs with very small resource requests (cpu: 100m). kube-proxy also has modes other than iptables, such as IPVS, but I will not cover them this time.

iptables is a configuration tool for Netfilter, which performs packet filtering, NAT, and so on. You can inspect the nat table with the following command on the host. The KUBE-SERVICES chain contains a KUBE-SVC rule for each Service, matching its ClusterIP as the destination, plus a KUBE-NODEPORTS rule; KUBE-NODEPORTS in turn reaches KUBE-SVC via KUBE-EXT. Each KUBE-SVC chain contains KUBE-SEP rules, which DNAT to the IP address of an individual Pod, and one of them is selected at random for load balancing. kube-proxy watches the Kubernetes API and adds/removes rules as Pods become Healthy/Unhealthy.

$ sudo iptables -t nat -L -n
Chain PREROUTING (policy ACCEPT)
target     prot opt source               destination
KUBE-SERVICES  all  --  0.0.0.0/0            0.0.0.0/0            /* kubernetes service portals */
...

Chain KUBE-SERVICES (2 references)
target     prot opt source               destination
KUBE-SVC-CE6JHM1SEKP18XIF  tcp  --  0.0.0.0/0            172.20.36.232        /* testapp/testapp-svc cluster IP */ tcp dpt:8080
...
KUBE-NODEPORTS  all  --  0.0.0.0/0            0.0.0.0/0            /* kubernetes service nodeports; NOTE: this must be the last rule in this chain */ ADDRTYPE match dst-type LOCAL

Chain KUBE-NODEPORTS (1 references)
target     prot opt source               destination
KUBE-EXT-W1EMVGLJN3U6KBK2  tcp  --  0.0.0.0/0            0.0.0.0/0            /* testapp/testapp2-svc */ tcp dpt:31218
...

Chain KUBE-EXT-W1EMVGLJN3U6KBK2 (1 references)
target     prot opt source               destination
KUBE-MARK-MASQ  all  --  0.0.0.0/0            0.0.0.0/0            /* masquerade traffic for testapp/testapp2-svc external destinations */
KUBE-SVC-W1EMVGLJN3U6KBK2  all  --  0.0.0.0/0            0.0.0.0/0

Chain KUBE-SVC-CE6JHM1SEKP18XIF (2 references)
target     prot opt source               destination
KUBE-SEP-F3KEHKXHH6PPCNMD  all  --  0.0.0.0/0            0.0.0.0/0            /* testapp/testapp-svc -> 10.161.33.109:8080 */ statistic mode random probability 0.01351351338
KUBE-SEP-GONI7XQNZNJEEBEO  all  --  0.0.0.0/0            0.0.0.0/0            /* testapp/testapp-svc -> 10.161.33.116:8080 */ statistic mode random probability 0.01369863003
KUBE-SEP-C2L3SYA5JOE66TCB  all  --  0.0.0.0/0            0.0.0.0/0            /* testapp/testapp-svc -> 10.161.33.129:8080 */ statistic mode random probability 0.01388888899
...
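
The probabilities in the KUBE-SVC chain look uneven, but that is exactly what makes the selection uniform: the rules are evaluated top to bottom, so with n endpoints, rule i (0-indexed) is given match probability 1/(n-i), and the last rule matches unconditionally. The values above (≈1/74, 1/73, 1/72) suggest this Service has 74 endpoints. A quick check of the arithmetic, assuming n = 74:

```python
# With n endpoints, kube-proxy gives rule i the probability 1/(n-i) so
# that every endpoint ends up selected with overall probability 1/n.
n = 74  # implied by the listing above: 0.01351351338 ~= 1/74

def overall(i: int, n: int) -> float:
    """Probability that rule i fires: all earlier rules miss, rule i hits."""
    p = 1.0
    for j in range(i):
        p *= 1 - 1 / (n - j)    # earlier rule j did not match
    return p * (1 / (n - i))    # rule i matches (last rule: 1/(n-i) = 1)

# Every endpoint comes out at the same 1/74, despite the uneven per-rule values:
print(round(overall(0, n), 6), round(overall(36, n), 6), round(overall(73, n), 6))
```

The product telescopes: the chance of reaching rule i is (n-i)/n, and multiplying by the rule's own 1/(n-i) leaves exactly 1/n for every endpoint.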

Chain KUBE-SEP-F3KEHKXHH6PPCNMD (1 references)
target     prot opt source               destination
KUBE-MARK-MASQ  all  --  10.161.33.109        0.0.0.0/0            /* testapp/testapp-svc */
DNAT       tcp  --  0.0.0.0/0            0.0.0.0/0            /* testapp/testapp-svc */ tcp to:10.161.33.109:8080
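
To see the two rules of a KUBE-SEP chain in one place: KUBE-MARK-MASQ only matches when the source is the endpoint Pod itself (hairpin traffic, where a Pod reaches itself through the Service and must be masqueraded so the reply loops back through the node), and the DNAT rule then rewrites the destination to the Pod. A toy model of this chain (hypothetical helper for illustration, not kube-proxy code; IPs are from the listing above):

```python
# Toy model of the KUBE-SEP-F3KEHKXHH6PPCNMD chain above: mark hairpin
# traffic for masquerading, then DNAT the packet to the endpoint Pod.
ENDPOINT = ("10.161.33.109", 8080)

def kube_sep(packet: dict) -> dict:
    pkt = dict(packet)
    # KUBE-MARK-MASQ: if the source is the endpoint itself, the Pod is
    # talking to itself via the Service; mark it so it gets SNATed later.
    if pkt["src"] == ENDPOINT[0]:
        pkt["masq"] = True
    # DNAT: rewrite the destination from ClusterIP:port to the Pod's IP:port.
    pkt["dst"], pkt["dport"] = ENDPOINT
    return pkt

p = kube_sep({"src": "10.161.33.200", "dst": "172.20.36.232", "dport": 8080, "masq": False})
print(p["dst"], p["dport"], p["masq"])   # 10.161.33.109 8080 False
```

After this DNAT, ordinary routing delivers the packet to the Pod's IP; conntrack remembers the translation so replies are un-NATed on the way back.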

References

kube-proxy入門 (Introduction to kube-proxy)

ネットワークの概要 (Network overview) | Google Kubernetes Engine (GKE) | Google Cloud