Set net.ipv4.tcp_timestamps=0 in a Pod

2/9/2020

I would like to set net.ipv4.tcp_timestamps=0 in the Pods of my k8s cluster. However, it gives me this error:

sysctl: cannot stat /proc/sys/net/ipv4/tcp_timestamps: No such file or directory

The same issue occurs with a container created directly with the docker run command. Is there any way to set the parameter in a container/Pod? Thanks.

-- James Pei
kernel
kubernetes
pod
sysctl

1 Answer

2/10/2020

In Kubernetes, sysctls are grouped into safe and unsafe.

In addition to proper namespacing, a safe sysctl must be properly isolated between pods on the same node. This means that setting a safe sysctl for one pod

  • must not have any influence on any other pod on the node
  • must not allow harming the node’s health
  • must not allow gaining CPU or memory resources outside of the resource limits of a pod.

By far, most of the namespaced sysctls are not necessarily considered safe. The following sysctls are supported in the safe set:

  • kernel.shm_rmid_forced,
  • net.ipv4.ip_local_port_range,
  • net.ipv4.tcp_syncookies.

All safe sysctls are enabled by default. All unsafe sysctls are disabled by default and must be allowed manually by the cluster admin on each node:

kubelet --allowed-unsafe-sysctls \
 'kernel.msg*,net.core.somaxconn' ...

For Minikube, this can be done via the extra-config flag:

minikube start --extra-config="kubelet.allowed-unsafe-sysctls=kernel.msg*,net.core.somaxconn"...

Only namespaced sysctls can be enabled this way.

This is described in the Enabling Unsafe Sysctls section of the Kubernetes documentation.

As for setting sysctls for a Pod:

A number of sysctls are namespaced in today’s Linux kernels. This means that they can be set independently for each pod on a node. Only namespaced sysctls are configurable via the pod securityContext within Kubernetes.

The following sysctls are known to be namespaced. This list could change in future versions of the Linux kernel.

  • kernel.shm*,
  • kernel.msg*,
  • kernel.sem,
  • fs.mqueue.*,
  • The parameters under net.* that can be set in container networking namespace. However, there are exceptions (e.g., net.netfilter.nf_conntrack_max and net.netfilter.nf_conntrack_expect_max can be set in container networking namespace but they are unnamespaced).
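Whether a given net.* parameter is visible inside a container depends on the node's kernel, and the "No such file or directory" error in the question may indicate that the running kernel does not expose tcp_timestamps in the container's network namespace. A minimal sketch for checking this from inside a container follows; the helper names sysctl_path and sysctl_visible are hypothetical, and the simple dot-to-slash mapping is an assumption that holds for names like the ones discussed here.

```shell
#!/bin/sh
# Hypothetical helpers: map a sysctl name to its file under /proc/sys
# and test whether that file exists in the current namespace.

# sysctl_path NAME [ROOT]  -> prints ROOT/NAME with dots turned into slashes
sysctl_path() {
    printf '%s/%s\n' "${2:-/proc/sys}" "$(printf '%s' "$1" | tr '.' '/')"
}

# sysctl_visible NAME [ROOT]  -> succeeds if the sysctl file exists
sysctl_visible() {
    [ -e "$(sysctl_path "$1" "${2:-}")" ]
}

# Example: run this inside the container that reported the error.
if sysctl_visible net.ipv4.tcp_timestamps; then
    echo "net.ipv4.tcp_timestamps is visible in this namespace"
else
    echo "net.ipv4.tcp_timestamps is not visible in this namespace"
fi
```

If the parameter is not visible, it cannot be set from the pod and would have to be handled as a node-level sysctl.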

Sysctls with no namespace are called node-level sysctls. If you need to set them, you must manually configure them on each node’s operating system, or by using a DaemonSet with privileged containers.
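For the manual, per-node route, one possible sketch is to write a drop-in file and reload it. The function name write_sysctl_dropin and the 99- file name prefix are hypothetical; /etc/sysctl.d is assumed to be the drop-in directory, which is common but not universal across distributions.

```shell
#!/bin/sh
# Hypothetical helper: persist a node-level sysctl as a drop-in file.
# write_sysctl_dropin NAME VALUE [CONF_DIR]
write_sysctl_dropin() {
    name=$1
    value=$2
    conf_dir=${3:-/etc/sysctl.d}   # assumed default location
    printf '%s = %s\n' "$name" "$value" > "$conf_dir/99-$name.conf"
}

# On each node, as root:
#   write_sysctl_dropin net.ipv4.tcp_timestamps 0
#   sysctl --system    # reload settings from the drop-in directories
```

Because this changes the setting for the whole node, it affects every pod on that node, not only the one that needs it.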

Use the pod securityContext to configure namespaced sysctls. The securityContext applies to all containers in the same pod.

This example uses the pod securityContext to set a safe sysctl kernel.shm_rmid_forced and two unsafe sysctls net.core.somaxconn and kernel.msgmax. There is no distinction between safe and unsafe sysctls in the specification.

apiVersion: v1
kind: Pod
metadata:
  name: sysctl-example
spec:
  securityContext:
    sysctls:
    - name: kernel.shm_rmid_forced
      value: "0"
    - name: net.core.somaxconn
      value: "1024"
    - name: kernel.msgmax
      value: "65536"
  ...
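Applied to the parameter from the question, the same pattern might look like the sketch below. It assumes the node's kernel exposes net.ipv4.tcp_timestamps per network namespace and that the kubelet was started with --allowed-unsafe-sysctls 'net.ipv4.tcp_timestamps', since this parameter is not in the safe set listed above. The pod name, container name, image, and file name are placeholders.

```shell
#!/bin/sh
# Write a pod manifest that requests net.ipv4.tcp_timestamps=0.
# All names and the image below are placeholders, not from the question.
cat > tcp-timestamps-pod.yaml <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: tcp-timestamps-example
spec:
  securityContext:
    sysctls:
    - name: net.ipv4.tcp_timestamps
      value: "0"
  containers:
  - name: main
    image: busybox
    command: ["sleep", "3600"]
EOF

# Then, against a cluster whose kubelet allows this sysctl:
#   kubectl apply -f tcp-timestamps-pod.yaml
```

If the kubelet does not allow the sysctl, or the kernel does not namespace it, the pod is not expected to start, and the node-level approach is the remaining option.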

You may also be interested in the following StackOverflow questions: Pros and cons of disabling TCP timestamps and What benefit is conferred by TCP timestamp?

-- Crou
Source: StackOverflow