Pod fails to allocate hugepages in Kubernetes

7/27/2019

I'm running a pod in Kubernetes with hugepages allocated on the host and requested in the pod spec. The Kubernetes worker is a VM, and that VM (the host) has hugepages allocated. The pod still fails to use them: the application gets SIGBUS when it writes to the first hugepage mapping.

The pod definition includes hugepages:

    securityContext:
      allowPrivilegeEscalation: true
      privileged: true
      runAsUser: 0
      capabilities:
        add: ["SYS_ADMIN", "IPC_LOCK"]
    resources:
      requests:
        intel.com/intel_sriov_netdevice : 2
        memory: 2Gi
        hugepages-2Mi: 4Gi
      limits:
        intel.com/intel_sriov_netdevice : 2
        memory: 2Gi
        hugepages-2Mi: 4Gi
    volumeMounts:
    - mountPath: /sys
      name: sysfs
    - mountPath: /dev/hugepages
      name: hugepage
      readOnly: false
  volumes:
  - name: hugepage
    emptyDir:
      medium: HugePages
  - name: sysfs
    hostPath:
      path: /sys

The VM hosting the pod has hugepages allocated:

cat /proc/meminfo | grep -i hug
AnonHugePages:         0 kB
HugePages_Total:    4096
HugePages_Free:     4096
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB

The following piece of code runs fine on the VM hosting the pod: the hugepage files are created in /dev/hugepages, and the HugePages_Free counter decreases while the process is running.

#include <stdio.h>
#include <sys/mman.h>
#include <errno.h>
#include <stdlib.h>
#include <unistd.h>
#include <fcntl.h>

#define LENGTH (2UL*1024*1024)   /* one 2 MiB huge page */
#define FILE_NAME "/dev/hugepages/hugepagefile"

/* Touch every byte so the huge page is actually faulted in. */
static void write_bytes(char *addr)
{
        unsigned long i;

        for (i = 0; i < LENGTH; i++)
                *(addr + i) = (char)i;
}

int main(void)
{
        void *addr;
        int i;
        char buf[64];
        int fd;

        for (i = 0; i < 16; i++) {
                /* One backing file per mapping: hugepagefile_0 .. hugepagefile_15 */
                snprintf(buf, sizeof(buf), "%s_%d", FILE_NAME, i);
                fd = open(buf, O_CREAT | O_RDWR, 0755);
                if (fd < 0) {
                        perror("open");
                        continue;
                }
                addr = mmap(NULL, LENGTH, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_HUGETLB, fd, 0);

                printf("address returned %p \n", addr);

                if (addr == MAP_FAILED) {
                        perror("mmap ");
                } else {
                        write_bytes(addr);
                        //munmap(addr, LENGTH);
                        //unlink(buf);
                }
                close(fd);
        }
        /* Spin forever so the mappings stay alive and HugePages_Free can be inspected. */
        while (1) {}
        return 0;
}

But if I run the same code in the pod, I get a SIGBUS while trying to write to the first hugepage allocated.
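
Since mmap succeeds and only the first write fails, one thing that may be worth ruling out is whether /dev/hugepages inside the pod is really a hugetlbfs mount with the expected page size. The sketch below is only a diagnostic idea, not a known fix: hugetlbfs_page_size is a hypothetical helper name, and it relies on statfs() reporting HUGETLBFS_MAGIC as the filesystem type and the huge page size in f_bsize for hugetlbfs mounts.

```c
#include <stdio.h>
#include <sys/vfs.h>
#include <linux/magic.h>

/* Hypothetical diagnostic helper (not part of the original repro).
 * Returns the huge page size in bytes if path is on hugetlbfs,
 * 0 if it is on some other filesystem, and -1 if statfs() fails. */
static long hugetlbfs_page_size(const char *path)
{
        struct statfs sfs;

        if (statfs(path, &sfs) != 0)
                return -1;
        /* Compare the filesystem type against the hugetlbfs magic number. */
        if ((unsigned long)sfs.f_type != HUGETLBFS_MAGIC)
                return 0;
        /* For hugetlbfs the block size is the huge page size of the mount. */
        return (long)sfs.f_bsize;
}
```

Calling hugetlbfs_page_size("/dev/hugepages") in both the VM and the pod and comparing the results would show whether the two see the same kind of mount. With the 2048 kB page size shown above, the expected value would be 2097152.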

Results on the VM (hosting the pod):

root@k8s-1:~# cat /proc/meminfo | grep -i hug
AnonHugePages:         0 kB
HugePages_Total:    4096
HugePages_Free:     4096
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
root@k8s-1:~# ./mmap  &
[1] 19428
root@k8s-1:~# address returned 0x7ffff7800000
address returned 0x7ffff7600000
address returned 0x7ffff7400000
address returned 0x7ffff7200000
address returned 0x7ffff7000000
address returned 0x7ffff6e00000
address returned 0x7ffff6c00000
address returned 0x7ffff6a00000
address returned 0x7ffff6800000
address returned 0x7ffff6600000
address returned 0x7ffff6400000
address returned 0x7ffff6200000
address returned 0x7ffff6000000
address returned 0x7ffff5e00000
address returned 0x7ffff5c00000
address returned 0x7ffff5a00000

root@k8s-1:~# cat /proc/meminfo | grep -i hug
AnonHugePages:         0 kB
HugePages_Total:    4096
HugePages_Free:     4080
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
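
The drop from 4096 to 4080 free pages matches the 16 mappings of 2 MiB made by the loop. To take the same reading from inside the process, in the VM or in the pod, a small parser for /proc/meminfo could be used. This is a minimal sketch; meminfo_value is a hypothetical helper name and is not part of the repro above.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: return the numeric value of a /proc/meminfo-style
 * field such as "HugePages_Free", or -1 if the file or field cannot be read. */
static long meminfo_value(const char *path, const char *key)
{
        char line[256];
        long value = -1;
        size_t klen = strlen(key);
        FILE *f = fopen(path, "r");

        if (f == NULL)
                return -1;
        while (fgets(line, sizeof(line), f) != NULL) {
                /* Require "key:" so a short key does not match a longer field name. */
                if (strncmp(line, key, klen) == 0 && line[klen] == ':') {
                        if (sscanf(line + klen + 1, "%ld", &value) != 1)
                                value = -1;
                        break;
                }
        }
        fclose(f);
        return value;
}
```

For example, meminfo_value("/proc/meminfo", "HugePages_Free") could be printed before and after each write_bytes call to see whether the counter moves at all inside the pod.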

Results in the pod:

Program received signal SIGBUS, Bus error.
0x00005555555547cb in write_bytes ()
(gdb) where
#0  0x00005555555547cb in write_bytes ()
#1  0x00005555555548a6 in main ()
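
As far as I understand, a write fault on a hugetlb mapping raises SIGBUS when the kernel cannot provide a huge page, and the hugetlb cgroup controller enforces its limit at page-fault time in the same way. So the limit applied to the container may be worth reading from inside the pod. The sketch below only reads one number from a file; read_ull is a hypothetical helper, and the control file path in the note after it is an assumption (cgroup v1 layout) that may differ on other setups.

```c
#include <stdio.h>

/* Hypothetical helper: read a single unsigned decimal value from a file,
 * for example a cgroup control file. Returns 0 on success, -1 on failure. */
static int read_ull(const char *path, unsigned long long *out)
{
        FILE *f = fopen(path, "r");
        int ok;

        if (f == NULL)
                return -1;
        /* Control files of this kind hold one decimal number. */
        ok = (fscanf(f, "%llu", out) == 1);
        fclose(f);
        return ok ? 0 : -1;
}
```

Assuming a cgroup v1 mount, the file to try would be /sys/fs/cgroup/hugetlb/hugetlb.2MB.limit_in_bytes, with hugetlb.2MB.usage_in_bytes and hugetlb.2MB.failcnt next to it. For reference, the 4Gi requested in the pod spec corresponds to 4294967296 bytes, while the test program needs only 16 x 2 MiB.
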
-- emartin
kubernetes
