Do I benefit from the 96 vCPUs of a Google Compute Engine instance if Python's multiprocessing.cpu_count() returns 64?

5/22/2018

I am running a python parallel CPU intensive task on Google Compute Engine. Hence, the greater the number of vCPUs I can run it on, the greater the speed.

I've read that there is no point in creating a multiprocessing pool larger than the number of available vCPUs, which makes sense, so I determine the size of my multiprocessing.dummy.Pool using multiprocessing.cpu_count().
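
For illustration, the pool setup looks roughly like this (a minimal sketch; the work function here is only a placeholder for the real task):

import multiprocessing
from multiprocessing.dummy import Pool

def work(item):
    # placeholder (hypothetical) for the real CPU-intensive task
    return item * item

# pool sized to the number of vCPUs Python detects
with Pool(multiprocessing.cpu_count()) as pool:
    results = pool.map(work, range(1000))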

I am running this script in a Pod on Google Kubernetes Engine, and during development I tested it on machines with fewer than 96 vCPUs. The automatically determined pool size always seemed to match the number of vCPUs. However, on a machine with 96 vCPUs, multiprocessing.cpu_count() returns 64, not 96. I don't mind setting that size to 96 manually, but the question is: will I benefit from those extra 32 vCPUs if Python is not "aware" of them?

The machine is a n1-highcpu-96 (96 vCPUs, 86.4 GB memory) running the Container-Optimized OS (cos). Python version is 3.6.3.

-- xEc
docker
google-kubernetes-engine
kubernetes
multiprocessing
python

1 Answer

8/1/2019

There is an answer on the message board that someone linked to in a comment on the question; however, it seems better to have the answer on this page as well, along with some explanation.

The short answer: inside a pod, run grep -c ^processor /proc/cpuinfo - this number should agree with multiprocessing.cpu_count(). If it does, you can trust multiprocessing.cpu_count().
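
As a quick way to compare the two numbers from inside the pod, here is a minimal sketch (assuming a Linux node where /proc/cpuinfo is readable):

import multiprocessing
from pathlib import Path

# count the "processor" entries, same as grep -c ^processor /proc/cpuinfo
proc_count = sum(
    1 for line in Path("/proc/cpuinfo").read_text().splitlines()
    if line.startswith("processor")
)

print("processors in /proc/cpuinfo:", proc_count)
print("multiprocessing.cpu_count(): ", multiprocessing.cpu_count())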

However, AFAICT, this identifies all the cores on the node and completely ignores the resource limits set in your Kubernetes deployment YAML. For example, your deployment file might contain:

spec:
  containers:
  - image: IMAGENAME
    name: LABEL
    ports:
    - containerPort: 5000
    resources:
      limits:
        cpu: 100m
        memory: 400M
      requests:
        cpu: 50m
        memory: 200M

In this article, the author gives the following function, which respects the resource limits (not requests):

import math
from pathlib import Path


def get_cpu_quota_within_docker():
    cpu_cores = None

    cfs_period = Path("/sys/fs/cgroup/cpu/cpu.cfs_period_us")
    cfs_quota = Path("/sys/fs/cgroup/cpu/cpu.cfs_quota_us")

    if cfs_period.exists() and cfs_quota.exists():
        # we are in a linux container with cpu quotas!
        with cfs_period.open('rb') as p, cfs_quota.open('rb') as q:
            p, q = int(p.read()), int(q.read())

            # get the cores allocated by dividing the quota
            # in microseconds by the period in microseconds
            cpu_cores = math.ceil(q / p) if q > 0 and p > 0 else None

    return cpu_cores

So, for the example YAML, the division yields 0.1, but because of the call to math.ceil the function returns 1. So what you may be looking for is something like the following (assuming you have get_cpu_quota_within_docker defined as above):

import multiprocessing

from somewhere import get_cpu_quota_within_docker

docker_cpus = get_cpu_quota_within_docker()
cpu_count = docker_cpus if docker_cpus else multiprocessing.cpu_count()
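
The falsy-check fallback means that on a machine without the cgroup quota files, or when the quota is unlimited (cpu.cfs_quota_us reads -1, so the function returns None), the code simply falls back to multiprocessing.cpu_count().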
-- Sam H.
Source: StackOverflow