KubernetesPodOperator: how to use cmds, or cmds and arguments, to run multiple commands

1/27/2022

I'm using GCP Composer to run an algorithm, and at the end of the stream I want to run a task that performs several operations: copying and deleting files and folders from a volume to a bucket. I'm trying to perform these copy and delete operations via a KubernetesPodOperator. I'm having a hard time finding the right way to run several commands using "cmds"; I also tried combining "cmds" with "arguments". Here is my KubernetesPodOperator and the cmds and arguments combinations I tried:

# Imports assumed for Cloud Composer 1 / Airflow 1.10 contrib modules, based on
# the kubernetes_pod_operator / Volume / VolumeMount names used below:
from airflow.contrib.operators import kubernetes_pod_operator
from airflow.contrib.kubernetes.volume import Volume
from airflow.contrib.kubernetes.volume_mount import VolumeMount

post_algo_run = kubernetes_pod_operator.KubernetesPodOperator(
    task_id="multi-coher-post-operations",
    name="multi-coher-post-operations",
    namespace="default",
    image="google/cloud-sdk:alpine",
    
    ### doesn't work ###
    cmds=["gsutil", "cp", "/data/splitter-output\*.csv",  "gs://my_bucket/data" , "&" , "gsutil", "rm", "-r", "/input"], 
    #Error:
        #[2022-01-27 09:31:38,407] {pod_manager.py:197} INFO - CommandException: Destination URL must name a directory, bucket, or bucket
        #[2022-01-27 09:31:38,408] {pod_manager.py:197} INFO - subdirectory for the multiple source form of the cp command.
    ####################

    ### doesn't work ###
    # cmds=["gsutil", "cp", "/data/splitter-output\*.csv",  "gs://my_bucket/data ;","gsutil", "rm", "-r", "/input"],
        # [2022-01-27 09:34:06,865] {pod_manager.py:197} INFO - CommandException: Destination URL must name a directory, bucket, or bucket
        # [2022-01-27 09:34:06,866] {pod_manager.py:197} INFO - subdirectory for the multiple source form of the cp command.
    ####################

    ### only performs the first command - only copying ###
    # cmds=["bash", "-cx"],
    # arguments=["gsutil cp /data/splitter-output\*.csv gs://my_bucket/data","gsutil rm -r /input"],                                    
        # [2022-01-27 09:36:09,164] {pod_manager.py:197} INFO - + gsutil cp '/data/splitter-output*.csv' gs://my_bucket/data
        # [2022-01-27 09:36:11,200] {pod_manager.py:197} INFO - Copying file:///data/splitter-output\Coherence Results-26-Jan-2022-1025Part1.csv [Content-Type=text/csv]...
        # [2022-01-27 09:36:11,300] {pod_manager.py:197} INFO - / [0 files][    0.0 B/ 93.0 KiB]                                                
        # / [1 files][ 93.0 KiB/ 93.0 KiB]
        # [2022-01-27 09:36:11,302] {pod_manager.py:197} INFO - Operation completed over 1 objects/93.0 KiB.
        # [2022-01-27 09:36:12,317] {kubernetes_pod.py:459} INFO - Deleting pod: multi-coher-post-operations.d66b4c91c9024bd289171c4d3ce35fdd
    ####################


    volumes=[
        Volume(
            name="nfs-pvc",
            configs={
                "persistentVolumeClaim": {"claimName": "nfs-pvc"}
            },
        )
    ],
    volume_mounts=[
        VolumeMount(
            name="nfs-pvc",
            mount_path="/data/",
            sub_path=None,
            read_only=False,
        )
    ],
)
-- Amit Lipman
airflow
google-cloud-composer
kubernetes
python

2 Answers

2/7/2022

I found a technique for running multiple commands. First I worked out the relationship between the KubernetesPodOperator's cmds and arguments properties and Docker's ENTRYPOINT and CMD.

The KubernetesPodOperator's cmds overrides the Docker image's original ENTRYPOINT, and its arguments is equivalent to Docker's CMD.

So in order to run multiple commands from the KubernetesPodOperator, I used the following syntax. I set cmds to run bash with -c:

cmds=["/bin/bash", "-c"],

And I set arguments to run two echo commands separated by &&:

arguments=["echo hello && echo goodbye"],

So my KubernetesPodOperator looks like this:

# Import assumed for Airflow 2's cncf.kubernetes provider:
from airflow.providers.cncf.kubernetes.operators.kubernetes_pod import KubernetesPodOperator

stajoverflow_test = KubernetesPodOperator(
    task_id="stajoverflow_test",
    name="stajoverflow_test",
    namespace="default",
    image="google/cloud-sdk:alpine",
    cmds=["/bin/bash", "-c"],
    arguments=["echo hello && echo goodbye"],
)
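
Applied to the copy-and-delete flow from the original question, the same pattern would look something like the sketch below (untested; the image, bucket, and paths are taken from the question, and the volume configuration is omitted for brevity):

post_algo_run = KubernetesPodOperator(
    task_id="multi-coher-post-operations",
    name="multi-coher-post-operations",
    namespace="default",
    image="google/cloud-sdk:alpine",
    cmds=["/bin/bash", "-c"],
    # Both gsutil calls run in a single shell; "&&" only runs the rm
    # if the cp succeeded.
    arguments=["gsutil cp /data/splitter-output*.csv gs://my_bucket/data"
               " && gsutil rm -r /input"],
    # volumes=... / volume_mounts=... as in the question, so that /data
    # resolves to the NFS persistent volume claim.
)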
-- Amit Lipman
Source: StackOverflow

1/28/2022

For your first command, you need to make sure that inside your Docker image you are able to reach the working directory, so it can find the file /data/splitter-output\*.csv:

"gsutil", "cp", "/data/splitter-output*.csv", "gs://my_bucket/data"

You can test your commands on your Docker image by using docker run, so you can verify that you are passing the commands correctly.

For your second statement, if you are referring to a path inside your Docker image, again use docker run to test it. If you are referring to Google Cloud Storage, you have to provide the full path:

"gsutil", "rm", "-r", "/input"

It's worth mentioning that the ENTRYPOINT runs once the container starts, as described in Understand how CMD and ENTRYPOINT interact. As mentioned in the comments, if you look at the code, cmds replaces the Docker image's ENTRYPOINT. It is also recommended to follow the guidelines in Define a Command and Arguments for a Container.
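
As a minimal illustration of that mapping (a hypothetical demo task; any image with a default ENTRYPOINT behaves the same way):

# cmds maps to the pod spec's `command`, which replaces the image's ENTRYPOINT;
# arguments maps to the pod spec's `args`, which replaces the image's CMD.
entrypoint_demo = KubernetesPodOperator(
    task_id="entrypoint-demo",
    name="entrypoint-demo",
    namespace="default",
    image="google/cloud-sdk:alpine",
    cmds=["echo"],                      # runs `echo ...` instead of the image's entrypoint
    arguments=["entrypoint replaced"],  # passed as args to the new command
)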

-- Betjens
Source: StackOverflow