I'm using an ffmpeg Docker image from a KubernetesPodOperator() inside Airflow to extract frames from a video.
It works fine, but I am not able to retrieve the frames that are produced: how can I store the frames generated by the Pod directly on my file system (the host machine)?
Update:
From https://airflow.apache.org/kubernetes.html# I think I figured out that I need to work on the volume_mount, volume_config and volume parameters, but still no luck.
Error message:
"message":"Not found: \"test-volume\"","field":"spec.containers[0].volumeMounts[0].name"
PV and PVC:
The command kubectl get pv,pvc test-volume gives:
NAME                           CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                 STORAGECLASS   REASON   AGE
persistentvolume/test-volume   10Gi       RWO            Retain           Bound    default/test-volume   manual                  3m

NAME                                STATUS   VOLUME        CAPACITY   ACCESS MODES   STORAGECLASS   AGE
persistentvolumeclaim/test-volume   Bound    test-volume   10Gi       RWO            manual         3m
Code:
volume_mount = VolumeMount('test-volume',
                           mount_path='/',
                           sub_path=None,
                           read_only=False)

volume_config = {
    'persistentVolumeClaim': {
        'claimName': 'test-volume'  # uses the PersistentVolumeClaim given in the Kube yaml
    }
}

volume = Volume(name="test-volume", configs=volume_config)
with DAG('test_kubernetes',
         default_args=default_args,
         schedule_interval=schedule_interval,
         ) as dag:

    extract_frames = KubernetesPodOperator(namespace='default',
                                           image="jrottenberg/ffmpeg:3.4-scratch",
                                           arguments=[
                                               "-i", "http://www.jell.yfish.us/media/jellyfish-20-mbps-hd-hevc-10bit.mkv",
                                               "test_%04d.jpg"
                                           ],
                                           name="extract-frames",
                                           task_id="extract_frames",
                                           volume=[volume],
                                           volume_mounts=[volume_mount],
                                           get_logs=True
                                           )
Here's some speculation as to what may be wrong:
(Where your error is most likely coming from) KubernetesPodOperator expects the parameter volumes, not volume.
In general, it's bad practice to mount onto "/", since you will be masking everything that ships with the image you're running. You should probably change mount_path in your VolumeMount object to something else, like "/stored_frames".
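To make those two points concrete, here is a minimal sketch of the corrected task definition. It assumes the contrib import paths for Volume/VolumeMount from Airflow 1.10-era releases, and the /stored_frames mount path is just an example name:

```python
from airflow.contrib.operators.kubernetes_pod_operator import KubernetesPodOperator
from airflow.contrib.kubernetes.volume import Volume
from airflow.contrib.kubernetes.volume_mount import VolumeMount

# Mount the PVC-backed volume at a dedicated path instead of "/"
volume_mount = VolumeMount('test-volume',
                           mount_path='/stored_frames',
                           sub_path=None,
                           read_only=False)

volume = Volume(name='test-volume',
                configs={'persistentVolumeClaim': {'claimName': 'test-volume'}})

extract_frames = KubernetesPodOperator(namespace='default',
                                       image="jrottenberg/ffmpeg:3.4-scratch",
                                       arguments=[
                                           "-i", "http://www.jell.yfish.us/media/jellyfish-20-mbps-hd-hevc-10bit.mkv",
                                           # write frames under the mount so they land on the volume
                                           "/stored_frames/test_%04d.jpg"
                                       ],
                                       name="extract-frames",
                                       task_id="extract_frames",
                                       volumes=[volume],  # note: "volumes", plural
                                       volume_mounts=[volume_mount],
                                       get_logs=True)
```

Note that the output path in arguments is also prefixed with the mount path, otherwise the frames are written to the container's working directory and lost when the Pod exits.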
You should create a test pod to verify your k8s objects (volumes, pod, configmap, secrets, etc.) before wrapping that pod creation in the DAG with KubernetesPodOperator. Based on your code above, it could look like this:
apiVersion: v1
kind: Pod
metadata:
  name: "extract-frames-pod"
  namespace: "default"
spec:
  containers:
    - name: "extract-frames"
      image: "jrottenberg/ffmpeg:3.4-scratch"
      args: ["-i", "http://www.jell.yfish.us/media/jellyfish-20-mbps-hd-hevc-10bit.mkv", "test_%04d.jpg"]
      imagePullPolicy: IfNotPresent
      volumeMounts:
        - name: "test-volume"
          # do not use "/" for mountPath.
          mountPath: "/images"
  restartPolicy: Never
  volumes:
    - name: "test-volume"
      persistentVolumeClaim:
        claimName: "test-volume"
  serviceAccountName: default
I expect you will get the same error that you had: "message":"Not found: \"test-volume\"","field":"spec.containers[0].volumeMounts[0].name", which I think points to an issue with your PersistentVolume manifest file. Did you set the path for test-volume? Something like:

path: /test-volume

And does that path exist on the target volume? If not, create that directory/folder. That might solve your problem.
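For comparison, here is a sketch of a PV/PVC pair consistent with your kubectl output (the capacity, access mode and manual storage class are taken from it); the hostPath path is an assumption you would adapt to your node, and the directory must exist there:

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: test-volume
spec:
  storageClassName: manual
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: /test-volume   # must exist on the node; create it if missing
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: test-volume
  namespace: default
spec:
  storageClassName: manual
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
```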