Kubernetes emptyDir and symlinks

10/27/2021

Context

I have a pod with two containers:

  • main, whose job is simply to display the contents of a directory
  • sidecar, whose responsibility is to synchronize the content of a blob storage container into a predefined directory

To keep the synchronization atomic, sidecar downloads the blob storage content into a new temporary directory and then switches a symlink in the target directory to point at it.

The target directory is shared between the two containers using an emptyDir volume.

Problem

main sees the symlink but cannot list the content behind it.

Question

How can main access the latest synchronized data?

Additional information

Reason

I am trying to achieve what Apache Airflow does with Git-Sync, but instead of using Git, I need to synchronize files from Azure Blob Storage. This is necessary because (1) my content is mostly dynamic and (2) the azureFile volume type has serious performance issues.

Sync routine

declare -r container='https://mystorageaccount.dfs.core.windows.net/mycontainer'
declare -r destination='/shared/container'

declare -r temp_dir="$(mktemp -d)"
azcopy copy --recursive "$container/*" "$temp_dir"

declare -r temp_file="$(mktemp)"
ln -sf "$temp_dir" "$temp_file"
mv -Tf "$temp_file" "$destination"

What we end up with in the sidecar container:

$ ls /shared
container -> /tmp/tmp.doGz2U0QNy
$ ls /shared/container
file1.txt file2.txt
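The sync routine above swaps the link with `mv -Tf` rather than plain `mv`, and the `-T` matters as soon as the destination already exists. This minimal sketch assumes GNU coreutils and works entirely inside a throwaway directory (the names `v1`, `v2`, `current` and `next` are made up). It shows plain `mv` moving the new link into the directory that the old link points at, while `mv -T` replaces the link itself with a single rename.

```shell
#!/usr/bin/env bash
# Sketch (GNU coreutils assumed): why the swap uses `mv -T`.
# Everything happens under a throwaway directory; /shared is not touched.
set -eu

work="$(mktemp -d)"
mkdir "$work/v1" "$work/v2"
ln -s v1 "$work/current"            # current -> v1, a directory

# Without -T, mv sees that the destination resolves to a directory
# and moves the new link *into* v1 instead of replacing current.
ln -s v2 "$work/next"
mv -f "$work/next" "$work/current"
ls "$work/v1"                       # prints: next

# With -T, the destination is treated as a plain name, and the rename
# replaces the current link in one step.
ln -s v2 "$work/next"
mv -Tf "$work/next" "$work/current"
readlink "$work/current"            # prints: v2
```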

Solution

My initial attempt had two mistakes:

  1. The symlink target was not in the shared volume.
  2. The symlink target was an absolute path in the sidecar container, so from the point of view of the main container the folder did not exist.
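Both mistakes can be detected by inspecting the link from the reading side. The sketch below defines a hypothetical `check_link` helper, not part of any tool, and assumes GNU `readlink -f`. It rejects a link whose stored target is absolute, resolves outside the volume, or points at nothing.

```shell
#!/usr/bin/env bash
# Hypothetical helper: check_link VOLUME LINK
# Succeeds only if LINK is a symlink whose stored target is relative and
# resolves to an existing directory inside VOLUME (GNU readlink -f assumed).
check_link() {
  local volume="$1" link="$2" target resolved
  if [ ! -L "$link" ]; then
    echo "not a symlink: $link" >&2; return 1
  fi
  target="$(readlink "$link")"          # the path string stored in the link
  case "$target" in
    /*) echo "absolute target, not portable across containers: $target" >&2; return 1 ;;
  esac
  resolved="$(readlink -f "$link")" || return 1
  volume="$(readlink -f "$volume")"
  case "$resolved/" in
    "$volume"/*) ;;                     # resolved path stays inside the volume
    *) echo "target outside the volume: $resolved" >&2; return 1 ;;
  esac
  if [ ! -d "$resolved" ]; then
    echo "dangling target: $target" >&2; return 1
  fi
}

# Demo in a throwaway directory standing in for the shared volume.
vol="$(mktemp -d)"
mkdir "$vol/tmp.snapshot"
ln -s tmp.snapshot "$vol/good"          # relative, inside the volume
ln -s /tmp/tmp.doGz2U0QNy "$vol/bad"    # same shape as the first attempt
check_link "$vol" "$vol/good" && echo "good: ok"
check_link "$vol" "$vol/bad" || echo "bad: rejected"
```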

Here is the revised routine:

declare -r container='https://mystorageaccount.dfs.core.windows.net/mycontainer'
declare -r destination='/shared/container'
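declare -r cache_dir="$(dirname "$destination")"

# download into a new directory inside the shared volume
declare -r temp_dir="$(mktemp -d -p "$cache_dir")"
azcopy copy --recursive "$container/*" "$temp_dir"

# relative target, so it resolves against the directory holding the link
# in either container; then rename the link over the destination in one step
ln -sfn "$(basename "$temp_dir")" "$cache_dir/symlink"
mv -Tf "$cache_dir/symlink" "$destination"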
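To try the revised routine without an Azure account, the sketch below replaces the `azcopy` call with a hypothetical `fake_download` stub and wraps the steps in a hypothetical `sync_once` function (GNU `mktemp -p`, `ln -n` and `mv -T` assumed). Pruning the previously served snapshot is an addition that the routine above does not do. Without something like it, every sync leaves another full copy in the emptyDir. Deleting a snapshot that main may still be reading can make files vanish under it, so treat the pruning step as an assumption to revisit.

```shell
#!/usr/bin/env bash
# Sketch: the revised sync routine with azcopy replaced by a local stub.
# fake_download and sync_once are hypothetical names for illustration.
set -euo pipefail

fake_download() {   # stand-in for: azcopy copy --recursive "$container/*" "$1"
  printf 'snapshot %s\n' "$(basename "$1")" > "$1/file1.txt"
}

sync_once() {
  local destination="$1"
  local cache_dir temp_dir new_name previous=""
  cache_dir="$(dirname "$destination")"

  # Download into a fresh directory inside the shared volume.
  temp_dir="$(mktemp -d -p "$cache_dir")"
  new_name="$(basename "$temp_dir")"
  fake_download "$temp_dir"

  # Note which snapshot is currently served, if any.
  if [ -L "$destination" ]; then
    previous="$(readlink "$destination")"
  fi

  # Relative link next to the destination, renamed over it in one step.
  ln -sfn "$new_name" "$cache_dir/symlink"
  mv -Tf "$cache_dir/symlink" "$destination"

  # Assumption: prune the old snapshot, but only if it is a plain sibling
  # name (skip absolute targets such as a /tmp/... link from a first attempt).
  case "$previous" in
    "" | */* | "$new_name") ;;
    *) rm -rf "${cache_dir:?}/${previous:?}" ;;
  esac
}

shared="$(mktemp -d)"          # stands in for the emptyDir mount at /shared
sync_once "$shared/container"
sync_once "$shared/container"
ls "$shared"                   # container plus a single tmp.* directory
cat "$shared/container/file1.txt"
```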
-- flappy

1 Answer

10/27/2021

A symlink is just a special kind of file that contains a filename; it doesn't contain the file content in any meaningful way, and it doesn't have to point to a file that exists. By default mktemp(1) creates its file or directory under $TMPDIR, falling back to /tmp, and that location probably isn't in the shared volume.
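Because a link stores only a path string, the same link can be valid in one filesystem view and dangling in another. The sketch below approximates two containers that mount the same volume at different paths by renaming the directory that holds the links. This is a simplification: real mounts are not renames, and the directory names are made up.

```shell
#!/usr/bin/env bash
# Sketch: a symlink stores only a path string. Renaming the directory that
# holds the links stands in for "the same volume mounted somewhere else".
set -eu

base="$(mktemp -d)"
mkdir -p "$base/sidecar-view/tmp.snapshot"
echo hello > "$base/sidecar-view/tmp.snapshot/file1.txt"

ln -s "$base/sidecar-view/tmp.snapshot" "$base/sidecar-view/absolute"
ln -s tmp.snapshot "$base/sidecar-view/relative"

readlink "$base/sidecar-view/absolute"   # the full path string, as stored
readlink "$base/sidecar-view/relative"   # prints: tmp.snapshot

# "Mount" the same content at a different path.
mv "$base/sidecar-view" "$base/main-view"

cat "$base/main-view/relative/file1.txt"        # prints: hello
cat "$base/main-view/absolute/file1.txt" 2>/dev/null \
  || echo "absolute link is dangling"           # prints: absolute link is dangling
```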

Imagine putting a physical file folder in a physical file cabinet, writing "third drawer, at the very front" on a Post-It note, driving to another building, and handing the note to a colleague. The Post-It note (the symlink) still exists, but in the other building's context (the other container's filesystem), the location it names isn't especially meaningful.

The easiest way around this is to ask mktemp to create the directory directly in the shared volume, and then create a relative-path symlink.

# extract the volume location (you may already have this)
volume_dir=$(dirname "$destination")

# force the download location to be inside the volume
# (GNU mktemp --tmpdir option; its optional DIR must be attached with =)
temp_dir=$(mktemp -d --tmpdir="$volume_dir")

# actually do the download
azcopy copy --recursive "$container/*" "$temp_dir"

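# set the symlink to a relative-path symlink, since the directory
# and the link are in the same place; avoids problems if the volume
# is mounted in different places in the two containers
# (-n: on later syncs $destination is already a symlink to a directory,
# and without -n ln would create the new link inside that directory)
ln -sfn "$(basename "$temp_dir")" "$destination"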
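The final `ln` runs on every sync, and from the second sync on `$destination` is already a symlink pointing at the previous download directory. The sketch below assumes GNU `ln`, works in a throwaway directory, and uses made-up `tmp.*` names. It shows what `ln -sf` and `ln -sfn` each do in that situation.

```shell
#!/usr/bin/env bash
# Sketch (GNU ln assumed): re-running the link step when the destination
# is already a symlink to a directory.
set -eu

vol="$(mktemp -d)"
mkdir "$vol/tmp.first" "$vol/tmp.second" "$vol/tmp.third"
ln -s tmp.first "$vol/container"          # state after the first sync

# Plain -sf follows container -> tmp.first and creates the new link
# inside that directory; container itself still points at tmp.first.
ln -sf tmp.second "$vol/container"
readlink "$vol/container"                 # prints: tmp.first
ls "$vol/tmp.first"                       # prints: tmp.second

# -n (--no-dereference) treats container as a plain file, so -f
# replaces the link itself.
ln -sfn tmp.third "$vol/container"
readlink "$vol/container"                 # prints: tmp.third
```

Whether `ln -sfn` replaces the link in a single atomic step depends on the `ln` implementation. The question's approach, building the link under a temporary name and renaming it with `mv -T`, makes the single rename explicit.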
-- David Maze
Source: StackOverflow