A complex .yaml
file from this link needs to be fed into a bash script that runs as part of an automation program running on an EC2 instance of Amazon Linux 2. Note that the .yaml
file in the link above contains many objects, and that I need to extract one of the environment variables defined inside one of the many objects that are defined in the file.
Specifically, how can I extract the
192.168.0.0/16
value of theCALICO_IPV4POOL_CIDR
variable into a bash variable?
- name: CALICO_IPV4POOL_CIDR
value: "192.168.0.0/16"
I have read a lot of other postings and blog entries about parsing flatter, simpler .yaml
files, but none of those other examples show how to extract a nested value like the value
of CALICO_IPV4POOL_CIDR
in this question.
You have two problems there:
I have guessed that you need the YAML document of kind 'DaemonSet' from reading Gregory Nisbett's answer.
I will try to only use tools that are likely to be already installed on your system because you mentioned you want to do this in a Bash script. I assume you have JQ because it is hard to do much in Bash without it!
For the YAML library I tend to use Ruby for this because:
It was suggested to use yq, but that won't help so much in this case because you still need a tool that can extract the YAML document.
Having extracted the document I am going to again use Ruby to save the file as JSON. Then we can use jq.
Extracting the YAML document
To get the YAML document using Ruby and save it as JSON:
url=...
curl -s $url | \
ruby -ryaml -rjson -e \
"puts YAML.load_stream(ARGF.read)
.select{|doc| doc['kind']=='DaemonSet'}[0].to_json" \
| jq . > calico.json
Further explanation:
I pass that response through jq .
so that it's formatted for human readability but that step isn't really necessary. I could do the same in Ruby but I'm guessing you want Ruby code kept to a minimum.
Selecting the key you want
To select the key you want the following JQ query can be used:
jq -r \
'.spec.template.spec.containers[].env[] | select(.name=="CALICO_IPV4POOL_CIDR") | .value' \
calico.json
Further explanation:
spec.template.spec.containers[].env[]
iterates for all containers and for all envs inside themPutting it all together:
#!/usr/bin/env bash
url='https://docs.projectcalico.org/v3.3/getting-started/kubernetes/installation/hosted/kubernetes-datastore/calico-networking/1.7/calico.yaml'
curl -s $url | \
ruby -ryaml -rjson -e \
"puts YAML.load_stream(ARGF.read)
.select{|doc| doc['kind']=='DaemonSet'}[0].to_json" \
| jq . > calico.json
jq -r \
'.spec.template.spec.containers[].env[] | select(.name=="CALICO_IPV4POOL_CIDR") | .value' \
calico.json
Testing:
▶ bash test.sh
192.168.0.0/16
If you're able to install new dependencies, and are planning on dealing with lots of yaml files, yq
is a wrapper around jq that can handle yaml. It'd allow a safe (non-grep) way of accessing nested yaml values.
Usage would look something like MY_VALUE=$(yq '.myValue.nested.value' < config-file.yaml)
Alternatively, How can I parse a YAML file from a Linux shell script? has a bash-only parser that you could use to get your value.
MYVAR=$(\
curl https://docs.projectcalico.org/v3.3/getting-started/kubernetes/installation/hosted/kubernetes-datastore/calico-networking/1.7/calico.yaml | \
grep -A 1 CALICO_IPV4POOL_CIDR | \
grep value | \
cut -d ':' -f2 | \
tr -d ' "')
Replace curl https://docs.projectcalico.org/v3.3/getting-started/kubernetes/installation/hosted/kubernetes-datastore/calico-networking/1.7/calico.yaml
with however you're sourcing the file. That gets piped to grep -A 1 CALICO_IPV4POOL_CIDR
. This gives you 2 lines of text: the name line, and the value line. That gets piped to grep value
, which now gives us the line we want with just the value. That gets piped to cut -d ':' -f2
which uses the colon as a delimiter and gives us the second field. $(...)
executes the enclosed script, and it is assigned to MYVAR
. After this script, echo $MYVAR
should produce 192.168.0.0/16
.
The right way to do this is to use a scripting language and a YAML parsing library to extract the field you're interested in.
Here's an example of how to do it in Python. If you were doing this for real you'd probably split it out into multiple functions and have better error reporting. This is literally just to illustrate some of the difficulties caused by the format of calico.yaml
, which is several YAML documents concatenated together, not just one. You also have to loop over some of the lists internal to the document in order to extract the field you're interested in.
#!/usr/bin/env python3
import yaml
def foo():
with open('/tmp/calico.yaml', 'r') as fil:
docs = yaml.safe_load_all(fil)
doc = None
for candidate in docs:
if candidate["kind"] == "DaemonSet":
doc = candidate
break
else:
raise ValueError("no YAML document of kind DaemonSet")
l1 = doc["spec"]
l2 = l1["template"]
l3 = l2["spec"]
l4 = l3["containers"]
for containers_item in l4:
l5 = containers_item["env"]
env = l5
for entry in env:
if entry["name"] == "CALICO_IPV4POOL_CIDR":
return entry["value"]
raise ValueError("no CALICO_IPV4POOL_CIDR entry")
print(foo())
However, sometimes you need a solution right now and shell scripts are very good at that.
If you're hitting an API endpoint, then the YAML will usually be pretty-printed so you can get away with extracting text in ways that won't work on arbitrary YAML.
Something like the following should be fairly robust:
cat </tmp/calico.yaml | grep -A1 CALICO_IPV4POOL_CIDR | grep value: | cut -d: -f2 | tr -d ' "'
Although it's worth checking at the end with a regex that the extracted value really is valid IPv4 CIDR notation.
The key thing here is grep -A1 CALICO_IPV4POOL_CIDR
.
The two-element dictionary you mentioned (shown below) will always appear as one chunk since it's a subtree of the YAML document.
- name: CALICO_IPV4POOL_CIDR
value: "192.168.0.0/16"
The keys in calico.yaml
are not sorted alphabetically in general, but in {"name": <something>, "value": <something else>}
constructions, name
does consistently appear before value
.
As others are commenting, it is recommended to make use of yq
(along with jq
) if available.
Then please try the following:
value=$(yq -r 'recurse | select(.name? == "CALICO_IPV4POOL_CIDR") | .value' "calico.yaml")
echo "$value"
Output:
192.168.0.0/16