Recently I have been playing with Chef, and I have been having fun with some software whose configuration differs slightly across VMs. For example, on VM1, log.xml contains:
... verbosity=INFO ...
On VM2 it is set to ERROR, on VM3 it is set to DEBUG, etc.
TL;DR: https://github.com/hlx98007/deployment-scripts/blob/master/chef_scripts/merge.py
We know this can be separated by Chef Environment or Chef Role, depending on how you set it up. Let’s say I am using environments for this configuration.
{ "name" : "vm1" ... "log_level" : "INFO", ... }
I assume you can derive the other VMs’ template files in your head.
OK. I happened to manage a large group of Chef-managed laboratories during my internship. We have several development environments and several test environments, and those values are not secret (to our employees), so we need to manage them. Given the complexity of our product, whenever a new key is added to, or an old key is deleted from, the environment file, we need to add or remove the same key in every environment file across all the labs I am responsible for.
This is a pain!
So we had a brilliant idea: include the details of all labs in one master template.json, like this:
{
  "name" : { // We must have a "name" key because this is Chef!
    "default" : "",
    "vm1" : "vm1",
    "vm2" : "vm2",
    ...
  },
  ...
  "log_level" : {
    "default" : "INFO",
    "vm2" : "ERROR",
    "vm3" : "DEBUG",
    ...
  },
  ...
}
Looks neat? The key to any merged template is the “default” keyword, or whatever keyword you define. As long as we keep just the one template file, we can version control it and tag it however we like. In the Python script I wrote, I used the “template_default” keyword.
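To make "template-style dict" concrete, here is a minimal sketch of how one could be recognised: it is simply a dict that carries the default keyword. The helper name is_template and the sample "database" values are my own assumptions for illustration, not taken from merge.py; only the “template_default” keyword comes from the script.

```python
DEFAULT_KEY = "template_default"  # the keyword used in the Python script


def is_template(value, default_key=DEFAULT_KEY):
    """Return True if value is a dict that carries the default keyword."""
    return isinstance(value, dict) and default_key in value


# A merged, per-VM value versus an ordinary nested dict (hypothetical data).
log_level = {"template_default": "INFO", "vm2": "ERROR", "vm3": "DEBUG"}
database = {"host": "db.example", "port": "5432"}

print(is_template(log_level))  # True
print(is_template(database))   # False
print(is_template("INFO"))     # False
```

Checking for a single reserved key keeps the test cheap, but it also means no real configuration key may share that name, which is probably why a longer keyword such as “template_default” is safer than plain “default”.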
We will talk about how to extract those values in the next blog post. Here, we will talk about how to merge those files in Python.
In Python’s eyes, JSON objects are dictionaries. Dictionaries can be nested to arbitrary depth, and it is not easy to recreate the same structure starting from an empty dict. I believe it is best to do this recursively: merge() calls itself on a smaller subset of the data and writes the result into a new dictionary.
Each time we call merge(), we create an empty dictionary to hold the copied values and return it to the caller. Eventually we will have the correct structure and data.
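Stripped of all merging rules, the recursion pattern on its own might look like the sketch below. The function name copy_structure and the nested "logging" key are hypothetical and only serve the illustration; the real merge() does more at each step.

```python
def copy_structure(source):
    """Recursively copy a nested dict into freshly created dicts."""
    result = {}  # a new, empty dictionary for this level
    for key, value in source.items():
        if isinstance(value, dict):
            # Recurse into the smaller sub-dictionary.
            result[key] = copy_structure(value)
        else:
            result[key] = value
    return result  # handed back to the caller one level up


env = {"name": "vm1", "logging": {"log_level": "INFO"}}
copied = copy_structure(env)
print(copied == env)                        # True
print(copied["logging"] is env["logging"])  # False
```

Because every level builds its own dictionary, the input files are never modified, and the result has the same shape as the input no matter how deep the nesting goes.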
Consider the following scenarios and my decisions. When we call merge.py env1.json env2.json template.json, we extract the keys on the current level for env1 only.
for key in env1_json.keys():
If env2 has a new key, we can run merge.py env2.json template.json template2.json to merge in the missing keys (see the notes at the bottom).
0. value of env1 is exactly the same as that of env2 (copy and continue)
1. value of env1 is a string, value of env2 is a string
  1.1 they are equal (copy and continue)
  1.2 they differ (build the template structure)
2. value of env1 is a string, value of env2 is a dict
  2.1 env2 is not a template style dict (notify and stop)
  2.2 env2 is a template style dict
    2.2.1 env1 does not exist in it (add)
    2.2.2 env1 does exist but with a different value (update)
    2.2.3 env1 does exist and with the same value (update)
3. both values of env1 and env2 are dicts
  3.1 env1 is a template style dict (stop and report)
  3.2 env2 is a template style dict (stop and report)
  3.3 env1 and env2 are both normal dicts (recursive call on itself)
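The scenarios above can be sketched roughly as follows. This is not the code from merge.py, just one way to follow the same decision list. Several things are my own assumptions: the environment labels are passed in as name1 and name2, errors are raised as a hypothetical MergeError, non-string scalars are treated like strings, the default in scenario 1.2 is taken from env2, and a key that env2 lacks is copied from env1 unchanged.

```python
DEFAULT_KEY = "template_default"  # the keyword used in the Python script


class MergeError(Exception):
    """Raised for the "notify and stop" / "stop and report" scenarios."""


def is_template(value):
    """A template-style dict is a dict that carries the default keyword."""
    return isinstance(value, dict) and DEFAULT_KEY in value


def merge(env1, env2, name1, name2=None):
    """Merge env1 into env2 and return the result as a new dict.

    name1/name2 are the labels used inside template-style dicts, e.g. "vm1".
    Leave name2 as None when env2 is already a merged template.
    """
    merged = {}
    for key in env1.keys():  # keys of env1 only
        v1 = env1[key]
        if key not in env2:
            # Assumption: this case is not in the scenario list; copy as-is.
            merged[key] = v1
            continue
        v2 = env2[key]

        if v1 == v2:
            # Scenarios 0 and 1.1: identical values, copy and continue.
            merged[key] = v1
        elif not isinstance(v1, dict) and not isinstance(v2, dict):
            # Scenario 1.2: two differing plain values, build the template.
            entry = {DEFAULT_KEY: v2, name1: v1}  # default choice is assumed
            if name2 is not None:
                entry[name2] = v2
            merged[key] = entry
        elif not isinstance(v1, dict):
            # Scenario 2: env1 is a plain value, env2 is a dict.
            if not is_template(v2):
                raise MergeError(f"{key!r}: env2 is not a template-style dict")
            entry = dict(v2)   # 2.2.x: add or update env1's entry
            entry[name1] = v1
            merged[key] = entry
        elif not isinstance(v2, dict):
            # Not in the scenario list: dict in env1, plain value in env2.
            raise MergeError(f"{key!r}: env1 is a dict but env2 is not")
        elif is_template(v1) or is_template(v2):
            # Scenarios 3.1 and 3.2: stop and report.
            raise MergeError(f"{key!r}: unexpected template-style dict")
        else:
            # Scenario 3.3: two normal dicts, recurse.
            merged[key] = merge(v1, v2, name1, name2)
    return merged


vm1 = {"name": "vm1", "log_level": "INFO", "db": {"port": "5432"}}
vm2 = {"name": "vm2", "log_level": "ERROR", "db": {"port": "5432"}}
template = merge(vm1, vm2, "vm1", "vm2")
print(template["log_level"])
# {'template_default': 'ERROR', 'vm1': 'INFO', 'vm2': 'ERROR'}
```

Since only env1's keys are visited, any key that exists only in env2 is silently dropped from the result in this sketch as well.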
With those points sorted out, the program is not difficult to write.
Known features/bugs that are not yet categorized:
1. When env1 is missing some keys and is merged with a template JSON, those keys will be lost in merged.json. (Maybe we do want to delete that key from the template?)