Introducing envhero
Intro
At ShipHero, we try to give back to the community as much as we can. Every time we build an internal tool that we know could benefit devs outside the company, we package it in a way that can be shared. Sometimes it ends up being useful, sometimes not, but it's important to us to make these tools available and keep that door open.
As with any company with some years behind it, we have code that ranges from new and aligned with current good practices to ancient and in need of improvement. With a product that is very much alive and constantly evolving, most of our code gets updated over time (we actually track this, and when something is not scheduled to be modified soon and does not meet a minimal quality standard, we do an ad-hoc refactor).
One long-standing issue we have is environment variable management. Throughout the years, we have used environment variables for many things, mostly configuration and feature flagging.
As systems grow and specialize, we end up having sets of environment variables that are common to all, sets that are common only to subgroups, others that are unique to a single service, and a final group that is common to many but with very different values in each case.
Environment variables are not inherently bad. Sure, there are other ways to distribute and store this information, but we are a running company with a large customer base that we do not want to disrupt, so controlled and careful gradual changes are the norm.
Python, given its interpreted nature, introduces a particular issue here: a missing or misconfigured environment variable will not cause a problem until it is accessed. That access can happen early in the process or weeks after the change was introduced, depending on the code path being executed. This is what we tried to solve.
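A minimal sketch of that failure mode (the variable and function names here are hypothetical):

import os

def send_weekly_report():
    # Nothing fails at import or startup; the KeyError surfaces only
    # when this code path finally runs, possibly weeks after the
    # variable was renamed or dropped from the deployment.
    api_key = os.environ["REPORTS_API_KEY"]
    return api_key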
Our approach
Working example
For this example, we will create a small sample app in Python:
import os

class A:
    def __init__(self):
        self.foo = os.getenv("FOO")
        self.bar = os.environ.get("BAR")

a = A()
a.foo = os.getenv("FOO", "foo")
This will live in a simple project folder like so:
my_project/
└── src/
    └── my_python.py
Inventory count
The first thing to do before taking any action is to measure. Our initial intention was to build a tool to catalog all our environment variable usage. With this in mind, we created the first iteration of envhero.
In our initial approach, while in /my_project/src, we call:
$ envhero create -o env_var_catalog.json --exclude-dir tests --exclude-dir .venv
Scanning codebase for os.environ.get or os.getenv calls...
Found 3 unique environment variables
Found 3 total environment variable references
Catalog written to env_var_catalog.json
This gives us our initial catalog:
[
    {
        "name": "FOO",
        "has_default": false,
        "default_value": null,
        "packages": [
            "my_python.py"
        ],
        "tags": [
            ""
        ],
        "locations": [
            {
                "file": "src/my_python.py",
                "line": 5
            }
        ],
        "inferred_type": ""
    },
    {
        "name": "BAR",
        "has_default": false,
        "default_value": null,
        "packages": [
            "my_python.py"
        ],
        "tags": [
            ""
        ],
        "locations": [
            {
                "file": "src/my_python.py",
                "line": 6
            }
        ],
        "inferred_type": ""
    },
    {
        "name": "FOO",
        "has_default": true,
        "default_value": "foo",
        "packages": [
            "my_python.py"
        ],
        "tags": [
            ""
        ],
        "locations": [
            {
                "file": "src/my_python.py",
                "line": 9
            }
        ],
        "inferred_type": "str"
    }
]
As you can see, it lists all the variables, treating those with default values differently from those without.
This led us to our next step.
Catalog organization
We now knew what variables were used across all our code and which ones had default values.
But data for data’s sake—even if a fun endeavor—is not useful on its own. Our interest was to ensure all environments had the right values set for the desired variables, and also that we could fail early if this was not the case.
Our next step, therefore, was to know which environment had which variables.
At first, we thought this could be achieved by an engineer evaluating the catalog and adding some annotations in the tags section… but no. Our real catalog file is far too large for that. So we created an auto-tagger for our cataloger.
We had one reliable piece of baseline information to work from: our current environments already have all the variables set, and set to the right values, which is a great starting point.
For our particular case, our environments are defined in AWS tasks, but we considered that this might not be the case for everyone, so we added support for the following scenarios:
- You have an environment with all the variables set
- You have a task definition in a YAML file
- You have a running environment in AWS
So our tool supports auto-tagging.
In this context, tagging has the same meaning as in Ansible, where a tag more or less maps to a service.
We can run:
AWS_PROFILE=a_profile envhero tags_from_env -c env_var_catalog.json -d service_folder -t aservice
Loaded catalog with 3 environment variables
Successfully saved catalog with 3 variables to 'env_var_catalog.json'
In this case, it will fetch the environment variables defined in AWS and match them against the catalog. For the ones that are found both in the environment and in the catalog, the tag aservice is added.
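After tagging, a matched catalog entry might look like this (assuming the empty placeholder tag is replaced; the exact output may differ slightly):

{
    "name": "FOO",
    "has_default": false,
    "default_value": null,
    "packages": [
        "my_python.py"
    ],
    "tags": [
        "aservice"
    ],
    "locations": [
        {
            "file": "src/my_python.py",
            "line": 5
        }
    ],
    "inferred_type": ""
}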
The same can be done against a task definition in JSON or even from a local shell environment.
Profit
And now, the useful part: how do you use this?
Well, there are many ways.
On the dev side
A simple way to keep the catalog up to date is to add a pre-commit check that fails when environment variable usage changes without the catalog being regenerated.
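A minimal sketch of such a hook, using only the create command shown above (the temporary path and the exact policy are illustrative, not part of envhero):

import json
import subprocess
import sys

# Regenerate the catalog to a temporary file...
subprocess.run(
    ["envhero", "create", "-o", "/tmp/env_var_catalog.json",
     "--exclude-dir", "tests", "--exclude-dir", ".venv"],
    check=True,
)

# ...and fail if the committed catalog is missing any variable found in the code.
with open("env_var_catalog.json") as f:
    committed = {entry["name"] for entry in json.load(f)}
with open("/tmp/env_var_catalog.json") as f:
    fresh = {entry["name"] for entry in json.load(f)}

missing = fresh - committed
if missing:
    print(f"env_var_catalog.json is missing: {', '.join(sorted(missing))}")
    sys.exit(1)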
Another simple check is flagging changes that do not add a tag.
On the ops side
This is where envhero shines the most.
The simplest use case is running it standalone to certify that an environment is fit to run a given workload:
envhero verify -c env_var_catalog.json -t one_tag -t another_tag
You can also wire this into your code’s entry point startup.
import sys

from envhero.catalog import load_catalog, filter_vars_by_tag
from envhero.environment import RequiredVariableMissingError, DefaultUsedAsErrorError, must_pass_check

try:
    # Load your environment variable catalog
    catalog = load_catalog("env_var_catalog.json")

    # Filter variables by service tags if needed
    service_vars = filter_vars_by_tag(catalog, ["one_tag", "another_tag"])

    # Check required environment variables before starting the application
    must_pass_check(service_vars, warning_as_error=True)
except RequiredVariableMissingError as e:
    print(f"ERROR: Missing required environment variable: {e.var_name}")
    sys.exit(1)
except DefaultUsedAsErrorError as e:
    print(f"ERROR: Using default value for {e.var_name} ({e.default_value}) but warnings are treated as errors")
    sys.exit(1)

# If we get here, all required environment variables are set
start_application()
A little extra
As a little experiment, we added an extra subcommand, inject_proxy, which replaces all invocations of environment access functions with calls to a method of a singleton. The singleton holds a cache of variables to avoid repeated lookups, and it also makes it possible to perform an action on first access of a variable, such as emitting a trace or a log entry.
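The shape of the idea, as a minimal sketch (EnvProxy and its method names are illustrative, not envhero's actual API):

import os

class EnvProxy:
    # Caching singleton standing in for direct os.getenv calls.
    _instance = None

    @classmethod
    def instance(cls):
        if cls._instance is None:
            cls._instance = cls()
        return cls._instance

    def __init__(self):
        self._cache = {}

    def get(self, name, default=None):
        if name not in self._cache:
            # First access of this variable: a natural hook for a
            # trace or log entry before the value is cached.
            print(f"first access: {name}")
            self._cache[name] = os.getenv(name, default)
        return self._cache[name]

# After injection, code like os.getenv("FOO") would instead read:
foo = EnvProxy.instance().get("FOO")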
Given its experimental nature, it is still undocumented in the official README, but it will get a proper set of docs once we've tested and used it enough to determine whether it's in its final form or still needs refinement.