Building Pull Request-based ephemeral Preview environments on Kubernetes

A CTO of a company calls you. His company just migrated from Heroku to EKS on AWS. He's happy with the migration but wants you to build Heroku's "Ephemeral Preview Apps" on Kubernetes.
You know you can use ArgoCD for this, but you're in for some surprises and complications!
He wants to build ephemeral preview apps for both frontend and backend repos.
  1. Frontend repos are simple Single Page Apps using Vue.js.
  2. Backend repos are Python+Django and use PostgreSQL, Redis, and MongoDB.
He lists down some more asks, which complicate things a bit.
He wants you to handle:
  • dependency management for services
  • database migrations & seed data for backend services
  • automated deletion of envs to save costs
  • integration with Jira and GitHub Deployments
  • and much more
All this while using existing tools as much as possible.

Your Documentation-driven approach

You sign up for this work and start creating a doc listing all requirements and identifying the unknowns. You've built preview environments before. However, handling dependency management, database migrations, seed data, etc., often requires custom solutions because it depends on context. Couple that with the constraint to use existing tools, and now you have some interesting engineering work!
You list down existing tools and processes. They are:
  • Kustomize
  • GitHub Actions for CI
  • ArgoCD
  • Versioned DB scripts
  • AWS Secrets Manager for secrets, etc.
The next step is to try out some POCs to convert the "known unknowns" into "knowns". You know that the preview envs can easily be created for the frontend repos. For backend apps, you'll need to find out answers to some questions.
  1. Do we create PostgreSQL, Redis, MongoDB for each PR, or can these be shared?
  2. Do we need the ability to point a preview service to another preview service? Or does it always point to staging env?
You'll need to design the system based on answers to these and other questions.
So you do the grunt work, write down all questions, discuss the trade-offs with the CTO and other engineering leads, and finally, you come up with a solution that handles all these cases. Getting to this solution requires some POCs, trial and error, but it's part of the process.

Ephemeral PR-based Preview environment workflow

Here's how you design the workflow.
  1. A developer creates a PR labeled "Preview".
  2. The CI workflow starts.
  3. ArgoCD watches the PR.
  4. ArgoCD creates the application deployment in K8s.
  5. The preview env's public endpoint is made available to devs and QA.
  6. ArgoCD deletes the resources when the PR is merged.
This flow works well for both frontend and backend repos.
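One way to wire up the ArgoCD steps above is an ApplicationSet with the Pull Request generator. Here's a minimal Python sketch that builds such a manifest as a plain dict. The owner and repo values, the `deploy/overlays/preview` Kustomize path, the `argocd` namespace, the `default` project, and the naming scheme are all assumptions for illustration, not the setup from this story.

```python
import json


def preview_applicationset(owner: str, repo: str, label: str = "Preview") -> dict:
    """Build an ArgoCD ApplicationSet that creates one Application per labeled PR.

    All names and paths below are hypothetical placeholders.
    """
    repo_url = f"https://github.com/{owner}/{repo}.git"
    # {{number}} and {{head_sha}} are template parameters filled in by the
    # Pull Request generator for every matching open PR.
    env_name = repo + "-pr-{{number}}"
    return {
        "apiVersion": "argoproj.io/v1alpha1",
        "kind": "ApplicationSet",
        "metadata": {"name": f"{repo}-previews", "namespace": "argocd"},
        "spec": {
            "generators": [
                {
                    "pullRequest": {
                        # Private repos would also need a tokenRef here.
                        "github": {"owner": owner, "repo": repo, "labels": [label]},
                        "requeueAfterSeconds": 300,  # polling interval
                    }
                }
            ],
            "template": {
                "metadata": {"name": env_name},
                "spec": {
                    "project": "default",
                    "source": {
                        "repoURL": repo_url,
                        "targetRevision": "{{head_sha}}",
                        "path": "deploy/overlays/preview",  # assumed Kustomize overlay
                    },
                    "destination": {
                        "server": "https://kubernetes.default.svc",
                        "namespace": env_name,  # one namespace per PR
                    },
                    "syncPolicy": {
                        "automated": {"prune": True},
                        "syncOptions": ["CreateNamespace=true"],
                    },
                },
            },
        },
    }


if __name__ == "__main__":
    # JSON is valid YAML, so the output can be piped to `kubectl apply -f -`.
    print(json.dumps(preview_applicationset("example-org", "backend"), indent=2))
```

With a setup along these lines, the generator stops producing an Application once the PR is no longer open, which is what drives the cleanup step.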


Here are three main challenges you handle along the way:
  1. Seed data management
  2. Dependency management for services
  3. Keeping costs low for the Preview environments
Let's expand on the challenges further.
  1. Seed data management
You create a custom PostgreSQL image already loaded with seed data. This seed data is version-controlled in Git. That way, devs can easily update the PostgreSQL image when some new data needs to be loaded.
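One possible convention for knowing when the seeded image needs a rebuild is to derive its tag from the seed files themselves. The sketch below is an assumption, not the approach from this story: it supposes the seed data lives as `.sql` files in a directory of the repo, and the `postgres-seeded` image name is hypothetical.

```python
import hashlib
from pathlib import Path


def seed_image_tag(seed_dir: str, base: str = "postgres-seeded") -> str:
    """Return an image reference whose tag changes only when the seed files change."""
    digest = hashlib.sha256()
    # Sort so the hash does not depend on filesystem listing order.
    for path in sorted(Path(seed_dir).rglob("*.sql")):
        digest.update(path.relative_to(seed_dir).as_posix().encode())
        digest.update(path.read_bytes())
    return f"{base}:{digest.hexdigest()[:12]}"
```

CI could compare this tag with what's already in the registry and rebuild the image only when a dev has changed the seed data.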
  2. Dependency management for services
You run the database containers in the same preview namespace for each PR. Thus, they are isolated from other PRs. If service A depends on service B, service A's preview points to service B's staging env by default. Devs can easily override this with a config change.
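The "staging by default, override by config" rule can be sketched in a few lines of Python. The service name, the staging URL, and the `<SERVICE>_URL` environment variable convention are hypothetical; the real config mechanism could just as well be a Kustomize patch.

```python
import os
from typing import Mapping, Optional

# Hypothetical staging endpoints, used when no override is configured.
STAGING_URLS = {
    "service-b": "https://service-b.staging.example.com",
}


def resolve_dependency_url(name: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Return the URL a preview env should use to reach a dependency.

    An override such as SERVICE_B_URL wins; otherwise fall back to staging.
    """
    env = os.environ if env is None else env
    override_key = name.upper().replace("-", "_") + "_URL"
    return env.get(override_key) or STAGING_URLS[name]
```

For example, setting `SERVICE_B_URL` to the in-cluster address of another PR's preview would point service A's preview at it instead of staging.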
  3. Keeping costs low for the Preview environments
To keep Preview env costs in check, you suggest running them on Spot instances. You also delete all resources of a preview environment when it's no longer actively used.
This concludes your work. The CTO is super happy and wants to work with you further.

Are you such a CTO or engineering leader looking to supercharge developer productivity?
If you're looking for a reliable engineering partner for all things Infra, DevOps, Observability, and Reliability, reach out to me on LinkedIn or Twitter.
We do Pragmatic Software Engineering - on Production. That's it!

I write such stories on software engineering. There's no fixed frequency, as I don't make these up. If you liked this one, you might love:
Taming GCP networking cloud costs
Follow me on LinkedIn and Twitter for more such stuff.