Pulumification

Experience and learnings during project that migrated the infra of a company to IaC using Pulumi

I was working with Pulumi to migrate AWS cloud resources of a company to IaC. In this article, we shall explore why I chose Pulumi, how the migration was, and the standards and processes defined along the way.

Looking for an IaC solution

Terraform is the first IaC solution that comes to anyone’s mind. Although it is widely used in the industry, I found it a little intimidating to use HCL, as it reinvented some of the basic programming constructs such as loops. I will take an example straightly from the Terraform Docs to illustrate my point.

variable "users" {
  type = map(object({
    is_admin = bool
  }))
}

locals {
  admin_users = {
    for name, user in var.users : name => user
    if user.is_admin
  }
  regular_users = {
    for name, user in var.users : name => user
    if !user.is_admin
  }
}

In a general-purpose programming language, say TypeScript, we can write this code as:

type User = {
  is_admin: boolean,
}

let users: Map<string, User> = {} // The value is coming from somewhere, say a
config or input

let admin_users = new Map(
  [...users].filter(
    ([name, user]) => user.is_admin
  )
)

let regular_users = new Map(
  [...users].filter(
    ([name, user]) => !user.is_admin
  )
)

While anyone comfortable with JavaScript or TypeScript can understand the below code, it takes some time to understand HCL code and get used to its syntax. I preferred general-purpose programming languages for IaC for a few reasons:

Developers can write IaC in the language that they are used to write applications, this eliminates the need to learn a new language
A developer with sufficient knowledge can maintain the IaC codebase; no need to have dedicated personnel for this
One can use the best practices of the programming language to organize the code, such as refactoring into modules, etc.
In some cases, debugging IaC is also possible by using common debuggers by stepping into the code and inspecting variables!

When I saw a Fireship video about Terraform, a comment mentioned that he should make a video of Pulumi as well; that’s how I got to know about Pulumi! Going through their website and docs, it seemed they offered the exact needs that I wanted. After I read the article “Imperatively Declarative” by Christian Nunciato, I had a clear idea of what Pulumi does, and why it is flexible over other solutions.

While Pulumi offers support for many languages (latest: Pulumi now supports JVM languages and YAML as well!), I chose to go ahead with TypeScript + NodeJS for Pulumi for a few reasons:

Project management with npm is easy, it just takes doing npm install to set up the project on a new machine
At that time, Pulumi’s awsx and eks packages were only available on npm; more about them later!
Going with TypeScript over JavaScript is natural, due to the benefits offered by static type checking. Plus, Pulumi’s packages are also in TypeScript, providing excellent typing support.

The Highlights

In the following sections, I will pick some interesting points about how Pulumi helped write a clean IaC project. Let’s use the following AWS architecture for illustration purposes.

VPC with public and private subnets, replicated for production and staging
An EC2 jumpbox VM in the public subnet of staging VPC, which is in its own security group
VPC peering connection between production and staging VPCs
An RDS instance placed in the private subnet of the VPC, with connections allowed from the jumpbox machine, replicated for production and staging

Stacks, Outputs, and References

In Pulumi, a stack consists of a group of cloud resources. From one codebase, we can create as many stacks as we wish. One typical use case of the stacks would be to have staging and production environments, with nearly identical resources. Apart from the environments, the stacks also help to break down the infrastructure into several units using Stack Reference Architecture.

We can break down our architecture into network and database units, with stage and prod environments. We shall use <unit>.<env> syntax to name the stacks. Each unit sits in its own repository and has a separate stack file.

Let’s focus on the jumpbox in the network.stage stack. Since we want the RDS to be accessible from the jumobox, we export its security group ID as the stack output.

// network/index.ts
import * as aws from "@pulumi/aws"

function createJumpbox(name: string) {
  const securityGroup = new aws.ec2.SecurityGroup(
    `${name}-sg`,
    {
      ingress: [
        {
          cidrBlocks: ["0.0.0.0/0"],
          fromPort: 22,
          toPort: 22,
          protocol: "tcp",
          description: "Allow SSH connection",
        },
      ],
      egress: [
        {
          cidrBlocks: ["0.0.0.0/0"],
          fromPort: 0,
          toPort: 0,
          protocol: "all",
          description: "Allow all outgoing connections",
        },
      ],
    }
  )
  const vm = new aws.ec2.Instance(name, {
    ami: "<add-some-ami-here>",
    instanceType: "t3.medium",
    vpcSecurityGroupIds: [securityGroup.id],
    // Add other options here, such as SSH Key, subnets
  })
  return { securityGroup, vm }
}

const { securityGroup, vm } = createJumpbox("jumpbox")

// Export the security group ID as the stack output
export const jumpboxSecurityGroupId = securityGroup.id

On the database unit, to allow only the connections from jumpbox at port 5432 (which is the port at which PostgreSQL listens), we can do it this way:

// database/index.ts

import * as pulumi from "@pulumi/pulumi"
import * as aws from "@pulumi/aws"

// If we are in the stack `db.stage`, environment is `stage`
const getEnvironment = () => pulumi
  .getStack()
  .split(".")
  .slice(-1)[0]

const environment = getEnvironment()
// Create a Stack Reference to the `network` stack in the same environment
const networkStack = new pulumi.StackReference(
  `network.${environment}`
)
// Get the jumpbox security group ID
const jumpboxSecurityGroupId = networkStack.requireOutput(
  "jumpboxSecurityGroupId"
) as pulumi.Output<string>
// Create Security group and allow incoming connection from jumpbox security
group
const securityGroup = new aws.ec2.SecurityGroup("db-sg", {
  ingress: [
    {
      securityGroups: [jumpboxSecurityGroupId],
      fromPort: 5432,
      toPort: 5432,
      protocol: "tcp",
      description:
        "Allow incoming connection from jumpbox at port 5432",
    },
  ],
})
// Create RDS Instance
const rdsInstance = new aws.rds.Instance("db", {
  instanceClass: "db.m3.medium",
  vpcSecurityGroupIds: [securityGroup.id],
  // Associate subnet group etc
})

The requireOutput() method of the StackReference class is the key here. Using the stack reference pattern, one can easily split a large infrastructure into smaller units and link them.

Note: When using the stack reference, if the output of a stack changes, we have to update other stacks manually by selecting the stack pulumi stack select <stack name> and `pulumi update.

Sometimes, not doing so might break the things elsewhere.

The stack reference can also be used to refer to the stacks in the same project as well. As an example, in network.prod, we create VPC Peering connection to network.stage and export jumpboxSecurityGroupId reusing it from network.stage

// network/index.ts

import * as pulumi from "@pulumi/pulumi"
import * as aws from "@pulumi/aws"

// Get the current environment, getEnvironment() defined earlier
const environment = getEnvironment()

const isStaging = environment === "stage"

let securityGroupId = null

if (isStaging) {
  // Function createJumpbox() defined in earlier example
  const { securityGroup: jumpboxSecurityGroupId, vm } =
    createJumpbox("jumpbox")
  securityGroupId = jumpboxSecurityGroupId.id
} else {
  const stagingEnvironment = new pulumi.StackReference(
    "network.stage"
  )
  // Create VPC peering
  securityGroupId = stagingEnvironment.requireOutput(
    "jumpboxSecurityGroupId"
  )
}

export const jumpboxSecurityGroupId = securityGroupId

Config File, Secrets

In the above examples, you may have seen that we have hardcoded the EC2 instnace type in the code. Whenever we want to change the instance type, editing the code does not seem to be a good idea. Instead of that, Pulumi offers us configs, which can be different for different stacks in the same project. In the network project in the above example, we can refactor the instance type to Pulumi.network.<env>.yaml, using pulumi CLI.

pulumi config set jumpboxInstanceType t3.medium

We can also check the config file to verify that the value has been added.

# Pulumi.test.stage.yaml
config:
  test:jumpboxInstanceType: t3.medium

Then, in the code, we can access config values using the Config class.

import * as pulumi from "@pulumi/pulumi"
import * as aws from "@pulumi/aws"

const config = new pulumi.Config()

const instanceType = config.require("jumpboxInstanceType")

function createJumpbox(
  name: string,
  args: { instanceType: string }
) {
  const { instanceType } = args
  // same as above
  // const securityGroup =
  const vm = new aws.ec2.Instance(name, {
    ami: "<add-some-ami-here>",
    instanceType,
    vpcSecurityGroupIds: [securityGroup.id],
    // Add other options here, such as SSH Key, subnets
  })
  return { vm, securityGroup }
}
// Create jumpbox instance
const { securityGroup, vm } = createJumpbox("jumpbox", { instanceType })

Pulumi can also store sensitive values such as API keys and passwords in the config via secrets. We can pass the --secret flag to pulumi config set command to store a secret.

pulumi config set --secret secretTest someValue

It will add the encrypted "someValue" to the YAML file. Note that the encrypted value will be different based on your key.

# Pulumi.test.stage.yaml
config:
  stack-test:secretTest:
    secure: v1:ZweLtoIrBL/w8QjD:EGbnlgUGg1RcOb+lrQK7o7yXsE0oAh2Hqw==

To access the decrypted value, we use Config.requireSecret() function in the code.

import * as pulumi from "@pulumi/pulumi"

const config = new pulumi.Config()

const secretValue = config.requireSecret("secretTest")

// secretValue is pulumi.Output<any>, so we log it once we have a value for it
secretValue.apply(value => {
    console.log(value)
    pulumi.log.debug(value)
})

export const secretOutput = secretValue

While the secretValue in the above code is accessible by other resources as the plain string, it is not printable. The log statements above just print [secret], instead of giving out the value. This value is stored encrypted even in the state file!

Note: Although Pulumi ensures that the secret value is not stored in plaintext from its side, it is still your responsibility to store them securely on the cloud side.

As an example, let’s say we are using an API key as an environment variable to AWS Lambda. While we can store the API key as a secret in Pulumi config, it is still viewable plaintext at Lambda-side.

Crosswalk Packages

The Pulumi CrossWalk packages encapsulate commonly used infrastructure patterns as a ComponentResource. While we can create VPC using @pulumi/aws, by using Vpc and Subnet classes along with RouteTable class to connect the subnets, and NatGateway and InternetGateway to configure the network access, the creation of the same using @pulumi/awsx package is as simple as,

// network/index.ts
import * as awsx from "@pulumi/awsx";

type SubnetArgs = awsx.ec2.VpcSubnetArgs;

const publicSubnetCount = 2;
const privateSubnetCount = 2;

// Returns array with n elements
const range = (n: number) => [...Array(n).keys()];

const publicSubnets: SubnetArgs[] = range(
  publicSubnetCount
).map(() => ({ type: "public" }));
const privateSubnets: SubnetArgs[] = range(
  privateSubnetCount
).map(() => ({ type: "private" }));

const vpc = new awsx.ec2.Vpc("vpc", {
  cidrBlock: "10.0.0.0/16",
  subnets: [...publicSubnets, ...privateSubnets],
  numberOfNatGateways: 1,
});

Lambda Magic

When we write the IaC in TypeScript, we are in the realm of code. Pulumi offers excellent ways to utilize this realm and provides some helper functions with callbacks, which are then magically converted into AWS Lambdas. As a concrete example, we can listen to an event on an S3 bucket by writing a callback function in the IaC.

import * as aws from "@pulumi/aws";

const bucket = new aws.s3.Bucket("media-bucket");

bucket.onObjectCreated("bucketEvent", (args) => {
  console.log("Got ObjectCreated event");
  // You can do something useful with args
  // This will be executed in the Lamdba
});

Such functionality is available for a variety of event sources. I implemented this functionality for MQ by overriding the source code in the project. I plan to test this functionality thoroughly using LocalStack and open a PR near soon.

Defining the Standards

Pulumi does not enforce any restrictions on project organization. While nothing is wrong with creating every resource of the stack in the index.ts file, keeping the code organized helps to maintain it. Here are some of the standards I defined and followed while writing the IaC:

Stack names should follow the projectName.environment syntax.
- orgName/projectName is not supported in the S3 backend, thus this naming convention.
- Thanks @briggsl for pointing that out!
Every source file should be under the src/ directory. By default, Pulumi’s aws-typescript template creates index.ts in the project root.
No resource should be directly created in index.ts, which is not directly relevant to the main resource that the stack creates. They should be rather refactored into resource.ts exporting createResource() function.
- If the resource creation takes only one call, it can still be in the index.ts, as there is no point in wrapping it with other function.
- Mostly, these were related to IAM; Creating roles and policies were refactored into a separate file.
No function should decide the resource name on its own, rather it should be passed top-down from index.ts.
- This makes it easy to follow the path from which a resource is coming.
- The functions can prepend some identifier to the name while passing it to sub-functions
- The name should be the first argument of the function, for example, createResource(name, args). This is in line with Pulumi’s convention of creating resources.
No hard-coded values in the code, they should be rather in the config.
- The config values are stored under an object, whose type is declared in the index.ts as ProjectConfig.
- Then, one can use config.requireObject<ProjectConfig>("projectConfig") to retrieve the config.
- No other file than index.ts should read the config! This would make it easy to track the flow of the config values.
No call to Config.getX() and StackReference.getOutput() methods!
- These methods will silently return an empty value if the key is not present
- The require methods are rather strict and will throw an exception if the key is not present, thus, eliminating a few errors

CI/CD for the Infra

While Pulumi Service provides a nice way to preview PR changes and look at the current state of the stack, I went with S3 to store the state file, and used GitLab CI/CD pipelines for deploying infrastructure changes. The rules of the pipeline looked as:

Pipeline contains two stages:
- preview, to preview infrastructure changes
- deploy, to deploy the changes
deploy is only available for the main branch, and contains manually triggered jobs
After deploy is complete, a Slack notification is triggered
There can be multiple jobs in the preview and deploy stages, depending on the environments

Then, the developer workflow would be:

Create a branch for proposed infrastructure changes
Commit and push the code, verify if changes look OK
Start a PR, the reviewer looks at the code along with the preview job output to ensure nothing is breaking
Once the PR is accepted and merged, a maintainer can trigger deploy
Team is notified on Slack about new changes 🎉

Overall Experience

Pulumi was my first experience with IaC, and it helped me to get started with AWS more easily, thanks to the nice APIs in the code. Whenever I was struck with something, I would go to the Pulumi Slack Community, and people are very helpful there.

Pulumi brought IaC to General Purpose Languages, which also can benefit from existing good practices regarding reusability, as I experienced it with TypeScript. I used Pulumi at a beginner level, and there is still a lot to learn regarding ComponentResource, stack functions, and Policy as a Code.

TL;DR

I was working with Pulumi to bring the entire AWS infrastructure of a company to IaC
I used NodeJS+TyepScript to work with Pulumi
The entire infrastructure was divided into several units, which were linked using the Stack Reference pattern
The Pulumi CrossWalk packages, awsx and eks helped to quickly reuse the standard patterns
The migration of the infrastructure to Pulumi was 0-downtime, mostly done behind the scenes of the DNS
I also defined several standards for the IaC, to make the code maintainable and understandable
I set up a CI/CD pipeline to deliver infrastructure changes, making the process feel like a breeze
There is still a lot for me to explore about Pulumi, including Component Resources and Policy as a Code