How we built it: Saving time and money with self-hosted runners on EC2 on-demand / spot instances

Mahdi Torabi · Feb 2, 2024

TL;DR

In Unblocked’s early days, our dependence on GitHub’s hosted runners became a bottleneck for both cost and performance. In response, we created a custom GitHub Action that automatically deploys self-hosted ephemeral runners to EC2 using Spot or On-Demand instances.

In this article, we share why and how we’ve done this, and why it might help you too.

Familiar with GitHub Actions? Skip to the code example.

Why use self-hosted runners?

By default, GitHub Actions jobs run on machines hosted and managed by GitHub. However, when those machines do not meet an app’s minimum hardware requirements, teams often turn to self-hosted runners. That was the situation we found ourselves in.

When we began building Unblocked, GitHub offered only machines with 2 cores and 7 GB of RAM, with no way to configure the instance. That limited computing capacity slowed us down significantly.

A year into development, we were routinely waiting 2 hours for the main pipeline to install packages, build, test, and deploy changes. Mistakes became costly because bug fixes took longer to reach production. Coordinating releases and sequencing major changes became painful, and everyone dreaded small fixes that could eat up half their day. Code changes and pull requests grew bloated and harder to review. The temptation to bypass tests grew, feeding a vicious cycle.

We also ran a large number of nightly maintenance tasks as parallel workflows. At GitHub’s per-minute pricing, this routine soon became expensive.

Frustrated with the inefficiency, we set out to find a solution we could customize to our hardware, OS, software, and security requirements.

Why use AWS EC2?

Since we already ran our infrastructure on AWS, we were familiar with EC2’s reliable compute. Two factors in particular made EC2 a straightforward choice for us:

  1. EC2 offers better cost savings

    Although GitHub eventually announced larger hosted runners in mid-2023, we found that On-Demand instances offer 30-70% savings compared to GitHub.

    OS     vCPU  EC2 instance    GitHub price/minute  EC2 On-Demand price/minute
    Linux  2     c5a.large       $0.008               $0.001284
    Linux  4     c5a.xlarge      $0.016               $0.00257
    Linux  8     c5a.2xlarge     $0.032               $0.005133
    Linux  16    c5a.4xlarge     $0.064               $0.010267
    Linux  32    c5a.8xlarge     $0.128               $0.02054
    Linux  64    c5a.16xlarge    $0.256               $0.041067

  2. EC2 offers flexibility to leverage both On-Demand and Spot instances

    Using EC2, we could run high-priority jobs on On-Demand instances and take advantage of Spot instances for low-priority jobs.

    Spot instances are spare compute capacity in the AWS cloud, offered at a steep discount (up to 90%) compared to On-Demand prices. The catch is that AWS can reclaim them at any time, with a two-minute warning, causing a Spot interruption. This makes Spot instances suitable for flexible, fault-tolerant jobs.

    We could also extend this to obtain much larger Spot instances (for even faster builds) for the same price as the On-Demand instance we would otherwise use.
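To make the price comparison concrete, here is a small Python sketch that computes the raw per-minute savings for a few rows of the price table, plus the cost of a hypothetical 120-minute run on a single machine. It compares list prices only; it leaves out EBS storage, data transfer, instance boot time, and the minutes consumed by the GitHub-hosted job that launches the runner, so measured savings can come in below these raw ratios.

```python
# Per-minute list prices (USD) for a few rows of the price table.
# Key: vCPU count -> (GitHub-hosted Linux runner, EC2 c5a On-Demand instance).
PRICES_PER_MINUTE = {
    2: (0.008, 0.001284),
    4: (0.016, 0.00257),
    32: (0.128, 0.02054),
    64: (0.256, 0.041067),
}


def savings_percent(github_price: float, ec2_price: float) -> float:
    """Percentage saved by paying the EC2 price instead of the GitHub price."""
    return (1 - ec2_price / github_price) * 100


def build_cost(price_per_minute: float, minutes: float) -> float:
    """Cost of one pipeline run that keeps a single machine busy for `minutes`."""
    return price_per_minute * minutes


if __name__ == "__main__":
    for vcpu, (gh, ec2) in PRICES_PER_MINUTE.items():
        print(
            f"{vcpu:>2} vCPU: {savings_percent(gh, ec2):.1f}% cheaper on EC2; "
            f"a hypothetical 120-minute run costs ${build_cost(gh, 120):.2f} on GitHub "
            f"vs ${build_cost(ec2, 120):.2f} on EC2"
        )
```

Because c5a prices scale linearly with instance size, the list-price ratio stays roughly constant across rows; the absolute dollar difference per run is what grows with machine size.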

Optimized solution for self-hosted runners on EC2

Using EC2 would reduce costs significantly, but we still needed a systematic way to deploy, manage, and decommission instances for various use cases. Most existing solutions we found offered limited instance customization and required extra work to de-provision instances.

To solve the full range of challenges, we developed a custom GitHub Action. This action automates the deployment, management, and decommissioning of ephemeral self-hosted runners on AWS EC2.
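The launch half of that lifecycle can be sketched in Python. This is a minimal illustration under stated assumptions, not the action's actual source: it assumes the runner package is already unpacked in a custom AMI, that a registration token has already been fetched from GitHub's REST endpoint `POST /repos/{owner}/{repo}/actions/runners/registration-token`, and that the runner is labeled with the workflow run ID so that `runs-on: ${{ github.run_id }}` in the code example below lands on it. The helpers `build_user_data` and `build_run_instances_args`, the `/opt/actions-runner` path, and the `github-run-id` tag key are hypothetical; the dictionary keys are real parameters of boto3's `EC2.Client.run_instances`, and the `config.sh` flags come from the official actions/runner package.

```python
import shlex
from typing import Any, Dict


def build_user_data(repo_url: str, registration_token: str, label: str,
                    runner_dir: str = "/opt/actions-runner") -> str:
    """Boot script: register a one-job runner, run it, then power off."""
    q = shlex.quote
    return "\n".join([
        "#!/bin/bash",
        "set -euo pipefail",
        # Power off on any exit; with InstanceInitiatedShutdownBehavior=terminate
        # this terminates the instance instead of leaving it stopped.
        "trap 'shutdown -h now' EXIT",
        "export RUNNER_ALLOW_RUNASROOT=1  # user data runs as root",
        # Hypothetical path: assumes the runner is pre-installed in the custom AMI.
        f"cd {q(runner_dir)}",
        # --ephemeral: the runner accepts exactly one job, then deregisters itself.
        f"./config.sh --unattended --ephemeral --url {q(repo_url)} "
        f"--token {q(registration_token)} --labels {q(label)} --name {q(label)}",
        "./run.sh",
    ])


def build_run_instances_args(ami_id: str, instance_type: str, subnet_id: str,
                             security_group_id: str, user_data: str,
                             run_id: str, use_spot: bool) -> Dict[str, Any]:
    """Keyword arguments for boto3's EC2 client `run_instances` call."""
    args: Dict[str, Any] = {
        "ImageId": ami_id,
        "InstanceType": instance_type,
        "MinCount": 1,
        "MaxCount": 1,
        "SubnetId": subnet_id,
        "SecurityGroupIds": [security_group_id],
        "UserData": user_data,  # boto3 base64-encodes this automatically
        "InstanceInitiatedShutdownBehavior": "terminate",
        "TagSpecifications": [{
            "ResourceType": "instance",
            # Hypothetical tag key, handy for finding this run's instance later.
            "Tags": [{"Key": "github-run-id", "Value": run_id}],
        }],
    }
    if use_spot:
        args["InstanceMarketOptions"] = {
            "MarketType": "spot",
            "SpotOptions": {
                "SpotInstanceType": "one-time",
                "InstanceInterruptionBehavior": "terminate",
            },
        }
    return args


if __name__ == "__main__":
    run_id = "1234567890"  # in a workflow this would be ${{ github.run_id }}
    user_data = build_user_data("https://github.com/OWNER/REPO", "REGISTRATION_TOKEN", run_id)
    args = build_run_instances_args("ami-0123456789abcdef0", "c5.4xlarge", "subnet-REDACTED",
                                    "sg-REDACTED", user_data, run_id, use_spot=True)
    # With AWS credentials configured, this call would launch the runner:
    #   import boto3
    #   boto3.client("ec2", region_name="us-west-2").run_instances(**args)
    print(user_data)
```

Decommissioning falls out of the design: the ephemeral runner exits after its single job, the `EXIT` trap powers the machine off, and the shutdown behavior turns that into a termination, so nothing has to be de-provisioned by hand on the happy path.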

Here’s what it helped us achieve:

  • Significant cost reduction: Up to 90% savings when using Spot instances, and 30-70% savings when using On-Demand. Lower data transfer costs to AWS services (e.g., ECR and S3).
  • Flexible instance usage: Automatically switch between Spot and On-Demand based on availability, performance, and a pre-configured spending limit.
  • Custom machine images: Pre-load images with tools to save time and ensure a consistent build environment.
  • Security advantages: Leverage AWS EC2 security monitoring tools, run builders within our own infrastructure network, and avoid exposing internal APIs to the internet.
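The "Flexible instance usage" rule can be expressed as a small selection function. The Python sketch below is hypothetical: it does not reproduce the action's real strategies (such as `BestEffort`, `SpotOnly`, or `MaxPerformance`), the per-hour `spending_limit` is an assumed form of the spending limit, and the price dictionaries stand in for values you would look up from AWS pricing and Spot price history. It prefers the largest Spot instance in the family whose price fits under both the requested type's On-Demand price and the budget, and otherwise falls back to On-Demand for the requested type.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Choice:
    instance_type: str
    market: str            # "spot" or "on-demand"
    price_per_hour: float  # USD


def choose_instance(
    requested: str,
    family_by_size: List[str],               # one instance family, smallest to largest
    on_demand_price: Dict[str, float],       # USD/hour per instance type
    spot_price: Dict[str, Optional[float]],  # USD/hour; missing or None = no Spot capacity
    spending_limit: float,                   # hypothetical per-runner budget, USD/hour
) -> Choice:
    """Pick the largest Spot instance (at or above the requested size) whose price
    fits within both the requested On-Demand price and the budget; otherwise fall
    back to On-Demand for the requested type."""
    ceiling = min(on_demand_price[requested], spending_limit)
    start = family_by_size.index(requested)
    for instance_type in reversed(family_by_size[start:]):  # largest first
        price = spot_price.get(instance_type)
        if price is not None and price <= ceiling:
            return Choice(instance_type, "spot", price)
    if on_demand_price[requested] <= spending_limit:
        return Choice(requested, "on-demand", on_demand_price[requested])
    raise RuntimeError(f"no capacity for {requested} within ${spending_limit:.2f}/hour")


if __name__ == "__main__":
    family = ["c5.xlarge", "c5.2xlarge", "c5.4xlarge", "c5.9xlarge"]
    # Illustrative prices only, not current AWS quotes.
    on_demand = {"c5.xlarge": 0.17, "c5.2xlarge": 0.34, "c5.4xlarge": 0.68, "c5.9xlarge": 1.53}
    spot = {"c5.xlarge": 0.07, "c5.2xlarge": 0.15, "c5.4xlarge": 0.30, "c5.9xlarge": 0.65}
    print(choose_instance("c5.4xlarge", family, on_demand, spot, spending_limit=1.00))
```

With these made-up numbers, a request for c5.4xlarge is upgraded to a Spot c5.9xlarge, because that instance's Spot price fits under the c5.4xlarge On-Demand price. If no Spot capacity fits, the job still runs On-Demand as long as the budget allows.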

Code example

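Check out our GitHub repository for more information about how to set up and configure this action. The workflow below has two jobs: start-runner launches an ephemeral EC2 runner from a GitHub-hosted machine, and run-build then runs on that runner, which it targets by the workflow’s run ID.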


jobs:
  start-runner:
    timeout-minutes: 5                  # normally it only takes 1-2 minutes
    name: Start self-hosted EC2 runner
    runs-on: ubuntu-latest
    permissions:
      actions: write
    steps:
      - name: Start EC2 runner
        id: start-ec2-runner
        uses: NextChapterSoftware/ec2-action-builder@v1
        with:
          aws_access_key_id: ${{ secrets.DEPLOY_AWS_ACCESS_KEY_ID }}
          aws_secret_access_key: ${{ secrets.DEPLOY_AWS_SECRET_ACCESS_KEY }}
          aws_iam_role_arn: "arn:aws:iam::REDACTED:role/REDACTED"  # Optional: used for cross-account IAM access only
          aws_region: "us-west-2"
          github_token: ${{ secrets.GH_PERSONAL_ACCESS_TOKEN }}
          github_action_runner_version: "v2.300.2"                 # Optional (default is latest release)
          ec2_instance_type: "c5.4xlarge"
          ec2_ami_id: "ami-008fe2fc65df48dac"
          ec2_subnet_id: "SUBNET_ID_REDACTED"
          ec2_security_group_id: "SECURITY_GROUP_ID_REDACTED"
          ec2_instance_ttl: 40                          # Optional (default is 60 minutes)
          ec2_spot_instance_strategy: MaxPerformance    # Optional: other choices are 'None' (default), 'BestEffort', 'SpotOnly'
          ec2_instance_tags: >                          # Optional: only required when using cross-account IAM access
            [
              {"Key": "Owner", "Value": "deploybot"}
            ]

  # Your job that runs on the self-hosted runner
  run-build:
    timeout-minutes: 1
    needs:
      - start-runner
    runs-on: ${{ github.run_id }}           # Do NOT change
    steps:
      - run: env
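The `ec2_instance_ttl` input acts as a safety net: even if a job hangs or the runner never picks up work, the instance should not outlive its time budget. The Python sketch below shows the idea as a hypothetical reaper that flags instances older than their TTL; it is not necessarily how the action enforces its TTL, and `find_expired` is an illustrative name. With boto3, the launch times could come from `describe_instances` (filtered on a tag you apply at launch), which returns `LaunchTime` as a timezone-aware datetime, and the flagged IDs could be passed to `terminate_instances`.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterable, List, Optional, Tuple


def find_expired(
    instances: Iterable[Tuple[str, datetime]],  # (instance_id, timezone-aware launch time)
    ttl_minutes: int = 60,                      # same default as the action's ec2_instance_ttl
    now: Optional[datetime] = None,
) -> List[str]:
    """Return IDs of instances that have been alive longer than ttl_minutes."""
    now = now or datetime.now(timezone.utc)
    limit = timedelta(minutes=ttl_minutes)
    return [instance_id for instance_id, launched in instances if now - launched > limit]


if __name__ == "__main__":
    now = datetime(2024, 2, 2, 12, 0, tzinfo=timezone.utc)
    fleet = [
        ("i-runner-a", now - timedelta(minutes=10)),
        ("i-runner-b", now - timedelta(minutes=45)),
    ]
    print(find_expired(fleet, ttl_minutes=40, now=now))  # ['i-runner-b']
```

Passing `now` explicitly keeps the function deterministic and easy to test; in a scheduled cleanup job you would omit it and let it default to the current UTC time.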
        
                

Conclusion

Many startups treat a dependable, robust CI pipeline as a “nice-to-have” rather than a necessity. We learned that it is as critical as the customer-facing parts of the system. Bugs are an inevitable part of building software, but the inability to ship fixes quickly can permanently damage customer relationships. The importance of development velocity is widely acknowledged in theory, yet often only fully understood in situations like the one we experienced.

As a startup dedicated to developer productivity, we share the mistakes and lessons from our development journey in the hope of helping other teams facing similar problems.

We published the open-source tool to the GitHub Actions Marketplace. Give it a spin and let us know how it works for you!