Elixir CI on Speed

How We Achieved 3x Faster Builds for Free

Continuous Integration (CI) speed is crucial for developer productivity. Slow CI pipelines lead to context switching, delayed feedback, and frustrated developers. Here's how we significantly improved our Elixir project's CI performance without increasing costs.

Before:

GitHub CI action workflow before optimizing (PR only)

After:

GitHub CI action workflow after optimizing (PR only)

1. Optimize CI Flow

Combine CI Jobs

Parallelization is great, right? But we found that having too many separate CI jobs was actually slowing us down. Each job requires:

  • Setting up a new environment
  • Installing dependencies
  • Restoring/saving caches

By combining related jobs, we reduced overhead while maintaining clear separation of concerns.

⚠️ GitHub bills runner minutes per started job and rounds each job up to the next full minute. Several short jobs therefore cost more than one combined, larger job.

The main downside is that the workflow overview shows less clearly what failed. Instead, you have to look at the job logs to find the issue.
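As a sketch, a combined job might look like the following. Job names, versions, and steps are illustrative, not our exact workflow:

```yaml
# .github/workflows/ci.yml — illustrative sketch, not our exact workflow
jobs:
  ci:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: erlef/setup-beam@v1
        with:
          elixir-version: "1.18"
          otp-version: "27"
      - uses: actions/cache@v4
        with:
          path: |
            deps
            _build
          key: mix-${{ runner.os }}-${{ hashFiles('mix.lock') }}
      - run: mix deps.get
      # Formerly separate jobs, now sequential steps sharing one setup:
      - run: mix format --check-formatted
      - run: mix credo --strict
      - run: mix test
```

Environment setup, dependency installation, and cache restore now happen once instead of once per job.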

Optimal Runner Size

We moved from standard GitHub-hosted 2-core runners to GitHub-hosted 8-core runners. While this might seem like it would cost more, the reduced run time often means you actually use fewer total compute minutes.

ℹ The quota of free standard-runner minutes is not wasted: it is still consumed by other job types, such as deployment.

You need to experiment with combinations of runner size and the --max-cases concurrency setting. The sweet spot may differ for unit tests vs. feature tests: browser-based feature tests require more compute per test and run best with a lower --max-cases setting.
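ExUnit's default for --max-cases is twice the number of schedulers (CPU cores). As a sketch, you can pin it per run on the command line or in test_helper.exs; the value below is an example to tune, not a recommendation:

```elixir
# test/test_helper.exs — example value to experiment with, not a recommendation.
# Equivalent to `mix test --max-cases 16` on the command line.
ExUnit.start(max_cases: 16)
```

Measure total suite time for a few values on your chosen runner size before settling on one.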

2. Smart Test Execution

Selective Feature Testing

Feature tests, especially those using browser automation, are typically the slowest part of the test suite. We used to run feature tests on every update of a pull request.

We now run feature tests only when:

This approach dramatically reduced the average CI time while maintaining confidence in our deployment pipeline.
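One way to gate feature tests in GitHub Actions is a conditional job. The trigger conditions below (a PR label and pushes to main) are illustrative, not our exact setup:

```yaml
# Illustrative: run the feature-test job only for pushes to main
# or for PRs labeled "run-feature-tests".
jobs:
  feature_tests:
    if: >-
      github.ref == 'refs/heads/main' ||
      contains(github.event.pull_request.labels.*.name, 'run-feature-tests')
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: mix test --only feature
```

This assumes feature tests are tagged (e.g. @moduletag :feature) and excluded from the default test run.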

Defer Dependency Installation

This is a follow-up to the previous point: not all tests need all dependencies.

We deferred installation of dependencies that only feature tests require. For example, we now install tools like pdftk only in the feature-test job, reducing setup time for most CI runs.
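As an illustrative sketch, the feature-test-only system packages move into the feature-test job, so the main CI job never pays for them:

```yaml
# Illustrative sketch: only the feature-test job installs pdftk
feature_tests:
  runs-on: ubuntu-latest
  steps:
    - uses: actions/checkout@v4
    - name: Install system deps needed only by feature tests
      run: sudo apt-get update && sudo apt-get install -y pdftk
    - run: mix test --only feature
```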

Replace Slow Dependencies

We replaced wallaby with phoenix_test_playwright for browser testing, which provided better performance and stability. When evaluating test dependencies, consider both feature completeness and performance characteristics.
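In mix.exs the swap is a small dependency change; the version requirements below are placeholders, check Hex for current ones:

```elixir
# mix.exs — version requirements are placeholders
defp deps do
  [
    # {:wallaby, "~> 0.30", only: :test, runtime: false},
    {:phoenix_test_playwright, "~> 0.1", only: :test, runtime: false}
  ]
end
```

The tests themselves need migrating to the phoenix_test API, so budget for that alongside the dependency swap.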

3. Leverage Graphite Stacks

The CI vs. Code Review Tradeoff

There's traditionally been a tradeoff between optimal code review and optimal CI:

  • Small PRs are great for review but trigger redundant CI runs
  • Large PRs are efficient for CI but terrible for review

Graphite stacks solve this dilemma by allowing you to create small, focused Pull Requests for better code review while still optimizing CI runs.

Optimize CI in Stacks

Graphite Optimize CI

Instead of running CI on every commit, we run CI only on:

  • The bottom commit (to verify the foundation)
  • The top commit (to verify the final state)

This significantly reduces our CI runs while maintaining high confidence in our code quality.

4. Ensure ExUnit Concurrency

Elixir's built-in test framework, ExUnit, has powerful concurrency features that are often underutilized. Here's how to leverage them:

Mark Tests as Async

The simplest win is marking test cases as async when possible:

use ExUnit.Case, async: true

You always do this anyway? Are you sure, or have some tests slipped through the net? You can use credo to automatically identify test cases that could run asynchronously. This alone can provide a significant speed boost, especially in projects with many independent tests.

# .credo.exs
# ...
  enabled: [
    {Credo.Check.Refactor.PassAsyncInTestCases, []},
    # ...
  ],

Use ExUnit 1.18's Concurrency Groups

ExUnit 1.18 introduced concurrency groups, a game-changing feature for tests that contend for the same resources. Instead of running all tests sequentially or risking race conditions, you can group related tests:

use ExUnit.Case, async: true, group: :commanded

test "in-memory-event-store-dependent test" do
  # ...
end

Tests within the same group run sequentially, while different groups run concurrently. This provides the perfect balance between safety and speed.

5. Address Technical Debt

Fix Flaky Tests

Flaky tests are worse than no tests: they waste time, reduce confidence, and often mask real issues. We prioritized fixing flaky tests by:

  • Adding better assertions
  • Removing timing-dependent logic
  • Improving test isolation
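For timing-dependent logic, the usual fix is to wait for an explicit signal instead of sleeping for a fixed duration. A sketch (the {:order_paid, id} message is hypothetical instrumentation, assuming the test process is subscribed to such notifications):

```elixir
# Brittle: assumes the async worker always finishes within 500 ms.
# Process.sleep(500)
# assert Repo.get!(Order, order_id).status == :paid

# Robust: wait for the event itself, with a generous upper bound.
test "marks the order as paid", %{order_id: order_id} do
  assert_receive {:order_paid, ^order_id}, 1_000
end
```

assert_receive returns as soon as the message arrives, so the happy path stays fast while slow runs get the full timeout.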

Optimize Slow Tests

We identified and optimized particularly slow tests, focusing on:

  • Reducing unnecessary setup/teardown
  • Improving database interaction patterns
  • Optimizing background job handling (particularly with Commanded)

You can identify slow tests using these mix commands:

mix test --slowest 10     # Shows timings for the 10 slowest tests
mix test --slowest 20     # Shows timings for the 20 slowest tests
mix test --trace          # Detailed output per test (runs sequentially)

Results

These changes reduced our average CI time from 15 minutes to 4 minutes, without increasing our CI costs. The biggest wins came from:

  1. Optimizing the CI flow
  2. Smart test execution strategies
  3. Proper use of async tests and concurrency groups
  4. Fixing and optimizing slow/flaky tests

Here are some more GitHub workflow examples, this time including deployment (build: Docker image, deploy: via Terraform). The Build (Docker image) and Deploy (Terraform) steps still use 2-core runners because they are less compute-bound and less time-critical for the normal development flow.

Before:

GitHub CI action workflow before optimizing (with deploy)

After:

GitHub CI action workflow after optimizing (with deploy)

Next Steps

To implement these improvements in your project:

  1. Audit your test suite for async opportunities
  2. Implement concurrency groups for related tests
  3. Review and combine CI jobs
  4. Profile your slowest tests
  5. Consider selective feature testing
  6. Consider using stack-based code review
  7. Try out larger runner sizes and --max-cases concurrency settings

Remember, CI optimization is an ongoing process. Regular monitoring and adjustment of your CI pipeline will help maintain these performance gains over time.