
DevOps Case Study: How We Drastically Improved Our CI at FundGuard with Modest Effort

In this post, I’ll share how we significantly improved our continuous integration (CI) – by a double-digit percentage – thereby reducing delivery times, cutting cloud costs, and making developers’ lives better. This post is intended primarily as a guide for readers looking to make meaningful improvements to their CI without derailing their quarterly plans. While implementation details will vary across organizations, I hope you’ll find ideas here that you can apply in your own environment.

 

About FundGuard’s CI


In the financial domain, reliability and accuracy are especially critical. To ensure both, we test our code at multiple levels: unit tests, integration tests, and several types of end-to-end tests. End-to-end tests are particularly important – not only because of the broad coverage they provide, but also because they allow us to demonstrate the system’s outputs to clients across versions and environments. This builds confidence in our system and proves that no regressions have been introduced. To minimize the risk of costly regressions, we decided early on to run most of our tests on most of our pull requests (PRs).


The challenge of end-to-end tests and a strict testing policy, however, is the overhead they add to our CI. So we set out to improve it, focusing on speed (for better delivery times and developer experience) and cost-efficiency (for reduced expenses), without compromising accuracy or reliability.

 

How We Improved Our CI

Mapping the Issues


Naturally, we focused first on the most time-consuming bottlenecks and cost inefficiencies, and started with the low-hanging fruit. Once we had a clear understanding of our main bottlenecks, in terms of both cost and CI duration, we brainstormed and devised plans that evolved as we implemented solutions.


Ultimately, we tackled the issue using a three-pronged approach:

 

  1. Reducing CI Time
  2. Improving CI Stability
  3. Minimizing “requeues after service deployment”


Some of the methods below are simple, and others, such as the smart test selection initiative and latest stable concept, are more complex. Together, they constitute a CI improvement playbook that made a huge difference for our team at FundGuard.

 

1) Reducing CI Time

We greatly reduced our CI runtime using the following techniques:

 

1A) Parallelizing Testing

We’ve had a test parallelization framework in place for a long time, but we were still leaving a lot on the table. We improved this by:

 

  • Splitting large end-to-end tests into smaller ones, maintaining the same coverage but enabling parallel execution.

 

  • Eliminating dependencies between steps in the pipeline and between tests, allowing them to start sooner and run independently when possible. For example, our client end-to-end tests depend on the backend but only require a subset of the backend tests. By splitting backend testing into two parallel phases, we were able to run client tests – previously a CI bottleneck – sooner.
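
To illustrate the kind of dependency restructuring involved, here is a minimal sketch (in Python) of a pipeline where the backend tests are split into two phases so the client tests can start as soon as the core subset passes. The stage names, durations, and runner are hypothetical stand-ins, not our actual pipeline definition:

from concurrent.futures import ThreadPoolExecutor
import time

# Hypothetical stages and durations; the point is the dependency shape:
# client_e2e_tests now waits only for the core backend subset.
STAGES = {
    "backend_tests_core":     {"deps": [],                     "duration": 1.0},
    "backend_tests_extended": {"deps": [],                     "duration": 2.0},
    "client_e2e_tests":       {"deps": ["backend_tests_core"], "duration": 1.5},
}

def run_stage(name):
    time.sleep(STAGES[name]["duration"])  # stand-in for the real work
    print(f"{name} finished")

def run_pipeline():
    done, futures = set(), {}
    with ThreadPoolExecutor() as pool:
        while len(done) < len(STAGES):
            for name, stage in STAGES.items():
                if name not in futures and all(d in done for d in stage["deps"]):
                    futures[name] = pool.submit(run_stage, name)
            done.update(n for n, f in futures.items() if f.done())
            time.sleep(0.05)

run_pipeline()

Before the split, the client tests would have depended on both backend phases and started only after the slower one completed.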

 

1B) Reducing Test / Pipeline Duration

We shortened our CI duration through:

 

  • Test optimizations – Implementation will vary across organizations. For example, we reduced client automation test time by ~30% by “manually” resetting state after each test instead of hard-refreshing the page (see the sketch after this list). Aware of the reduced isolation between tests, we designed the implementation to mitigate the risk, ensuring thorough cleanup and extensive testing before integration.

 

  • Pipeline optimizations – We significantly optimized our pipeline by parallelizing our build process, performing sparse Git checkouts, caching dependencies, and making various cleanup phases non-blocking so they no longer delay the pipeline’s completion.

 

  • System optimizations – To boost performance and scalability in production, we implemented efficiency improvements like better caching and business logic optimizations. Though not aimed at speeding up CI, these changes improved it as a welcome side effect.
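
As a rough sketch of the client-test optimization from the first bullet, here is what a per-test state reset might look like in a pytest-style suite. The AppDriver class and its reset mechanism are hypothetical placeholders, not our actual framework:

import pytest

class AppDriver:
    """Hypothetical client-automation driver."""

    def hard_refresh(self):
        ...  # reload the whole page and rebuild the session (slow)

    def reset_state(self):
        ...  # clear test data via API calls and reset UI state in place (fast)

@pytest.fixture
def app():
    driver = AppDriver()
    yield driver
    # Previously: driver.hard_refresh() after every test.
    # Now: targeted cleanup that keeps the browser session alive, paired with
    # thorough verification to compensate for the reduced isolation.
    driver.reset_state()

def test_create_portfolio(app):
    ...  # exercise the UI; cleanup happens in the fixture teardown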

 

1C) Testing Smarter

We aimed to run fewer tests without compromising coverage, approaching the problem in two directions:

 

  • Removing Redundant Coverage: Our system consumes and generates large datasets, and so do our heavier tests. To identify redundant coverage, we wrote scripts to analyze the inputs and expected values of our end-to-end tests, categorizing them by the coverage they provide (see the sketch after this list). This helped us eliminate duplicated coverage while still testing a wide range of business logic permutations.

 

  • Smart Test Selection: The idea is to run only a subset of the tests for each PR – those relevant to the changes it introduces. The hard part is figuring out automatically, in a large system with thousands of end-to-end tests, which tests are relevant to which PR. We evaluated some products that do this, but they did not provide the desired savings and were too expensive, so we decided to implement a custom solution.
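
For the redundancy analysis in the first bullet, here is a minimal sketch of the idea: categorize each end-to-end test by the business-logic permutation it exercises and flag groups containing more than one test. The test metadata and coverage keys below are hypothetical:

from collections import defaultdict

# Hypothetical metadata extracted from each end-to-end test's inputs and
# expected values; the real categories are domain-specific.
tests = [
    {"name": "e2e_nav_daily",    "asset_class": "equity", "pricing": "daily",   "fees": True},
    {"name": "e2e_nav_daily_v2", "asset_class": "equity", "pricing": "daily",   "fees": True},
    {"name": "e2e_nav_monthly",  "asset_class": "bond",   "pricing": "monthly", "fees": False},
]

def coverage_key(test):
    # The permutation of business logic this test exercises.
    return (test["asset_class"], test["pricing"], test["fees"])

groups = defaultdict(list)
for test in tests:
    groups[coverage_key(test)].append(test["name"])

for key, names in groups.items():
    if len(names) > 1:
        print(f"Potentially redundant coverage for {key}: {names}")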

 

To evaluate potential test reductions, we asked these questions for each optimization idea:

 

  1. How many tests will it enable us to skip?
  2. What percentage of PRs will it be applicable for?
  3. What’s the risk of a false negative (i.e., skipping a necessary test)?

 

We began with simple heuristics, such as:

 

  • Not running client automation tests on every backend change – instead, we started running them only when the changes in the PR could affect our internal gateway (our main client-facing service), as identified through build-time dependency analysis.

 

  • Running only the tests in the corresponding test suite for PRs that change only test files.
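
Here is a minimal sketch of heuristics like these, where the path conventions and the precomputed gateway dependency set are hypothetical stand-ins for our build-time dependency analysis:

def select_test_suites(changed_files, gateway_dependencies):
    """Decide which suites a PR needs, based on its changed file paths."""
    # PRs that touch only test files: run just the corresponding suite.
    if all(path.startswith("tests/") for path in changed_files):
        return {"affected_test_suite"}

    suites = {"unit", "integration", "backend_e2e"}
    # Run client automation only if the change can reach the gateway.
    if any(path in gateway_dependencies for path in changed_files):
        suites.add("client_automation")
    return suites

# Example: a backend-only change that does not affect the gateway.
print(select_test_suites(
    changed_files={"services/pricing/engine.py"},
    gateway_dependencies={"services/gateway/api.py"},
))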

 

These heuristics saved significant time – but we wanted to do more. We realized that for some of our heaviest tests, we could retrospectively determine whether they were truly necessary by analyzing completed PRs and their respective pipelines. To capitalize on this insight, we trained a simple ML classifier using past PRs as a dataset, where the changed files serve as the input (features for the model) and whether the tests were ultimately necessary serves as the label. We then translated the model’s insights into easy-to-understand heuristics used in our pipelines:

 

  • Sensitive code areas – Since the model was trained on changed files, we analyzed its coefficients to identify system “hotspots” – the files most relevant to the heavy test class. We combined these insights with our domain knowledge to define sensitive code areas.

 

  • Smoke testing (canaries) – Canaries are lighter versions of more intensive tests that provide a strong indication of whether running the full test is necessary. Although we didn’t initially formalize the concept, we noticed when analyzing the models’ coefficients that in cases where a canary test existed implicitly – meaning the heavy test had a corresponding lighter version – the model identified a strong correlation between the canary and the need to run the full test. Building on this insight, we formalized and implemented canaries for all our heavy tests.
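
To make the classifier approach more concrete, here is a minimal sketch assuming scikit-learn and a tiny, hypothetical history of past PRs; in practice the feature extraction, labeling, and validation are more involved. Inspecting the fitted coefficients is what surfaces the hotspot and canary signals described above:

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# Hypothetical training data: for each past PR, the files it changed and
# whether the heavy test turned out to be necessary (1) or skippable (0).
past_prs = [
    (["services/pricing/engine.py", "tests/test_engine.py"], 1),
    (["docs/readme.md"], 0),
    (["client/src/app.ts"], 0),
    (["services/pricing/curves.py"], 1),
]

docs = [" ".join(files) for files, _ in past_prs]
labels = [label for _, label in past_prs]

# One binary feature per changed file path.
vectorizer = CountVectorizer(tokenizer=str.split, binary=True)
X = vectorizer.fit_transform(docs)
model = LogisticRegression().fit(X, labels)

# The highest-weighted files point at "hotspots" that warrant the heavy test.
ranked = sorted(zip(model.coef_[0], vectorizer.get_feature_names_out()), reverse=True)
for weight, path in ranked[:3]:
    print(f"{path}: {weight:.2f}")

The skipping decisions themselves live in the pipeline as plain rules (hotspot file lists and canary-first runs), mirroring how we translated the model’s insights into heuristics rather than querying a model on every PR.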

 

1D) Preventing the Introduction of New CI Bottlenecks

Improvements won’t last without enforcement – developers under pressure to ship features can’t always prioritize CI efficiency on their own. So we’ve introduced guardrails, like limiting the runtime of specific tests and the overall pipeline duration, and we continuously monitor our CI duration to detect regressions.
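
One lightweight guardrail is a check that compares measured stage durations against agreed budgets and fails the pipeline (or alerts) when a budget is exceeded. A minimal sketch, with hypothetical limits:

# Hypothetical per-stage budgets in minutes, agreed on with the team and
# revisited as the pipeline evolves.
LIMITS = {"build": 10, "unit_tests": 15, "end_to_end": 45, "total": 75}

def check_durations(measured):
    """Return the stages that exceeded their time budget."""
    return {
        stage: (minutes, LIMITS[stage])
        for stage, minutes in measured.items()
        if stage in LIMITS and minutes > LIMITS[stage]
    }

violations = check_durations({"build": 9, "unit_tests": 21, "end_to_end": 40, "total": 70})
for stage, (actual, limit) in violations.items():
    print(f"{stage} took {actual} min, budget is {limit} min")
# A CI step could exit non-zero here to block the regression.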

 

1E) Incremental Builds

When build and unit test times are low, rebuilding the entire project for each pipeline run is manageable – and the simplicity of doing so has its advantages. But as the system grows larger and more complex and build times increase, rebuilding everything for every PR becomes inefficient – especially since most PRs don’t change most modules. To address this, we implemented incremental builds: caching build artifacts and reusing them instead of rebuilding and retesting when no changes have been made to a given module (with help from open source tools).
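
At its core, an incremental build keys each module’s artifacts by a fingerprint of its inputs and reuses them when the fingerprint is unchanged. Here is a minimal sketch of that idea, ignoring the cross-module dependency graphs, remote caches, and other details the open source tools handle:

import hashlib

def module_fingerprint(source_files):
    """Hash a module's sources (path -> content); an unchanged fingerprint
    means the cached artifact can be reused instead of rebuilding and retesting."""
    digest = hashlib.sha256()
    for name in sorted(source_files):
        digest.update(name.encode())
        digest.update(source_files[name].encode())
    return digest.hexdigest()

def build_if_changed(module_name, source_files, cache):
    key = (module_name, module_fingerprint(source_files))
    if key in cache:
        return cache[key]             # cache hit: skip build and module tests
    artifact = f"built:{key[1][:8]}"  # stand-in for the real build step
    cache[key] = artifact
    return artifact

# Hypothetical module sources, keyed by path.
sources = {"pricing/Engine.java": "class Engine { /* ... */ }"}
cache = {}
build_if_changed("pricing", sources, cache)  # first run: builds
build_if_changed("pricing", sources, cache)  # unchanged: cache hit
sources["pricing/Engine.java"] += " // change"
build_if_changed("pricing", sources, cache)  # changed: rebuilds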

 

1F) Reusing and Sharing Testing Infrastructure

Sharing infrastructure (like databases) or even microservices across pipelines can save time and cost – but adds complexity and may affect stability, as different PRs can interfere with each other. We haven’t implemented this yet, but it’s on our roadmap. When we do, we’ll proceed carefully.

 

2) Improving CI Stability

A stability issue is defined as a false negative in the pipeline – a PR that should have passed the pipeline but didn’t, due to an issue unrelated to it. Almost every stability issue that fails a pipeline forces a requeue (rerunning the pipeline), leading to slower deliveries, higher costs, and developer frustration.

 

After analyzing many stability issues, we realized that they mainly stem from:

 

  • Collisions between parallel pull requests – Since our CI takes a while and we have many PRs, PRs are tested and integrated in parallel. This means a PR isn’t necessarily tested alongside every other PR that gets integrated. These logical collisions between “competing” pull requests sometimes cause stability issues.

 

  • Probabilistic issues (flakiness) – i.e., issues that manifest only occasionally and can therefore slip through the pipeline.

 

To address these issues, we introduced the concept of the “latest stable” branch. The core idea is simple: we wanted to create a small buffer between the latest code (which may be unstable for the reasons mentioned above) and the developers, ensuring that the code they work with is stable.

 

Here’s how it works: The “latest-stable” branch is always slightly behind the main development branch. Developers create new feature branches from it and rebase on top of it, ensuring they work with a stable version of the code. Feature branches eventually get merged to the main development branch – not to the latest-stable branch (this works because the latest-stable branch is always behind the main development branch – they never diverge). The latest-stable branch is promoted to a newer version by running multiple instances of our main pipeline on the latest commit in the main development branch. If all the pipelines pass, we mark that commit as stable and promote the latest-stable branch to it. If not, the “on-call” shift is alerted to address the stability issue.
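
Here is a minimal sketch of the promotion step, with stand-in callables for the CI and Git integration; the real implementation is wired into our CI system and runs on a schedule:

def promote_latest_stable(run_pipeline, get_head, promote, alert_on_call,
                          num_pipelines=3):
    """Run several pipeline instances against the current head of the main
    branch; promote latest-stable only if all of them pass."""
    candidate = get_head("main")
    results = [run_pipeline(candidate) for _ in range(num_pipelines)]
    if all(results):
        promote("latest-stable", candidate)  # fast-forward; the branches never diverge
    else:
        alert_on_call(f"Stability issue detected at {candidate}")

# Example wiring with stand-in callables (real ones would call the CI system and Git).
promote_latest_stable(
    run_pipeline=lambda commit: True,
    get_head=lambda branch: "abc1234",
    promote=lambda branch, commit: print(f"{branch} -> {commit}"),
    alert_on_call=print,
)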

 

There is, of course, a trade-off when determining what qualifies as a stable commit. Stability is relative. For our purposes, we only need sufficient stability to ensure CI pipelines pass – clearly, we conduct more extensive testing before rolling out the system to clients. 

 

On one hand, we want to provide developers with stable code. On the other hand, testing stability incurs time and cost – and requiring a higher level of stability increases the time and cost of verification. After some trial and error, we agreed to run three parallel CI pipelines every three hours to determine stability. This strategy offers a good balance: it’s stable enough for developers, provides relatively recent code, and is financially feasible.

 

This system comes with obvious pros and cons:

 

  • Pro: Stability issues don’t immediately affect all developers
  • Con: Devs don’t get the absolute latest code when rebasing

 

This con is mitigated by the solution being backwards compatible – developers can still rebase from the main branch if they must work with the latest code (at the risk of picking up the latest stability issues).

 

The latest-stable concept is also invaluable for detecting and investigating regressions. When a regression is introduced into the main branch, the three stability pipelines begin to fail, making it easier to pinpoint when the issue was introduced. This, in turn, helps quickly identify and revert the problematic change.

 

In addition to the latest-stable concept, we also introduced test burn-in – running new tests, especially the most error-prone ones, multiple times before integrating them into the pipeline. This significantly reduced test flakiness. We also ensure that our test infrastructure is designed for stability. For example, we have tools in place that enforce best practices, such as using predicates to await results when polling in our end-to-end tests, which helps prevent stability issues caused by eventual consistency.
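
For example, enforcing predicate-based polling typically means that every end-to-end assertion on asynchronous state goes through a helper along these lines; the timeout and interval below are illustrative:

import time

def wait_until(predicate, timeout_sec=30, interval_sec=0.5):
    """Poll until the predicate holds, instead of asserting immediately and
    failing on eventually consistent state."""
    deadline = time.monotonic() + timeout_sec
    while time.monotonic() < deadline:
        if predicate():
            return True
        time.sleep(interval_sec)
    raise TimeoutError("condition not met within the timeout")

# Usage in an end-to-end test (fetch_report is a hypothetical API call):
# wait_until(lambda: fetch_report(fund_id).status == "READY")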

 

3) Minimizing “Requeues After Service Deployment”

Once the unit and integration tests pass in a certain pipeline, we deploy a dedicated environment for end-to-end testing of the PR. This is when the CI can start to get expensive.

 

A requeue after service deployment – re-running the pipeline after deploying and running end-to-end tests – can be challenging. It means duplicated efforts, slower feedback, and higher costs. While we can’t eliminate these entirely, we reduced them by:

 

3A) Improving CI Stability

As discussed above – more stability means fewer requeues.

 

3B) Failing Fast

We moved some coverage earlier in the pipeline by converting key end-to-end tests into smarter, more focused unit tests (what’s often referred to as “shift left”). This provides developers with faster feedback, makes tests easier to run locally, and helps us detect bugs before deploying costly services. While end-to-end tests are invaluable, unit tests often provide sufficient coverage at a fraction of the cost. By improving our unit test infrastructure and educating developers and code reviewers, we’ve shifted more coverage to these faster tests, resulting in fewer end-to-end tests. Additionally, we’re exploring AI tools to further enhance our unit and integration test coverage.

 

3C) Reducing Merge Conflicts

Requeues often follow merge conflicts, so we worked to reduce them by:

 

  1. Shortening CI time, as described above (faster merges mean fewer conflicts)
  2. Refactoring large, frequently edited files into smaller logical units
  3. Enforcing consistent formatting via a code formatter

Summary

By applying the methods above, we significantly improved our CI – cutting both runtime and costs by double-digit percentages and boosting stability. But continuous integration requires continuous effort – there’s always room to improve, and many more gains to discover.

 

Got more ideas for CI improvements? I’d love to hear them: michael.shachar@fundguard.com

 

Want in on the action? Check out our open positions: fundguard.com/careers



About the Author