mastra-ai/mastra

How Mastra made their slowest test suite 6× faster and cut queue time by 8× with StarSling.

Mastra's slowest test suite dropped from 30 minutes to 5, and under load the wait for a runner fell from 15 minutes to under 2 minutes, all of it identified and shipped by StarSling agents.

“At Mastra we move so fast that the bottleneck becomes reviews and CI. Time spent compounds, and StarSling helps you realize how much time you're losing.”

Abhi Aiyer · Co-founder & CTO, Mastra

5.87×¹ faster Combined store Tests
8.2×² shorter job queue

Series A · $35M raised · San Francisco

Intro

25K stars, 400+ contributors, and a test suite that grows every week

Mastra is the most popular open-source TypeScript framework for building AI agents, with over 25,000 stars on GitHub and over 400 contributors. New package releases go out daily, and 600 to 700 PRs are merged every month. Mastra lets anyone building AI agents wire up memory, vector stores, RAG, evals, and more, which means there is a lot of test surface area. A framework's whole job is to be fast for the developer building on it, so Mastra proves it on every PR: five test suites fanning out against a couple dozen vector and storage backends. Mastra built and tuned all of it themselves, but the matrix grows every time the framework does, so keeping it fast is a moving target: work that never actually finishes.

Problem

A 30-minute test suite that grew with the framework

Mastra's slowest test suite, Combined store Tests, ran against 22 different vector and storage backends, and its slowest runs took 30 minutes. On busy days, part of that wasn't testing at all: with all 22 jobs fanning out at once against a shared pool of GitHub-hosted runners, the ones at the back of the line waited about fifteen minutes for a runner before a single test could start. The suites had also collected the workarounds a fast-moving project accumulates: fixed sleep calls standing in for real readiness checks, services that raced the test runner on startup, and a flaky ClickHouse race in the delete path. Mastra had built and tuned this matrix themselves, and each of these was real work to find and fix. But the suite grows every time the framework does, so the list never stopped getting longer, and that tax got paid more and more often.

“On a busy day, a job could sit close to fifteen minutes in the queue before a single test even ran, and across the matrix that was real time gone every day. Now it's a minute or two.”

Abhi Aiyer

Solution

CI that scoped and shipped its own optimizations

Mastra turned on self-driving CI on Feb 24, 2026, and let it run itself. Over the next 73 days StarSling agents opened fourteen PRs on their own; no engineer triaged what to fix next. They replaced fixed-duration sleeps with polling across MongoDB, Chroma, and Couchbase. They added Docker healthchecks where services were racing the test runner, sharded E2E kitchen-sink across three parallel jobs, and fixed the ClickHouse race. They migrated workflows to StarSling runners in batches as each round of optimizations made the next migration profitable. Mastra's engineers reviewed and merged each PR, but what to fix next was the agents' call.
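To make the sleep-to-polling change concrete, here is a minimal TypeScript sketch of the general technique. It is not the code from the agents' PRs: the helper name `waitUntilReady`, its options, and its defaults are all hypothetical, and the readiness check itself is left to the caller.

```typescript
// Hypothetical helper (not from the Mastra repo): instead of sleeping for a
// fixed duration and hoping the service is up, poll a readiness check until
// it passes or a deadline expires.
interface WaitOptions {
  timeoutMs?: number; // give up after this long (assumed default: 30 s)
  intervalMs?: number; // pause between attempts (assumed default: 250 ms)
}

async function waitUntilReady(
  check: () => Promise<boolean>,
  { timeoutMs = 30_000, intervalMs = 250 }: WaitOptions = {},
): Promise<number> {
  const start = Date.now();
  let attempts = 0;
  while (true) {
    attempts++;
    try {
      // Return as soon as the service reports ready.
      if (await check()) return attempts;
    } catch {
      // A thrown error (for example, connection refused) means "not ready yet".
    }
    if (Date.now() - start >= timeoutMs) {
      throw new Error(`service not ready after ${timeoutMs} ms (${attempts} attempts)`);
    }
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

A test setup would call something like `await waitUntilReady(() => pingDatabase())`, where `pingDatabase` is a placeholder for whatever cheap query the backend supports. The test then starts the moment the backend answers, rather than after a worst-case delay paid on every run.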

“Our test suite grows every week. It used to mean someone had to stop and make it fast again; now that just happens in the background while we keep shipping.”

Abhi Aiyer

Results

29m 56s → 5m 06s

In the weeks after migration, once the runner swap and the agents' optimizations had landed, Combined store Tests measured 29m 56s → 5m 06s at the slow-tail p95, 5.87× faster, with the other four suites pulled along behind it. Where the old shared pool left jobs queued about fifteen minutes at the p95 tail under load, StarSling held it to under two minutes, no matter how many contributors were pushing at once. And the bigger win is who stopped doing this work. The fixed sleep calls standing in for real readiness checks, the services racing the test runner on startup, the flaky ClickHouse race in the delete path: StarSling's agents found each one, opened the PR, and handed Mastra's engineers a clean diff to review and merge. The never-ending tuning of a suite that grows with the framework became work that handles itself.

“Our team is really loving StarSling. The runners are just handled, so nobody on my team thinks about CI infrastructure anymore, and the agents keep optimizing for the one thing I care about, minutes saved.”

Abhi Aiyer

Timeline

When each speedup landed

Migration on 2026-02-24. Every dot is a merged starsling/* PR.

[Timeline chart: merged starsling/* PRs from 2026-02-24 (migration) to 2026-05-08]

Workflow speedups

Where the time went

Measured at p95 wall-clock across all branches. Pre-migration: GitHub-hosted runners. Post: StarSling runners. Dates in the methodology note below.

40m 31s

of slow-tail CI removed per commit, across the four suites

End-to-end wall-clock saved at p95, summed across the 4 parallel suites. Every StarSling agent PR, rolled up into one number.

Suite                                   Wall-clock saved / run   Faster
Combined store Tests (vector+storage)   24m 50s                  5.87×
Workspace Cloud Tests                   6m 22s                   3.69×
Memory Tests                            5m 54s                   1.94×
E2E Tests                               3m 25s                   1.32×

Combined store Tests (vector+storage)

5.87×

Before: 29m 56s

After: 5m 06s

Workspace Cloud Tests

3.69×

Before: 8m 44s

After: 2m 22s

2026-04-30 · #16015 · ci: migrate slow workflows to StarSling Runners

Memory Tests

1.94×

Before: 12m 11s

After: 6m 16s

E2E Tests

1.32×

Before: 14m 16s

After: 10m 50s
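The "Faster" multipliers are consistent with dividing each suite's p95 before time by its after time. The TypeScript sketch below reproduces them from the whole-second figures shown on this page. The helper names `toSeconds` and `speedup` are illustrative, and the page does not state how the published numbers were actually computed.

```typescript
// Illustrative helpers (names are assumptions): parse durations written
// like "29m 56s" and compute the before/after speedup ratio.
function toSeconds(duration: string): number {
  const match = /^(\d+)m (\d+)s$/.exec(duration);
  if (!match) throw new Error(`unrecognized duration: ${duration}`);
  return Number(match[1]) * 60 + Number(match[2]);
}

function speedup(before: string, after: string): string {
  // Ratio of wall-clock times, rounded to two decimals as on the page.
  return (toSeconds(before) / toSeconds(after)).toFixed(2);
}

// Before/after p95 figures copied from the suite cards above.
const suites: Array<[name: string, before: string, after: string]> = [
  ["Combined store Tests", "29m 56s", "5m 06s"],
  ["Workspace Cloud Tests", "8m 44s", "2m 22s"],
  ["Memory Tests", "12m 11s", "6m 16s"],
  ["E2E Tests", "14m 16s", "10m 50s"],
];

for (const [name, before, after] of suites) {
  console.log(`${name}: ${speedup(before, after)}×`);
}
```

For the slowest suite this works out to 1796 s / 306 s ≈ 5.87.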

Methodology

¹Wall-clock p95 of the Combined store Tests workflow, across all branches. Before: GitHub-hosted runners, a 3-day window before migration. After: StarSling runners with the agents' optimizations merged, April 1 to 4.

²Job queue: p95 of each job's wait for a runner (job.started_at − run.created_at) across the Combined store Tests matrix jobs, successful jobs only, over the same windows as above.
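As a rough illustration of the queue metric in note ², the TypeScript sketch below computes a p95 of job.started_at − run.created_at over successful jobs. Several details are assumptions, since the note does not specify them: the `JobSample` shape, the `conclusion === "success"` filter, and the nearest-rank percentile method.

```typescript
// Assumed record shape: one entry per matrix job, with ISO 8601 timestamps.
interface JobSample {
  runCreatedAt: string; // run.created_at
  jobStartedAt: string; // job.started_at
  conclusion: string; // assumed field; "success" marks a successful job
}

// Nearest-rank percentile (an assumption; the source does not name a method).
function p95(values: number[]): number {
  if (values.length === 0) throw new Error("no samples");
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.ceil((95 * sorted.length) / 100); // 1-based rank
  return sorted[rank - 1];
}

// p95 of the wait for a runner, in seconds, over successful jobs only.
function queueWaitP95Seconds(jobs: JobSample[]): number {
  const waits = jobs
    .filter((job) => job.conclusion === "success")
    .map((job) => (Date.parse(job.jobStartedAt) - Date.parse(job.runCreatedAt)) / 1000);
  return p95(waits);
}
```

Measuring from the run's creation time rather than the job's own creation time means the metric captures the full delay a contributor experiences between pushing and the first test starting.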