mastra-ai/mastra
How Mastra made their slowest test suite 6× faster and cut queue time by 8× with StarSling.
Mastra's slowest test suite dropped from 30 minutes to 5, and under load the wait for a runner fell from 15 minutes to under 2 minutes, all of it identified and shipped by StarSling agents.

“At Mastra we move so fast that the bottleneck becomes reviews and CI. Time spent compounds, and StarSling helps you realize how much time you're losing.”
Series A·$35M raised·San Francisco
Intro
25K stars, 400+ contributors, and a test suite that grows every week
Mastra is the most popular open-source TypeScript framework for building AI agents, with over 25,000 stars on GitHub, more than 400 contributors, new package releases going out daily, and 600 to 700 PRs merged every month. Mastra lets anyone building AI agents wire up memory, vector stores, RAG, evals, and more, which means a large test surface. A framework's whole job is to be fast for the developer building on it, so Mastra proves it on every PR: five test suites fanning out against a couple dozen vector and storage backends. Mastra built and tuned all of it themselves, but the matrix grows every time the framework does, so keeping it fast is a moving target: work that never actually finishes.
Problem
A 30-minute test suite that grew with the framework
Mastra's slowest test suite, Combined store Tests, ran against 22 different vector and storage backends, and its slowest runs took 30 minutes. On busy days, part of that wasn't testing at all: with all 22 jobs fanning out at once against a shared pool of GitHub-hosted runners, the ones at the back of the line waited about fifteen minutes for a runner before a single test could start. The suites had also collected the workarounds a fast-moving project accumulates: fixed sleep calls standing in for real readiness checks, services that raced the test runner on startup, a flaky ClickHouse race in the delete path. Mastra had built and tuned this matrix themselves, and each of these was real work to find and fix. But the suite grows every time the framework does, so the list never stopped getting longer, and that tax got paid more and more often.
“On a busy day, a job could sit close to fifteen minutes in the queue before a single test even ran, and across the matrix that was real time gone every day. Now it's a minute or two.”
Solution
CI that scoped and shipped its own optimizations
Mastra turned on self-driving CI on Feb 24, 2026, and let it run itself. Over the next 73 days StarSling agents opened fourteen PRs on their own; no engineer triaged what to fix next. The agents replaced fixed-duration sleeps with polling across MongoDB, Chroma, and Couchbase, added Docker healthchecks where services were racing the test runner, sharded E2E kitchen-sink across three parallel jobs, and fixed the ClickHouse race. They migrated workflows to StarSling runners in batches as each round of optimizations made the next migration profitable. Mastra's engineers reviewed and merged each PR, but what to fix next was the agents' call.
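To make the sleep-to-polling change concrete, here is a minimal sketch of the general technique, not the code from the PRs themselves. The helper name `waitFor`, its options, and the default values are all hypothetical; the idea is only that a test proceeds as soon as a readiness probe succeeds, instead of always paying a fixed worst-case delay.

```typescript
// Hypothetical helper (not taken from the Mastra PRs): poll a readiness
// probe until it returns true, instead of sleeping for a fixed duration.
async function waitFor(
  probe: () => Promise<boolean>,
  options: { timeoutMs?: number; intervalMs?: number } = {},
): Promise<number> {
  const timeoutMs = options.timeoutMs ?? 10_000; // assumed default
  const intervalMs = options.intervalMs ?? 100; // assumed default
  const start = Date.now();

  for (;;) {
    try {
      // Return as soon as the service reports ready; the result is the
      // number of milliseconds actually spent waiting.
      if (await probe()) return Date.now() - start;
    } catch {
      // A throwing probe is treated as "not ready yet".
    }
    if (Date.now() - start >= timeoutMs) {
      throw new Error(`waitFor: not ready after ${timeoutMs}ms`);
    }
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

A test that previously slept for a fixed two seconds could instead call `await waitFor(() => isIndexReady())`, where `isIndexReady` is a placeholder for whatever readiness query the backend offers. The wait then costs only as long as the backend actually needs, and a genuinely stuck service fails with a clear timeout rather than a confusing assertion error.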
“Our test suite grows every week. It used to mean someone had to stop and make it fast again; now that just happens in the background while we keep shipping.”
Results
29m 56s → 5m 06s
In the weeks after migration, once the runner swap and the agents' optimizations had landed, Combined store Tests measured 29m 56s → 5m 06s at the slow-tail p95, 5.87× faster, with the other four suites pulled along behind it. Where the old shared pool left jobs queued about fifteen minutes at the p95 tail under load, StarSling held it to under two minutes, no matter how many contributors were pushing at once. And the bigger win is who stopped doing this work. The fixed sleep calls standing in for real readiness checks, the services racing the test runner on startup, the flaky ClickHouse race in the delete path: StarSling's agents found each one, opened the PR, and handed Mastra's engineers a clean diff to review and merge. The never-ending tuning of a suite that grows with the framework became work that handles itself.
“Our team is really loving StarSling. The runners are just handled, so nobody on my team thinks about CI infrastructure anymore, and the agents keep optimizing for the one thing I care about, minutes saved.”
Timeline
When each speedup landed
Migration on 2026-02-24. Each entry below is a merged starsling/* PR, from 2026-02-24 to 2026-05-08.
- #13466 · 2026-02-24 · migration · Migrate Combined store Tests to StarSling Runners
- #13607 · 2026-02-28 · Migrate workflows to StarSling runners
- #13610 · 2026-02-28 · test(mongodb): optimize vector test suite by replacing fixed sleeps with polling
- #13808 · 2026-03-05 · test(mongodb): replace fixed sleeps with polling
- #13937 · 2026-03-06 · perf(memory): limit vitest parallelism to prevent OOM kills on CI
- #13866 · 2026-03-07 · test(chroma): reduce fixed 2000ms waitForIndexing sleep to 200ms and add Docker healthcheck
- #13965 · 2026-03-07 · test(couchbase): optimize vector test suite by replacing fixed sleeps with bucket.ping() and reducing wait times
- #14098 · 2026-04-13 · chore(ci): add timeout-minutes to all StarSling-hosted workflow jobs
- #14520 · 2026-04-28 · test(mssql): replace TCP healthcheck with sqlcmd, simplify pretest, pin Docker image
- #15888 · 2026-04-29 · ci: shard E2E kitchen-sink across 3 parallel jobs
- #15895 · 2026-04-29 · fix(clickhouse): make deleteTask/deleteTasks await mutation completion
- #14044 · 2026-04-29 · test(clickhouse): stop merges before TRUNCATE, use tmpfs and Docker healthcheck
- #16015 · 2026-04-30 · ci: migrate slow workflows to StarSling Runners
- #16330 · 2026-05-08 · ci(e2e): migrate remaining matrix jobs to starsling-ubuntu-24.04
Workflow speedups
Where the time went
Measured at p95 wall-clock across all branches. Pre-migration: GitHub-hosted runners. Post: StarSling runners. Dates in the methodology note below.
of slow-tail CI removed per commit, across the four suites
End-to-end wall-clock saved at p95, summed across the 4 parallel suites. Every StarSling agent PR, rolled up into one number.
- Combined store Tests (vector+storage) · 24m 50s saved · 5.87×
- Workspace Cloud Tests · 6m 22s saved · 3.69×
- Memory Tests · 5m 54s saved · 1.94×
- E2E Tests · 3m 25s saved · 1.32×
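The figures above can be cross-checked with simple arithmetic. The sketch below takes the reported 29m 56s → 5m 06s measurement for Combined store Tests and recomputes the speedup and the time saved, then adds up the four per-suite savings listed above. The helper names `toSeconds` and `formatDuration` are illustrative only, and the total is just the sum of the four listed values, not a separately reported number.

```typescript
// Parse a duration written like "29m 56s" into whole seconds.
function toSeconds(text: string): number {
  const match = /^(\d+)m\s*(\d+)s$/.exec(text.trim());
  if (!match) throw new Error(`unrecognized duration: ${text}`);
  return Number(match[1]) * 60 + Number(match[2]);
}

// Format whole seconds back into the "Xm YYs" style used on this page.
function formatDuration(totalSeconds: number): string {
  const minutes = Math.floor(totalSeconds / 60);
  const seconds = totalSeconds % 60;
  return `${minutes}m ${seconds < 10 ? "0" : ""}${seconds}s`;
}

// Combined store Tests, p95 before and after (values from the Results section).
const beforeSeconds = toSeconds("29m 56s"); // 1796
const afterSeconds = toSeconds("5m 06s"); // 306

const speedupRatio = (beforeSeconds / afterSeconds).toFixed(2); // "5.87"
const combinedSaved = formatDuration(beforeSeconds - afterSeconds); // "24m 50s"

// Per-suite p95 savings as listed above, summed.
const perSuiteSaved = ["24m 50s", "6m 22s", "5m 54s", "3m 25s"];
const totalSaved = formatDuration(
  perSuiteSaved.map(toSeconds).reduce((sum, s) => sum + s, 0),
); // "40m 31s"

console.log({ speedupRatio, combinedSaved, totalSaved });
```

The recomputed ratio and saving match the 5.87× and 24m 50s shown for Combined store Tests, and the four listed savings add up to 40m 31s.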
Combined store Tests (vector+storage) · 5.87×
- 2026-02-24 · #13466 · migration · Migrate Combined store Tests to StarSling Runners
- 2026-02-28 · #13610 · test(mongodb): optimize vector test suite by replacing fixed sleeps with polling
- 2026-03-05 · #13808 · test(mongodb): replace fixed sleeps with polling
- 2026-03-07 · #13866 · test(chroma): reduce fixed 2000ms waitForIndexing sleep to 200ms and add Docker healthcheck
- 2026-03-07 · #13965 · test(couchbase): optimize vector test suite by replacing fixed sleeps with bucket.ping() and reducing wait times
- 2026-04-29 · #14044 · test(clickhouse): stop merges before TRUNCATE, use tmpfs and Docker healthcheck
- 2026-04-13 · #14098 · chore(ci): add timeout-minutes to all StarSling-hosted workflow jobs
- 2026-04-28 · #14520 · test(mssql): replace TCP healthcheck with sqlcmd, simplify pretest, pin Docker image
- 2026-04-29 · #15895 · fix(clickhouse): make deleteTask/deleteTasks await mutation completion
Workspace Cloud Tests · 3.69×
Memory Tests · 1.94×
E2E Tests · 1.32×
Methodology
1. Wall-clock p95 of the Combined store Tests workflow, across all branches. Before: GitHub-hosted runners, a 3-day window before migration. After: StarSling runners with the agents' optimizations merged, April 1 to 4.
2. Job queue: p95 of each job's wait for a runner (job.started_at − run.created_at) across the Combined store Tests matrix jobs, successful jobs only, over the same windows as above.
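As an illustration of the queue metric in note 2, the sketch below computes a p95 of job.started_at − run.created_at over successful jobs. The `JobTiming` shape and its field names are hypothetical stand-ins for whatever the CI provider returns, and the nearest-rank percentile is an assumption, since the methodology does not say which percentile definition was used.

```typescript
// Hypothetical record shape; field names mirror the formula in note 2.
interface JobTiming {
  runCreatedAt: string; // ISO 8601 timestamp of run.created_at
  jobStartedAt: string; // ISO 8601 timestamp of job.started_at
  conclusion: string; // e.g. "success" or "failure"
}

// Nearest-rank p95 (an assumed definition): the smallest value such that
// at least 95% of the samples are less than or equal to it.
function p95(values: number[]): number {
  if (values.length === 0) throw new Error("p95 of an empty sample");
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.ceil(0.95 * sorted.length);
  return sorted[rank - 1];
}

// p95 queue wait in seconds, counting successful jobs only.
function queueWaitP95Seconds(jobs: JobTiming[]): number {
  const waits = jobs
    .filter((job) => job.conclusion === "success")
    .map((job) => (Date.parse(job.jobStartedAt) - Date.parse(job.runCreatedAt)) / 1000);
  return p95(waits);
}
```

Running the same function over the before and after windows would give the two queue-time figures being compared. Filtering to successful jobs keeps cancelled or failed jobs, whose start times can be unrepresentative, from skewing the tail.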