Rebuilding the core platform of a Series B vertical SaaS without a feature freeze

Sector

Growth-Stage & Scale-ups

Engagement type

Fractional CTO — 14 months

Systems & platforms

AWS, Go, PostgreSQL, Temporal, Kafka, Terraform, Datadog

Business context

A Series B vertical SaaS company serving a regulated industry — healthy growth, healthy NPS, healthy ARR curve. What wasn’t healthy was the platform. The original codebase had served the company through the first three years and several thousand customers, and was now the source of real operational pain: onboarding each new customer took six weeks of configuration and manual data migration, every release was a two-day ceremony, and the engineering team had developed the fatalistic patience of people who know every branch merge will require a ritual sacrifice.

The founding CTO had transitioned out amicably the previous quarter. The investors wanted the architecture fit for Series C diligence. The CEO wanted a platform that did not cost a senior engineer six weeks per new enterprise logo.

The challenge

“Rebuild the platform” is straightforward. “Rebuild the platform while the business keeps growing 80% year-on-year, the roadmap keeps shipping, existing customers don’t experience disruption, and we don’t spend the runway doing it” is the actual challenge.

The technical debt took three forms: a distributed monolith on a Postgres schema that had been evolving organically since 2019; a background job system that was part Sidekiq, part cron, and part people remembering to run scripts on Tuesdays; and a configuration layer where adding a new customer involved editing seven files in three repositories. None of it was catastrophically broken. All of it compounded.

My role

Fractional CTO at three days per week, reporting to the CEO, with direct accountability for the engineering organisation (28 engineers across four squads), the architecture, and the technology component of the Series C raise.

What I did

Diagnosis before prescription (weeks 1–4). Spent the first month listening rather than rebuilding. Mapped the platform end-to-end, shadowed the customer onboarding team, read three years of post-mortems, and sat in on customer support escalations. This produced an artefact the team called the constraint map: a visual document showing where the platform’s current design was actively preventing growth and where it was merely inelegant. We planned to fix only the former. The latter stayed on the shelf where it belonged.

The strangler pattern, executed seriously. Adopted a strict strangler-fig approach rather than a rewrite. New capabilities were built as event-driven services around the existing monolith. Existing capabilities were migrated only when the business had a concrete reason to touch them. Introduced Temporal for workflow orchestration and Kafka as the event backbone, not because they were fashionable, but because the domain was genuinely event-driven and the existing synchronous plumbing was the root cause of most scaling pain.
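A minimal sketch of the routing idea behind a strangler-fig migration, assuming a facade that decides per request path whether the monolith or a new service answers. The path prefixes, backend names, and the Router type are hypothetical illustrations, not the engagement's actual code, and the sketch routes synchronously by path rather than through Kafka or Temporal.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Backend names a destination for a request. Both values are hypothetical.
type Backend string

const (
	Monolith   Backend = "monolith"
	Onboarding Backend = "onboarding-service"
)

// Router is a strangler facade: every path goes to the monolith
// unless a migrated prefix has claimed it.
type Router struct {
	migrated map[string]Backend // path prefix -> new service
}

// NewRouter returns a facade with nothing migrated yet.
func NewRouter() *Router {
	return &Router{migrated: make(map[string]Backend)}
}

// Migrate hands a path prefix over to a new service.
func (r *Router) Migrate(prefix string, b Backend) {
	r.migrated[prefix] = b
}

// Route picks the backend for a path, preferring the longest
// migrated prefix and falling back to the monolith.
func (r *Router) Route(path string) Backend {
	prefixes := make([]string, 0, len(r.migrated))
	for p := range r.migrated {
		prefixes = append(prefixes, p)
	}
	// Longest prefix first so the most specific route wins.
	sort.Slice(prefixes, func(i, j int) bool {
		return len(prefixes[i]) > len(prefixes[j])
	})
	for _, p := range prefixes {
		if strings.HasPrefix(path, p) {
			return r.migrated[p]
		}
	}
	return Monolith
}

func main() {
	r := NewRouter()
	r.Migrate("/onboarding/", Onboarding)
	fmt.Println(r.Route("/onboarding/start")) // onboarding-service
	fmt.Println(r.Route("/billing/invoices")) // monolith
}
```

The useful property is the default: anything not explicitly migrated keeps going to the monolith, so each capability can move on its own schedule and be moved back by deleting one entry.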

Onboarding as the first cut. The single highest-value migration target was the customer onboarding flow, which was both the worst part of the existing customer experience and the clearest business cost. Extracted the onboarding orchestration into a Temporal-backed workflow service, exposed customer configuration as data rather than code, and built a self-service provisioning path for standard customer shapes. The six-week onboarding SLA became a four-day one for 80% of new customers. The remaining 20% were still onboarded measurably faster than before.
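To illustrate what "configuration as data rather than code" can look like, here is a small sketch, assuming a JSON record per customer that is turned into an ordered list of onboarding steps. The CustomerConfig fields, the step names, and the "standard" shape value are all hypothetical, and the sketch uses only the Go standard library; in the real system the resulting steps would be driven by a Temporal workflow, which is not shown here.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// CustomerConfig is a hypothetical per-customer record.
// Field names are illustrative, not the platform's real schema.
type CustomerConfig struct {
	Name         string   `json:"name"`
	Shape        string   `json:"shape"` // e.g. "standard" or "custom"
	Integrations []string `json:"integrations"`
}

// PlanOnboarding turns configuration data into an ordered list of steps.
// Adding a customer means adding a record, not editing code.
func PlanOnboarding(raw []byte) ([]string, error) {
	var cfg CustomerConfig
	if err := json.Unmarshal(raw, &cfg); err != nil {
		return nil, fmt.Errorf("parse customer config: %w", err)
	}
	if cfg.Name == "" {
		return nil, errors.New("customer config is missing a name")
	}

	steps := []string{"create-tenant"}
	for _, integration := range cfg.Integrations {
		steps = append(steps, "connect-"+integration)
	}
	if cfg.Shape != "standard" {
		// Non-standard shapes keep a human in the loop.
		steps = append(steps, "manual-review")
	}
	steps = append(steps, "activate")
	return steps, nil
}

func main() {
	raw := []byte(`{"name":"acme","shape":"standard","integrations":["sso"]}`)
	steps, err := PlanOnboarding(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println(steps) // [create-tenant connect-sso activate]
}
```

Standard shapes produce a plan with no manual step, which is what makes a self-service path possible; anything else falls through to a review step instead of failing.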

Engineering practice upgrade. Introduced continuous deployment (up from biweekly releases), real observability (up from “grep the logs and hope”), and architecture decision records. Moved the team from four feature squads to three feature squads plus one platform capability squad — a necessary shift to maintain the new foundations long-term.

Series C enablement. Worked with the CEO and CFO on the technology due-diligence pack. This meant building the architecture narrative in the language investors and their technical advisors actually use — scalability, reliability, security posture, team maturity — and backing every claim with evidence from the observability stack rather than engineer testimony.

Outcomes

The platform was re-architected onto event-driven foundations without a product freeze: new features shipped continuously throughout the 14-month engagement, and no customer experienced a migration-related outage. Onboarding time dropped from six weeks to four days for 80% of new customers, removing the single biggest operational bottleneck on revenue growth. Infrastructure cost per customer fell 52% as workloads moved off the overprovisioned monolith onto right-sized services. Engineering velocity, measured in story points delivered per sprint, rose 2.3x.

The Series C round closed nine months in with the technology due diligence going through cleanly — no architecture red flags, no surprise findings, no “we’ll fix that after the round” debt items. The permanent CTO hire started the month the engagement transitioned to monthly advisory.

Outcomes

Core platform re-architected onto event-driven services without a product freeze

Onboarding time reduced from 6 weeks to 4 days for 80% of new customers

Infrastructure cost per customer down 52% post-migration

Engineering velocity (story points delivered per sprint) up 2.3x

Platform supported Series C diligence cleanly with no architecture red flags