It’s a Tuesday afternoon in late April. I’m watching a buyer at a mid-tier Australian furniture retailer drop a product image into ProductMatch for the first time. A pendant lamp — modern, brass, the kind of thing that costs anywhere between four hundred and twelve hundred dollars depending on which competitor you’re matching to.
Ninety seconds later, she has a recommendation. Floor $649. Sweet spot $799. Ceiling $899. Five visually similar products from Temple & Webster, Freedom, West Elm, Castlery, and Beacon Lighting, with similarity scores in the high eighties, current prices fetched within the last twenty-four hours, and a paragraph explaining why this one sits where it does.
She prices the SKU at $799 in the next twenty minutes. Two days later, it converts at full margin.
That’s the moment I stopped thinking of ProductMatch as a project I built and started thinking of it as a product I run. The product she’s using is v2. The version that first went live — two weeks from a sketch on a Linear ticket — was simpler, but it was already a complete commercial SaaS. Real signup. Real billing. Real multi-tenancy. Real AI matching on real Australian retailers.
The two-week sprint is the headline. The iteration is the actual story. And here’s the part I want to be honest about: neither of those is the interesting bit. The interesting bit is what they prove.
The pain ProductMatch solves
I’ve spent years sitting next to retail buyers as they make pricing calls. The big retailers — the Temple & Websters and Freedoms of the world — run competitive pricing as a standing function. They have BI teams, they have weekly cadences, they have analysts whose job it is to look at every category, every hero SKU, every sale event and answer the question: where do we sit?
The mid-market doesn’t have any of that. Same decisions. Same margin consequences. Different infrastructure.
What actually happens in the mid-market is a buyer or merchandiser opens fifteen browser tabs on a Tuesday afternoon, eyeballs products that might be comparable, copies prices into a spreadsheet, and second-guesses themselves the whole way through. Is that dress really the same dress? Is that lamp really comparable? The existing pricing-intelligence tools match on keywords and SKU codes — which works fine for grocery and electronics, where every product has a barcode and a canonical name, but breaks completely for furniture, homewares, fashion, anything where similarity is visual.
I’d watched smart people guess when they shouldn’t have to. So I built the tool I wished existed when I was watching them.
“The big retailers reprice in real time. The mid-market makes the same calls on gut feel. Not because the people are worse — because the infrastructure isn’t there.”
The first decision: image-first, not SKU-based
Most pricing tools start with a SKU. You upload a spreadsheet, they fetch competitor data, you get a report.
ProductMatch starts with an image. You drop in a hero shot. The system finds visually similar products at Australian competitors using AI visual reasoning, scores the similarity numerically, fetches live prices, and produces a recommendation with the reasoning attached.
The image-first choice was the first non-obvious decision and it shaped everything downstream. SKU-based matching is fast and cheap but it can’t see the product. Image-based matching is slower and more expensive but it sees what the buyer sees. For furniture and homewares, where two beds with completely different SKUs might be functionally identical and two beds with similar names might be entirely different products, this is the difference between a tool that works and a tool that lies to you.
The decision came with a constraint: anything visual is computationally expensive. Which led to the second decision.
The second decision: multi-model AI pipeline
A naive build would use one model for everything. A good build uses the right model for each step.
ProductMatch runs Claude Haiku at the top of the funnel — fast, cheap, catches obvious mismatches before they hit the expensive models. Then Claude Sonnet for the actual visual similarity scoring, where reasoning depth matters more than throughput. Then Sonnet again with extended thinking enabled for the final pricing recommendation, where I want the model to actually consider the full set of competitor matches and produce a defensible number.
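To make the funnel shape concrete, here is a minimal sketch. It is not ProductMatch's actual code: the stage functions are offline stand-ins for the model calls, and every name, threshold, and sample value (`run_scan`, `min_score=80`, the median-based recommender, the retailer rows) is an assumption made up for illustration.

```python
from dataclasses import dataclass
from statistics import median
from typing import Callable


@dataclass(frozen=True)
class Candidate:
    retailer: str
    title: str
    price_aud: float


Match = tuple[Candidate, int]  # (candidate, similarity score 0-100)


def run_scan(
    candidates: list[Candidate],
    prefilter: Callable[[Candidate], bool],
    scorer: Callable[[Candidate], int],
    recommender: Callable[[list[Match]], dict[str, float]],
    min_score: int = 80,  # hypothetical cut-off, not a documented value
) -> dict[str, float]:
    # Stage 1: the cheap, fast step discards obvious mismatches.
    survivors = [c for c in candidates if prefilter(c)]
    # Stage 2: the stronger step scores only what survived stage 1.
    scored = [(c, scorer(c)) for c in survivors]
    matches = [(c, s) for c, s in scored if s >= min_score]
    # Stage 3: the recommendation step sees the full set of accepted matches.
    return recommender(matches)


# Stand-in stages so the sketch runs offline; real stages would call a model.
scorer_calls: list[str] = []


def stub_prefilter(c: Candidate) -> bool:
    return "pendant" in c.title.lower()


def stub_scorer(c: Candidate) -> int:
    scorer_calls.append(c.retailer)  # record how often the costly step runs
    return 88 if "brass" in c.title.lower() else 60


def stub_recommender(matches: list[Match]) -> dict[str, float]:
    # Assumes at least one match; a real step would handle the empty case.
    prices = [c.price_aud for c, _ in matches]
    return {"floor": min(prices), "sweet_spot": median(prices), "ceiling": max(prices)}


# Invented sample rows, not real competitor data.
candidates = [
    Candidate("Retailer A", "Brass pendant lamp", 649.0),
    Candidate("Retailer B", "Modern brass pendant", 799.0),
    Candidate("Retailer C", "Brass pendant light", 899.0),
    Candidate("Retailer D", "Black pendant lamp", 420.0),
    Candidate("Retailer E", "Oak dining table", 1200.0),
]
recommendation = run_scan(candidates, stub_prefilter, stub_scorer, stub_recommender)
print(recommendation)
print(scorer_calls)
```

The point of the shape is visible in `scorer_calls`: the dining table never reaches the expensive scoring step, because the cheap step already threw it out.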
Two models, three stages, three different jobs, three different cost profiles. The result is a pipeline that runs in ninety seconds and costs single-digit cents per scan, which means the unit economics work at A$79/month for 500 scans. A single-model build would have cost five to ten times as much per scan and the margin would have been gone.
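The arithmetic behind that claim can be checked with the two figures given here, A$79 a month and 500 scans. This is a rough sketch under stated assumptions: every scan in the plan gets used, and the model pipeline is the only cost counted. The function name is made up for illustration.

```python
PLAN_PRICE_CENTS = 7900  # A$79 per month
SCANS_PER_MONTH = 500

# Revenue per scan in cents, assuming the customer uses all 500 scans.
revenue_per_scan = PLAN_PRICE_CENTS / SCANS_PER_MONTH


def max_cost_before_loss(multiplier: float) -> float:
    """Highest pipeline cost per scan (in cents) at which a build that is
    `multiplier` times more expensive would still break even."""
    return revenue_per_scan / multiplier


print(f"revenue per scan: {revenue_per_scan:.1f}c")
for multiplier in (5, 10):
    print(f"{multiplier}x build breaks even at {max_cost_before_loss(multiplier):.2f}c pipeline cost")
```

On a fully used plan that is 15.8 cents of revenue per scan. A build costing five times as much goes underwater once the staged pipeline costs more than about 3.2 cents per scan, and a ten-times build does so above about 1.6 cents. Lighter usage widens the margin, and hosting, payment fees, and scraping costs narrow it.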
This is the part of AI product development that the demos and the threads on X don’t talk about. The model isn’t the product. The pipeline is the product. The pipeline is where the engineering judgment lives.
The third decision: trust architecture before features
Here’s the thing nobody tells you when you start building AI products: the AI is the easy part.
The hard part is convincing a buyer to act on what the AI says. If a recommendation comes out as just a number — “$799” — nobody trusts it. They’ll go back to opening tabs.
So ProductMatch shows everything. Every recommendation lists the matched competitor products. Every match has a similarity score from zero to one hundred. Every score has the reasoning behind it. The buyer can flag any single match as wrong with one click, and that correction sticks for their workspace and informs future scans. The headline price has a floor, a sweet spot, and a ceiling, with the rationale for each.
This isn’t decoration. This is the product. A buyer who can defend a price to their CEO is a buyer who’ll keep scanning. A buyer who has to trust a black box won’t. The trust architecture got built before the third feature. The third feature got built because the trust architecture already existed.
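One way to picture "shows everything" is as a data shape in which no price can exist without its evidence attached. The sketch below is an assumption, not the real schema: the class and field names are hypothetical, and the flag-as-wrong behaviour is reduced to simply excluding a rejected product from later scans in the same workspace.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Match:
    competitor: str
    product_url: str
    price_aud: float
    similarity: int  # 0-100, as shown to the buyer
    reasoning: str   # why the product was judged similar

    def __post_init__(self) -> None:
        if not 0 <= self.similarity <= 100:
            raise ValueError("similarity must be between 0 and 100")


@dataclass(frozen=True)
class PricePoint:
    amount_aud: float
    rationale: str  # every headline number carries its own explanation


@dataclass(frozen=True)
class Recommendation:
    floor: PricePoint
    sweet_spot: PricePoint
    ceiling: PricePoint
    matches: tuple[Match, ...]  # the evidence behind the three numbers


@dataclass
class Workspace:
    # Corrections are stored per workspace, so one team's flags stay theirs.
    rejected_urls: set[str] = field(default_factory=set)

    def flag_wrong(self, match: Match) -> None:
        self.rejected_urls.add(match.product_url)

    def usable(self, matches: list[Match]) -> list[Match]:
        # Simplest possible "informs future scans": drop rejected products.
        return [m for m in matches if m.product_url not in self.rejected_urls]


# Invented example rows on a placeholder domain.
good = Match("Retailer A", "https://example.com/a", 799.0, 88, "Same shade shape and finish")
bad = Match("Retailer B", "https://example.com/b", 420.0, 81, "Similar silhouette, different material")

workspace = Workspace()
workspace.flag_wrong(bad)
remaining = workspace.usable([good, bad])
print([m.competitor for m in remaining])
```

Because `rationale` and `reasoning` are required fields, a bare "$799" with nothing behind it cannot be constructed at all, which is the property the buyer relies on when defending the price.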
“The AI is the easy part. The hard part is convincing a buyer to act on what the AI says without flinching.”
The fourth decision: multi-tenant SaaS from day one
A lot of AI products start as a demo and get retrofitted into a SaaS later. That retrofit is where most of them die.
ProductMatch was multi-tenant from the first commit. Clerk Organisations for auth, so a retailer can invite their team. Supabase row-level security policies enforce tenant isolation at the database layer, so one retailer’s scans can never leak into another’s. A credit ledger lives in Postgres with server-enforced atomic decrement on every scan, so credits can’t be cheated by a clever client. Stripe-hosted Checkout and the Billing Portal handle subscriptions, so plan changes and cancellations never touch my code. There’s no admin dashboard to babysit — customers sign themselves up, change their own plans, and cancel from the Stripe portal in two clicks.
This sounds like overkill for a two-week v1. It’s the opposite. Doing it on day one cost maybe two days of the two weeks. Retrofitting it later would have cost six months and a full rebuild of the data model.
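The server-enforced decrement is the piece most worth sketching. The version below is a simplified stand-in: it uses Python's built-in `sqlite3` in place of Postgres so it runs anywhere, and it collapses the ledger into a single balance column. The table and function names are hypothetical. What carries over is the pattern: the check and the decrement happen in one conditional `UPDATE`, so there is no gap between reading the balance and writing it.

```python
import sqlite3


def spend_credit(conn: sqlite3.Connection, org_id: str) -> bool:
    """Try to spend one credit. Returns True only if a credit was deducted."""
    with conn:  # commits on success, rolls back if an exception is raised
        cur = conn.execute(
            # Check and decrement in a single statement: the row is only
            # updated while the balance is still positive.
            "UPDATE credit_balance SET credits = credits - 1 "
            "WHERE org_id = ? AND credits > 0",
            (org_id,),
        )
        return cur.rowcount == 1


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE credit_balance ("
    " org_id TEXT PRIMARY KEY,"
    " credits INTEGER NOT NULL CHECK (credits >= 0))"  # second line of defence
)
conn.execute("INSERT INTO credit_balance VALUES ('org_a', 2)")
conn.commit()

# Three scan attempts against a balance of two credits.
results = [spend_credit(conn, "org_a") for _ in range(3)]
balance = conn.execute(
    "SELECT credits FROM credit_balance WHERE org_id = 'org_a'"
).fetchone()[0]
print(results, balance)
```

The scan only proceeds when `spend_credit` returns `True`, and the client never sees or sends a balance. A read-then-write version (select the balance, check it in application code, then update) is the shape that lets two simultaneous requests both pass the check.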
The commercial discipline matters as much as the technical discipline. If you’re building an AI product, you’re building a business. The business architecture and the technical architecture have to land together. That’s why v1 had Stripe-hosted Checkout live alongside the AI pipeline — not as a stretch goal for v2.
The discipline that made two weeks real
I want to be specific about this because “two weeks” is the kind of claim that gets thrown around as marketing.
The two weeks didn’t happen because AI wrote the code for me. AI wrote a lot of the code — Claude Code did the implementation work on probably eighty percent of the codebase, and Claude Design produced the marketing site and most of the in-app layouts. But Claude Code doesn’t know what to build. Claude Design doesn’t know what the product should feel like. Those decisions are still mine, and they’re the ones that take the time when you get them wrong.
The way I made two weeks real was by treating Linear as an architectural notebook. Every meaningful decision — image-first matching, multi-model pipeline, trust mechanics, tenant isolation, credit ledger semantics, pricing tiers — got its own Linear ticket with the decision written down before any code was written. Why this choice, why not the alternatives, what we’re trading off, what we’d need to revisit if assumptions changed.
This sounds like overhead. It’s the opposite. The artefacts are how I moved fast. When Claude Code asked “should I use server-side or client-side credit decrement,” I didn’t have to think about it — the answer was already in Linear, with the reasoning. When I came back to the codebase the next morning, I didn’t have to reconstruct what I’d decided. The decisions were durable.
This is the working method that compresses six months into two weeks. Not the AI. The discipline that makes AI useful.
“AI wrote the code. The architecture was still my job. Two weeks came from making that split explicit — not from letting the AI drive.”
What polish and v2 added
Two weeks shipped a commercial product. It didn’t ship the right commercial product yet.
The thing nobody tells you about shipping fast is that v1 in front of real customers teaches you things no architecture document can. The first week of polish was edge cases — products with unusual aspect ratios that broke the visual similarity scorer, competitor sites that started rate-limiting the live price fetch, a credit ledger race condition that surfaced once two users on the same plan scanned simultaneously. Boring problems. Real problems. The kind of problems that only exist once your AI pipeline meets the actual internet.
v2 is where the product started feeling finished. The trust mechanics tightened: similarity scores got more honest (the original scoring was over-confident on near-misses), the reasoning blocks got specific enough to actually defend a price, the flag-as-wrong correction got smart enough to influence future scans rather than just file the complaint. The pricing tiers landed where the unit economics actually worked. The marketing site got rewritten with conviction once I’d watched real buyers describe their pain.
This is the part of building an AI product that the two-week sprint can’t do for you. v1 proves the loop. Polish proves the engineering. v2 proves the product.
The compression isn’t just in the build. It’s in the iteration loop. Each cycle — v1 to polish, polish to v2 — was days, not months. Because the architectural decisions were durable and the code was AI-assisted, every iteration touched the right files and didn’t break the rest. That’s the unlock.
What this means if you’re building an AI product
The barrier to shipping AI products has collapsed. Not because the models got smarter. Because the toolchain around them — Claude Code, Claude Design, the AI gateways, the hosting platforms — got coherent enough that a single engineer with a clear architectural vision can do what used to require a small team.
The barrier that hasn’t collapsed is taste. Knowing what to build. Knowing where the trust boundaries are. Knowing which model to use for which step. Knowing which decisions are durable and which can be deferred. Knowing when a “ninety-second user experience” is the product and when it’s a constraint to design around. Knowing what to harden in polish and what to add in v2.
That’s the part you can’t outsource to an AI. It’s the part founders should be doing themselves. And it’s the part where two weeks of focused work, plus a few weeks of iteration with the right discipline, lands you a working product that real customers will pay for.
What I’m doing with this now
ProductMatch is live at productmatch.com.au. Australian retailers can sign up today, get fourteen days free, and run real pricing scans. It’s a real business with real customers and a real subscription model, not a portfolio piece.
The working method I built it with (the multi-model pipeline thinking, the trust architecture, the Linear-as-architectural-notebook discipline, the two-week-v1-then-iterate sprint rhythm, the Claude Code and Claude Design toolchain) is now the basis of a new consulting service I’m offering. “Build with me” is a 6-to-10-week scoped engagement that takes a founder from “I have an AI-shaped idea” through to a working commercial product in front of customers: v1 fast, then iterated to land.
If you’re a founder or operator who can see an AI-shaped opportunity in your business and you’d rather work with someone who’s shipped it than someone who’ll learn on your dollar, that’s the conversation I’m having now. The economics of building AI products have changed. The companies that figure out how to ship at this pace in the next eighteen months are going to look structurally different from the ones that don’t. I’d rather you be on the right side of that.
If you want to see what ProductMatch actually does, run a scan yourself. If you want to talk about building one for your business, start a conversation.