A design for an AI-multiplied O-G tutoring product — where the AI handles daily practice, and certified human tutors stay in the prescription seat.
One in five kids has some form of dyslexia. The standard intervention, Orton-Gillingham tutoring, works, but it is expensive (often $600+ a month for one-to-one) and capped by tutor hours, so most families who could benefit can't access it.
Passive video courses don't fill the gap, because O-G is diagnostic-prescriptive in real time. The tutor has to watch what each kid does, hear the specific sound they missed, and decide what to do next. A linear video can't.
In 2026, AI can absorb the real-time observation layer at scale — listening to a kid name a letter sound, blend two phonemes, or read a word, and catching exactly where it broke down. The certified tutor stays the prescriptive brain. That's the wedge.
A parent of a six-year-old dyslexic kid pays around $249 a month. The kid sits down for fifteen minutes a day on an iPad with an AI reading buddy. Early on, the buddy shows the letter b and asks, "What sound does this make?" The kid says /b/. The AI hears the phoneme, marks it learned, moves on. When the kid says /d/ instead, the AI gently models the correct sound and tries again.
Over weeks, the kid moves from naming individual letter sounds, to blending two and three sounds together, to reading their first CVC words ("cat," "sit," "mop"), to digraphs and beyond. The AI never advances them until they're ready — and never bores them by stalling when they are.
Once a week, a certified tutor opens a dashboard. They watch a 90-second highlight reel of the kid's errors, click a few labels to confirm the AI flagged the right things, and pick the next module from a curated short list. Thirty minutes per kid per week.
One tutor goes from eight students at one-to-one to twenty-five to forty students on supervised AI (at thirty minutes per kid per week, that is 12.5 to 20 hours of review). The target is for the kid to progress two to three times faster than with passive video, at a pace similar to one-to-one, at roughly forty percent of one-to-one cost.
Decompose a 30-minute O-G session and you find six things happening:
Steps 1, 2, 3 multiply cleanly. Step 4 multiplies if the family encodes their judgment as a decision tree. Step 5 is engagement design. Step 6 stays human.
The family's intuition — "every kid is different" — is correct as an objection to linear video, wrong as an objection to adaptive AI with a human in the loop. Honor the intuition by making the AI obviously adaptive in the UI, not by giving up.
Three surfaces, one shared data spine.
Big friendly UI, one card at a time. Early cards show a single letter: "What sound does this make?" Later cards move to two-sound blends, then to whole words — always at the kid's actual stage, never ahead of it. The mic captures, the system transcribes at the phoneme level, and feedback comes back in real time. Correct → green and next. Wrong → "Let's try that sound again," with the right sound modeled clearly.
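The card loop above can be sketched in a few lines. This is a minimal illustration, assuming the ASR layer hands back the phoneme it heard as a plain string; `Card`, `Feedback`, and `gradeAttempt` are hypothetical names, not part of any existing codebase.

```typescript
// Hypothetical types: the design does not specify a card or feedback schema.
type Card = { prompt: string; targetPhoneme: string };

type Feedback =
  | { kind: "correct" } // UI shows green and advances to the next card
  | { kind: "retry"; message: string; modelPhoneme: string };

// Compare the phoneme the ASR heard with the card's target sound.
function gradeAttempt(card: Card, heardPhoneme: string): Feedback {
  if (heardPhoneme === card.targetPhoneme) {
    return { kind: "correct" };
  }
  return {
    kind: "retry",
    message: "Let's try that sound again",
    modelPhoneme: card.targetPhoneme, // the sound the buddy models back
  };
}

const bCard: Card = { prompt: "b", targetPhoneme: "b" };
console.log(gradeAttempt(bCard, "b")); // correct branch
console.log(gradeAttempt(bCard, "d")); // retry branch, models /b/
```

Keeping the grading step a pure function means the same logic can run in the kid PWA and be replayed later against stored transcripts.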
A per-student timeline. A 90-second highlight reel of just the audio where the AI flagged errors. One-click labels — agree, disagree, unclear — which double as training data. Re-prescribe with top three module recommendations. Cross-student pattern view: "These four kids are stuck on the same phoneme."
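One way the cross-student pattern view could be computed from the one-click labels is sketched below. The `Flag` shape and the `stuckPhonemes` helper are assumptions made for illustration; only flags the tutor marked "agree" are counted, which is also an assumption.

```typescript
// Hypothetical shape for an AI error flag after tutor review.
type TutorLabel = "agree" | "disagree" | "unclear";
type Flag = { studentId: string; phoneme: string; label: TutorLabel };

// Return each phoneme that at least `minStudents` distinct students are stuck on.
function stuckPhonemes(flags: Flag[], minStudents: number): Map<string, string[]> {
  const byPhoneme = new Map<string, Set<string>>();
  for (const f of flags) {
    if (f.label !== "agree") continue; // count only tutor-confirmed errors
    const students = byPhoneme.get(f.phoneme) ?? new Set<string>();
    students.add(f.studentId);
    byPhoneme.set(f.phoneme, students);
  }
  const result = new Map<string, string[]>();
  byPhoneme.forEach((students, phoneme) => {
    if (students.size >= minStudents) {
      result.set(phoneme, Array.from(students).sort());
    }
  });
  return result;
}

const sampleFlags: Flag[] = [
  { studentId: "s1", phoneme: "b", label: "agree" },
  { studentId: "s2", phoneme: "b", label: "agree" },
  { studentId: "s3", phoneme: "b", label: "disagree" },
  { studentId: "s1", phoneme: "d", label: "agree" },
];
console.log(stuckPhonemes(sampleFlags, 2));
```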
Plain-English email every Sunday — what their kid worked on, what landed, what's next. No O-G jargon. A 15-minute video call once a month with the family tutor. That's the relationship layer that justifies the price.
The data spine ties it all together: every utterance becomes a phoneme-aligned transcript, becomes an error classification, becomes a session report. Tutor labels accumulate into a per-student model and a global classifier that gets better over time. The data flywheel is the moat.
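The spine's stages can be written down as types plus two small functions. This is a sketch under stated assumptions: every type and function name here is hypothetical, and the comparison is naively position by position, so it does not handle inserted or dropped sounds the way a real aligner would.

```typescript
// All names are hypothetical; the design describes the stages, not a schema.
type AlignedPhoneme = { phoneme: string; startMs: number; endMs: number };
type PhonemeError = { position: number; expected: string; heard: string | null };
type SessionReport = { attempts: number; errorsByPhoneme: Record<string, number> };

// Transcript -> error classification: compare expected sounds with what was heard.
function classifyErrors(expected: string[], heard: AlignedPhoneme[]): PhonemeError[] {
  const errors: PhonemeError[] = [];
  expected.forEach((target, i) => {
    const got = heard[i]?.phoneme ?? null; // null when the sound is missing
    if (got !== target) errors.push({ position: i, expected: target, heard: got });
  });
  return errors;
}

// Error classifications -> session report: count misses per target phoneme.
function buildReport(perUtterance: PhonemeError[][]): SessionReport {
  const errorsByPhoneme: Record<string, number> = {};
  for (const errors of perUtterance) {
    for (const e of errors) {
      errorsByPhoneme[e.expected] = (errorsByPhoneme[e.expected] ?? 0) + 1;
    }
  }
  return { attempts: perUtterance.length, errorsByPhoneme };
}

// The kid reads "sit" but says "set".
const heardSet: AlignedPhoneme[] = [
  { phoneme: "s", startMs: 0, endMs: 120 },
  { phoneme: "e", startMs: 120, endMs: 260 },
  { phoneme: "t", startMs: 260, endMs: 380 },
];
const sitErrors = classifyErrors(["s", "i", "t"], heardSet);
console.log(buildReport([sitErrors, []]));
```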
Start where the kid actually is. Three students. Four weeks of trial. Then a hard decision: did the loop work, or didn't it?
The early-skills slice covers three progressive stages:
Single-letter sound recognition. Consonants first (m, s, t, b…), then short vowels. The AI shows a letter, asks for the sound, listens for the right phoneme.
Two-sound then three-sound blends. /m/ + /a/ = "ma." /s/ + /i/ + /t/ = "sit." The kid says the sounds, then says them together. The AI hears each phoneme separately.
Short-a, short-e, short-i CVC words. "cat," "bed," "sit." The kid reads the whole word; the AI catches errors at the phoneme level and routes back to Stage 1 or 2 if a specific sound is the problem.
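The routing between stages might look like the sketch below. The rule itself is an assumption, since the design does not say when a miss goes to Stage 1 versus Stage 2: here a sound the kid has not yet mastered in isolation goes back to Stage 1, while a sound they know in isolation but miss inside a word goes back to Stage 2. `routeAfterWord` is a hypothetical name.

```typescript
type Stage = 1 | 2 | 3;

// Assumed rule, not taken from the design:
//  - no missed sounds            -> stay in Stage 3 (CVC words)
//  - any missed sound is unknown -> Stage 1 (single-letter sounds)
//  - all missed sounds are known -> Stage 2 (blending is the problem)
function routeAfterWord(missed: string[], masteredInIsolation: Set<string>): Stage {
  if (missed.length === 0) return 3;
  const hasUnknownSound = missed.some((p) => !masteredInIsolation.has(p));
  return hasUnknownSound ? 1 : 2;
}

const mastered = new Set<string>(["s", "t", "m", "a"]);
console.log(routeAfterWord([], mastered)); // 3
console.log(routeAfterWord(["i"], mastered)); // 1
console.log(routeAfterWord(["a"], mastered)); // 2
```

In practice the tutor's judgment would replace this rule; the point is only that the routing decision takes the specific missed phonemes as input, not a pass/fail on the whole word.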
Total focused build budget for weeks 1–6: roughly 100–140 hours. Doable on evenings and weekends for someone whose stack is already in place. The phoneme classifier is the hardest part, so budget 30 of those hours for it.
Three people, around $80 a month in services, and a short list of legal gates before going public.
100–140 hours of focused work spread over 90 days. Evenings and weekends. Bun, TypeScript, Cloudflare Workers, an ASR vendor integration, and the kid PWA.
5–8 hours upfront to author the early-skills module library in YAML. 30 minutes per student per week during the 4-week trial for dashboard reviews. Recruit 3 trial students with parental consent.
Writes the bulk of the code alongside Phill. Drafts the YAML schema, the LLM correction-language prompts, the tests, and the data plumbing. Available at every hour Phill is.
Total: ~$80/month to run the MVP for 3 students. Scales roughly linearly with active users.
Phoneme-level ASR — speech recognition that goes beyond words and tells us which sound the kid actually made — is the technical pivot point. Three options:
| Vendor | Pros | Cons | MVP cost |
|---|---|---|---|
| Whisper large-v3 self-hosted | Cheapest. Full control. Can fine-tune on dyslexic kid voices later. | Needs an extra phoneme-alignment layer. Kid-voice accuracy unknown. Weeks of dev. | ~$10/mo |
| Speechmatics education tier | Native phoneme output. Tuned for kids' voices. | Vendor lock-in. More expensive at scale. | ~$35/mo |
| AssemblyAI Universal-2 | Phoneme timestamps. Good docs. Fast integration. | Less customizable than self-hosted Whisper. | ~$30/mo |
The play: a one-week bake-off in week 2 with the same five recordings from a real dyslexic seven-year-old. Highest phoneme-level precision wins. Don't decide on price — at three students the cost difference is noise. Migrate later only if scale forces it.
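A scoring harness for the bake-off could be as small as the sketch below. The design does not define "phoneme-level precision", so this assumes one reading: of the phonemes a vendor outputs, the fraction that match a hand-labeled reference at the same position. The vendor keys and the sample data are placeholders, not measured results.

```typescript
// Hypothetical shape: one hand-labeled reference plus each vendor's output.
type Recording = { reference: string[]; byVendor: Record<string, string[]> };

// Assumed metric: matches at the same position divided by phonemes predicted.
function phonemePrecision(reference: string[], predicted: string[]): number {
  if (predicted.length === 0) return 0;
  let matches = 0;
  predicted.forEach((p, i) => {
    if (reference[i] === p) matches += 1;
  });
  return matches / predicted.length;
}

// Average the metric over all recordings and return the best-scoring vendor.
function pickWinner(recordings: Recording[], vendors: string[]): { vendor: string; score: number } {
  let best = { vendor: "", score: -1 };
  for (const vendor of vendors) {
    const scores = recordings.map((r) => phonemePrecision(r.reference, r.byVendor[vendor] ?? []));
    const mean = scores.reduce((a, b) => a + b, 0) / Math.max(scores.length, 1);
    if (mean > best.score) best = { vendor, score: mean };
  }
  return best;
}

// Placeholder data for two made-up vendors; real runs would use the five recordings.
const bakeOff: Recording[] = [
  { reference: ["k", "a", "t"], byVendor: { vendorA: ["k", "a", "t"], vendorB: ["k", "e", "t"] } },
  { reference: ["s", "i", "t"], byVendor: { vendorA: ["s", "i", "d"], vendorB: ["s", "e", "d"] } },
];
console.log(pickWinner(bakeOff, ["vendorA", "vendorB"]));
```

With only five recordings the scores will be noisy, so it is worth eyeballing the per-recording numbers as well as the mean before committing.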
Anchoring: one-to-one O-G is $80–150/hour, typically two one-hour sessions a week (about eight a month), so $640–1,200 a month. Online passive video is $20–50 a month and most parents don't believe it works. The AI-multiplied tier sits in the middle.
At 50 kids that's $2,500/month net margin to the business plus $8,000/month in tutor-time payments. Each additional tutor adds about 25 kids — another $1,250/month margin and $4,000/month in wages. Real numbers, but family-business scale, not venture-scale. Which is exactly the point.
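The figures above reduce to two per-kid numbers, which makes them easy to recompute for any tutor count. The sketch below derives them from the 50-kid case and assumes 25 kids per tutor, as stated; the implied hourly rate additionally assumes the 30 minutes of review per kid per week and about 52/12 weeks in a month.

```typescript
// Per-kid figures derived from the 50-kid numbers in the text.
const MARGIN_PER_KID = 2500 / 50; // $50 net margin per kid per month
const TUTOR_PAY_PER_KID = 8000 / 50; // $160 tutor pay per kid per month
const KIDS_PER_TUTOR = 25; // the per-tutor figure used in the text

type MonthlyTotals = { kids: number; margin: number; tutorPay: number };

function monthlyTotals(tutors: number): MonthlyTotals {
  const kids = tutors * KIDS_PER_TUTOR;
  return { kids, margin: kids * MARGIN_PER_KID, tutorPay: kids * TUTOR_PAY_PER_KID };
}

// Assumption: 0.5 hours of review per kid per week, 52/12 weeks per month.
const hoursPerKidPerMonth = 0.5 * (52 / 12);
const impliedHourlyRate = TUTOR_PAY_PER_KID / hoursPerKidPerMonth;

console.log(monthlyTotals(1)); // { kids: 25, margin: 1250, tutorPay: 4000 }
console.log(monthlyTotals(2)); // { kids: 50, margin: 2500, tutorPay: 8000 }
console.log(impliedHourlyRate.toFixed(2));
```

Under those assumptions the tutor's review time works out to roughly $74 an hour.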
Positioning: "Your kid practices daily with an AI reading buddy that listens and corrects in real time. Every week, a certified Orton-Gillingham specialist reviews their progress and adjusts the plan."
NOT: "AI-powered phonics platform." That sounds like ed-tech vaporware. Lead with the certified-human supervision. The AI is the workhorse, not the headline.
Existing one-to-one students grandfather in at current price. The AI tier is for new students who would have been priced out of one-to-one. Over time, some current students who are ready can graduate to lighter-touch AI-supervised tutoring, freeing tutor hours for new full-price one-to-one work. The AI extension never replaces — it expands the bottom of the funnel.
Two design forks need a decision before any code gets written.
Option A: start with strangers — paid ads, organic, dyslexia parent communities. Higher customer-acquisition cost but tests the business model purely.
Option B: start by supplementing the family's existing tutoring waitlist. Zero acquisition cost, warm relationships — but masks whether the product works for people who don't already trust the family.
Option A: yes, the AI corrects in real time. When it detects an error, it says "let's try that sound again" and models the correct phoneme immediately.
Option B: no, the AI never corrects. It only encourages and moves forward, and errors are flagged silently for the tutor to address later.
Honest risk taxonomy. The first one kills it. The rest slow it down.
The technical risks resolve in the first two weeks. The product and operational risks surface during the trial. The legal risks gate the leap from trial to public launch.