Sieg Labs
A Product Pitch · Dyslexia · Orton-Gillingham

Multiplying Orton-Gillingham

A design for an AI-multiplied O-G tutoring product — where the AI handles daily practice, and certified human tutors stay in the prescription seat.

One in five kids has some form of dyslexia. The standard intervention — Orton-Gillingham tutoring — works, but it's expensive (often $600+ a month for one-to-one), capped by tutor hours, and most families who could benefit can't access it.

Passive video courses don't fill the gap, because O-G is diagnostic-prescriptive in real time. The tutor has to watch what each kid does, hear the specific sound they missed, and decide what to do next. A linear video can't.

In 2026, AI can absorb the real-time observation layer at scale — listening to a kid name a letter sound, blend two phonemes, or read a word, and catching exactly where it broke down. The certified tutor stays the prescriptive brain. That's the wedge.

A product that meets the kid where they are — from naming letter sounds to reading their first words to fluent decoding — supervised weekly by a certified Orton-Gillingham specialist. Sitting between $30/month passive video and $600/month one-to-one tutoring.

01. The Vision

A parent of a six-year-old dyslexic kid pays around $249 a month. The kid sits down for fifteen minutes a day on an iPad with an AI reading buddy. Early on, the buddy shows the letter b and asks, "What sound does this make?" The kid says /b/. The AI hears the phoneme, marks it learned, moves on. When the kid says /d/ instead, the AI gently models the correct sound and tries again.

Over weeks, the kid moves from naming individual letter sounds, to blending two and three sounds together, to reading their first CVC words ("cat," "sit," "mop"), to digraphs and beyond. The AI never advances them until they're ready — and never bores them by stalling when they are.

Once a week, a certified tutor opens a dashboard. They watch a 90-second highlight reel of the kid's errors, click a few labels to confirm the AI flagged the right things, and pick the next module from a curated short list. Thirty minutes per kid per week.

One tutor goes from eight one-to-one students to between twenty-five and forty students on supervised AI. The kid progresses two to three times faster than with passive video, at a pace similar to one-to-one, at roughly forty percent of the one-to-one cost.

The surprise upside: the tutor catches patterns across students they'd never have seen with eyes on one kid at a time. The dashboard makes them better tutors, not obsolete ones.

02. What the Tutor Actually Does

Decompose a 30-minute O-G session and you find six things happening:

  1. Present a target word or sound — fully scriptable, AI does this trivially.
  2. Listen to the kid's response — AI does this better than a human at scale, because it never gets tired.
  3. Diagnose what specifically went wrong at the phoneme level — AI does this well if the classifier is good.
  4. Decide what to do next: drill, advance, back up — this is where O-G expertise lives. Family-only.
  5. Encourage and emotionally regulate — AI can do warm and specific okay, but rapport with a real human is different.
  6. Re-prescribe for next session — pure pedagogical judgment. Family-only.

Steps 1, 2, 3 multiply cleanly. Step 4 multiplies if the family encodes their judgment as a decision tree. Step 5 is engagement design. Step 6 stays human.
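
One way the step 4 judgment could be written down is as a small set of explicit rules. The sketch below is hypothetical: the names `nextAction` and `Attempt`, the five-attempt window, and the 0.9 and 0.6 thresholds are placeholders for illustration, not values from the family's practice. The family would set the real rules.

```typescript
// Hypothetical encoding of step 4 (drill, advance, back up) as explicit rules.
// Every threshold here is a placeholder assumption for the family to replace.
type Action = "advance" | "drill" | "back_up";

interface Attempt {
  target: string;   // the sound or word presented, e.g. "/b/" or "cat"
  correct: boolean; // whether the classifier judged the response correct
}

function nextAction(recent: Attempt[], minAttempts = 5): Action {
  // Too little evidence: keep drilling rather than guessing.
  if (recent.length < minAttempts) return "drill";
  const lastN = recent.slice(-minAttempts);
  const accuracy = lastN.filter((a) => a.correct).length / lastN.length;
  if (accuracy >= 0.9) return "advance"; // do not stall a kid who is ready
  if (accuracy < 0.6) return "back_up";  // the skill underneath looks shaky
  return "drill";
}
```

Keeping the rules this plain means a tutor can read them, disagree with them, and change a number without touching the rest of the system.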

The family's intuition — "every kid is different" — is correct as an objection to linear video, wrong as an objection to adaptive AI with a human in the loop. Honor the intuition by making the AI obviously adaptive in the UI, not by giving up.

03. The Architecture

Three surfaces, one shared data spine.

For the kid

Daily iPad session

Big friendly UI, one card at a time. Early cards show a single letter: "What sound does this make?" Later cards move to two-sound blends, then to whole words — always at the kid's actual stage, never ahead of it. The mic captures, the system transcribes at the phoneme level, and feedback comes back in real time. Correct → green and next. Wrong → "Let's try that sound again," with the right sound modeled clearly.

For the tutor

Weekly dashboard

A per-student timeline. A 90-second highlight reel of just the audio where the AI flagged errors. One-click labels — agree, disagree, unclear — which double as training data. Re-prescribe with top three module recommendations. Cross-student pattern view: "These four kids are stuck on the same phoneme."
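
The cross-student pattern view could be as simple as grouping confirmed error flags by phoneme. A minimal sketch, assuming a hypothetical `ErrorFlag` record with a student id and the missed phoneme (neither field name comes from the pitch):

```typescript
// Hypothetical shape of one tutor-confirmed error flag; field names are assumptions.
interface ErrorFlag {
  studentId: string;
  phoneme: string; // the target sound the kid missed, e.g. "/b/"
}

// Returns phonemes that at least `minStudents` distinct students are missing,
// with the most widely shared problem first.
function sharedStuckPhonemes(
  flags: ErrorFlag[],
  minStudents = 2,
): { phoneme: string; students: string[] }[] {
  const byPhoneme = new Map<string, Set<string>>();
  for (const f of flags) {
    const students = byPhoneme.get(f.phoneme) ?? new Set<string>();
    students.add(f.studentId);
    byPhoneme.set(f.phoneme, students);
  }
  const out: { phoneme: string; students: string[] }[] = [];
  byPhoneme.forEach((students, phoneme) => {
    if (students.size >= minStudents) {
      out.push({ phoneme, students: Array.from(students).sort() });
    }
  });
  return out.sort((a, b) => b.students.length - a.students.length);
}
```

Counting distinct students rather than raw flags keeps one kid with many misses from looking like a group pattern.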

For the parent

Weekly email + monthly call

Plain-English email every Sunday — what their kid worked on, what landed, what's next. No O-G jargon. A 15-minute video call once a month with the family tutor. That's the relationship layer that justifies the price.

The data spine ties it all together: every utterance becomes a phoneme-aligned transcript, becomes an error classification, becomes a session report. Tutor labels accumulate into a per-student model and a global classifier that gets better over time. The data flywheel is the moat.
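
To make the spine concrete, here is a rough sketch of the utterance to error classification to session report chain. The type names, field names, and the position-by-position comparison are all simplifying assumptions. A real aligner would also have to handle inserted or dropped sounds, and the phoneme symbols here are informal labels.

```typescript
// Sketch of the data spine: utterance -> phoneme comparison -> error list -> report.
interface Utterance {
  target: string;     // what the card showed, e.g. "sit"
  expected: string[]; // target phonemes, e.g. ["s", "i", "t"]
  heard: string[];    // phonemes reported by the ASR layer
}

interface PhonemeError {
  target: string;
  position: number;
  expected: string;
  heard: string | null; // null when nothing was heard at this position
}

// Compares expected and heard phonemes slot by slot.
// Extra heard phonemes beyond the expected length are ignored in this sketch.
function classify(u: Utterance): PhonemeError[] {
  const errors: PhonemeError[] = [];
  u.expected.forEach((want, i) => {
    const got = u.heard[i] ?? null;
    if (got !== want) {
      errors.push({ target: u.target, position: i, expected: want, heard: got });
    }
  });
  return errors;
}

interface SessionReport {
  utterances: number;
  correct: number;
  errors: PhonemeError[];
}

function buildReport(session: Utterance[]): SessionReport {
  const errors: PhonemeError[] = [];
  let correct = 0;
  for (const u of session) {
    const found = classify(u);
    if (found.length === 0) correct += 1;
    errors.push(...found);
  }
  return { utterances: session.length, correct, errors };
}
```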

04. The 90-Day MVP

Start where the kid actually is. Three students. Four weeks of trial. Then a hard decision: did the loop work, or didn't it?

The early-skills slice covers three progressive stages:

Stage 1

Letter sounds

Single-letter sound recognition. Consonants first (m, s, t, b…), then short vowels. The AI shows a letter, asks for the sound, listens for the right phoneme.

Stage 2

Blending

Two-sound then three-sound blends. /m/ + /a/ = "ma." /s/ + /i/ + /t/ = "sit." The kid says the sounds, then says them together. The AI hears each phoneme separately.

Stage 3

First CVC words

Short-a, short-e, short-i CVC words. "cat," "bed," "sit." The kid reads the whole word; the AI catches errors at the phoneme level and routes back to Stage 1 or 2 if a specific sound is the problem.
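
The pitch does not say how the system chooses between Stage 1 and Stage 2 when a CVC word goes wrong. One plausible rule, sketched here as an assumption: a sound the kid has not yet mastered in isolation points back to Stage 1, while a sound they know in isolation but miss inside a word points back to Stage 2. The function name and the `masteredInIsolation` set are hypothetical.

```typescript
type Stage = 1 | 2 | 3;

// Hypothetical routing rule for a Stage 3 (CVC) miss.
// Assumption: unknown sound -> Stage 1 (letter sounds);
// known sound missed in a word -> Stage 2 (blending).
function routeAfterCvcMiss(
  missedPhonemes: string[],
  masteredInIsolation: Set<string>,
): Stage {
  if (missedPhonemes.length === 0) return 3; // clean read: stay on CVC words
  const unknown = missedPhonemes.some((p) => !masteredInIsolation.has(p));
  return unknown ? 1 : 2;
}
```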

Weeks 1–2
Family: Author the early-skills module library, covering single-letter sounds (consonants first, then short vowels), simple two-sound and three-sound blends, and the first set of CVC words. 5–8 hours of work, drawing on existing teach-the-teachers material.
Phill: Stand up the Bun monorepo, PWA scaffold, basic auth, Stripe sandbox. Run a one-week bake-off on three ASR vendors with five sample recordings from a real student.

Weeks 3–4
Phill: Build the phoneme classifier, kid PWA with playable Stage 1 and Stage 2 modules, tutor dashboard with the session report view.
Family: Review and approve the AI's correction-language tone across twenty sample utterances. Iterate until it sounds warm, specific, never "wrong"; instead, "let's try that sound again."

Weeks 5–6
Phill: Add Stage 3 (CVC words). Tutor label widget, re-prescribe action, parent weekly email cron.
Family: Identify three trial students, ideally at different starting points (one on letter sounds, one on blending, one on CVC). Get parental consent, including for audio recording.

Weeks 7–10
Trial. Three students × four weeks × five sessions a week = sixty captured sessions. Family reviews each one weekly via dashboard, labels error flags. Crucially, each kid starts at their own stage, not everyone at Stage 1.

Weeks 11–12
Analysis and decision. Tutor agreement below 80%? Iterate the classifier. Above 80%? Expand the module library forward (digraphs, blends, multisyllabic) and open a paid waitlist.
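
The week 11–12 decision rests on a single number, so it helps to pin down how it might be computed. A minimal sketch, assuming agreement means agree / (agree + disagree) with "unclear" labels left out of the denominator, and assuming exactly 80% counts as passing. The pitch specifies neither.

```typescript
type TutorLabel = "agree" | "disagree" | "unclear";

// Agreement = agree / (agree + disagree). Excluding "unclear" is an assumption.
// Returns null when no label was decided either way.
function tutorAgreement(labels: TutorLabel[]): number | null {
  const agree = labels.filter((l) => l === "agree").length;
  const disagree = labels.filter((l) => l === "disagree").length;
  const decided = agree + disagree;
  return decided === 0 ? null : agree / decided;
}

type TrialDecision = "iterate_classifier" | "expand_library" | "insufficient_data";

// Applies the week 11-12 rule: below the threshold iterate, otherwise expand.
function trialDecision(labels: TutorLabel[], threshold = 0.8): TrialDecision {
  const rate = tutorAgreement(labels);
  if (rate === null) return "insufficient_data";
  return rate < threshold ? "iterate_classifier" : "expand_library";
}
```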

Total focused build budget for weeks 1–6: roughly 100–140 hours. Doable evenings and weekends for someone whose stack is already in place. The phoneme classifier is the hardest part — budget 30 of those hours there.

05. What It Actually Takes

Three people, around $80 a month in services, and a short list of legal gates before going public.

The people

Phill

The builder

100–140 hours of focused work spread over 90 days. Evenings and weekends. Bun, TypeScript, Cloudflare Workers, an ASR vendor integration, and the kid PWA.

Family

The pedagogy

5–8 hours upfront to author the early-skills module library in YAML. 30 minutes per student per week during the 4-week trial for dashboard reviews. Recruit 3 trial students with parental consent.

AI pair

The co-pilot

Writes the bulk of the code alongside Phill. Drafts the YAML schema, the LLM correction-language prompts, the tests, and the data plumbing. Available at every hour Phill is.
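
The module library is authored in YAML, but the schema itself is not defined here. As a starting point for discussion, this is one shape a parsed module entry might take in TypeScript, with a small sanity check. Every field name is an assumption, and the phoneme symbols are informal labels.

```typescript
// Hypothetical shape of one parsed module entry from the YAML library.
interface ModuleDef {
  id: string;
  stage: 1 | 2 | 3; // letter sounds, blending, first CVC words
  title: string;
  items: { prompt: string; phonemes: string[] }[];
}

// Returns human-readable problems; an empty list means the entry looks usable.
function validateModule(m: ModuleDef): string[] {
  const problems: string[] = [];
  if (m.id.trim() === "") problems.push("id is empty");
  if (m.items.length === 0) problems.push("module has no items");
  m.items.forEach((item, i) => {
    if (item.phonemes.length === 0) problems.push(`item ${i} has no phonemes`);
    if (m.stage === 1 && item.phonemes.length !== 1) {
      problems.push(`item ${i}: Stage 1 items should be a single sound`);
    }
  });
  return problems;
}

// Example entry for the short-a CVC set.
const shortAWords: ModuleDef = {
  id: "cvc-short-a",
  stage: 3,
  title: "Short-a CVC words",
  items: [
    { prompt: "cat", phonemes: ["k", "a", "t"] },
    { prompt: "map", phonemes: ["m", "a", "p"] },
  ],
};
```

A check like this could run whenever the family saves a module, so authoring mistakes surface before a kid ever sees the card.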

The monthly bill

Speechmatics edu tier: ~$35/mo
Correction language: ~$20/mo
ElevenLabs kid voice: ~$11/mo
CF + Hetzner + Stripe: ~$15/mo

Total: ~$80/month to run the MVP for 3 students. Scales roughly linearly with active users.

Gates before public launch (beyond MVP)

06. The One Gating Decision

Phoneme-level ASR — speech recognition that goes beyond words and tells us which sound the kid actually made — is the technical pivot point. Three options:

Whisper large-v3, self-hosted
Pros: Cheapest. Full control. Can fine-tune on dyslexic kid voices later.
Cons: Needs an extra phoneme-alignment layer. Kid-voice accuracy unknown. Weeks of dev.
MVP cost: ~$10/mo

Speechmatics, education tier
Pros: Native phoneme output. Tuned for kids' voices.
Cons: Vendor lock-in. More expensive at scale.
MVP cost: ~$35/mo

AssemblyAI Universal-2
Pros: Phoneme timestamps. Good docs. Fast integration.
Cons: Less customizable than self-hosted Whisper.
MVP cost: ~$30/mo

The play: a one-week bake-off in week 2 with the same five recordings from a real dyslexic seven-year-old. Highest phoneme-level precision wins. Don't decide on price — at three students the cost difference is noise. Migrate later only if scale forces it.
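
Scoring the bake-off could look something like this. The sketch assumes a tutor has marked each error flag a vendor raised as real or not. The `FlagCheck` record and both function names are hypothetical.

```typescript
// One error flag raised by a vendor, checked against a tutor's ground truth.
interface FlagCheck {
  vendor: string;
  tutorConfirmed: boolean; // true = the kid really did miss that sound
}

// Precision = confirmed flags / all flags raised by that vendor.
function precisionByVendor(checks: FlagCheck[]): Map<string, number> {
  const tally = new Map<string, { hit: number; total: number }>();
  for (const c of checks) {
    const t = tally.get(c.vendor) ?? { hit: 0, total: 0 };
    t.total += 1;
    if (c.tutorConfirmed) t.hit += 1;
    tally.set(c.vendor, t);
  }
  const out = new Map<string, number>();
  tally.forEach((t, vendor) => out.set(vendor, t.hit / t.total));
  return out;
}

// Picks the vendor with the highest precision, or null if there are no flags.
function pickWinner(checks: FlagCheck[]): string | null {
  let best: string | null = null;
  let bestPrecision = -1;
  precisionByVendor(checks).forEach((p, vendor) => {
    if (p > bestPrecision) {
      best = vendor;
      bestPrecision = p;
    }
  });
  return best;
}
```

One caution with precision alone: a vendor that flags almost nothing can score well while missing most real errors, so it is worth glancing at how many real errors each vendor caught too.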

07. The Business Math

Anchoring: one-to-one O-G is $80–150/hour, typically two sessions a week, so $640–1,200 a month (taking hour-long sessions and four weeks to the month). Online passive video is $20–50 a month and most parents don't believe it works. The AI-multiplied tier sits in the middle.

Price: $249/mo per kid
Compute: ~$40/mo (ASR + LLM + infra at scale)
Tutor time: ~$160/mo (2 hr/mo @ $80/hr)
Margin: ~$50/mo per kid

At 50 kids that's $2,500/month net margin to the business plus $8,000/month in tutor-time payments. Each additional tutor adds about 25 kids — another $1,250/month margin and $4,000/month in wages. Real numbers, but family-business scale, not venture-scale. Which is exactly the point.
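
For anyone who wants to vary the inputs, the arithmetic above fits in a few lines. The figures are the ones from the pitch. The function name is made up, and the exact per-kid margin comes out at $49, which the pitch rounds to about $50.

```typescript
// Per-kid monthly figures taken from the pitch.
const PRICE = 249;
const COMPUTE = 40;    // ASR + LLM + infra at scale
const TUTOR_HOURS = 2; // 30 minutes a week
const TUTOR_RATE = 80; // dollars per hour

function monthlyEconomics(kids: number) {
  const tutorPayPerKid = TUTOR_HOURS * TUTOR_RATE;       // 160
  const marginPerKid = PRICE - COMPUTE - tutorPayPerKid; // 49, rounded to ~50 in the pitch
  return {
    revenue: kids * PRICE,
    tutorPay: kids * tutorPayPerKid,
    margin: kids * marginPerKid,
  };
}
```

At 50 kids this gives $8,000 in tutor pay and $2,450 in margin, in line with the rounded $2,500 above.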

Positioning: "Your kid practices daily with an AI reading buddy that listens and corrects in real time. Every week, a certified Orton-Gillingham specialist reviews their progress and adjusts the plan."

NOT: "AI-powered phonics platform." That sounds like ed-tech vaporware. Lead with the certified-human supervision. The AI is the workhorse, not the headline.

Protecting the existing business

Existing one-to-one students grandfather in at current price. The AI tier is for new students who would have been priced out of one-to-one. Over time, some current students who are ready can graduate to lighter-touch AI-supervised tutoring, freeing tutor hours for new full-price one-to-one work. The AI extension never replaces — it expands the bottom of the funnel.

08. Two Open Questions

Two design forks need a decision before any code gets written.

Question 1 · Go-to-market

Direct-to-parent, or family-roster-first?

Option A: start with strangers — paid ads, organic, dyslexia parent communities. Higher customer-acquisition cost but tests the business model purely.

Option B: start by supplementing the family's existing tutoring waitlist. Zero acquisition cost, warm relationships — but masks whether the product works for people who don't already trust the family.

Recommendation: B for MVP, A for growth. Use the existing roster to validate the loop without spending on ads. Once tutor agreement hits 80% and three students complete four weeks, open the gates.

Question 2 · Real-time feedback

Does the AI ever talk back negatively to the kid in v1?

Option A: yes — when the AI detects an error, it says "let's try that sound again" and models the correct phoneme immediately.

Option B: no — the AI only encourages and moves forward. Errors are flagged silently for the tutor to address later.

Recommendation: a guarded yes. The AI corrects only on high-confidence errors — phoneme classifier confidence above 0.9. Lower-confidence flags go to the tutor's review queue silently. This protects against ASR misreads triggering false corrections while still giving the kid the real-time feedback that's the whole point.
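
The guarded yes reduces to one branch. A minimal sketch, using the 0.9 threshold from the recommendation. The type names and the exact script wording are illustrative assumptions.

```typescript
interface DetectedError {
  expected: string;   // target phoneme
  heard: string;      // what the classifier thinks the kid said
  confidence: number; // classifier confidence in [0, 1]
}

type KidFacingResponse =
  | { kind: "correct_aloud"; script: string }
  | { kind: "flag_for_tutor" };

// Corrects aloud only when confidence is strictly above the threshold;
// everything else goes silently to the tutor's review queue.
function respondToError(e: DetectedError, threshold = 0.9): KidFacingResponse {
  if (e.confidence > threshold) {
    return { kind: "correct_aloud", script: `Let's try that sound again: ${e.expected}` };
  }
  return { kind: "flag_for_tutor" };
}
```

Because the threshold is a parameter, it can be tuned per student during the trial if corrections turn out to fire too often or too rarely.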

09. Where It Could Break Down

Honest risk taxonomy. The first one kills it. The rest slow it down.

Technical — the gating bet

  1. Phoneme ASR fails on dyslexic kid voices. Precision drops below 80% on real recordings. False positives erode tutor trust. Product becomes a glorified video player. Resolved or killed in week 2 by the bake-off.
  2. Latency chain exceeds 2 seconds per turn. ASR + LLM + TTS round-trip too slow; kid gets bored or confused mid-response. Mitigation: stream wherever possible, cache TTS for fixed prompts, run ASR on partial audio.
  3. iPad Safari microphone quirks. Autoplay restrictions, permission flows, background-audio limits — friction in the first session kills retention. Mitigation: smoke-test on a real iPad in week 2, not week 6.
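
Of the latency mitigations above, caching TTS for fixed prompts is the easiest to sketch. Here `synthesize` stands in for whatever TTS call the product ends up using. Its signature is an assumption, not a real vendor API.

```typescript
// Placeholder signature for a TTS call; not a real vendor API.
type Synthesize = (text: string) => Promise<Uint8Array>;

// Caches the in-flight promise, so repeated or concurrent requests for the
// same fixed prompt trigger only one synthesis call.
function makeTtsCache(synthesize: Synthesize): Synthesize {
  const cache = new Map<string, Promise<Uint8Array>>();
  return (text) => {
    const hit = cache.get(text);
    if (hit) return hit;
    const pending = synthesize(text).catch((err) => {
      cache.delete(text); // do not cache failures; allow a retry
      throw err;
    });
    cache.set(text, pending);
    return pending;
  };
}
```

Fixed prompts such as "What sound does this make?" would then cost one synthesis call per process, leaving the latency budget for the parts that genuinely vary.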

Product

  1. Dashboard doubles family workload instead of multiplying capacity. Reviewing 25 students' sessions takes longer than tutoring 5 in person. Mitigation: ruthless scannability, with the 90-second highlight reel as the entire review surface.
  2. Kid finds the AI boring after 2–3 weeks. Pedagogy is right; engagement design is wrong. Mitigation: tune the reward layer during the trial; treat engagement as a separate problem from pedagogy.
  3. Real-time corrections feel patronizing. The kid disengages from the AI buddy. Mitigation: tune correction frequency thresholds; let some small errors pass without comment.
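
Since the highlight reel carries so much weight, here is one way it might be assembled. The sketch assumes each flagged error has a clip with a duration and a classifier confidence. Putting the least confident flags first is an assumption, on the reasoning that those benefit most from a human label. The family may prefer a different priority.

```typescript
interface ErrorClip {
  sessionId: string;
  startSec: number;
  durationSec: number;
  confidence: number; // classifier confidence that this is a real error
}

// Greedy selection: least confident flags first, skipping any clip that
// would push the reel past the time budget.
function buildHighlightReel(clips: ErrorClip[], budgetSec = 90): ErrorClip[] {
  const ranked = [...clips].sort((a, b) => a.confidence - b.confidence);
  const reel: ErrorClip[] = [];
  let used = 0;
  for (const clip of ranked) {
    if (used + clip.durationSec <= budgetSec) {
      reel.push(clip);
      used += clip.durationSec;
    }
  }
  return reel;
}
```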

Business and market

  1. $249/mo doesn't clear. The price feels too high next to passive video, yet too low to carry the trust parents place in human one-to-one. Mitigation: price-test with the family's existing roster before opening to strangers.
  2. Cannibalization without net new revenue. Existing 1:1 students convert to the cheaper tier; total revenue flat. Mitigation: grandfather existing students at current price; AI tier explicitly for new families.

Operational

  1. Recruiting trial students takes 4–6 weeks instead of 1–2. Parental consent and scheduling slip; trial doesn't start on time. Mitigation: identify candidates in week 1, not week 5.
  2. Tutor labeling never gets done. One missed weekly review breaks the data flywheel. Mitigation: make labeling a 15-minute commitment, not a 60-minute one — that's the dashboard's job.
  3. Phill burnout. 100–140 hours of evenings and weekends is real and finite. Mitigation: hard 90-day stop. If MVP isn't shipping by week 10, ship what's working and reassess.

Legal and ethical

  1. COPPA / FERPA exposure. Audio of minors carries storage, consent, deletion, and access-control obligations. Mitigation: MVP runs on handwritten consent for 3 family-roster students; public launch needs a lawyer-reviewed flow.
  2. Data breach scenario. Kids' audio recordings leaked = career-ending event for the family business. Mitigation: encrypt at rest, no third-party training on session audio, parent-owned export, breach response plan written before launch.

The technical risks resolve in the first two weeks. The product and operational risks surface during the trial. The legal risks gate the leap from trial to public launch.