ShipSet is for existing Product Managers (any seniority) who want to upskill into AI Product Manager roles. The curriculum assumes PM fundamentals (PRDs, working with engineers, stakeholders, sprints) and teaches the AI-specific layer on top. For people new to product management, ShipSet includes a separate PM Primer (7 lessons, always free, not part of the paid program) that covers PM fundamentals before the main 90-day program.

How is ShipSet different from Udemy, Lenny, Maven, or Reforge?

ShipSet is builder-first. By Day 28 you ship a working AI prototype with a live URL. By Day 90 you have a live AI feature with eval evidence, a validated cost model, a 90-second demo Loom, and a 10-piece portfolio. Other AI PM courses are lecture-first with optional projects. ShipSet inverts that ratio: real builds first, concepts taught as they become relevant.

Do I need to code to become an AI Product Manager through ShipSet?

No. Light coding helps but is not required. The build lessons use no-code AI builders (v0, Lovable, Cursor, Claude Code) to ship working prototypes. Learners read prompts and modify them; deep coding is not part of the curriculum.

How much does ShipSet cost?

Founding members (first 50) pay ₹2,499 ($79) one-time for lifetime access. Annual is ₹3,499 ($99) per year. Monthly is ₹699 ($14.99) per month. INR via Razorpay, USD via card (coming after LLP). All plans include the same content. The first 5 main lessons are free for everyone, no card required. The separate PM Primer (7 lessons) is always free regardless of plan. Every paid plan has a 7-day money-back guarantee.

How long does ShipSet take to complete?

The program is 90 lessons of about 15 minutes each, for a total of roughly 22 hours over 90 days. There is no calendar pressure: lessons unlock as completed, so learners move at their own pace. Most working PMs complete in 90 to 120 days at 3 hours per week.

Can ShipSet be completed on a phone only?

Partly. About 75 percent of lessons (reading, journaling, critique, reflection) work well on a phone. The other 25 percent are build sessions (writing specs, building prototypes, validating costs with real API calls, recording Loom walkthroughs) and need a laptop. Plan for roughly one laptop session per week.

What does the ShipSet portfolio include at Day 90?

The Day 90 portfolio includes ten artifacts: a live working AI feature with a public URL, a 90-second Loom walkthrough, an AI-native PRD, a 20-row eval suite with measurable scores, a validated cost model, an AI UX flow with placement decisions, an AI product metrics framework, a business case / ROI document, a launch plan with monitoring strategy, and a compiled portfolio document. Plus a public ShipSet certificate with verification URL.

How a PM Should Pick an AI Model in 2026: The 5-Variable Decision Matrix

The PM picks the model. This surprises most candidates we talk to. They assume engineering picks it. In production AI features the model choice is a PM call because it has price, latency, and quality trade-offs that the PM owns. The PM who outsources it to engineering ships features that quietly cost 3-5x what they should.

This post is the model-selection framework we teach in ShipSet Lesson 50 ("Model routing and selection"). Five variables to evaluate, three concrete decision trees, and the trap most PMs fall into when they default to "we use GPT-4 because everyone does." No model worship. No vendor evangelism. The decisions a working AI PM actually makes in 2026.

Why the model decision is a PM call, not an engineering call

Engineering owns the integration. The PM owns the trade-off. The trade-off is between four things that move when you swap models:

Quality: pass rate on your eval suite
Cost: dollars per request
Latency: ms per request
Capability: what shapes of task the model can even do

Engineering can tell you "this model is faster" or "this is cheaper." Only the PM knows which trade-off is acceptable for the specific feature, and only the PM can defend that trade-off to leadership when the bill arrives.

If your engineer picked the model unilaterally, you cannot answer "why this model" in an interview or a senior review. That answer is the PM's job.

The five variables to evaluate

1. Task shape

Some tasks fit a tiny cheap model. Some need a frontier model. Most PMs assume frontier; most tasks do not need it.

Task type	Right model class	Why
Classification (10-20 categories)	Small/cheap (Haiku 4.5, GPT-5 nano, Gemini Flash)	High precision possible on small models. Cost matters at scale.
Extraction (pull fields from text)	Small/cheap	Same. Tight output format works well on small models.
Summarization	Small if quality bar is decent; mid-tier if executive-grade	Quality scales with model size here.
Creative generation (marketing copy, ideas)	Mid-tier (Sonnet 4.6, GPT-5 standard)	Small models are too repetitive. Frontier overkill.
Multi-step reasoning (math, code, agents)	Frontier (Fable 5, Mythos 5, GPT-5 reasoning)	Reasoning is where size matters most.
Tool use / agents	Frontier or specifically tuned (Fable 5, Claude Code)	Tool calling reliability degrades on smaller models.
Vision (read images, charts, docs)	Multimodal frontier (Fable 5, Gemini 2.5 Ultra)	Multimodal still wants size for accuracy.

The single biggest PM cost error in 2026: using a frontier model for classification. A Haiku 4.5 classifier matches Fable 5 on most classification eval suites and costs roughly 1/20th as much.

2. Quality bar from the eval suite

You picked a candidate model. Now prove it.

Run your eval suite (50+ rows, 20/15/10/5 split) on at least two candidates. Score each row. Compare pass rates. The model selection becomes a number, not an opinion.

A useful concrete table from a real feature (support ticket auto-router):

Model	Happy path pass	Edge case pass	Adversarial pass	Cost/req
Haiku 4.5	94%	78%	65%	$0.0007
Sonnet 4.6	96%	82%	70%	$0.004
Fable 5	96%	85%	73%	$0.018

Reading this: Haiku is good enough for the happy path. Sonnet is the right choice if edge-case handling matters. Fable buys 3 percentage points on edge and adversarial cases (and none on the happy path) for 4.5x the Sonnet cost. Unless those 3 points unlock a different product behavior, Sonnet wins.

The PM who walks into a review with this table closes the model conversation in 2 minutes. The PM without it argues vibes for an hour.
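
To make that reading mechanical, here is a minimal Python sketch that computes the percentage-point gain and the cost multiple between two rows of the table. The numbers are copied from the table above; the `candidates` dictionary and the `compare` helper are illustrative names, not part of any ShipSet tooling.

```python
# Pass rates (percent) and cost per request (USD), copied from the
# support ticket auto-router table above.
candidates = {
    "Haiku 4.5":  {"happy": 94, "edge": 78, "adversarial": 65, "cost": 0.0007},
    "Sonnet 4.6": {"happy": 96, "edge": 82, "adversarial": 70, "cost": 0.004},
    "Fable 5":    {"happy": 96, "edge": 85, "adversarial": 73, "cost": 0.018},
}


def compare(base, upgrade):
    """Return (gain in percentage points per category, cost multiple)
    for moving from `base` to `upgrade`."""
    b, u = candidates[base], candidates[upgrade]
    gains = {k: u[k] - b[k] for k in ("happy", "edge", "adversarial")}
    cost_multiple = u["cost"] / b["cost"]
    return gains, cost_multiple


gains, multiple = compare("Sonnet 4.6", "Fable 5")
print(gains, round(multiple, 1))
```

Swapping in your own eval numbers gives you the "points bought per cost multiple" sentence for any pair of candidates.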

3. Latency budget

User-facing features have a latency budget. Backend / batch features do not.

Feature type	Budget	Implication
Chat response (streaming)	500ms to first token, 30 tok/s	Need streaming + smaller model + Anthropic's prompt caching
Form auto-fill	< 800ms total	Small model + structured output
Background tagging	seconds, even minutes	Use any model. Optimize for cost.
Agentic workflow	5-60s per turn	Use any model. Show progress UI.

Latency is where frontier-vs-fast gets non-obvious. Fable 5 is slower per request than Haiku 4.5 for the same task. If your UI streams chat responses, users feel that difference: a 1.2-second first token versus a 400ms first token kills perceived responsiveness even when the content is identical.

Engineering will quote latency at p50. You should ask for p95 and p99. Power users hit those.
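
If engineering hands you raw latency samples rather than percentiles, a small sketch like the one below computes p50, p95, and p99 yourself. It assumes the nearest-rank definition of a percentile (other definitions interpolate and give slightly different values), and the sample latencies are made up for illustration.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample such that at least
    p percent of all samples are less than or equal to it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical first-token latencies in milliseconds.
latencies_ms = [380, 400, 410, 420, 430, 450, 470, 520, 900, 1200]

p50 = percentile(latencies_ms, 50)
p95 = percentile(latencies_ms, 95)
p99 = percentile(latencies_ms, 99)
print(p50, p95, p99)
```

In this made-up sample the median looks healthy while the tail is nearly three times slower, which is exactly the gap a p50-only report hides.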

4. Cost per request and at scale

Calculate the per-user-per-month cost using the seven-variable workbook (see our AI cost modeling post). Compare across model candidates.

Trap: per-request costs are fractions of a cent and easy to dismiss. Multiply by your DAU and request volume, and you find that two features with the same eval pass rate can cost $80K/month and $4K/month. The PM who notices this gets promoted.

Worked sanity check for a 10K-user feature, 30 requests/user/month:

Model	Cost/req	Cost/user/month	Cost/month at 10K users
Haiku 4.5	$0.0007	$0.021	$210
Sonnet 4.6	$0.004	$0.12	$1,200
Fable 5	$0.018	$0.54	$5,400

The cost difference between Haiku and Fable is $5,190/month. If your eval shows Haiku is within 5% of Fable's pass rate on your task, the savings buy a junior PM or a senior engineer. The model choice IS a hiring decision in disguise.
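
The table above is plain arithmetic, so it is worth scripting once and re-running with your own volumes. The sketch below reproduces it in Python under the same assumptions (10K users, 30 requests per user per month); `monthly_cost` is an illustrative helper name.

```python
def monthly_cost(cost_per_request, requests_per_user_per_month, users):
    """Return (cost per user per month, total cost per month) in USD."""
    per_user = cost_per_request * requests_per_user_per_month
    return per_user, per_user * users


# Cost per request copied from the table above.
models = [("Haiku 4.5", 0.0007), ("Sonnet 4.6", 0.004), ("Fable 5", 0.018)]

for name, cost in models:
    per_user, total = monthly_cost(cost, 30, 10_000)
    print(f"{name}: ${per_user:.3f}/user/month, ${total:,.0f}/month")

# Monthly gap between the cheapest and the most expensive candidate.
gap = monthly_cost(0.018, 30, 10_000)[1] - monthly_cost(0.0007, 30, 10_000)[1]
print(f"Haiku vs Fable gap: ${gap:,.0f}/month")
```

Change the user count or requests per user and the gap scales linearly, which is why the same per-request difference matters far more at 100K users than at 10K.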

5. Capability shape (multimodal, tool use, long context)

Some features need things only specific models do well in 2026:

Vision: only multimodal models. As of mid-2026: Claude Fable 5, Claude Sonnet 4.6, Gemini 2.5 Ultra, GPT-5 Vision. Quality varies; eval on your specific images, not on benchmarks.
Tool use / function calling: Fable 5 and GPT-5 are the most reliable for multi-tool, multi-turn calls. Haiku 4.5 can do basic tool calls but degrades with 5+ tools. Llama 3.3 70B works for simple cases.
Long context: Fable 5 (1M tokens), Gemini 2.5 Ultra (2M), GPT-5 (400K). For most PM features you don't need this; if you do, the model choice narrows to two or three.
Structured output: Anthropic models with the strict JSON schema setting are most reliable. GPT-5 with response_format works. OSS models still hallucinate JSON sometimes.

If your feature needs any of these capabilities, the selection narrows hard. If it doesn't, you have the full menu.
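
As one illustration of how a capability requirement narrows the menu, the sketch below filters candidates by context window. The window sizes are the ones quoted in the long-context line above; the dictionary and function names are hypothetical, and the same pattern would apply to any other hard requirement.

```python
# Context windows in tokens, as quoted in the long-context line above.
CONTEXT_WINDOW_TOKENS = {
    "Fable 5": 1_000_000,
    "Gemini 2.5 Ultra": 2_000_000,
    "GPT-5": 400_000,
}


def models_with_context(min_tokens):
    """Return the models whose context window is at least `min_tokens`."""
    return sorted(
        name
        for name, window in CONTEXT_WINDOW_TOKENS.items()
        if window >= min_tokens
    )


# A feature that must read roughly 500K tokens in a single request.
print(models_with_context(500_000))
```

Run the capability filter first, then run the eval and cost comparison only on the models that survive it.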

Three decision trees for common PM features

Decision tree 1: Classification or extraction feature

Is your eval pass rate target above 90%?
├── No (75-90% acceptable) → Haiku 4.5 or GPT-5 nano
└── Yes (90%+) →
    Is the input typically under 1000 tokens?
    ├── Yes → Haiku 4.5 is likely fine. Eval to confirm.
    └── No (long documents) → Sonnet 4.6 or Gemini Flash with long context

Default to Haiku unless eval forces you up. Most classifiers do not need frontier.
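
Decision tree 1 can also be written as a small function, which is handy if you want to keep it next to your eval scripts. This is a sketch, not a rule engine: the function name is illustrative, and treating a target of exactly 90% as the "Yes" branch is an assumption, since the tree leaves that boundary open.

```python
def pick_classifier_model(target_pass_rate, typical_input_tokens):
    """Starting recommendation for a classification or extraction feature.

    target_pass_rate: eval pass rate target in percent.
    typical_input_tokens: typical input size in tokens.
    """
    # Assumption: exactly 90 counts as the "90%+" branch.
    if target_pass_rate < 90:
        return "Haiku 4.5 or GPT-5 nano"
    if typical_input_tokens < 1000:
        return "Haiku 4.5 (confirm with eval)"
    return "Sonnet 4.6 or Gemini Flash with long context"


print(pick_classifier_model(85, 400))
print(pick_classifier_model(95, 400))
print(pick_classifier_model(95, 12_000))
```

The return value is only a starting candidate; the eval suite still has the final say.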

Decision tree 2: User-facing chat / Q&A feature

Does the response need to be conversational (streaming, chat-style)?
├── Yes →
│   Does it need multi-step reasoning (math, code, planning)?
│   ├── Yes → Fable 5 or GPT-5 reasoning
│   └── No → Sonnet 4.6 (sweet spot: quality + speed + cost)
└── No (Q&A with single response) →
    Does it need grounded retrieval (RAG)?
    ├── Yes → Sonnet 4.6 + prompt caching for the retrieved context
    └── No → Haiku 4.5 if the answer is fact-lookup; Sonnet if it's open-ended

User-facing features in 2026 mostly land on Sonnet 4.6. Fable 5 is for the cases where the reasoning gap is product-defining.
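
Decision tree 2 translates into a short function in the same way. The parameter names below are illustrative assumptions; the branches and the returned recommendations follow the tree above.

```python
def pick_chat_model(conversational, multi_step_reasoning=False,
                    needs_rag=False, fact_lookup=False):
    """Starting recommendation for a user-facing chat or Q&A feature."""
    if conversational:
        if multi_step_reasoning:
            return "Fable 5 or GPT-5 reasoning"
        return "Sonnet 4.6"
    # Single-response Q&A from here on.
    if needs_rag:
        return "Sonnet 4.6 + prompt caching for the retrieved context"
    return "Haiku 4.5" if fact_lookup else "Sonnet 4.6"


print(pick_chat_model(conversational=True))
print(pick_chat_model(conversational=True, multi_step_reasoning=True))
print(pick_chat_model(conversational=False, fact_lookup=True))
```

Three of the five leaves return Sonnet 4.6, which is the code-level version of "user-facing features mostly land on Sonnet".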

Decision tree 3: Background or batch feature

Is throughput more important than latency?
├── Yes (batch processing, tagging at scale) →
│   Cost is the dominant variable. Run eval suite on Haiku 4.5 and Llama 3.3 70B.
│   Pick the cheaper one whose eval pass rate meets your target.
└── No (need fast turnaround but server-side) → Sonnet 4.6 or Gemini Flash

Background features should default to the cheapest model that meets the eval bar. They are where cost optimization compounds.

The trap: defaulting to GPT-4 or Fable 5 because you read about it

Most PMs in 2026 default to whatever model they read about most recently. In 2024 that was GPT-4. In 2026 it is Claude Fable 5 (because it launched in June and is everywhere). This is a tax on your roadmap.

The right discipline: every new feature gets a 1-day model evaluation before the prompt is committed. Run the eval suite on three candidates: one cheap (Haiku 4.5 or Flash), one mid-tier (Sonnet 4.6 or GPT-5 standard), one frontier (Fable 5 or GPT-5 reasoning). Build the comparison table from variable 2 above. Pick the cheapest that meets your eval bar.
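
The last step, "pick the cheapest that meets your eval bar", is simple enough to script. A minimal sketch follows, using the edge-case pass rates and costs from the auto-router table earlier in this post as sample data; `pick_cheapest` and the dictionary keys are illustrative names.

```python
def pick_cheapest(candidates, eval_bar):
    """Return the cheapest candidate whose pass rate meets the bar,
    or None if no candidate qualifies."""
    eligible = [c for c in candidates if c["pass_rate"] >= eval_bar]
    if not eligible:
        return None
    return min(eligible, key=lambda c: c["cost_per_request"])


# Edge-case pass rates (percent) and cost per request (USD) from the
# support ticket auto-router table.
candidates = [
    {"name": "Haiku 4.5", "pass_rate": 78, "cost_per_request": 0.0007},
    {"name": "Sonnet 4.6", "pass_rate": 82, "cost_per_request": 0.004},
    {"name": "Fable 5", "pass_rate": 85, "cost_per_request": 0.018},
]

choice = pick_cheapest(candidates, eval_bar=80)
print(choice["name"] if choice else "No candidate meets the bar")
```

A `None` result is useful information too: it means the bar, the prompt, or the candidate list needs to change before the model does.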

This discipline saves real money. It also stops you from over-relying on any single vendor: if Anthropic deprecates a model, you already have a comparison table for the migration.

What to put in the PRD

The model section of an AI PRD is four lines:

Model selection

Primary: [model name] because [1-sentence justification]

Fallback: [different model] (used at high-load or deprecation)

Eval pass rate vs alternatives: [link to comparison table]

Decision criteria: cost was [primary/secondary/blocking]; latency was [primary/secondary/blocking]

Four lines. Reviewers can interrogate the comparison. The PRD does not pretend the decision was obvious.

What changes between now and end of 2026

Three model trends to watch:

1. Cheap mid-tier is closing the gap with frontier. Sonnet 4.6 in mid-2026 matches GPT-4 from 2024 on most evals at 1/10th the cost. The "you need frontier for X" claims keep getting falsified. Re-evaluate your model choice quarterly.

2. Prompt caching changes the math. Anthropic and OpenAI both ship aggressive prompt caching now. If your system prompt is 1000+ tokens (most production features), you save 30-50% on input cost by structuring requests to maximize cache hits. The PM who specs this in the PRD saves real money.

3. Self-hosted is becoming viable for specific shapes. Llama 3.3 70B and Qwen 3 70B are real options in 2026 for high-volume classification and extraction. If you have an in-house ML team and a feature with 100K+ requests/day, build the cost comparison. For most PMs, sticking with API models is still the right call. But the option exists now in a way it didn't in 2024.
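
The prompt-caching math in trend 2 can be sanity-checked with a rough model like the one below. Every input is an assumption to replace with your vendor's real pricing and your measured cache hit rate: the cached-token price ratio, the hit rate, and the token counts are all hypothetical, and the sketch ignores any surcharge a vendor may apply when writing to the cache.

```python
def input_cost_savings(prompt_tokens, cacheable_tokens, hit_rate,
                       cached_price_ratio):
    """Estimate the fraction of input cost saved by prompt caching.

    prompt_tokens: total input tokens per request.
    cacheable_tokens: tokens in the stable prefix (e.g. the system prompt).
    hit_rate: fraction of requests that hit the cache (0 to 1).
    cached_price_ratio: cached token price divided by normal input price.
    """
    # Costs are expressed in units of "one uncached input token".
    baseline = prompt_tokens
    cached_part = cacheable_tokens * (
        hit_rate * cached_price_ratio + (1 - hit_rate)
    )
    with_cache = cached_part + (prompt_tokens - cacheable_tokens)
    return 1 - with_cache / baseline


# Hypothetical feature: 2,000 input tokens, half of them a stable system
# prompt, 90% cache hit rate, cached tokens billed at 10% of normal price.
savings = input_cost_savings(2000, 1000, 0.9, 0.1)
print(f"{savings:.1%} of input cost saved")
```

The lever a PM controls here is `cacheable_tokens`: the larger the stable prefix is relative to the whole prompt, the closer the savings get to the cache discount itself.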

TL;DR

The model choice is a PM call. Five variables: task shape, eval pass rate, latency budget, cost at scale, capability requirements.
The biggest PM cost error in 2026: using frontier models for classification. Use Haiku 4.5 or similar for classifier and extractor features.
Default to Sonnet 4.6 for most user-facing features in 2026. Reach for Fable 5 only when the reasoning gap is product-defining.
Always run a 3-candidate eval comparison (cheap / mid / frontier) before committing the model choice. The discipline pays for itself.
Trap: defaulting to whatever model you read about most. Re-evaluate every quarter.
The PRD has a 4-line model section with the comparison table linked.

In ShipSet Lesson 50 ("Model routing and selection") you build a 3-candidate comparison for the feature you are shipping. By Day 90 the comparison table is one of the 10 portfolio artifacts. Hiring managers cite the comparison table specifically in offer-stage interviews.

If you have a feature in production right now and have not re-evaluated the model in 6 months: do it tomorrow. There is a meaningful chance you can drop one tier and save 3-5x without quality loss. That is the highest-leverage PM hour you'll spend this quarter.