ShipSet is for existing Product Managers (any seniority) who want to upskill into AI Product Manager roles. The curriculum assumes PM fundamentals (PRDs, working with engineers, stakeholders, sprints) and teaches the AI-specific layer on top. For people new to product management, ShipSet includes a separate PM Primer (7 lessons, always free, not part of the paid program) that covers PM fundamentals before the main 90-day program.

How is ShipSet different from Udemy, Lenny, Maven, or Reforge?

ShipSet is builder-first. By Day 28 you ship a working AI prototype with a live URL. By Day 90 you have a live AI feature with eval evidence, a validated cost model, a 90-second demo Loom, and a 10-piece portfolio. Other AI PM courses are lecture-first with optional projects. ShipSet inverts that ratio: real builds first, concepts taught as they become relevant.

Do I need to code to become an AI Product Manager through ShipSet?

No. Light coding helps but is not required. The build lessons use no-code and AI-assisted builders (v0, Lovable, Cursor, Claude Code) to ship working prototypes. Learners read prompts and modify them; deep coding is not part of the curriculum.

How much does ShipSet cost?

Founding members (first 50) pay ₹2,499 ($79) one-time for lifetime access. Annual is ₹3,499 ($99) per year. Monthly is ₹699 ($14.99) per month. INR via Razorpay, USD via card (coming after LLP). All plans include the same content. The first 5 main lessons are free for everyone, no card required. The separate PM Primer (7 lessons) is always free regardless of plan. Every paid plan has a 7-day money-back guarantee.

How long does ShipSet take to complete?

The program is 90 lessons of about 15 minutes each, for a total of roughly 22 hours over 90 days. There is no calendar pressure: lessons unlock as completed, so learners move at their own pace. Most working PMs complete in 90 to 120 days at 3 hours per week.

Can ShipSet be completed on a phone only?

Partly. About 75 percent of lessons (reading, journaling, critique, reflection) work well on a phone. The other 25 percent are build sessions (writing specs, building prototypes, validating costs with real API calls, recording Loom walkthroughs) and need a laptop. Plan for roughly one laptop session per week.

What does the ShipSet portfolio include at Day 90?

The Day 90 portfolio includes ten artifacts: a live working AI feature with a public URL, a 90-second Loom walkthrough, an AI-native PRD, a 20-row eval suite with measurable scores, a validated cost model, an AI UX flow with placement decisions, an AI product metrics framework, a business case / ROI document, a launch plan with monitoring strategy, and a compiled portfolio document. Plus a public ShipSet certificate with verification URL.

How to Write an AI PRD: A Template + 6 Real Examples PMs Are Using in 2026

The PRD template you used for the SaaS dashboard does not work for an AI feature. You opened a Notion doc, wrote "Problem / Solution / Success metrics" out of habit, then stared at the cursor and realised the framing breaks the moment your feature is non-deterministic.

This article is the AI PRD format we teach in ShipSet Lesson 22 ("Write your AI-native PRD"). Six sections, six examples, plus the one section most PMs skip that costs them the launch. By the end you will have a template you can paste into a doc tonight and start filling in for the feature you actually want to ship.

🎯 TL;DR. An AI PRD needs six sections a SaaS PRD does not: (1) the failure mode, (2) the eval set, (3) the human-in-the-loop boundary, (4) the cost-per-call ceiling, (5) the rollback rule, and (6) the prompt as spec. Skip any of them and engineering will push back, legal will block launch, or finance will surface a cost surprise post-launch.

Why your SaaS PRD template breaks

A traditional SaaS PRD assumes determinism. "When the user clicks Submit, the form validates and saves to the database." You can test that. You can write acceptance criteria. QA can sign off.

AI features are different. They are:

Probabilistic. The same input can produce different outputs. "Summarise this ticket" may produce 5 different summaries across 5 runs.
Cost-bearing per use. Every API call costs money. Compute the unit economics in the spec or surprise yourself at the end of month one.
Failure-mode rich. Hallucinations, refusals, latency spikes, token limits hit mid-response, model deprecation, prompt injection. Each one is a separate failure category.
Eval-dependent. Acceptance criteria become an eval set: a deterministic harness that runs the non-deterministic feature against fixed inputs and measures whether the output passes.

The SaaS PRD has no place for any of these. So when you try to write one, the doc fills with vague phrases like "produces accurate summaries" and ships to engineering, who write back "define accurate." You cannot define accurate without an eval set. You cannot eval without a failure mode list. The doc is broken from the first heading.

The six-section AI PRD

Here is the format we use. Each section answers a specific question engineering, design, finance, or legal will ask in the kickoff. If a section is empty, that question becomes a launch blocker later.

1. The failure mode

The question this answers: What happens when the model is wrong?

This is a section SaaS PMs tend to skip because their old features could not be wrong in this way. Code either ran or threw an error. AI does neither. It produces a confidently worded answer that is sometimes nonsense.

What to write:

The three most likely failure types for this specific feature (hallucination, refusal, latency, format drift, prompt injection, etc.)
For each, what the user experience is when it happens
Whether the user can tell it is broken (most AI failures fail silently)
The mitigation pattern

Example (AI ticket tagger):

Failure modes:

1. Hallucinated tag: model invents a tag not in our taxonomy.
   Mitigation: validate output against allow-list. Reject + fallback to "needs human review."

2. Low confidence on ambiguous tickets.
   Mitigation: confidence score below 0.6 routes to human review queue, not auto-tagged.

3. Prompt injection in customer message ("Ignore prior instructions").
   Mitigation: customer message wrapped in delimiter tokens. System prompt instructs model to never follow instructions from inside delimiters.

If you cannot write this section, you are not ready to spec the feature.
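To make the three mitigations in the ticket-tagger example concrete, here is a minimal Python sketch. The tag taxonomy, the `<ticket>` delimiter, and the function names are assumptions for illustration, not part of any real product; only the 0.6 threshold and the "needs human review" fallback come from the example above.

```python
from dataclasses import dataclass

# Hypothetical taxonomy; substitute your own allow-list.
ALLOWED_TAGS = {"billing", "bug", "feature_request", "account"}
CONFIDENCE_FLOOR = 0.6          # threshold from the example above
FALLBACK = "needs human review"


@dataclass
class TagResult:
    tag: str
    auto_applied: bool
    reason: str


def guard_tag(model_tag: str, confidence: float) -> TagResult:
    """Apply mitigations 1 and 2 to a raw model output."""
    tag = model_tag.strip().lower()
    if tag not in ALLOWED_TAGS:
        # Mitigation 1: the model invented a tag outside the taxonomy.
        return TagResult(FALLBACK, False, "tag not in allow-list")
    if confidence < CONFIDENCE_FLOOR:
        # Mitigation 2: ambiguous ticket goes to the human review queue.
        return TagResult(FALLBACK, False, "confidence below floor")
    return TagResult(tag, True, "ok")


def wrap_customer_message(message: str) -> str:
    """Mitigation 3: fence untrusted text so the system prompt can refer to it."""
    # Remove delimiters the customer typed so they cannot close the fence early.
    cleaned = message.replace("<ticket>", "").replace("</ticket>", "")
    return f"<ticket>\n{cleaned}\n</ticket>"
```

Delimiters reduce the risk of prompt injection but do not remove it, which is why the allow-list check still runs on every output.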

2. The eval set

The question this answers: How will we know it works?

Acceptance criteria for a deterministic feature ("the form submits successfully") become an eval set for an AI feature: a fixed set of test inputs with expected outputs (or expected behaviour), run automatically, scored automatically.

What to write:

The minimum row count for launch (we recommend 20 for prototypes, 100+ for production)
How rows were sourced (real customer messages? synthetic? PM-written?)
The scoring rubric: pass/fail, or graded?
The launch bar: e.g. "ship when >85% pass on real-customer subset"
Who owns the eval set after launch (someone has to add rows when failures surface)

Example (AI summariser for sales calls):

Eval set: 50 sales call transcripts, hand-picked across deal sizes and verticals.

Scoring rubric per summary:
  - Captures the customer's stated objection: pass/fail
  - Captures the requested next step: pass/fail
  - Length between 80 and 200 words: pass/fail
  - No invented attendees or quotes: pass/fail (zero tolerance)

Launch bar: 90% pass rate on all four criteria, 100% on the no-invention check.

Owner post-launch: Sales Ops PM adds 5 rows per week based on rep feedback.

The eval set is not a "nice to have." It is the only honest way to ship and the only way you can answer "did this regress?" three months in.
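The sales-call rubric above can be sketched as a small harness. This is an illustration under stated assumptions: the row fields (`objection`, `next_step`, `decoy_names`) are hypothetical labels a PM would write per row, and substring matching is a crude stand-in for a human grader or a model-based judge. The launch bar is read as "every criterion at 90% or above, no-invention at 100%."

```python
from typing import Callable, Dict, List


def has_objection(row: dict, summary: str) -> bool:
    return row["objection"].lower() in summary.lower()


def has_next_step(row: dict, summary: str) -> bool:
    return row["next_step"].lower() in summary.lower()


def length_ok(row: dict, summary: str) -> bool:
    return 80 <= len(summary.split()) <= 200


def no_invention(row: dict, summary: str) -> bool:
    # decoy_names: people who were NOT on the call (hypothetical per-row label).
    return not any(name.lower() in summary.lower() for name in row["decoy_names"])


CHECKS: Dict[str, Callable[[dict, str], bool]] = {
    "objection": has_objection,
    "next_step": has_next_step,
    "length": length_ok,
    "no_invention": no_invention,
}


def run_eval(rows: List[dict], summarise: Callable[[str], str]) -> Dict[str, float]:
    """Run the feature on every row and return the pass rate per criterion."""
    if not rows:
        raise ValueError("eval set is empty")
    passes = {name: 0 for name in CHECKS}
    for row in rows:
        summary = summarise(row["transcript"])
        for name, check in CHECKS.items():
            passes[name] += check(row, summary)
    return {name: count / len(rows) for name, count in passes.items()}


def meets_launch_bar(rates: Dict[str, float]) -> bool:
    return all(r >= 0.90 for r in rates.values()) and rates["no_invention"] == 1.0
```

Here `summarise` is whatever function calls your model; passing it in keeps the harness fixed while the prompt or model changes underneath it.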

3. The human-in-the-loop boundary

The question this answers: When does the AI act, and when does a human?

Most AI features in 2026 are not autonomous. They are human-in-the-loop. The PRD has to define the boundary explicitly, or engineering will guess (badly).

What to write:

The exact action the AI takes autonomously
The exact action that requires human approval
The threshold that flips one to the other (confidence score, action category, $ value, etc.)
The UI affordance for human approval

Example (AI auto-responder for support):

AI acts autonomously:
  - Account password reset confirmations
  - Order status lookups
  - Shipping date inquiries

Human approval required:
  - Refund requests (any amount)
  - Account closures
  - Complaints involving the words "lawsuit", "regulator", or "press"
  - Any ticket where confidence < 0.7

UI: human queue lives in /admin/inbox. Tickets show AI's drafted response + "Approve and send" / "Edit and send" / "Discard" buttons.

This boundary will be the thing legal and ops want to see most. Write it early.
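Written as code, the auto-responder boundary above is a short routing function. The sketch below assumes an upstream classifier has already produced a `category` string and a `confidence` score; the category identifiers are hypothetical names for the three autonomous actions listed in the example.

```python
import re
from enum import Enum


class Route(Enum):
    AUTO_SEND = "auto_send"
    HUMAN_REVIEW = "human_review"


# Hypothetical identifiers for the three autonomous actions in the example.
AUTONOMOUS_CATEGORIES = {"password_reset_confirmation", "order_status", "shipping_date"}
# Word boundaries stop "press" from matching "express" or "pressure".
ESCALATION_PATTERN = re.compile(r"\b(lawsuit|regulator|press)\b", re.IGNORECASE)
CONFIDENCE_FLOOR = 0.7


def route_ticket(category: str, text: str, confidence: float) -> Route:
    """Decide whether the AI may reply on its own."""
    if ESCALATION_PATTERN.search(text):
        return Route.HUMAN_REVIEW
    if confidence < CONFIDENCE_FLOOR:
        return Route.HUMAN_REVIEW
    if category in AUTONOMOUS_CATEGORIES:
        return Route.AUTO_SEND
    # Default deny: refunds, account closures, and anything unlisted.
    return Route.HUMAN_REVIEW
```

The design choice worth copying is the default: any category not explicitly listed as autonomous goes to a human, so a new ticket type never becomes autonomous by accident.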

4. The cost-per-call ceiling

The question this answers: What are the unit economics, and at what scale does this break?

Every AI API call costs money. Without a cost ceiling, you ship a feature that works at 100 users and bankrupts you at 10,000. This is the section finance and the eng lead want.

What to write:

The model you are using and its current pricing (per million input tokens, per million output tokens)
Average input + output tokens per request (estimate or measure)
Cost per request in dollars
Expected monthly request volume at launch and at 12 months
Monthly cost at both
The fallback rule if cost exceeds budget (downgrade model, rate-limit, kill feature)

Example (AI search across user's documents):

Model: Claude 3.5 Haiku ($0.80/MTok input, $4/MTok output)
Avg tokens: 4,000 in (document context) + 400 out (answer)
Cost per query: $0.80 * 4000/1M + $4 * 400/1M = $0.0032 + $0.0016 = $0.0048

Expected volume:
  Launch (month 1): 10K queries → $48/month
  Month 12 (projected): 250K queries → $1,200/month

Budget: $2,000/month max. If we exceed it by month 8, switch to retrieval-only (no LLM call) for low-confidence queries.

Run this math in the spec. Run it again in eng kickoff. The number always surprises someone.
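If you would rather keep the math in a script than a spreadsheet, the calculation above fits in a few lines of Python. The function names are illustrative; the token counts, prices, volumes, and budget are the ones from the document-search example.

```python
def cost_per_request(input_tokens: int, output_tokens: int,
                     input_price_per_mtok: float,
                     output_price_per_mtok: float) -> float:
    """Dollar cost of one call, given per-million-token prices."""
    return (input_tokens * input_price_per_mtok
            + output_tokens * output_price_per_mtok) / 1_000_000


def monthly_cost(requests_per_month: int, per_request: float) -> float:
    return requests_per_month * per_request


def max_requests_within(budget: float, per_request: float) -> int:
    """Largest monthly volume that stays inside the budget."""
    return int(budget // per_request)


# Figures from the example: 4,000 tokens in, 400 out, $0.80 and $4 per MTok.
per_query = cost_per_request(4_000, 400, 0.80, 4.00)
for label, volume in [("launch", 10_000), ("month 12", 250_000)]:
    print(f"{label}: ${monthly_cost(volume, per_query):,.2f}/month")
print(f"budget ceiling: {max_requests_within(2_000, per_query):,} queries/month")
```

The last line answers the "at what scale does this break?" half of the question: it turns the $2,000 budget into a query volume you can compare against your growth projection.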

5. The rollback rule

The question this answers: When do we kill the feature?

For deterministic SaaS features, "rollback" is "revert the deploy." AI features can degrade silently with no deploy at all: the model drifts, real inputs drift away from the data your prompt was tuned on, or someone extracts or subverts your prompt via injection. Rollback needs its own trigger conditions.

What to write:

The metric that signals "this is broken now" (eval pass rate dropping below X, refund rate climbing above Y, support tickets containing the feature name spiking)
The threshold value
The action: rollback to prior model? Disable feature? Add human-in-the-loop step?
Who owns the alert and the decision

Example (AI product recommendation widget):

Rollback triggers:
  1. Click-through rate on recommended products drops >20% week-over-week.
  2. Customer support tickets mentioning "wrong recommendation" exceed 0.5% of orders.
  3. Eval set pass rate drops below 80% (eval runs nightly).

Trigger 1 or 2: PM-on-call disables the widget within 4 hours, falls back to manually-curated featured products.
Trigger 3: blocks the next deploy, prompts model re-eval.

Owner: PM on-call, alerted via PagerDuty integration in Datadog.

Without this section you will run a degraded feature for weeks before someone notices.
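The three triggers in the recommendation-widget example can be checked by a scheduled job. The sketch below assumes you can already pull these five numbers from your analytics and eval pipeline; the `WeeklyMetrics` fields and the action strings are hypothetical names, while the thresholds are the ones from the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class WeeklyMetrics:
    ctr_this_week: float
    ctr_last_week: float
    wrong_rec_tickets: int   # tickets mentioning "wrong recommendation"
    orders: int
    eval_pass_rate: float    # from the nightly eval run


def rollback_actions(m: WeeklyMetrics) -> List[str]:
    """Return the actions the rollback rule calls for; an empty list means healthy."""
    actions: List[str] = []
    ctr_drop = ((m.ctr_last_week - m.ctr_this_week) / m.ctr_last_week
                if m.ctr_last_week > 0 else 0.0)
    ticket_rate = m.wrong_rec_tickets / m.orders if m.orders else 0.0

    # Trigger 1 or 2: disable the widget and fall back to curated products.
    if ctr_drop > 0.20 or ticket_rate > 0.005:
        actions.append("disable_widget")
    # Trigger 3: block the next deploy and re-evaluate the model.
    if m.eval_pass_rate < 0.80:
        actions.append("block_deploy")
    return actions
```

Returning a list rather than a single action lets one bad week fire both responses, which matches the rule as written: triggers 1 and 2 are about the live feature, trigger 3 is about the next deploy.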

6. The prompt as spec

The question this answers: What does the model do, exactly?

In a SaaS PRD, the spec is "the form has fields A, B, C." In an AI PRD, the spec is the prompt itself. Versioned, in the doc, treated as a contract between PM and eng.

What to write:

The full prompt (system + user template)
The variables substituted at runtime
The expected output format (JSON schema, free text with structure, etc.)
The version number and the change log

Example (AI categoriser for support tickets):

Prompt v3 (current production):

System:
You are a support ticket router for {company_name}. Read the ticket and
choose exactly one category from this list: {category_list}. If you are
not confident, output "needs_human_review". Output ONLY the category name
on a single line. No explanation, no formatting.

User template:
{customer_message}

Variables:
  company_name: pulled from workspace settings
  category_list: workspace-defined, JSON-stringified
  customer_message: raw ticket text (max 4000 chars)

Output: single category name from category_list, or "needs_human_review".

Change log:
  v1 (Mar 4): initial. 76% eval pass.
  v2 (Mar 18): added "if not confident" clause. 84% eval pass.
  v3 (Apr 2): clarified output format constraint. 91% eval pass.

When the prompt is the spec, prompt changes go through PR review like any other shipped change. This is the single biggest mindset shift from SaaS PM to AI PM.
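One way to make the prompt reviewable like code is to keep it in a versioned module with its variables and output contract next to it. The following is a sketch of the v3 categoriser spec above, not a prescribed implementation; `render_prompt` and `parse_output` are hypothetical helper names, and the actual model call is left out.

```python
import json
from typing import Dict, List

PROMPT_VERSION = "v3"
MAX_MESSAGE_CHARS = 4000
FALLBACK = "needs_human_review"

SYSTEM_TEMPLATE = (
    "You are a support ticket router for {company_name}. Read the ticket and "
    "choose exactly one category from this list: {category_list}. If you are "
    'not confident, output "needs_human_review". Output ONLY the category name '
    "on a single line. No explanation, no formatting."
)


def render_prompt(company_name: str, categories: List[str],
                  customer_message: str) -> Dict[str, str]:
    """Substitute the runtime variables named in the spec."""
    return {
        "version": PROMPT_VERSION,
        "system": SYSTEM_TEMPLATE.format(
            company_name=company_name,
            category_list=json.dumps(categories),   # JSON-stringified, per the spec
        ),
        "user": customer_message[:MAX_MESSAGE_CHARS],  # raw ticket text, capped
    }


def parse_output(raw: str, categories: List[str]) -> str:
    """Enforce the output contract: one known category, else the fallback."""
    answer = raw.strip()
    return answer if answer in categories else FALLBACK
```

With the template in a file, a prompt change shows up as a diff, and the change log entry plus the new eval pass rate can live in the same pull request.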

Six real PRD examples to model from

Below are six AI features and how the six sections would look filled in for each. Read the one closest to what you are shipping.

Example 1: AI summary on a long-form document (Notion-style)

Feature: AI-generated summary at the top of every doc >800 words.

1. Failure mode: hallucinated facts (zero tolerance), summary longer than 3 bullets (auto-truncate), refusal on sensitive content (acceptable, show "summary unavailable").

2. Eval set: 100 docs across categories (PRDs, meeting notes, legal contracts). Pass = all 3 bullets reference content in the source.

3. Human-in-the-loop: none. Read-only feature.

4. Cost: $0.0008/summary. 50K summaries/month = $40/month at launch.

5. Rollback: hallucination rate >2% in spot checks → disable feature, surface "Summary temporarily unavailable" banner.

6. Prompt v1: "Summarise the following document in exactly 3 bullets, each under 20 words. Only include facts stated in the document. {doc_content}"

Example 2: AI customer-support reply draft

Feature: Draft an email reply to inbound customer questions, sent for human review.

1. Failure mode: drafted reply contradicts company policy (>0% rate is too much), tone mismatch (judged subjectively), refusal on complex tickets (acceptable, draft says "needs founder review").

2. Eval set: 200 historical tickets with the actual rep reply. Eval scores: factually consistent with policy, tone matches rep examples, includes a specific actionable step.

3. Human-in-the-loop: every draft sent to rep queue for approval. NO autonomous send in v1.

4. Cost: $0.012/draft. 5K drafts/month = $60/month at launch.

5. Rollback: rep rejection rate >40% → revert to manual queue.

6. Prompt v2: "You are a support rep for {company_name}. Reply to the customer message below using only information from our policy doc. Match the tone of these example replies: {examples}. Reply in 80-150 words. End with 'Best,\n{rep_name}'."

Example 3: AI feature recommendation in onboarding

Feature: After signup, recommend which 3 features the user should enable based on their role + answers in the onboarding quiz.

1. Failure mode: recommendation does not match stated role (zero tolerance for B2B), recommends a feature on a plan tier they did not subscribe to (hard reject).

2. Eval set: 50 synthetic users across roles. Pass = recommendation matches role + plan in 95% of cases.

3. Human-in-the-loop: none.

4. Cost: $0.002/recommendation. 20K signups/month = $40/month.

5. Rollback: activation rate of recommended features <30% → disable, fall back to PM-picked defaults.

6. Prompt v1: "User role: {role}. Plan: {plan}. Onboarding answers: {answers}. Pick the 3 features from {available_features} most likely to drive activation for this user. Output JSON: {features: [string, string, string], reasoning: string}."
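The "hard reject" in this example's failure mode is easy to enforce after the model responds. A minimal sketch, assuming `available_features` already contains only the features on the user's plan tier; the function name is hypothetical.

```python
import json
from typing import Optional, Set


def validate_recommendation(raw: str, available_features: Set[str]) -> Optional[dict]:
    """Return the parsed recommendation, or None to fall back to PM-picked defaults."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict):
        return None

    features = data.get("features")
    if not isinstance(features, list) or len(features) != 3:
        return None
    if not all(isinstance(f, str) for f in features):
        return None
    if len(set(features)) != 3:
        return None  # the same feature recommended twice
    if any(f not in available_features for f in features):
        return None  # hard reject: feature is off-plan or does not exist
    return data
```

A `None` result doubles as a signal worth counting: a rising reject rate is an early warning that the prompt or the model has drifted.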

Example 4: AI search across user's account data

Feature: Type a question in plain English, get an answer using user's own data (invoices, customers, products).

1. Failure mode: hallucinated data (zero tolerance for finance queries), wrong customer pulled in (must match exactly), exposed data from another tenant (security incident, immediate kill switch).

2. Eval set: 100 question-answer pairs across data types. Pass = answer references only the queried tenant's data and is factually consistent with the underlying records.

3. Human-in-the-loop: none, but every query logs the retrieved records so the user can audit.

4. Cost: $0.0048/query. 30K queries/month = $144/month.

5. Rollback: any tenant-leak incident → kill switch, post-mortem. Eval pass rate <80% blocks deploy.

6. Prompt v1: "Answer the user's question using ONLY the records below. If the records do not contain the answer, say 'I do not have that data.' Records: {records}. Question: {question}."
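For this example the tenant check belongs in code, before the prompt is ever assembled, rather than in the prompt text. The sketch below assumes each retrieved record is a dict with `tenant_id`, `id`, and `text` fields; those field names and `TenantLeakError` are hypothetical.

```python
from typing import List, Tuple


class TenantLeakError(Exception):
    """Raised when retrieval returns a record owned by another tenant."""


def build_records_block(records: List[dict], tenant_id: str) -> Tuple[str, List[str]]:
    """Return the {records} text for the prompt plus the record IDs to log for audit."""
    leaked = [r["id"] for r in records if r["tenant_id"] != tenant_id]
    if leaked:
        # Fail closed: never send another tenant's data to the model.
        raise TenantLeakError(f"records from another tenant: {leaked}")
    text = "\n".join(f"[{r['id']}] {r['text']}" for r in records)
    audit_ids = [r["id"] for r in records]
    return text, audit_ids
```

Raising instead of silently filtering is deliberate here: a cross-tenant record reaching this point means retrieval itself is broken, which is the incident the kill switch in the rollback rule exists for.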

Example 5: AI-generated metadata for uploaded content

Feature: On image upload, auto-generate alt-text, suggested tags, and a short caption.

1. Failure mode: offensive or stereotyped descriptions (manual review queue), hallucinated content not visible in image, refusal on humans (acceptable, leave blank).

2. Eval set: 100 images across categories. Pass = alt-text is descriptive of what is visible, tags are from the allow-list, caption is <100 chars.

3. Human-in-the-loop: user can edit any generated field before save.

4. Cost: $0.006/upload (vision model). 100K uploads/month = $600/month.

5. Rollback: user edit rate >70% on auto-generated alt-text → review prompt, possibly switch model.

6. Prompt v1: "Look at this image. Generate three things: (1) alt-text, max 125 chars, describing what is visible. (2) 3-5 tags from this list: {tag_list}. (3) A caption under 100 chars. Output JSON with keys alt_text, tags, caption."

Example 6: AI agent that books meetings for the user

Feature: Agent reads the user's inbox, drafts replies to scheduling requests, suggests times based on their calendar.

1. Failure mode: books a time the user is not available (zero tolerance), replies to non-scheduling emails (use intent classifier first), confirms a meeting without explicit user approval (no autonomy in v1).

2. Eval set: 100 inbound scheduling emails + corresponding calendar states. Pass = suggested time is genuinely free, reply is on-topic.

3. Human-in-the-loop: agent drafts, user clicks "Send" in app. NO autonomous send.

4. Cost: $0.020/scheduling request (tool use + multiple turns). 2K/month = $40/month.

5. Rollback: time-conflict rate >5% → disable suggestion, agent only flags the email as "scheduling."

6. Prompt v1: "Classify this email's intent (scheduling vs other). If scheduling, propose 3 times when the user is free from {calendar_data} and draft a reply. Output JSON: {intent, proposed_times, draft_reply}."
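Because booking a busy slot is zero-tolerance, it is safer to re-check every proposed time against the calendar in code than to trust the model's reading of `{calendar_data}`. A minimal sketch, assuming busy periods arrive as `(start, end)` datetime pairs in a single timezone; the function names are hypothetical.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

Interval = Tuple[datetime, datetime]


def is_free(start: datetime, end: datetime, busy: List[Interval]) -> bool:
    """True if [start, end) overlaps none of the busy intervals."""
    return all(end <= b_start or start >= b_end for b_start, b_end in busy)


def filter_proposed_times(proposed: List[datetime], duration: timedelta,
                          busy: List[Interval]) -> List[datetime]:
    """Drop any model-proposed start time that collides with the calendar."""
    return [s for s in proposed if is_free(s, s + duration, busy)]
```

The same function can score the eval set: a row passes the "genuinely free" criterion only if the filter leaves the model's proposals unchanged.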

The section everyone skips

Of the six sections, the one we see PMs skip most often is the cost ceiling. PMs think it is engineering's job. It is not. Engineering will build whatever the spec asks for. If the spec does not bound cost, the feature shipped will be unbounded, and the surprise bill will land on you.

The second most-skipped section is the rollback rule, because for SaaS features rollback was always "redeploy the prior code." For AI features that does not work: model drift, training-data drift, and prompt injection all degrade the feature without any deploy. Without a rollback rule and an alert, the feature can run broken for weeks.

If you only add two sections beyond your old template, add those two.

What "PRD review" looks like for an AI feature

When the PRD lands in your team's review channel, here is the order of comments you should expect (and welcome):

Engineering lead: "What is the eval pass rate threshold?" → answered in Section 2.
Eng lead again: "What is the prompt?" → answered in Section 6.
Design lead: "Where does the human-in-the-loop UI live?" → answered in Section 3.
Finance / founder: "What does this cost at scale?" → answered in Section 4.
Legal / trust: "What happens when it fails?" → answered in Section 1, and rollback in Section 5.

If all five questions answer cleanly from the PRD, you are ready for kickoff. If any is missing, send the PRD back to the draft before scheduling the meeting. The cost of an under-specified AI PRD is not a delayed feature — it is a feature that ships broken, costs a fortune, and erodes trust with the eng team for the next two quarters.

Your next step

If you are about to draft an AI PRD this week, you can copy the six-section template into a Notion doc and start filling it in for the feature you have in mind. The format works for everything from a small AI helper to a full agent feature.

If you want the deeper version — how to write the eval set in Section 2 row-by-row, how to estimate the cost in Section 4 to within 5%, how to choose the rollback metric in Section 5 — that is what we teach in ShipSet. 90 daily lessons, 15 minutes a day. Lesson 22 is the full PRD walkthrough; Lesson 35 is the eval suite deep dive; Lesson 41 is the cost model.

First 5 main lessons are free, no card. Take the 2-minute diagnostic and we will tell you exactly which lesson to start on based on what you are shipping.
