How I Built a Multi-Step AI Wizard With Next.js Server Actions

The first AI tool I built took 3–4 days. The ninth took 4–8 hours. The difference wasn't experience — it was a 6-step pattern I locked in after tool #1 that turned every subsequent Next.js Server Actions AI wizard into a mechanical build: Zod schema, TypeScript types, service function, Server Action, optional API route, and UI. I used this exact pattern across all 9 AI tools on an affiliate marketing SaaS I shipped, and in this post I'll walk through the full architecture using the most complex tool on the platform — a 5-step Facebook Page Starter wizard — as the case study.

The 6-step pattern used to build 9 AI tools — Zod schema to working wizard — on a Next.js 16 SaaS platform

Why Building AI Tools Takes Too Long (And the Pattern That Fixed It)

Most AI tool implementations are slow to build because every tool reinvents the same plumbing — auth, validation, rate limiting, error handling. The fix is a shared 6-step pattern where the plumbing is written once and every new tool slots in.

The affiliate marketing SaaS I built has 9 AI-powered tools, all running on Next.js 16 App Router with Google Gemini 2.5 Flash. It is the same platform I wrote about in "how to integrate AI into your business". Tool #1, the Facebook Page Starter, took 3–4 days to build. Not because it's complex (though it is), but because I was figuring out the architecture in parallel: where does auth live? Where does validation happen? How should errors propagate back to the client? Every decision was made ad hoc, inline, in the file I happened to be working in.

By the time I finished tool #1, those decisions had settled into something repeatable. I extracted them into a pattern, then built tool #2 against the pattern. It took about half the time. By tool #4, each new tool was taking 4–8 hours end-to-end. By tool #9, the only non-trivial part was writing the prompts.

The pattern has 6 steps, always in this order:

  1. Zod schema — Define input and output schemas before writing a single line of logic.
  2. TypeScript types — Derive types from schemas with z.infer. No manual interface duplication.
  3. Service function — Pure business logic and AI call. No auth, no rate limiting, no HTTP concerns.
  4. Server Action — Four sequential gates: auth, rate limit, validate, then call the service.
  5. Optional API route — Only created if the tool needs a public or externally callable endpoint.
  6. Page and UI — React component with wizard steps or a single-step form, calling the Server Action directly.

Every one of the 9 tools follows this sequence. The codebase is ~280 files and ~58,600 lines of TypeScript. The pattern is the reason it's navigable.

The 6-Step Pattern: From Zod Schema to Working AI Tool

Every tool on the platform follows: schema → types → service → server action → optional API route → page/UI — in that order, every time.

Step 1 & 2: Schema First, Types Derived

The schema is the contract between the UI, the Server Action, and the service function. Define it before writing anything else, and everything downstream stays consistent. I use Zod 4 for both input validation and output parsing from the AI: an input schema validates what the user submits, and an output schema validates the JSON Gemini returns.

Here's what the schema definition looks like for the first step of the Facebook Page Starter:

// lib/tools/facebook-page-starter/schemas.ts
import { z } from "zod";

// Step 1: Zod schemas — define the contract first
export const pageNamesInputSchema = z.object({
  niche: z.string().min(2).max(100),
  targetAudience: z.string().min(2).max(150),
  style: z.enum(["professional", "casual", "fun", "authoritative"]),
  count: z.number().int().min(3).max(8).default(5),
});

export const pageNameVariantSchema = z.object({
  name: z.string(),
  rationale: z.string(),
  tone: z.string(),
});

export const pageNamesOutputSchema = z.object({
  variants: z.array(pageNameVariantSchema).min(1),
  recommendation: z.string(),
});

// Step 2: TypeScript types derived from schemas — never written manually
export type PageNamesInput = z.infer<typeof pageNamesInputSchema>;
export type PageNameVariant = z.infer<typeof pageNameVariantSchema>;
export type PageNamesOutput = z.infer<typeof pageNamesOutputSchema>;

Tip

Schema-first is not just discipline; it's load-bearing. The schemas defined here are consumed in three places: the Server Action validates input against the input schema via safeParse, the service function passes the output schema to generateTextWithSchema to parse the AI response, and the client component uses the derived types for TypeScript autocompletion. Change a schema once and the type error surfaces immediately at every consumption site. Write the schema before anything else.

Steps 3–6: Service, Action, Route, UI

Step 3 is a plain TypeScript function: it takes typed input, builds a prompt, calls Gemini, and returns typed output. No auth, no HTTP, no side effects beyond the AI call. Step 4 wraps the service function in the 4-gate Server Action pipeline (covered in detail in the next section). Step 5 only exists when something needs a public endpoint. None of the 9 AI tools do; the only API routes in this codebase are the auth routes, a cron sync, and the public link redirect. Step 6 is the React page component, which holds wizard state and calls the Server Action per step.

The Server Action Pipeline: Auth → Rate Limit → Validate → Execute

Every AI Server Action on this platform follows exactly 4 gates — authentication, rate limiting, Zod validation, and the service call — in that order, every time.

The 4-Gate Flow

The order matters. Auth first because there's no point rate-limiting an unauthenticated request. Rate limit second because there's no point parsing and validating input for a user who is over quota. Validation third because there's no point calling Gemini on malformed input. This fail-fast ordering keeps each gate cheap and ensures the expensive AI call only runs when all preconditions pass.

// lib/tools/facebook-page-starter/actions.ts
"use server";

import { auth } from "@/lib/auth";
import { checkRateLimit } from "@/lib/rate-limit";
import { generatePageNames } from "./service";
import { pageNamesInputSchema, type PageNamesOutput } from "./schemas";

type ActionResult<T> =
  | { success: true; data: T }
  | { success: false; error: string; retryAfter?: number };

export async function generatePageNamesAction(
  rawInput: unknown
): Promise<ActionResult<PageNamesOutput>> {
  // Gate 1: Authentication — no session, no access, no further processing
  const session = await auth();
  if (!session?.user?.id) {
    return { success: false, error: "Unauthorized" };
  }

  // Gate 2: Rate limit — sliding window, 30 req/min per user
  const rateLimitResult = await checkRateLimit(session.user.id);
  if (!rateLimitResult.allowed) {
    return {
      success: false,
      error: "Rate limit exceeded. Please wait before trying again.",
      retryAfter: rateLimitResult.retryAfter,
    };
  }

  // Gate 3: Zod validation — reject malformed input before touching the AI
  const parsed = pageNamesInputSchema.safeParse(rawInput);
  if (!parsed.success) {
    return {
      success: false,
      error: parsed.error.issues[0]?.message ?? "Invalid input",
    };
  }

  // Gate 4: Service call — pure logic, typed input, typed output
  try {
    const data = await generatePageNames(parsed.data);
    return { success: true, data };
  } catch (err) {
    const message = err instanceof Error ? err.message : "Generation failed";
    return { success: false, error: message };
  }
}

The return type is a discriminated union: { success: true, data: T } or { success: false, error: string }. Server Actions on this platform never throw — they always return a structured object. This makes client-side error handling identical across all 9 tools: check result.success, branch accordingly, display result.error if false.
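
As a rough illustration of that client-side branch, here is a minimal sketch. The ActionResult type mirrors the one in actions.ts; toUiMessage and the sample page names are hypothetical and exist only for this example.

```typescript
// Same discriminated union as in actions.ts, repeated so the sketch is self-contained.
type ActionResult<T> =
  | { success: true; data: T }
  | { success: false; error: string; retryAfter?: number };

// Hypothetical helper: turns any action result into a string for the UI.
function toUiMessage<T>(
  result: ActionResult<T>,
  describe: (data: T) => string
): string {
  if (result.success) {
    // TypeScript narrows `result` here, so `data` is available and `error` is not.
    return describe(result.data);
  }
  // In the failure branch only `error` (and optionally `retryAfter`) exist.
  return `Error: ${result.error}`;
}

const describeNames = (names: string[]) => `${names.length} names generated`;

// Made-up sample data standing in for a real Server Action response.
const ok = toUiMessage<string[]>(
  { success: true, data: ["Trail Mix Daily", "Peak Snacks"] },
  describeNames
);
const failed = toUiMessage<string[]>(
  { success: false, error: "Unauthorized" },
  describeNames
);

console.log(ok);
console.log(failed);
```

Because every action returns the same union, a helper like this can be shared by every tool's UI instead of being rewritten per tool.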

Server Actions vs API Routes: The Deliberate Choice

Important

Architecture decision: Server Actions for all AI tools, API routes only for auth, cron, and public endpoints. This was intentional, not default. Server Actions are colocated with the UI that calls them, carry the session context natively, and eliminate the route-file-plus-fetch-call boilerplate. The trade-off — they can't be called from outside the app — is acceptable when the product is a private authenticated web app. If you need the AI endpoint to be curl-able, webhook-able, or callable from a mobile app, use an API route.

| | Server Actions | API Routes |
| --- | --- | --- |
| Auth session access | Native: session available directly | Must re-validate session per request |
| Boilerplate | Minimal: just a function with 'use server' | Route file + handler + fetch call |
| Colocation | Lives next to the UI that calls it | Separate /api/ directory |
| External access | Not callable from outside the app | Public: versioned, curl-able |
| Testing | Harder to unit-test in isolation | Easy to test with HTTP client |
| Best for | UI-triggered authenticated AI flows | Public endpoints, cron jobs, webhooks |
| Used for in this project | All 9 AI tools | Auth, cron sync, public link redirects |

The Service Function: Pure AI Logic, Nothing Else

Service functions contain only the AI call and business logic — no auth, no rate limiting. This makes them independently testable and reusable across Server Actions, cron jobs, and API routes.

The generateTextWithSchema Utility

"Pure" here means: given the same input, the function does the same thing. No session reads, no database side effects beyond what the business logic explicitly requires, no HTTP response shaping. The function takes typed input, builds a prompt, calls Gemini, and returns typed output. If the AI call fails, it throws — the Server Action catches it and converts the throw into a structured error return.

Every AI service function on the platform routes through a shared utility called generateTextWithSchema:

// lib/ai/generate-text-with-schema.ts
import { GoogleGenerativeAI } from "@google/generative-ai";
import { type ZodSchema } from "zod";

const genAI = new GoogleGenerativeAI(process.env.GEMINI_API_KEY!);

export async function generateTextWithSchema<T>(
  prompt: string,
  schema: ZodSchema<T>,
  model = "gemini-2.5-flash"
): Promise<T> {
  const generativeModel = genAI.getGenerativeModel({ model });
  const result = await generativeModel.generateContent(prompt);
  const rawText = result.response.text();

  // Gemini sometimes wraps JSON in ```json ... ``` fences. Trim first so the
  // anchored patterns still match when whitespace surrounds the fences.
  const jsonString = rawText
    .trim()
    .replace(/^```(?:json)?\s*/i, "")
    .replace(/\s*```$/, "")
    .trim();

  let parsed: unknown;
  try {
    parsed = JSON.parse(jsonString);
  } catch {
    throw new Error(`AI returned non-JSON response: ${jsonString.slice(0, 200)}`);
  }

  const validated = schema.safeParse(parsed);
  if (!validated.success) {
    throw new Error(
      `AI output failed schema validation: ${validated.error.issues[0]?.message}`
    );
  }

  return validated.data;
}

// lib/tools/facebook-page-starter/service.ts
import { generateTextWithSchema } from "@/lib/ai/generate-text-with-schema";
import {
  pageNamesOutputSchema,
  type PageNamesInput,
  type PageNamesOutput,
} from "./schemas";

export async function generatePageNames(
  input: PageNamesInput
): Promise<PageNamesOutput> {
  const prompt = `
You are a brand naming expert. Generate ${input.count} Facebook page name variations
for a ${input.style} brand in the ${input.niche} niche targeting ${input.targetAudience}.

Return a JSON object matching this structure exactly:
{
  "variants": [
    { "name": "string", "rationale": "string", "tone": "string" }
  ],
  "recommendation": "string — which variant you recommend and why"
}

Return only the JSON object. No markdown, no explanation.
  `.trim();

  return generateTextWithSchema(prompt, pageNamesOutputSchema);
}
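
The fence-stripping step is easy to check in isolation. The sketch below pulls it into two hypothetical helpers (stripJsonFences and parseModelJson are names invented for this example, not functions from the platform) and trims before stripping, so whitespace around the fences does not defeat the anchored patterns. The regexes write the fence as a backtick followed by {3}, which matches exactly three backticks.

```typescript
// Hypothetical standalone version of the fence-stripping step.
// In the regexes, a backtick followed by {3} matches a three-backtick fence.
function stripJsonFences(rawText: string): string {
  return rawText
    .trim() // leading/trailing whitespace would break the ^ and $ anchors
    .replace(/^`{3}(?:json)?\s*/i, "") // opening fence, with optional "json" tag
    .replace(/\s*`{3}$/, "") // closing fence
    .trim();
}

// Parses model output as JSON, throwing a readable error when it is not JSON.
function parseModelJson(rawText: string): unknown {
  const jsonString = stripJsonFences(rawText);
  try {
    return JSON.parse(jsonString);
  } catch {
    throw new Error(`AI returned non-JSON response: ${jsonString.slice(0, 200)}`);
  }
}

// Build a fenced sample response; FENCE is three backticks.
const FENCE = "`".repeat(3);
const fencedResponse = `${FENCE}json\n{"variants": []}\n${FENCE}\n`;

console.log(parseModelJson(fencedResponse));
console.log(parseModelJson('{"recommendation": "plain JSON also works"}'));
```

Schema validation would still run on the parsed value afterwards; this sketch only covers the text-to-JSON step.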

Image Generation: Sequential, Not Parallel

The platform uses a separate image generation utility for Gemini's image model. It wraps @google/genai with responseModalities: ['IMAGE'], handles base64-to-data-URI conversion, and runs all image requests sequentially — not in parallel. The reason: the Gemini Image API returns rate limit errors when multiple image generation requests hit simultaneously from the same API key, even within quota. Sequential calls with no parallelism eliminate the problem entirely, at the cost of longer wall-clock time for multi-image steps.
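
To make the sequential-versus-parallel distinction concrete, here is a minimal sketch. runSequentially and fakeImageRequest are hypothetical names; the fake request just waits a few milliseconds instead of calling the image API, and the counters exist only to show that no two requests overlap.

```typescript
// Runs async tasks one at a time, in order.
// Promise.all, by contrast, would start every task immediately.
async function runSequentially<T>(tasks: Array<() => Promise<T>>): Promise<T[]> {
  const results: T[] = [];
  for (const task of tasks) {
    // The next task does not start until this one has settled.
    results.push(await task());
  }
  return results;
}

// Counters used to observe how many requests are in flight at once.
let inFlight = 0;
let maxInFlight = 0;

// Stand-in for an image request; a real implementation would call the image API here.
function fakeImageRequest(label: string): () => Promise<string> {
  return async () => {
    inFlight += 1;
    maxInFlight = Math.max(maxInFlight, inFlight);
    await new Promise<void>((resolve) => setTimeout(resolve, 5));
    inFlight -= 1;
    return `image:${label}`;
  };
}

const sequentialRun = runSequentially([
  fakeImageRequest("profile"),
  fakeImageRequest("cover"),
]);

sequentialRun.then((images) => console.log(images, { maxInFlight }));
```

Note that the tasks are passed as functions rather than promises. Passing already-created promises would start all the requests up front, which is exactly the parallelism this helper is meant to avoid.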

Warning

Classify errors before deciding to retry. On this platform, errors fall into two categories: permanent (4xx responses from the AI provider other than 429, such as a bad prompt, a content policy violation, or an invalid request) and retryable (429 rate limit exceeded and 5xx server errors). Permanent errors are returned to the user immediately as a structured error message. Retryable errors trigger an exponential backoff retry, up to 3 attempts. Never retry a 400: you will get the same error on every attempt and delay the user by several seconds for no reason.
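
One way to sketch that classification and retry loop is shown below. It assumes the thrown error exposes a numeric status property, which may not match how a given SDK reports failures; statusOf, isRetryable, backoffDelayMs, and withRetry are hypothetical names, and the 500 ms base delay is an arbitrary choice for illustration.

```typescript
// Assumption: errors carry a numeric `status`; real SDK errors may expose it differently.
function statusOf(err: unknown): number | undefined {
  if (typeof err === "object" && err !== null && "status" in err) {
    const status = (err as { status: unknown }).status;
    return typeof status === "number" ? status : undefined;
  }
  return undefined;
}

// Retryable: 429 and 5xx. Everything else (including other 4xx) is permanent.
function isRetryable(err: unknown): boolean {
  const status = statusOf(err);
  return status !== undefined && (status === 429 || status >= 500);
}

// Exponential backoff: baseMs, 2 * baseMs, 4 * baseMs, ...
function backoffDelayMs(attempt: number, baseMs = 500): number {
  return baseMs * Math.pow(2, attempt - 1);
}

function sleep(ms: number): Promise<void> {
  return new Promise<void>((resolve) => setTimeout(resolve, ms));
}

// Calls fn up to maxAttempts times, waiting between attempts only for retryable errors.
async function withRetry<T>(
  fn: () => Promise<T>,
  maxAttempts = 3,
  baseMs = 500
): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      // Permanent error, or out of attempts: surface it to the caller immediately.
      if (!isRetryable(err) || attempt >= maxAttempts) throw err;
      await sleep(backoffDelayMs(attempt, baseMs));
    }
  }
}
```

With these defaults a call that keeps failing with a retryable error waits 500 ms and then 1000 ms before the third and final attempt, while a 400 is rethrown after the first attempt with no delay.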

Case Study: The 5-Step Facebook Page Starter Wizard

The Facebook Page Starter chains 5 sequential AI steps, where each step passes its output as context to the next: page names → domains → logos → profile images → description.

This is the most complex tool on the platform and the one that stress-tested the 6-step pattern. Here's what each step does:

  • Step A — Page name generation: The user provides their niche, target audience, and preferred style. Gemini returns 5 page name variants with rationale and a recommendation.
  • Step B — Domain suggestions: The user picks their preferred page name from Step A. Gemini receives the selected name and suggests matching .com domain options — checking stylistic fit, not live availability.
  • Step C — Logo generation: Gemini Image API generates a logo at 1:1 ratio. The prompt includes the page name from Step A and the domain choice from Step B to ensure visual and verbal consistency.
  • Step D — Profile and cover images: Two separate image generation calls — profile photo at 1:1, cover image at 16:9. Both receive the page name, domain, and logo description as context.
  • Step E — Page description: Gemini writes a full Facebook page description. By this step, the prompt includes the page name, target audience, domain, and the visual brand direction established in Steps C and D.

Managing State Across Steps

The wizard state lives entirely on the client. A single useState object in the page component accumulates the output from each completed step:

// Simplified wizard state shape in the page component
const [wizardState, setWizardState] = useState<{
  step: 1 | 2 | 3 | 4 | 5;
  pageNames?: PageNamesOutput;
  selectedName?: PageNameVariant;
  domains?: DomainsOutput;
  logo?: GeneratedImage;
  profileImages?: ProfileImagesOutput;
}>({ step: 1 });

When the user completes Step A, wizardState.pageNames is set. When they pick a name and submit Step B, both wizardState.selectedName and the raw Step B inputs go to the generateDomainsAction Server Action. Each Server Action receives exactly what it needs — no more. No global store, no URL params, no server-side session for wizard state.
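
The accumulation logic can be sketched as a pure function, independent of React. Everything here is illustrative: the event names, the applyEvent function, and the DomainsOutput shape are assumptions, and the state is trimmed to the first two steps. In a component, the returned value would be what gets passed to setWizardState.

```typescript
// Placeholder output types; in the real code these are inferred from Zod schemas.
type PageNameVariant = { name: string; rationale: string; tone: string };
type PageNamesOutput = { variants: PageNameVariant[]; recommendation: string };
type DomainsOutput = { domains: string[] }; // hypothetical shape

type WizardState = {
  step: 1 | 2 | 3 | 4 | 5;
  pageNames?: PageNamesOutput;
  selectedName?: PageNameVariant;
  domains?: DomainsOutput;
};

// Hypothetical events, one per thing that can happen in the first two steps.
type WizardEvent =
  | { type: "pageNamesGenerated"; output: PageNamesOutput }
  | { type: "nameSelected"; variant: PageNameVariant }
  | { type: "domainsGenerated"; output: DomainsOutput };

// Returns a new state object; earlier outputs are carried forward untouched.
function applyEvent(state: WizardState, event: WizardEvent): WizardState {
  switch (event.type) {
    case "pageNamesGenerated":
      return { ...state, step: 2, pageNames: event.output };
    case "nameSelected":
      return { ...state, selectedName: event.variant };
    case "domainsGenerated":
      return { ...state, step: 3, domains: event.output };
  }
}

// Made-up sample data for the walkthrough.
const sampleVariant: PageNameVariant = {
  name: "Trail Mix Daily",
  rationale: "Short and descriptive",
  tone: "casual",
};

const initial: WizardState = { step: 1 };
const afterNames = applyEvent(initial, {
  type: "pageNamesGenerated",
  output: { variants: [sampleVariant], recommendation: "Trail Mix Daily" },
});
const afterPick = applyEvent(afterNames, { type: "nameSelected", variant: sampleVariant });

console.log(afterPick.step, afterPick.selectedName?.name);
```

Because each event only adds to the state, a failed later step leaves the earlier outputs in place, so a retry can resend them unchanged.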

The Context Injection Pattern

The key architectural detail is how context flows between steps. Step C's Server Action receives a context field in its input that contains the output from Steps A and B:

// The Server Action for Step C receives prior step output as typed context
export async function generateLogoAction(input: {
  context: {
    pageName: string;
    pageNameRationale: string;
    selectedDomain: string;
  };
  style: string;
  colorPreference: string;
}): Promise<ActionResult<GeneratedImage>> {
  // ... 4-gate pipeline ...
  const data = await generateLogoImage(input);
  return { success: true, data };
}

The context fields are part of the Zod schema for that step's input — they're validated at Gate 3 just like any other input. This means the wizard is not just stateful at the UI layer; the schema enforces that context flows correctly and is present before any AI call fires. If the client somehow sends a malformed or missing context object, Zod catches it before a single Gemini token is spent.

Each step is also independently retryable. If Step D fails mid-generation, the user can retry Step D without restarting from Step A. The accumulated state from Steps A–C is still in wizardState and gets passed again to the retry call. This is the practical payoff of keeping wizard state on the client rather than trying to reconstruct it from the server.

The Mistake That Cost a Week: Design the Registry Before the First Tool

The tool registry pattern — config-driven sidebar, routing, and status badges — was added after the first 2 tools were already built. Retrofitting those tools to match the new config interface cost a full week.

What the Tool Registry Does

Every tool on the platform is registered as a config object in a single file. The sidebar, the tool grid, and the status badges are all generated from this registry at render time. Adding a new tool means adding one config object and creating the files the 6-step pattern prescribes; nothing else needs to be touched. Here's the simplified shape:

// lib/tools/registry.ts
export type ToolStatus = "active" | "beta" | "coming-soon";

export interface ToolConfig {
  id: string;           // unique slug, used as the route param
  name: string;         // display label in sidebar and cards
  description: string;  // tooltip and card body text
  category: "facebook" | "instagram" | "youtube" | "other";
  subcategory?: string;
  order: number;        // sort position within category
  status: ToolStatus;
  route: string;        // /tools/[slug]
  icon?: string;
}

export const toolRegistry: ToolConfig[] = [
  {
    id: "facebook-page-starter",
    name: "Facebook Page Starter",
    description: "Generate page names, domains, logos, and a full page description in 5 steps.",
    category: "facebook",
    order: 1,
    status: "active",
    route: "/tools/facebook-page-starter",
  },
  // ... 8 more tools
];

The sidebar component maps over toolRegistry, groups by category, sorts by order, and renders the appropriate status badge. New tool: one object in the array, the pattern's files on disk, done.
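
The group-and-sort step might look something like the sketch below. It uses a trimmed copy of ToolConfig so it runs on its own; groupForSidebar is a hypothetical name, and every registry entry other than facebook-page-starter is invented for the example.

```typescript
type ToolStatus = "active" | "beta" | "coming-soon";
type ToolCategory = "facebook" | "instagram" | "youtube" | "other";

// Trimmed copy of ToolConfig with only the fields the sidebar grouping needs.
interface ToolConfig {
  id: string;
  name: string;
  category: ToolCategory;
  order: number;
  status: ToolStatus;
  route: string;
}

// Groups tools by category, then sorts each group by its `order` field.
function groupForSidebar(tools: ToolConfig[]): Map<ToolCategory, ToolConfig[]> {
  const groups = new Map<ToolCategory, ToolConfig[]>();
  for (const tool of tools) {
    const group = groups.get(tool.category) ?? [];
    group.push(tool);
    groups.set(tool.category, group);
  }
  groups.forEach((group) => group.sort((a, b) => a.order - b.order));
  return groups;
}

// Sample registry; the second and third entries are made up for illustration.
const sampleRegistry: ToolConfig[] = [
  { id: "example-post-writer", name: "Example Post Writer", category: "facebook", order: 2, status: "beta", route: "/tools/example-post-writer" },
  { id: "facebook-page-starter", name: "Facebook Page Starter", category: "facebook", order: 1, status: "active", route: "/tools/facebook-page-starter" },
  { id: "example-caption-tool", name: "Example Caption Tool", category: "instagram", order: 1, status: "coming-soon", route: "/tools/example-caption-tool" },
];

const sidebarGroups = groupForSidebar(sampleRegistry);
sidebarGroups.forEach((tools, category) => {
  console.log(category, tools.map((t) => `${t.name} [${t.status}]`));
});
```

The component layer then only has to render what this function returns, which is why adding a registry entry is enough to make a tool appear in the sidebar.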

The Refactor That Shouldn't Have Happened

When I introduced the registry after tools #1 and #2 were complete, those tools had hardcoded route strings, inconsistent category labels, and sidebar entries written directly in the layout component. Bringing them into the registry required touching the layout, the route configuration, the tool cards, and the status badge logic — none of which were designed with a registry in mind. It took a week because each change revealed another inconsistency.

The fastest way to build 9 AI tools is to build the pattern once, and the registry before the first tool. I did both — just in the wrong order.

The lesson is specific: design the config schema for what a "tool" is — its fields, its types, its constraints — before writing a single tool. The registry interface is the most important piece of architecture in a multi-tool platform. If you define it up front, every tool you build is automatically consistent with every other tool. If you define it after tool #2, you have two tools to refactor.

I'd also add Redis-backed rate limiting from the start. The current in-memory sliding window (30 req/min per user) is documented in the code as a known limitation — it works correctly for a single Vercel instance but breaks under horizontal scaling because each instance has its own counter. Setting up Redis at the beginning takes the same amount of time as the in-memory implementation; the difference is that it scales. That's the next change coming in v2.
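
For reference, an in-memory sliding window of the kind described can be sketched in a few lines, which also makes the scaling limitation visible. This is not the platform's implementation: the function below is synchronous and takes an injectable clock for easy testing, and it treats retryAfter as a millisecond timestamp, which is an assumption about the unit.

```typescript
const WINDOW_MS = 60000; // 1 minute
const MAX_REQUESTS = 30; // per user, per window

// Request timestamps per user. This Map lives in one process's memory,
// so a second server instance would keep its own, separate counts.
const requestLog = new Map<string, number[]>();

type RateLimitResult = { allowed: boolean; retryAfter?: number };

// Sliding window: only requests from the last WINDOW_MS count against the limit.
// `now` is injectable so the behavior can be tested without waiting.
function checkRateLimit(userId: string, now: number = Date.now()): RateLimitResult {
  const recent = (requestLog.get(userId) ?? []).filter((t) => now - t < WINDOW_MS);

  if (recent.length >= MAX_REQUESTS) {
    requestLog.set(userId, recent);
    // Assumption: retryAfter is the timestamp (ms) at which the oldest
    // counted request leaves the window and a new request would be allowed.
    const oldest = recent[0] ?? now;
    return { allowed: false, retryAfter: oldest + WINDOW_MS };
  }

  recent.push(now);
  requestLog.set(userId, recent);
  return { allowed: true };
}
```

A Redis-backed version would keep the same function shape but store the timestamps in a shared store, so every instance sees the same window.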

The full architecture I've described here — 6-step pattern, 4-gate Server Actions, context-injecting wizard state, Gemini structured output, and a config-driven tool registry — is what I use at hassanr.com to build production AI SaaS products. Hassan Raza's engineering blog covers these patterns in depth because they're the ones that actually ship.

Frequently Asked Questions

Why use Server Actions instead of API routes for AI tools?

Server Actions give you native access to the auth session without re-validating, while API routes require you to re-fetch and verify the session on every request. For UI-triggered authenticated AI flows, Server Actions eliminate an entire layer of boilerplate: no route file, no fetch call, no manual session check in the handler. They live colocated with the UI that calls them, so the auth context is available directly. The trade-off is that Server Actions cannot be called from external clients — no curl, no external service, no cron trigger from outside the app. On the affiliate marketing SaaS I built, all 9 AI tools use Server Actions for exactly this reason. The only API routes that exist are for auth callbacks, a cron sync endpoint, and a public link redirect. If a tool needs to stay private and authenticated, Server Actions win. If it needs to be callable from the outside world, an API route is the right choice.

How do you manage state across steps in a multi-step AI wizard?

Each wizard step gets its own Server Action, and React state on the client holds the current step index plus the accumulated output from all previous steps. No server-side session storage is needed for wizard state. When the user completes Step A, the client stores that output in a useState object. When they submit Step B, both the Step B inputs and the relevant Step A output are passed together as the Server Action argument. The Facebook Page Starter wizard I built follows this pattern across 5 sequential AI steps: page name generation, domain suggestions, logo generation, profile and cover images, and page description. Step C (logo generation) receives the selected page name from Step A and the domain choice from Step B as context. Each Server Action is independently callable and retryable: if Step C fails, the user can retry without restarting the wizard from Step A. The entire wizard state lives in the component; the server stays stateless.

How do you rate limit AI Server Actions?

Rate limiting sits at Gate 2 inside every Server Action, immediately after the auth check and before Zod validation or any AI call. On the platform I built, each authenticated user gets a 30 requests-per-minute sliding window for text generation tools. Image generation tools have a separate per-tool cooldown because the Gemini Image API is more expensive and slower. When the rate limit is exceeded, the Server Action returns a structured error object containing a retryAfter timestamp — the client reads this and shows a countdown timer so the user knows exactly when they can try again. The current implementation uses an in-memory sliding window, which works correctly for a single Vercel instance but breaks under horizontal scaling because each instance maintains its own counter. Redis-backed rate limiting is planned for v2. If you are building for multi-instance deployment from day one, set up Redis at the start — it takes the same time to configure initially and eliminates the scaling problem entirely.