HedgeHunt

A platform where creators build treasure hunts and explorers solve them in the real world.

Product

HedgeHunt is a platform for building and solving real-world treasure hunts. Creators design multi-step hunts with GPS challenges, quizzes, and AI-validated missions in a visual builder. Explorers solve them on mobile using location, camera, and audio.

Challenge

Creators need to edit hunts while explorers are actively solving them. Both sides share the same data, but a creator's changes can never break an explorer's experience. The platform also generates complete hunts from a text prompt, so AI failures can never block an explorer mid-hunt.

Approach

A versioning model isolates drafts from live content. Publishing creates an immutable snapshot that explorers play against. A server-side transform strips answer keys before anything reaches the client. All four apps share types generated from a single source.

Outcome

Publish and roll back without downtime. Explorers play version-locked sessions that never change mid-hunt. Live at hedgehunt.app.


Overview

The platform has three main pieces: a Builder where creators design hunts, an Explorer where players solve them on mobile, and a Backend API that controls game logic and progression. The Builder is a form-heavy React app for designing multi-step hunts. The Explorer is a mobile-first app that walks players through steps one at a time.

Creators start with an empty hunt and add steps. Each step is a different type of challenge.

Clue
Explorers solve a text riddle. No media, no location, just lateral thinking.
Quiz
Multiple choice or free-text answers. The server validates against stored correct answers.
Mission
GPS coordinates, photo capture, or audio recording. The explorer goes somewhere and proves it.
Task
Open-ended submissions that AI validates. No single correct answer.

The Builder includes a live preview, an iframe running the actual Explorer with answer keys stripped by the PlayerExporter before anything reaches the client. When the hunt is ready, the creator publishes it.

Creator Journey

DesignAdd steps in the Builder
PreviewLive iframe with real Explorer
PublishFreeze content into snapshot
ReleasePoint live traffic to version

An explorer opens a hunt link and works through steps one at a time. Walk to a park entrance and photograph the sign. Answer a quiz about local history. Record an audio response explaining what you see. GPS, camera, and microphone all run through the browser. The explorer submits, the server validates, and the next step appears.

The server controls game progression. Each API response tells the Explorer what actions are available next. Explorers follow server-provided links. They never decide what comes next.


Architecture

All four apps live in a TypeScript monorepo. They deploy independently but share types, UI, and the messaging layer through three packages.

@hunthub/shared
Single source of truth for all types and validation schemas. Generated from one OpenAPI spec. If the spec changes and an app doesn't update, CI catches it at build time.
@hunthub/compass
Shared design system. Wraps MUI's createTheme with custom tokens, typography, and component overrides. One import gives any app the full design system.
@hunthub/player-sdk
Communication layer between Builder and Explorer. Sends hunt data to the Explorer iframe through typed messages. A PlayerExporter strips answer keys before anything reaches the Explorer.
Loading diagram…
HuntHub monorepo, 4 apps + 3 shared packages

Type Generation

OpenAPI SpecSingle source of truth
Codegenopenapi-zod-client
Types + ZodShared package
CIBuild-time validation

The generated types define what the API accepts, but the Builder's form state has different needs. A location field that's required in the API needs to be nullable in the form because the creator hasn't chosen a spot yet. Steps without a server ID need stable keys for React to track them. Two transform functions handle the conversion:

  • On load, API data gains form keys and nullable settings
  • On save, form artifacts are stripped and nulls convert back to omissions

UI components only touch form types. The API shape is never exposed to rendering code.

Hunts go through three states: Draft, Published, and Live. Publishing clones the current steps into a versioned snapshot. Releasing points live traffic to that snapshot. Rolling back swaps the pointer to a previous version. No data migration, no downtime.

Loading diagram…
Hunt version lifecycle, Draft > Published > Live

Decisions

Hunt Versioning Model
Context

Creators edit hunts while explorers are actively solving them. If a creator modifies step 3 while an explorer is on step 1, the explorer could eventually reach content that changed mid-session. I needed complete isolation between editing and playing.

Options
  • Single document model. One Hunt document with all steps embedded. Simple to query. But publishing means either deep-cloning the entire document (expensive, hard to track versions) or adding version flags to every field (complexity explosion). Editing and playing would share the same data.
  • Event sourcing. Store every edit as an event, reconstruct state on read. Powerful but overkill at this scale. Makes every read path more expensive for a problem that doesn't need temporal queries.
  • Separate version documents. Hunt holds metadata and a pointer to the live version. HuntVersion holds the frozen content snapshot. Steps are linked per version.
Decision

Three models. Hunt is the master record with a liveVersion field pointing to whichever HuntVersion is currently live. Publishing clones the draft steps into a new HuntVersion. Releasing swaps the pointer. Rolling back swaps it to a different version.

The release itself is a single findOneAndUpdate:

const result = await Hunt.findOneAndUpdate(
  { huntId, liveVersion: currentLiveVersion },
  { liveVersion: version, releasedAt: new Date(), releasedBy: userId },
  { new: true, session }
);
if (!result) {
  throw new ConflictError('Hunt was modified. Retry.');
}

The filter liveVersion: currentLiveVersion acts as optimistic locking. If someone else released between the read and the write, the filter matches nothing and the server throws a conflict. No distributed locks, no queues. The same pattern protects saves, using the client's updatedAt timestamp to catch concurrent edits at both the hunt and step level.

Automatic version pruning protects against unbounded growth. Old published versions are cleaned up, but the currently live version is always protected from deletion.

Tradeoff

More complex schema. Every read that needs step data requires two lookups (version, then steps). Publishing is heavier because it clones step documents. But version isolation is absolute. Creators can edit freely while explorers see consistent, frozen content. Rollback is a pointer swap.

Why MongoDB
Context

Each challenge type has a completely different data shape. A Quiz has options, correct answers, and validation modes. A Mission has GPS coordinates, radius, and media requirements. A Task has AI prompts and audio settings. I needed storage that handles this polymorphism without fighting the schema.

Options
  • PostgreSQL with table-per-type inheritance. A base steps table joined to quiz_steps, mission_steps, etc. Strong typing via Prisma. But adding a new challenge type means a new migration, a new table, and updating every query that touches steps.
  • PostgreSQL with JSONB. One steps table with a challenge JSONB column. Flexible storage, but Prisma doesn't validate JSONB contents. You lose type safety at the database layer.
  • MongoDB with embedded documents. Steps are documents. The challenge field holds sub-objects for each type. Only one is populated. Adding a new challenge type means adding a field. No migrations.
Decision

MongoDB. Challenges are naturally document-shaped with variable structures that map cleanly to embedded objects. Publishing clones step documents with a straightforward insertMany rather than a multi-table transaction.

  • Numeric IDs. A Counter collection with findOneAndUpdate and $inc gives sequential, race-condition-safe, human-readable IDs for URLs and display.
  • No JOINs. Every "join" is an explicit batch query using $in on arrays of IDs, avoiding N+1 problems. withTransaction covers the critical paths: publish, release, and delete.
  • TypeScript tradeoff. MongoDB's TypeScript support isn't as strong as Prisma's. But the OpenAPI spec is the source of truth, so Mongoose schemas just follow along.
Tradeoff

No compile-time query validation like Prisma provides. Some denormalization is needed since you can't JOIN. But the polymorphic challenge structure maps naturally to documents, and the OpenAPI-first approach means type safety comes from the spec, not the ORM.

Explorer Format (PlayerExporter)
Context

The full Step document contains everything: correct answers, AI validation prompts, scoring criteria, admin notes. Explorers must never see any of this. Filtering on the client is not an option because the data has already left the server. One missed field or a browser devtools inspection would leak answers.

Options
  • Client-side filtering. Send the full document, strip fields in the Explorer app. Fast to build. Fundamentally insecure. The data is already on the wire.
  • GraphQL field-level auth. Let the client request only safe fields. Still risky if a field permission is misconfigured. A new sensitive field defaults to exposed unless someone remembers to restrict it.
  • Server-side transformation. A dedicated function that takes a full Step and returns a StepPF (Player Format). The boundary is structural. The server never sends what it shouldn't.
Decision

PlayerExporter runs server-side. It takes a full Step and produces a StepPF containing only what the explorer needs: no answer keys, no AI prompts, no validation criteria. Quiz options are randomized so the order is consistent within a session but unpredictable across sessions.

The same exporter feeds two consumers: the play API and the builder preview. Both receive identical output. The preview always matches exactly what a real explorer would see.

Tradeoff

Extra type definitions and a transformation function to maintain for every schema change. But the security boundary is structural, not behavioral. You can't accidentally leak answers because StepPF physically doesn't carry them. If a new sensitive field is added to Step, the compiler forces a decision about whether StepPF should include it.

Challenge Type System
Context

Four challenge types (Clue, Quiz, Mission, Task) with completely different field schemas. Three things need to work at the same time:

  • The builder lets creators switch a step between types without destroying shared data like title and description
  • The API validates based on type
  • The OpenAPI spec generates clean schemas for all four
Options
  • TypeScript discriminated union. ClueStep | QuizStep | MissionStep | TaskStep. Clean type narrowing with exhaustive switch statements. Each variant carries exactly the fields it needs. But OpenAPI codegen handles unions poorly. MongoDB stores one shape per document, so switching types means replacing the document. The builder form state would need to be destroyed and recreated on type change.
  • Shared object with nullable fields. One Challenge shape with all four type sub-objects as optional. Only one is populated at a time. Switching types means nulling the old field and setting the new one. Same document, same form state.
Decision

Shared object. The schema marks clue, quiz, mission, and task as optional + nullable. A Zod discriminator at the API layer validates that exactly one is populated and matches the type field. The constraint is runtime, not compile-time.

This made three things simpler:

  • OpenAPI generation works with one schema instead of union complexity
  • MongoDB stores one document shape where partial updates work naturally
  • The builder switches types by nulling one field and setting another. No document recreation, no form state loss
Tradeoff

No compile-time exhaustiveness checking. If you access challenge.quiz when the type is actually mission, TypeScript won't stop you. I compensate with strict API validation and consistent access through helper functions that check the type before accessing the specific challenge field.

AI Integration
Context

Two distinct AI use cases. Multiple providers handle different media types, and provider failures can never block an explorer mid-hunt.

  • Validation. Checking explorer submissions for Missions (photos of specific locations or objects) and Tasks (audio recordings) where exact matching is impossible.
  • Generation. Creating complete, playable hunts from a text prompt like "a treasure hunt around Central Park about American history."
Options
  • Validation with a single provider. Simpler integration. But a single point of failure. If OpenAI is down, all Mission and Task validation stops and explorers can't progress.
  • Validation with multi-provider and hard failure. Multiple providers, reject submissions when AI is unavailable. Safe, but frustrating for explorers in the middle of a hunt.
  • Validation with graceful degradation. Multiple providers, each specialized by media type. If all fail, auto-pass with a flag. Explorers never blocked.
  • Generation with free-form JSON. Ask the LLM for JSON, parse, hope it validates. The model hallucinates field names, invents enum values, nests things wrong.
  • Generation with structured outputs. Enforce the output schema during generation. Enrich with external data after.
Decision

Validation. GPT-4o handles text, Google Gemini handles image and audio. If a provider fails or times out at 15 seconds, validation auto-passes with a fallbackUsed flag. Explorers are never blocked by infrastructure. The prompt adapts based on attempt count: after 2 failures, subtle hints. After 3+, direct guidance.

Generation. OpenAI Structured Outputs with a Zod schema derived from the API types. The schema intentionally omits GPS coordinates since the model would hallucinate them. After generation, each step goes through a geocoding pipeline:

  • Google Maps converts location descriptions to coordinates
  • Only high-precision results are accepted
  • If geocoding fails, the step silently converts from a Mission to a Clue

Every generated hunt is immediately playable. If schema validation fails, one retry feeds the specific Zod errors back to the model.

Tradeoff

Auto-pass on validation failure means a bad answer could slip through. But blocking explorers when AI is down is worse for the experience. The generation quality gate trades geographic coverage for accuracy. Some locations become Clues instead of GPS challenges, but nothing requires manual fixes.

AI Hunt Editing
Context

Generation creates a hunt from nothing. A creator types a prompt, the API returns a complete hunt. Editing is a different problem. The creator already has a hunt open in the Builder and wants to modify it with natural language. "Make step 3 harder." "Add a photo mission near the Eiffel Tower." "Remove the last two steps." The model needs to read the current state, decide what to change, and apply changes through the same backend services the Builder uses. I needed an approach that could handle multi-step edits without each tool call adding another round-trip to the model.

Options
  • Traditional agent loop. One model inference per tool call. The model calls getHunt, waits for the result, decides what to change, calls saveHunt, waits again, writes a summary. A three-tool edit needs four inferences. Every tool result feeds back into the context window, so the model re-reads the full hunt JSON on each turn.
  • Client-side diffing. Send the full hunt to the model, get the modified version back, validate, save. No tool calling. But the model must reproduce every field it didn't touch. One dropped field and data is lost silently.
  • Programmatic Tool Calling. The model writes a Python script upfront that handles all tool calls. Each call pauses execution while the backend runs it. Python resumes with the result in a local variable. Creative decisions happen once at code-writing time. The script is mechanical execution.
Decision

Claude's Programmatic Tool Calling. On the first inference, the model writes a Python script. The script calls getHunt to read the current state, modifies steps in memory, calls geocodeLocations for new location-based missions, and calls saveHunt with the complete steps array. Each tool call pauses the script. The backend executes it against real services and resumes with the result. When the script finishes, Claude reads the output and writes a summary on the second inference.

Traditional Agent Loop

Inference 1Decide to call getHunt
ExecuteRun tool, return result to model
Inference 2Decide to call saveHunt
ExecuteRun tool, return result to model
Inference 3Decide to call geocodeLocations
ExecuteRun tool, return result to model
Inference 4Write summary

Programmatic Tool Calling

Inference 1Write Python script with all tool calls
ExecuteScript runs, pauses on each tool call
Inference 2Read stdout, write summary
Decision

Two inferences for any edit, regardless of how many tools it uses. Tool results stay in Python variables and never re-enter the context window. A five-tool edit costs the same as a one-tool edit.

I started with seven tools: getHunt, getStep, addStep, updateStep, removeStep, updateHunt, and geocodeLocation. Most turned out to be wasteful. getStep added a round-trip for no benefit since getHunt already returns all steps. Individual step mutations couldn't handle reordering and required multiple calls for multi-step edits. I consolidated down to four: getHunt, updateHunt, saveHunt, and geocodeLocations. saveHunt accepts the full steps array and saves atomically. Adding, removing, reordering, and updating steps are all one call.

Tradeoff

Tied to Claude's code execution API. The model must produce valid Python in a single pass, so the system prompt needs precise tool documentation and examples. Container startup adds latency to the first call. But for edits that touch multiple steps and need geocoding, the inference reduction is significant. The model makes all creative decisions at code-writing time. Everything after that is execution.


Highlights

Play API

Each API response tells the Explorer what it can do next. The Explorer follows server-provided links, it never decides on its own. Game rules live on the server, so they can change without touching the client.

{
  "step": {
    "id": 3,
    "type": "mission",
    "title": "Find the hidden statue",
    "challenge": { "radius": 50 }
  },
  "_links": {
    "submit": { "href": "/api/play/sessions/abc/submit", "method": "POST" },
    "hint":   { "href": "/api/play/sessions/abc/hint",   "method": "GET"  },
    "skip":   null
  }
}

Session Management

Sessions are locked to the hunt version that was live when the explorer started. Releasing a new version mid-play doesn't interrupt active explorers. localStorage persistence resumes from the last completed step.

Prefetching and LRU Cache

The Explorer prefetches next-step media with async decode() and an LRU cache. Step transitions feel instant because assets are already decoded.

Answer Validation

Six validators, one interface. Each challenge type maps to its own: fuzzy text matching for quiz inputs, GPS distance for location missions, AI analysis for image and audio. Adding a new challenge type means adding one entry to the map.

interface AnswerValidator {
  validate(
    submission: PlayerSubmission,
    challenge: Challenge,
  ): Promise<ValidationResult>;
}
 
const validators: Record<AnswerType, AnswerValidator> = {
  "quiz-choice":      new ExactMatchValidator(),
  "quiz-input":       new LevenshteinValidator(),
  "mission-location": new HaversineValidator(),
  "mission-media":    new AIImageValidator([gemini]),
  "task-audio":       new AIAudioValidator([gemini]),
  "task-text":        new AITextValidator([openai]),
};

Hunt Activity

Creators had no visibility into how players interacted with their hunts. The backend captured session data as players progressed through steps, but none of it was surfaced. Anonymous sessions expire after 24 hours, so any analytics reading raw sessions directly would lose data over time. Every play event writes to two stores: a session record that expires and a running totals document that doesn't. The analytics service reads both, so recent sessions have full detail and lifetime metrics are always accurate.

Loading diagram…
Every play event writes to both stores. Session records expire, but running totals don't. The analytics service reads both.
Activity Overview, step breakdown and insightsSession drill-down with player submissions and AI feedback

Builder Preview

The Builder embeds the Explorer in an iframe with a typed messaging layer. Same PlayerExporter as production. Preview matches exactly what real explorers see.

Hunt Sharing

Three permission levels (Owner, Admin, View) enforced in server-side middleware. The Builder adapts UI per role, but access control never relies on the client.

Form Auto-Save

Before sending an instruction to the AI, the Builder saves the current form state. The model always edits the latest data, not whatever was last persisted. After the edit completes, the form resets to the returned hunt while preserving the creator's step selection.

Prompt Trimming

The system prompt includes four fields per step: ID, type, title, mission type. Full data is available through getHunt. The model gets enough context to decide what to change without processing 50 fields per step on every request.

Compass Design System

@hunthub/compass wraps MUI's createTheme with shared tokens, typography scales, and responsive mixins. One import gives any app the full design system.

Media Processing

Images are compressed client-side before upload. Audio is captured at low bitrate through MediaRecorder. Only compressed files reach S3.


Reflections

The Hunt/HuntVersion separation was the strongest architectural decision. Optimistic locking on release, version-locked play sessions, and instant rollback all fall out naturally from the pointer model. It handles concurrent edge cases without pessimistic locks.

The step storage model is where the tradeoff didn't pay off. Steps as separate documents made publishing straightforward (clone the documents), but every read requires a secondary query. Embedding steps within HuntVersion would make the common read path a single document fetch. The publish operation gets more complex, but reads happen far more often than publishes.

The PlayerExporter boundary was worth every extra line of code. It turns a behavioral contract ("don't send answers to explorers") into a structural one that the type system enforces. Every new sensitive field on Step requires an explicit decision about whether StepPF carries it.

The iframe SDK is the riskiest integration boundary. Builder and Explorer deploy independently. A breaking change in the postMessage protocol would silently break preview with no compile-time warning. This needs contract tests between the two apps.


  • HedgeHunt, live product
  • GitHub, private repo (available on request)