Summaries

Use the summaries API to generate, read, edit, and delete VLM-derived summaries of a video for a specific persona. Each summary row is keyed by the (videoId, personaId) pair, so a persona has at most one summary per video.

Endpoints

GET     /api/videos/:videoId/summaries
GET     /api/videos/:videoId/summaries/:personaId
POST    /api/summaries                              # write a hand-crafted summary
PUT     /api/videos/:videoId/summaries/:summaryId   # edit existing
DELETE  /api/videos/:videoId/summaries/:personaId
POST    /api/videos/summaries/generate              # enqueue VLM job
GET     /api/jobs/:jobId                            # poll the VLM job

GET /api/videos/:videoId/summaries returns only the summaries that belong to personas the requester owns.

Generate a summary

curl -X POST http://localhost:3001/api/videos/summaries/generate \
-H 'Content-Type: application/json' --cookie cookies.txt \
-d '{"videoId":"<id>","personaId":"<id>"}'
# {"jobId":"<job>","status":"queued"}

The job runs in the BullMQ summarization queue. The model service performs frame extraction, audio transcription, and VLM captioning; when it completes, it writes the result back as a VideoSummary row. Poll the job at GET /api/jobs/:jobId. The poll endpoint enforces ownership: the caller must own the persona that the job's data belongs to.
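The enqueue-then-poll flow above can be sketched from the client side as follows. This is a minimal sketch under stated assumptions, not the project's client code: only the two endpoint paths and the {"jobId", "status": "queued"} response shape come from this page. The helper names (buildGenerateRequest, isTerminal, waitForJob) are hypothetical, and the terminal status names "completed" and "failed" are assumptions, since this page does not list the values the poll endpoint returns.

```typescript
// Response shape taken from the curl example: {"jobId":"<job>","status":"queued"}.
type Job = { jobId: string; status: string };

// ASSUMPTION: terminal status names are not documented on this page.
// Adjust them to whatever GET /api/jobs/:jobId actually returns.
const TERMINAL_STATUSES = ["completed", "failed"];

function isTerminal(status: string): boolean {
  return TERMINAL_STATUSES.includes(status);
}

// Builds the POST that enqueues the VLM job (same body as the curl example).
function buildGenerateRequest(baseUrl: string, videoId: string, personaId: string) {
  return {
    url: `${baseUrl}/api/videos/summaries/generate`,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ videoId, personaId }),
    },
  };
}

// Polls until the job reaches a terminal status or the attempt budget runs out.
// getJob is injected so the loop works with any HTTP client; it should call
// GET /api/jobs/:jobId with the session cookie attached.
async function waitForJob(
  getJob: (jobId: string) => Promise<Job>,
  jobId: string,
  intervalMs = 2000,
  maxAttempts = 60,
): Promise<Job> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const job = await getJob(jobId);
    if (isTerminal(job.status)) return job;
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error(`Job ${jobId} still running after ${maxAttempts} polls`);
}
```

Once waitForJob resolves with a successful status, the finished summary can be read from GET /api/videos/:videoId/summaries/:personaId. A fixed polling interval keeps the sketch short; a backoff may suit long videos better.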

Summary row fields

The columns most commonly read by clients:

summary          Json     GlossItem[]; each item is {type, content, refType?, refPersonaId?, refClaimId?} for rich text with inline references
visualAnalysis   String?  raw VLM output before paragraph splitting
audioTranscript  String?  transcript text
keyFrames        Json?    selected frame ids and timestamps
transcriptJson   Json?    structured transcript (vendor-specific)
audioLanguage    String?  ISO code from the transcription vendor
audioModelUsed   String?  vendor adapter id
visualModelUsed  String?  VLM id (e.g. "qwen-2-5-vl-7b")
fusionStrategy   String?  "sequential" | "timestamp_aligned" | "native_multimodal" | "hybrid"
claimsJson       Json?    serialized extracted claims (denormalized)
comment          Text?    user-authored comment

The full set of columns is defined in server/prisma/schema.prisma under model VideoSummary, and the canonical TypeBox schema is in server/src/routes/summaries.ts.
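Because summary is stored as Json, clients may want to validate it before rendering. The sketch below mirrors the GlossItem shape from the field list; it is an illustration, not the canonical TypeBox schema. The helper names isGlossItem and glossToPlainText are hypothetical, and since the allowed values of type and refType are not listed on this page, they are kept as plain strings.

```typescript
// Shape taken from the field list above. The allowed values of `type` and
// `refType` are not documented here, so they stay plain strings.
interface GlossItem {
  type: string;
  content: string;
  refType?: string;
  refPersonaId?: string;
  refClaimId?: string;
}

// Runtime guard for values read from the Json column.
function isGlossItem(value: unknown): value is GlossItem {
  if (typeof value !== "object" || value === null) return false;
  const item = value as Record<string, unknown>;
  const optionalString = (key: string): boolean =>
    item[key] === undefined || typeof item[key] === "string";
  return (
    typeof item.type === "string" &&
    typeof item.content === "string" &&
    optionalString("refType") &&
    optionalString("refPersonaId") &&
    optionalString("refClaimId")
  );
}

// Concatenates the content of every well-formed item, dropping malformed
// entries. Useful for previews or search indexing where inline references
// do not need to be resolved.
function glossToPlainText(summary: unknown): string {
  if (!Array.isArray(summary)) return "";
  return summary
    .filter(isGlossItem)
    .map((item) => item.content)
    .join("");
}
```

Skipping malformed items rather than throwing is a design choice for read paths; a write path would more likely reject the whole array.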

Edit a summary

PUT /api/videos/:videoId/summaries/:summaryId replaces the summary field with a new GlossItem[] array. The request body accepts only summary; no other field is writable through this endpoint. The route loads the existing row and runs request.ability.can('update', subject('VideoSummary', existing)). If the caller's abilities (built by buildAbilities) do not permit updating that specific row, the route throws ForbiddenError.
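As an illustration of the request described above, the sketch below builds the PUT with a body that contains only summary. The function name buildEditRequest and the returned object shape are hypothetical; only the path, the HTTP method, and the single writable field come from this page.

```typescript
type GlossItem = {
  type: string;
  content: string;
  refType?: string;
  refPersonaId?: string;
  refClaimId?: string;
};

interface EditRequest {
  method: "PUT";
  url: string;
  headers: Record<string, string>;
  body: string;
}

// Builds the edit request. Only `summary` is writable through this endpoint,
// so the body deliberately carries nothing else.
function buildEditRequest(
  baseUrl: string,
  videoId: string,
  summaryId: string,
  summary: GlossItem[],
): EditRequest {
  const path = `/api/videos/${encodeURIComponent(videoId)}/summaries/${encodeURIComponent(summaryId)}`;
  return {
    method: "PUT",
    url: `${baseUrl}${path}`,
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ summary }),
  };
}
```

The result can be passed to any HTTP client that sends the session cookie, for example fetch(req.url, { method: req.method, headers: req.headers, body: req.body, credentials: "include" }) in a browser. How ForbiddenError is surfaced to the client (likely an HTTP 403) is not stated here, so callers should treat that as an assumption.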

Reasoning traces

Summaries produced by a thinking-capable VLM (Qwen3-VL thinking variants) carry an optional ReasonedText block on the model service response, capturing the chain-of-thought trace alongside the visible summary text. The DTOs are documented in Guide > Reasoning traces. The frontend renders the trace inside a collapsible panel when present.