Automated Tracking Workflow
Automated tracking uses AI models to generate bounding box sequences for you. This guide covers how to run tracking, review results, and refine them for production use.
Overview
FOVEA integrates object detection and tracking models to bootstrap annotation workflows. Instead of manually creating keyframes, you can:
- Run an AI tracking model on your video
- Review the generated tracking candidates
- Accept high-confidence tracks as annotations
- Refine accepted tracks by adding or adjusting keyframes
This hybrid approach combines the speed of automation with the precision of manual refinement.
Available Tracking Models
| Model | Use Case | Strengths | Speed |
|---|---|---|---|
| SAMURAI | General object tracking | High accuracy, handles occlusion | Medium |
| SAM2Long | Long video sequences | Maintains identity across long clips | Slow |
| SAM2.1 | Short segments | Fast, good for quick iteration | Fast |
| ByteTrack | Multiple objects | Tracks many objects simultaneously | Fast |
| BoT-SORT | Crowded scenes | Handles overlapping objects | Medium |
| YOLO11n-seg | Real-time detection | Very fast, good for pre-filtering | Very Fast |
Start with SAMURAI for general use. Switch to ByteTrack when tracking 5+ objects simultaneously.
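If you script against these options, the guidance above reduces to a simple heuristic. The sketch below is illustrative only; the helper and its signature are assumptions, not part of the FOVEA API:

```typescript
// Hypothetical helper encoding the model-selection guidance above;
// the model names are real options, but this function is illustrative.
type TrackingModel =
  | "SAMURAI" | "SAM2Long" | "SAM2.1"
  | "ByteTrack" | "BoT-SORT" | "YOLO11n-seg";

function suggestModel(objectCount: number, isLongClip: boolean): TrackingModel {
  if (objectCount >= 5) return "ByteTrack"; // many simultaneous objects
  if (isLongClip) return "SAM2Long";        // identity across long sequences
  return "SAMURAI";                         // general-purpose default
}
```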
Step 1: Open the Detection Dialog
- Load a video in the annotation workspace
- Click the "Detect Objects" button in the toolbar
- The detection dialog opens with options for detection and tracking
Step 2: Configure Tracking Options
Enable Tracking
- Check the "Enable Tracking" checkbox
- The dialog expands to show tracking options
Select Tracking Model
Choose a tracking model from the dropdown:
```
┌─ Tracking Model ──────┐
│ ○ SAMURAI             │
│ ○ SAM2Long            │
│ ○ SAM2.1              │
│ ● ByteTrack           │
│ ○ BoT-SORT            │
│ ○ YOLO11n-seg         │
└───────────────────────┘
```
Set Confidence Threshold
Adjust the confidence slider to filter low-quality tracks:
- 0.9+: Only high-confidence tracks (fewer false positives, may miss objects)
- 0.7-0.9: Balanced (default, good for most cases)
- Below 0.7: Include uncertain tracks (more false positives, fewer misses)
Choose Frame Range
Select which frames to process:
- Full Video: Process all frames (slowest, most complete)
- Current Segment: Process from the current frame to the end of the video
- Custom Range: Specify start and end frames (e.g., frames 100-500)
Enable Decimation (Optional)
Decimation reduces the number of keyframes by keeping only every Nth frame:
- Check "Enable Decimation"
- Set decimation interval (e.g., 5 means keep every 5th frame)
- The system stores fewer keyframes and re-interpolates between them
Benefits:
- Smaller file size on export
- Faster import/export
- Easier to review
Trade-offs:
- Less precise control
- Interpolation may not match tracking exactly
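Taken together, the dialog options map onto a small configuration object. The shape below is a sketch to make the choices concrete; the field names are assumptions, not FOVEA's actual schema:

```typescript
// Illustrative configuration shape for the dialog options above;
// field names are assumptions, not FOVEA's actual schema.
interface TrackingConfig {
  model: "SAMURAI" | "SAM2Long" | "SAM2.1" | "ByteTrack" | "BoT-SORT" | "YOLO11n-seg";
  confidenceThreshold: number;  // 0.0-1.0; 0.7-0.9 is the balanced default
  frameRange: "full" | "currentSegment" | { start: number; end: number };
  decimationInterval?: number;  // e.g., 5 = keep every 5th frame; omit to disable
}

const config: TrackingConfig = {
  model: "ByteTrack",
  confidenceThreshold: 0.8,
  frameRange: { start: 100, end: 500 }, // the custom-range example above
  decimationInterval: 5,
};
```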
Step 3: Run Tracking
- Click "Run Tracking" button
- A progress indicator appears
- The job runs in the background (model service processes the video)
- You can continue working while tracking runs
Tracking time varies by model and video length:
- YOLO11n-seg: ~1-2 seconds per 100 frames
- SAMURAI: ~5-10 seconds per 100 frames
- SAM2Long: ~15-30 seconds per 100 frames
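Those rates make rough ETAs easy to estimate before committing to a long run. A minimal sketch using the midpoints of the ranges quoted above (illustrative only; actual times depend on hardware):

```typescript
// Rough ETA from the per-100-frame estimates above (range midpoints).
const secondsPer100Frames: Record<string, number> = {
  "YOLO11n-seg": 1.5,
  SAMURAI: 7.5,
  SAM2Long: 22.5,
};

function estimateSeconds(model: string, frameCount: number): number {
  const rate = secondsPer100Frames[model];
  if (rate === undefined) throw new Error(`no estimate for ${model}`);
  return (frameCount / 100) * rate;
}

console.log(estimateSeconds("SAMURAI", 300)); // ~22.5 seconds
```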
Step 4: Review Tracking Candidates
When tracking completes, the Tracking Results Panel appears:
```
┌─ Tracking Results (SAMURAI) ────────────────────────────┐
│ Processed: 300 frames | Found: 5 tracks                 │
├─────────────────────────────────────────────────────────┤
│ Track #1  [Person]   95% confidence              ✓  ✗   │
│ ▓▓▓▓▓▓▓▓░░░░░░▓▓▓▓▓▓  Frames: 1-85, 120-200             │
│                                                         │
│ Track #2  [Car]      88% confidence              ✓  ✗   │
│ ▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓  Frames: 1-180                       │
│                                                         │
│ Track #3  [Person]   72% confidence              ✓  ✗   │
│ ▓▓░░░░░░░░▓▓▓▓▓▓▓  Frames: 1-20, 85-150                 │
└─────────────────────────────────────────────────────────┘
```
Track Row Elements
- Track ID: Unique identifier from tracking model
- Label: Detected object class
- Confidence: Average confidence across all frames
- Frame coverage bar: Visual representation of tracking (▓ = tracked, ░ = gap)
- Frame ranges: Text description of tracked segments
- ✓ button: Accept this track as an annotation
- ✗ button: Reject this track
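For reference, each candidate row can be thought of as a small record. The TypeScript shape below is illustrative, not FOVEA's internal type:

```typescript
// Assumed shape of a tracking candidate row; field names are
// illustrative, not FOVEA internals.
interface TrackCandidate {
  trackId: string;                            // unique ID from the tracking model
  label: string;                              // detected object class
  confidence: number;                         // average across all tracked frames
  segments: { start: number; end: number }[]; // tracked frame ranges; gaps fall between them
}

// Track #1 from the panel above:
const track1: TrackCandidate = {
  trackId: "track-1",
  label: "Person",
  confidence: 0.95,
  segments: [{ start: 1, end: 85 }, { start: 120, end: 200 }],
};
```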
Confidence Color Coding
- Green (above 90%): High confidence, likely accurate
- Yellow (70-90%): Medium confidence, review recommended
- Red (below 70%): Low confidence, likely needs refinement
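These bands are simple threshold checks, using the cutoffs documented above:

```typescript
// The documented color bands as a threshold check.
function confidenceColor(confidence: number): "green" | "yellow" | "red" {
  if (confidence > 0.9) return "green";   // high confidence, likely accurate
  if (confidence >= 0.7) return "yellow"; // medium confidence, review recommended
  return "red";                           // low confidence, likely needs refinement
}
```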
Step 5: Preview Tracks
Before accepting or rejecting, preview each track:
- Click a track row to select it
- Video seeks to the track's first frame
- All bounding boxes for that track appear on the video
- Press Space to play through the track with boxes animating
- Look for:
- Box drifting away from object
- Box jumping between frames
- Incorrect object labeling
- Missing frames (gaps in tracking)
Preview Controls
- Space: Play/pause the video
- → / ←: Step through frames manually
- Y: Accept this track (quick keyboard shortcut)
- N: Reject this track (quick keyboard shortcut)
- Esc: Exit preview without deciding
Step 6: Accept or Reject Tracks
For each track, decide whether to keep it:
Accept Track (✓)
Click the ✓ button or press Y to accept a track. The system:
- Converts the track to a bounding box sequence
- Marks tracked frames as keyframes (or decimated frames if enabled)
- Adds interpolation segments between keyframes
- Preserves tracking metadata (source, confidence, track ID)
- Adds the annotation to your current persona
The annotation appears in the annotations list and on the video.
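Conceptually, acceptance converts per-frame boxes into a keyframe sequence plus provenance metadata. The sketch below illustrates the steps listed above; all types and names are assumptions, not FOVEA internals:

```typescript
// Illustrative conversion of an accepted track; not FOVEA's actual code.
interface Keyframe { frame: number; box: [number, number, number, number] } // [x, y, w, h]

interface AcceptedAnnotation {
  keyframes: Keyframe[];
  metadata: { trackId: string; trackingSource: string; trackingConfidence: number };
}

function acceptTrack(
  boxesByFrame: Map<number, [number, number, number, number]>,
  trackId: string,
  trackingSource: string,
  trackingConfidence: number,
  decimationInterval = 1,
): AcceptedAnnotation {
  const frames = [...boxesByFrame.keys()].sort((a, b) => a - b);
  const keyframes: Keyframe[] = [];
  frames.forEach((frame, i) => {
    // Keep every Nth tracked frame (plus the last) as a keyframe;
    // interpolation segments fill the frames in between.
    if (i % decimationInterval === 0 || i === frames.length - 1) {
      keyframes.push({ frame, box: boxesByFrame.get(frame)! });
    }
  });
  return { keyframes, metadata: { trackId, trackingSource, trackingConfidence } };
}
```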
Reject Track (✗)
Click the ✗ button or press N to reject a track. The system:
- Removes the track from the candidates list
- Does not create an annotation
- Frees up UI space for remaining tracks
When to reject:
- Confidence is too low (below 60%)
- Track is clearly wrong (tracking background instead of object)
- Object is not relevant to your annotation goals
- Duplicate tracking (model detected same object twice)
Step 7: Refine Accepted Tracks
Once you accept a track, it becomes a regular bounding box sequence that you can edit:
Common Refinements
- Add keyframes where tracking drifted:
  - Seek to the frame where the box is off
  - Press K to add a keyframe
  - Adjust the box to match the object
- Remove keyframes with errors:
  - Seek to a bad keyframe
  - Press Delete to remove it
  - The system re-interpolates that segment
- Adjust interpolation mode:
  - Press I to open the interpolation mode selector
  - Choose a mode appropriate for the object's motion pattern
- Fix visibility ranges:
  - Press V to mark where the object truly leaves the frame
  - Remove tracking gaps that are model errors
- Change the label:
  - Click the annotation in the annotations list
  - Edit the label or type assignment
Step 8: Batch Accept Workflow
For videos with many objects, use this workflow:
- Run tracking on the full video
- Sort tracks by confidence (highest first)
- Accept all tracks above 90% confidence without preview
- Preview tracks 70-90% confidence and accept good ones
- Reject all tracks below 70% confidence
- Manually annotate missed objects (tracking didn't detect them)
This hybrid approach balances speed and accuracy.
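Expressed as code, the triage is a sort plus three confidence filters. A minimal sketch with an assumed candidate shape:

```typescript
// The batch triage above as a sort plus three filters; the candidate
// shape is an assumption, not a FOVEA type.
interface Candidate { trackId: string; confidence: number }

function triage(tracks: Candidate[]) {
  const sorted = [...tracks].sort((a, b) => b.confidence - a.confidence);
  return {
    autoAccept: sorted.filter((t) => t.confidence > 0.9),  // accept without preview
    review: sorted.filter((t) => t.confidence >= 0.7 && t.confidence <= 0.9),
    reject: sorted.filter((t) => t.confidence < 0.7),      // plus manual work for missed objects
  };
}
```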
Hybrid Workflow: Tracking + Manual
The most efficient workflow combines both approaches:
1. Easy Objects: Automated Tracking
- Run tracking for objects with clear visibility and simple motion
- Accept high-confidence tracks (>90%)
- Minimal refinement needed
2. Hard Objects: Manual Annotation
- Manually annotate objects that:
- Move too quickly for tracking
- Are heavily occluded
- Have complex appearance changes
- Were missed by the tracking model
3. Medium Objects: Tracking + Refinement
- Accept medium-confidence tracks (70-90%)
- Add keyframes where tracking drifted (usually 2-5 keyframes)
- Adjust interpolation for smooth motion
Efficiency Metrics
Research on professional annotation workflows shows:
- Automated only: 70% accuracy, 100% speed
- Manual only: 100% accuracy, 10% speed
- Hybrid (recommended): 95% accuracy, 40% speed
The hybrid approach gives you production-quality results at 4x the speed of pure manual annotation.
Decimation Deep Dive
Decimation reduces storage and processing by keeping only every Nth frame as a keyframe.
How It Works
Without decimation:
```
Tracking outputs:  frames 1, 2, 3, 4, 5, 6, 7, 8, 9, 10
Keyframes stored:  1, 2, 3, 4, 5, 6, 7, 8, 9, 10   (10 keyframes)
```
With decimation (interval=5):
```
Tracking outputs:  frames 1, 2, 3, 4, 5, 6, 7, 8, 9, 10
Keyframes stored:  1, 5, 10   (3 keyframes)
Playback:          1→2→…→10   (frames 2-4 and 6-9 reconstructed by linear interpolation)
```
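The same logic in code: keep the first frame plus every frame number divisible by the interval, then reconstruct dropped frames linearly on playback. The box format is an assumption:

```typescript
// Decimation and linear re-interpolation as sketched above.
type Box = { x: number; y: number; w: number; h: number };

function decimate(frames: number[], interval: number): number[] {
  // Keep the first frame plus every frame number divisible by the
  // interval, matching the example above (interval=5 keeps 1, 5, 10).
  return frames.filter((frame, i) => i === 0 || frame % interval === 0);
}

function lerpBox(a: Box, b: Box, t: number): Box {
  // Linear interpolation between two stored keyframes (t in [0, 1]).
  return {
    x: a.x + (b.x - a.x) * t,
    y: a.y + (b.y - a.y) * t,
    w: a.w + (b.w - a.w) * t,
    h: a.h + (b.h - a.h) * t,
  };
}

console.log(decimate([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], 5)); // [1, 5, 10]
```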
Choosing Decimation Interval
| Interval | Use Case | File Size | Accuracy |
|---|---|---|---|
| 1 | No decimation, store all frames | 100% | 100% |
| 3 | Small reduction, high fidelity | 33% | 98% |
| 5 | Balanced (recommended) | 20% | 95% |
| 10 | Large reduction, lower fidelity | 10% | 90% |
| 30 | Extreme reduction (30 fps = 1 keyframe/sec) | 3% | 80% |
For most cases, an interval of 5 frames provides good balance between file size and accuracy.
When to Avoid Decimation
Do not use decimation when:
- Object motion is erratic or non-linear
- Frame-perfect accuracy is required
- Export will be used for evaluation datasets
- Tracking is already sparse (few frames tracked)
Tracking Metadata
Accepted tracks preserve metadata for provenance:
```
{
  trackId: "track-42",
  trackingSource: "samurai",
  trackingConfidence: 0.95,
  perFrameConfidence: [0.98, 0.96, 0.94, ...],
  decimationInterval: 5
}
```
This metadata allows you to:
- Re-run tracking if results are poor
- Filter annotations by tracking source
- Analyze confidence trends over time
- Audit which annotations came from automation vs manual work
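For example, an audit script can separate automated annotations from manual ones by checking for tracking metadata. The annotation shape here is illustrative, not FOVEA's actual type:

```typescript
// Auditing with the preserved metadata; shapes are assumptions.
interface Annotation {
  id: string;
  metadata?: { trackingSource?: string; trackingConfidence?: number };
}

function auditAnnotations(annotations: Annotation[]) {
  // Anything with a trackingSource came from automation.
  const automated = annotations.filter((a) => a.metadata?.trackingSource !== undefined);
  const lowConfidence = automated.filter(
    (a) => (a.metadata!.trackingConfidence ?? 1) < 0.7,
  );
  return { automated, manualCount: annotations.length - automated.length, lowConfidence };
}
```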
Troubleshooting
Tracking Returns No Results
Problem: Tracking runs successfully but finds 0 tracks.
Solutions:
- Lower the confidence threshold to 0.5 or below
- Try a different tracking model (YOLO11n-seg is the most permissive)
- Check that the object is actually visible in the selected frame range
- Verify the model service is running (http://localhost:8000/docs)
Tracking Drifts Off Object
Problem: Track starts correctly but drifts to background or wrong object.
Solutions:
- Accept the track anyway, then add keyframes where it drifts
- Try a different tracking model (SAMURAI is designed to handle occlusion)
- Use shorter frame ranges (tracking accumulates error over time)
- Manually annotate the difficult segment
Duplicate Tracks for Same Object
Problem: Tracking model creates 2+ tracks for the same object.
Solutions:
- Accept the best track (highest confidence or longest duration)
- Reject the duplicate tracks
- Use ByteTrack or BoT-SORT models (designed for identity preservation)
Tracking Takes Too Long
Problem: Processing time is excessive for video length.
Solutions:
- Use a faster model (YOLO11n-seg or SAM2.1)
- Reduce frame range (track shorter segments)
- Enable decimation with larger interval (10 or 30)
- Check model service logs for errors slowing processing
Accepted Track Has Wrong Label
Problem: Track labeled as "person" but you want "player".
Solutions:
- Click the annotation in the annotations list
- Change the type assignment to the correct entity type
- Add a custom label in the annotation properties
- Labels from tracking are suggestions, not locked
Performance Tips
Optimize Frame Range
Instead of tracking the entire video:
- Identify segments where each object appears
- Run tracking on each segment separately
- Accept tracks per segment
- Combine or link annotations as needed
This approach is faster and reduces tracking drift.
Use Model Service Queue
If tracking multiple videos:
- Submit all tracking jobs at once
- Jobs queue in Redis (BullMQ)
- Monitor progress at http://localhost:3001/admin/queues
- Model service processes jobs in order
- Review results as each job completes
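A submission script might look like the sketch below. BullMQ's Queue API is real, but the queue name and job payload are assumptions about FOVEA's job schema, not its actual contract:

```typescript
import { Queue } from "bullmq";

// Queue name, connection, and payload fields are assumptions about
// FOVEA's job schema.
const trackingQueue = new Queue("tracking", {
  connection: { host: "localhost", port: 6379 },
});

async function submitAll(videoIds: string[]): Promise<void> {
  for (const videoId of videoIds) {
    // One job per video; BullMQ stores jobs in Redis and the model
    // service worker processes them in order.
    await trackingQueue.add("run-tracking", { videoId, model: "SAMURAI" });
  }
}
```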
GPU vs CPU Performance
Tracking performance depends on hardware:
| Model | CPU (frames/sec) | GPU (frames/sec) | Speedup |
|---|---|---|---|
| YOLO11n-seg | 10-15 | 60-100 | 6x |
| SAMURAI | 2-5 | 15-25 | 7x |
| SAM2Long | 1-3 | 8-15 | 8x |
| ByteTrack | 5-10 | 30-50 | 6x |
For production deployments with many videos, GPU mode is recommended. See GPU Mode Deployment.
Next Steps
- Bounding Box Sequences Guide: Refine accepted tracks
- Export Annotations: Include tracking metadata in exports
- Model Service Configuration: Customize tracking models
- GPU Deployment: Speed up tracking with GPU acceleration