
Kinetic.ai

AI Skill Coach — Full System Architecture

Edge + Cloud GPU · Multi-Agent AI · Real-Time Computer Vision · Voice Coaching

1
User Interface
📷 Camera
30fps video capture
🎤 Microphone
PCM 16kHz audio input
🔊 Speaker
24kHz voice output
🖥️ Screen
Visual scores + feedback
30fps video + 16kHz audio
2
Frontend
Next.js 14
React + TailwindCSS + shadcn/ui
Vercel · SSR · TypeScript
📡 WebSocket /ws/video
30fps camera stream → backend
base64 frames
📡 WebSocket /ws/audio
Bidirectional voice stream
PCM 16kHz
📡 WebSocket /ws/coaching
Live scores + feedback display
JSON events
📊 Score Ring + Joint Analysis + Rep Counter
Real-time visual coaching UI
base64 frames via WebSocket
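The /ws/video channel above carries base64-encoded frames. A minimal sketch of how a backend handler might turn one text message back into image bytes, assuming each message is a bare base64 string or a data URL (the message shape and the `decode_frame` helper are assumptions, not taken from the diagram):

```python
import base64
import binascii


def decode_frame(payload: str) -> bytes:
    """Decode one base64 frame message into raw image bytes.

    Hypothetical helper: accepts a bare base64 string or a data URL
    such as 'data:image/jpeg;base64,<data>'.
    """
    if payload.startswith("data:"):
        # Drop the 'data:<mime>;base64,' prefix and keep only the payload.
        _, _, payload = payload.partition(",")
    try:
        # validate=True rejects characters outside the base64 alphabet.
        return base64.b64decode(payload, validate=True)
    except binascii.Error as exc:
        raise ValueError("frame is not valid base64") from exc
```

The returned bytes would still need to be decoded by an image library before pose estimation; that step is left out here to keep the sketch dependency-free.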
3
Backend Intelligence — FastAPI
🖥️ FastAPI Backend
Python 3.12 — 44 REST routes, 3 WebSocket endpoints
Tiangolo · uvicorn · async
🔍 Computer Vision Pipeline
YOLO11n
Person detection · 5.4MB · 15 FPS
Ultralytics
MediaPipe Pose
33 body landmarks · 5.6MB · 30 FPS
Google
MediaPipe Hands
21 hand landmarks per hand · 30 FPS
Google
ByteTrack
Multi-person tracking
ByteDance
🎯 Triple-Metric Pose Scoring
Gaussian Kernel Scoring
16 joint angles scored individually
Cosine Spatial Similarity
Global body orientation match
COCO OKS
Industry-standard keypoint similarity
Final Score
0.5 Gaussian + 0.3 Cosine + 0.2 OKS
DTW alignment · phase detect · rep count
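The weighted blend above (0.5 Gaussian + 0.3 Cosine + 0.2 OKS) can be sketched in plain Python. This is an illustration under stated assumptions, not the project's implementation: the kernel width `sigma_deg`, the clamping of negative cosine values to zero, and all function names are hypothetical. The OKS term follows the COCO evaluation formula without visibility flags.

```python
import math


def gaussian_angle_score(user_deg, ref_deg, sigma_deg=15.0):
    """Score one joint angle in (0, 1] with a Gaussian kernel.

    sigma_deg is an assumed tolerance; the source does not state one.
    """
    diff = abs(user_deg - ref_deg) % 360.0
    diff = min(diff, 360.0 - diff)  # shortest angular distance
    return math.exp(-(diff ** 2) / (2.0 * sigma_deg ** 2))


def cosine_similarity(a, b):
    """Cosine similarity of two flattened pose vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def oks(user_pts, ref_pts, area, kappas):
    """COCO-style object keypoint similarity, visibility flags omitted.

    Each keypoint contributes exp(-d^2 / (2 * area * k^2)).
    """
    terms = []
    for (ux, uy), (rx, ry), k in zip(user_pts, ref_pts, kappas):
        d2 = (ux - rx) ** 2 + (uy - ry) ** 2
        terms.append(math.exp(-d2 / (2.0 * area * k ** 2)))
    return sum(terms) / len(terms)


def final_score(gaussian, cosine, oks_value):
    """Blend the three metrics with the weights from the diagram."""
    # Assumption: negative cosine values are clamped so the result stays in [0, 1].
    return 0.5 * gaussian + 0.3 * max(0.0, cosine) + 0.2 * oks_value
```

In this sketch the Gaussian input to `final_score` would be the mean of the 16 per-joint scores, which is itself an assumption about how the individual joints are combined.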
✨ AI Expert Generation (4-Tier)
Tier 1
Semantic alias lookup · 53 aliases · 0ms
Tier 2
Claude semantic mapping · 0.5s
Anthropic
Tier 3
Claude angle generation · 1-2s
Anthropic
Tier 4
DGX Spark + Modal A100 motion gen · 5-15s
NVIDIA
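The four tiers read as a fallback chain: try the cheapest resolver first and escalate only when it finds nothing. A minimal sketch of that control flow, assuming each tier is a callable that returns a reference pose or `None`. The alias table, the reference library and every name here are hypothetical; the Claude and GPU tiers would be supplied by the caller.

```python
from typing import Callable, Optional, Sequence, Tuple

Resolver = Callable[[str], Optional[dict]]

# Hypothetical data; the real system is said to hold 53 aliases.
ALIASES = {"air squat": "squat", "bodyweight squat": "squat"}
REFERENCE_LIBRARY = {"squat": {"knee_angle_deg": 90.0}}


def tier1_alias_lookup(query: str) -> Optional[dict]:
    """Tier 1: in-memory alias lookup, no network call."""
    key = query.strip().lower()
    canonical = ALIASES.get(key, key)
    return REFERENCE_LIBRARY.get(canonical)


def resolve_expert(query: str, tiers: Sequence[Resolver]) -> Tuple[int, dict]:
    """Try each tier in order and return (tier number, result) for the first hit."""
    for level, resolver in enumerate(tiers, start=1):
        result = resolver(query)
        if result is not None:
            return level, result
    raise LookupError(f"no tier could resolve {query!r}")
```

Ordering the tiers by latency means common requests stay on the 0ms path and only unfamiliar skills pay for the slower model calls.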
🔁 Coaching Loop (every 10s)
Score Aggregation
Gather scores + reps + trend + corrections
Prompt Builder
Punchy coaching prompt · max 15 words
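The two steps above (aggregate, then build a short prompt) could look roughly like this. It is a sketch only: the 0-100 score scale, the trend thresholds, the prompt wording and all function names are assumptions. The 15-word cap comes from the diagram.

```python
from statistics import mean

MAX_WORDS = 15  # cap stated in the diagram


def aggregate(scores, reps, corrections):
    """Summarise one coaching window (assumed: scores on a 0-100 scale)."""
    half = len(scores) // 2
    # Trend compares the second half of the window with the first half.
    delta = mean(scores[half:]) - mean(scores[:half]) if half else 0.0
    if delta > 2.0:       # thresholds are illustrative
        trend = "improving"
    elif delta < -2.0:
        trend = "declining"
    else:
        trend = "steady"
    return {
        "avg_score": round(mean(scores), 1) if scores else 0.0,
        "reps": reps,
        "trend": trend,
        "corrections": corrections[:2],  # keep the cue short
    }


def build_prompt(summary):
    """Turn the summary into an instruction for the voice model."""
    fixes = "; ".join(summary["corrections"]) or "none"
    return (
        f"Score {summary['avg_score']}, reps {summary['reps']}, "
        f"trend {summary['trend']}, fix: {fixes}. "
        f"Reply with one punchy coaching cue, max {MAX_WORDS} words."
    )


def enforce_word_limit(text):
    """Hard-truncate a model reply in case it ignores the word cap."""
    return " ".join(text.split()[:MAX_WORDS])
```

A scheduler would call `aggregate` and `build_prompt` once per 10-second window and pass the result to the voice channel.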
coaching prompt + data
voice cues
generate motion (HTTP)
4
AI Services
🧠 Anthropic Claude — AI Brain
🧠 Claude Sonnet 4
Main orchestrator — routes tasks to sub-agents
Anthropic · Agent SDK
👁️ Perception Sub-Agent
11 MCP tools — spatial analysis, pose check
MCP
🏋️ Coach Sub-Agent
14 MCP tools — form comparison, quality
MCP
📈 Progress Sub-Agent
10 MCP tools — goals, memory, plans
MCP
🔧 44 MCP Tools
Model Context Protocol — agent ↔ tool
Anthropic
🛡️ 3 Agent Hooks
Safety guard · Audit log · Session summary
🎙️ OpenAI — Voice AI
🎙️ GPT-4o Realtime Preview
Bidirectional voice coaching · alloy voice
OpenAI · PCM 16kHz in · 24kHz out
3-Layer Interruption System
Layer 1: Server VAD
50ms speech detection
Layer 2: State Machine
No overlap — clean turn-taking
Layer 3: Single Voice Source
Proactive + reactive coaching merged
🔊 TTS Fallback
Browser speechSynthesis backup
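The three interruption layers can be pictured as one small turn-taking state machine: VAD events drive the user's turn, and every coach utterance, proactive or reactive, goes through a single gate. The sketch below is illustrative only. The state names, the method names and the `cancelled` counter are hypothetical, and real playback cancellation is not shown.

```python
from enum import Enum, auto


class Turn(Enum):
    IDLE = auto()
    USER_SPEAKING = auto()
    COACH_SPEAKING = auto()


class TurnManager:
    """Single gate for all coach speech so voices never overlap."""

    def __init__(self):
        self.state = Turn.IDLE
        self.cancelled = 0  # coach utterances cut off by the user

    def on_user_speech_start(self):
        """Layer 1: VAD detected speech; the user always wins the floor."""
        if self.state is Turn.COACH_SPEAKING:
            self.cancelled += 1  # a real system would stop playback here
        self.state = Turn.USER_SPEAKING

    def on_user_speech_stop(self):
        if self.state is Turn.USER_SPEAKING:
            self.state = Turn.IDLE

    def request_speak(self, source: str) -> bool:
        """Layers 2 and 3: proactive and reactive cues share this one entry point."""
        if self.state is not Turn.IDLE:
            return False  # someone already has the floor
        self.state = Turn.COACH_SPEAKING
        return True

    def on_coach_done(self):
        if self.state is Turn.COACH_SPEAKING:
            self.state = Turn.IDLE
```

Routing both kinds of coaching through `request_speak` is what keeps a scheduled cue from talking over a reply to the user's question.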
generate squat motion → edge → cloud GPU
5
GPU Compute
⚡ NVIDIA DGX Spark — Edge AI
🟢 GB10 Superchip
Grace ARM CPU (20 cores) + Blackwell GPU
NVIDIA
YOLOv8n-pose
17-keypoint pose estimation on device
Ultralytics
POST /predict
Real-time pose from camera frames
POST /generate_motion
Proxies to Modal A100 cloud GPU
⚡ Edge inference — low latency, on-premise
HTTP
3D skeleton
☁️ Modal + NVIDIA A100 — Cloud GPU
☁️ NVIDIA A100
40-80GB VRAM · Serverless cloud GPU
Modal · $530 credits
HY-Motion 1.0-Lite
SOTA text→3D motion · Dec 2025 · DiT + Flow Matching
Tencent · 0.46B params · 1.84GB
Pipeline
Text prompt → SMPL 22-joint 3D → MediaPipe 33-point 2D
☁️ Serverless — scales to zero, pay per use
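The last pipeline step, SMPL 22-joint 3D to MediaPipe 33-point 2D, needs a joint mapping and a projection. The sketch below assumes the commonly used SMPL body-joint ordering, an orthographic projection that drops z, and a mapping of only the 12 limb joints the two skeletons share. Face, hand and foot landmarks are left as `None`. None of this is confirmed by the diagram, and the names are hypothetical.

```python
# Assumed SMPL index -> MediaPipe Pose landmark index (limb joints only).
SMPL_TO_MEDIAPIPE = {
    16: 11, 17: 12,  # left / right shoulder
    18: 13, 19: 14,  # left / right elbow
    20: 15, 21: 16,  # left / right wrist
    1: 23, 2: 24,    # left / right hip
    4: 25, 5: 26,    # left / right knee
    7: 27, 8: 28,    # left / right ankle
}


def smpl_to_mediapipe_2d(joints3d):
    """Project 22 (x, y, z) joints to 33 normalised (x, y) landmarks.

    Orthographic sketch: z is dropped, y is flipped so it grows downward
    as in image coordinates, and both axes share one scale.
    """
    if len(joints3d) != 22:
        raise ValueError("expected 22 SMPL joints")
    xs = [j[0] for j in joints3d]
    ys = [j[1] for j in joints3d]
    min_x, max_y = min(xs), max(ys)
    # One shared scale keeps the aspect ratio; fall back to 1.0 for a degenerate pose.
    span = max(max(xs) - min_x, max_y - min(ys)) or 1.0
    landmarks = [None] * 33
    for smpl_idx, mp_idx in SMPL_TO_MEDIAPIPE.items():
        x, y, _z = joints3d[smpl_idx]
        landmarks[mp_idx] = ((x - min_x) / span, (max_y - y) / span)
    return landmarks
```

A perspective camera model or interpolated face and foot landmarks could replace these simplifications; the scoring stage only needs whichever landmarks it actually compares.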
Technology Stack — Companies & Models
Anthropic: Claude Sonnet 4 · Agent SDK · 44 MCP Tools
OpenAI: GPT-4o Realtime · Voice API
NVIDIA: DGX Spark GB10 · A100 GPU
Modal: Serverless A100 Cloud GPU
Tencent: HY-Motion 1.0-Lite (0.46B)
Google: MediaPipe Pose + Hands
Ultralytics: YOLO11n · YOLOv8n-pose
1
NVIDIA DGX Spark
Edge AI · GB10 Superchip
2
Modal + A100
Cloud GPU · HY-Motion 1.0
3
Anthropic Claude
3 Agents · 44 MCP Tools
4
OpenAI Realtime
GPT-4o Voice · 3-Layer
5
Google + Ultralytics
MediaPipe + YOLO CV
6
Tencent HY-Motion
0.46B · SOTA Motion Gen
Kinetic.ai Architecture v1.0 — AI Skill Coach with Edge + Cloud GPU