Kinetic.ai
AI Skill Coach — Full System Architecture
Edge + Cloud GPU · Multi-Agent AI · Real-Time Computer Vision · Voice Coaching

1. User Interface
📷 Camera · 30 fps video capture
🎤 Microphone · 16 kHz PCM audio input
🔊 Speaker · 24 kHz voice output
🖥️ Screen · visual scores + feedback
↓ 30 fps video + 16 kHz audio

2. Frontend
⚡ Next.js 14 · React + TailwindCSS + shadcn/ui · Vercel · SSR · TypeScript
📡 WebSocket /ws/video · 30 fps camera stream → backend (base64 frames)
📡 WebSocket /ws/audio · bidirectional voice stream (PCM 16 kHz)
📡 WebSocket /ws/coaching · live scores + feedback display (JSON events)
📊 Score Ring + Joint Analysis + Rep Counter · real-time visual coaching UI
↓ base64 frames via WebSocket

3. Backend Intelligence: FastAPI
🖥️ FastAPI Backend · Python 3.12 · 44 REST routes, 3 WebSocket endpoints · Tiangolo · uvicorn · async
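The three WebSocket endpoints carry base64 video frames, 16 kHz PCM audio, and JSON coaching events. The helpers below sketch the per-message work a FastAPI WebSocket handler could delegate to. The data-URL prefix handling, the `coaching_event` field names, and all function names are assumptions for illustration, not the project's actual message schema.

```python
import base64
import json

# Browsers using canvas.toDataURL() prepend "data:image/jpeg;base64,"; assumed possible here.
_DATA_URL_SEP = ";base64,"


def decode_video_frame(message: str) -> bytes:
    """Decode one /ws/video text message (bare base64 or data URL) into image bytes."""
    if message.startswith("data:") and _DATA_URL_SEP in message:
        message = message.split(_DATA_URL_SEP, 1)[1]
    return base64.b64decode(message, validate=True)  # rejects non-base64 characters


def pcm16_duration_ms(chunk: bytes, sample_rate: int = 16_000) -> float:
    """Duration of a mono 16-bit PCM chunk from /ws/audio (2 bytes per sample)."""
    if len(chunk) % 2:
        raise ValueError("PCM16 chunk must contain whole 2-byte samples")
    return 1000.0 * (len(chunk) // 2) / sample_rate


def coaching_event(score: float, reps: int, feedback: str) -> str:
    """Serialize a hypothetical /ws/coaching JSON event (field names are assumptions)."""
    return json.dumps({"type": "score", "score": round(score, 1),
                       "reps": reps, "feedback": feedback})
```

At 16 kHz mono with 2 bytes per sample, the audio socket carries 32,000 bytes/s, so a 3,200-byte chunk covers 100 ms.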
🔍 Computer Vision Pipeline
YOLO11n · person detection · 5.4 MB · 15 FPS · Ultralytics
MediaPipe Pose · 33 body landmarks · 5.6 MB · 30 FPS · Google
MediaPipe Hands · 21 hand landmarks per hand · 30 FPS · Google
ByteTrack · multi-person tracking · ByteDance
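The pose landmarks feed the joint-angle scoring that follows. Which 16 angles are used, and whether MediaPipe's z coordinate is included, is not stated; the sketch below computes one 2D angle (left knee) from MediaPipe's documented indices 23 (left hip), 25 (left knee) and 27 (left ankle). `joint_angle` and `left_knee_angle` are hypothetical names.

```python
import math
from typing import Sequence

Point = Sequence[float]  # (x, y), e.g. MediaPipe normalized image coordinates

# MediaPipe Pose landmark indices for the left leg (33-landmark topology).
LEFT_HIP, LEFT_KNEE, LEFT_ANKLE = 23, 25, 27


def joint_angle(a: Point, b: Point, c: Point) -> float:
    """Interior angle at vertex b, in degrees, between segments b->a and b->c."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    n1, n2 = math.hypot(*v1), math.hypot(*v2)
    if n1 == 0 or n2 == 0:
        raise ValueError("degenerate joint: coincident landmarks")
    cos_theta = (v1[0] * v2[0] + v1[1] * v2[1]) / (n1 * n2)
    # Clamp to guard against floating-point drift just outside [-1, 1].
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_theta))))


def left_knee_angle(landmarks: Sequence[Point]) -> float:
    """About 180 degrees with a straight leg, smaller as the knee bends."""
    return joint_angle(landmarks[LEFT_HIP], landmarks[LEFT_KNEE], landmarks[LEFT_ANKLE])
```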
🎯 Triple-Metric Pose Scoring
Gaussian Kernel Scoring · 16 joint angles scored individually
Cosine Spatial Similarity · global body orientation match
COCO OKS · industry-standard keypoint similarity
Final Score = 0.5 × Gaussian + 0.3 × Cosine + 0.2 × OKS
DTW alignment · phase detection · rep counting
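The three metrics and the 0.5 / 0.3 / 0.2 blend can be sketched as follows, together with a plain DTW distance for aligning a user's angle sequence against the expert's. Assumptions, not taken from the architecture: the 15° kernel width (`sigma_deg`), centering keypoints before the cosine and clipping negative cosines to 0, the 0-100 output scale, and the OKS `area` argument. The OKS formula and per-keypoint sigmas follow the COCO keypoint evaluation.

```python
import math
from typing import Sequence

Kp = tuple[float, float]  # (x, y) in pixels

# Per-keypoint COCO sigmas in COCO order (nose, eyes, ears, shoulders, elbows,
# wrists, hips, knees, ankles), the constants used for COCO OKS evaluation.
COCO_SIGMAS = [0.026, 0.025, 0.025, 0.035, 0.035, 0.079, 0.079, 0.072, 0.072,
               0.062, 0.062, 0.107, 0.107, 0.087, 0.087, 0.089, 0.089]


def gaussian_angle_score(user: Sequence[float], expert: Sequence[float],
                         sigma_deg: float = 15.0) -> float:
    """Mean per-joint kernel exp(-d^2 / (2 sigma^2)); sigma_deg is an assumed tolerance."""
    kernels = [math.exp(-((u - e) ** 2) / (2 * sigma_deg ** 2))
               for u, e in zip(user, expert)]
    return sum(kernels) / len(kernels)


def _centered_flat(kps: Sequence[Kp]) -> list[float]:
    """Subtract the centroid so the comparison ignores where the body sits in frame."""
    cx = sum(x for x, _ in kps) / len(kps)
    cy = sum(y for _, y in kps) / len(kps)
    return [v for x, y in kps for v in (x - cx, y - cy)]


def cosine_pose_score(user: Sequence[Kp], expert: Sequence[Kp]) -> float:
    """Cosine similarity of centered, flattened keypoints; negatives clipped to 0."""
    a, b = _centered_flat(user), _centered_flat(expert)
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return max(0.0, dot / (na * nb)) if na and nb else 0.0


def oks(user: Sequence[Kp], expert: Sequence[Kp], visible: Sequence[bool],
        area: float) -> float:
    """COCO OKS: mean over visible keypoints of exp(-d^2 / (2 * area * k^2)), k = 2 * sigma."""
    total, n = 0.0, 0
    for (ux, uy), (ex, ey), vis, sigma in zip(user, expert, visible, COCO_SIGMAS):
        if not vis:
            continue
        d2 = (ux - ex) ** 2 + (uy - ey) ** 2
        total += math.exp(-d2 / (2 * area * (2 * sigma) ** 2))
        n += 1
    return total / n if n else 0.0


def final_score(gaussian: float, cosine: float, oks_value: float) -> float:
    """Weighted blend from the architecture; the 0-100 scaling is an assumption."""
    return 100 * (0.5 * gaussian + 0.3 * cosine + 0.2 * oks_value)


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Classic dynamic time warping with |x - y| local cost, O(len(a) * len(b))."""
    inf = float("inf")
    cost = [[inf] * (len(b) + 1) for _ in range(len(a) + 1)]
    cost[0][0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[len(a)][len(b)]
```

DTW lets a slow rep and a fast rep of the same movement align with near-zero cost, which is why it pairs naturally with phase detection and rep counting.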
✨ AI Expert Generation (4-Tier)
Tier 1 · semantic alias lookup · 53 aliases · 0 ms
Tier 2 · Claude semantic mapping · 0.5 s · Anthropic
Tier 3 · Claude angle generation · 1-2 s · Anthropic
Tier 4 · DGX Spark + Modal A100 motion generation · 5-15 s · NVIDIA
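The four tiers trade latency for coverage, which suggests a cheapest-first cascade that stops at the first tier returning a usable reference. The sketch below assumes that reading. `Tier`, `generate_expert`, `alias_lookup` and the two tiny tables are hypothetical stand-ins: the real Tier 1 table has 53 aliases, and Tiers 2-4 would wrap Claude calls and the GPU motion service.

```python
from dataclasses import dataclass
from typing import Callable, Optional

ExpertPose = dict[str, float]  # hypothetical: joint name -> target angle (degrees)


@dataclass
class Tier:
    name: str
    resolve: Callable[[str], Optional[ExpertPose]]  # None means "miss, try next tier"


def generate_expert(skill: str, tiers: list[Tier]) -> tuple[str, ExpertPose]:
    """Try tiers cheapest-first and return (tier name, pose) from the first hit."""
    for tier in tiers:
        try:
            pose = tier.resolve(skill)
        except Exception:  # a timeout or API error in a remote tier counts as a miss
            pose = None
        if pose is not None:
            return tier.name, pose
    raise LookupError(f"no tier produced an expert reference for {skill!r}")


# Tier 1 stand-in: alias table + reference library (illustrative entries only).
ALIASES = {"squats": "squat", "air squat": "squat"}
LIBRARY: dict[str, ExpertPose] = {"squat": {"left_knee": 90.0, "right_knee": 90.0}}


def alias_lookup(skill: str) -> Optional[ExpertPose]:
    """Normalize the name, resolve an alias if one exists, then look up the library."""
    key = skill.lower().strip()
    return LIBRARY.get(ALIASES.get(key, key))
```

Treating a failed remote call as a miss keeps a Claude outage from blocking the session: the cascade simply falls through to the next tier.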
🔁 Coaching Loop (every 10 s)
Score Aggregation · gathers scores + reps + trend + corrections
Prompt Builder · punchy coaching prompt · max 15 words
↓ coaching prompt + data · voice cues · generate motion (HTTP)
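Every 10 s the loop condenses the window into a few numbers and asks for a short cue. A minimal sketch, assuming a ±2-point dead band for the trend and a most-frequent-correction rule; `Window`, `summarize`, `build_prompt` and `enforce_word_cap` are hypothetical names, and the prompt wording is illustrative.

```python
import statistics
from collections import Counter
from dataclasses import dataclass, field
from typing import Optional

MAX_CUE_WORDS = 15  # "max 15 words" per the architecture


@dataclass
class Window:
    """One 10-second aggregation window (hypothetical structure)."""
    scores: list[float] = field(default_factory=list)
    reps: int = 0
    corrections: list[str] = field(default_factory=list)


def summarize(window: Window, prev_mean: Optional[float], band: float = 2.0) -> dict:
    """Mean score, reps, trend vs. the previous window, and the most frequent correction."""
    mean = statistics.fmean(window.scores) if window.scores else 0.0
    if prev_mean is None or abs(mean - prev_mean) <= band:
        trend = "steady"
    else:
        trend = "improving" if mean > prev_mean else "slipping"
    top = Counter(window.corrections).most_common(1)
    return {"mean": round(mean, 1), "reps": window.reps, "trend": trend,
            "top_correction": top[0][0] if top else None}


def build_prompt(summary: dict) -> str:
    """Compact instruction for the language model."""
    parts = [f"Score {summary['mean']}/100, {summary['reps']} reps, trend {summary['trend']}."]
    if summary["top_correction"]:
        parts.append(f"Main fault: {summary['top_correction']}.")
    parts.append(f"Reply with one punchy coaching cue of at most {MAX_CUE_WORDS} words.")
    return " ".join(parts)


def enforce_word_cap(cue: str, limit: int = MAX_CUE_WORDS) -> str:
    """Hard backstop in case the model overshoots the word limit."""
    return " ".join(cue.split()[:limit])
```

Stating the limit in the prompt and truncating the reply are complementary: the first shapes the cue, the second guarantees the voice layer never reads out a long paragraph.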

4. AI Services
🧠 Anthropic Claude · AI Brain
🧠 Claude Sonnet 4 · main orchestrator, routes tasks to sub-agents · Anthropic · Agent SDK
👁️ Perception Sub-Agent · 11 MCP tools: spatial analysis, pose check · MCP
🏋️ Coach Sub-Agent · 14 MCP tools: form comparison, quality · MCP
📈 Progress Sub-Agent · 10 MCP tools: goals, memory, plans · MCP
🔧 44 MCP Tools · Model Context Protocol (agent ↔ tool) · Anthropic
🛡️ 3 Agent Hooks · safety guard · audit log · session summary

🎙️ OpenAI · Voice AI
🎙️ GPT-4o Realtime Preview · bidirectional voice coaching · alloy voice · OpenAI · PCM 16 kHz in, 24 kHz out
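The voice path lists 16 kHz PCM in and 24 kHz out. OpenAI's Realtime API documents its `pcm16` format as 16-bit, 24 kHz, mono, little-endian, so a 16 kHz microphone stream would need upsampling somewhere before it is sent; where this project does that is not stated. A minimal linear-interpolation sketch; `pcm16_resample` is a hypothetical helper, and a production path might prefer a polyphase resampler.

```python
import array
import sys


def pcm16_resample(pcm: bytes, src_rate: int = 16_000, dst_rate: int = 24_000) -> bytes:
    """Linear-interpolation resampling of mono little-endian 16-bit PCM."""
    samples = array.array("h")
    samples.frombytes(pcm)          # raises ValueError on an odd byte count
    if sys.byteorder == "big":
        samples.byteswap()          # wire format is little-endian
    n_in = len(samples)
    if n_in == 0:
        return b""
    n_out = n_in * dst_rate // src_rate
    out = array.array("h")
    for i in range(n_out):
        # Exact source position is (i * src_rate) / dst_rate; integer math avoids drift.
        j, rem = divmod(i * src_rate, dst_rate)
        frac = rem / dst_rate
        nxt = samples[j + 1] if j + 1 < n_in else samples[j]  # hold the last sample
        value = samples[j] + (nxt - samples[j]) * frac
        out.append(max(-32768, min(32767, round(value))))
    if sys.byteorder == "big":
        out.byteswap()
    return out.tobytes()
```

For 16 kHz to 24 kHz every 2 input samples become 3 output samples, so a 100 ms chunk grows from 3,200 to 4,800 bytes.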
⚡ 3-Layer Interruption System
Layer 1: Server VAD · 50 ms speech detection
Layer 2: State Machine · no overlap, clean turn-taking
Layer 3: Single Voice Source · proactive + reactive coaching merged
🔊 TTS Fallback · browser speechSynthesis backup
↓ generate squat motion → edge → cloud GPU
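The three interruption layers can be read as one turn-taking state machine: VAD events drive barge-in, the machine forbids overlapping speech, and proactive cues from the coaching loop wait behind the current turn so only one voice ever speaks. A minimal sketch under those assumptions; `Turn`, `TurnManager` and its methods are hypothetical. In OpenAI's Realtime API the VAD signals would arrive as `input_audio_buffer.speech_started` and `input_audio_buffer.speech_stopped` events; the 50 ms timing and the TTS fallback are outside this sketch.

```python
from enum import Enum, auto
from typing import Optional


class Turn(Enum):
    IDLE = auto()
    USER_SPEAKING = auto()
    COACH_SPEAKING = auto()


class TurnManager:
    """Layer 2 sketch: exactly one speaker at a time, one shared coach voice."""

    def __init__(self) -> None:
        self.state = Turn.IDLE
        self.pending_cue: Optional[str] = None  # newest proactive cue awaiting a gap
        self.actions: list[str] = []            # side effects, recorded for inspection

    def on_user_speech_started(self) -> None:
        """Layer 1 (VAD) barge-in: stop any coach audio immediately."""
        if self.state is Turn.COACH_SPEAKING:
            self.actions.append("cancel_playback")
        self.state = Turn.USER_SPEAKING

    def on_user_speech_stopped(self) -> None:
        """Layer 1 (VAD) end of user turn: the coach answers reactively."""
        if self.state is Turn.USER_SPEAKING:
            self.state = Turn.COACH_SPEAKING
            self.actions.append("respond_to_user")

    def on_proactive_cue(self, cue: str) -> None:
        """Layer 3: coaching-loop cues share the single voice and never interrupt."""
        if self.state is Turn.IDLE:
            self._speak(cue)
        else:
            self.pending_cue = cue  # keep only the newest; older cues are stale

    def on_coach_audio_done(self) -> None:
        """End of the coach's turn; flush a waiting cue if there is one."""
        if self.state is not Turn.COACH_SPEAKING:
            return
        self.state = Turn.IDLE
        if self.pending_cue is not None:
            cue, self.pending_cue = self.pending_cue, None
            self._speak(cue)

    def _speak(self, cue: str) -> None:
        self.state = Turn.COACH_SPEAKING
        self.actions.append(f"speak:{cue}")
```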

5. GPU Compute
⚡ NVIDIA DGX Spark · Edge AI
🟢 GB10 Superchip · Grace ARM CPU (20 cores) + Blackwell GPU · NVIDIA
YOLOv8n-pose · 17-keypoint pose estimation on device · Ultralytics
POST /predict · real-time pose from camera frames
POST /generate_motion · proxies to Modal A100 cloud GPU
⚡ Edge inference: low latency, on-premise
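YOLOv8n-pose on the edge returns 17 keypoints in COCO order, while the backend scores against MediaPipe's 33 landmarks. How the project reconciles the two is not stated; one option is to place the COCO points into MediaPipe's index layout so the same scoring code can consume either source. `COCO_TO_MEDIAPIPE` and `coco17_to_mediapipe33` are hypothetical names; the index correspondences follow the published COCO and MediaPipe Pose keypoint orders.

```python
from typing import Optional, Sequence

Point = tuple[float, float]

# COCO-17 keypoint index (YOLOv8-pose output order) -> MediaPipe Pose landmark index.
COCO_TO_MEDIAPIPE = {
    0: 0,             # nose
    1: 2, 2: 5,       # left / right eye (MediaPipe eye centers)
    3: 7, 4: 8,       # left / right ear
    5: 11, 6: 12,     # shoulders
    7: 13, 8: 14,     # elbows
    9: 15, 10: 16,    # wrists
    11: 23, 12: 24,   # hips
    13: 25, 14: 26,   # knees
    15: 27, 16: 28,   # ankles
}


def coco17_to_mediapipe33(kps: Sequence[Point]) -> list[Optional[Point]]:
    """Place 17 COCO keypoints into a 33-slot MediaPipe layout; unmatched slots stay None."""
    if len(kps) != 17:
        raise ValueError(f"expected 17 COCO keypoints, got {len(kps)}")
    out: list[Optional[Point]] = [None] * 33
    for coco_idx, mp_idx in COCO_TO_MEDIAPIPE.items():
        out[mp_idx] = kps[coco_idx]
    return out
```

The 16 MediaPipe landmarks with no COCO counterpart (inner/outer eyes, mouth, hand points, heels, foot index) stay `None`, so any angle that depends on them must be skipped for edge-sourced poses.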
↓ HTTP
↑ 3D skeleton

☁️ Modal + NVIDIA A100 · Cloud GPU
☁️ NVIDIA A100 · 40-80 GB VRAM · serverless cloud GPU · Modal · $530 credits
✨ HY-Motion 1.0-Lite · SOTA text→3D motion · Dec 2025 · DiT + Flow Matching · Tencent · 0.46B params · 1.84 GB
Pipeline · text prompt → SMPL 22-joint 3D → MediaPipe 33-point 2D
☁️ Serverless: scales to zero, pay per use
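The last pipeline step turns HY-Motion's 3D joints into the 33-landmark 2D layout the scoring code consumes. A minimal sketch under stated assumptions: joints arrive in standard SMPL order (0 = pelvis through 21 = right wrist), y points up, the view is an orthographic projection from a camera in front of the subject, and the head and foot joints stand in for MediaPipe's nose and foot-index landmarks. `SMPL_TO_MEDIAPIPE` and `smpl22_to_mediapipe33` are hypothetical; landmarks with no SMPL counterpart stay `None`.

```python
from typing import Optional, Sequence

Vec3 = tuple[float, float, float]
Point2 = tuple[float, float]

# Standard SMPL body-joint index -> MediaPipe Pose landmark index.
# Head->nose and foot->foot_index are approximations, not exact correspondences.
SMPL_TO_MEDIAPIPE = {
    15: 0,            # head -> nose
    16: 11, 17: 12,   # shoulders
    18: 13, 19: 14,   # elbows
    20: 15, 21: 16,   # wrists
    1: 23, 2: 24,     # hips
    4: 25, 5: 26,     # knees
    7: 27, 8: 28,     # ankles
    10: 31, 11: 32,   # feet -> foot index
}


def smpl22_to_mediapipe33(joints: Sequence[Vec3], margin: float = 0.1) -> list[Optional[Point2]]:
    """Orthographic front view: drop z, flip y (assumed y-up), fit into [margin, 1 - margin]."""
    if len(joints) != 22:
        raise ValueError(f"expected 22 SMPL joints, got {len(joints)}")
    xs = [p[0] for p in joints]
    ys = [-p[1] for p in joints]  # image y grows downward
    # A single scale for both axes keeps body proportions intact.
    span = max(max(xs) - min(xs), max(ys) - min(ys)) or 1.0
    scale = (1.0 - 2.0 * margin) / span
    cx, cy = (max(xs) + min(xs)) / 2, (max(ys) + min(ys)) / 2
    out: list[Optional[Point2]] = [None] * 33
    for smpl_idx, mp_idx in SMPL_TO_MEDIAPIPE.items():
        out[mp_idx] = (0.5 + (xs[smpl_idx] - cx) * scale,
                       0.5 + (ys[smpl_idx] - cy) * scale)
    return out
```

Applied frame by frame, this yields an expert reference sequence in the same normalized coordinates MediaPipe reports for the user.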

Technology Stack: Companies & Models
Anthropic · Claude Sonnet 4 · Agent SDK · 44 MCP Tools
OpenAI · GPT-4o Realtime · Voice API
NVIDIA · DGX Spark GB10 · A100 GPU
Modal · Serverless A100 Cloud GPU
Tencent · HY-Motion 1.0-Lite (0.46B)
Google · MediaPipe Pose + Hands
Ultralytics · YOLO11n · YOLOv8n-pose

1. NVIDIA DGX Spark · Edge AI · GB10 Superchip
2. Modal + A100 · Cloud GPU · HY-Motion 1.0
3. Anthropic Claude · 3 Agents · 44 MCP Tools
4. OpenAI Realtime · GPT-4o Voice · 3-Layer Interruption
5. Google + Ultralytics · MediaPipe + YOLO CV
6. Tencent HY-Motion · 0.46B · SOTA Motion Gen
Kinetic.ai Architecture v1.0 — AI Skill Coach with Edge + Cloud GPU