Kinetic
Real-Time Physical Movement Intelligence
A cognitive layer for understanding, improving, and preserving human physical capability. From skill coaching to fall detection to PT rehab — Kinetic perceives, reasons about, and enhances human movement in real-time.
4 Intelligence Modes
One platform, four ways to enhance human physical capability.
AI Skill Coach
Learn any movement from video, text, or AI-generated expert motion with real-time voice feedback
Physical Therapy
Guided rehab exercises with safety boundaries, ROM tracking, and progress monitoring
Autonomous Monitor
Goal-based spatial AI with fall detection, desk watch, and focus tracking → Telegram alerts
Hospital Monitor
Patient fall detection, inactivity alerts, and autonomous caregiver notifications with photos
Demo Video
Loom recording coming soon
6 Infrastructure Pillars
Each pillar represents a major technology powering the real-time coaching pipeline.
NVIDIA DGX Spark
Edge AI — GB10 Superchip
17-keypoint real-time pose estimation at 15+ FPS with <50ms latency. YOLOv8n-pose runs directly on the GB10 Superchip for instant skeleton extraction.
Modal + NVIDIA A100
Cloud GPU — HY-Motion 1.0-Lite
Tencent's SOTA text-to-3D motion model (Dec 2025, 0.46B params) deployed on A100 GPU. Generates 30-frame motion sequences from text in ~26 seconds.
Anthropic Claude Agent SDK
Multi-Agent Orchestration
Claude Sonnet 4 orchestrates 44 MCP tools across 12 categories, coordinating 3 sub-agents (Perception, Coach, Communicator) with hooks for safety and audit.
OpenAI Realtime API
Bidirectional Voice Coaching
GPT-4o Realtime API delivers natural voice coaching with 3-layer interruption handling. Context injection feeds live pose scores into the voice session.
Computer Vision Pipeline
MediaPipe + YOLOv8n-pose
33 body landmarks + 21 hand landmarks tracked in real-time. Phase detection identifies preparation → execution → peak → recovery automatically.
Triple-Metric Scoring
Gaussian + Cosine + COCO OKS
Three complementary metrics for robust pose evaluation: Gaussian joint angles (σ=15°), Cosine spatial similarity, and COCO OKS — the academic standard.
4-Tier AI Expert Generation
No expert video needed. Kinetic generates references through 4 tiers of increasing sophistication.
Canonical Templates
Instant · 10+ pre-built exercises with biomechanically accurate joint angles per phase
Claude Semantic Mapping
~0.5s · Natural language → canonical exercise mapping via Claude
Claude Angle Generation
~1s · Novel skill description → per-phase joint angles generated by Claude
HY-Motion on Modal A100
~26s · SOTA text-to-3D motion diffusion generates full skeleton sequences
How It Works
→ Tier 1: Alias lookup... miss
→ Tier 2: Claude semantic mapping... no canonical match
→ Tier 3: Claude generates 4-phase joint angles (1.2s)
✓ Expert reference loaded (16 joints × 4 phases)
→ Camera: MediaPipe detects 33 landmarks at 30 FPS
→ Scoring: Triple-metric comparison every frame
→ Voice: "Great hip rotation! Extend your kicking leg more — aim for 160°"
→ Score: 87/100 (angles: 82, cosine: 91, OKS: 88)
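As a sanity check, the composite in the trace above follows from the 40/30/30 weighting used by the triple-metric scoring engine (the exact rounding convention is an assumption):

```python
# Recompute the composite score from the per-metric values in the trace,
# assuming the 40/30/30 weights of the triple-metric engine.
weights = {"angles": 0.40, "cosine": 0.30, "oks": 0.30}
scores = {"angles": 82, "cosine": 91, "oks": 88}

composite = sum(weights[k] * scores[k] for k in weights)
print(round(composite, 1))  # 86.5 — reported as 87 after rounding up
```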
10 AI Models
Live Modal A100 Endpoint
HY-Motion 1.0-Lite generating real 3D motion on NVIDIA A100 GPU.
curl -X POST <modal-endpoint-url> \
  -H "Content-Type: application/json" \
  -d '{"prompt": "a person doing a squat", "num_frames": 30}'
{
"model": "HY-Motion 1.0-Lite (Tencent, SOTA Dec 2025)",
"device": "NVIDIA A100 (Modal)",
"generation_ms": 26525.3,
"num_frames": 30,
"raw_shape": [30, 52, 3],
"keypoints": [... 30 frames of MediaPipe 33-point keypoints ...]
}
Technical Paper
Our approach to AI-driven physical skill coaching.
Kinetic: A Cognitive Layer for Real-Time Physical Movement Intelligence
TreeHacks 2026, Stanford University, Stanford, CA
Abstract
We present Kinetic, the first real-time physical movement intelligence system that perceives, reasons about, and enhances human physical capability across multiple domains: skill coaching, physical therapy rehabilitation, autonomous fall detection, and spatial monitoring. Kinetic introduces a 4-tier expert generation pipeline, a triple-metric scoring engine (Gaussian joint angles + cosine spatial similarity + COCO OKS), voice-first coaching via GPT-4o Realtime, and autonomous monitoring with Telegram alerts. Edge AI on NVIDIA DGX Spark provides sub-50ms latency, and 44 MCP tools are orchestrated through the Anthropic Claude Agent SDK with 3 sub-agents.
1. Introduction & Motivation
AI has transformed cognitive work. Yet human physical capability remains unaugmented. Movement literacy, rehabilitation, injury prevention, and safety monitoring still depend on expensive human experts or crude apps. 1.7 billion people want to learn physical skills. 55 million elderly Americans need fall monitoring. Millions more need accessible PT rehab.
Kinetic is the first system to unify skill coaching, physical therapy, autonomous monitoring, and hospital safety into a single Physical Movement Intelligence platform — powered by the same CV + AI stack across all four modes.
2. System Architecture
Kinetic's architecture spans 6 infrastructure pillars: (1) NVIDIA DGX Spark for edge AI pose estimation, (2) Modal + NVIDIA A100 for cloud-based motion generation, (3) Anthropic Claude Agent SDK for multi-agent orchestration, (4) OpenAI GPT-4o Realtime API for voice coaching, (5) Google MediaPipe + Ultralytics YOLO for computer vision, and (6) a custom triple-metric scoring engine. Data flows from camera frames through the CV pipeline to the scoring engine, with Claude orchestrating the coaching logic and voice delivering corrections.
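The per-frame data flow can be sketched as a single function; every component name below is a hypothetical stand-in for the pillars named above, not the shipped interface:

```python
def coaching_step(frame, expert_ref, detect_pose, score_pose, make_cue):
    """One pass through the pipeline: camera frame -> CV -> scoring -> voice."""
    pose = detect_pose(frame)             # pillar 5: MediaPipe/YOLO landmarks
    score = score_pose(pose, expert_ref)  # pillar 6: triple-metric engine
    cue = make_cue(score)                 # pillars 3-4: Claude -> voice coaching
    return score, cue
```

Injecting the components as callables keeps the loop testable with stubs and lets the same step drive all four intelligence modes.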
3. Triple-Metric Scoring Engine
Traditional pose scoring relies on a single metric (typically joint angle difference), which fails to capture spatial relationships and overall pose shape. We propose a triple-metric approach:
- Gaussian Joint Angles (40%): Each of 16 key joint angles is scored using a Gaussian function with σ=15°, providing smooth, interpretable per-joint scores.
- Cosine Spatial Similarity (30%): Normalized skeleton vectors are compared using cosine similarity, capturing overall pose shape independent of body proportions.
- COCO OKS (30%): Object Keypoint Similarity, the standard metric in academic pose estimation (used in COCO benchmark), provides a weighted evaluation based on joint importance and localization accuracy.
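A minimal sketch of the three metrics, assuming 2-D keypoints and uniform visibility; the function names and normalization details are illustrative, not the shipped implementation:

```python
import numpy as np

SIGMA_DEG = 15.0  # Gaussian width sigma = 15 degrees, as above

def gaussian_angle_score(user_deg, expert_deg, sigma=SIGMA_DEG):
    """Mean per-joint Gaussian score in [0, 1] from angle error (degrees)."""
    err = np.abs(np.asarray(user_deg, float) - np.asarray(expert_deg, float))
    return float(np.mean(np.exp(-(err ** 2) / (2 * sigma ** 2))))

def cosine_pose_similarity(user_xy, expert_xy):
    """Cosine similarity of centered, scale-normalized skeleton vectors."""
    def normalize(p):
        p = np.asarray(p, float)
        p = p - p.mean(axis=0)                  # remove translation
        return (p / np.linalg.norm(p)).ravel()  # remove scale
    return float(np.dot(normalize(user_xy), normalize(expert_xy)))

def coco_oks(user_xy, expert_xy, kappas, area):
    """COCO Object Keypoint Similarity, all keypoints assumed visible."""
    d2 = np.sum((np.asarray(user_xy, float)
                 - np.asarray(expert_xy, float)) ** 2, axis=1)
    k2 = np.asarray(kappas, float) ** 2
    return float(np.mean(np.exp(-d2 / (2 * area * k2))))

def composite_score(angles, cosine, oks):
    """40/30/30 blend of the three metrics, scaled to 0-100."""
    return 100 * (0.40 * angles + 0.30 * cosine + 0.30 * oks)
```

A perfect match scores 1.0 on each metric and 100 on the composite; the Gaussian term degrades smoothly as joint errors approach and exceed sigma.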
4. AI Expert Generation Pipeline
The key innovation is generating expert references from text alone. Our 4-tier pipeline provides graceful degradation: Tier 1 (canonical templates) serves instant responses for common exercises; Tier 2 (Claude semantic mapping) resolves aliases and variations; Tier 3 (Claude angle generation) handles novel skills through biomechanical reasoning; and Tier 4 (HY-Motion 1.0-Lite on A100) provides state-of-the-art 3D motion diffusion for the most complex cases. Each tier is attempted in order, falling through to the next only when necessary.
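The fall-through logic can be sketched as a resolver chain; the tier functions below are illustrative stubs standing in for the real components (a template store, Claude calls, and the Modal endpoint):

```python
def canonical_template_lookup(skill):
    # Tier 1: instant lookup in pre-built templates (angles illustrative)
    templates = {"squat": {"knee": [170, 90, 70, 170]}}
    return templates.get(skill)

def claude_semantic_mapping(skill):
    # Tier 2: stand-in for Claude mapping aliases to canonical exercises
    aliases = {"air squat": "squat"}
    return canonical_template_lookup(aliases.get(skill, skill))

def claude_angle_generation(skill):
    # Tier 3: stand-in for Claude generating per-phase joint angles
    return None  # pretend Claude has no confident answer here

def hy_motion_generate(skill):
    # Tier 4: stand-in for the HY-Motion endpoint on Modal A100
    return {"frames": 30, "skill": skill}

def get_expert_reference(skill):
    tiers = [canonical_template_lookup, claude_semantic_mapping,
             claude_angle_generation, hy_motion_generate]
    for tier, resolver in enumerate(tiers, start=1):
        ref = resolver(skill)
        if ref is not None:
            return tier, ref  # first tier that answers wins
    raise LookupError(f"no expert reference for {skill!r}")
```

A known exercise resolves at Tier 1, a known alias at Tier 2, and an entirely novel skill falls through to the most expensive tier.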
5. Autonomous Monitoring
In monitoring mode, Kinetic accepts a goal (fall detection, desk security, posture watch) and runs a continuous perception→reasoning→action loop. Falls are detected via activity classification with temporal smoothing. Claude evaluates alert triggers. Telegram delivers photo alerts to caregivers instantly. The bot is bidirectional: caregivers send /status, /goals, /photo commands.
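The temporal smoothing mentioned above can be sketched as a majority vote over a short window of per-frame activity labels; the window size and threshold here are assumptions, not the shipped values:

```python
from collections import deque

class FallDetector:
    """Alert only when 'fallen' dominates the recent window, so a single
    misclassified frame cannot trigger a caregiver notification."""

    def __init__(self, window=10, threshold=0.7):
        self.history = deque(maxlen=window)
        self.threshold = threshold

    def update(self, label):
        """Feed one per-frame activity label; return True to raise an alert."""
        self.history.append(label)
        if len(self.history) < self.history.maxlen:
            return False  # not enough evidence yet
        ratio = sum(l == "fallen" for l in self.history) / len(self.history)
        return ratio >= self.threshold
```

With these numbers, a fall must persist for 7 of the last 10 frames before Claude is asked to evaluate the alert and Telegram is notified.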
6. Edge AI & Latency
Real-time coaching demands sub-100ms feedback latency. We achieve this through edge AI inference on the NVIDIA DGX Spark's GB10 Superchip, running YOLOv8n-pose for 17-keypoint estimation at 15+ FPS with <50ms end-to-end latency. This is 4x faster than cloud-based alternatives and enables the tight feedback loop required for movement correction during active exercise.
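A quick back-of-envelope check of why those numbers are compatible: at 15 FPS a new frame arrives roughly every 67 ms, so a sub-50 ms inference pass leaves headroom for scoring and voice dispatch (the breakdown is illustrative):

```python
fps = 15                       # frame rate quoted above
frame_budget_ms = 1000 / fps   # time between frames at 15 FPS
inference_ms = 50              # stated upper bound for edge pose inference

headroom_ms = frame_budget_ms - inference_ms
print(round(frame_budget_ms, 1), round(headroom_ms, 1))  # 66.7 16.7
```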
7. Conclusion
Kinetic demonstrates that the same CV + AI stack that coaches a squat can detect a fall, guide PT rehab, and monitor a hospital room. By unifying four intelligence modes into one platform, Kinetic brings to physical capability what AI has already brought to cognitive work. AI has transformed how we think. Kinetic transforms how we move.
Solo-built in 20 hours at TreeHacks 2026 🌲 Stanford University · 17,000+ lines · 10 AI models