JoyAI VL Live
Disconnected
Dark
Settings
×
Layout
Main Content Order
Choose which element appears at the top
Camera → VLM Output Info
VLM Output Info → Camera
VLM Output on Camera View
Show text overlay directly on video feed
None
At the top
At the bottom
Visual Effects
Pop-in Animation
Scale animation when new VLM response arrives
Green Glow Effect
Border glow on new VLM response
Fade Effect
Gradually fade response after 2 seconds
Visual Style
Colorful UI Accents
Color-coded icons and input focus glows
WebRTC
Max Video Latency (seconds)
Drop old frames if delay exceeds this (0 = no intervention)
Audio Output
Speak VLM output
Play TTS audio for each visible response
Background Model
Enable delegation solver
Run Qwen3.5-122B-A10B-FP8 for delegated questions, visual reasoning, and chart tasks in the background
Frame multiplier
Background frames per second relative to foreground streaming FPS
Max background frames
Recent background frame cache cap; default and maximum are 100
Debug
Show request payload
Include request JSON (image + prompt) under the prompt area; collapsed by default
Show response payload
Include API response JSON under the VLM output; collapsed by default
Show memory state
Display mid-term and long-term memory content below VLM output
VLM API Configuration
▼
API Base URL
Ollama
http://localhost:11434/v1
vLLM
http://localhost:8000/v1
SGLang
http://localhost:30000/v1
OpenAI
https://api.openai.com/v1
NVIDIA API Catalog
https://integrate.api.nvidia.com/v1
Current VLM endpoint
API Key (Optional)
▼
Required for OpenAI and NVIDIA API Catalog, etc.
Model Selection
Loading models...
Video Source
▼
Webcam
RTSP Stream
BETA
Camera Selection
Detecting cameras...
Select camera device to use for VLM analysis
RTSP Stream URL
ℹ
Format:
rtsp://[user:pass@]ip:port/path
Examples:
• rtsp://192.168.1.100:554/stream
• rtsp://admin:password@192.168.1.100:554/h264Preview_01_main
Beta:
Tested with Reolink RLC-811A. Other cameras may work.
Learn more
Processing Interval
Seconds between each VLM inference (default: 1s)
Frames per Batch
Number of frames batched per inference (default: 1)
Start
后置
VLM Output
Ready
Mirror
VLM Output Info
Model:
--
Speaking:
Latency:
--
ms
Avg:
--
ms
Count:
--
Request payload (debug)
Markdown
Response payload (debug)
Mid-term memory
Long-term memory
请实时解说我的表情和动作
Describe my facial expressions and movements in real time
✓
结合画面内容,生成一张水彩风格的图像
Generate a watercolor-style image based on the scene
✓
现在我们玩一个快问快答的游戏,限时30秒,迅速回答问题,结束提醒我
Let's play a rapid-fire Q&A game for 30 seconds; answer quickly and remind me when it ends
✓
每当一个瓶子出现时,介绍它的样子
Whenever a bottle appears, describe what it looks like
✓
按住说话