Vibe Coding XR: Accelerating AI + XR prototyping with XR Blocks and Gemini
- Eddie Avil
Why XR Has Always Been Hard — Until Now
Building extended reality (XR) apps has historically meant choosing between two bad options: wrestle with Unity or Unreal for weeks to get something barely running, or stitch together fragmented perception pipelines, sensor SDKs, and WebXR primitives by hand. Neither path rewards the kind of rapid, exploratory prototyping that moves ideas forward.
That barrier is now cracking open. Vibe Coding XR — a rapid prototyping workflow published by Google Research in March 2026 — combines the open-source XR Blocks framework with Gemini's reasoning capabilities inside Gemini Canvas to turn plain-English prompts into fully interactive, physics-aware WebXR applications.
What Is XR Blocks?
XR Blocks is an open-source WebXR SDK built on top of three.js, TensorFlow, and Gemini. Its core mission is "minimum code from idea to reality." Instead of writing raw WebXR scene management, you work with high-level, composable modules that handle the hard parts — depth, physics, gestures, spatial UI — so you can focus on what your XR experience actually does.
Presented at ACM UIST 2025, XR Blocks was designed specifically to close the gap between AI research tooling (JAX, PyTorch, TensorFlow with mature benchmarks) and XR development, which has remained fragmented and high-friction by comparison.
Core Modules
USER
Hand tracking & gesture recognition — pinch, grasp, and custom gesture models wired to your scene objects automatically.
WORLD
Environmental perception — depth-aware physics, geometry occlusion, lighting estimation. Your virtual objects behave like they belong in the real space.
INTERFACE
Spatial UI — menus, labels, and HUD elements that anchor correctly in 3D space on both simulated desktop and real headsets.
AI
Gemini integration — embed Gemini Live, LiteRT on-device models, and TensorFlow Lite inference directly into your XR scene with minimal wiring.
AGENTS
Agentic behaviors — context-aware assistants and proactive suggestion engines that respond to user intent in spatial context.
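The composability the modules promise can be sketched in a few lines of plain JavaScript. This is an illustrative pattern only, not the real SDK: the `Module`, `Scene`, `addModule`, and `tick` names here are invented to show how independent concerns can share one per-frame update loop.

```javascript
// Hypothetical sketch of the composable-module idea (names invented for
// illustration; the real XR Blocks API may differ). Each module owns one
// concern and exposes an update() hook the scene calls every frame.
class Module {
  constructor(name) { this.name = name; }
  update(frame) { /* per-frame work for this concern goes here */ }
}

class Scene {
  constructor() { this.modules = []; }
  addModule(mod) { this.modules.push(mod); return mod; }
  // One tick of the experience loop: every module sees the same frame state.
  tick(frame) { this.modules.forEach((m) => m.update(frame)); }
}

// Compose the five concerns listed above: user, world, interface, AI, agents.
const scene = new Scene();
['user', 'world', 'interface', 'ai', 'agents']
  .forEach((n) => scene.addModule(new Module(n)));

scene.tick({ t: 0 }); // all five modules receive the same frame
console.log(scene.modules.map((m) => m.name).join(','));
// → user,world,interface,ai,agents
```

The point of the pattern is that each concern stays replaceable: swapping a hand-tracking module for a controller module does not touch the others.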
The Vibe Coding XR Workflow
The workflow pairs the XR Blocks framework with a custom Gemini Gem (called XR Blocks Gem) loaded into Gemini Canvas. You describe what you want in natural language; Gemini translates that into structured XR Blocks code; and you get a deployable WebXR app that runs in both a desktop Chrome simulator and on Android XR headsets.
01 - Open Gemini Canvas with the XR Blocks Gem
Go to gemini.google.com, load the XR Blocks Gem, and select "Pro Mode" for best one-shot success rates.
02 - Write your prompt in plain English
Describe the spatial experience, interactions, objects, physics, and any AI behaviors you want. Be specific but natural — this is vibe coding, not boilerplate specification.
03 - Gemini generates XR Blocks code
Gemini maps your intent to XR Blocks modules — world perception, gesture bindings, spatial UI, and physics. The output is clean, readable JavaScript using the XR Blocks API.
04 - Preview in desktop simulator
Test your XR experience immediately in Chrome's WebXR emulator. No headset required for the initial iteration loop.
05 - Deploy to Android XR
The same code ships unchanged to Android XR headsets (Galaxy XR). No separate build pipeline; the abstraction layer handles platform differences.
// Pro Tip
Google's own team says: use Pro Mode for the highest reliability. It consistently outperforms other modes in one-shot success on the VCXR-60 benchmark dataset.
Real Prompt Examples (What Gets Built)
These four examples were all generated by Gemini via the Vibe Coding XR workflow — no hand-written XR code, no game engine setup:
📐
Math Tutor in XR
Euler's theorem visualized in 3D. Pinch to highlight vertices, edges, and faces across multiple geometry examples.
"Visualize Euler's theorem in geometry. Explain vertices, edges, and facets with highlighting using different examples."
⚖️
Physics Lab
Grab and drop labeled weights onto a balance scale. Real physics, real haptic feedback.
"Create an interactive physics experiment: use different objects with weights to balance a scale."
🏐
XR Volleyball
Textured volleyballs launched from a ring, colliding with both your hands and room geometry.
"Let me play volleyball with hands and collide with my environment. Volleyballs launched from a red ring, easy to bounce."
🦕
XR Dino Game
The Chrome dinosaur game rebuilt in mixed reality, voxelized in your space. Went from concept to running app in minutes.
"Create the Chrome Dino game in XR. Voxelized dino in front of user, cacti rushing in, add audio."
Which Gemini Model Should You Use?
Not all Gemini models are equal for XR Blocks code generation. Google evaluated multiple models against the VCXR-60 benchmark dataset in March 2026. Here's what the numbers say:
// VCXR-60 Benchmark · Gemini Preview Models · March 2026
Model | Mode | Success Rate | Avg. Gen Time | Best For
Gemini 2.5 Pro | Pro Mode | >95% | ~60–90 s | Highest accuracy, complex prototypes
Gemini 2.5 Flash | Low Thinking | 87.4% | ~17 s | Speed priority, rapid iteration
Gemini 2.5 Flash | Pro Mode | ~91% | ~35 s | Good balance of speed and quality
// Choosing a Model
For complex, novel XR interactions — use Gemini 2.5 Pro in Pro Mode. The 95%+ one-shot success rate means far fewer debug-and-retry cycles. Use Flash when you need quick iteration on simpler prompts and 17-second turnaround matters.
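The decision rule above is simple enough to encode directly. The helper below is not part of any SDK; it is a hypothetical sketch that captures the article's advice based on the VCXR-60 numbers, with invented function and parameter names.

```javascript
// Illustrative helper (not an SDK function) encoding the model-selection
// advice: Pro Mode on 2.5 Pro for complex one-shot prompts, Flash when
// iteration speed dominates, Flash in Pro Mode as the middle ground.
function pickModel({ complexity, latencySensitive }) {
  if (complexity === 'high') {
    return { model: 'gemini-2.5-pro', mode: 'pro' };            // >95% one-shot
  }
  if (latencySensitive) {
    return { model: 'gemini-2.5-flash', mode: 'low-thinking' }; // ~17 s turnaround
  }
  return { model: 'gemini-2.5-flash', mode: 'pro' };            // ~91%, ~35 s
}

console.log(pickModel({ complexity: 'high' }).model);
// → gemini-2.5-pro
```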
Understanding the XR Blocks Architecture (For Developers)
XR Blocks uses a Reality Model — a set of high-level composable abstractions that sit between your prompt and the raw WebXR/three.js engine layer. Unlike a World Model trained end-to-end, the Reality Model gives you replaceable, auditable modules. This is what makes Gemini-generated code predictable and debuggable rather than opaque.
The central concept is the Script — the narrative and logical heart of any XR Blocks app. A Script wires together input events, AI calls, world state, and UI updates into a coherent experience loop. When Gemini generates XR Blocks code, it's really generating a Script that calls the right modules in the right order.
The architectural philosophy draws explicitly from Python's Zen: readability counts. Every XR Blocks API is designed to be understood at a glance, which is exactly what makes LLM code generation for it so reliable.
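The Script idea can be illustrated with a dependency-free sketch. Everything below is invented for illustration (the mocks, `runScript`, and the event names are not the real API): it shows the shape of the loop a Script wires together, with input events feeding world state into an AI call whose result updates the spatial UI.

```javascript
// Hypothetical sketch of a "Script": one function that wires input events,
// an AI call, world state, and a UI update into a coherent loop.
function runScript({ input, ai, ui, world }) {
  input.on('pinch', (objectId) => {
    const object = world.get(objectId);   // read world state
    const caption = ai.describe(object);  // ask the AI module
    ui.showLabel(objectId, caption);      // update spatial UI
  });
}

// Minimal mocks standing in for real modules.
const handlers = {};
const input = {
  on: (ev, fn) => { handlers[ev] = fn; },
  fire: (ev, arg) => handlers[ev](arg),   // simulates a detected gesture
};
const world = { get: (id) => ({ id, kind: 'planet' }) };
const ai = { describe: (obj) => `a ${obj.kind} named ${obj.id}` };
const labels = {};
const ui = { showLabel: (id, text) => { labels[id] = text; } };

runScript({ input, ai, ui, world });
input.fire('pinch', 'mars');
console.log(labels.mars); // → a planet named mars
```

Because the Script only calls module interfaces, any one module can be swapped or audited without touching the narrative logic, which is the debuggability argument the Reality Model makes against opaque end-to-end models.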
⚡ Quick Start: Your First XR Blocks App
Three paths to get hands-on immediately:
Option A — Vibe Coding (No code knowledge needed)
// 1. Go to gemini.google.com
// 2. Load the XR Blocks Gem
// 3. Type your prompt and hit enter
"Create a solar system in XR where I can
grab planets with my hands and see their
orbital data as floating labels."
Option B — Fork the GitHub repo
git clone https://github.com/xrblocks/xrblocks
cd xrblocks
# Browse /templates and /samples
# Each folder is a standalone XR app
npm install && npm run dev
Option C — Start from an XR Blocks template
// Minimal XR Blocks scene
import { XRScene, World, User, AI } from 'xrblocks';

const scene = new XRScene();
// Depth-aware physics for the real environment
const world = scene.addModule(new World({ physics: true }));
// Hand tracking with built-in gesture events
const user = scene.addModule(new User({ hands: true }));
// Gemini access from inside the scene
const ai = scene.addModule(new AI({ model: 'gemini-2.5-flash' }));

// When the user pinches an object, ask Gemini to describe it
user.on('pinch', (object) => ai.describe(object));
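The `user.on('pinch', …)` line in the template follows a standard emitter pattern. The sketch below is illustrative only (the real xrblocks `User` module may be implemented quite differently): a tiny gesture source that the tracking layer would feed, dispatching detected gestures to registered handlers.

```javascript
// Illustrative emitter underneath a gesture API (invented class; not the
// actual xrblocks implementation). on() registers handlers; emit() would be
// called by the hand-tracking layer when a gesture is detected on an object.
class GestureSource {
  constructor() { this.listeners = new Map(); }
  on(gesture, fn) {
    if (!this.listeners.has(gesture)) this.listeners.set(gesture, []);
    this.listeners.get(gesture).push(fn);
  }
  emit(gesture, object) {
    (this.listeners.get(gesture) || []).forEach((fn) => fn(object));
  }
}

const user = new GestureSource();
const seen = [];
user.on('pinch', (obj) => seen.push(obj.name));
user.emit('pinch', { name: 'planet-earth' }); // simulate a detected pinch
console.log(seen); // → [ 'planet-earth' ]
```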
Where This Is Heading
The XR Blocks team has a clear roadmap: the current xrblocks.js web SDK is the first step. Future versions are planned to extend to native platforms via LLM-powered compilers — meaning the same prompt-to-XR pipeline that works in the browser today will eventually compile down to native Android XR and other hardware targets.
The larger vision is closing the virtuous cycle that exists in AI research but not yet in XR: a thriving ecosystem of reproducible demos, shared benchmarks, and community-iterated components. Every demo built with XR Blocks is meant to be reusable by others — turning individual prototypes into building blocks for the next one.
Google Research is explicitly inviting the HCI, AI, and XR communities to contribute to the XR Blocks ecosystem. The benchmark dataset (VCXR-60) and the Gemini Gem configuration are both available alongside the framework.
// The Core Shift
Vibe Coding XR marks a meaningful step toward spatial computing being limited not by technical expertise, but by creativity. The barrier isn't going away entirely — complex, production XR still needs engineers. But the prototyping gap between "I have an idea for an XR experience" and "I have a thing I can show someone" just got dramatically smaller.

