
Vibe Coding XR: Accelerating AI + XR prototyping with XR Blocks and Gemini

Why XR Has Always Been Hard — Until Now

Building extended reality (XR) apps has historically meant choosing between two bad options: wrestle with Unity or Unreal for weeks to get something barely running, or stitch together fragmented perception pipelines, sensor SDKs, and WebXR primitives by hand. Neither path rewards the kind of rapid, exploratory prototyping that moves ideas forward.

That barrier is now cracking open. Vibe Coding XR — a rapid prototyping workflow published by Google Research in March 2026 — combines the open-source XR Blocks framework with Gemini's reasoning capabilities inside Gemini Canvas to turn plain-English prompts into fully interactive, physics-aware WebXR applications.


What Is XR Blocks?

XR Blocks is an open-source WebXR SDK built on top of three.js, TensorFlow, and Gemini. Its core mission is "minimum code from idea to reality." Instead of writing raw WebXR scene management, you work with high-level, composable modules that handle the hard parts — depth, physics, gestures, spatial UI — so you can focus on what your XR experience actually does.

Presented at ACM UIST 2025, XR Blocks was designed specifically to close the gap between AI research tooling (JAX, PyTorch, TensorFlow with mature benchmarks) and XR development, which has remained fragmented and high-friction by comparison.

Core Modules

  • USER: Hand tracking and gesture recognition. Pinch, grasp, and custom gesture models wired to your scene objects automatically.

  • WORLD: Environmental perception. Depth-aware physics, geometry occlusion, and lighting estimation, so your virtual objects behave like they belong in the real space.

  • INTERFACE: Spatial UI. Menus, labels, and HUD elements that anchor correctly in 3D space on both the simulated desktop and real headsets.

  • AI: Gemini integration. Embed Gemini Live, LiteRT on-device models, and TensorFlow Lite inference directly into your XR scene with minimal wiring.

  • AGENTS: Agentic behaviors. Context-aware assistants and proactive suggestion engines that respond to user intent in spatial context.
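These modules are designed to compose into a single scene rather than run as separate pipelines. As a rough illustration of that pattern, here is a self-contained toy in plain JavaScript; the names and structure are invented for illustration and are not the actual xrblocks API (the real SDK's entry points appear in the Quick Start below):

```javascript
// Toy sketch of a composable-module scene, in the spirit of XR Blocks.
// All names here are hypothetical stand-ins, not the real SDK.
class Scene {
  constructor() { this.modules = new Map(); }
  addModule(name, module) {
    this.modules.set(name, module);
    module.init?.(this);           // let the module hook into the scene
    return module;
  }
  update(dt) {
    // Each frame, every registered module gets a tick.
    for (const m of this.modules.values()) m.update?.(dt);
  }
}

const scene = new Scene();
const world = scene.addModule('world', {
  physics: true,
  update(dt) { /* step physics, refresh depth mesh */ },
});
const user = scene.addModule('user', {
  pinches: 0,
  update(dt) { /* poll hand tracking, emit gestures */ },
  onPinch() { this.pinches += 1; },
});

scene.update(1 / 60);      // one simulated frame
user.onPinch();            // simulate a detected pinch gesture
console.log(user.pinches); // 1
```

The point of the pattern is that each concern (world perception, hand input, UI, AI) is a replaceable unit behind a small interface, which is what lets generated code plug pieces together predictably.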

The Vibe Coding XR Workflow

The workflow pairs the XR Blocks framework with a custom Gemini Gem (called XR Blocks Gem) loaded into Gemini Canvas. You describe what you want in natural language; Gemini translates that into structured XR Blocks code; and you get a deployable WebXR app that runs in both a desktop Chrome simulator and on Android XR headsets.

01 - Open Gemini Canvas with the XR Blocks Gem

Go to gemini.google.com, load the XR Blocks Gem, and select "Pro Mode" for best one-shot success rates.

02 - Write your prompt in plain English

Describe the spatial experience, interactions, objects, physics, and any AI behaviors you want. Be specific but natural — this is vibe coding, not boilerplate specification.

03 - Gemini generates XR Blocks code

Gemini maps your intent to XR Blocks modules — world perception, gesture bindings, spatial UI, and physics. The output is clean, readable JavaScript using the XR Blocks API.

04 - Preview in desktop simulator

Test your XR experience immediately in Chrome's WebXR emulator. No headset required for the initial iteration loop.

05 - Deploy to Android XR

The same code ships unchanged to Android XR headsets (Galaxy XR). No separate build pipeline; the abstraction layer handles platform differences.

// Pro Tip

Google's own team recommends Pro Mode for the highest reliability: it consistently outperforms other modes in one-shot success on the VCXR-60 benchmark dataset.

Real Prompt Examples (What Gets Built)

These four examples were all generated by Gemini via the Vibe Coding XR workflow — no hand-written XR code, no game engine setup:

📐

Math Tutor in XR

Euler's theorem visualized in 3D. Pinch to highlight vertices, edges, and faces across multiple geometry examples.

"Visualize Euler's theorem in geometry. Explain vertices, edges, and facets with highlighting using different examples."

⚖️

Physics Lab

Grab and drop labeled weights onto a balance scale. Real physics, real haptic feedback.

"Create an interactive physics experiment: use different objects with weights to balance a scale."

🏐

XR Volleyball

Textured volleyballs launched from a ring, colliding with both your hands and room geometry.

"Let me play volleyball with hands and collide with my environment. Volleyballs launched from a red ring, easy to bounce."

🦕

XR Dino Game

The Chrome dinosaur game rebuilt in mixed reality, voxelized in your space. Went from concept to running app in minutes.

"Create the Chrome Dino game in XR. Voxelized dino in front of user, cacti rushing in, add audio."

Which Gemini Model Should You Use?

Not all Gemini models are equal for XR Blocks code generation. Google evaluated multiple models against the VCXR-60 benchmark dataset in March 2026. Here's what the numbers say:

// VCXR-60 Benchmark · Gemini Preview Models · March 2026

Model            | Mode         | Success Rate | Avg. Gen Time | Best For
Gemini 2.5 Pro   | Pro Mode     | > 95%        | ~60–90s       | Highest accuracy, complex prototypes
Gemini 2.5 Flash | Low Thinking | 87.4%        | ~17s          | Speed priority, rapid iteration
Gemini 2.5 Flash | Pro Mode     | ~91%         | ~35s          | Good balance of speed + quality

// Choosing a Model

For complex, novel XR interactions — use Gemini 2.5 Pro in Pro Mode. The 95%+ one-shot success rate means far fewer debug-and-retry cycles. Use Flash when you need quick iteration on simpler prompts and 17-second turnaround matters.

Understanding the XR Blocks Architecture (For Developers)

XR Blocks uses a Reality Model — a set of high-level composable abstractions that sit between your prompt and the raw WebXR/three.js engine layer. Unlike a World Model trained end-to-end, the Reality Model gives you replaceable, auditable modules. This is what makes Gemini-generated code predictable and debuggable rather than opaque.

The central concept is the Script — the narrative and logical heart of any XR Blocks app. A Script wires together input events, AI calls, world state, and UI updates into a coherent experience loop. When Gemini generates XR Blocks code, it's really generating a Script that calls the right modules in the right order.
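The experience-loop idea behind a Script can be sketched in a few lines of self-contained plain JavaScript. This is a conceptual toy with invented names, not the real xrblocks Script API: input events are queued, a tick drains them into world state, and state changes are pushed into the UI.

```javascript
// Toy experience loop in the spirit of an XR Blocks Script.
// All names are illustrative stand-ins, not the real SDK.
const events = [];                    // queued input events
const state = { highlighted: null };  // world state
const ui = { label: '' };             // stand-in for a spatial-UI label

function onPinch(objectName) {
  events.push({ type: 'pinch', target: objectName });
}

function tick() {
  // 1. Drain input events into state changes.
  while (events.length) {
    const e = events.shift();
    if (e.type === 'pinch') state.highlighted = e.target;
  }
  // 2. Reflect state in the UI (here: a floating label's text).
  ui.label = state.highlighted ? `Selected: ${state.highlighted}` : '';
}

onPinch('cube');
tick();
console.log(ui.label); // "Selected: cube"
```

A generated Script is essentially this loop at larger scale: the same input-to-state-to-UI wiring, with AI calls and physics slotted in as additional steps.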

The architectural philosophy draws explicitly from Python's Zen: readability counts. Every XR Blocks API is designed to be understood at a glance, which is exactly what makes LLM code generation for it so reliable.

⚡ Quick Start: Your First XR Blocks App

Three paths to get hands-on immediately:

Option A — Vibe Coding (No code knowledge needed)

// 1. Go to gemini.google.com
// 2. Load the XR Blocks Gem
// 3. Type your prompt and hit enter

"Create a solar system in XR where I can grab planets with my hands and see their orbital data as floating labels."

Option B — Fork the GitHub repo

cd xrblocks   # after cloning the repo (or your fork)
# Browse /templates and /samples
# Each folder is a standalone XR app
npm install && npm run dev

Option C — Start from an XR Blocks template

// Minimal XR Blocks scene
import { XRScene, World, User, AI } from 'xrblocks';

const scene = new XRScene();
const world = scene.addModule(new World({ physics: true })); // depth-aware physics
const user  = scene.addModule(new User({ hands: true }));    // hand tracking + gestures
const ai    = scene.addModule(new AI({ model: 'gemini-2.5-flash' }));

// On a pinch gesture, ask Gemini to describe the pinched object.
user.on('pinch', (object) => ai.describe(object));

Where This Is Heading

The XR Blocks team has a clear roadmap: the current xrblocks.js web SDK is the first step. Future versions are planned to extend to native platforms via LLM-powered compilers — meaning the same prompt-to-XR pipeline that works in the browser today will eventually compile down to native Android XR and other hardware targets.

The larger vision is closing the virtuous cycle that exists in AI research but not yet in XR: a thriving ecosystem of reproducible demos, shared benchmarks, and community-iterated components. Every demo built with XR Blocks is meant to be reusable by others — turning individual prototypes into building blocks for the next one.

Google Research is explicitly inviting the HCI, AI, and XR communities to contribute to the XR Blocks ecosystem. The benchmark dataset (VCXR-60) and the Gemini Gem configuration are both available alongside the framework.

// The Core Shift

Vibe Coding XR marks a meaningful step toward spatial computing being limited not by technical expertise, but by creativity. The barrier isn't going away entirely — complex, production XR still needs engineers. But the prototyping gap between "I have an idea for an XR experience" and "I have a thing I can show someone" just got dramatically smaller.

 
