Anatomy Vision

Notes on designing for Vision Pro

Anatomy Vision is an app I built for viewing anatomical models on Apple Vision Pro. These are my notes on designing Vision Pro apps that use RealityKit.

Lighting affects the interface!

A user might be using a Vision Pro in a dark room. SwiftUI components (that is, windows and UI widgets) are always rendered with some ambient lighting, but RealityKit objects don't get ambient lighting by default, so a user in a dark room might simply be unable to see your virtual object at all! I had to add some ambient lighting to the pedestal that the body stands on.
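
One way to do this in RealityKit is with an image-based light. Here's a minimal sketch, where "StudioLight" stands in for whatever lighting resource you bundle (it's an illustrative name, not the app's actual asset):

```swift
import RealityKit

// Sketch: give the pedestal an image-based light so the model stays visible
// even in a dark room. "StudioLight" is an illustrative resource name.
func addAmbientLight(to pedestal: Entity) async throws {
    let environment = try await EnvironmentResource(named: "StudioLight")
    pedestal.components.set(
        ImageBasedLightComponent(source: .single(environment), intensityExponent: 1.0)
    )
    // Entities lit by this light need a receiver component that points back at
    // the entity carrying the light source.
    pedestal.components.set(ImageBasedLightReceiverComponent(imageBasedLight: pedestal))
}
```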

Lighting also seems to impact performance. Under complex lighting conditions, dragging the skeleton model around the room gets laggy, which I suspect is due to lighting calculations on the physically-based material. (More on performance below.)

3D drag intuition works in spherical, not Cartesian, coordinates

Moving the models around uses visionOS’ 3D drag gesture. (Actually, the DragGesture gives data in six degrees of freedom, since visionOS also provides pose data.)

I found that when people want to pull the body closer to them, they pinch and drag their hand directly towards their face: in spherical coordinates, they're changing r but not θ or φ. In Cartesian coordinates, this usually means pulling up towards their face, away from the ground, increasing the y value.

It turns out users pull directly towards their face, up from the ground, even when they think they’re keeping their hand at a constant y. So I just ignore the y value when applying the drag gesture to the body. (Maybe it would be cool to actually read the change in r and map that distance to the x-z plane, but ignoring the y value seems to work fine and it’s a one-liner.)
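
For concreteness, here's a sketch of what that looks like with the standard targeted DragGesture pattern on a RealityView; the view and state names are mine, not the app's:

```swift
import SwiftUI
import RealityKit

// Sketch of the "ignore y" drag, assuming the model is a descendant of the
// RealityView content and its parent is a valid reference frame.
struct DraggableBodyView: View {
    @State private var lockedY: Float?

    var body: some View {
        RealityView { content in
            // ... load the anatomy model and add it to `content` here ...
        }
        .gesture(
            DragGesture()
                .targetedToAnyEntity()
                .onChanged { value in
                    let entity = value.entity
                    // Convert the gesture location into the entity's parent space.
                    var target = value.convert(value.location3D,
                                               from: .local,
                                               to: entity.parent!)
                    // Remember the height at drag start and pin the entity to it:
                    // the vertical component of the drag is simply discarded.
                    if lockedY == nil { lockedY = entity.position.y }
                    target.y = lockedY ?? entity.position.y
                    entity.position = target
                }
                .onEnded { _ in lockedY = nil }
        )
    }
}
```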

People are lazy: they don’t want to get up and move around

I thought the augmented reality experience of having an anatomical model in your living room would be incredible. You could walk around it, lean in, get up close and personal.

Contrary to my expectations, every test user I observed stayed seated on a couch while using their Vision Pro. Instead of standing up and moving around the model for a different perspective, they asked me to add a rotation control.

Okay, fair, I can’t really blame them for this. But it is pretty funny.

Performance: touches, collisions, and convex hulls

The meshes for the skeleton and other models are very high-resolution, since they were created by scanning a body donated to science. As a result, performance is sometimes an issue — I mentioned that above with regards to lighting.

Since I wanted to support the interaction where reaching out and touching a part of the body would pop up its label, I had to create a convex hull for every touchable part. But creating a convex hull is very slow for high-resolution meshes with tens of thousands of vertices — calculating the convex hull for one scapula (shoulder blade) takes 20 seconds running on a Vision Pro!

I wanted to precompute the convex hulls, but visionOS doesn’t allow you to pass in arbitrary mesh data to use as a convex hull. It wants to generate the convex hulls at runtime, every time. Presumably this is so that visionOS can guarantee that the convex hulls actually are convex hulls and not just random meshes.

So Anatomy Vision actually ships with two copies of every model. One is the model that you see and interact with. The other is a decimated model, that is, a simplified mesh with up to three orders of magnitude fewer vertices. Using this decimated mesh to generate the convex hulls takes less than a tenth of a second.
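
Here's a sketch of how the two copies can fit together at load time; the asset names and the helper are illustrative rather than the app's actual code:

```swift
import RealityKit

// Sketch: pair the high-resolution visual model with its decimated twin and
// build the collision hull only from the decimated mesh. Asset names and error
// handling are illustrative.
enum BoneLoadingError: Error { case missingModelComponent }

func makeTouchableBone(named name: String) async throws -> Entity {
    let visual = try await Entity(named: name)              // high-res model the user sees
    let decimated = try await Entity(named: name + "_lo")   // simplified copy, never displayed

    // Assumes the decimated file puts its ModelComponent on the root entity.
    guard let model = decimated.components[ModelComponent.self] else {
        throw BoneLoadingError.missingModelComponent
    }
    // Generating the hull from the decimated mesh takes a fraction of a second
    // instead of tens of seconds on the full-resolution scan.
    let hull = await ShapeResource.generateConvex(from: model.mesh)
    visual.components.set(CollisionComponent(shapes: [hull]))
    visual.components.set(InputTargetComponent())            // lets touches hit-test against the hull
    return visual
}
```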

I generated the decimated meshes with Blender's Python scripting API, followed by manual tweaks on a few multifaceted bones that really can't be accurately depicted with fewer vertices.

(Aside: I think this is the first time I have ever used the word “multifaceted” in the completely literal sense!)

Privacy limitations: viewer orientation, hover states

Apple, we are repeatedly told in Vision Pro developer tutorials, values privacy. So you can’t get pixel data from the cameras, nor can you access visionOS’ gaze tracking data.

You can mark an entity as having a hover state, but you don’t get hover events, nor can you customize the hover state (eg by adding a custom shader) — in fact, the hover state visual effect is applied by visionOS outside of your process!
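
For reference, opting an entity into the hover effect is about as much control as you get; a minimal sketch:

```swift
import RealityKit

// Sketch: opt an entity into the system hover highlight. The entity also needs
// a collision shape (see the convex hulls above) to be hit-testable at all.
func enableHoverHighlight(on entity: Entity) {
    entity.components.set(InputTargetComponent())  // makes the entity a target for input
    entity.components.set(HoverEffectComponent())  // system-applied highlight, rendered outside your process
}
```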

You also can’t get the orientation of the user — not only do you not get gaze tracking, but you can’t even see which direction the user’s head is pointing, or where they’re standing in relation to your virtual objects. This ends up being surprisingly troublesome!

Concretely, in Anatomy Vision, when the user taps a body part to get a popup with its name, I'd like to show the popup oriented towards the user so that they can read it. Can't do it!

You can anchor entities to the user’s head position (for a HUD-like display experience), but you can’t get the position of the head anchor (unlike other anchors, visionOS returns a bogus value for the head anchor position).
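
A sketch of what that looks like, just to make the limitation concrete:

```swift
import RealityKit

// Sketch: a head-anchored entity follows the user like a HUD, but reading its
// transform back from code does not give you the real head pose.
func makeHUD(showing label: Entity) -> AnchorEntity {
    let hud = AnchorEntity(.head)
    hud.addChild(label)
    // hud.position(relativeTo: nil) will not reflect where the head actually is.
    return hud
}
```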

You do get an anchor for the user's position at the time they opened your Immersive View. And if you assume that your users are lazy and won't move around much (see above), maybe it's safe to just use that as an approximation (which is what I ended up doing). Still, it feels like a gap in visionOS and RealityKit.
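
So the popup orientation ends up as a look-at towards an assumed user position. A sketch, where the eye-height value is my assumption rather than anything visionOS provides:

```swift
import RealityKit

// Sketch of the approximation: orient a label toward where the user probably is,
// taken as a fixed point near where they opened the Immersive View.
// The 1.4 m eye height is an assumed value, not an API-provided one.
func orientTowardAssumedUser(_ label: Entity, in root: Entity) {
    let assumedUserPosition = SIMD3<Float>(0, 1.4, 0)
    // look(at:) points the entity's forward axis at the target; depending on how
    // the label geometry is authored, a 180-degree flip may be needed afterwards.
    label.look(at: assumedUserPosition,
               from: label.position(relativeTo: root),
               relativeTo: root)
}
```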
