Developing for the Apple Vision Pro
Our team has been developing virtual reality projects since 2016, so it’s safe to say we have our fair share of experience. When Apple launched the Apple Vision Pro, we wasted no time picking up the headset. Given the headset’s high cost and relatively recent arrival, few developers have delved deeply into creating for it. After several months of hands-on testing, we’ve gained a solid understanding of the benefits, limitations, and unique aspects of developing applications for the Vision Pro.
Our platform of choice for the past several years has been Meta Quest, utilizing Quest 2 and Quest 3 headsets, with Unity as our main engine. Apple partnered with Unity to bring Unity-based development to visionOS, the operating system that powers its new headset. Our background with Unity has helped us navigate these new tools, but there’s still much to learn. It’s clear visionOS is a fledgling platform with room to grow.
Types of Apple Vision Pro Applications
The Apple Vision Pro supports three types of applications:
- Windowed apps – traditional 2D applications that run in a window alongside other apps in the shared space.
- Fully immersive apps – virtual reality experiences that completely replace the user’s surroundings with digital content.
- Mixed reality apps – immersive experiences that blend digital content with the passthrough view of the real world.
An application can belong to only one of these types. The designation is set in the project settings and cannot be changed at runtime.
Nuances of development for Apple Vision Pro
The rest of this article delves into the specific challenges we encountered and lessons learned while adapting Top Tier K9’s virtual reality dog training experience for the Meta Quest to a mixed reality experience for the Apple Vision Pro. While these takeaways can apply generally, we suggest reading the project’s case study first.
Speech Recognition
In the dog training experience, users verbally give commands to a virtual dog, like “sit” and “stay”. For the Quest version, we used a third-party voice recognition library that runs entirely on the device, without an internet connection. However, this library does not yet work on the Apple Vision Pro, and there are no comparable alternatives at the moment. While unfortunate, this is not unexpected: new operating systems often lack third-party support out of the gate.
To move forward, we pivoted to the native speech recognition framework that powers Siri. While this seemed like a promising solution, it came with its own set of challenges. First, this library only supports English, whereas the previous option accommodated multiple languages, which limits our potential user base.
Additionally, we ran into issues with how Unity handles system events, such as when the app is minimized or closed. This caused the native recognition to malfunction. We’re actively waiting for updates from Unity and Apple, and we hope to see a resolution soon.
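For context, a native bridge of this kind typically looks something like the sketch below: a C# wrapper around an Objective-C/Swift plugin that wraps Apple’s Speech framework and reports recognized phrases back to Unity. The native entry point names here are hypothetical, chosen purely for illustration, and are not the actual plugin interface we use.

```csharp
// Hypothetical sketch of bridging native speech recognition into Unity on visionOS.
// The native functions (_StartNativeRecognition, _StopNativeRecognition) are assumed
// names for an Objective-C/Swift plugin that wraps Apple's Speech framework and
// invokes the callback with recognized text.
using System.Runtime.InteropServices;
using AOT;
using UnityEngine;

public class NativeSpeechRecognizer : MonoBehaviour
{
    // Delegate used by the native side to deliver recognized phrases.
    private delegate void RecognitionCallback(string phrase);

#if UNITY_VISIONOS && !UNITY_EDITOR
    [DllImport("__Internal")]
    private static extern void _StartNativeRecognition(RecognitionCallback callback);

    [DllImport("__Internal")]
    private static extern void _StopNativeRecognition();
#else
    // Editor / other-platform stubs so the project still compiles everywhere.
    private static void _StartNativeRecognition(RecognitionCallback callback) { }
    private static void _StopNativeRecognition() { }
#endif

    public static event System.Action<string> OnPhraseRecognized;

    [MonoPInvokeCallback(typeof(RecognitionCallback))]
    private static void HandlePhrase(string phrase)
    {
        // The native callback may arrive off the main thread; forward the phrase
        // and let listeners decide how to apply it ("sit", "stay", etc.).
        OnPhraseRecognized?.Invoke(phrase);
    }

    private void OnEnable() => _StartNativeRecognition(HandlePhrase);

    private void OnDisable() => _StopNativeRecognition();
}
```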
Using Shaders
Shaders are small programs that run on the GPU and control how pixels are rendered on screen. They are critical for creating realistic, immersive visuals. Shaders fall into two groups: built-in (included as part of the engine) and custom (written or modified by a developer, in this case, us).
Custom shaders are further divided by how they are created: written in code or created using Shader Graph, a visual programming tool. All shaders compile into low-level, platform-specific code during the build process, which is entirely dependent on the engine and difficult to influence.
Custom shaders for the Apple Vision Pro must be created using Shader Graph. To comply with this requirement, we had to rewrite our existing custom shaders. The migration to Shader Graph is time-intensive and complex, and the effort scales with the number of custom shaders. A brand-new project can simply start in Shader Graph, but migrating an existing application requires a significant time investment.
As we worked on this project, we faced a few bumps. Some built-in shaders caused unexpected visual glitches. For instance, we had issues with how textures looked and trouble with transparent objects. Our digital dog’s fur didn’t turn out quite right, which was disappointing. Fixing these problems was tricky since they occurred during the compilation phase.
User Input
The Apple Vision Pro uses eye movement and hand gestures instead of controllers. For instance, a simple pinch while looking at an object acts like a “click”. This creates a sleek, futuristic experience yet presents significant challenges.
Unity uses a dedicated Input Module for input handling, which acts as a layer between the operating system’s input and our application. Unfortunately, the existing Unity Input Module has limited support for visionOS, so most standard functions need to be created from scratch. The lack of comprehensive documentation and open-source resources means that even basic tasks – like positioning a UI panel to face the user – require custom code.
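As an illustration of the kind of helper we ended up writing ourselves, here is a minimal sketch of a script that keeps a world-space UI panel facing the user. Our production approach differs in detail, but the idea is the same: on visionOS the head pose is the only reliable “pointer”, so the panel reorients toward the main camera each frame.

```csharp
// Minimal sketch: keep a world-space UI panel oriented toward the user's head.
using UnityEngine;

public class FaceUser : MonoBehaviour
{
    private Transform _head;

    private void Start()
    {
        // Camera.main corresponds to the user's head pose in a Unity XR scene.
        if (Camera.main != null)
        {
            _head = Camera.main.transform;
        }
    }

    private void LateUpdate()
    {
        if (_head == null) return;

        // Direction from the head to the panel, flattened so the panel stays upright.
        Vector3 toPanel = transform.position - _head.position;
        toPanel.y = 0f;
        if (toPanel.sqrMagnitude > 0.0001f)
        {
            // Pointing the canvas's forward axis away from the user keeps its UI readable.
            transform.rotation = Quaternion.LookRotation(toPanel, Vector3.up);
        }
    }
}
```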
Animations
Our project features an animated dog that the user interacts with and trains with basic commands. Modern engines, like Unity, have a system for optimizing characters (in this case our canine friend). When a character, like our animated dog, moves out of the camera’s view, the engine reduces animation quality to save CPU and GPU resources. When the character comes back into view, the animations resume as usual. This optimization is typically automatic.
However, Unity struggles to accurately determine if the character is in the user’s line of sight with the Apple Vision Pro’s unique camera setup. This can lead to animations appearing inconsistent, even when users are looking right at the dog. To fix this, we found that disabling the optimization works well. Since our project focuses on just one animated character, this solution is manageable. However, this limitation could pose a significant challenge in situations with multiple animated objects.
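For reference, disabling this optimization amounts to a couple of settings on the character. The sketch below is a simplified version of the idea, not our exact production code.

```csharp
// Simplified sketch: keep the dog's animation and skinning updating even when
// Unity's visibility check thinks the character is off-screen.
using UnityEngine;

public class AlwaysAnimate : MonoBehaviour
{
    private void Awake()
    {
        // Keep the animation state machine evaluating regardless of visibility.
        var animator = GetComponent<Animator>();
        if (animator != null)
        {
            animator.cullingMode = AnimatorCullingMode.AlwaysAnimate;
        }

        // Keep skinned meshes deforming even when their bounds fall outside the
        // camera frustum, which Unity may misjudge on the Vision Pro.
        foreach (var skinned in GetComponentsInChildren<SkinnedMeshRenderer>())
        {
            skinned.updateWhenOffscreen = true;
        }
    }
}
```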
Sound
In 3D games, sounds are placed in a three-dimensional space. This allows users to pinpoint where sounds are coming from, perceive their distance, and experience the effects of different environments – like the echoes of a small room or the openness of a vast field. This spatial awareness becomes even more essential in virtual reality, where immersion is key to a captivating experience.
Unity’s Spatialized Sound System handles 3D audio, providing a foundation for creating these rich soundscapes. While some platforms, like Meta Quest, have specialized sound systems, Apple currently relies on Unity’s built-in system. Unfortunately, this has led to some challenges on Apple devices. Sounds can sometimes be misdirected – like hearing a sound from the left when it should come from the right.
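For context, configuring a sound for 3D playback in Unity looks roughly like the sketch below. The specific values are illustrative rather than the ones we ship; the important part is that the source is fully spatialized instead of played back as flat stereo.

```csharp
// Illustrative sketch: configure an AudioSource (e.g., a bark) for 3D playback.
using UnityEngine;

[RequireComponent(typeof(AudioSource))]
public class SpatialBark : MonoBehaviour
{
    private void Awake()
    {
        var source = GetComponent<AudioSource>();

        source.spatialBlend = 1.0f;                         // fully 3D rather than 2D stereo
        source.spatialize = true;                           // hand panning to the spatializer plugin
        source.rolloffMode = AudioRolloffMode.Logarithmic;  // natural distance attenuation
        source.minDistance = 0.5f;                          // full volume within half a meter
        source.maxDistance = 15f;                           // outer limit of the attenuation curve
        source.dopplerLevel = 0f;                           // avoid pitch shifts for a slow-moving dog
    }
}
```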
We’re actively monitoring these issues and awaiting updates from Unity and Apple to enhance the audio experience.
Obtaining data about the real world
Creating a truly engaging mixed reality experience hinges on seamless interactions between the digital and physical worlds. To achieve this, our application needs to gather data about the real environment, such as the positions of walls and furniture. For our virtual dog to sit on the ground, we must identify potential obstacles, such as tables and chairs.
The Apple Vision Pro uses cameras and lidar to scan and process environmental data. Unity obtains this information in real time, allowing us to incorporate it into our application. As the real world changes, the data updates continuously, helping us navigate around obstacles. However, while we can avoid these objects, we can’t interact with them, meaning our virtual dog can’t play with a ball that bounces off real walls.
Additionally, the constant stream of environmental updates makes it difficult to establish fixed locations for real-world objects. As a result, our virtual dog cannot reliably navigate around furniture such as tables.
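To give a sense of how this environment data arrives, the sketch below shows a simplified consumer of AR Foundation’s plane events. It assumes AR Foundation 5.x (event names differ slightly in newer versions), and the filtering threshold is an illustrative assumption, not a value from our project.

```csharp
// Simplified sketch: track horizontal, upward-facing planes as candidate ground
// for the virtual dog, using AR Foundation plane events (assumes AR Foundation 5.x).
using System.Collections.Generic;
using UnityEngine;
using UnityEngine.XR.ARFoundation;
using UnityEngine.XR.ARSubsystems;

public class GroundPlaneTracker : MonoBehaviour
{
    [SerializeField] private ARPlaneManager planeManager;

    // Planes the dog is currently allowed to stand on.
    private readonly List<ARPlane> _groundPlanes = new List<ARPlane>();

    private void OnEnable() => planeManager.planesChanged += OnPlanesChanged;

    private void OnDisable() => planeManager.planesChanged -= OnPlanesChanged;

    private void OnPlanesChanged(ARPlanesChangedEventArgs args)
    {
        foreach (var plane in args.added)
        {
            // Keep only horizontal planes facing up and low enough to plausibly be
            // the floor; the 0.3 m threshold is an assumption for illustration.
            if (plane.alignment == PlaneAlignment.HorizontalUp && plane.center.y < 0.3f)
            {
                _groundPlanes.Add(plane);
            }
        }

        foreach (var plane in args.removed)
        {
            _groundPlanes.Remove(plane);
        }
    }
}
```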
Occlusion in Mixed Reality
In mixed-reality headsets, virtual objects display over the real-world image captured by the device’s camera, creating a blend of the digital and physical. Occlusion occurs when real objects block the view of digital ones, allowing for more realistic interactions.
Without occlusion systems, digital objects appear on top of real ones without regard for their actual positions. For example, our virtual dog might mistakenly show up floating on top of a real table, breaking the immersion.
Meta Quest uses the Depth API plugin, which relies on depth sensing to properly display real objects in front of virtual ones. The Apple Vision Pro excels at the visual presentation of windowed applications, but occlusion unfortunately does not work as well in full mixed-reality mode.
On the Apple Vision Pro, Unity uses AR Foundation to generate planes from camera data (such as the floor, tabletops, and walls). However, these planes are generated dynamically and constantly recalculated, which affects the placement of digital objects. For instance, if you attach a flag to a wall and then walk closer, the scene recalculates and a portion of the flag may end up intersecting the wall.
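One possible mitigation, sketched below under the same AR Foundation assumptions as earlier, is to re-project a placed object onto its plane whenever that plane is re-estimated, rather than leaving it at a fixed world position. In practice, plane merging and removal would also need handling; this sketch shows only the core idea.

```csharp
// Sketch: keep a placed object (the "flag") snapped to its wall plane whenever
// the plane's boundary is re-estimated (assumes AR Foundation 5.x).
using UnityEngine;
using UnityEngine.XR.ARFoundation;

public class StickToPlane : MonoBehaviour
{
    [SerializeField] private ARPlane wallPlane;            // the plane the object was placed on
    [SerializeField] private float surfaceOffset = 0.01f;  // keep ~1 cm off the surface

    private void OnEnable() => wallPlane.boundaryChanged += OnBoundaryChanged;

    private void OnDisable() => wallPlane.boundaryChanged -= OnBoundaryChanged;

    private void OnBoundaryChanged(ARPlaneBoundaryChangedEventArgs args)
    {
        // Project the object's current position back onto the updated plane so it
        // never ends up intersecting the wall after a re-estimation.
        var surface = new Plane(wallPlane.normal, wallPlane.center);
        Vector3 onSurface = surface.ClosestPointOnPlane(transform.position);
        transform.position = onSurface + wallPlane.normal * surfaceOffset;
        transform.rotation = Quaternion.LookRotation(-wallPlane.normal, Vector3.up);
    }
}
```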
Interface interactions
We built the original dog training application for the Meta Quest 3 with no intention of porting it to other platforms, so we used gestures built into Meta’s SDK to control the dog. When we decided to add support for the Apple Vision Pro, we had to redesign the control system.
We switched to a solution provided by Unity, which allowed us to integrate Apple’s hand control algorithm into our existing system. This change enabled us to create a more universal and flexible system that can work on different devices. As a result, we ensured that our application remains compatible with new technologies and devices while maintaining user-friendly controls.
Controllers
Controllers offer a convenient and precise way to interact with virtual objects. While many headsets support both controllers and hand tracking, the Apple Vision Pro relies solely on eye and hand input.
Shifting from Meta Quest controllers to touchless interactions with the Vision Pro required us to fundamentally rethink our interaction system. We developed a method for users to virtually “grab” objects using their hands, translating gestures into actions within the virtual space. This transformation led to a more intuitive and natural user experience, making it easier and more engaging for users to interact with their environment.
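At its core, this kind of grab detection comes down to measuring the distance between fingertip joints. The sketch below shows a simplified pinch check built on Unity’s XR Hands package (com.unity.xr.hands); the threshold and structure are illustrative, not our exact production implementation, which layers grab and release events and object attachment on top of a check like this.

```csharp
// Simplified sketch: detect a right-hand pinch by comparing thumb-tip and
// index-tip joint positions from Unity's XR Hands package.
using System.Collections.Generic;
using UnityEngine;
using UnityEngine.XR.Hands;

public class PinchDetector : MonoBehaviour
{
    private XRHandSubsystem _handSubsystem;
    private const float PinchThreshold = 0.02f; // ~2 cm between fingertips (illustrative)

    public bool IsRightHandPinching { get; private set; }

    private void Update()
    {
        if (_handSubsystem == null)
        {
            // Grab the running hand-tracking subsystem once it is available.
            var subsystems = new List<XRHandSubsystem>();
            SubsystemManager.GetSubsystems(subsystems);
            if (subsystems.Count == 0) return;
            _handSubsystem = subsystems[0];
        }

        XRHand hand = _handSubsystem.rightHand;
        if (!hand.isTracked) return;

        // The distance between the thumb tip and index tip approximates a pinch.
        if (hand.GetJoint(XRHandJointID.ThumbTip).TryGetPose(out Pose thumbPose) &&
            hand.GetJoint(XRHandJointID.IndexTip).TryGetPose(out Pose indexPose))
        {
            IsRightHandPinching =
                Vector3.Distance(thumbPose.position, indexPose.position) < PinchThreshold;
        }
    }
}
```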
Documentation
Since the visionOS platform is still in its early stages, the documentation for its system and development tools is quite limited. This made the learning process a journey of trial and error, experimenting with existing applications, and reverse engineering.
Summarizing our experience with the Apple Vision Pro
The Apple Vision Pro is an exciting new device that challenges us to step outside our comfort zones and find innovative solutions to new obstacles. Its operating system is still evolving, demanding creative problem-solving, but it offers incredible potential for the future. We’re optimistic that, as time goes on, Unity and visionOS will refine their systems, and more third-party tools will emerge to enhance development.
Are you considering converting an existing application for the Apple Vision Pro or creating a new app for this groundbreaking headset? We’d love to hear from you!