Developing for the Apple Vision Pro
Our team has been developing virtual reality projects since 2016, so it’s safe to say we have our fair share of experience. When Apple launched the Apple Vision Pro, we wasted no time picking up the headset. Given the headset’s high cost and relatively recent arrival, few developers have delved deeply into creating for it. After several months of hands-on testing, we’ve gained a solid understanding of the benefits, limitations, and unique aspects of developing applications for the Vision Pro.
Our platform of choice for the past several years has been Meta Quest, utilizing Quest 2 and Quest 3 headsets, with Unity as our main engine. Apple partnered with Unity to bring Unity-based development to visionOS, the operating system that powers its new headset. Our background with Unity has helped us navigate these new tools, but there’s still much to learn. It’s clear visionOS is a fledgling platform with room to grow.
Types of Apple Vision Pro Applications
The Apple Vision Pro supports three types of applications:
- Windowed apps – traditional 2D applications that run in a window alongside other apps in the shared space.
- Fully immersive apps – virtual reality experiences that completely replace the user’s surroundings with digital content.
- Mixed reality apps – immersive experiences that blend digital content with the passthrough view of the real world.
An application can belong to only one of these types. The designation is set in the project settings and cannot be changed at runtime.
Nuances of development for Apple Vision Pro
The rest of this article delves into the specific challenges we encountered and lessons learned while adapting Top Tier K9’s virtual reality dog training experience for the Meta Quest to a mixed reality experience for the Apple Vision Pro. While these takeaways can apply generally, we suggest reading the project’s case study first.
Speech Recognition
In the dog training experience, users verbally give commands to a virtual dog, like “sit” and “stay”. For the Quest version, we used a third-party voice recognition library that runs entirely on the device, without an internet connection. However, this library does not yet work on the Apple Vision Pro, and there are no comparable alternatives at the moment. While unfortunate, this is not unexpected: new operating systems often lack third-party support out of the gate.
To move forward, we pivoted to the native speech recognition framework that powers Siri. While this seemed like a promising solution, it came with its own set of challenges. First, this library only supports English, whereas the previous option accommodated multiple languages, which limits our potential user base.
Additionally, we ran into issues with how Unity handles system events, such as when the app is minimized or closed. This caused the native recognition to malfunction. We’re actively waiting for updates from Unity and Apple, and we hope to see a resolution soon.
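For context, a native bridge of this kind typically looks something like the sketch below: a C# wrapper around an Objective-C/Swift plugin that wraps Apple’s Speech framework and reports recognized phrases back to Unity. The native entry point names here are hypothetical, chosen purely for illustration, and are not the actual plugin interface we use.

```csharp
// Hypothetical sketch of bridging native speech recognition into Unity on visionOS.
// The native functions (_StartNativeRecognition, _StopNativeRecognition) are assumed
// names for an Objective-C/Swift plugin that wraps Apple's Speech framework and
// invokes the callback with recognized text.
using System.Runtime.InteropServices;
using AOT;
using UnityEngine;

public class NativeSpeechRecognizer : MonoBehaviour
{
    // Delegate used by the native side to deliver recognized phrases.
    private delegate void RecognitionCallback(string phrase);

#if UNITY_VISIONOS && !UNITY_EDITOR
    [DllImport("__Internal")]
    private static extern void _StartNativeRecognition(RecognitionCallback callback);

    [DllImport("__Internal")]
    private static extern void _StopNativeRecognition();
#else
    // Editor / other-platform stubs so the project still compiles everywhere.
    private static void _StartNativeRecognition(RecognitionCallback callback) { }
    private static void _StopNativeRecognition() { }
#endif

    public static event System.Action<string> OnPhraseRecognized;

    [MonoPInvokeCallback(typeof(RecognitionCallback))]
    private static void HandlePhrase(string phrase)
    {
        // The native callback may arrive off the main thread; forward the phrase
        // and let listeners decide how to apply it ("sit", "stay", etc.).
        OnPhraseRecognized?.Invoke(phrase);
    }

    private void OnEnable() => _StartNativeRecognition(HandlePhrase);

    private void OnDisable() => _StopNativeRecognition();
}
```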
Using Shaders
Shaders are small programs that run on the GPU and control how pixels are rendered on screen. They are critical for creating realistic, immersive visuals. Shaders fall into two groups: built-in (included as part of the engine) and custom (written or modified by a developer, in this case, us).
Custom shaders are further divided by how they are created: written in code or created using Shader Graph, a visual programming tool. All shaders compile into low-level, platform-specific code during the build process, which is entirely dependent on the engine and difficult to influence.
Custom shaders for the Apple Vision Pro must be created using Shader Graph. To comply with this requirement, we had to rewrite our existing custom shaders. The migration to Shader Graph is time-intensive and complex, and the effort scales with the number of custom shaders. A brand-new project can simply start in Shader Graph, but migrating an existing application requires a significant time investment.
As we worked on this project, we faced a few bumps. Some built-in shaders caused unexpected visual glitches. For instance, we had issues with how textures looked and trouble with transparent objects. Our digital dog’s fur didn’t turn out quite right, which was disappointing. Fixing these problems was tricky since they occurred during the compilation phase.
User Input
The Apple Vision Pro uses eye movement and hand gestures instead of controllers. For instance, a simple pinch while looking at an object acts like a “click”. This creates a sleek, futuristic experience yet presents significant challenges.
Unity uses a dedicated Input Module for input handling, which acts as a layer between the operating system’s input and our application. Unfortunately, the existing Unity Input Module has limited support for visionOS, so most standard functions need to be created from scratch. The lack of comprehensive documentation and open-source resources means that even basic tasks – like positioning a UI panel to face the user – require custom code.
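As an illustration of the kind of helper we ended up writing ourselves, here is a minimal sketch of a script that keeps a world-space UI panel facing the user. Our production approach differs in detail, but the idea is the same: on visionOS the head pose is the only reliable “pointer”, so the panel reorients toward the main camera each frame.

```csharp
// Minimal sketch: keep a world-space UI panel oriented toward the user's head.
using UnityEngine;

public class FaceUser : MonoBehaviour
{
    private Transform _head;

    private void Start()
    {
        // Camera.main corresponds to the user's head pose in a Unity XR scene.
        if (Camera.main != null)
        {
            _head = Camera.main.transform;
        }
    }

    private void LateUpdate()
    {
        if (_head == null) return;

        // Direction from the head to the panel, flattened so the panel stays upright.
        Vector3 toPanel = transform.position - _head.position;
        toPanel.y = 0f;
        if (toPanel.sqrMagnitude > 0.0001f)
        {
            // Pointing the canvas's forward axis away from the user keeps its UI readable.
            transform.rotation = Quaternion.LookRotation(toPanel, Vector3.up);
        }
    }
}
```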
Animations
Our project features an animated dog that the user interacts with and trains with basic commands. Modern engines, like Unity, have a system for optimizing characters (in this case our canine friend). When a character, like our animated dog, moves out of the camera’s view, the engine reduces animation quality to save CPU and GPU resources. When the character comes back into view, the animations resume as usual. This optimization is typically automatic.
However, Unity struggles to accurately determine if the character is in the user’s line of sight with the Apple Vision Pro’s unique camera setup. This can lead to animations appearing inconsistent, even when users are looking right at the dog. To fix this, we found that disabling the optimization works well. Since our project focuses on just one animated character, this solution is manageable. However, this limitation could pose a significant challenge in situations with multiple animated objects.
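For reference, disabling this optimization amounts to a couple of settings on the character. The sketch below is a simplified version of the idea, not our exact production code.

```csharp
// Simplified sketch: keep the dog's animation and skinning updating even when
// Unity's visibility check thinks the character is off-screen.
using UnityEngine;

public class AlwaysAnimate : MonoBehaviour
{
    private void Awake()
    {
        // Keep the animation state machine evaluating regardless of visibility.
        var animator = GetComponent<Animator>();
        if (animator != null)
        {
            animator.cullingMode = AnimatorCullingMode.AlwaysAnimate;
        }

        // Keep skinned meshes deforming even when their bounds fall outside the
        // camera frustum, which Unity may misjudge on the Vision Pro.
        foreach (var skinned in GetComponentsInChildren<SkinnedMeshRenderer>())
        {
            skinned.updateWhenOffscreen = true;
        }
    }
}
```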
Sound
In 3D games, sounds are placed in a three-dimensional space. This allows users to pinpoint where sounds are coming from, perceive their distance, and experience the effects of different environments – like the echoes of a small room or the openness of a vast field. This spatial awareness becomes even more essential in virtual reality, where immersion is key to a captivating experience.
Unity’s Spatialized Sound System handles 3D audio, providing a foundation for creating these rich soundscapes. While some platforms, like Meta Quest, have specialized sound systems, Apple currently relies on Unity’s built-in system. Unfortunately, this has led to some challenges on Apple devices. Sounds can sometimes be misdirected – like hearing a sound from the left when it should come from the right.
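For context, configuring a sound for 3D playback in Unity looks roughly like the sketch below. The specific values are illustrative rather than the ones we ship; the important part is that the source is fully spatialized instead of played back as flat stereo.

```csharp
// Illustrative sketch: configure an AudioSource (e.g., a bark) for 3D playback.
using UnityEngine;

[RequireComponent(typeof(AudioSource))]
public class SpatialBark : MonoBehaviour
{
    private void Awake()
    {
        var source = GetComponent<AudioSource>();

        source.spatialBlend = 1.0f;                         // fully 3D rather than 2D stereo
        source.spatialize = true;                           // hand panning to the spatializer plugin
        source.rolloffMode = AudioRolloffMode.Logarithmic;  // natural distance attenuation
        source.minDistance = 0.5f;                          // full volume within half a meter
        source.maxDistance = 15f;                           // outer limit of the attenuation curve
        source.dopplerLevel = 0f;                           // avoid pitch shifts for a slow-moving dog
    }
}
```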
We’re actively monitoring these issues and awaiting updates from Unity and Apple to enhance the audio experience.
Obtaining data about the real world
Creating a truly engaging mixed reality experience hinges on seamless interactions between the digital and physical worlds. To achieve this, our application needs to gather data about the real environment, such as the positions of walls and furniture. For our virtual dog to sit on the ground, we must identify potential obstacles, such as tables and chairs.
The Apple Vision Pro uses cameras and lidar to scan and process environmental data. Unity obtains this information in real time, allowing us to incorporate it into our application. As the real world changes, the data updates continuously, helping us navigate around obstacles. However, while we can avoid these objects, we can’t interact with them, meaning our virtual dog can’t play with a ball that bounces off real walls.
Additionally, the constant stream of environmental updates makes it difficult to establish fixed locations for real-world objects. As a result, our virtual dog cannot reliably navigate around furniture such as tables.
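To give a sense of how this environment data arrives, the sketch below shows a simplified consumer of AR Foundation’s plane events. It assumes AR Foundation 5.x (event names differ slightly in newer versions), and the filtering threshold is an illustrative assumption, not a value from our project.

```csharp
// Simplified sketch: track horizontal, upward-facing planes as candidate ground
// for the virtual dog, using AR Foundation plane events (assumes AR Foundation 5.x).
using System.Collections.Generic;
using UnityEngine;
using UnityEngine.XR.ARFoundation;
using UnityEngine.XR.ARSubsystems;

public class GroundPlaneTracker : MonoBehaviour
{
    [SerializeField] private ARPlaneManager planeManager;

    // Planes the dog is currently allowed to stand on.
    private readonly List<ARPlane> _groundPlanes = new List<ARPlane>();

    private void OnEnable() => planeManager.planesChanged += OnPlanesChanged;

    private void OnDisable() => planeManager.planesChanged -= OnPlanesChanged;

    private void OnPlanesChanged(ARPlanesChangedEventArgs args)
    {
        foreach (var plane in args.added)
        {
            // Keep only horizontal planes facing up and low enough to plausibly be
            // the floor; the 0.3 m threshold is an assumption for illustration.
            if (plane.alignment == PlaneAlignment.HorizontalUp && plane.center.y < 0.3f)
            {
                _groundPlanes.Add(plane);
            }
        }

        foreach (var plane in args.removed)
        {
            _groundPlanes.Remove(plane);
        }
    }
}
```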
Occlusion in Mixed Reality
In mixed-reality headsets, virtual objects display over the real-world image captured by the device’s camera, creating a blend of the digital and physical. Occlusion occurs when real objects block the view of digital ones, allowing for more realistic interactions.
Without occlusion systems, digital objects appear on top of real ones without regard for their actual positions. For example, our virtual dog might mistakenly show up floating on top of a real table, breaking the immersion.
Meta Quest uses the Depth API plugin, which relies on depth sensing to properly display real objects in front of virtual ones. The Apple Vision Pro excels at the visual presentation of windowed applications, but occlusion unfortunately does not work as well in full mixed-reality mode.
On the Apple Vision Pro, Unity uses AR Foundation to generate planes from camera data (such as the floor, tabletops, and walls). However, these planes are generated dynamically and constantly recalculated, which affects the placement of digital objects. For instance, if you attach a flag to a wall and then walk closer, the scene recalculates and a portion of the flag may end up intersecting the wall.
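One possible mitigation, sketched below under the same AR Foundation assumptions as earlier, is to re-project a placed object onto its plane whenever that plane is re-estimated, rather than leaving it at a fixed world position. In practice, plane merging and removal would also need handling; this sketch shows only the core idea.

```csharp
// Sketch: keep a placed object (the "flag") snapped to its wall plane whenever
// the plane's boundary is re-estimated (assumes AR Foundation 5.x).
using UnityEngine;
using UnityEngine.XR.ARFoundation;

public class StickToPlane : MonoBehaviour
{
    [SerializeField] private ARPlane wallPlane;            // the plane the object was placed on
    [SerializeField] private float surfaceOffset = 0.01f;  // keep ~1 cm off the surface

    private void OnEnable() => wallPlane.boundaryChanged += OnBoundaryChanged;

    private void OnDisable() => wallPlane.boundaryChanged -= OnBoundaryChanged;

    private void OnBoundaryChanged(ARPlaneBoundaryChangedEventArgs args)
    {
        // Project the object's current position back onto the updated plane so it
        // never ends up intersecting the wall after a re-estimation.
        var surface = new Plane(wallPlane.normal, wallPlane.center);
        Vector3 onSurface = surface.ClosestPointOnPlane(transform.position);
        transform.position = onSurface + wallPlane.normal * surfaceOffset;
        transform.rotation = Quaternion.LookRotation(-wallPlane.normal, Vector3.up);
    }
}
```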
Interface interactions
We built the original dog training application for the Meta Quest 3 with no intention of porting it to other platforms, so we used gestures built into Meta’s SDK to control the dog. When we decided to add support for the Apple Vision Pro, we had to redesign the control system.
We switched to a solution provided by Unity, which allowed us to integrate Apple’s hand control algorithm into our existing system. This change enabled us to create a more universal and flexible system that can work on different devices. As a result, we ensured that our application remains compatible with new technologies and devices while maintaining user-friendly controls.
Controllers
Controllers offer a convenient and precise way to interact with virtual objects. While many headsets support both controllers and hand tracking, the Apple Vision Pro relies solely on eye and hand input.
Shifting from Meta Quest controllers to touchless interactions with the Vision Pro required us to fundamentally rethink our interaction system. We developed a method for users to virtually “grab” objects using their hands, translating gestures into actions within the virtual space. This transformation led to a more intuitive and natural user experience, making it easier and more engaging for users to interact with their environment.
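At its core, this kind of grab detection comes down to measuring the distance between fingertip joints. The sketch below shows a simplified pinch check built on Unity’s XR Hands package (com.unity.xr.hands); the threshold and structure are illustrative, not our exact production implementation, which layers grab and release events and object attachment on top of a check like this.

```csharp
// Simplified sketch: detect a right-hand pinch by comparing thumb-tip and
// index-tip joint positions from Unity's XR Hands package.
using System.Collections.Generic;
using UnityEngine;
using UnityEngine.XR.Hands;

public class PinchDetector : MonoBehaviour
{
    private XRHandSubsystem _handSubsystem;
    private const float PinchThreshold = 0.02f; // ~2 cm between fingertips (illustrative)

    public bool IsRightHandPinching { get; private set; }

    private void Update()
    {
        if (_handSubsystem == null)
        {
            // Grab the running hand-tracking subsystem once it is available.
            var subsystems = new List<XRHandSubsystem>();
            SubsystemManager.GetSubsystems(subsystems);
            if (subsystems.Count == 0) return;
            _handSubsystem = subsystems[0];
        }

        XRHand hand = _handSubsystem.rightHand;
        if (!hand.isTracked) return;

        // The distance between the thumb tip and index tip approximates a pinch.
        if (hand.GetJoint(XRHandJointID.ThumbTip).TryGetPose(out Pose thumbPose) &&
            hand.GetJoint(XRHandJointID.IndexTip).TryGetPose(out Pose indexPose))
        {
            IsRightHandPinching =
                Vector3.Distance(thumbPose.position, indexPose.position) < PinchThreshold;
        }
    }
}
```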
Documentation
Since the visionOS platform is still in its early stages, the documentation for its system and development tools is quite limited. This made the learning process a journey of trial and error, experimenting with existing applications, and reverse engineering.
Summarizing our experience with the Apple Vision Pro
The Apple Vision Pro is an exciting new device that challenges us to step outside our comfort zones and find innovative solutions to new obstacles. Its operating system is still evolving, demanding creative problem-solving, but it offers incredible potential for the future. We’re optimistic that, as time goes on, Unity and visionOS will refine their systems, and more third-party tools will emerge to enhance development.
Are you considering converting an existing application for the Apple Vision Pro or creating a new app for this groundbreaking headset? We’d love to hear from you!