Apple Spatial-Computing Patents: A Week of AR Grants Mapped

Of the 41 U.S. patents issued to Apple in the week of May 5, 2026, a cluster reads as a connected stack for a head-worn computer — anchoring virtual objects to a room, placing a digital assistant in space, and reconstructing the physical world from a camera.

Apple reports its hardware almost entirely through segment lines and treats its unreleased product roadmap as a black box, so the most concrete public record of where its device engineering is pointed is often the patent docket. In the week of May 5 to May 11, 2026, the U.S. Patent and Trademark Office issued 41 patents to Apple Inc. A large share are cellular and wireless-protocol grants assigned to the company's modem and connectivity work, but a separate cluster stands apart: roughly a dozen grants whose claims describe the software and sensing layer of a head-worn spatial-computing device. Read together, they map not the glass or the chassis but the code that sits between a headset's cameras and what a wearer sees.

The most foundational of the group concern how a device anchors virtual content to a physical room. US12620178B1, "Supplementing depth information for anchoring," claims a method that obtains a representation of a three-dimensional environment made of planes defined in two-dimensional space, then anchors that representation and a computer-generated object to a physical anchor point using pose data that supplies the missing depth. It is a claim on making spatial anchoring work when the scene model itself lacks depth — a recurring constraint for any pass-through headset. US12620179B2, "Digital assistant object placement," reaches the interaction layer, claiming a method that initiates an assistant session in a computer-generated-reality environment by positioning a digital-assistant object at a location outside the currently displayed portion of the environment and signaling where it is. The assistant, in other words, becomes a thing placed in space rather than a voice from nowhere.
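As a rough illustration of what "supplying the missing depth" can mean, the sketch below lifts a point defined in a plane's own two-dimensional coordinates into three-dimensional space using a pose for that plane. The PlanePose type, its fields, and the lift_to_world function are hypothetical names invented for this example. It is not the method claimed in US12620178B1, only one simple way a 2D representation plus pose data can yield a 3D anchor position.

```python
from dataclasses import dataclass

Vec3 = tuple[float, float, float]


@dataclass(frozen=True)
class PlanePose:
    """Hypothetical pose data: where a 2D plane sits in 3D world space."""
    origin: Vec3  # world position of the plane's (0, 0) point
    u_axis: Vec3  # world direction of the plane's first axis (unit length)
    v_axis: Vec3  # world direction of the plane's second axis (unit length)


def lift_to_world(point_2d: tuple[float, float], pose: PlanePose) -> Vec3:
    """Map a 2D point on the plane to 3D: origin + u * u_axis + v * v_axis."""
    u, v = point_2d
    return tuple(
        o + u * a + v * b
        for o, a, b in zip(pose.origin, pose.u_axis, pose.v_axis)
    )


# Assumed example: a floor plane lying at height 0, offset from the world origin.
floor = PlanePose(origin=(1.0, 0.0, 2.0),
                  u_axis=(1.0, 0.0, 0.0),
                  v_axis=(0.0, 0.0, 1.0))

# The anchor point only has plane coordinates; the pose fills in the rest.
anchor_world = lift_to_world((0.5, 0.25), floor)
```

The 2D representation stays depth-free here. Everything three-dimensional about the anchor comes from the pose, which is the division of labor the claim language describes.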

Reconstructing the world from images

A second group covers turning captured imagery into three-dimensional structure. US12620187B2 claims user interfaces and methods for generating a three-dimensional virtual representation of a physical object from a set of captured images, including generating a point cloud and a mesh reconstruction and showing capture progress to the user. US12620177B2 reaches farther, claiming a method that takes flat video content, identifies the actors and environmental elements in a scene and the spatial relationships among them, and generates a synthesized-reality reconstruction of that scene by driving digital assets through extracted action sequences. US12620182B2, "Method and device for presenting an audio and synthesized reality experience," ties audio to space, displaying synthesized-reality content in association with an environment when temporal and environmental criteria are met, including content tied to a 3D point cloud of the room or to spoken words detected in it.

"The representation of the 3D environment does not include z space (e.g., depth) information." (Supplementing depth information for anchoring, US12620178B1)

Rendering, presence, and the interface

The third group concerns what the display does once the world is understood. US12620052B1, "Warping an image with a combination of warping functions," claims identifying the subset of pixels a user is currently focusing on and warping that region with one function based on device movement while warping the rest with a different function — a granted claim on focus-aware, motion-compensated rendering. US12620184B2 covers switching an application interface between an immersive mode, in which only that application's content is shown, and a non-immersive mode in which it shares the view with other content and a home menu. US12620298B2 claims feedback for adjusting device position on the wearer, and US12620155B2 covers representations of participants in real-time communication sessions, including self-view avatars and updates to a participant's avatar in a session — the presence layer for shared spatial experiences.
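The idea of warping a focused region with one function and the remainder with another can be sketched as follows. Everything here is an assumption made for illustration: the circular focus region, the choice of a precise sub-pixel shift inside it, and the cheaper whole-pixel shift outside it are hypothetical stand-ins, not the warping functions claimed in US12620052B1.

```python
def in_focus(pixel, focus, radius):
    """True if the pixel lies within `radius` of the assumed gaze point."""
    dx, dy = pixel[0] - focus[0], pixel[1] - focus[1]
    return dx * dx + dy * dy <= radius * radius


def focus_warp(pixel, motion):
    """Hypothetical first function: exact sub-pixel shift by device motion."""
    return (pixel[0] - motion[0], pixel[1] - motion[1])


def periphery_warp(pixel, motion):
    """Hypothetical second function: coarser shift rounded to whole pixels."""
    return (pixel[0] - round(motion[0]), pixel[1] - round(motion[1]))


def source_coords(width, height, focus, radius, motion):
    """For each output pixel, pick the source coordinate to sample from."""
    coords = {}
    for y in range(height):
        for x in range(width):
            pixel = (x, y)
            warp = focus_warp if in_focus(pixel, focus, radius) else periphery_warp
            coords[pixel] = warp(pixel, motion)
    return coords


# Assumed inputs: a 5x5 image, gaze at the center, motion of (1.25, 0.75) pixels.
mapping = source_coords(5, 5, focus=(2, 2), radius=1, motion=(1.25, 0.75))
```

In this sketch the pixel under the gaze point samples from (0.75, 1.25), while a corner pixel such as (0, 0) samples from (-1, -1). The split spends precision where the wearer is looking and saves work elsewhere.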

The classification data underlines the coherence. The cluster concentrates in G06T 19/006 (mixed reality) and its parent group G06T 19/00 (manipulating 3D models or images for computer graphics), which appear across the anchoring, assistant-placement, and synthesized-reality grants. Alongside them sit G06F 3/013 (eye-tracking input), the single most common subclass among Apple's grants for the week, G06F 3/0346 (pointing devices that detect orientation or free movement in 3D space), and G06T 2200/24 (an indexing code for image processing involving graphical user interfaces). The grants are consistently about a device that tracks the eye, understands the room, and composes content into it.
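A tally of this kind is simple to reproduce. The sketch below counts how many grants carry each classification code and reports the most common one. The grant labels and their code assignments are placeholders invented for the example, not the week's actual data. Only the code strings themselves come from the discussion above.

```python
from collections import Counter

# Placeholder input: grant label -> CPC codes listed on that grant.
# These assignments are made up purely to show the counting step.
grants = {
    "grant-1": ["G06T 19/006", "G06F 3/013"],
    "grant-2": ["G06F 3/013", "G06F 3/0346"],
    "grant-3": ["G06T 19/006", "G06F 3/013", "G06T 19/00"],
}


def tally_codes(grants_to_codes):
    """Count the number of grants that list each code (once per grant)."""
    counts = Counter()
    for codes in grants_to_codes.values():
        counts.update(set(codes))  # set() avoids double-counting within a grant
    return counts


counts = tally_codes(grants)
top_code, top_count = counts.most_common(1)[0]
```

Counting each code once per grant, rather than once per appearance, keeps a single heavily classified patent from dominating the tally.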

The split within the week's 41 grants is itself informative. The largest single block is wireless and cellular: a long run of grants, among them US12621850B2 on sidelink transmission coordination and US12621650B2 on multi-SIM device capability, reflects the company's long-running modem and connectivity engineering. The spatial-computing cluster is a distinct concentration sitting beside it, and the contrast matters for reading the week: these are not scattered AR mentions buried in unrelated devices but a connected set of grants whose claims reference the same computer-generated-reality environment, the same head-mounted display, and the same eye- and motion-tracking inputs. The cluster, rather than the whole week, is the right unit of analysis, because it isolates the part of the portfolio that describes one product rather than the company's full engineering surface. That the eye-tracking subclass G06F 3/013 is the single most common classification across all 41 grants is the clearest quantitative sign that the head-worn work is a meaningful share of what issued.

It is worth stating precisely what these grants do and do not establish. Each is a claim on a specific method or interface, not on spatial computing as a category. The anchoring patent US12620178B1 covers a particular technique for working without depth data, not anchoring in general; the flat-video reconstruction in US12620177B2 describes one pipeline for driving digital assets from extracted action sequences. The value of reading them as a set is that they describe compatible layers of the same system — sensing, world modeling, assistant interaction, and rendering — rather than unrelated ideas. None of these grants is a shipping feature, and a single week's issuances say nothing about how Apple's broader portfolio compares to other companies building in this area. But the throughline is unmistakable: where last year's coverage in this category often centered on optics and chassis, this week's spatial-computing grants are concentrated on the software stack that turns a head-worn camera array into a device that places content in a room. For a reader tracking what Apple is converting into enforceable coverage, the week's record points to the layer above the lens.
