Google LLM-Assistant Patents: A Week of Grants Maps the Stack

The largest consumer-facing cluster of patents issued to Google in the week of April 14, 2026 sits on one system, a digital assistant built on large language models, and covers how it splits work between a local and a remote model, talks to third-party LLMs, streams output, and reaches into other apps.

Google funds its AI work as a multi-year capital program, and the patents that issued to the company in the week of April 14 to April 20, 2026 show where some of that program lands at the level of an actual product feature. The U.S. Patent and Trademark Office granted Google 21 patents that week. Several are datacenter and silicon grants — an on-chip interconnect for memory-channel controllers, a self-repairable chip for silent data corruption, a method for training models to tolerate hardware error — but the largest consumer-facing cluster sits on one system: a digital assistant built on large language models, and the connective layers that make it work.

The most architecturally telling grant is US12602539B2, "Proactive assistance via a cascade of LLMs." It describes a digital assistant that uses a local large language model running on the user device to compute a confidence score for a remote-model prompt. Only when that confidence satisfies a threshold does the assistant generate and send a prompt to a remote LLM, then present content based on the remote response. That is a granted claim on a specific way of splitting work between an on-device model and a cloud model: the routing decision that determines when a request stays local and when it leaves the device.
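The routing decision can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the patented implementation: the class name, the three callables, the 0.7 threshold, and the reading of "satisfies a threshold" as "greater than or equal to" are all hypothetical, since the grant as described here specifies none of them.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class CascadeRouter:
    """Hypothetical sketch of confidence-thresholded local-to-remote routing."""

    # Local model: maps on-device context to a confidence in [0, 1] that a
    # remote prompt is worth sending. (Assumed interface.)
    local_confidence: Callable[[str], float]
    # Local step that turns the context into a prompt for the remote model.
    build_prompt: Callable[[str], str]
    # Remote model: prompt in, response text out. (Assumed interface.)
    remote_llm: Callable[[str], str]
    # Illustrative value only; the source gives no number.
    threshold: float = 0.7

    def handle(self, context: str) -> Optional[str]:
        confidence = self.local_confidence(context)
        if confidence < self.threshold:
            # Threshold not satisfied: no prompt is built, nothing leaves the device.
            return None
        prompt = self.build_prompt(context)
        return self.remote_llm(prompt)


# Stub models standing in for real ones, for demonstration only.
router = CascadeRouter(
    local_confidence=lambda ctx: 0.9 if "flight" in ctx else 0.1,
    build_prompt=lambda ctx: f"Help with: {ctx}",
    remote_llm=lambda prompt: f"remote answer to '{prompt}'",
)
```

With these stubs, `router.handle("flight delayed")` goes to the remote model, while `router.handle("idle")` returns `None` without the remote callable ever being invoked. The point of the ordering is that the prompt is only generated after the threshold check passes.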

Reaching outward: other models and other apps

Two grants extend the assistant beyond Google's own models. US12603099B2, "Self-adjusting assistant LLMs enabling robust interaction with business LLMs," covers an assistant that selects one or more third-party 'business' LLMs to fulfill an action and, for each, accesses an adapter module to restructure the user's natural-language query into a prompt specifically formulated for that model. US12603092B2 reaches sideways into the device, describing an automated assistant that identifies synonymous terms in an application's interface and biases its speech processing toward them, so it can operate apps that were never pre-configured to interface with it. Together these claim the assistant as a hub that brokers between the user, outside models, and unmodified apps.
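The adapter idea in the business-LLM grant can be pictured as a lookup from target model to a function that reshapes the same user query. The sketch below is illustrative only: the model names (`florist_llm`, `airline_llm`), the prompt formats, and the `build_prompts` helper are invented for the example and do not come from the patent.

```python
import json
from typing import Callable, Dict, List

# An adapter takes the user's natural-language query and returns a prompt
# formatted for one specific target model. (Hypothetical interface.)
Adapter = Callable[[str], str]


def florist_adapter(query: str) -> str:
    # Assumed: this model expects a plain-text instruction.
    return f"Customer request: {query}\nRespond with an order summary."


def airline_adapter(query: str) -> str:
    # Assumed: this model expects a JSON payload.
    return json.dumps({"intent": "booking", "utterance": query})


# One adapter per third-party model; names are made up for illustration.
ADAPTERS: Dict[str, Adapter] = {
    "florist_llm": florist_adapter,
    "airline_llm": airline_adapter,
}


def build_prompts(query: str, selected: List[str]) -> Dict[str, str]:
    """Restructure one user query into a model-specific prompt per selected model."""
    return {name: ADAPTERS[name](query) for name in selected}
```

Calling `build_prompts("two tickets to Lisbon", ["airline_llm"])` yields a single JSON-formatted prompt for the airline model, while selecting both models returns two differently shaped prompts built from the same query.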

"The method includes determining, using a local large language model (LLM) executing on the user device, a remote LLM prompt confidence. The method includes determining that the remote LLM prompt confidence satisfies a threshold." (From "Proactive assistance via a cascade of LLMs," US12602539B2)

Latency, output, and multimodal input

A third strand covers how generated content reaches the user. US12602408B2 claims generating natural-language output from an LLM segment-by-segment and rendering an earlier segment while later ones are still being generated, to reduce the latency of evaluating the whole output before showing it. US12602424B2 covers proactively generating multimedia and dialog content through structured LLM queries and deciding when to issue an additional query to continue a stream. On the input side, US12602429B2 claims a multimodal search system that processes a video query's image frames into embeddings and combines them with the associated audio to return results. The cluster reaches from how a request enters the assistant to how its answer is composed and streamed back.
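The segment-by-segment behavior in the streaming grant maps naturally onto a generator: each segment is handed to the renderer as soon as it is complete, before later tokens have been produced. The following is a rough sketch under assumptions. The function names, the token-stream input, and the choice of sentence-ending punctuation as the segment boundary are all hypothetical; the patent as summarized here does not say how segments are delimited.

```python
from typing import Callable, Iterable, Iterator, List


def generate_segments(tokens: Iterable[str]) -> Iterator[str]:
    """Group a token stream into segments, yielding each one as soon as it ends.

    Assumption: a segment ends at a token that finishes with '.', '?' or '!'.
    """
    buffer: List[str] = []
    for token in tokens:
        buffer.append(token)
        if token.endswith((".", "?", "!")):
            yield " ".join(buffer)
            buffer = []
    if buffer:
        # Flush any trailing tokens that never reached a boundary.
        yield " ".join(buffer)


def render_stream(tokens: Iterable[str], render: Callable[[str], None]) -> None:
    """Render each segment while later tokens are still pending upstream."""
    for segment in generate_segments(tokens):
        render(segment)
```

Because `generate_segments` is lazy, pulling the first segment consumes only the tokens that make it up. If `tokens` is itself a generator backed by a model, the first sentence can be on screen while the rest is still being generated, which is the latency saving the grant describes.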

The classification data reinforces the read. The week's assistant grants concentrate in the language and speech classes — G10L 15/22 (voice control) appears across several, alongside G06F 40/40 and G06F 16/3329 on natural-language generation and retrieval. The coverage is consistently about the assistant's behavior and its interfaces, not the underlying model weights.

It is worth being precise about what each grant does and does not establish. The cascade patent claims a confidence-thresholded routing decision between a named local model and a remote one, not local-plus-cloud inference as a category. The business-LLM patent covers an adapter that reformats a query per target model, not interoperability in general. The streaming patent claims a particular segment-by-segment generate-and-render behavior, one latency technique rather than low-latency output broadly. Each is a granted claim on one implementation, and the value of reading them together is that they describe compatible pieces of a single assistant system: the routing layer, the brokering layer to outside models and apps, and the output layer that streams the result.

What the map shows

Twenty-one grants in a single week is a modest count, and several sit on datacenter silicon rather than the assistant. But the consumer-facing grants share an unusual coherence: they are not scattered features but adjacent layers of one system. Three of the assistant grants (the cascade router, the business-LLM broker, and a content-aware navigation patent, US12601607B2, which adjusts spoken navigation instructions around other media playing nearby) even share the same pair of named inventors, which suggests a deliberately built-out area rather than incidental output. None of these grants is a shipping product, and a single week says nothing about how the full portfolio compares with rivals'. The throughline is nonetheless consistent: local-to-remote routing, third-party-model brokering, app control, and streamed output are the same assistant seen from several sides. That is the shape of coverage accumulating around a system Google is converting from research into enforceable claims, one layer at a time.
