- Full integration of Tensor ML SDK with LiteRT to optimize the deployment of AI models on Pixel devices.
- Access to a Model Garden with over 100 models optimized for the Google Tensor TPU.
- Advanced support for multimodal language and vision models through the Gemma 4 architecture.
- Fast, private on-device inference thanks to the Tensor G5 hardware.
If you're into app development and artificial intelligence, you've probably noticed that the game is changing. Google has decided to open the doors to its most powerful hardware with the Tensor ML SDK, a tool that lets developers get the most out of the tensor processing unit (TPU) in Pixel devices, so that AI runs directly on the device instead of depending on the cloud.
The most exciting thing is that this SDK has left its experimental phase behind and entered the Beta stage. This means it's no longer just for a select few; any programmer can start creating AI experiences that are interactive, private, and, above all, fast, leveraging Google's Tensor SoC architecture to run tasks that previously seemed impossible to do locally.
A unified workflow thanks to LiteRT
To spare developers from wrestling with complex configurations, Google has integrated the Tensor SDK with LiteRT. This framework acts as an abstraction layer that eliminates the need to deal with vendor-specific SDKs or complicated compilers, offering a simplified and consistent API for deploying machine learning models at the edge.
The process is divided into three key stages. First comes model compilation, where you transform your PyTorch projects or TensorFlow-based TFLite models into optimized binaries using LiteRT Torch. Then comes deployment, using Play Feature Delivery and so-called AI Packs to efficiently distribute the libraries and compiled models within the application.
Finally, we arrive at inference execution. Thanks to the LiteRT Runtime, you can get your model running on the TPU with just a few lines of code. Best of all, the system is smart: if for some reason the TPU isn't available, you can configure fallback mechanisms so that the load is automatically transferred to the CPU or GPU, so the app keeps responding.
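To make the fallback idea concrete, here is a minimal sketch in plain Python. It does not use the LiteRT API: the function `run_with_fallback` and the stand-in runners are hypothetical names invented for illustration, and the real runtime handles accelerator selection through its own configuration.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def run_with_fallback(
    backends: Dict[str, Callable[[List[float]], List[float]]],
    order: Sequence[str],
    inputs: List[float],
) -> Tuple[str, List[float]]:
    """Try each backend in preference order and return (name, outputs)
    from the first one that is registered and runs without error."""
    errors = {}
    for name in order:
        runner = backends.get(name)
        if runner is None:
            errors[name] = "not registered"
            continue
        try:
            return name, runner(inputs)
        except RuntimeError as exc:
            # Remember why this backend failed, then try the next one.
            errors[name] = str(exc)
    raise RuntimeError(f"no backend could run the model: {errors}")


# Hypothetical stand-ins for real accelerator-backed runners.
def tpu_runner(inputs: List[float]) -> List[float]:
    raise RuntimeError("TPU unavailable")  # simulate a device without a TPU


def cpu_runner(inputs: List[float]) -> List[float]:
    return [x * 2 for x in inputs]  # placeholder "model"


used, outputs = run_with_fallback(
    {"tpu": tpu_runner, "cpu": cpu_runner},
    order=["tpu", "gpu", "cpu"],
    inputs=[1, 2, 3],
)
print(used, outputs)  # cpu [2, 4, 6]
```

The preference order is explicit, so the app can decide whether a slower CPU path is acceptable or whether it should surface an error instead.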
The Model Garden: A catalog of possibilities
There's no need to start from scratch, as the Beta SDK includes an impressive Model Garden. It's a library with over 100 models, both classic ML and generative AI, including versions of Gemma 3 1B, in addition to a large number of pre-compiled models that you can download directly from the LiteRT community on Hugging Face.
If you're looking to build text features, small language models like FunctionGemma let you execute local actions within the app, while EmbeddingGemma adds advanced semantic capabilities. On the visual side, the SDK lets you implement object detection and depth mapping, which is pure gold for camera applications that need to react to the user's environment in real time.
They haven't forgotten about audio and accessibility either. It's now possible to run end-to-end speech recognition on the device, which delivers transcriptions with very low latency, along with translation tools that work offline. Data stays private because nothing leaves the phone.
Technical optimization and hardware support
To get the most out of it, it's essential to know which hardware is supported. Currently, the ecosystem is focused on the Pixel 10 family, including the Pro, Pro XL and Pro Fold models, all of which are equipped with the Google Tensor G5 SoC. To make AI run smoothly, Google recommends using specific optimization flags during compilation.
For example, in the LiteRT Python flow, it is very common to use the flag google_tensor_truncation_type="half" to adjust performance and resource usage. In the case of large language models (LLMs), exporting requires more detailed parameters, such as the quantization_recipe setting and enabling large-model support through the AOT configuration dictionary.
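As a rough illustration of how these options might be grouped, the sketch below builds a plain Python dictionary. Only `google_tensor_truncation_type`, its value `"half"`, and the name `quantization_recipe` come from the text above. The helper `build_export_options`, the `aot_config` key, the `hypothetical_large_model_support` key, and the placeholder recipe string are assumptions made for this example; the real key names and accepted values should be taken from the SDK documentation.

```python
from typing import Any, Dict, Optional


def build_export_options(
    truncation_type: str = "half",
    quantization_recipe: Optional[str] = None,
    large_model: bool = False,
) -> Dict[str, Any]:
    """Collect export settings into one dictionary (illustrative only)."""
    # Flag name taken from the article; the value is passed through as-is.
    aot_config: Dict[str, Any] = {
        "google_tensor_truncation_type": truncation_type,
    }
    if large_model:
        # Hypothetical key: the article only says large-model support is
        # enabled through the AOT configuration dictionary.
        aot_config["hypothetical_large_model_support"] = True

    options: Dict[str, Any] = {"aot_config": aot_config}
    if quantization_recipe is not None:
        options["quantization_recipe"] = quantization_recipe
    return options


# A small vision model: only the truncation flag is set.
small = build_export_options()

# An LLM export: placeholder recipe name plus large-model support.
llm = build_export_options(
    quantization_recipe="<recipe-name>",  # placeholder, not a real recipe
    large_model=True,
)
print(small)
print(llm)
```

Keeping the options in one dictionary makes it easy to log exactly which settings produced a given compiled binary.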
It is important to mention that, although NNAPI existed previously, it has been deprecated as of Android 15. The strategy now is to route everything through LiteRT delegates, where support for the Pixel TPU has become central to replacing older implementations and achieving higher energy efficiency.
The Gemma 4 revolution and Multimodal AI
Let's talk about the latest: the arrival of Gemma 4 12B. Unlike other models that attach an image encoder to a language model, Gemma 4 processes images natively. This simplified architecture not only reduces VRAM consumption but also allows for much more fluid and coherent cross-modal reasoning.
With a context window of 256K tokens, this model can handle long conversations with multiple images without losing track. Furthermore, because it is distributed under the Apache 2.0 license, it is very flexible for commercial use and redistribution, and 4-bit or 8-bit quantization allows multimodal AI to run on modern laptops.
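A quick back-of-the-envelope calculation shows why quantization matters here. The sketch below estimates the memory needed for the weights alone, assuming roughly 12 billion parameters (taken from the "12B" in the model name). It ignores the KV cache, activations, and the small overhead of quantization scales, so real usage will be higher.

```python
def weight_memory_gb(num_params: float, bits_per_weight: int) -> float:
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


PARAMS_12B = 12e9  # assumption: about 12 billion parameters

for bits in (16, 8, 4):
    print(f"{bits}-bit weights: ~{weight_memory_gb(PARAMS_12B, bits):.0f} GB")
# 16-bit weights: ~24 GB
# 8-bit weights: ~12 GB
# 4-bit weights: ~6 GB
```

At 4 bits the weights need about a quarter of the 16-bit footprint, which is what brings a model of this size within reach of laptop memory.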
Google's goal is clear: it wants developers to adopt its open-weight models to dominate the ecosystem. By making it easier for AI to be local and powerful, it reduces dependence on external APIs and builds a cohesive community around Tensor hardware and LiteRT software.
The development ecosystem for Pixel has taken a qualitative leap by uniting the Tensor G5 hardware with the versatility of LiteRT and the power of models like Gemma 4. Thanks to the SDK's move to Beta and the availability of a large catalog of pre-optimized models, creating applications that process vision, speech, and language privately and quickly is now within reach of any programmer.


