According to Beating, Google has released Gemma 4 12B, a new model size in the Gemma 4 family designed to run multimodal AI agents locally on consumer laptops with 16GB of RAM. The 12B-parameter model uses an encoder-free multimodal architecture that supports text and image inputs, filling a performance gap between the smaller and larger models in the family lineup.
Google simultaneously upgraded its LiteRT-LM local inference tool with OpenAI API compatibility, allowing developers to connect tools like Continue, Aider, and Open WebUI directly to a locally-running Gemma 4 12B instance without relying on cloud-based models.