Gate News, April 23: Google researchers, including He Kaiming and Xie Saining, published a paper introducing Vision Banana, a general-purpose vision understanding model created by lightweight instruction fine-tuning of the company’s Nano Banana Pro (Gemini 3 Pro Image) image generation model. The key innovation is to represent the output of every vision task as an RGB image, so that segmentation, depth estimation, and surface normal prediction are all performed through image generation, without task-specific architectures or loss functions.
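To make the "everything is an RGB image" idea concrete, here is a minimal sketch of how a segmentation mask could be written as an RGB image and read back. It is an illustration only, not the paper's actual encoding: the palette, the class names, and the helper functions `encode_mask`, `nearest_class`, and `decode_image` are all hypothetical. Decoding snaps each pixel to the nearest palette color, since a generated image is unlikely to reproduce colors exactly.

```python
from typing import List, Tuple

RGB = Tuple[int, int, int]

# Hypothetical palette (class id -> color); not taken from the paper.
PALETTE: List[RGB] = [
    (0, 0, 0),      # 0: background
    (255, 0, 0),    # 1: road
    (0, 255, 0),    # 2: vegetation
    (0, 0, 255),    # 3: vehicle
]


def encode_mask(mask: List[List[int]]) -> List[List[RGB]]:
    """Turn a 2D grid of class ids into a 2D grid of RGB pixels."""
    return [[PALETTE[class_id] for class_id in row] for row in mask]


def nearest_class(pixel: RGB) -> int:
    """Return the class whose palette color is closest to the pixel."""
    def squared_distance(color: RGB) -> int:
        return sum((a - b) ** 2 for a, b in zip(pixel, color))

    return min(range(len(PALETTE)), key=lambda i: squared_distance(PALETTE[i]))


def decode_image(image: List[List[RGB]]) -> List[List[int]]:
    """Recover class ids from an RGB image, tolerating small color noise."""
    return [[nearest_class(pixel) for pixel in row] for row in image]


if __name__ == "__main__":
    mask = [[0, 1], [2, 3]]
    image = encode_mask(mask)
    print(decode_image(image) == mask)  # round trip recovers the mask
```

Under this framing, a segmentation "prediction" is just another generated picture, and the task-specific step is reduced to a simple decoding rule applied afterwards.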
In semantic segmentation, Vision Banana outperformed the specialized model SAM 3 by 4.7 percentage points on Cityscapes, and in referring expression segmentation it surpassed SAM 3 Agent. It lagged behind SAM 3 in instance segmentation, however. For 3D tasks, its metric depth estimation reached an average accuracy of 0.929 across four standard datasets, exceeding Depth Anything V3’s 0.918, while using only synthetic data for training, with no real depth data, and requiring no camera parameters at inference. Its surface normal estimation achieved state-of-the-art results on three indoor benchmarks.
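The article does not say which accuracy metric the depth figures refer to. One common choice in depth estimation is threshold accuracy (often written δ1): the fraction of pixels where the predicted and true depth differ by less than a factor of 1.25. The sketch below assumes that metric purely for illustration; the function name `delta1_accuracy` and the sample values are hypothetical and are not results from the paper.

```python
from typing import Sequence


def delta1_accuracy(
    pred: Sequence[float], gt: Sequence[float], threshold: float = 1.25
) -> float:
    """Fraction of valid pixels with max(pred/gt, gt/pred) below threshold."""
    # Keep only pixels where both depths are positive (valid measurements).
    pairs = [(p, g) for p, g in zip(pred, gt) if p > 0 and g > 0]
    if not pairs:
        raise ValueError("no valid depth pairs to evaluate")
    hits = sum(1 for p, g in pairs if max(p / g, g / p) < threshold)
    return hits / len(pairs)


if __name__ == "__main__":
    # Made-up depths in meters, for illustration only.
    predicted = [1.0, 2.0, 4.0, 10.0]
    ground_truth = [1.1, 2.0, 8.0, 9.0]
    print(delta1_accuracy(predicted, ground_truth))
```

If this is indeed the metric, a score of 0.929 would mean that roughly 93% of evaluated pixels fall within that tolerance, averaged over the four datasets.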
Fine-tuning mixed a small amount of vision task data into the original image generation training data, which preserved the model’s generation capabilities: in generation quality tests, Vision Banana matched the original Nano Banana Pro. The paper proposes that image generation pretraining plays the same role in vision that text generation pretraining plays in language: models learn the internal representations needed for image understanding while learning to generate, and instruction fine-tuning merely unlocks this capability.