
Google Adds Conversational Image Segmentation to Gemini 2.5


Gemini 2.5 now supports natural-language image segmentation, letting developers query images with plain text prompts. The feature understands complex relationships, conditional logic, and multilingual queries, streamlining visual AI workflows without custom models. It is available through Google AI Studio and the Gemini API, targeting creative, compliance, and insurance use cases.

Georg S. Kuklick

July 21, 2025

Google has expanded Gemini 2.5 with a new conversational image segmentation feature, allowing developers to analyze images using natural language prompts. The update enables queries like “find the person holding the umbrella” or “highlight food that is vegetarian,” bypassing the need for specialized computer-vision pipelines. It also supports multilingual inputs, in-image text detection, and higher-level reasoning, such as identifying abstract regions like the area that needs cleaning up.

This addition positions Gemini as a more versatile tool for visual workflows. Developers in creative industries can simplify media editing tasks, while safety engineers can quickly validate compliance by querying visual scenes. Insurance companies can use it for more efficient damage assessments. Google recommends using the gemini-2.5-flash model with JSON mask outputs and adjusted compute settings for best performance.
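To make the recommendation concrete, here is a minimal sketch of how such a query might look with the google-genai Python SDK. The prompt wording and the JSON response fields (box_2d, mask, label) are assumptions modeled on Google's published segmentation examples, not a guaranteed schema, and the file name and API key are placeholders.

# Hypothetical sketch: asking gemini-2.5-flash for a segmentation mask
# via a plain-language prompt and parsing the JSON it returns.
import json

from google import genai
from PIL import Image

client = genai.Client(api_key="YOUR_API_KEY")  # placeholder credential
image = Image.open("street_scene.jpg")         # hypothetical input image

prompt = (
    "Give a segmentation mask for the person holding the umbrella. "
    "Return a JSON list where each entry has 'box_2d', 'mask' "
    "(a base64-encoded PNG), and 'label'."
)

response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents=[image, prompt],
)

# Strip any markdown code fences the model may wrap around the JSON payload.
text = response.text.strip().removeprefix("```json").removesuffix("```")
for item in json.loads(text):
    print(item.get("label"), item.get("box_2d"))

In practice the returned mask field would still need to be decoded from base64 and resized to the source image, but the point of the feature is that the selection itself is expressed entirely in natural language rather than through a purpose-built segmentation model.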

By integrating this feature into a single API via Google AI Studio and the Gemini API, Google further blurs the line between text and vision applications. This move strengthens its positioning in the multi-modal AI market, offering a more accessible and flexible alternative to traditional vision models.

