Google’s Veo 3 Brings Sound-On Video Generation to Gemini API

The latest version of Google’s video model, Veo 3, is now available via the Gemini API and Vertex AI. It generates short, cinematic videos with synchronized audio, including dialogue and sound effects. It is Google’s first generative model to combine visuals and sound, expanding the company’s capabilities in AI-powered content creation.

Georg S. Kuklick • July 19, 2025

Veo 3 introduces native audio generation for AI videos, pushing beyond the silent clips of earlier models. It produces eight-second, 720p videos from text prompts, complete with realistic physics, expressive motion, and lip-synced speech. Developers can now access it in paid preview mode through the Gemini API and Google Cloud’s Vertex AI platform. Pricing starts at $0.75 per second for video with audio, with a more affordable “Veo 3 Fast” version expected soon.
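To put the quoted price in perspective, here is a minimal back-of-the-envelope sketch in Python. It assumes billing is strictly linear in output seconds at the $0.75 rate cited above and that every clip runs the full eight seconds; the function name `estimate_cost` and its parameters are illustrative and not part of any Google SDK. Actual billing may differ.

```python
from decimal import Decimal

# Figures quoted in the article (paid preview, video with audio).
PRICE_PER_SECOND_WITH_AUDIO = Decimal("0.75")  # USD per generated second
CLIP_SECONDS = 8                               # length of one Veo 3 clip


def estimate_cost(
    num_clips: int,
    seconds_per_clip: int = CLIP_SECONDS,
    price_per_second: Decimal = PRICE_PER_SECOND_WITH_AUDIO,
) -> Decimal:
    """Estimate spend in USD, assuming purely linear per-second pricing.

    Hypothetical helper for illustration only; it does not call any API.
    """
    if num_clips < 0 or seconds_per_clip < 0:
        raise ValueError("num_clips and seconds_per_clip must be non-negative")
    return num_clips * seconds_per_clip * price_per_second


if __name__ == "__main__":
    for clips in (1, 10, 100):
        print(f"{clips:>3} clip(s): ${estimate_cost(clips):.2f}")
```

Under these assumptions a single eight-second clip costs $6.00, so a batch of 100 drafts would run to $600, which helps explain the interest in a cheaper “Veo 3 Fast” tier.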

The launch positions Google as a contender in multimodal AI content generation, targeting creative developers, game studios, and production teams. Early partners like Cartwheel and Volley are using Veo 3 to build 3D animation tools and in-game cutscenes. The move signals Google’s effort to open new developer workflows for generative media while strengthening its ecosystem against competitors like OpenAI’s Sora and Runway.
