Google’s Veo 3 Brings Sound-On Video Generation to Gemini API
The latest version of Google’s video model, Veo 3, is now available through the Gemini API and Vertex AI. It generates short, cinematic videos with synchronized audio, including dialogue and sound effects. Veo 3 is Google’s first generative video model to produce visuals and sound together, expanding its capabilities in AI-powered content creation.
Georg S. Kuklick • July 19, 2025
Veo 3 introduces native audio generation for AI video, moving beyond the silent clips of earlier models. It produces eight-second, 720p videos from text prompts, complete with realistic physics, expressive motion, and lip-synced speech. Developers can access it as a paid preview through the Gemini API and Google Cloud’s Vertex AI platform. Pricing starts at $0.75 per second of generated video with audio, and a more affordable “Veo 3 Fast” version is expected soon.
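At the quoted starting rate, a single eight-second clip with audio works out to 8 × $0.75 = $6.00. The minimal Python sketch below turns that arithmetic into a budgeting helper. It assumes billing is strictly linear in generated seconds at the $0.75 rate, with no minimums, discounts, retries, or taxes; `estimate_cost` and its parameters are hypothetical names for illustration, not part of any Google SDK, and “Veo 3 Fast” pricing is not covered.

```python
# Assumption: the article's quoted starting rate for Veo 3 video with audio,
# billed linearly per generated second (no minimums, discounts, or taxes).
PRICE_PER_SECOND_USD = 0.75
# Veo 3 clips are eight seconds long, per the article.
CLIP_SECONDS = 8


def estimate_cost(num_clips: int,
                  seconds_per_clip: int = CLIP_SECONDS,
                  rate: float = PRICE_PER_SECOND_USD) -> float:
    """Hypothetical helper: estimated USD cost of generating num_clips clips."""
    if num_clips < 0 or seconds_per_clip < 0:
        raise ValueError("clip count and duration must be non-negative")
    return num_clips * seconds_per_clip * rate


if __name__ == "__main__":
    print(f"One clip: ${estimate_cost(1):.2f}")    # One clip: $6.00
    print(f"Ten clips: ${estimate_cost(10):.2f}")  # Ten clips: $60.00
```

Keeping the rate as a parameter makes it easy to plug in a different per-second price (for example, once “Veo 3 Fast” pricing is published) without changing the calculation.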
The launch positions Google as a contender in multimodal AI content generation, targeting creative developers, game studios, and production teams. Early partners such as Cartwheel and Volley are using Veo 3 to build 3D animation tools and in-game cutscenes. The move signals Google’s push to open new developer workflows for generative media while strengthening its ecosystem against competitors such as OpenAI’s Sora and Runway.
© 2025 Neo Digital Magazines llc. All rights reserved.